Impact of feature selection on system identification by means of NARX-SVM

Support Vector Machines (SVM) are widely used in many fields of science, including system identification. The selection of the feature vector plays a crucial role in the SVM-based model building process. In this paper, we investigate the influence of the selection of the feature vector on the model's quality. We have built an SVM model with a non-linear ARX (NARX) structure. The modelled system had a SISO structure, i.e. one input signal and one output signal. The output signal was temperature, which was controlled by a Peltier module. The supply voltage of the Peltier module was the input signal. The system had a non-linear characteristic. We have evaluated the model's quality by the fit index. The classical feature selection for an SVM with NARX structure comes down to the choice of the length of the regressor vector. For SISO models, this vector is determined by two parameters: n_u and n_y. These parameters determine the number of past samples of the input and output signals of the system used to form the vector of regressors. In the present research we have tested two methods of building the vector of regressors, one classical and one using custom regressors. The results show that the vector of regressors obtained by the classical method can be shortened while maintaining acceptable model quality. Using custom regressors reduces the SVM feature vector, which also reduces the calculation time.


Introduction
A general definition describes a system as a set of elements separated from the environment together with their interactions. In every system, input and output signals can be distinguished. The relationship between these signals can be described mathematically and is called a model. System identification is needed to obtain this model.
There are two approaches to dynamic system modelling: first-principles modelling and data-driven modelling. The first approach is based on the laws of material, momentum and energy balances. These models are also called white-box models. The accuracy of these types of models is usually very high, but they require experience in the selection of a large number of parameters.
An alternative to first-principles models are data-driven models (an approach also called system identification). Data-driven models are not based on a priori knowledge but on measured experimental data. This method is much more practical because most systems (e.g. industrial processes) are too complex to be understood at a fundamental level. The most important advantages of data-driven models are as follows:
- the possibility of modelling without in-depth knowledge of the system,
- flexibility in the choice of the model's structure,
- convenience of implementation in the form of computer code.
These advantages make system identification an integral part of modern control engineering.
On the other hand, the most serious disadvantages of data-driven models are:
- they require an identification experiment, which has to be properly planned so that the collected data enable a full description of the analysed system,
- these models are non-transparent, i.e. they do not reflect the internal physical mechanisms of the process.
Due to this lack of transparency, data-driven models are termed black-box models. In recent years, numerous scientists [1,2] have suggested using artificial intelligence methods in black-box identification. This suggestion resulted in numerous works presenting different techniques, such as artificial neural networks (ANN), fuzzy-neural networks (FNN) and wavelets [3,4]. Among the neural networks used for black-box identification, the most frequently reported are the multi-layer perceptron (MLP), fuzzy-neural networks (FNN), radial basis function (RBF), Runge-Kutta (RK) method, and digital recurrent network (DRN). Few authors have made use of support vector machines (SVM) [1,5] as an identification tool. Usually, the authors also ignore the rules for choosing the regressors and the parameters of both ANN and SVM.
In the case of the NARX-SVM black-box model, the selection of regressors is a feature selection problem. The choice of the feature vector is particularly important because not all features carry significant information. Although choosing the optimal set of features is a difficult task, its solution brings many benefits:
- reduction of the dataset, which helps to avoid the curse of dimensionality,
- better estimation accuracy,
- the possibility of applying more effective learning techniques,
- improved generalisation capability of the network, achieved by removing insignificant features that interfere with each other.
The goal of this study is to analyse how the model quality changes after modification of the SVM feature vector. The modification is based on the use of custom regressors.

Methods for regressor selection
There are many classes of identification methods for non-linear objects. One of them comprises the commonly used methods based on recursive input-output models that use different types of parameterised non-linear expansions, such as ANN and SVM. In particular, non-linear autoregressive with exogenous input (NARX) models are used in identification. It has been shown in numerous studies [6] that non-linear objects can be modelled with high accuracy using the NARX structure. However, if the structure of the model is not known a priori, the number of regressor candidates for the NARX structure can be very large. There are several techniques for selecting regressor candidates for NARX-SVM models. In general, they are based on forward/backward regression techniques, which progressively extend the structure of the model using appropriate indicators. The most popular ones are based on Prediction Error Minimisation (PEM): the Forward-Regression Orthogonal Estimator (FROE) and the Fast Recursive Algorithm (FRA). An alternative to the PEM methods are the Simulation Error Minimisation (SEM) methods, together with their modification, Simulation Error Minimisation with Pruning (SEMP), which increase the robustness of model inference and place low requirements on the input excitation. The disadvantages of these methods are the high requirements for computational power and the time needed to obtain the results.
In our research we used the methods of regressor selection described in [5,7] together with the modifications shown in this paper.
The main task of the system is to maintain the temperature of the aluminium plate within the range of 0-50°C. The Peltier module is the actuator. To enable continuous control of the module, we have used a power bridge. It converts the control signal u, generated by the analogue output of a DAQ board with a range of -10...+10 VDC, into a PWM signal controlling the supply voltage of the Peltier module in the range of 0...15 VDC. The signal u is the system input. The power bridge also allows the polarity to be changed so that the aluminium plate is heated or cooled as required. To enable effective cooling of the aluminium plate, we have used an air cooler to transfer the heat from the warm side of the Peltier module. The output signal of the system, denoted as y, is the temperature of the aluminium plate.

Materials and methods
The system under investigation is an example of a non-linear object. This is confirmed by the static characteristic (Fig. 2). The characteristic can be divided into two parts at the point u = 0. The static gain of the system is lower in the first part of the characteristic than in the second part. Negative values of u mean cooling of the aluminium plate, while positive values mean heating.
The general block diagram of the feature vector, covering both the input and the output, is shown in Fig. 3. The learning stage was conducted in prediction mode, while validation was performed in both prediction and simulation modes. We have optimised the SVM parameters (i.e. C and γ) according to the procedure described in detail in [7]. The fit of the model to the measured data was evaluated using the fit index (a higher value means better model quality), where:
- y(t) is the measured plant output,
- ŷ(t) is the simulated plant output,
- ȳ is the mean of the measured plant output.
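As an illustration, a fit index built from these three quantities can be sketched as below. The sketch assumes the commonly used normalised definition, fit = 100 * (1 - ||y - ŷ|| / ||y - ȳ||) with Euclidean norms; this form is an assumption that is consistent with the symbols listed above, not a formula confirmed by the text, and the function name fit_index is hypothetical.

```python
import math


def fit_index(y_measured, y_model):
    """Fit index in percent, ASSUMING the normalised definition
    100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    if len(y_measured) != len(y_model):
        raise ValueError("sequences must have the same length")
    y_mean = sum(y_measured) / len(y_measured)
    # Euclidean norm of the model error
    err = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_measured, y_model)))
    # Euclidean norm of the deviation of the measurement from its mean
    dev = math.sqrt(sum((y - y_mean) ** 2 for y in y_measured))
    if dev == 0.0:
        raise ValueError("measured output is constant; fit index is undefined")
    return 100.0 * (1.0 - err / dev)
```

Under this assumed definition a perfect model scores 100, a model that only reproduces the mean of the measured output scores 0, and worse models score below 0.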

Data collection
Data collection was carried out according to the procedure described previously [5]. During the experiment we collected data according to the block diagram shown in Fig. 4.
The input variable u and the output variable y of the plant were applied and recorded using a PC with Matlab and Simulink (MathWorks) software. In the

Results
We have investigated NARX models with different numbers of regressors (numbers of lags), assuming that n_u = n_y ∈ {1, ..., 30}. This is the classical method of building the regressor vector. Fig. 6 shows the value of the fit index for all investigated models. The fit index was calculated for the validation dataset in simulation mode.
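The classical construction of the regressor vector can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name narx_regressors and the ordering of the features (input lags first, then output lags) are assumptions.

```python
def narx_regressors(u, y, n_u, n_y):
    """Build classical NARX regressors for a SISO system.

    Each row holds u(t-1)..u(t-n_u) followed by y(t-1)..y(t-n_y);
    the matching target is y(t). The ordering is an assumption.
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    start = max(n_u, n_y)  # first sample with a complete history
    rows, targets = [], []
    for t in range(start, len(y)):
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        rows.append(past_u + past_y)
        targets.append(y[t])
    return rows, targets
```

With n_u = n_y = 20, each row has 40 elements, which matches the length of the full feature vector quoted in the Conclusion.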

 
Table 1 presents the value of the fit index for the learning and validation datasets, while Fig. 7 shows the model response for both datasets.
In the next step, we built several models with custom regressors. Our goal was to reduce the SVM feature vector by eliminating some lags according to several different scenarios. This resulted in models from M1 to M8. Model M1 was built with only odd regressors. The parameters of the SVM were n_SVM = 1880, C = 5000, γ = 0.0001. The feature vector consisted of 20 elements and can be expressed as (1).
All models built with custom regressors (M1-M8) had a lower value of the fit index for the learning dataset in prediction mode than the model M0. The average difference was 0.52 percentage points.
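The odd-regressor scheme of model M1 can be illustrated with a short sketch. It assumes that "odd regressors" means the odd lags among the 20 lags of model M0 for both signals, which gives the stated 20 elements; the helper names and the ordering (input lags first) are hypothetical.

```python
def odd_lags(max_lag):
    """Return the odd lags 1, 3, 5, ... up to max_lag."""
    return [k for k in range(1, max_lag + 1) if k % 2 == 1]


def regressor_names(u_lags, y_lags):
    """Symbolic feature vector for the given input and output lags."""
    return [f"u(t-{k})" for k in u_lags] + [f"y(t-{k})" for k in y_lags]


def custom_regressor_row(u, y, t, u_lags, y_lags):
    """Feature vector at time t built only from the selected lags."""
    return [u[t - k] for k in u_lags] + [y[t - k] for k in y_lags]


# Assumed M1 layout: odd lags of u and of y taken from 20 lags each.
m1_lags = odd_lags(20)
m1_names = regressor_names(m1_lags, m1_lags)
```

Under these assumptions m1_names runs from u(t-1), u(t-3), ... to u(t-19), followed by y(t-1), y(t-3), ... to y(t-19). Other scenarios can be expressed by passing different lag lists for u and y.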
In the case of the validation dataset in prediction mode, almost all models (except for M5) had a higher value of the fit index than M0. The average difference (not including M5) was 3.48 percentage points. In the case of the validation dataset in simulation mode, all models had a lower value of the fit index than the M0 model, with an average difference of 3.98 percentage points. Considering the value of the fit index for the validation dataset (in simulation mode), the best models were M7 and M8.
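The difference between the two validation modes can be sketched as follows. In prediction mode the regressors contain measured past outputs, whereas in simulation mode they contain the model's own past outputs, so errors can accumulate. The argument model is a hypothetical stand-in for the trained SVM (any callable mapping a feature list to a number), and the feature ordering is an assumption.

```python
def predict_one_step(model, u, y, n_u, n_y):
    """Prediction mode: regressors use MEASURED past outputs."""
    start = max(n_u, n_y)
    out = []
    for t in range(start, len(y)):
        row = [u[t - k] for k in range(1, n_u + 1)]
        row += [y[t - k] for k in range(1, n_y + 1)]
        out.append(model(row))
    return out


def simulate(model, u, y_init, n_u, n_y):
    """Simulation mode: regressors use the model's OWN past outputs.

    Measured samples from y_init are used only to initialise the history.
    """
    start = max(n_u, n_y)
    y_sim = list(y_init[:start])
    for t in range(start, len(u)):
        row = [u[t - k] for k in range(1, n_u + 1)]
        row += [y_sim[t - k] for k in range(1, n_y + 1)]
        y_sim.append(model(row))
    return y_sim[start:]
```

Because simulation mode never sees the measured output after initialisation, it is the more demanding test, which is in line with the lower simulation-mode fit values reported above.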

Conclusion
The goal of this study was to analyse the impact of custom regressors on the model quality in system identification. We have shown that the vector of regressors obtained by the classical method can be shortened while maintaining acceptable model quality. In the case of the two best models, i.e. M7 and M8, a significant reduction in the feature vector (from 40 to 8 elements, i.e. by 80%) resulted in a negligible decrease in the quality of the model, equal to only 2.5%.

Fig. 1 presents a picture of the main components of the plant. The key elements that we have used in our research are:
- data acquisition (DAQ) board National Instruments NI PCI-6226,
- power bridge Wobit SDD187,
- Peltier module TEC1-12706,
- aluminium plate 54x85x6 mm,
- temperature sensor Pt100,
- air cooler with aluminium heat sink, six ϕ6 mm heatpipes, and 140 mm fan,
- Matlab and Simulink software.

Fig. 1. Picture of the main components of the plant.

Fig. 3. General block diagram of the input vector and the output of the SVM in prediction or simulation mode.

Fig. 6. Impact of the number of lags on model quality (validation dataset in simulation mode). The best model (red circle), denoted as M0, had a number of lags equal to 20.

Table 1.
Fig. 7. Output of the plant, as well as the corresponding simulated outputs of the model (simulation mode).

Table 2. Value of the fit index for M1-M8 models.
Model M3 was built with every second even regressor created from signal u and every second odd regressor created from signal y.