Evaluation of low-degree polynomial kernel support vector machines for modelling pore-water pressure responses

Pore-water pressure (PWP) is influenced by climatic changes, especially rainfall. These changes may affect the stability of slopes, particularly unsaturated ones. Monitoring the changes in PWP resulting from climatic factors has therefore become an important part of effective slope management. However, this monitoring requires a field instrumentation programme, which is expensive in resources and labour. Recently, soft computing modelling has become an alternative. A low-degree polynomial kernel support vector machine (SVM) was evaluated for modelling PWP changes. The developed model used pore-water pressure and rainfall data collected from an instrumented slope. The wrapper technique was used to select input features, and k-fold cross-validation was used to calibrate the model parameters. The developed model showed great promise in modelling pore-water pressure changes: a high correlation, with a coefficient of determination of 0.9694 between the predicted and observed changes, was obtained. The degree-one polynomial SVM model yielded competitive results and can be used to provide lead-time records of PWP, which can aid in better slope management.


Introduction
Climatic factors such as rainfall and evaporation play an important role in pore-water pressure (PWP) changes, which in turn are vital to slope studies. In tropical regions, climatic conditions are typically characterized by variable rainfall, frequent temperature variations and high evapotranspiration. Changes in climatic factors are mostly accompanied by changes in PWP, sometimes leading to excessively high PWP levels. High or positive PWP levels may lead to slope failure [1,2]. Because of such changes, it is important to monitor PWP responses, particularly to rainfall. Damiano and Mercogliano [3] recommended that, for effective monitoring of the hydrological behaviour of a slope, field in-situ records of PWP, amongst other factors, should be monitored. Evidently, knowledge of PWP will aid in making objective judgments with regard to slope management.
Pore-water pressure monitoring typically entails field instrumentation of the slope in question [4,5]. This field instrumentation is expensive, time consuming and requires expertise. Modelling of PWP responses can provide an alternative to the instrumentation. Although modelling will not fully replace instrumentation, it can provide an enhancement in that lead-time responses can be obtained. PWP modelling has been carried out using several variations of artificial neural networks (ANN) [6-10]. Although the ANN models boasted remarkable correlations between observed and predicted PWP records, the ANN algorithm has flaws that make it a less preferable modelling algorithm these days: it is comparatively complex, can get stuck in local optima, and is prone to over-fitting during training. The support vector machine (SVM), unlike the ANN, is less complex, as it has a simple geometric interpretation and its solution is always global [11]. SVM has been shown to outperform ANN in many engineering fields, as evident in several studies [12-14].
One of the most challenging aspects of using SVM is deciding which kernel to use, as different kernels suit different problems. In most cases the literature points the way to a suitable kernel; occasionally this is not the case, as SVM usage is lacking in some studies. Lamorski et al. [15] used SVM to develop a point pedotransfer function; they compared a radial basis function (RBF) kernel with a linear kernel and found the RBF more suitable for their data. Zhu et al. [16] found the sigmoid kernel most suitable for infrared data. The polynomial kernel has the advantage of behaving as two kernels in one (polynomial and linear), since during parameter optimization it can assume the form of a linear kernel if the right parameter search space is selected. Furthermore, a low-degree polynomial has the advantage of fast training while preserving the capabilities of a nonlinear kernel. This study therefore aims to evaluate the capability of an SVM with a low-degree polynomial kernel in modelling the nonlinear responses of PWP to rainfall.

Support Vector Machines Theory
Support vector machines have established themselves as a robust modelling method in the field of machine learning. Unless one has a valid reason for using another method, SVM should be the first technique employed.
Support vector machines, developed by Vapnik [17], originate from the framework of statistical learning theory. They were originally applied to classification problems, where the outputs are hard-limited, and were later extended to regression problems, termed support vector regression (SVR). In SVR, one seeks an estimator of real-valued targets. Kecman [18] explains that the problem may be formulated as follows. Given a training data set {(x_i, y_i)}, i = 1, ..., l, where the inputs x_i are n-dimensional vectors and the responses y_i are real-valued, the SVR function that represents the optimal separating hyperplane may be introduced via linear regression, as shown in Eq. 1:

f(x) = w·x + b    (1)

with w being a weight vector and b a bias. Typically, a loss function is required to measure the approximation error. Vapnik introduced a loss function of the form (Eq. 2):

|y − f(x)|_ε = max(0, |y − f(x)| − ε)    (2)

This function is insensitive to errors within a certain ε zone, within which all prediction errors are tolerated and counted as 0. The objective of the SVM algorithm is to find an estimator of the real-valued targets by minimizing the empirical risk while at the same time maximizing the width of the insensitive zone, 2/||w||. This is formulated as Eq. 3, under the constraints in Eq. 4:

minimize  (1/2)||w||^2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)    (3)

subject to  y_i − w·x_i − b ≤ ε + ξ_i,   w·x_i + b − y_i ≤ ε + ξ_i*,   ξ_i, ξ_i* ≥ 0    (4)
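The ε-insensitive loss above is simple to state in code. The following is a minimal illustrative sketch (the function name and toy values are our own, not from the study):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's epsilon-insensitive loss: errors within +/-eps cost nothing;
    larger errors cost only their excess over eps."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 3.0])   # absolute errors: 0.05, 0.5, 0.0
print(eps_insensitive_loss(y_true, y_pred))
```

Only the middle prediction is penalized (by 0.5 − 0.1 = 0.4); the other two errors fall inside the tolerated ε zone.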
Here ξ and ξ* are slack variables that indicate the distance of points lying outside the insensitive zone, and C is a regularization parameter that needs to be tuned, along with ε. The optimization problem in Eq. 3 is solved by first forming a primal-variable Lagrangian, which is then readily solved in the dual space. A standard optimization algorithm is used to solve the resulting Hessian matrix.
Ultimately, learning results in the Lagrange multipliers α, and the number of free α values equals the number of support vectors (SV). The obtained α values are used to find the optimal weights w and optimal bias b, which in turn determine the optimal separating hyperplane, or estimator, of Eq. 1.
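To make the role of the multipliers concrete, the sketch below assumes a set of already-computed values α_i − α_i* for a toy one-dimensional linear SVR and recovers the weight vector and a prediction from them. The numbers are invented for illustration, not produced by a solver:

```python
import numpy as np

# Toy 1-D linear SVR with ASSUMED multiplier values (not from a
# solver): beta_i stands for alpha_i - alpha_i*. The weight vector
# and bias then give the estimator of Eq. 1 directly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])   # support-vector inputs
beta = np.array([0.0, 0.5, -0.5, 0.3])       # assumed alpha_i - alpha_i*
b = 0.1                                      # assumed bias
w = (beta[:, None] * X).sum(axis=0)          # w = sum_i beta_i * x_i
x_new = np.array([1.5])
print(w @ x_new + b)                         # linear-SVR prediction f(x_new)
```

The point with β = 0 contributes nothing to w; only the support vectors (nonzero multipliers) shape the estimator.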
For nonlinear regression problems, where no linear separating plane exists, inputs are mapped to a higher-dimensional space by means of kernel functions. A kernel maps input features from a given dimension to a higher-dimensional feature space. With this, nonlinearly separable problems can be readily solved as linear problems in the high-dimensional space. Basic kernel functions include the linear, polynomial, sigmoid and Gaussian kernels. Only a basic explanation of the SVM formulation is offered here; Kecman [18] provides a detailed explanation of SVM formulations.
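The idea of mapping to a higher-dimensional space can be illustrated with an explicit degree-2 feature map, which the degree-2 polynomial kernel evaluates implicitly. In the toy sketch below (our own example), a target that is nonlinear in x becomes exactly linear in the mapped features (x, x²):

```python
import numpy as np

# A target nonlinear in x is exactly linear in the mapped features
# phi(x) = (x, x^2); a degree-2 polynomial kernel computes inner
# products in such a space implicitly, without ever forming phi.
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x ** 2 - x                   # nonlinear in x

Phi = np.column_stack([x, x ** 2])     # explicit feature map phi(x)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(coef)                            # recovers the coefficients [-1, 2]
print(np.allclose(Phi @ coef, y))      # linear fit is exact in mapped space
```

A linear least-squares fit in the mapped space reproduces the nonlinear target exactly, which is the essence of the kernel trick.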

Polynomial Kernel
The polynomial kernel, given in Eq. 5, is less frequently used than, say, the RBF kernel, and it generally does not give better accuracy. However, low-degree polynomials (d ≤ 3) trained and tested in a similar fashion to the RBF kernel perform only slightly worse, their training and testing times are much faster, and they can also be trained using linear SVM methods [19]. Additionally, polynomial kernels have been found to perform better in natural language predictions [20].

K(x_i, x_j) = (γ x_i·x_j + r)^d    (5)

where γ, r and d are kernel parameters that need to be tuned.
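A small numerical sketch of Eq. 5 (with illustrative vectors and parameter values of our own choosing) also shows the reduction to the linear kernel at γ = 1, r = 0, d = 1 noted in the introduction:

```python
import numpy as np

def poly_kernel(x, z, gamma, r, d):
    """Eq. 5: K(x, z) = (gamma * x.z + r)^d."""
    return (gamma * np.dot(x, z) + r) ** d

x = np.array([0.2, 0.5, 0.1])   # illustrative vectors, not study data
z = np.array([0.4, 0.3, 0.9])

print(poly_kernel(x, z, gamma=0.5, r=1.0, d=2))   # a degree-2 kernel value
# With gamma = 1, r = 0, d = 1 the kernel equals the linear kernel x.z:
print(poly_kernel(x, z, gamma=1.0, r=0.0, d=1), np.dot(x, z))
```

This equivalence is why a degree-1 polynomial search space contains the linear model as a special case.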

Field data
The data used were part of the data from a slope instrumentation programme set up to monitor the PWP responses in a slope located at Universiti Teknologi PETRONAS (UTP) in Malaysia. The slope was instrumented with tensiometers (fitted with transducers and a data logger) and a rain gauge. Part of the data from the crest at 0.6 m depth was used for this study; its properties are shown in Table 1. The highest water table around the area was found to be about 1.4 m below ground level (at the slope toe), as monitored from four monitoring wells sited around the slope area. Results from an undisturbed sample of the slope material indicated that the slope soil is sandy clay in the top one metre, grading to clay from about two metres.

Input Features Determination
The selection of representative model attributes constitutes a great challenge in soft computing modelling. Selecting the optimum number of attributes, or input features, requires good knowledge of the system to be modelled. The physical system of PWP has been documented in the literature. Rahardjo et al. [5] showed that five antecedent records of rainfall are needed to model the PWP response to rainfall. Mustafa et al. [10] showed that five antecedent PWP records are needed to develop a good model of PWP. A combination of both sets of input features is considered herein; therefore, five antecedent records each of PWP and rainfall were considered for analysis and determination of the needed features. However, only a subset of these may be relevant.
State-of-the-art input feature selection involves the use of either the filter method or the wrapper method and, to a lesser extent, the embedded method, which has comparatively limited applications [21,22]. The filter method aims to find the relevant features outside the framework of the learning algorithm, by evaluating the relevance of the features using a statistical measure. The wrapper method evaluates the predictive power of the features within a chosen algorithm. Although the filter method is computationally cheaper, it provides a broad selection of features that is not specific to any algorithm. The wrapper method, on the other hand, yields features that are algorithm- and data-specific; in practice this is the most ideally sought option [23].

Wrapper method of Input determination
The wrapper method requires the selection of a search algorithm and an evaluation function. The search algorithm explores the feature search space using operators tailored to the search engine, which significantly determine how the search space is manipulated. The evaluation function employs a learning scheme, which is generally the algorithm within which the features are to be used to develop the model in question. With a learning scheme in place, k-fold cross-validation (CV) may be used to estimate the accuracy of the learning scheme for any selected subset of features; in this way the selected sub-features can be evaluated. An evaluation measure such as root mean squared error (RMSE), coefficient of determination (R^2) or mean absolute error (MAE) is used as the accuracy measure of the CV. In essence, this serves as the objective function of the search engine.
Several search algorithms, such as evolutionary algorithms (EA) and greedy search, may be used as the search engine, with the objective set as the performance of the evaluation function. The search engine is tasked with exploring the feature search space; detailed use of the wrapper method can be found in [23]. Chandrashekar and Sahin [21] described some search methods as heuristic search algorithms, such as the genetic algorithm (GA) used as a search engine for feature subset selection, since these methods do not follow the usual forward or backward selection. They explained that the chromosomes of the GA can be used to represent whether or not a feature is selected, with the objective function set as the performance of the prediction algorithm, which uses the selected chromosomes as input features. In this study, such a method is used, with the GA employed as the search algorithm. The GA uses the evaluation result to determine which of the selected feature subsets has less relevance to the predictor algorithm, and then creates a better subset by internal manipulation, aided by the following operators: single-point crossover, bit-flip mutation, and generational replacement with elitism. The wrapper attribute selection was implemented using the Waikato Environment for Knowledge Analysis (WEKA) [24].
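The GA-based wrapper described above can be sketched compactly. In the following hypothetical example, the fitness of a chromosome (a bit mask over candidate features) is the k-fold CV RMSE of an ordinary least-squares predictor, standing in for the SVM evaluation function; the data are synthetic, with only features 0 and 2 actually informative:

```python
import numpy as np

rng = np.random.default_rng(0)

def cv_rmse(X, y, mask, k=5):
    """k-fold CV RMSE of a least-squares fit on the masked features
    (a stand-in for the SVM evaluation function of the wrapper)."""
    if not mask.any():
        return np.inf
    Xs, n = X[:, mask], len(y)
    idx = np.arange(n)
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        coef, *_ = np.linalg.lstsq(Xs[tr], y[tr], rcond=None)
        errs.append(np.sqrt(np.mean((y[fold] - Xs[fold] @ coef) ** 2)))
    return float(np.mean(errs))

def ga_select(X, y, pop=12, gens=15, p_mut=0.1):
    """GA wrapper: single-point crossover, bit-flip mutation,
    generational replacement with elitism."""
    n_feat = X.shape[1]
    P = rng.integers(0, 2, (pop, n_feat)).astype(bool)
    for _ in range(gens):
        fit = np.array([cv_rmse(X, y, m) for m in P])
        order = np.argsort(fit)
        children = [P[order[0]].copy()]                    # elitism
        while len(children) < pop:
            a, b = P[order[rng.integers(0, pop // 2, 2)]]  # fitter half
            cut = rng.integers(1, n_feat)                  # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child ^ (rng.random(n_feat) < p_mut))  # bit-flip
        P = np.array(children)
    fit = np.array([cv_rmse(X, y, m) for m in P])
    return P[np.argmin(fit)]

# Synthetic data: only features 0 and 2 drive the target.
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.05 * rng.normal(size=120)
best = ga_select(X, y)
print(np.flatnonzero(best))  # selected feature indices; includes 0 and 2
```

The elitism step guarantees that the best chromosome found so far is never lost between generations, mirroring the operators listed above.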

SVM model Development
Basic SVM model development may be carried out in four basic steps, as follows:
• Identify the kernel to be used in the model. Although many studies choose the RBF kernel on the basis of it being the most common, it is wise to compare the performance of a few kernels or consult the literature on similar data.
• Identify the kernel parameters to be optimized in addition to the SVM meta-parameters (C, ε). Use a suitable optimization technique, such as the GA, particle swarm optimisation (PSO) or grid search, to find the optimum meta-parameters.
• Use the optimum parameters to train the model on part of the data (the training set).
• Test the model on unseen data (the test set) to evaluate its generalization performance, using appropriate accuracy measures.
The SVM within the wrapper algorithm was implemented using WEKA's library SVM [25], and model development was implemented using the library for support vector machines (LIBSVM) [26].

Results and Discussions

Data Analysis
The wrapper method was employed in this study to determine the optimal subset of input features, with an evolutionary algorithm as the search engine. The feature space of 11 feature sets (chromosomes) was explored through manipulation within the GA to select subsets. SVM was used as the evaluation function, with the data evaluated over fivefold cross-validation. The performance of the CV was evaluated using root mean squared error (RMSE) and is shown in Table 2, where U represents PWP and r is rainfall. The subscript indexes are lag positions of the variables; e.g. U_{t-1} represents the one-antecedent (30-minute lag) PWP.
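For concreteness, the candidate lagged inputs (five antecedent records of each of PWP and rainfall) can be assembled from the raw series as in this illustrative sketch; the function name and toy series are our own, not the study's data:

```python
import numpy as np

def lagged_features(U, r, n_lags=5):
    """Build rows [U_{t-5}..U_{t-1}, r_{t-5}..r_{t-1}] with target U_t,
    assuming both series share the same 30-minute sampling interval."""
    rows, targets = [], []
    for t in range(n_lags, len(U)):
        rows.append(np.concatenate([U[t - n_lags:t], r[t - n_lags:t]]))
        targets.append(U[t])
    return np.array(rows), np.array(targets)

U = np.arange(10, dtype=float)   # toy PWP series
r = np.zeros(10)                 # toy rainfall series
X, y = lagged_features(U, r)
print(X.shape, y.shape)          # (5, 10) (5,)
```

Each usable sample consumes the five preceding readings of both series, so the first five time steps yield no training rows.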
The RMSE of the best feature combinations (chromosome combinations, CC) appearing in the fivefold CV showed the lowest values for combinations two, four, seven and nine. Thus the wrapper input determination showed Eq. 6 to be an excellent set of parameters for modelling PWP.
Although other feature combinations yielded results similar to this three-feature subset, they contain more features, which makes them less preferable choices for the model build. Fewer features at the same fitness level ensure a simpler, less complex model.

SVM Calibration and Testing
The training data were scaled between 0 and 1. Based on the objective of the study, the polynomial kernel was selected. The input features obtained from the wrapper input determination were used as the inputs for model development. Grid search with fivefold CV was used to find the optimum model meta-parameters. The grid search was carried out in three stages: first a coarse grid search over wide parameter ranges, followed by two stages of finer grid search. Table 3 shows the meta-parameters obtained for the model; this combination of parameters gave the best results during fivefold CV.
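The calibration recipe above, min-max scaling followed by a coarse grid refined around its optimum, can be sketched as follows. The one-dimensional objective here is a synthetic stand-in for the fivefold-CV error surface, not the study's actual parameter ranges:

```python
import numpy as np

def minmax_scale(X):
    """Scale each column to [0, 1] (assumes non-constant columns)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

X = np.array([[2.0, 10.0], [4.0, 30.0], [6.0, 50.0]])  # toy data
Xs = minmax_scale(X)
print(Xs.min(), Xs.max())  # 0.0 1.0

# Coarse-to-fine grid search on a synthetic 1-D objective standing
# in for the fivefold-CV error surface (true optimum at 0.37).
def objective(c):
    return (c - 0.37) ** 2 + 0.01

coarse = np.linspace(0.0, 1.0, 11)                 # coarse grid, step 0.1
c0 = coarse[np.argmin([objective(c) for c in coarse])]
fine = np.linspace(c0 - 0.1, c0 + 0.1, 21)         # finer grid around c0
c1 = fine[np.argmin([objective(c) for c in fine])]
print(round(c0, 2), round(c1, 2))
```

The finer stage only needs to cover one coarse-grid cell on each side of the coarse optimum, which is what keeps the staged search cheap.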
The calibrated parameters from the grid search were used to train the SVM model, which was then tested on the test set. The results of calibration and testing are shown in Table 3; these results were obtained using the scaled data sets. The polynomial kernel yielded very good results, as the performance measures in Table 3 show. A slightly better performance was obtained during calibration than during testing. This can be attributed to data variability: the variance of the data in the test stage is higher than in the calibration stage, making the training data smoother and thus yielding better results in the calibration stage, consistent with the effects of data variability [27]. The generalization ability of the model was thus tested, and the estimation errors are shown in Table 3. The estimation errors obtained are low, and very close to the approximation errors from the fivefold CV. The generalization performance of the model is further illustrated by the individual event-based model predictions shown in the scatter plot of Figure 2. The result has an R^2 value of 0.9694, indicating not only strong agreement but near-perfect correlation, and hence good generalization ability of the model.
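As a reference for the reported accuracy measures, one common definition of R^2 (the coefficient of determination) can be computed directly from paired observed and predicted values; the numbers below are toy values of our own, not the study's records:

```python
import numpy as np

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

obs = np.array([-5.0, -3.0, -1.0, 0.0, 2.0])   # toy PWP values (kPa)
pred = np.array([-4.8, -3.1, -1.2, 0.1, 1.9])
print(round(r2(obs, pred), 4))
```

Values close to 1 indicate that the predictions explain nearly all of the variance in the observed series, which is the sense in which 0.9694 is read above.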
The relationship between the observed and predicted values is shown more clearly in the time series plot of the two events (Figure 3). The model mimicked the complex observed trend, including several points where a rainfall event pushed the PWP to high values. Rainfall events push PWP to extreme values, and most often these are the points that are difficult to predict.
Generally, in time series modelling, predicted values tend to lag the observed ones; no such lag was observed here, indicating the robustness of the model. Nevertheless, some points were difficult to predict and were unfortunately mis-estimated. Some of these points show a sudden, unprecedented rise in negative PWP; these are considered outliers, as there was no prior event that could trigger such a rise. However, no outlier analysis is offered in this study. The most pronounced of these points are clearly shown in Figure 2: they deviate the most from the model fit line, indicating less correlation between the model predictions and observed records.
The support vector machine with a polynomial kernel has mimicked the nonlinear responses of PWP to rainfall events quite well, as indicated by the measures discussed above. The results are competitive even in comparison with a neural network model trained with eight input features and nine months of data [6].

Conclusions
A simple degree-one polynomial kernel has performed well in predicting the complex nonlinear behaviour of PWP, with high correlation recorded between observed and predicted records. This was aided by the choice of input features, which yielded a comparatively simple model (three input features), in keeping with the simplicity that a low-degree polynomial offers. The low-degree polynomial kernel SVM model can thus be readily employed to model PWP responses. However, this by no means establishes the polynomial kernel as the best kernel for PWP modelling, as this study did not compare the polynomial model with other kernels. In fact, a linear kernel model could perform nearly as well, since a linear kernel is equivalent to a polynomial kernel with r = 0, d = 1 and γ = 1; in the calibration of this model only γ differed (0.03). The developed model is therefore quite close to a linear kernel model, and other basic kernels should be checked to ensure the best choice.
In addition to its proven advantages, such as fast implementation and training, the results of the developed model are competitive, and the low-degree polynomial kernel SVM, with a degree of just one, can serve as an excellent option for modelling PWP responses.

Figure 1. Scatter plot of the fivefold cross-validation and observed PWP, with line of fit, indicating model fitness to the observed data.

Table 2. Root mean squared error (RMSE) of cross-validation results, using different chromosome combinations, as implemented with the wrapper algorithm.

Table 3. Performance metrics (for calibration and testing) of the developed polynomial kernel model.

Figure 1 shows a scatter plot from the calibration of the model. The fivefold CV calibration result was compared with the observed training records. The accuracy measures obtained were good, with an R^2 of 0.9845 and low values of MAE and RMSE. The model fit line indicates the best fit attained by the model, not the perfect fit line (which would indicate perfect agreement); the model fit line shows the extent of correlation, and what matters is having all points on the line. The calibration results show how well the developed model performed in training. However, in machine learning, a model may yield good approximation results during training yet fail to come close to those results on test data. What is vital is for the model to perform well on unseen data; that is the generalization power of the model.