SVM-Based Geospatial Prediction of Soil Erosion Under Static and Dynamic Conditioning Factors

Land degradation caused by soil erosion remains an important global issue because of its adverse consequences for food security and the environment. Geospatial prediction of erosion through susceptibility analysis is crucial to sustainable watershed management. Previous susceptibility studies have omitted some crucial conditioning factors (CFs), termed dynamic CFs, whose impact on prediction accuracy has not been investigated. Thus, this study evaluates erosion susceptibility under the influence of both non-redundant static and dynamic CFs using support vector machine (SVM), remote sensing and GIS. The CFs considered include drainage density, lineament density, length-slope and soil erodibility as non-redundant static factors, and land surface temperature, soil moisture index, vegetation index and rainfall erosivity as the dynamic factors. The study implements four SVM kernel functions with the sequential minimal optimization algorithm as a classifier for soil erosion susceptibility modeling. Using area under the curve (AUC) and Cohen’s kappa index (k) as the validation criteria, the results showed that the polynomial function had the highest performance, followed by the linear and radial basis functions. However, the sigmoid SVM underperformed, having the lowest AUC and k values coupled with higher classification errors. The CFs’ weights were used to develop the soil erosion susceptibility map. The map would assist planners and decision makers in optimal land-use planning and in the prevention of soil erosion and its related hazards, leading to sustainable watershed management.

* Corresponding author: raza.mustafa@utp.edu.my, abdulkadirts4u@gmail.com


Introduction
Soil erosion has been recognized as one of the serious geohazards of recent times that threatens soil sustainability. It is an environmental problem that degrades land and threatens agricultural productivity and hydrologic systems. It has both on- and off-site effects with severe environmental and socioeconomic impacts [1]. Erosion is often caused by human activities (e.g., agricultural intensification, urbanization, indiscriminate deforestation) and natural processes (tectonic and climatic changes) [2,3]. These factors and processes vary spatially and temporally from one location to another. Hence, site-specific studies are often applied for erosion assessment [4,5]. The erosion process involves complex interactions of different biophysical and anthropogenic factors such as soil properties, topography, climatic conditions, land use and its management practices [6]. In recent times, soil erosion has become a serious environmental challenge in Cameron Highlands due to its terrain characteristics, urbanization, intensive agricultural activities and indiscriminate deforestation, among others [3,7-10]. Erosion impacts such as deterioration of water quality [15], landslides on steep slopes [11,12], and siltation of rivers and reservoirs, leading to reduced hydropower generation at Ringlet dam and sometimes flooding [13], are currently being experienced [13,14]. Sustainable management practices that will dampen these challenges require indices to quantify soil erosion, analyze its spatial distribution, identify critical locations and evaluate its susceptibility for geospatial prediction of active/potential erosion zones [15,16].
Susceptibility mapping through geospatial prediction evaluates the relative probability of erosion occurrence at a certain location compared to other locations under the influence of a set of conditioning factors (CFs). It classifies the watershed into zones of different degrees of susceptibility [17]. A literature survey showed that most previous susceptibility studies used mainly static factors, some of which are redundant [17]. This set of factors remains unchanged for a relatively long period of time, even across different rainfall cycles. However, some dynamic factors associated with rainfall cycles were not considered. Pradhan [18] highlighted that the choice and quality of CFs, and the efficacy of the modeling techniques, could influence the susceptibility accuracy. The current research addresses some shortcomings in the literature regarding the efficiency of techniques and the non-consideration of some crucial CFs. Thus, this study evaluates erosion susceptibility under the influence of both non-redundant static and dynamic CFs using support vector machine (SVM), remote sensing and GIS in Cameron Highlands. The CFs considered include drainage density (DrainDen), lineament density (LinDen), length-slope (LS-factor) and soil erodibility (k-factor) as non-redundant static factors, and land surface temperature (LST), soil moisture index (SMI), normalized difference vegetation index (NDVI) and rainfall erosivity (R-factor) as the dynamic factors. LST, SMI and R-factor are among the rarely used dynamic factors whose impacts on erosion susceptibility analysis are yet to be investigated despite their influence in triggering erosion. Moreover, SVM learning algorithms are rarely used in soil erosion susceptibility analysis despite their high prediction capability, as confirmed in flood susceptibility assessments [19]. Cameron Highlands in Pahang State, Malaysia was chosen for this study due to its complex topographic characteristics and the recent encroachment of development into the hilly zones, resulting in the incessant occurrence of soil erosion.

Erosion conditioning factors for susceptibility analysis
Selection of erosion CFs has been identified as the crucial starting point of susceptibility studies. The factors were selected based on their significance in triggering erosion. In accordance with Magliulo [17], non-redundant CFs were selected to avoid overweighting of the model results. Hence, four static CFs (LinDen, DrainDen, LS-factor and k-factor) and four dynamic CFs (LST, R-factor, SMI and NDVI) were considered. Desmet and Govers [20] highlighted that the LS-factor is suitable for soil erosion modeling and can capture complex topography. The most popular method, proposed by Wischmeier and Smith [21], was used for evaluating the LS-factor in ArcMap® from a hydrologically corrected digital elevation model (DEM) [22]. A lineament is a linear or curvilinear feature in a landscape expressing an underlying geological structure such as a fault. Studies have shown that LinDen correlates with soil erosion [23,24], and it was therefore considered as one of the CFs [16]. LinDen was extracted from multispectral Landsat 8 imagery using ArcMap® 10, ENVI 5.3 and Geomatica 2016 software. DrainDen is one of the dominant static factors in soil erosion [25] and is often considered in susceptibility mapping [16,26-28]. It was derived from a 5 m resolution DEM using the Spatial Analyst tools of ArcMap® 10. Soil erodibility is a critical component in predicting soil erosion [29]. It was derived from a soil map obtained from the Department of Agriculture in conjunction with the Digital Soil Map of the World (DSMW) developed by the Food and Agriculture Organization of the United Nations (FAO), using Williams' approach. The detailed expressions and descriptions for calculating the component factors can be found in Wawer, et al. [30], while the required soil properties were obtained from the DSMW database. Subsequently, the k-factor map was developed in the ArcMap environment.
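To make the LS-factor step concrete, the sketch below evaluates the widely quoted metric form of the Wischmeier and Smith relation for a single slope segment. It is illustrative only: the text does not state the exact expression or ArcMap workflow used, the exponent thresholds are a common simplification, and the function names and sample inputs are hypothetical.

```python
import math


def slope_length_exponent(slope_percent: float) -> float:
    """Exponent m of the slope-length term.

    The thresholds below are a commonly quoted simplification and are an
    assumption of this sketch, not a detail taken from the study.
    """
    if slope_percent >= 5.0:
        return 0.5
    if slope_percent >= 3.0:
        return 0.4
    if slope_percent >= 1.0:
        return 0.3
    return 0.2


def ls_factor(slope_length_m: float, slope_percent: float) -> float:
    """LS = (L / 22.13)^m * (65.41 sin^2(theta) + 4.56 sin(theta) + 0.065)."""
    theta = math.atan(slope_percent / 100.0)  # slope angle in radians
    m = slope_length_exponent(slope_percent)
    length_term = (slope_length_m / 22.13) ** m
    steepness_term = 65.41 * math.sin(theta) ** 2 + 4.56 * math.sin(theta) + 0.065
    return length_term * steepness_term


# Sanity check: the standard unit plot (22.13 m long, 9 % slope) gives LS close to 1.
unit_plot_ls = ls_factor(22.13, 9.0)

# Hypothetical steep cell: 30 m slope length on a 40 % slope.
steep_cell_ls = ls_factor(30.0, 40.0)
```

In a raster workflow the same expression would be evaluated per cell from slope and flow-length grids derived from the DEM; the per-segment form is shown here only to keep the arithmetic visible.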
The dynamic CFs were extracted from an atmospherically and radiometrically corrected Landsat 8 image. Bands 4 and 5 of Landsat 8 were used for the extraction of the NDVI map, while the thermal infrared bands (i.e., Bands 10 and 11) were used for LST and SMI. The detailed procedures for the extraction of LST, NDVI and SMI are provided in Abdulkadir, et al. [31], while that of the R-factor can be found in Abdulkadir, et al. [32]. Rainfall records from seventeen weather stations within and around Cameron Highlands were obtained from the Department of Irrigation and Drainage. These were used for the evaluation of R-factor values and the development of the rainfall erosivity map in ArcMap. Once prepared, the CF maps were checked to ensure that they had an identical projection (UTM zone 47 North, WGS 84), grid size (30 × 30 m) and number of columns and rows (1025 × 1112) to the soil erosion inventory map. The inventory map in Fig. 1a has an equal number of erosion-present and erosion-absent locations. The former is coded as (1) and the latter as (0) for implementation in SVM. The non-eroded locations were randomly selected using flat terrain as the guide. This information was used to extract the corresponding values of all the CFs for onward analysis with the SVM modeling technique. A total of 159 erosion-present locations were identified, with a corresponding 159 erosion-absent locations for the SVM modelling.
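As an illustration of the NDVI extraction from Bands 4 (red) and 5 (near-infrared), the following minimal sketch applies the standard normalized difference to small grids of reflectance values. The reflectance numbers and function names are hypothetical, and the zero-denominator convention is an assumption of the sketch rather than a detail from the cited procedure.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index for one pixel.

    red: surface reflectance in the red band (Landsat 8 Band 4)
    nir: surface reflectance in the near-infrared band (Landsat 8 Band 5)
    """
    total = nir + red
    if total == 0:
        return 0.0  # convention assumed here for no-signal pixels
    return (nir - red) / total


def ndvi_grid(red_band, nir_band):
    """Apply ndvi() cell by cell to two equally sized 2-D grids (lists of rows)."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical reflectance values for a 2 x 2 window.
red_band = [[0.05, 0.20], [0.10, 0.00]]
nir_band = [[0.45, 0.25], [0.30, 0.00]]
ndvi_map = ndvi_grid(red_band, nir_band)
```

Dense vegetation reflects strongly in the near-infrared and weakly in the red, so values approach 1, whereas bare or disturbed ground gives values near 0.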

Support vector machine for susceptibility analysis
SVM is a supervised binary classification technique based on statistical learning theory and the structural risk minimization principle [33]. The technique is relatively new, and its applications with remote sensing have become popular in recent times. Through the generation of hyperplanes, SVM reshapes non-linear problems into linear, easily processable ones with the use of mathematical functions known as kernel tricks. A training dataset of both response (erosion present/absent) and predictor variables (i.e., CFs) was used to map the original input into a high-dimensional feature space. The training dataset consists of pairs (x_i, y_i) with x_i ∈ R^n and y_i ∈ {1, −1} for i = 1, 2, 3, ..., m, where x is the array of erosion CFs for susceptibility analysis. The formation of a separating hyperplane from the training dataset is the basis for this technique [19,34]. According to Marjanović, et al. [35], hyperplanes are generated in the original space of n coordinates (x_i parameters in vector x) between the points of two distinct classes. The two classes represent the erosion-present and erosion-absent pixels in the training dataset. SVM was adopted to find optimal separating hyperplanes that can classify the training data into erosion-present and erosion-absent pixels. Environmental processes such as soil erosion are often non-linear due to the complexity of geology and topography, among others. Thus, the application of kernel tricks is unavoidable. Four available SVM kernel functions (linear, LNR; polynomial, Poly; sigmoid, Sigmd; and radial basis function, RBF) were implemented to examine their efficacy. In this study, the widely used sequential minimal optimization (SMO) and LibSVM classifiers were adopted for classification training and regression. These packages can handle both linear and non-linear SVM. SMO is an optimization algorithm for training that uses heuristics to partition the training problem into smaller units that can be solved analytically.
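For reference, the four kernels named above are usually written as follows. The forms follow the common LibSVM convention; reading γ as the kernel coefficient, r as an offset and d as the polynomial degree is an assumption about notation, since the text does not define these symbols explicitly.

```latex
\begin{align*}
K_{\mathrm{LNR}}(x_i, x_j)   &= x_i^{\top} x_j \\
K_{\mathrm{Poly}}(x_i, x_j)  &= \left(\gamma\, x_i^{\top} x_j + r\right)^{d}, \quad \gamma > 0 \\
K_{\mathrm{RBF}}(x_i, x_j)   &= \exp\!\left(-\gamma\, \lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0 \\
K_{\mathrm{Sigmd}}(x_i, x_j) &= \tanh\!\left(\gamma\, x_i^{\top} x_j + r\right) \\[4pt]
f(x) &= \operatorname{sign}\!\left(\sum_{i=1}^{m} \alpha_i\, y_i\, K(x_i, x) + b\right)
\end{align*}
```

Here the α_i are the multipliers found during training and b is the bias. With the linear kernel the decision function reduces to a weighted sum of the CF values, which is why per-factor weights can be read directly from a linear SVM.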
This study comprises nine attributes (i.e., the 8 CFs and a response variable denoted as 0 and 1). With a total of three hundred and eighteen (318) instances, the corresponding data values for all CFs were extracted from their geodatabase maps with the aid of the erosion inventory map, as discussed in Section 2.1. The instances are composed of erosion-present and erosion-absent cases with their corresponding values for all the CFs. The dataset was partitioned into 70% for training and 30% for testing [19,33,36] using unsupervised data pre-processing algorithms in WEKA software. In most cases, selecting kernel parameters that will produce optimum model results is difficult, especially when dealing with non-linear SVM. Thus, a parameter optimization procedure is essential. The grid-search technique was used to select the optimal kernel parameters for the cost parameter C, the degree of polynomial d and the tolerance γ. Subsequently, 10-fold cross-validation was performed in training each of the four kernel functions to evaluate the success rates of the model. Then, the test dataset was introduced to evaluate the prediction rates. The area under the curve (AUC), kappa index (k), success rates, prediction rates and classification errors were used to assess the performance of the model [19,36]. The SVM weights for the CFs were used for developing the erosion susceptibility map for Cameron Highlands.
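The partitioning described above can be sketched in a few lines. This is not the WEKA procedure used in the study: it is a plain-Python illustration of a class-balanced 70/30 split followed by a 10-fold partition of the training part, and the function names, the seed and the rounding rule are all assumptions. The resulting counts therefore need not match those produced by WEKA.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), keeping the class balance in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


def k_folds(indices, k=10, seed=42):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# 159 erosion-present (1) and 159 erosion-absent (0) instances, as in the text.
labels = [1] * 159 + [0] * 159
train_idx, test_idx = stratified_split(labels)
folds = k_folds(train_idx)

# Each fold is held out once while the remaining nine are used for fitting.
fit_sizes = [
    sum(len(fold) for fold in folds if fold is not held_out)
    for held_out in folds
]
```

A grid search would wrap this loop: for every candidate combination of C, d and γ, the classifier is fitted on the nine fitting folds, scored on the held-out fold, and the combination with the best average score is retained before the test part is touched.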

Results and Discussion
This section presents the discussion of the results obtained for the geospatial prediction of soil erosion in Cameron Highlands. The CFs geodatabase and inventory maps obtained are presented in Fig. 1. Initially, the default kernel parameters of the SVM-SMO application were used for modeling the training and testing datasets. The results showed a success rate (SR) of 92.48%, indicating the proportion of correctly classified instances, with mean absolute error (MAE) and root mean squared error (RMSE) of 0.0752 and 0.2743 respectively. The confusion matrix for the SVM classification produced a k value of 0.8467, showing almost perfect agreement. Table 1 shows the detailed classification accuracy for SVM-LNR, containing the true positive (TP) rate, false positive (FP) rate, precision, recall and AUC. The TP and FP rates provide information on the instances that are correctly and falsely classified respectively. The "precision" is the ratio of instances that truly belong to a class to the total instances classified as that class. Meanwhile, "recall" gives the ratio of instances correctly classified as a given class to the actual total instances in that class (i.e., the TP rate). The area under the ROC curve (i.e., AUC) for the SVM analysis was 92.5%. This indicates that SVM-LNR performed excellently in erosion susceptibility modeling. Applying this trained model to the test dataset, which was not involved in the training, yielded 87.76% correct classification, which is the prediction rate (PR) for the model. The accompanying MAE and RMSE were 0.1224 and 0.3499 respectively. Similarly, the dataset was analyzed for SVM-Poly, SVM-Sigmd and SVM-RBF with default kernel parameters. Success rates, prediction rates, kappa indexes, AUCs and classification errors obtained for the training and testing datasets are presented in Table 2. The results indicated that SVM-LNR had the highest success and prediction rates, followed by SVM-Poly, SVM-RBF and SVM-Sigmd. In several related susceptibility studies, AUC has been the main model validation parameter [18,36-41]. Hence, SVM-LNR had the best performance, followed by SVM-Poly, SVM-RBF and SVM-Sigmd, with respect to AUC and kappa index.
The kernel parameters were then optimized to evaluate their influence on the classification accuracy. Table 3 shows the results of the grid-search analysis used to obtain the optimized kernel parameters C, d and γ. The classification results obtained with the optimized kernel parameters are presented in Table 4. The results indicated that SVM-LNR had the highest success rate, followed by SVM-Poly, SVM-RBF and SVM-Sigmd. However, success rates alone cannot justify the performance of the model, as they only give information on performance with the training dataset. Therefore, prediction rates were examined to evaluate the models' performance on the test dataset that was not involved in the training. The SVM classification results showed that SVM-Poly had the highest prediction rate, followed by SVM-LNR, SVM-RBF and SVM-Sigmd. With respect to AUC, SVM-Poly had the highest performance, followed by SVM-LNR, SVM-RBF and SVM-Sigmd, for both training and testing datasets. This only partly agrees with the outcome of Tehrany, et al. [33], who tested the capability of different SVM kernels for a flood susceptibility study. In their study, SVM-RBF and SVM-LNR outperformed SVM-Poly and SVM-Sigmd. However, the results agreed with some literature on SVM landslide susceptibility studies, especially the works of Jebur, et al. [36], Imeson and Lavee [42] and Marjanović, et al. [35]. The analysis shows that SVM-Sigmd performed poorly even with optimized kernel parameters, having the lowest values of AUC and k coupled with higher classification errors. AUC values of about 0.5 obtained for SVM-Sigmd indicate classification by chance, which is not acceptable [19,37]. Furthermore, its low k value indicates poor agreement between the model and reality [43].
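The validation statistics quoted in this section are simple functions of the confusion matrix. The sketch below computes Cohen's kappa and, for a classifier that outputs hard 0/1 labels, the MAE and RMSE. The confusion-matrix counts are hypothetical and are not taken from Table 1; the function names are likewise illustrative.

```python
import math


def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's kappa from a 2 x 2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the actual (row) and predicted (column) totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def hard_label_errors(tp: int, fn: int, fp: int, tn: int):
    """MAE and RMSE when predictions are hard 0/1 labels."""
    n = tp + fn + fp + tn
    mae = (fp + fn) / n     # every misclassification contributes |1 - 0| = 1
    rmse = math.sqrt(mae)   # squared errors are also 0 or 1
    return mae, rmse


# Hypothetical confusion matrix for illustration only.
tp, fn, fp, tn = 45, 5, 3, 47
kappa = cohen_kappa(tp, fn, fp, tn)
mae, rmse = hard_label_errors(tp, fn, fp, tn)

# The training errors quoted above follow the same relation:
# sqrt(0.0752) is approximately 0.274.
reported_rmse_check = math.sqrt(0.0752)
```

Two consequences are worth keeping in mind when reading the tables: with hard labels the MAE equals one minus the proportion of correctly classified instances (1 − 0.9248 = 0.0752 for the training run), and kappa cannot exceed 1, the value reached only when every instance is classified correctly.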

Due to the outstanding performance of SVM-LNR, it was adopted for the development of the erosion susceptibility map. The attribute weights for the CFs from the SMO algorithm with the linear kernel are given in Equation 1. These were used to evaluate the relative importance of the CFs involved in the analysis. The result shows that all CFs were significant in triggering soil erosion and had positive influences on its occurrence, with the exception of NDVI, which had a negative influence. Specifically, the LS-factor had the largest influence, followed by LinDen. The weights were used in the ArcMap® environment to develop the soil erosion susceptibility map shown in Fig. 1j. The map shows the predicted geospatial distribution of soil erosion across the Cameron Highlands watershed. This map was classified into five classes: very low, low, moderate, high and very high, as in Fig. 1j. The map shows that most of the locations within the watershed fall in the moderate category, with few locations in the very high susceptibility category. The output map would serve as a guide for sustainable watershed management against the future occurrence of soil erosion.

Conclusion
This study applied the SVM technique for geospatial prediction of soil erosion in Cameron Highlands. The study considered some vital CFs which are often neglected in many susceptibility analyses, along with other non-redundant static CFs. Accordingly, DrainDen, LinDen, LS-factor and k-factor were selected as non-redundant static factors, and LST, SMI, NDVI and R-factor as dynamic factors. The prime focus of the research was to evaluate the applicability of the rarely used SVM to soil erosion susceptibility under non-redundant static and dynamic CFs. The results showed that the SVM modeling techniques adopted in this study have considerably good classification performance, except for SVM-Sigmd. This suggests that watershed characteristics cannot be adequately modeled with simple techniques because of their complexity, dynamism and non-linearity. The CFs' weights obtained in the analysis showed that all the CFs involved in building the model were very significant in triggering soil erosion in the study area. Also, all the CFs involved in the analysis had a positive relationship with the erosion process except NDVI. The susceptibility map produced would assist watershed managers in sustainable, optimal land-use planning to mitigate the on- and off-site environmental and economic impacts of erosion.

Table 1.
Detailed SVM classification accuracy by class for the training dataset

Table 2.
Classification summary for all kernel functions for the training and testing datasets

MATEC Web of Conferences 203, 04004 (2018), ICCOEE 2018
https://doi.org/10.1051/matecconf/201820304004

Table 3.
Optimized kernel parameters