Streambank Erosion Prediction for Natural Channel using Artificial Neural Network Autoregressive Exogenous (ANNARX) Model

This study aims to develop a streambank erosion prediction model for natural channels using an Artificial Neural Network Autoregressive Exogenous (ANNARX) model. ANNARX is a type of ANN model; it is a supervised network that trains on spasmodic data sets. A total of 494 field data points from two (2) rivers in Selangor, namely Sg. Bernam and Sg. Lui, were used in the training and testing phases. Eleven (11) independent variables are used as inputs in the input layer, and the ratio of the erosion rate, ξ, to the near-bank velocity, Ub, is the output variable. The functional relationships were derived by dimensional analysis using the Buckingham Pi Theorem. A supervised learning technique was employed, and the target output is the streambank erosion rate, ξb. The established models were validated against 176 data points obtained in this study to assess their performance in predicting streambank erosion rates. Model performance was tested using the discrepancy ratio and graphical analysis. The discrepancy ratio (DR) is the ratio of predicted to measured values, and predictions are deemed accurate if they lie within the limits of 0.5 to 2.0. A total of 8 models were developed. The analysis confirmed that models developed using ANNARX can achieve coefficient of determination (r-squared) values above 0.9 and predict the measured data with accuracy above 90%.


Introduction
Studies on streambank erosion prediction have evolved over the past 30 years, and attempts to develop streambank erosion equations have been made throughout that period. However, a predictive equation to quantify streambank erosion rates is still lacking. One of the earliest approaches addressing the rate of streambank erosion in alluvial channels was that of [16], in which an equation to predict channel shift was derived using a simple two-dimensional (2D) shallow water flow model. The rate of streambank erosion is taken to be proportional to the excess of the near-bank depth-averaged streamwise flow velocity over the cross-sectional mean velocity. Recent studies, however, have shown that streambank erosion contributes significantly to the evolution of river and floodplain morphology. This is vital because the study of streambank erosion helps to quantify the rate of erosion due to fluvial entrainment or bank stability. A study conducted by [13] derived an analytical solution for calculating the rate of erosion that integrates both basal erosion and bank failure processes. It includes the effects of hydraulic force, bank geometry, bank material properties and the probability of bank failure, with a minimum of physical characteristics of the bank properties. Given the contribution of such factors to the rate of bank erosion, knowledge of the rates, patterns and controls of river bank erosion events is a prerequisite for a complete understanding of the fluvial system.
Artificial Neural Networks (ANNs) were developed as intelligent, information-storing models whose parameters are calculated in a manner that resembles natural processes. An important advantage of ANN models over stochastic models is that they do not require the data to be normally distributed. Nonstationary effects and the various uncertainties present in morphological changes in streams, among others, can be captured and processed by the inner structure of an ANN. ANN can be a useful tool for providing hydraulic and environmental engineers with sufficient detail for design purposes and management practices [2]. Recently this approach has been adopted in many fields of science and engineering, such as in the studies of [2, 4, 5, 11, 35]. Previous researchers have applied ANN to sediment transport [2], prediction of discharge characteristics [5, 35], and watershed classification [11]. Evidence of previous work on streambank erosion prediction using ANN is still lacking. [2] presents a new sediment transport model using general regression neural networks (GRNN) that is applicable to both natural and man-made channels. GRNN is a supervised network that trains quickly on sparse data sets. Field data (499 data points) from rivers in Selangor, Perak and Kedah were used in the training and testing phases. The model was further tested using hydraulic and sediment data from rivers in the United States, namely the Sacramento, Atchafalaya, Colorado, Mississippi, Middle Loup, Mountain Creek, Niobrara, Saskatchewan, Oak Creek, Red and Rio Grande rivers, and the Chop Irrigation Canal. Four independent variables, namely relative roughness on the bed (R/d50), the ratio of shear velocity to fall velocity (U*/ws), the ratio of shear velocity to average velocity (U*/V), and the Froude number (V/√(gy)), are used as inputs, and the total sediment load, QT, is the output variable.
The proposed GRNN model accurately predicted 89% of the river data sets (local and foreign rivers), with 90% of the predicted values lying within the discrepancy ratio range of 0.5 to 2.0. The results of the analysis (both physical and graphical) indicated that the proposed sediment transport model predicts sediment transport more accurately for both local and foreign rivers. [3] applied an ANNARX neural network in a 5-hour flood prediction model for a case study of Kuala Lumpur. The study proposed a flood prediction model to overcome the nonlinearity problem, using an advanced neural network technique to predict the flood water level 5 hours in advance. The input and output parameters used in this model are based on real-time data obtained from the Department of Irrigation and Drainage (DID) of Malaysia. Results showed that the improved ANNARX model successfully predicted the flood water level 5 hours ahead of time, with a significant improvement over the original ANNARX model. The performance of the improved ANNARX neural network model was at 89.82% accuracy. Most previous contributions related to ANN were in the analysis of sediment transport, hydrology and flood studies, rainfall/runoff studies, and reservoir inflows. No prior analysis of streambank erosion prediction using ANN has been conducted. Given the limited streambank analyses using ANN models, this paper aims to propose a new streambank erosion rate model for natural channels. The artificial neural network autoregressive exogenous (ANNARX) algorithm used in the analyses provides pioneering findings for ANN analysis in streambank erosion field studies. The parameters selected in the ANNARX model will serve as a benchmark for future studies of streambank erosion.

Artificial Neural Network Autoregressive Exogenous (ANNARX)
Artificial Neural Network Autoregressive Exogenous (ANNARX) is a combined model in which an Artificial Neural Network (ANN) is embedded in an Autoregressive Exogenous (ARX) system. The ANN function is built into the nonlinear ARX (NARX) structure, where it produces a predictive model through training and testing in a time-based NARX system. It is an intelligent mathematical model consisting of a highly connected structure similar to brain neurons. It is made up of a number of neurons arranged in different layers: an input layer, an output layer and one or more hidden layers. The input layer receives and processes input signals and sends output signals to other neurons in the network. Each neuron can be connected to other neurons and has an activation function and a threshold function, which can be continuous, linear or nonlinear. The signal passing through a neuron is transformed by weights, and this modifies the function and the output signal. As the process continues, modifying the weights modifies the output. Once the architecture of the network is defined, the weights are calculated to represent the desired output through a learning process in which the ANN is trained to obtain the expected results. ANN architectures differ in terms of the learning process and training strategy. The classes of network architecture include single-layer feed-forward networks, multi-layer feed-forward networks, and recurrent networks. The applications of the NARX network are wide ranging and include its use as a predictive model as well as for nonlinear filtering, in which the target output is a noise-free version of the input signal. The NARX network can also be used in modeling nonlinear dynamic systems.
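In its general form, a NARX model predicts the current output from lagged outputs and lagged exogenous inputs, y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)), with the ANN playing the role of f. The following is a minimal sketch of how such lagged input vectors could be assembled. The function name `build_narx_regressors` and the lag orders `n_y` and `n_u` are assumptions for illustration; the lag orders used in this study are not stated.

```python
def build_narx_regressors(y, u, n_y=2, n_u=2):
    """Build (inputs, targets) pairs for a NARX-style model.

    y: list of output values, one float per time step
    u: list of exogenous input vectors, one list of floats per time step
    n_y, n_u: assumed lag orders (hypothetical; not specified in the paper)
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same number of time steps")
    start = max(n_y, n_u)  # first time step with a full history available
    inputs, targets = [], []
    for t in range(start, len(y)):
        past_outputs = [y[t - k] for k in range(1, n_y + 1)]
        past_exogenous = [x for k in range(1, n_u + 1) for x in u[t - k]]
        inputs.append(past_outputs + past_exogenous)
        targets.append(y[t])
    return inputs, targets


# Made-up series purely for illustration; not data from the study.
y_demo = [0.1, 0.2, 0.3, 0.4, 0.5]
u_demo = [[1.0], [2.0], [3.0], [4.0], [5.0]]
X_demo, t_demo = build_narx_regressors(y_demo, u_demo, n_y=2, n_u=1)
print(X_demo[0], t_demo[0])
```

Each row of `X_demo` would then be presented to the input layer of the network, with the matching entry of `t_demo` as the training target.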

Selections of streambank erosion variables
The first step is to identify the dimensional variables that influence streambank erosion rates. Based on physical considerations of river bank erosion processes, the primary factors governing the rate of river bank erosion (ξ) along a channel can be divided into five (5) major categories: bank geometry, hydraulics, soil capacity (resistance to erosion), grain resistance and other factors. The bank geometry parameters consist of channel width, B, water depth, D, bank height, hb, bank angle, β, and channel slope, So.
The hydraulic parameters consist of streambank erosion rate, ξ, near-bank velocity, ub, and boundary shear stress, τo. The soil capacity (resistance to erosion) parameters include critical shear stress, τc, porosity, p, and plasticity index, PI. The grain resistance parameters include mean particle diameter, d50, fall velocity, ω, shear velocity, U*, and the ratio of suspended load concentration to equilibrium suspended concentration, C. Other variables include gravitational acceleration, g, water density, ρw, and streambank particle density, ρs. To predict the rate of streambank erosion along a channel, it is necessary to collect field data on the parameters controlling streambank erosion. Due to measurement limitations, two parameters, namely plasticity index (PI) and porosity (p), were removed from the model. Two models were derived using the Buckingham Pi Theorem method. Table 1 shows the distribution of streambank monitoring data.
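Whether a candidate group produced by the dimensional analysis is truly dimensionless can be checked by summing the exponents of mass, length and time. The sketch below is illustrative only. It assumes that the erosion rate has units of length per time, and the group τc/(ρw·ub²) is a hypothetical normalisation chosen for the example, not necessarily the group used in this study.

```python
# Base dimensions are tracked as exponents of (mass M, length L, time T).
DIMENSIONS = {
    "xi": (0, 1, -1),      # erosion rate, assumed here to be length per time
    "u_b": (0, 1, -1),     # near-bank velocity
    "omega": (0, 1, -1),   # fall velocity
    "tau_c": (1, -1, -2),  # critical shear stress
    "rho_w": (1, -3, 0),   # water density
}


def group_dimensions(powers):
    """Return the (M, L, T) exponents of a product of variables raised to powers."""
    total = [0, 0, 0]
    for name, power in powers.items():
        for i, exponent in enumerate(DIMENSIONS[name]):
            total[i] += power * exponent
    return tuple(total)


def is_dimensionless(powers):
    """True when every base-dimension exponent of the group cancels to zero."""
    return group_dimensions(powers) == (0, 0, 0)


# Ratio of erosion rate to near-bank velocity (the output variable).
print(is_dimensionless({"xi": 1, "u_b": -1}))
# Ratio of fall velocity to near-bank velocity.
print(is_dimensionless({"omega": 1, "u_b": -1}))
# Hypothetical shear-stress group tau_c / (rho_w * u_b**2).
print(is_dimensionless({"tau_c": 1, "rho_w": -1, "u_b": -2}))
```

A check of this kind makes it easy to confirm that every input fed to the network is a pure number, which is the property the Buckingham Pi Theorem is used to guarantee.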

Sensitivity analysis
Sensitivity analysis was applied to the derived functional relationship (Equation 1) generated from the dimensional analysis to check the reliability of the data proposed for model development. Each independent variable was plotted against the dependent variable to evaluate the relationship and the significance of the streambank erosion rate, ξ, with respect to the independent variables. The significance of the relationship between the dependent variable and each independent variable can be measured using the coefficient of determination (r-squared) of the proposed mathematical function. Table 2 shows the results of the sensitivity analysis between the independent variables and the dependent variable. A total of 11 scatter plots were produced, one for each independent variable against the dependent variable. From the scatter plots, only five variables gave a high correlation with the dependent variable. The highest correlation was obtained for the ratio of fall velocity to near-bank velocity, with an r-squared value of 0.41 using a power function. The ratio of critical shear stress to near-bank velocity yields an r-squared of 0.399, followed by the ratio of mean particle size to near-bank velocity with an r-squared of 0.390.
The results show that the ratio of fall velocity increases linearly with the dimensionless erosion rate. The erosion rate values range from 1.42 × 10⁻⁸ to 1.31 × 10⁻⁶, and the values of the independent variable range from 0.12 to 1.12. This trend gave the highest coefficient of determination (r-squared), equal to 0.412. Other independent variables gave significant results, with r-squared ranging from 0.3 to 0.4. Fig. 1 shows an example plot of an independent variable against the dependent variable.
Table 2. Results of sensitivity analysis of each independent variable.
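The power-function fit and the r-squared value used in the sensitivity analysis can be illustrated with a short sketch. It assumes an ordinary least-squares fit on log-transformed data and an r-squared computed in the original units; the fitting procedure actually used in the study is not stated, and the numbers below are made up.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log-transformed data."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((v - mean_x) ** 2 for v in lx)
    sxy = sum((vx - mean_x) * (vy - mean_y) for vx, vy in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space gives a
    return a, b


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, in original units."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only; not data from the study.
x_demo = [0.12, 0.25, 0.40, 0.60, 0.85, 1.12]
y_demo = [2.0e-8 * v ** 1.5 for v in x_demo]  # an exact power law
a, b = fit_power_law(x_demo, y_demo)
y_fit = [a * v ** b for v in x_demo]
print(a, b, r_squared(y_demo, y_fit))
```

With scattered field data the same two functions would return an r-squared well below 1, which is how values such as those reported in Table 2 would arise.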

Development of streambank erosion model using ANNARX
A typical multi-layer ANN with multiple hidden layers was used in this study. A set of data was first fed directly into the ANN through the input neurons, and the multi-unit perceptron subsequently produced the predicted results in the output layer. The hidden units play an important role in obtaining a good correlation of results. A total of 398 data points were used in the ANN analysis: 278 were used for training the network, and the remaining 120 were used for validation and testing. Fig. 3 shows the proposed multi-layer network. During this process, random weights are applied to the inputs. As the iterations progress, the network first gives high error values, and the hidden units then adjust another set of weights to obtain a desirable output.
If this process continues without stopping, the weights will be adjusted in such a manner that the ANN starts memorizing the data. Therefore, it is important to apply early stopping so that the iterations do not run to completion. Fig. 4 shows a sample plot of mean squared error against epoch, representing the weight adjustment that reduces the errors.
Early stopping is the condition under which the model stops training, determined by evaluating it on another set of data (the validation data). Once the ANN has started memorizing the training data, it identifies these new data as unusual values and is unable to generalize or adapt to them. At this point, the error starts increasing. This observation was further supported by the prediction plots, which reveal that the residuals are of small magnitude. However, there are also other important considerations for testing the randomness of the residuals, such as the histogram test [1].
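The early stopping rule described above can be sketched as a check on the validation error after each epoch. This is a minimal illustration under stated assumptions: the `patience` parameter and the function name `early_stopping_epoch` are hypothetical, and the stopping criterion used by the software in this study is not specified.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights from best_epoch would be kept.
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_error = error
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The error never stalled for `patience` epochs: run to the last epoch.
    return len(validation_errors) - 1, best_epoch


# Made-up validation mean squared errors per epoch, for illustration only.
val_mse = [0.90, 0.50, 0.30, 0.22, 0.20, 0.21, 0.23, 0.26, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(val_mse, patience=3)
print(stop_epoch, best_epoch)
```

In this made-up sequence the validation error bottoms out at epoch 4 and rises afterwards, so training would halt at epoch 7 and the weights from epoch 4 would be retained.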
A total of 8 models were generated from the derived functional relationship. The number of hidden units was adjusted for the training and testing data sets to achieve a high-accuracy model.
The accuracy of the developed models was evaluated using the coefficient of determination (r-squared). Fig. 5 shows the prediction plot for model 1 with 10 hidden units. The model achieved 90.7% accuracy (r-squared of 0.907). The predicted trend of the erosion rates resembles the measured values assigned in the model. Table 3 shows the results of the ANNARX method for all 8 models. All 8 models were tested with various numbers of hidden units: 10, 12, 14, 15, 20, 23, 25 and 40. Testing was limited to 40 hidden units because the models produced undesirable results as the number of hidden units increased. For model 1, the best prediction was obtained using 10 hidden units, with an r-squared value of 0.907, corresponding to 90.7% prediction accuracy. Three other models produced accuracy above 80%. For model 2, the best prediction was obtained with 12 hidden units, giving 81.2% accuracy (r-squared of 0.812). Model 5 achieved 82.3% accuracy, and model 7 achieved 81.9% accuracy.
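The search over hidden unit counts can be sketched as a simple sweep that keeps the count with the highest r-squared. The candidate list is taken from the text; `select_hidden_units` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for training a network and returning its r-squared.

```python
# Hidden unit counts listed in the text.
HIDDEN_UNIT_CANDIDATES = [10, 12, 14, 15, 20, 23, 25, 40]


def select_hidden_units(score_fn, candidates=HIDDEN_UNIT_CANDIDATES):
    """Score every candidate and return (best_count, all_scores)."""
    scores = {n: score_fn(n) for n in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_score(n_hidden):
    """Stand-in for 'train a network with n_hidden units and return r-squared'.

    The formula is made up so that the example runs; it is not a result
    from the study.
    """
    return 0.9 - 0.01 * abs(n_hidden - 10)


best_units, all_scores = select_hidden_units(toy_score)
print(best_units, all_scores[best_units])
```

In practice `score_fn` would wrap the full training and testing run for one model, so the sweep would be repeated once per model.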

Evaluations of the developed equations
The accuracy of the developed models is measured using two (2) elements: (i) the discrepancy ratio and (ii) graphical analysis. The discrepancy ratio (D.R.) is the ratio of the predicted values to the measured values. Predictions are deemed accurate if they lie within the limits of 0.5 to 2.0 [2]. It is a measure of the accuracy of the developed model: if all data lie along the line of perfect agreement, the model predicts the measured data perfectly. Visual analysis through graphical representation is used to confirm the trends and accuracy of the predicted models. The data distribution for each derived empirical model is plotted to facilitate the graphical analysis.
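The discrepancy ratio criterion can be expressed in a few lines. The sketch below computes the percentage of points whose predicted-to-measured ratio lies within 0.5 to 2.0; treating the limits as inclusive is an assumption, and the sample values are made up.

```python
def discrepancy_ratio_accuracy(predicted, measured, lower=0.5, upper=2.0):
    """Return the percentage of points with lower <= predicted/measured <= upper."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    ratios = [p / m for p, m in zip(predicted, measured)]
    within = sum(1 for r in ratios if lower <= r <= upper)
    return 100.0 * within / len(ratios)


# Made-up values for illustration only; not data from the study.
measured_demo = [1.0, 2.0, 4.0, 5.0]
predicted_demo = [0.9, 4.5, 2.0, 5.5]
accuracy = discrepancy_ratio_accuracy(predicted_demo, measured_demo)
print(accuracy)
```

Here three of the four ratios (0.9, 0.5 and 1.1) fall inside the limits and one (2.25) falls outside, so the accuracy is 75%.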
In terms of model performance, all models exhibit accuracy above 60%. Model 1 yields the highest accuracy at 90.8%. Model 7 yields an accuracy of 74.4% with 25 hidden units. Models 2 and 5 both yield 73.5% accuracy. The developed models proved to have a significant relationship between the dependent parameter and the independent parameters, in accordance with the results of the sensitivity analysis.
The second element is the analysis of graphical plots. Based on the plot of predicted data against measured data, the trend of the data distribution for each developed relationship is assessed by noting the values that lie along the line of perfect agreement.
A summary of all models developed using the artificial neural network is shown in Table 4. The validation of the ANNARX model with 10 hidden units (Model 1) gave 90.8% accuracy in terms of discrepancy ratio.
The graphical analysis also confirmed that model 1 performs better than the other models, as shown in Fig. 6. Model 1 yields very good agreement between the predicted and measured values; all data are congruous and confined within the D.R. limits of 0.5 to 2.0. Based on these criteria, model 1 with 10 hidden units is the best predictor, embedding all eleven (11) non-dimensional parameters in the developed model.

Conclusions
The proposed streambank erosion rate models using ANNARX showed very good prediction of the measured data. The Buckingham Pi Theorem was used to derive the universal (dimensionless) functional relationship for quantifying streambank erosion rates. A total of 8 models consisting of 11 independent variables were derived. The analysis confirmed that Model 1 with 10 hidden units was the best predictor compared to the other models, with an r-squared value of 0.907 (90.7% accuracy) in quantifying streambank erosion rates.
The validation of ANNARX model 1 with 10 hidden units gave 90.8% accuracy in terms of discrepancy ratio. The graphical analysis also confirmed that model 1 performs better than the other models. It can be concluded that the streambank erosion rate prediction models were successfully developed, with some limitations, especially regarding the data measured during the fieldwork investigation. Streambank erosion rate data for this study were obtained from two (2) rivers, namely Sg. Bernam, Hulu Selangor and Sg. Lui, Hulu Langat. Both rivers are located in the state of Selangor.
Further fieldwork investigation is needed to obtain data from rivers around Malaysia with various streambank characteristics. In addition, this technique requires a dataset from a constant stream over specified time intervals. One of the significant factors in obtaining a desirable prediction outcome is the number of hidden units; various combinations of hidden units need to be tested to find the best prediction model. Nevertheless, the streambank erosion rate prediction models developed in this study can be regarded as a valuable tool and good guidance for supporting streambank monitoring in erosion-susceptible areas.