Application of Grey Prediction and BP Neural Network in Hydrologic Prediction

Based on annual runoff data from Boluo Hydrologic Station on the Dongjiang River in Guangdong Province from 1954 to 2010, this paper establishes annual runoff prediction models by three methods: regression analysis, grey theory and the BP neural network. The regression analysis model passes the F-statistical test (P=0.05), but the relative error of the predicted value exceeds ±30% at 30% of the data points, so its prediction precision is only moderate. The residual-corrected GM (1, 1) annual runoff model established on grey theory is clearly more precise: the relative error exceeds ±30% at only 10% of the data points, the Nash statistical coefficient (NS) between predicted and measured values is 0.627, and the correlation coefficient (R) is 0.774. For the BP neural network model, the relative error exceeds ±30% at about 5% of the data points, with NS=0.66 and R=0.853. Overall, the neural network prediction model of annual runoff has the best precision and reliability.


INTRODUCTION
Hydrologic prediction provides guidance for the rational development and utilization of water resources and for the construction and operation of water conservancy projects. It is affected by many interacting factors, so the underlying mechanisms are difficult to describe accurately and quantitatively, which makes hydrologic prediction difficult.
The regression analysis method based on random theory is common in hydrologic prediction. It has a clear concept and is simple to operate, but its high-dimensional nonlinear mapping capacity is weak (Jing Ding and Yuren Deng, 1988; Biao Jin et al. 2009). Liuzhu Zhang et al. (2006) improved the traditional regression-based hydrologic prediction model and proposed a regression analysis model weighted by observation error. Qingguo Li and Shouyu Chen (2005) established a regression-based hydrologic prediction model using a support vector machine.
The Artificial Neural Network (ANN) has strong self-organizing, self-learning and distributed storage capacities and is widely used in many fields of study (Jun Huang et al. 2011; Wang et al. 2013). Compared with the traditional regression analysis method, ANN has unique advantages in hydrologic prediction (Tiesong Hu et al. 1995; Guodong Liu and Jing Ding, 1999). Cunjun Li et al. (2007), Baoming Jin (2010) and Dongwen Cui (2013) established prediction models of daily runoff and peak discharge based on the BP neural network, with good prediction results.
A hydrologic sequence is characterized by coexisting contingency and regularity. In grey prediction theory, "accumulation" or "inverse accumulation" of the original hydrologic sequence can effectively weaken mutation points and contingency in the sequence (Deng, 1982) and overcome the low prediction precision caused by a relatively short sequence, which is a significant advantage in hydrologic prediction. Yi Liu and Ping Zhang (1992) and Wenming Wang et al. (2007) established river runoff prediction models based on grey theory, with good prediction precision.
Based on annual runoff data at Boluo Hydrologic Station on the Dongjiang River in Guangdong Province from 1954 to 2010, this paper establishes annual runoff prediction models for the station by the above three methods and compares the prediction precision of the three methods. The results can provide a reference for the hydrologic prediction of river runoff. The model evaluation indicators adopted in this paper are the root-mean-square error (RMSE) between the predicted value and measured value, the correlation coefficient (R²) and the Nash statistical coefficient (NS) (Nash and Sutcliffe, 1970). In these indicators, X_sim and X_true respectively represent the predicted value and the true value of annual runoff, and X_mean is an average value of the difference between X_sim and X_true.
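The three evaluation indicators can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions: RMSE as the root of the mean squared difference, R² as the squared Pearson correlation, and NS in the usual Nash-Sutcliffe form, in which the mean is taken over the measured values. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def rmse(x_sim, x_true):
    """Root-mean-square error between predicted and measured values."""
    x_sim = np.asarray(x_sim, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    return float(np.sqrt(np.mean((x_sim - x_true) ** 2)))

def r_squared(x_sim, x_true):
    """Squared Pearson correlation between predicted and measured values."""
    r = np.corrcoef(np.asarray(x_sim, dtype=float),
                    np.asarray(x_true, dtype=float))[0, 1]
    return float(r ** 2)

def nash_sutcliffe(x_sim, x_true):
    """Nash-Sutcliffe coefficient (assumed standard form):
    1 - sum((true - sim)^2) / sum((true - mean(true))^2)."""
    x_sim = np.asarray(x_sim, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    denom = np.sum((x_true - x_true.mean()) ** 2)
    return float(1.0 - np.sum((x_true - x_sim) ** 2) / denom)
```

Under this definition NS equals 1 for a perfect prediction and 0 for a model that always predicts the mean of the measured values.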

Prediction model of regression analysis
Curve fitting of discrete data means that, for a set of observed values (t_i, y_i) (i = 0, 1, 2, ..., m) of a function y = u(t), a set of simple functions Φ_k(t) (k = 0, 1, 2, ..., n) is selected as basis functions and the fitting function is written as

$$f(t) = \sum_{k=0}^{n} X_k \Phi_k(t) \qquad (2)$$

The unknown parameters X_k of model (2) are determined so that f(t) is, on the whole, as close as possible to the observed values (t_i, y_i). The least square method is most commonly used for this purpose; that is, f(t) is selected so that ∑(y_i - f(t_i))² is minimized. In a hydrologic data sequence, the observed flow y changes with the year x. Assuming that y = a_0 + a_1 x + a_2 x² + … + a_m x^m, the coefficients can be calculated by the least square method from the corresponding normal equations, where n is the number of flow observations in the hydrologic data and m is the degree of the polynomial.
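A polynomial least squares fit of this kind can be sketched with NumPy as follows. This is only an illustration: the function names are hypothetical, and the paper does not state which software or which scaling of the year variable was used. With calendar years as x, high powers become very large, so it is assumed here that x is first converted to a small index (for example, year minus 1953).

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least squares fit of y = a0 + a1*x + ... + am*x**m.
    Returns the coefficients [a0, a1, ..., am]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x**2, ..., x**m
    design = np.vander(x, m + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at x (coefficients in increasing order)."""
    return np.polynomial.polynomial.polyval(np.asarray(x, dtype=float), coeffs)
```

For example, `fit_polynomial(year_index, runoff, 3)` would give the four coefficients of a cubic model, where `year_index` and `runoff` are placeholder arrays of equal length.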

Establishment of regression analysis prediction model
The prediction model is established by using a total of 42 sets of annual runoff data at Boluo Hydrologic Station from 1954 to 1995. Trial calculation shows that, when m = 2, the established regression prediction model fails to pass the F-statistical test (P = 0.05) and has a relatively large prediction error. When m = 3, the regression analysis model is better and passes the F-statistical test, that is, F = 6.21 > F_0.05 = 4.08. This indicates that, when m = 3, the established regression analysis model of runoff prediction is reliable and effective and has a certain application value.

Test of regression analysis prediction model
The runoff prediction model (5) is tested by using a total of 15 sets of annual runoff data from 1996 to 2010, and the results are shown in Figure 1.

Method of grey prediction model
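Assuming that the original time sequence is $x^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\}$ and its cumulative sequence is $x^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\}$ with $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, the differential equation model established on the cumulative sequence is as follows:

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = b \qquad (6)$$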

EMME 2015
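The solution of the above formula is as follows:

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-ak} + \frac{b}{a} \qquad (7)$$

Formula (7) is the first-order, one-variable prediction model GM (1, 1). The parameter vector A = (a, b)^T in Formula (7) can be determined by the least square method:

$$A = (a, b)^{T} = (B^{T}B)^{-1}B^{T}Y \qquad (8)$$

where, in the standard GM (1, 1) formulation, $Y = (x^{(0)}(2), \ldots, x^{(0)}(n))^{T}$ and the row of $B$ for $k = 2, \ldots, n$ is $\left(-\tfrac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right),\ 1\right)$.
When the prediction precision of Formula (7) is below standard, it can be corrected by using the residual model of GM (1, 1). A residual sequence is defined from the difference between the measured sequence and the sequence predicted by Formula (7). A GM (1, 1) model is fitted to this residual sequence (Formula (9)), and its prediction is finally combined with Formula (7) to give the corrected GM (1, 1) model (Formula (10)).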
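The basic GM (1, 1) steps (accumulation, least squares estimation of a and b, and restoration of the predicted series by differencing) can be sketched as below. The sketch assumes the standard textbook formulation with the mean of consecutive accumulated values as the background value. The function names are hypothetical, and the residual correction is not included.

```python
import numpy as np

def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from the original sequence x0."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                       # accumulated sequence x^(1)
    z1 = 0.5 * (x1[1:] + x1[:-1])            # background values (assumed form)
    B = np.column_stack((-z1, np.ones_like(z1)))
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return float(a), float(b)

def gm11_predict(x0_first, a, b, n):
    """Return n predicted values of the original sequence (a must be nonzero)."""
    k = np.arange(n)
    # Time response of the accumulated sequence
    x1_hat = (x0_first - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.empty(n)
    x0_hat[0] = x0_first
    x0_hat[1:] = np.diff(x1_hat)             # inverse accumulation
    return x0_hat
```

Calling `gm11_predict` with n larger than the length of the fitted sequence extrapolates the model beyond the fitting period.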

Establishment and test of grey prediction model
Using a total of 42 sets of annual runoff data at Boluo Hydrologic Station from 1954 to 1995, the parameters a = -1.93×10⁻³ and b = 8543.24 are obtained from Formula (8). Substituting a and b into Formula (7) gives the annual runoff prediction model (11). The precision of model (11) is tested with the same 42 sets of data used to establish it. The results show that the root-mean-square error (RMSE) between the predicted value and measured value is 2238.9 m³ s⁻¹, NS = 0.026 and R² = 0.162. Its precision is poorer than that of the regression analysis model (5), so it can hardly meet operating requirements.
Model (11) is therefore corrected with a residual grey prediction model established from Formula (9). The calculated parameters of the residual model are a_e = 2.367×10⁻² and b_e = 2498.2. Substituting a_e and b_e into Formula (9) gives the residual grey prediction model (12), and substituting a, b, a_e and b_e into Formula (10) gives the corrected grey prediction model of annual runoff (13). Model (13) is also tested with the 42 sets of data used to establish it. The results show RMSE = 1644.8 m³ s⁻¹, NS = 0.4698 and R² = 0.770, a significant increase in prediction precision. For the 42 sets of data, the relative error exceeds ±30% at only 7% of the data points, indicating that the GM (1, 1) annual runoff prediction model (13) is practical.
To verify the reliability of model (13), it is tested by using a total of 15 sets of annual runoff data from 1996 to 2010, and the results are shown in Figure 2. The predicted values of the model are very close to the measured values, and the relative error exceeds ±30% for only 2 data points. The fitted equation of the predicted value and measured value is Y_predicted = 0.650 Y_measured + 3154.7 (R² = 0.774, P = 0.01).

MATEC Web of Conferences
The Nash statistical coefficient between the predicted value and measured value of the model is NS = 0.627, and RMSE = 1728.8 m³ s⁻¹, indicating that prediction model (13) is reliable and can meet the requirements of hydrologic prediction.
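The share of data points whose relative error exceeds a given bound, as quoted throughout this paper for ±30%, can be computed as sketched below. The function name and the definition of relative error as (predicted - measured) / measured are assumptions made for illustration.

```python
import numpy as np

def share_exceeding(predicted, measured, threshold=0.30):
    """Fraction of points whose absolute relative error exceeds the threshold."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    rel_err = (predicted - measured) / measured   # assumed definition
    return float(np.mean(np.abs(rel_err) > threshold))
```

With 15 validation points, for instance, two points over the bound correspond to a share of 2/15, or about 13%.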

Establishment of neural network prediction model
The BP neural network adopts a multi-layer feed-forward network and the back-propagation algorithm and has a strong nonlinear mapping capacity (Ju et al. 2009). It constantly adjusts the network weights through forward propagation of learning information and back propagation of error information, so that the output value is as close as possible to the measured data (Abdi et al. 1996). A three-layer BP neural network can map or approach the vast majority of rational functions (El-Din and Smith, 2002).
Taking the year as the input variable (X) and the annual flow as the output variable (Y), the paper builds a three-layer BP neural network prediction model of annual flow with only one hidden layer. The number of neurons and the transfer function in the hidden layer, together with the training algorithm, have an important impact on network training and prediction results (Yesilnacar et al. 2007). To determine these parameters, we adopt a trial-and-error method (Raman and Sunilkumar, 1995) and screen the optimal network structure based on the relevant indicators output by the BP neural network in the training process.
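The forward propagation of information and back propagation of error in such a network can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a hyperbolic tangent hidden layer, a linear output neuron and plain full-batch gradient descent, whereas the paper trains its networks in Matlab with other algorithms. All names and default values are hypothetical.

```python
import numpy as np

def train_bp(x, y, n_hidden=5, lr=0.05, epochs=2000, seed=0):
    """Train a 1-input, 1-hidden-layer, 1-output network by gradient descent.
    Returns the parameters and the mean squared error per epoch."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    w1 = rng.normal(0.0, 0.5, (1, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(epochs):
        # Forward propagation
        h = np.tanh(x @ w1 + b1)
        out = h @ w2 + b2
        err = out - y
        history.append(float(np.mean(err ** 2)))
        # Back propagation of the error (gradients of the mean squared error)
        g_out = 2.0 * err / len(x)
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Weight update
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), history

def predict_bp(params, x):
    """Network output for inputs x."""
    w1, b1, w2, b2 = params
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return (np.tanh(x @ w1 + b1) @ w2 + b2).ravel()
```

The inputs and targets are assumed to be scaled to a small range before training, since unscaled years and flows would make gradient descent with this step size unstable.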
Three training algorithms are considered: the conjugate gradient algorithm, the gradient descent algorithm with a momentum term and the Levenberg-Marquardt algorithm. Three transfer functions are considered for the hidden layer: the tangent function "Tansig", the logarithmic function "Logsig" and the linear function "Purelin". The number of neurons in the hidden layer should not exceed the number of training samples, which is 42 in this paper. A total of 21 candidate neuron numbers are considered, from 1 to 41 in steps of 2 (1:2:41). Combining these three parameters gives 3*3*21 = 189 neural network programs in total.
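The 189 candidate programs can be enumerated as sketched below. The string labels are descriptive placeholders chosen for this illustration, not Matlab function names.

```python
from itertools import product

# Descriptive labels (hypothetical), one per option named in the text
TRAINING_ALGORITHMS = ("conjugate_gradient",
                       "gradient_descent_momentum",
                       "levenberg_marquardt")
TRANSFER_FUNCTIONS = ("tansig", "logsig", "purelin")
HIDDEN_NEURONS = tuple(range(1, 42, 2))   # 1, 3, 5, ..., 41

def candidate_programs():
    """All (algorithm, transfer function, neuron count) combinations."""
    return list(product(TRAINING_ALGORITHMS, TRANSFER_FUNCTIONS, HIDDEN_NEURONS))
```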
The programs are assessed on the network outputs in the training process: the root-mean-square error (RMSE) between the predicted value and measured value, the Pearson's correlation coefficient (r), the Nash statistical coefficient (NS), the network training time (t) and the number of training iterations (I). A comprehensive evaluation method based on the sum of relative gaps is adopted to assess the merits and demerits of each program and thus determine the optimal neural network program. In the training of each program, the network error goal and the maximum number of training iterations are set to 10⁻⁴ and 10⁴ respectively. The BP neural network model of runoff prediction is established by using a total of 42 sets of flow data from 1954 to 1995. To reduce the error caused by the dimension and the order of magnitude of the training data, the original data are normalized by the following formula: X_i = x_i/(x_max + x_min) (14), where X_i and x_i are respectively the normalized data and the original data, and x_max and x_min are the maximum and minimum values of the original data series.
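Formula (14) and its inverse can be written as the following short sketch. The function names are hypothetical; the scaling itself follows Formula (14) as stated, which divides by the sum of the maximum and minimum rather than by their difference.

```python
import numpy as np

def normalize(x):
    """Apply X_i = x_i / (x_max + x_min); also return the scale for later use."""
    x = np.asarray(x, dtype=float)
    scale = x.max() + x.min()
    return x / scale, float(scale)

def denormalize(X, scale):
    """Map normalized values back to the original units."""
    return np.asarray(X, dtype=float) * scale
```

The scale computed on the training data would normally be reused when the network outputs are converted back to flow values.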
The neural network command NEWFF in the Matlab® 7.13 software package is used to establish the BP neural network model of hydrologic prediction. The above 189 programs are trained in turn, and the results are shown in Figure 3. In general, network training and simulation are better when the training algorithm is Levenberg-Marquardt and the transfer function in the hidden layer is "Purelin".
This paper adopts a comprehensive evaluation method based on the sum of relative gaps to screen the optimal program of network topology and algorithm. The method is as follows. Assume that there are m evaluated programs, each with n evaluation indicators; the data of evaluation object j are K_j = (K_1j, K_2j, …, K_nj), where j = 1, 2, …, m, and the optimal evaluation object is K_o = (K_1, K_2, …, K_n). The optimal evaluation object K_o is determined as follows.
For indicators where larger is better, the maximum value of the indicator over the m programs is selected; for indicators where smaller is better, the minimum value over the m programs is selected. The weighted sum of relative gaps D between each program and the optimal program is then calculated by Formula (15), where W_i is the weight coefficient of indicator i and M_i is the median of indicator i over all programs. The programs are ranked by the value of D: the smaller the D value, the closer the program is to the optimal program.
In the comprehensive evaluation based on the sum of relative gaps, the weights of the output parameters of neural network training, namely the root-mean-square error (RMSE), the Pearson's correlation coefficient (r), the Nash statistical coefficient (NS), the training time (t) and the number of iterations (I), are determined by the analytic hierarchy process. The resulting weights are W_RMSE = 0.2334, W_r = 0.3130, W_NS = 0.3130, W_t = 0.0785 and W_I = 0.0620. According to Formula (15), the optimal neural network program is program 119-BZ, which has the minimum sum of relative gaps, D = 0.0075. The hidden layer of this program has 27 neurons and adopts the linear transfer function "Purelin", with Levenberg-Marquardt as the training algorithm.
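The ranking step can be sketched as follows. Since the formula itself is not reproduced here, the sketch assumes one common form of the weighted relative gap sum, D_j = Σ W_i |K_ij - K_i| / M_i, with K_i the optimal value and M_i the median of indicator i. The function name and argument layout are hypothetical.

```python
import numpy as np

def relative_gap_sum(K, weights, larger_is_better):
    """Weighted sum of relative gaps for each program.
    K has one row per program and one column per indicator.
    Assumed form: D_j = sum_i W_i * |K_ij - K_i_opt| / |M_i|."""
    K = np.asarray(K, dtype=float)
    w = np.asarray(weights, dtype=float)
    larger = np.asarray(larger_is_better, dtype=bool)
    # Optimal value per indicator: max if larger is better, else min
    optimal = np.where(larger, K.max(axis=0), K.min(axis=0))
    medians = np.abs(np.median(K, axis=0))   # assumed nonzero
    return (w * np.abs(K - optimal) / medians).sum(axis=1)
```

With the indicator columns ordered as RMSE, r, NS, t and I, the weights above would be passed as `[0.2334, 0.3130, 0.3130, 0.0785, 0.0620]` with `larger_is_better=[False, True, True, False, False]`, and the preferred program is the one with the smallest D, for example via `np.argmin`.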

Test of prediction model of neural network
The precision of the BP neural network prediction model of annual runoff with the optimal program (119-BZ) determined by the above screening is tested by using another 15 sets of measured data from 1996 to 2010, and the results are shown in Figure 4.

The predicted values of the BP neural network prediction model are very close to the measured values. The relative error exceeds 30% at only one data point; the fitted equation is Y_predicted = 0.990 Y_measured + 231.3 (R² = 0.853, P = 0.01). The Nash statistical coefficient between the predicted value and measured value of the model is NS = 0.665, and the root-mean-square error (RMSE) is 1639.1 m³ s⁻¹, indicating that the annual runoff prediction model based on the optimized BP neural network has better precision and can meet the general requirements of hydrologic prediction. The BP neural network prediction model is practical.

Comparative analysis of three prediction models
Based on several common statistical indicators, the paper conducts a comparative analysis of the three prediction models of annual runoff, and the results are shown in Table 1. As can be seen, the BP neural network prediction model performs best, with the minimum RMSE and the maximum R² and NS; data points with a relative error of more than 20% account for only 26.7% of the 15 sets of validation data. The coefficient of variation of the relative error of its predicted values is only 0.301, indicating that its prediction precision and stability are the best. The precision and stability of the regression analysis prediction model are the worst among the three models. The grey prediction model is slightly inferior to the BP neural network prediction model.
In Table 1, R_vc represents the coefficient of variation of the relative error of the predicted value, and (1), (2) and (3) respectively represent the regression analysis method, the grey theory and the BP neural network.
(2) The regression analysis prediction model performs worse, with a Nash statistical coefficient of 0.4; its root-mean-square error is 1.3 times that of the other two methods, and the relative error exceeds 30% at 30% of the predicted data points. The grey prediction model and the neural network prediction model both perform better than the regression analysis model: NS of both prediction models is over 0.6, and R² between the predicted value and measured value is over 0.7 (P=0.01). In short, the neural network prediction model of annual runoff has the best precision and reliability of the three.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Boluo Hydrologic Station belongs to the national hydrologic sites with first-class precision and is located on the right bank of the Dongjiang River in Guangdong Province. The data used in this paper are the annual runoff data at Boluo Hydrologic Station from 1954 to 2010.

In the test of the regression analysis prediction model (Figure 1), NS between the predicted value and measured value is 0.392, so the efficiency of the prediction model is relatively low; the root-mean-square error (RMSE) is 2208.0 m³ s⁻¹. The relative error exceeds ±30% at 33% of the data points and is less than ±10% at 40% of the data points, so the prediction precision is only moderate. The fitted equation of the predicted value and measured value of flow is Y_predicted = 0.395 Y_measured + 4287.3 (R² = 0.629, P = 0.01). The relative error in 1997 and in 2003 is more than 30% because these two years are mutation nodes of the hydrologic regime (Xueyan Wang, 2011).

Figure 1. Test of prediction model of annual runoff by the regression analysis method

Figure 2. Test of GM grey prediction model

Figure 4. Test of prediction model of BP neural network

Table 1. Comparative analysis and statistical results of three prediction models