SVM-based Implicit Stochastic Scheduling Mode for Cascade Hydropower Stations

To address the uncertainty of inflow to cascade reservoirs, an implicit stochastic joint scheduling function model for cascade hydropower stations was established based on the support vector machine (SVM). Taking the cascade hydropower stations on the lower reaches of the Yalong River as an example, the scheduling functions of the cascade reservoirs were fitted with LIBSVM using the Gauss radial basis function (RBF) kernel, based on a long series of optimized scheduling simulation data. The SVM model parameters c (penalty coefficient) and g (relaxation coefficient) were optimized with the particle swarm optimization algorithm. The optimized scheduling function model was then used for the simulated operation of the cascade hydropower stations. The results show that the nonlinear SVM scheduling function outperforms the linear regression model and performs comparably to the threshold regression model. Therefore, the SVM-based implicit stochastic scheduling method can provide a reference for the actual operation of cascade hydropower stations.


Introduction
With the slowdown of hydropower development, the optimal operation of cascade reservoirs has attracted more and more attention. At present, there are two modes of optimal scheduling for cascade hydropower stations: explicit stochastic optimization scheduling and implicit stochastic optimization scheduling. The idea of implicit stochastic optimization of hydropower stations was proposed by American scholars in 1967 [1]. Its purpose is to extract scheduling rules from the optimal scheduling process and to turn them into tools that guide actual operation, such as scheduling functions and optimal operation charts [2].
Major methods for establishing the scheduling function model of cascade reservoirs include the multiple linear regression model, the threshold regression model and the artificial neural network model [3]. Research results show that these methods have certain limitations. For example, multiple linear regression models cannot reflect the multidimensional nonlinear relationship among influence factors [4], the threshold regression method is limited when the amount of data is small [5], and the network structure of the artificial neural network model is difficult to determine [4]. The support vector machine (SVM) is a machine learning method based on statistical learning theory that can be used to solve classification or regression problems [6]. By introducing a kernel function, it avoids the explicit expression of the nonlinear mapping. Although its computational cost is comparable to that of the linear regression method, it can reflect the multidimensional nonlinear relationship. Compared with other nonlinear regression methods, the SVM regression algorithm is simple and easy to implement, with less complex computation, which enables it to excel in solving large-scale problems [4]. Therefore, this investigation addresses the implicit stochastic scheduling of cascade reservoirs with support vector machine technology, so as to provide a reference for the actual operation of cascade hydropower stations.

Support Vector Machine
At present, support vector machine classification technology has been widely applied to machine learning, pattern recognition, pattern classification, computer vision, industrial engineering and aerospace [7]. The basic idea of this technology is to define the optimal linear hyperplane and to reduce the search for that hyperplane to a convex quadratic programming problem [4]. Taking the solution of a nonlinear regression problem with a support vector machine as an example, the steps can be summarized as follows:
(1) For given sample points that cannot be separated by a hyperplane, a kernel function is used to map the sample points to a high-dimensional space; in other words, the nonlinear problem is converted into a linear problem in a high-dimensional space. For the nonlinear support vector machine, the Gauss radial basis kernel function is a common choice. Written in the form used by LIBSVM, it can be expressed as:
$$K(x_i, x_j) = \exp\left(-g \lVert x_i - x_j \rVert^2\right)$$
(2) After the kernel mapping, the decision function can be expressed as:
$$f(x) = w \cdot \varphi(x) + b$$
where $\varphi(x)$ is the nonlinear mapping associated with the kernel, $w$ is the weight coefficient and $b$ is the bias term. The optimal $w$ and $b$ are found by minimizing the confidence interval while the form of the linear decision function remains unchanged, and the optimization problem can be expressed as:
$$\min_{w,b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad \lvert y_i - w \cdot \varphi(x_i) - b \rvert \le \varepsilon, \quad i = 1, \dots, n$$
where $\varepsilon$ is the error.
When the constraint cannot be satisfied, the slack variables $\xi_i$ and $\xi_i^*$ are introduced and the optimization problem can be expressed as:
$$\min_{w,b,\xi,\xi^*} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i \\ w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \ \xi_i^* \ge 0, \quad i = 1, \dots, n \end{cases}$$
(3) The convex quadratic programming problem can be solved by using the Lagrange multiplier method.
(4) The regression estimation function obtained by learning can be used for nonlinear classification or regression prediction. It can be expressed as:
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b$$
$$b = \frac{1}{N_{NSV}} \left\{ \sum_{0 < \alpha_i < C} \Big[ y_i - \sum_{x_j \in SV} (\alpha_j - \alpha_j^*) K(x_j, x_i) - \varepsilon \Big] + \sum_{0 < \alpha_i^* < C} \Big[ y_i - \sum_{x_j \in SV} (\alpha_j - \alpha_j^*) K(x_j, x_i) + \varepsilon \Big] \right\}$$
where $N_{NSV}$ is the number of standard support vectors, $\alpha_i$, $\alpha_i^*$, $\alpha_j$ and $\alpha_j^*$ are Lagrangian multipliers, $C$ is the penalty coefficient and $K(x_j, x_i)$ is the kernel function.
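As an illustration of how a trained regression estimation function is evaluated, the following minimal Python sketch computes the Gauss radial basis kernel and the resulting prediction. It assumes the kernel takes the LIBSVM form exp(-g·||x - z||²); the function names and all numeric values are hypothetical and are not taken from this study.

```python
import math


def rbf_kernel(x, z, g):
    """Gauss radial basis kernel, assumed form: K(x, z) = exp(-g * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, g):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) * K(x_i, x) + b.

    dual_coefs[i] holds the difference (alpha_i - alpha_i*) of support vector i.
    """
    kernel_sum = sum(
        coef * rbf_kernel(sv, x, g)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return kernel_sum + b


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [0.5, -0.25]
prediction = svr_predict((0.0, 0.0), support_vectors, dual_coefs, b=0.1, g=0.5)
```

Only the support vectors enter the sum, so the cost of one prediction grows with the number of support vectors rather than with the size of the training set.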

Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a bionic algorithm proposed by Kennedy and Eberhart in 1995. The algorithm originates from a study of the foraging behavior of bird flocks. Its basic idea is to find the optimal solution through information transmission and information sharing between the individuals of a group. The algorithm has few parameters to adjust, is simple to operate and easy to implement, and is highly versatile [8][9][10]. PSO has been studied extensively and is widely used for function optimization, constrained optimization, engineering design problems and power systems. Therefore, this investigation optimized the main SVM parameters c and g by using the PSO algorithm.
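The search can be sketched in Python as follows. This is a minimal generic PSO, not the implementation used in this study. The linearly decreasing inertia weight, the swarm size, the iteration count, the zero initial velocities and the clamping of positions to the search bounds are all assumptions. The objective `toy_error` is a hypothetical stand-in for the fitting error of an SVM trained with a given pair (c, g).

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]  # global best

    for it in range(n_iter):
        # Assumed schedule: inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * it / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    """Hypothetical error surface with its minimum at c = 3, g = 1."""
    c, g = params
    return (c - 3.0) ** 2 + (g - 1.0) ** 2


best, best_val = pso_minimize(toy_error, bounds=[(0.0, 10.0), (0.001, 10.0)], n_iter=100)
```

In an actual PSO-SVM setup the objective would train an SVM for each candidate (c, g) and return its error, so each objective evaluation is far more expensive than in this toy case.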

Model construction

Case Analysis
Taking the cascade hydropower stations on the lower reaches of the Yalong River as an example, an implicit stochastic optimization scheduling function model was constructed. The Yalong River is the largest tributary of the upper reaches of the Yangtze River and it is rich in hydropower resources. The segment from Lianghekou to the middle and lower reaches of Jiangkou is listed as a national hydropower base. In order of altitude, from high to low, the five hydropower stations along the lower reaches of the Yalong River are Jinping I Station (JY), Jinping II Station (JE), Guandi Station (GD), Ertan Station (ET) and Tongzilin Station (TZL). These hydropower stations are operated jointly and thus form the cascade reservoir system. The cascade is mainly responsible for power generation, with no other comprehensive utilization requirements [3]. Because cascade reservoir scheduling is affected by the accuracy of runoff prediction and the uncertainty of inflow, an implicit stochastic scheduling function model, PSO-SVM, was constructed with the SVM method to guide the operation of the cascade reservoirs and improve their operation level.

Creation of Sample Data
Among the cascade hydropower stations on the lower reaches of the Yalong River, JY is the leading station, with annual regulating capacity. ET is the fourth station of the cascade and has seasonal regulating capacity. The JE, GD and TZL stations only have daily regulating capacity, and their water levels remain steady in long-term operation. Therefore, their regulation of runoff was ignored in this study. To facilitate comparative analysis, the model input and output selected in this study are consistent with the literature [3]. The water levels of JY and ET at the end of each period are the decision variables. The independent variables are the JY water level at the start of the period, the inflow of the JY-ET interval and the ET water level at the start of the period. These variables were taken as the input of the scheduling function model.
In this study, the long-series runoff data from November 1953 to October 2012 were analysed. The calculation period runs from November to October of the following year, and the calculation time step is one month. The simulation calculation was performed with the progressive optimality algorithm (POA), and the results were used as data samples for the implicit stochastic optimization calculation. The samples from November 1953 to October 2002 were used as training samples to determine the model parameters, while the samples from November 2002 to October 2012 were used as test samples to examine the model performance.
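The sample sizes implied by this split can be checked with a short calculation. The sketch below assumes that every hydrological year (November to October) contributes one sample to each monthly scheduling function; the helper name is hypothetical.

```python
def hydrological_years(first_november_year, last_october_year):
    """Number of November-to-October years between the two calendar years."""
    return last_october_year - first_november_year


total_years = hydrological_years(1953, 2012)  # Nov 1953 to Oct 2012
train_years = hydrological_years(1953, 2002)  # Nov 1953 to Oct 2002
test_years = hydrological_years(2002, 2012)   # Nov 2002 to Oct 2012
```

Under that assumption, each monthly model is trained on 49 samples and tested on 10, which is a small data set by machine learning standards.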

Model establishment
Using the pattern recognition and regression toolbox LIBSVM-3.23, the medium- and long-term scheduling model PSO-SVM of the cascade reservoirs on the lower reaches of the Yalong River was established in MATLAB R2018b. The model establishment steps can be summarized as follows:
(1) The model training samples and prediction samples were normalized. The normalization range is [0, 1]. The basic principle can be expressed as:
$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$
where $x$ is the original sample data; $y$ is the processed sample data; $x_{max}$ and $x_{min}$ are the maximum and minimum values of the original data, and $y_{max}$ and $y_{min}$ are the maximum and minimum values of the processed data (here 1 and 0).
(2) The particle swarm algorithm parameters were set, and the training set samples were input so as to obtain multiple candidate groups of the main parameters c and g of the LIBSVM learning model.
The main parameters of the particle swarm algorithm are the learning factors $c_1$ and $c_2$, which generally take values in [0, 4], the population size, the maximum number of iterations and the inertia weight. Following empirical practice, the particle swarm parameters of all PSO-SVM models were set uniformly: the learning factors are $c_1 = c_2 = 2$, the maximum inertia weight is 0.9 and the minimum inertia weight is 0.4; the penalty coefficient c is bounded above, and the relaxation coefficient g ∈ (0.001, 100).
(3) The support vector machine type and kernel function type of the LIBSVM training model were selected, and the training samples were used to establish the learning model. The established learning model was then used to predict the training set, and the relative error and the qualification rate were analysed so as to determine the optimal parameters c and g.
The training model parameters in LIBSVM include the support vector machine type, the kernel function type and the kernel function settings. To facilitate calculation and improve the prediction accuracy of the model, the support vector machine type selected for PSO-SVM is the multi-level classification and regression support vector machine, the kernel function type is the Gauss radial basis kernel function, and all other parameters in the model were left at their default values.
(4) Regression prediction is performed on the prediction sample set according to the learning model established by the optimal parameters c and g.
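The normalization in step (1) and its inverse can be sketched in Python as follows. The sketch assumes the usual min-max mapping y = (y_max - y_min)(x - x_min)/(x_max - x_min) + y_min, with the extremes taken from the training samples and reused for the prediction samples; the source does not state which samples supply the extremes. The function names and the water level values are hypothetical.

```python
def fit_minmax(column):
    """Return (x_min, x_max) of a training column."""
    return min(column), max(column)


def normalize(x, x_min, x_max, y_min=0.0, y_max=1.0):
    """Map x from [x_min, x_max] to [y_min, y_max]."""
    if x_max == x_min:
        return y_min  # constant column: avoid division by zero
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min


def denormalize(y, x_min, x_max, y_min=0.0, y_max=1.0):
    """Map a normalized value back to the original units."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min


# Hypothetical training water levels in metres, for illustration only.
train_levels = [1800.0, 1840.0, 1880.0]
x_min, x_max = fit_minmax(train_levels)
scaled = [normalize(v, x_min, x_max) for v in train_levels]
restored = [denormalize(v, x_min, x_max) for v in scaled]
```

The inverse mapping is needed in step (4), because the model predicts normalized water levels that have to be converted back to metres before they are compared with the optimal trajectory.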

Model Parameters
Following the above steps, the parameters of the PSO-SVM scheduling function model for each monthly period of the cascade reservoirs in the lower reaches of the Yalong River were obtained, as shown in Table 1.

Comparative Analysis
In order to compare the water level predictions with those of the linear regression and threshold regression models in the literature [3], the mean relative error index was used to represent the degree of fit to the optimal scheduling trajectory [7], and the qualification rate of the PSO-SVM predictions was compared with those of the linear regression model and the threshold regression model (a relative error of less than 10% is considered qualified). Taking April as an example, the PSO-SVM scheduling fitting results of the cascade reservoirs on the lower reaches of the Yalong River were calculated, as shown in Table 2. The results were compared with the linear regression and threshold regression models in the literature [3], as shown in Table 3. The absolute difference is the difference between the PSO-SVM model and the linear regression model or the threshold regression model.
Table 3 shows that the maximum, minimum and average errors of the PSO-SVM scheduling model are smaller than those of the linear regression model, and its qualification rates are also higher. Compared with threshold regression, the average errors of the PSO-SVM scheduling model are all smaller and the qualification rates are generally better, so its degree of fit is better. However, some qualification rates of the PSO-SVM scheduling model are lower than those of the threshold regression model and some errors are larger, so the fitting results of the PSO-SVM scheduling model are more scattered and the stability of the simulation calculation is not as good as that of the threshold regression model. In summary, the PSO-SVM scheduling model performs better than the linear regression model and is comparable to the threshold regression method.
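The indices used in this comparison can be computed as in the following Python sketch. It assumes the relative error is |predicted - optimal| / |optimal|, which the text does not state explicitly; the function names and the sample values are hypothetical and are not taken from Tables 2 or 3.

```python
def relative_errors(predicted, optimal):
    """Relative error of each prediction with respect to the optimal value."""
    return [abs(p - o) / abs(o) for p, o in zip(predicted, optimal)]


def fit_summary(predicted, optimal, threshold=0.10):
    """Maximum, minimum and mean relative error, plus the qualification rate.

    A prediction counts as qualified when its relative error is below threshold.
    """
    errs = relative_errors(predicted, optimal)
    qualified = sum(1 for e in errs if e < threshold)
    return {
        "max": max(errs),
        "min": min(errs),
        "mean": sum(errs) / len(errs),
        "qualified_rate": qualified / len(errs),
    }


# Hypothetical values, for illustration only.
optimal = [100.0, 200.0, 50.0, 80.0]
predicted = [105.0, 170.0, 50.0, 84.0]
summary = fit_summary(predicted, optimal)
```

Because the test set contains only a few years per month, a single large error shifts the qualification rate noticeably, which is consistent with the scatter discussed above.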

Simulation operation
Using the PSO-SVM scheduling model, programmed in C# on the VS2017 platform, a rolling simulation of the cascade hydropower stations in the lower reaches of the Yalong River was carried out for the period from November 2005 to October 2006. The simulation results of Jinping I and Ertan were obtained. The water level processes, output and power generation of the two reservoirs over time are shown in Table 4 and Figure 1.
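The rolling calculation can be sketched in Python as follows: the end-of-period water level predicted for one month becomes the initial water level of the next month. This is a simplified single-reservoir sketch, not the C# program used in this study. The predictor `toy_rule` is a hypothetical stand-in for the monthly PSO-SVM scheduling functions, and all numeric values are illustrative.

```python
def rolling_simulation(predict_end_level, initial_level, inflows):
    """Roll a scheduling function forward, one period per inflow value.

    predict_end_level(period_index, start_level, inflow) returns the water
    level at the end of the period.
    """
    levels = [initial_level]
    for period_index, inflow in enumerate(inflows):
        end_level = predict_end_level(period_index, levels[-1], inflow)
        levels.append(end_level)  # becomes the start level of the next period
    return levels


def toy_rule(period_index, start_level, inflow):
    """Hypothetical rule: the level rises when inflow exceeds 1000 and falls otherwise."""
    return start_level + (inflow - 1000.0) * 0.001


# Hypothetical initial level (m) and monthly inflows (m^3/s), for illustration only.
levels = rolling_simulation(toy_rule, 1850.0, [1200.0, 1000.0, 800.0])
```

Because each prediction feeds the next period, fitting errors can accumulate along the trajectory, so a rolling simulation is a stricter test than the period-by-period fitting comparison.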