Water demand forecast model of Least Squares Support Vector Machine based on Particle Swarm Optimization

Abstract: To improve the precision of water demand forecasting, this paper proposes a coupled water demand forecast model that combines the particle swarm optimization (PSO) algorithm with the least squares support vector machine (LS-SVM). A PSO-LSSVM model based on parameter optimization was constructed for a coastal area of Binhai, Jiangsu Province, and the total water demand in 2009 and 2010 was simulated and forecasted with absolute relative errors of less than 2.1%. The results show that the model has a good simulation effect and strong generalization performance, and that it can be widely applied to small-sample, nonlinear and high-dimensional water demand forecasting problems.


Introduction
Water demand forecasting is an important part of the optimal operation of water supply systems. An accurate water demand forecast allows limited water resources to be allocated reasonably and effectively: it not only avoids the waste of resources caused by misallocation, but also eases the pressure on water resources to a large extent. Because water demand forecasting started late in China [1], the available water demand data series are short and their reliability is low [2], and many factors influence water demand. Traditional forecast methods, such as the quota method, the time series method and the trend analysis method, not only require a large amount of work but also struggle to guarantee forecast accuracy.
With the rapid development of information technology, many intelligent techniques based on data mining have gradually appeared, and methods such as the artificial neural network and the support vector machine have been favored by many researchers. However, because the artificial neural network follows the principle of empirical risk minimization, the generalization performance of its models is much worse on small-sample problems than that of the support vector machine, which follows the principle of structural risk minimization [3]. Support vector machines and their improved variants are therefore widely used to solve small-sample, nonlinear and high-dimensional water demand forecasting problems. To further improve the forecast accuracy of the support vector machine in this field and to simplify the machine learning procedure, this paper establishes a water demand forecast model based on the least squares support vector machine [4] and the particle swarm optimization algorithm. An area in Binhai, Jiangsu Province, was taken as the study area. The model was verified using historical water use data and related influencing factors from 2000 to 2010, in order to provide a reference for establishing high-precision water demand forecast models.
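
Particle Swarm Optimization
The mathematical description of PSO is as follows. Suppose that, in the k-th iteration in an m-dimensional search space, the particle population consists of n particles, $X^k = (X_1^k, X_2^k, \ldots, X_n^k)$. The position vector of the i-th particle can be expressed as $X_i^k = (x_{i1}^k, x_{i2}^k, \ldots, x_{im}^k)$, where $x_{id}^k$ represents the position of particle i in the d-th dimension of the search space in the k-th generation. The fitness value is obtained by substituting $X_i^k$ into the objective function to be solved.
The individual optimal value pbest of the current particle can be expressed as $P_i^k = (p_{i1}^k, p_{i2}^k, \ldots, p_{im}^k)$, and there must be one best particle in the whole population, which gives the group optimal value gbest, $P_g^k = (p_{g1}^k, p_{g2}^k, \ldots, p_{gm}^k)$. The velocity vector of particle i moving in the search space can be expressed as $V_i^k = (v_{i1}^k, v_{i2}^k, \ldots, v_{im}^k)$. In the iterative process, the algorithm updates the velocity and position of each particle with the following formula, written here in the standard PSO form:
$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left(p_{id}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd}^{k} - x_{id}^{k}\right), \qquad x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \qquad (1)$$
where k is the current iteration number, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1], and $x_{id}^{k}$ and $x_{id}^{k+1}$ are the positions of particle i in the d-th dimension of the search space before and after the iterative update, respectively.
To limit the variation of the velocity vector in each optimization step, the following constraint should be satisfied:
$$-V_{\max} \le v_{id}^{k+1} \le V_{\max}$$
where $V_{\max}$ is the maximum allowable velocity in the current generation.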
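The velocity and position update with velocity clamping described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `pso_minimize`, the inertia weight `w`, the learning factors `c1` and `c2`, the swarm size, the iteration count and the choice of the velocity limit as 20% of the search range are all illustrative values that the text does not specify.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic PSO and velocity clamping.

    All default hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    v_max = 0.2 * (upper - lower)  # assumed velocity limit per dimension

    # Random initial positions and velocities
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))

    # Individual bests (pbest) and group best (gbest)
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward pbest + pull toward gbest
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)      # velocity constraint
        x = np.clip(x + v, lower, upper)   # position update, kept inside the box

        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]

        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    return gbest, gbest_val
```

For example, `pso_minimize(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])` searches for the minimum of a simple quadratic test function on a two-dimensional box.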

Least Squares Support Vector Machine
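Based on the traditional SVM [10,11], the least squares support vector machine transforms the inequality constraints into equality constraints and uses squared error terms as the optimization index, which makes the calculation more convenient. The n samples $(x_i, y_i)$, $i = 1, \ldots, n$, are mapped into a high-dimensional feature space by a nonlinear mapping $\varphi(\cdot)$, and the optimal decision function is constructed as follows:
$$f(x) = w^{T}\varphi(x) + b$$
The loss function of LS-SVM is the least squares error, and in the standard LS-SVM formulation its optimization problem can be written as:
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\xi_i^{2} \quad \text{s.t.} \quad y_i = w^{T}\varphi(x_i) + b + \xi_i, \; i = 1, \ldots, n$$
Unlike the traditional support vector machine, LS-SVM replaces the non-negative slack variable $\xi_i$ with the squared 2-norm of the error. Using the Lagrange method to solve the above optimization problem, we obtain the following Lagrangian function:
$$L(w, b, \xi, \lambda) = \frac{1}{2}\|w\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\xi_i^{2} - \sum_{i=1}^{n}\lambda_i\left(w^{T}\varphi(x_i) + b + \xi_i - y_i\right)$$
where $\lambda_i$ is the Lagrange multiplier corresponding to the i-th sample.
According to the optimality conditions and the definition of the kernel function $K(x, x') = \varphi(x)^{T}\varphi(x')$, the optimization problem can be further transformed into solving the following linear equations:
$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + C^{-1}I \end{bmatrix} \begin{bmatrix} b \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
where $\Omega_{ij} = K(x_i, x_j)$, $\mathbf{1}$ is the n-dimensional vector of ones, $I$ is the identity matrix, $\lambda = (\lambda_1, \ldots, \lambda_n)^{T}$ and $y = (y_1, \ldots, y_n)^{T}$.
The LS-SVM model $f(x) = \sum_{i=1}^{n}\lambda_i K(x, x_i) + b$ is obtained by solving for $\lambda$ and b by the least squares method.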
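As a rough illustration of how the linear equations above lead to a regression model, the following Python sketch assembles and solves the system with NumPy. It is a sketch under assumptions rather than the implementation used in this paper: the function names are hypothetical, the regularization term is taken as I/C (which corresponds to a C/2 weight on the squared errors), and the RBF kernel is assumed to have the form exp(-||a - b||^2 / (2σ^2)).

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix; the form exp(-||a-b||^2 / (2 sigma^2)) is an assumption."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C, sigma):
    """Solve the bordered linear system for the bias b and multipliers lambda."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                  # first row:    [0, 1^T]
    A[1:, 0] = 1.0                                  # first column: [0, 1]^T
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                          # b, lambda

def lssvm_predict(X_train, b, lam, sigma, X_new):
    """Evaluate f(x) = sum_i lambda_i K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ lam + b
```

A call such as `b, lam = lssvm_fit(X, y, C=100.0, sigma=0.3)` followed by `lssvm_predict(X, b, lam, 0.3, X_new)` returns the fitted values for new inputs. Because only one linear system is solved, no quadratic programming solver is needed.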

PSO-LSSVM
Parameter optimization
For LS-SVM, the accuracy and generalization ability of the model mainly depend on the penalty coefficient C, the kernel function and its parameters. However, there is no effective method for selecting these parameters reasonably [12]. Most studies show that the RBF is the better choice of kernel function when LS-SVM is used for regression estimation [13][14][15]. Therefore, the PSO algorithm is used to optimize the penalty coefficient C and the kernel parameter σ of the RBF. The simulation accuracy and generalization performance of the LS-SVM model depend not only on C and σ themselves, but also on the relationship between them. The two parameters therefore cannot be optimized separately; the parameter pair must be optimized jointly under the given fitness function.
In this paper, the k-fold cross validation technique [16] is used to optimize the two parameters. The data set is randomly divided into k disjoint parts of equal size. For each pair (C, σ), the model is trained on (k-1) of the parts and validated on the remaining one. This is repeated k times, so that each part is used once for validation, and the average of the k errors is the cross-validation error. Different (C, σ) combinations give different cross-validation errors. The final fitness function is the minimum value of the cross-validation error of the data set, measured by the mean absolute percentage error (MAPE), see formula (10).
$$\text{fitness} = \min_{(C, \sigma)} \frac{1}{k}\sum_{i=1}^{k}\mathrm{MAPE}\left(y_i, \hat{y}_i\right) \qquad (10)$$
where k is the number of subsets of the data set, usually 5 or 10, and $y_i$ and $\hat{y}_i$ are the actual value vector and the simulated value vector of the i-th subset, respectively.
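The cross-validation error used as the fitness value can be sketched as follows. This is a generic illustration, not the code used in this paper: the helper names `mape`, `kfold_indices` and `cv_mape` are hypothetical, and the model is passed in as a pair of `fit` and `predict` callables so that the sketch does not depend on a particular LS-SVM implementation. When the number of samples is not divisible by k, `np.array_split` produces folds whose sizes differ by at most one, which is an assumption about how the "equal parts" are formed in practice.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def kfold_indices(n_samples, k, seed=0):
    """Randomly split the sample indices into k disjoint, near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cv_mape(fit, predict, X, y, k=5, seed=0):
    """Average validation MAPE over k folds for one parameter setting.

    `fit(X_train, y_train)` returns a model object and
    `predict(model, X_val)` returns predictions; both are supplied by the caller.
    """
    folds = kfold_indices(len(y), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on the other k-1 folds, validate on fold i
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errors.append(mape(y[val_idx], predict(model, X[val_idx])))
    return float(np.mean(errors))
```

In a parameter search, `fit` would be a closure that fixes one candidate pair (C, σ), and the optimizer would look for the pair with the smallest returned value.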

Model construction
The essence of using the PSO-LSSVM model to forecast water demand is to solve the following regression problem:
$$y = f(x_1, x_2, \ldots, x_i, \ldots, x_n)$$
where y is the water demand to be forecasted and $x_i$ is the i-th factor affecting the water demand.
Because many factors affect water demand, the relationship between water demand and its influencing factors is a complicated, nonlinear and high-dimensional one, which can be handled by establishing a PSO-LSSVM water demand forecast model. The model transforms the problem into a least squares support vector machine regression problem with n input variables (influencing factors) and one output (water demand).
The algorithm flow of PSO-LSSVM water demand forecast model is shown in Figure 1.
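One possible way to couple the steps (PSO search, LS-SVM training and k-fold cross-validation fitness) is sketched below in compact form. It is an assumption-laden illustration rather than the procedure of Figure 1: the search is carried out over log10(C) and log10(σ) within assumed bounds, the PSO constants (0.7, 1.5, 1.5), swarm size and iteration count are illustrative, the RBF kernel form is assumed, and all function names are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma):
    # Assumed RBF form: exp(-||a - b||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C, sigma):
    # Bordered LS-SVM linear system; returns bias b and multipliers lambda
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def cv_mape(X, y, C, sigma, k=5, seed=0):
    # k-fold cross-validation MAPE (percent) for one (C, sigma) pair
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errs = []
    for i, val in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        b, lam = lssvm_fit(X[tr], y[tr], C, sigma)
        pred = rbf(X[val], X[tr], sigma) @ lam + b
        errs.append(np.mean(np.abs((y[val] - pred) / y[val])) * 100.0)
    return float(np.mean(errs))

def pso_lssvm(X, y, n_particles=15, n_iter=30, seed=0):
    # PSO over (log10 C, log10 sigma); the bounds below are assumptions
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-1.0, -1.0]), np.array([4.0, 1.5])
    v_max = 0.2 * (hi - lo)
    fitness = lambda p: cv_mape(X, y, 10.0 ** p[0], 10.0 ** p[1])

    x = rng.uniform(lo, hi, (n_particles, 2))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([fitness(p) for p in x])
    for _ in range(n_iter):
        g = pbest[np.argmin(pval)]                       # group best
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = np.clip(0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x),
                    -v_max, v_max)
        x = np.clip(x + v, lo, hi)
        val = np.array([fitness(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
    best = pbest[np.argmin(pval)]
    return 10.0 ** best[0], 10.0 ** best[1], float(pval.min())
```

Given an input matrix `X` of influencing factors (one row per year) and a vector `y` of observed water demand, `C, sigma, err = pso_lssvm(X, y)` returns the selected parameter pair and its cross-validation MAPE. A final model would then be trained on the full training period with `lssvm_fit(X, y, C, sigma)`. Scaling the input factors to comparable ranges before training is generally advisable, although the text does not state whether this was done.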

Case study
In this paper, the PSO-LSSVM model was applied to the simulation and forecast of water demand in Binhai, Jiangsu Province. The water users were divided into four categories, namely domestic, industrial, agricultural and ecological users.
The factors affecting the water demand of each type of users were different. Therefore, by using the method of sub-forecast, the influencing factors of water demand of different kinds of users were clarified, and the PSO-LSSVM water demand forecast models were established separately. The annual water consumption of all kinds of

Domestic water demand forecast
The main factors affecting domestic water demand are the total population, per capita urban water use quota, per capita wage, leakage rate of urban water supply pipe network and popularization rate of water-saving apparatus from 2000 to 2010.
The above five factors are used as the input of the PSO-LSSVM model to simulate the domestic water demand in the study area. The relative errors of the model in the training period and the verification period are shown in Table 1. From Table 1, it can be seen that the absolute values of the relative errors over the whole simulation period are less than 0.47% and the simulation accuracy is good. The maximum absolute values of the relative error during the training period and the verification period are 0.24% and 0.46%, respectively.

Industrial water demand forecast
The factors affecting industrial water demand include gross industrial output value, industrial added value, reuse rate of industrial water, average consumption rate of industrial water and water use quota per unit of industrial added value from 2000 to 2010.
The above five factors are used as the input of the PSO-LSSVM model to simulate the industrial water demand in the study area. The relative errors of the model in the training period and the verification period are shown in Table 2. From Table 2, it can be seen that the absolute values of the relative errors over the whole simulation period are less than 4% and the simulation accuracy is good. The maximum absolute values of the relative error during the training period and the verification period are 2.18% and 3.94%, respectively.

Agricultural water demand
The main factors affecting agricultural water demand include precipitation, total agricultural output and planting area of crops from 2000 to 2010.
The above three factors are used as the input of the PSO-LSSVM model to simulate the agricultural water demand in the study area. The relative errors of the model in the training period and the verification period are shown in Table 3. From Table 3, it can be seen that the absolute values of the relative errors over the whole simulation period are less than 2% and the simulation accuracy is good. The maximum absolute values of the relative error during the training period and the verification period are 1.28% and 1.93%, respectively.

Ecological water demand forecast
The main factors affecting ecological water demand include urban green coverage, forest coverage and comprehensive index of environmental quality from 2000 to 2010.
The above three factors are used as the input of the PSO-LSSVM model to simulate the ecological water demand in the study area. The relative errors of the model in the training period and the verification period are shown in Table 4. From Table 4, it can be seen that the absolute values of the relative errors over the whole simulation period are less than 2.3% and the simulation accuracy is good. The maximum absolute values of the relative error during the training period and the verification period are 1.30% and 2.25%, respectively.

Total water demand forecast
The total water demand in Binhai is composed of the demand of four types of users: domestic, industrial, agricultural and ecological. The relative errors of the total water demand during the training period and the verification period of the model are shown in Table 5. From Figure 6, we can see that the simulated total water demand of the model fits the actual values well. Combined with the results shown in Table 5, it can be seen that the absolute values of the relative errors over the whole simulation period are less than 2.1% and the simulation accuracy is good. The maximum absolute values of the relative error during the training period and the verification period are 1.01% and 2.07%, respectively.
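The aggregation of the four sector forecasts and the relative error reported in the tables can be illustrated with a short Python sketch. The numbers below are placeholders chosen only to make the example run; they are not data from this study. The helper names and the sign convention (simulated minus actual, divided by actual) are assumptions.

```python
import numpy as np

def relative_error_percent(simulated, actual):
    """Signed relative error in percent; the sign convention is an assumption."""
    simulated = np.asarray(simulated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return (simulated - actual) / actual * 100.0

def total_demand(sector_values):
    """Sum per-sector series (dict of equal-length sequences) into a total series."""
    return np.sum([np.asarray(v, dtype=float) for v in sector_values.values()],
                  axis=0)

# Placeholder values for two verification years; NOT data from the paper.
simulated = {"domestic": [10.0, 11.0], "industrial": [20.0, 21.0],
             "agricultural": [60.0, 58.0], "ecological": [2.0, 2.0]}
actual = {"domestic": [10.0, 11.0], "industrial": [20.0, 20.0],
          "agricultural": [60.0, 60.0], "ecological": [2.0, 2.0]}

errors = relative_error_percent(total_demand(simulated), total_demand(actual))
max_abs_error = float(np.max(np.abs(errors)))  # largest absolute relative error
```

Because sector errors of opposite sign partly cancel when the series are summed, the relative error of the total can be smaller than that of an individual sector.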

Conclusion
Water demand forecasting is of great significance to the operation and management of water supply systems. In this paper, a PSO-LSSVM water demand forecast model is established, and a case study is carried out for an area in Binhai, Jiangsu Province. The conclusions are as follows.
(1) The PSO-LSSVM water demand forecast model can quickly find the optimal parameters, and it requires less work than the traditional methods.
(2) The model was used to simulate and calculate the domestic, industrial, agricultural and ecological water demand in the study area. The simulated values were in good agreement with the historical actual values, and the absolute values of the relative errors in both the training period and the verification period were within 5%. The model has high forecast precision and strong generalization ability.
(3) The example shows that the PSO-LSSVM method has good applicability and high application value for complex water demand forecasting problems that are small-sample, nonlinear and high-dimensional.