Multi-time scale wind speed prediction based on WT-bi-LSTM

Accurate and reliable wind speed prediction benefits wind power forecasting and consumption. As a continuous signal with high autocorrelation, the wind speed at any moment is closely related to both past and future moments. Therefore, to make full use of the information in both directions, an autoregressive model based on a bi-directional long short-term memory neural network with wavelet decomposition (WT-bi-LSTM) is built to predict wind speed at multiple time scales. The proposed model is validated using actual wind speed series from a wind farm in China. The validation results demonstrate that, compared with four other traditional models, the proposed strategy can effectively improve the accuracy of wind speed prediction.


Introduction
In the field of renewable energy power generation, wind power is the most popular resource because it is clean, widely distributed and easy to exploit on a large scale, and it has attracted attention worldwide [1]. Wind speed is one of the main factors affecting wind power generation capability. However, because wind speed is intermittent and random, it causes unstable phenomena in wind power systems, such as oscillation, fluctuation and ramping, which limit and complicate the integration of this clean power into the grid. Accurate wind speed prediction is one of the key technologies for ensuring the safe operation of wind farms and the grid [2].
Traditional wind speed prediction methods can be divided into physical model-based methods and statistical model-based methods [3]. These methods have been studied by scholars in various countries, and their prediction accuracy has improved rapidly. However, many problems remain to be solved before wind speed forecasts become more accurate and cover larger time scales. With the rapid development of computer technology and artificial intelligence, modern prediction methods are moving toward intelligent, integrated and deep models, which makes it possible to obtain better forecasting results than traditional methods [4,5].
Wind speed is a highly autocorrelated time-varying signal whose past values strongly affect its forthcoming values [6]. Therefore, when a wind speed forecasting model is established, not only do the current inputs influence the outputs, but the continuity and dependence between input samples must also be taken into account. For example, Lei Jinhao et al. [7] pointed out that long-term dependence is particularly important for generating separable and predictable features. Extracting the features that link successive wind speed samples using artificial intelligence technology is therefore becoming a research hotspot. For example, the long short-term memory (LSTM) neural network can make full use of the information in consecutive samples, especially information with long-term dependence. It is often used in time series problems such as speech recognition, machine translation and handwriting recognition, and it also achieves good results in wind speed and power prediction [8][9][10][11].
Jie Chen et al. [8] argued that, when an ensemble method is used to build wind speed series models, a linear combination alone is insufficient and a non-linear combination structure should be taken into consideration. They therefore proposed an ensemble LSTM model that uses multiple LSTMs as sub-models and a top-level combiner in which the Extremal Optimization (EO) algorithm optimizes the parameters of a Support Vector Regression Machine (SVRM). The numerical results demonstrate the superiority of the model. Yanfei Li et al. [10] built a main LSTM model on the wind speed sub-sequences obtained by the Empirical Wavelet Transform (EWT), used a Regularized Extreme Learning Machine (RELM) as a sub-model of the LSTM prediction error, and finally applied the Inverse Empirical Wavelet Transform (IEWT) to obtain the predicted values.
The research results above confirm that the LSTM model can make full use of the historical information of a sequence, and the model is being used more and more widely. At the same time, some scholars have proposed learning a signal in the reverse direction as well [12,13], making important progress in speech recognition, Chinese event extraction and other areas. In wind speed prediction, however, this point of view has not yet been considered. Whether, in addition to past information, the reverse rule from the future to the present can be used in wind speed prediction is the research focus of this paper.
To obtain higher wind speed prediction accuracy, a neural network model based on bidirectional long short-term memory (bi-LSTM) is proposed to extract historical and future information from the wind speed after the signal is transformed by wavelet. Firstly, the Wavelet Transform (WT) decomposes the wind speed signal into low-frequency and high-frequency components, so that the different frequencies of the signal are separated and their interaction is reduced. Secondly, a bi-LSTM model is established for each frequency component, obtaining hidden states in two directions. By making full use of the information from both sides, the signal components are predicted multiple steps ahead. Finally, the predicted components are added together to obtain the multi-time scale prediction results. Five models are established on wind speed series from an actual wind farm. Among them, the model proposed in this paper, which considers bidirectional information, has the highest prediction accuracy, especially in extracting the trend information of wind speed at large time scales.
The rest of this paper is organized as follows. Section 2 introduces the methods used in this paper, mainly WT and bi-LSTM. Section 3 compares the accuracy of the different models and discusses possible reasons for the results. Finally, Section 4 presents the conclusions and prospects of this paper.

Wavelet Transform
The wavelet transform can analyze signals in the time domain and the frequency domain at the same time. Through dilation and translation operations, WT performs a multi-scale refinement analysis of a signal, which plays an important role in analyzing its local features [14]. For an arbitrary continuous signal $f(t)$ with finite energy, that is, $\int_{-\infty}^{+\infty} |f(t)|^2 \, dt < \infty$, so that $f(t) \in L^2(R)$, where $L^2(R)$ is the signal space, the continuous WT of $f(t)$ is:
$$W_f(a,b) = \int_{-\infty}^{+\infty} f(t)\, \overline{\Psi}_{(a,b)}(t)\, dt \qquad (1)$$
where $\overline{\Psi}_{(a,b)}(t)$ is the complex conjugate of the base wavelet function $\Psi_{(a,b)}(t) = |a|^{-1/2}\, \Psi\!\left(\frac{t-b}{a}\right)$, $\Psi(t)$ is the mother wavelet, $a$ is the scale parameter and $b$ is the translation parameter. Different mother wavelets $\Psi(t)$ acting on the signal $f(t)$ lead to different decomposition results. However, there is no specific method to guide the selection of the mother wavelet, which is often chosen by comparing how well the transformed signals can be modeled.
The actual wind speed signal is sampled at a fixed time interval, so it is a discrete signal in the time domain, and decomposing such a signal requires a discrete wavelet transform. Let the scale parameter of the continuous wavelet transform be $a = a_0^j$ and the translation parameter be $b = k a_0^j b_0$ $(j, k \in Z)$, so that the parameter pair changes from $(a,b)$ to $(j,k)$; the discrete wavelet transform (DWT) is then obtained:
$$W_f(j,k) = \int_{-\infty}^{+\infty} f(t)\, \overline{\Psi}_{(j,k)}(t)\, dt, \qquad \Psi_{(j,k)}(t) = a_0^{-j/2}\, \Psi\!\left(a_0^{-j} t - k b_0\right) \qquad (2)$$
Mallat put forward the concept of multi-resolution analysis when constructing orthogonal wavelet bases in 1989 [15], giving a spatial interpretation of the characteristics of wavelets, and on this basis proposed the Mallat algorithm for signal decomposition. The decomposition process is shown in Figure 1.
The signal $f(t)$ is orthogonally projected onto the spaces $V_j$ and $W_j$ to obtain a low-frequency discrete approximation signal $A_j(t)$ and a high-frequency discrete detail signal $D_j(t)$ at resolution $j$. Leaving the high-frequency $D_j(t)$ aside, each low-frequency signal $A_j(t)$ is decomposed further into a detail signal $D_{j+1}(t)$ and an approximation signal $A_{j+1}(t)$, step by step with increasing $j$, until the end condition is met. After decomposition, the signal can be expressed as equation (3):
$$f(t) = A_n(t) + \sum_{j=1}^{n} D_j(t) \qquad (3)$$
The original signal decomposed by an n-level wavelet transform can thus be reconstructed by summing the n-th level approximation signal and the detail signal of each level, that is, n+1 sub-signal sequences in total. For example, after the wavelet transform, the wind speed signal is decomposed into sub-sequences of different frequencies, which effectively separates the high-frequency part of the original signal caused by atmospheric circulation, randomness and volatility, and alleviates the problem of irregular fluctuation of the sequence. It has been pointed out in the literature that the compactly supported bi-orthogonal Daubechies 4 (db4) wavelet, used as the mother wavelet for a decomposition of 2-4 levels, gives a good analysis of the wind speed signal [16].
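The additive reconstruction in equation (3) can be checked numerically. The sketch below is a minimal plain-Python illustration that swaps the db4 wavelet for the much simpler Haar wavelet, so that the level-j approximation is just the mean over blocks of 2^j samples. The function names and the toy series are assumptions made for this example and are not part of the original study.

```python
def haar_approximation(signal, level):
    """Level-`level` Haar approximation: block means over 2**level samples."""
    block = 2 ** level
    if len(signal) % block != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    approx = []
    for start in range(0, len(signal), block):
        mean = sum(signal[start:start + block]) / block
        approx.extend([mean] * block)
    return approx


def haar_decompose(signal, levels):
    """Return (A_n, [D_1, ..., D_n]) such that signal = A_n + D_1 + ... + D_n."""
    details = []
    previous = list(signal)  # A_0 is the signal itself
    for level in range(1, levels + 1):
        approx = haar_approximation(signal, level)
        # D_j = A_{j-1} - A_j: what is lost when moving to the coarser level
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return previous, details


def reconstruct(approx, details):
    """Sum the final approximation and all detail components."""
    total = list(approx)
    for detail in details:
        total = [s + d for s, d in zip(total, detail)]
    return total


# Hypothetical wind speed samples in m/s (8 values, so 3 levels are possible).
wind = [4.0, 6.0, 5.0, 7.0, 8.0, 6.0, 3.0, 5.0]
a3, (d1, d2, d3) = haar_decompose(wind, 3)
rebuilt = reconstruct(a3, [d1, d2, d3])
```

Three levels produce four components (one approximation and three details), matching the n+1 sub-sequences described above, and summing them recovers the original samples.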

Long Short Term Memory (LSTM)
During back-propagation, the gradient information propagated through a Recurrent Neural Network (RNN) may vanish or explode as the number of layers increases, resulting in weak parameter updates and low learning efficiency [17]. This is why the RNN is not suitable for long time series problems.
As a deep learning algorithm, the LSTM network is a special, improved type of RNN. The LSTM cell combines forgetting, input and output gates and introduces cell-state connections into the RNN to resolve the problems of gradient vanishing or explosion during deep propagation. It is often used to deal with time series with long-term dependence [18]. The structure of a basic LSTM cell is shown in Figure 2. Compared with a simple RNN, the LSTM no longer simply uses the cell state at the previous moment as the input of the current cell. Instead, it makes flexible choices, learned during training, through the forgetting gate $f_t$, the input gate $i_t$ and the output gate $o_t$. Information is usually processed within the LSTM cell in the following steps.
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$
$$c_t = f_t * c_{t-1} + i_t * \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o)$$
$$h_t = o_t * \tanh(c_t)$$
Among them, $\sigma(\cdot)$ is the sigmoid activation function, $\tanh(\cdot)$ is the tanh activation function, and $W$ and $b$ are the connection weight matrices and bias vectors between the forgetting gate, the input gate, the output gate and the different input quantities, which are the unknown parameters to be learned by the deep learning network.
From the formulas above, we can see that each gate in the cell receives the hidden state $h_{t-1}$ at the previous moment, the current input $x_t$, and the state information $c_{t-1}$ of the internal memory unit of the cell, and a logistic function determines how strongly the gate is activated. The forgetting gate outputs a number between 0 and 1 through a non-linear transformation, determining the influence of the previous memory cell state $c_{t-1}$ on the current memory state. The updated state information consists of two parts: one is determined by the forgetting gate, and the other is determined by the input gate and the inputs at time $t$. That is, the input gate selects the state that needs to be updated, and the tanh term supplies the new information written to the cell state. Finally, the hidden state $h_t$ of the unit, which is both the output at time $t$ and the input hidden state at time $t+1$, is determined by applying the non-linear activation $\tanh(\cdot)$ to the cell state and weighting the result by the output gate.
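To make the gate computations concrete, the following minimal sketch implements one time step of a scalar LSTM cell in plain Python. It assumes the form described in the text, in which every gate sees the current input, the previous hidden state and the previous cell state. The parameter layout and the names `lstm_step`, `wx`, `wh`, `wc` and `b` are hypothetical, and a real network would use weight matrices learned by training rather than scalars.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    `p` maps each of "f", "i", "o", "g" to a dict with keys wx, wh, wc, b
    (hypothetical layout; "g" is the tanh candidate and ignores wc).
    """
    def gate(name):
        q = p[name]
        return sigmoid(q["wx"] * x_t + q["wh"] * h_prev + q["wc"] * c_prev + q["b"])

    f_t = gate("f")  # how much of the old cell state to keep
    i_t = gate("i")  # how much of the new candidate to write
    o_t = gate("o")  # how much of the cell state to expose
    g = p["g"]
    candidate = math.tanh(g["wx"] * x_t + g["wh"] * h_prev + g["b"])
    c_t = f_t * c_prev + i_t * candidate
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# With all parameters set to zero, every gate outputs 0.5 and the candidate
# is 0, so the cell state is simply halved.
zero = {"wx": 0.0, "wh": 0.0, "wc": 0.0, "b": 0.0}
params = {"f": dict(zero), "i": dict(zero), "o": dict(zero), "g": dict(zero)}
h1, c1 = lstm_step(x_t=3.0, h_prev=0.0, c_prev=2.0, p=params)
```

The all-zero parameter set is only a sanity check: it shows that the forgetting gate scales the previous cell state while the output gate scales what is passed on as the hidden state.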

bi-LSTM
From the previous analysis, we can infer that the LSTM network can fully transmit and utilize forward information, but with this unidirectional propagation the future information in the sequence, that is, the reverse information in the signal, cannot be fully used. The bi-LSTM network adds a backward-propagating LSTM network, which not only deals with the past information in forward propagation but also considers the future sequence information in the reverse recursion, making it possible to learn the forward and backward rules of the sequence at the same time [12]. As shown in Fig. 3, the bidirectional LSTM network includes two recurrent neural networks running in the forward and reverse directions: the forward process extracts long-term dependency information from the historical signal, and the reverse process extracts long-term dependency information from the future part of the signal. The same output neuron is connected to the two LSTM cells, resulting in two hidden states $h_t$ and $h'_t$ in the forward and reverse directions. The state of the output layer at time $t$ is the concatenation $[h_t, h'_t]$ [19]. In the model proposed in this paper, the hidden state of the bi-LSTM network is passed to a fully connected layer to complete the non-linear mapping of the intermediate state, and a regression layer then produces the regression prediction of the wind speed signal.
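The bidirectional wiring itself is independent of the cell type. As a rough sketch, the code below runs any recurrent step function once from past to future and once from future to past, then pairs the two hidden states at each time step. The function name `bidirectional_states` and the simple tanh cell used in the demonstration are stand-ins chosen for this example, whereas the model in this paper uses LSTM cells.

```python
import math


def bidirectional_states(sequence, step_forward, step_backward, h0=0.0):
    """Return [(h_t, h'_t), ...] for every position t of `sequence`.

    step_forward / step_backward take (previous_hidden, input) and return
    the new hidden state; they play the role of the two recurrent cells.
    """
    forward = []
    h = h0
    for x in sequence:            # past -> future
        h = step_forward(h, x)
        forward.append(h)

    backward = []
    h = h0
    for x in reversed(sequence):  # future -> past
        h = step_backward(h, x)
        backward.append(h)
    backward.reverse()            # realign with the original time order

    return list(zip(forward, backward))


def tanh_cell(weight_h, weight_x):
    """Build a simple tanh recurrent step (a stand-in for an LSTM cell)."""
    return lambda h, x: math.tanh(weight_h * h + weight_x * x)


# Hypothetical normalized wind speed samples.
series = [0.2, 0.5, 0.4, 0.7]
states = bidirectional_states(series, tanh_cell(0.5, 1.0), tanh_cell(0.3, 0.8))
```

At the first position the forward state has seen only the first sample, while the backward state already summarizes every later sample, which is the extra information the bidirectional structure contributes.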

Case study
The data analyzed in the case study come from a wind farm in China and include three measured wind speed time series. The series length is 1050 and the sampling interval is 15 min; the first 80% of the data is used as the training set and the remaining 20% as the test set. The basic information of the sequences is shown in Figure 4 and Table 1. To construct the training and test samples of the multi-scale wind speed prediction model, the sequence has to be shifted backward and forward; at some shifted positions the inputs no longer have a matching label, which produces meaningless values that must be deleted. A total of 18 meaningless values were deleted from the experimental sequence. Three general error evaluation criteria are used in the case study [10]: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE).
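For reference, the three criteria can be written out as below. This is a minimal plain-Python sketch using the usual textbook definitions, with MAPE expressed in percent; these definitions are assumed here because the text does not restate them, and the sample values are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error, in the unit of the data (m/s for wind speed)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the unit of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented example values in m/s.
actual = [2.0, 4.0, 5.0, 8.0]
predicted = [1.0, 5.0, 5.0, 6.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, the single 2 m/s miss in this example weighs more heavily in RMSE than in MAE.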

Experimental setting and steps
To verify the accuracy of the proposed method, five prediction models, namely the Persistence Model (PM), support vector machine (SVM), bi-LSTM, WT-SVM and the proposed WT-bi-LSTM model, are established on the three wind speed sequences. Among them, PM is the simplest model and is often used as a benchmark for evaluating more complex models.
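As a minimal sketch of this benchmark, the persistence forecast for any horizon simply repeats the latest observed value. The function name `persistence_forecast` and the sample series are assumptions made for this illustration.

```python
def persistence_forecast(series, horizon):
    """Return (forecasts, targets) for a `horizon`-step-ahead persistence model.

    The forecast for time t + horizon is the observation at time t.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    forecasts = series[:-horizon]
    targets = series[horizon:]
    return forecasts, targets


def mean_absolute_error(forecasts, targets):
    return sum(abs(f - y) for f, y in zip(forecasts, targets)) / len(targets)


# Invented wind speed samples in m/s.
wind = [5.0, 5.5, 6.0, 5.0, 4.5, 4.0]
f1, y1 = persistence_forecast(wind, 1)
f3, y3 = persistence_forecast(wind, 3)
mae_1 = mean_absolute_error(f1, y1)
mae_3 = mean_absolute_error(f3, y3)
```

Any trained model has to beat these errors to justify its added complexity, which is why PM serves as the reference.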
Although wind speed is affected by many factors, this paper only uses the wind speed signal itself as the model input. The wind speed values at times $t$, $t-1$, $t-2$ and $t-3$ are used as the input to model the wind speed at times $t+1, t+2, \ldots, t+7$. This paper mainly describes sequence 1, and the main steps based on the WT-bi-LSTM model are as follows.
(1) According to the above analysis, a three-level wavelet decomposition of the wind speed time series is performed with the compactly supported bi-orthogonal wavelet db4, yielding three high-frequency detail components and one low-frequency approximation component. The decomposition results of sequence 1 are presented in Fig. 5.
(2) Each of the four signal components is shifted backward and forward to form an input-output sample matrix. After the samples are divided into a training set and a test set, the training set is used to train the bi-LSTM network, and the resulting prediction model is evaluated on the test set. In this paper, a deep learning network with a depth of 4 is established. The numbers of neurons in the input layer, bi-LSTM layer, fully connected layer and regression layer are 4, 65, 27 and 1, respectively. The main hyperparameters are as follows: the network is trained for 550 epochs; the Adam optimization algorithm is used to optimize the learnable parameters; the initial learning rate is 0.01 and is multiplied by a factor of 0.5 every 60 epochs.
(3) The component predictions from the previous step are summed according to equation (3), which acts as the inverse WT, to obtain the prediction result for the test set.
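The sample construction and the learning-rate schedule in the steps above can be sketched as follows. This is a plain-Python illustration under stated assumptions: the helper names `make_samples`, `chronological_split` and `learning_rate` are hypothetical, four lagged values are used as input as described earlier, epochs are counted from 0, and the rate is assumed to be halved at epochs 60, 120 and so on.

```python
def make_samples(series, horizon, n_lags=4):
    """Pair inputs [x(t-3), ..., x(t)] with the target x(t + horizon)."""
    inputs, targets = [], []
    # Positions without a full input window or without a label are skipped.
    for t in range(n_lags - 1, len(series) - horizon):
        inputs.append(series[t - n_lags + 1:t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling: the first part trains, the rest tests."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


def learning_rate(epoch, initial=0.01, factor=0.5, period=60):
    """Step decay: multiply the rate by `factor` every `period` epochs."""
    return initial * factor ** (epoch // period)


# Toy series standing in for one wavelet component.
series = [float(v) for v in range(20)]
x7, y7 = make_samples(series, horizon=7)
(train_x, train_y), (test_x, test_y) = chronological_split(x7, y7)
rates = [learning_rate(e) for e in (0, 59, 60, 120, 549)]
```

One such sample matrix is built per wavelet component and per prediction horizon, and the component predictions are added afterwards to give the wind speed forecast.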
It can be seen from Fig. 5 that, after the three-level decomposition of the wind speed signal, the low-frequency part becomes smoother and the trend of the wind speed is clearly highlighted; the high-frequency part is composed of three components that oscillate around 0 at different frequencies. Each high-frequency component is a stationary signal, which suggests that the high-frequency part can be regarded as white noise, mainly caused by factors other than the wind speed signal itself, such as topography and the underlying surface.

Result and discussion
According to the experimental steps, seven models are established, one for each of the 7 prediction steps ahead of time $t$. The prediction results for one to seven steps ahead are shown in Tables 2-4, and the one-step and seven-step prediction results for the first 100 values in the test set are presented in Fig. 6. The following conclusions can be drawn.
As the time scale increases, the numerical performance of the prediction results deteriorates. Taking the benchmark PM method as an example, the MAE is 0.765 m/s for one-step prediction, and 1.042 m/s, 1.207 m/s and 1.444 m/s for 3, 5 and 7 steps ahead, that is, error increases of 36.209%, 57.778% and 88.758%, respectively. MAPE and RMSE behave in the same way as MAE: both deteriorate as the time scale increases. There are various reasons for this phenomenon, and the main one lies in the wind speed signal itself: as the time span increases, abrupt changes in meteorological conditions make the original signal increasingly unstable and less predictable, leading to worse prediction results.
Compared with the benchmark method PM, the other, more complex models improve the prediction results at all time scales, but by different amounts. Taking the one-step prediction as an example, compared with PM the improvement rates of SVM, bi-LSTM, WT-SVM and WT-bi-LSTM are 4.134%, 92.497%, 51.503% and 90.392% in MAE; 3.505%, 92.460%, 50.120% and 90.527% in MAPE; and 3.842%, 81.163%, 50.480% and 83.223% in RMSE, respectively. The main reason is that the PM model simply takes the current wind speed at time $t$ as the prediction for the future time $t+k$, without any linear or non-linear calculation or mapping. Therefore, establishing a linear or non-linear prediction model for the wind speed signal can effectively improve the prediction accuracy.
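The percentages quoted for the PM baseline follow directly from the MAE values given in the text. A short check, written as a plain-Python sketch (the helper name `relative_change_percent` is made up for this example):

```python
def relative_change_percent(reference, value):
    """Change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


mae_one_step = 0.765                           # PM, one step ahead (m/s)
mae_by_step = {3: 1.042, 5: 1.207, 7: 1.444}   # PM, 3/5/7 steps ahead (m/s)

changes = {
    step: round(relative_change_percent(mae_one_step, value), 3)
    for step, value in mae_by_step.items()
}
```

All three changes are positive, which means the MAE of the persistence model grows as the prediction horizon lengthens.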
The SVM model only considers the mapping relationship between input and output data, ignoring the interaction between samples. The bi-LSTM model, in contrast, considers the mapping between input and output and also explores in depth the rule between samples in both the forward and backward directions. Therefore, modeling the wind speed as a time-continuous signal and considering the bidirectional rule of history and future can significantly improve the prediction accuracy.
Compared with the SVM model, the WT-SVM model significantly improves the multi-time scale wind speed prediction accuracy. As can be seen from the ... 15.019%, respectively. The main reason is that WT decomposes the signal and separates the low-frequency trend part from the high-frequency noise part. The WT process thus gives the model more feature attributes, which helps it grasp the main part of the wind speed signal and thereby improves the prediction accuracy.
Compared with the bi-LSTM model, the WT-bi-LSTM model achieves better numerical results in multi-step wind speed prediction. As can be seen from the table, the improvements for the 1st, 3rd, 5th and 7th prediction steps are -28.049%, 33.112%, 32.273% and 18.421% in MAE; -25.638%, 34.532%, 33.364% and 18.844% in MAPE; and 10.934%, 43.246%, 31.440% and 17.801% in RMSE, respectively. In the one-step prediction, the MAE and MAPE of the WT-bi-LSTM model are not as good as those of bi-LSTM, but their absolute values are low. A probable explanation is that the bi-LSTM model does not need the WT process for one-step ultra-short-term prediction, because the information required for high-precision prediction is already available. For longer-term predictions, the bi-LSTM model needs the WT process to obtain longer-term trend information, as the multi-scale prediction results clearly show.
To verify the generalization of the proposed WT-bi-LSTM model, the same modeling experiment as for sequence 1 is performed on wind speed time series 2 and 3. The model structure and hyperparameters are unchanged, and the prediction results are shown in Table 7.
It can be seen from the table that, for different wind speed sequences, the proposed model achieves the best multi-time scale prediction indicators. In summary, the WT-bi-LSTM model has the best overall prediction performance and accuracy. In the ultra-short-term one-step prediction, although the error indices of the WT-bi-LSTM model are not as good as those of bi-LSTM, they are greatly improved compared with the other models. Therefore, the WT-bi-LSTM model is more capable of extracting wind speed information over long time spans. Improving the accuracy of ultra-short-term one-step prediction with the proposed model is the next step of this research.

Conclusion
In this paper, a bidirectional long short-term memory neural network model with wavelet decomposition is proposed to model wind speed time series and perform multi-step prediction. From the analysis above, we can conclude that the proposed model makes full use of the historical and future information of the sequences. Modeling and analysis results on the measured wind speed sequences show that the proposed method is far superior to the benchmark PM method and superior to the other commonly used models. Compared with the bi-LSTM network alone, the signal decomposition strategy based on the wavelet transform gives bi-LSTM better forecasting results in multi-scale wind speed prediction, especially at long time scales.