Ascertaining Time Series Predictability in Process Control – Case Study on Rainfall Prediction

Rainfall prediction is a challenging task because it depends on many natural phenomena. Some authors have used the Hurst exponent as a predictability indicator to confirm that a time series is predictable before attempting prediction. In this paper, a detailed analysis is carried out to ascertain whether a definite relation exists between a high Hurst exponent and predictability. One-lead monthly rainfall prediction is performed for 19 rain gauge stations of the Yarra River basin in Victoria, Australia, using an Artificial Neural Network. The prediction error, expressed as the normalized Root Mean Squared Error, is compared with the Hurst exponent. The study finds the hypothesis to hold for only 6 of the 19 stations, and therefore recommends further investigation of the hypothesis. The concept is relevant to any time series that is to be used for real-time process control.


Introduction
Due to the rapid growth of population, urbanization and industrialization, the demand for water has increased tremendously. Sustainable planning of a country's water resources requires prediction of hydrological and meteorological parameters, among which the spatial and temporal variability of rainfall is indispensable because of its importance in issuing flood warnings and in planning crop schedules. In process automation and control, particularly under the changing Industry 4.0 scenario, forecasting a variable from its time series plays a very important role. Before forecasting, it is preferable to ensure that the time series is actually predictable. For example, Qian and Rasheed [1] studied the predictability of financial time series data. In this study, rainfall prediction is considered. Compared with traditional statistical methods, the Artificial Neural Network (ANN) is reported to perform best in rainfall prediction [2,3]. Hence ANN is increasingly adopted for rainfall prediction [4,5,6,7,8]. ANN is also used in many other hydrological applications, such as modeling of runoff, improving spatial interpolation of rainfall, bias correction of simulated climate data and flash flood prediction [2,9,10,11,12]. Since rainfall prediction is a very arduous task due to its chaotic nature and its dependency on many other climatological parameters, researchers have adopted many methodologies to improve rainfall prediction using ANN. For instance, Haviluddin et al. [13] attempted to improve the prediction by changing the hidden layer and the number of epochs. Some authors hybridized ANN with other techniques to improve predictions. Ramana et al. [14] integrated a wavelet technique with ANN. Mohd-Safar et al. [6] hybridized ANN with Fuzzy Logic to predict short-term rainfall in a tropical climate.
Yet others used mutual information techniques to identify the best input variables among the available input parameters for rainfall prediction [15]. Pre-processing of inputs has also been commonly used to improve rainfall prediction using ANN [16].
Despite all such efforts, developing a robust rainfall prediction model remains a challenge. Some researchers first tried to ascertain whether a rainfall series is predictable before attempting prediction, using knowledge of the memory effect of the time series [5]. If the values of a time series are correlated with one another, either as short memory or as long memory, the series is more predictable. The Hurst exponent is a measure of long-term memory and takes a value between 0 and 1. If the Hurst exponent is very near 0.5, the time series is random. If it is close to 1, the series is persistent, that is, the current trend tends to continue in the future; if it is close to 0, the series is anti-persistent, that is, the current trend tends to reverse. It is hypothesized that a persistent time series will always be predictable because of its low randomness. Some authors used the Hurst exponent as a measure of predictability, but the actual predictability is not considered within the scope of their reported works [17,18,19,20]. For instance, Rangarajan and Sant [19] proposed a predictability index derived from the Hurst exponent. Rehman [20] explored the dependence of rainfall predictability on pressure and temperature using predictability indices. Similarly, Rehman and Siddiqi [21] indicated the dependence of the predictability indices of precipitation and wind speed on temperature and pressure.
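The interpretation of the Hurst exponent described above can be sketched in a few lines of Python. This is only an illustration: the function names and the tolerance band `tol` around 0.5 are assumptions, and the form of the predictability index (PI = 2|D - 1.5| with fractal dimension D = 2 - H) is assumed from the general literature rather than taken from this paper.

```python
def interpret_hurst(h, tol=0.05):
    """Label a Hurst exponent H in [0, 1].

    `tol` is an ASSUMED half-width of the band around 0.5 that is
    treated as random; the paper does not specify such a threshold.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("Hurst exponent is expected to lie in [0, 1]")
    if abs(h - 0.5) <= tol:
        return "random"
    return "persistent" if h > 0.5 else "anti-persistent"


def predictability_index(h):
    """ASSUMED form of a Hurst-based predictability index.

    Uses PI = 2 * |D - 1.5| with fractal dimension D = 2 - H, so that
    PI is 0 for a random series (H = 0.5) and 1 for H = 0 or H = 1.
    """
    d = 2.0 - h
    return 2.0 * abs(d - 1.5)
```

Under these assumptions, a value such as H = 0.96 is labeled persistent, while H = 0.5 is labeled random and receives a predictability index of zero.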
The work reported by Khalili et al. [5] is an attempt to demonstrate the relationship between the Hurst exponent and the predictability of a 50-year monthly rainfall time series. They studied the rainfall series at Mashhad station and found the Hurst exponent to be 0.96. They reported that the rainfall prediction is good because predictability is strong when the Hurst value is high. Since the relationship between the Hurst exponent and the predictability of rainfall time series has not received sufficient attention, the validity of this hypothesis needs to be investigated in more detail. This study investigates the Hurst exponent of 19 stations in the Yarra River catchment in Victoria, Australia. ANN is used for one-lead prediction.

Study area and data set
The Yarra River catchment comprises three segments: upper, middle and lower. The lower segment is urbanized and is in danger of flash floods if extreme rainfall occurs in the middle segment. The lower segment also depends heavily on the middle segment of the catchment for its water requirement [22]. Although small in area, the catchment is one of the most productive regions in Victoria. Hence, rainfall-related studies are of importance for this catchment [23].
The observed monthly rainfall time series from January 1981 to December 2012 for the middle segment of the Yarra River catchment are used in this study. To study predictability, nineteen rain gauge stations are used; their locations are shown in Figure 1. The names of the nineteen stations corresponding to the station numbers can be found in Table 1.

Methodologies
Many methods are available for estimating the Hurst exponent, of which rescaled range (R/S) analysis is the one commonly adopted by researchers [18,19,20]. ANN is used in this study to predict the rainfall. Both the R/S analysis method and the ANN technique are explained in the following subsections.

R/S Analysis
The R/S analysis method was developed by Harold E. Hurst, a British water engineer, for the prediction of Nile River flooding [20].
The procedure to find the Hurst exponent is as follows: 1) The data are arranged into sub-series of a number of different time lengths (sizes). The rescaled range for each size is then calculated.
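As an illustration of the general R/S technique, a minimal Python sketch is given here. It follows the common textbook form (range of cumulative mean-adjusted deviations divided by the standard deviation, averaged over sub-series of each size, followed by a log-log regression) and is not necessarily the exact procedure followed in this study. The function names, the doubling of window sizes and the minimum window of 8 values are assumptions made for illustration.

```python
import numpy as np


def rescaled_range(chunk):
    """R/S of one sub-series: range of cumulative deviations over the std."""
    chunk = np.asarray(chunk, dtype=float)
    deviations = chunk - chunk.mean()      # mean-adjusted values
    cumulative = np.cumsum(deviations)     # cumulative deviate series
    r = cumulative.max() - cumulative.min()
    s = chunk.std()
    return r / s if s > 0 else np.nan      # constant chunk has no R/S


def hurst_rs(series, min_size=8):
    """Estimate the Hurst exponent as the slope of log(R/S) vs log(size).

    ASSUMPTIONS: non-overlapping windows, sizes doubling from `min_size`
    up to half the series length, ordinary least squares on the logs.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    sizes, rs_means = [], []
    size = min_size
    while size <= n // 2:
        n_chunks = n // size
        chunks = series[: n_chunks * size].reshape(n_chunks, size)
        rs = np.array([rescaled_range(c) for c in chunks])
        rs = rs[np.isfinite(rs)]
        if rs.size:
            sizes.append(size)
            rs_means.append(rs.mean())     # average R/S for this size
        size *= 2
    slope, _intercept = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return float(slope)
```

For a monthly series of 384 values, this sketch would use window sizes of 8, 16, 32, 64 and 128. Estimates from so few sizes are sensitive to the choice of windows, which is one reason the result should be read as indicative only.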

Artificial Neural Network
The architecture of ANN is motivated by the structure of the human brain and nerve cells. The technique identifies the statistical pattern present in a time series and applies it to unseen data to make predictions. A network consists of a large number of simple elements called neurons, each with a small amount of local memory. The neurons are linked by connections that carry numeric data encoded by various means. Each neuron operates only when it receives data through its connections. The network is shaped by the learning algorithm, which extracts the regularities present in the data by finding a suitable set of synaptic weights while observing the examples. Accordingly, ANNs solve problems by self-learning. The feed-forward architecture is used in this study. It comprises one input layer, one output layer and one or more hidden layers. Information passes from the input layer to the output layer through the hidden layers. Each layer is made up of several neurons, and the layers are interconnected by a set of weights. Neurons operate on their input and transform it to produce an analog output. More details about ANN are discussed in [9,24,25,26].
The performance of the forecast is evaluated by the normalized root-mean-squared-error (NRMSE) goodness-of-fit measure. In its definition, X is the variable being forecast, the subscripts m and s denote the measured and the simulated values, respectively, and n is the total number of training records.
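The flow of information through a feed-forward network with one hidden layer can be sketched as follows. The 13 hidden neurons and the logistic activation correspond to the architecture reported in the results of this study; the number of inputs (3), the logistic output neuron, the random weights and all names are assumptions for illustration only, and no training is shown.

```python
import numpy as np


def logistic(z):
    """Logistic (sigmoid) activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> output layer."""
    hidden = logistic(x @ w_hidden + b_hidden)   # hidden-layer outputs
    return logistic(hidden @ w_out + b_out)      # analog network output


# ASSUMED sizes: 3 inputs is hypothetical; 13 hidden neurons as reported.
n_inputs, n_hidden = 3, 13
rng = np.random.default_rng(0)
w_hidden = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, 1))
b_out = np.zeros(1)

# Five hypothetical input records, already scaled to [0, 1].
x = rng.random((5, n_inputs))
y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Because the assumed output neuron is logistic, the targets would need to be scaled to the interval (0, 1) before training and rescaled afterwards.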
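A minimal sketch of an NRMSE computation is given below. The root-mean-squared error follows directly from the symbols defined above; the normalizing quantity, however, is an assumption, since several conventions exist (range, mean or standard deviation of the measured values). The range of the measured values is used here purely for illustration.

```python
import numpy as np


def nrmse(measured, simulated):
    """Normalized RMSE between measured (X_m) and simulated (X_s) values.

    ASSUMPTION: the RMSE is divided by the range of the measured values;
    other normalizers (mean, standard deviation) are equally common.
    """
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    rmse = np.sqrt(np.mean((measured - simulated) ** 2))
    spread = measured.max() - measured.min()
    return float(rmse / spread)
```

Whichever normalizer is chosen, it must be the same for all stations for the NRMSE values to be comparable across them.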

Results and Discussions
The averages, standard deviations and Hurst exponents of the rainfall time series of all 19 rain gauge stations are tabulated in Table 1. For each of the 19 stations, 384 monthly values are used for one-lead monthly prediction, which yields 383 input-target pairs. To develop the ANN model, the first 150 of these 383 records are used for training, the next 150 for testing and the remaining 83 for validation. The optimal architecture is found to have 13 hidden neurons with a logistic activation function. The calculated NRMSE values for all 19 stations are also given in Table 1.
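The construction of the one-lead records and the chronological split described above can be sketched as follows. The use of a single lagged value as input (`n_lags=1`) is an assumption, chosen because it reproduces the count of 383 records from 384 monthly values; the function names are hypothetical.

```python
import numpy as np


def one_lead_pairs(series, n_lags=1):
    """Build (inputs, targets) where each target is the next month's value.

    ASSUMPTION: `n_lags` previous values form the input; with n_lags=1,
    a series of 384 values gives 383 records.
    """
    series = np.asarray(series, dtype=float)
    n_records = len(series) - n_lags
    inputs = np.array([series[i : i + n_lags] for i in range(n_records)])
    targets = series[n_lags:]
    return inputs, targets


def chronological_split(inputs, targets, n_train=150, n_test=150):
    """Split records in time order: training, testing, then validation."""
    a, b = n_train, n_train + n_test
    return (
        (inputs[:a], targets[:a]),        # first 150 records: training
        (inputs[a:b], targets[a:b]),      # next 150 records: testing
        (inputs[b:], targets[b:]),        # remaining records: validation
    )
```

Keeping the records in time order, rather than shuffling them, means the validation set consists of the most recent months, which matches how a forecast would be used in practice.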
Across all the stations, the Hurst exponent ranges from 0.66 to 0.88 and the NRMSE ranges from 0.12 to 0.19. According to the hypothesis, a lower NRMSE is expected for a higher Hurst exponent. However, Table 1 shows that the hypothesis holds for only six stations. Hence this outcome does not fully support the conclusions reached by Qian and Rasheed [1] for financial data. This may be due to the difference in the characteristics of financial data and rainfall data: the rainfall time series do not exhibit as much serial correlation as financial time series. From this, it can be concluded that the Hurst exponent value also depends on the pattern of the data used.
Table 1 also shows that stations with identical Hurst exponents can have dissimilar NRMSE values. For example, the Hurst exponent for the stations Fernshaw, Gladysdale (Little Feet Farm), Seville and Silvan is the same, i.e. 0.72, but the NRMSE values differ, i.e. 0.15, 0.19, 0.16 and 0.16 respectively. This clearly shows that time series with a high Hurst exponent are predictable, but not always.
For further discussion, only the Fernshaw and Gladysdale stations are considered, because they have identical Hurst exponents but the largest difference in NRMSE. The time series plots of these two stations are shown in Figure 2 and Figure 3 respectively. Both stations have a negative trend which is not statistically significant.
The plots of the actual and predicted validation sets for the Fernshaw and Gladysdale stations are shown in Figure 4 and Figure 5 respectively. Although both stations have the same Hurst exponent, the predictability of rainfall for Fernshaw is noticeably better than that for Gladysdale. This leads to the conclusion that the pattern of rainfall and the estimation of the Hurst exponent need further investigation, and that it is difficult to draw a clear conclusion about the predictability of a rainfall time series from the Hurst exponent alone.
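The size of this discrepancy can be put in perspective with a short calculation using only the values quoted in the text. The variable names are hypothetical, and the comparison against the overall NRMSE range (0.12 to 0.19) is an illustrative choice rather than a measure used in the study.

```python
# NRMSE of the four stations that share a Hurst exponent of 0.72.
nrmse_at_h072 = {
    "Fernshaw": 0.15,
    "Gladysdale (Little Feet Farm)": 0.19,
    "Seville": 0.16,
    "Silvan": 0.16,
}

# Spread of NRMSE among stations with the same Hurst exponent.
spread = max(nrmse_at_h072.values()) - min(nrmse_at_h072.values())

# Overall NRMSE range reported across all 19 stations.
overall_range = 0.19 - 0.12

# Share of the overall range covered by stations with identical H.
share = spread / overall_range
print(f"spread = {spread:.2f}, share of overall range = {share:.0%}")
```

The four stations alone span about 0.04 in NRMSE, which is more than half of the range observed across all 19 stations, even though their Hurst exponents are identical.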

Conclusions
1) Out of the nineteen stations considered in this study, a high Hurst exponent indicated stronger predictability in only 6 stations.
2) The hypothesis that a Hurst exponent greater than 0.5 indicates persistence of a time series, and hence predictability, is not found to hold true in all cases. Thus, this hypothesis needs further investigation.