Prediction of water consumption using Artificial Neural Networks modelling (ANN)

This paper presents an application of Artificial Neural Network (ANN) models to predict water consumption at two scales: i) a District Metered Area (DMA) located in the Scientific Campus of Lille University and ii) an end user, namely a restaurant inside this DMA. Data are collected from Automated Meter Readers (AMRs) that measure the water consumption in near real time. The models are trained at both daily and hourly time intervals using historical consumption values together with the hour of the day and the type of day. The paper shows that the ANN-based models predict the water consumption well, including peak values.


Data collection and analysis
Water consumption data were obtained at hourly time intervals from AMRs over the period from the 1st of January 2015 to the end of April 2016. This study is based on data related to (i) a DMA that comprises university faculties, administrative buildings and students' residences and (ii) a restaurant inside this DMA. A statistical analysis is conducted first to determine the water usage profile over the year and to identify the boundaries of the study domain. Figure 1 shows the water consumption of the DMA and the restaurant at monthly time intervals. It can be observed that the water consumption is related to the university calendar, with a significant drop during the summer holidays. This is expected, since teaching and administrative buildings are closed during this period and the resident students leave the campus for the summer break. In addition, the water consumption in this study is not related to the seasons or to weather variation. This result could be explained by the location of the demo site in the North of France, which has a marine west coast climate according to the Köppen Climate Classification [8] and experiences rainfall on more than 150 days per year.

* Corresponding author: elias_farah@live.com

Modelling approach
A feed-forward back-propagation artificial neural network, implemented in a Matlab script, is used for the numerical modelling. Time series of historical water consumption are used in the model to forecast future demand. Two main analyses were carried out:

- The first analysis aimed at predicting the daily water consumption and forecasting the demand for the next day, taking into account a lag of 1 day in order to study the effect of the previous day's consumption on the current demand.
- The second analysis was conducted to predict the water consumption for the next day with a step of 1 hour, using the historical values of the day before, i.e. a lag of 24 hours. This lag is chosen because recorded data are transmitted at 24-hour intervals.

The input vector for the daily analysis includes the day of the week $DW_i$ (i varies from 1 to 7, i.e. Monday, Tuesday, Wednesday, etc.), the holidays $HO$ (i.e. autumn, winter, spring and summer holidays) and the special days $SD$ (i.e. New Year's Day, Easter Monday, Labour Day, etc.). These inputs take the value 0 or 1. For example, if the day is a Monday, the $DW_i$ input for Monday is set to 1 and those of the other days are set to 0. For the daily prediction, a further input is added: the water consumption of the previous day, $Q_{d-1}$. The target data set corresponds to the daily water consumption. The analysis at hourly time intervals requires the following additional inputs: a vector representing the hour of the day $HD_i$ (i varies from 0 to 23) and the time series of historical water consumption with a lag of 24 hours, $Q_{h-24}$.

A pre-processing step is necessary to filter out abnormal values caused by a reported leak or by faults in the signal. Anomalies in the time series are identified using Chebyshev's inequality [9]. This theorem guarantees that, for any probability distribution with finite variance, at most 1/k² of the values lie more than k standard deviations from the mean. Using this inequality, values that exceed the average plus three standard deviations are discarded as outliers (i.e. at least 89% of the values lie within three standard deviations of the mean).

The input dataset is then divided into three sets: 70% for training, 15% for validation and the remaining 15% for testing. The input and target variables are normalized to the range 0 to 1 according to the following equation:

$$\bar{X} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

where $\bar{X}$ is the normalized value of the input or target variable $X$, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum observed values, respectively. The aim of normalization is to avoid any prioritization of the variables and to remove any arbitrary effect of scale on the similarity between the objects [10]. The performance of the ANN prediction models is evaluated by the root mean square error (RMSE) and the coefficient of determination (R²).
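The pre-processing and scoring steps described in this section can be sketched in a few lines of standard-library Python. This is an illustration only, not the Matlab script used in the study. The function names are hypothetical, the use of the population standard deviation and of a chronological 70/15/15 split are assumptions (the text does not say how samples were assigned to each set), and RMSE and R² follow their usual textbook definitions, which may differ in detail from the ones used by the authors.

```python
import math
import statistics


def chebyshev_filter(values, k=3.0):
    """Keep values that do not exceed mean + k * std (assumption: population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    upper = mean + k * std
    return [v for v in values if v <= upper]


def min_max_normalize(values):
    """Scale values to [0, 1] as (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def split_70_15_15(samples):
    """Split samples in their given order into training, validation and testing sets."""
    n = len(samples)
    n_train = (70 * n) // 100
    n_val = (15 * n) // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (usual definition, assumed)."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot
```

Because the inputs and targets are normalized to the range 0 to 1 before training, an RMSE computed on them is expressed in normalized units rather than in volume units.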

Daily prediction analysis
The ANN models for the DMA and the restaurant are trained on the collected data, using the water consumption of the previous day as an input (i.e. a lag of 1 day). Table 1 and Figures 2 and 3 show the quality of the ANN predictions. For the DMA, a good correlation is observed between recorded and predicted values (R² = 0.827 and RMSE = 0.110); the ANN model predicts the water consumption well, including the peak and minimum values. The performance of the ANN model for the restaurant is even better, with R² = 0.902 and RMSE = 0.1.
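To make the daily set-up concrete, the following is a minimal sketch of a feed-forward network trained by back-propagation on the daily inputs (day of the week, holiday flag, special-day flag, previous day's consumption). It is not the authors' Matlab model. The class `TinyFeedForwardNet`, the helper `daily_input_row`, the single hidden tanh layer with four units, the plain stochastic gradient descent, the single holiday flag and the synthetic series are all assumptions made for illustration.

```python
import math
import random


class TinyFeedForwardNet:
    """One hidden tanh layer and a linear output, trained by back-propagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr):
        """One gradient step on the squared error 0.5 * (y - target)^2."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Error propagated back through the output weight and tanh derivative.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err * err

    def fit(self, inputs, targets, epochs=200, lr=0.05):
        """Train sample by sample and return the mean loss of each epoch."""
        history = []
        for _ in range(epochs):
            total = sum(self.train_step(x, t, lr) for x, t in zip(inputs, targets))
            history.append(total / len(inputs))
        return history


def daily_input_row(weekday_index, is_holiday, is_special_day, q_prev):
    """Daily input: seven day-of-week flags, HO, SD and the previous day's consumption."""
    dw = [0.0] * 7
    dw[weekday_index] = 1.0
    return dw + [float(is_holiday), float(is_special_day), q_prev]


# Synthetic normalized series (illustrative only): high on weekdays, low at weekends.
series = [0.8 if d % 7 < 5 else 0.2 for d in range(29)]
inputs = [daily_input_row(d % 7, False, False, series[d - 1]) for d in range(1, 29)]
targets = series[1:]

net = TinyFeedForwardNet(n_inputs=10, n_hidden=4)
history = net.fit(inputs, targets)
```

In practice the samples would first be split into training, validation and testing sets, with the validation set used to decide when to stop training; the sketch trains on all samples for brevity.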

Hourly prediction analysis
To model the water demand in quasi real time, separate ANN models were trained on the water consumption at the hourly scale. A set of vectors representing the hour of the day $HD_i$ (i varies from 0 to 23) is added to the input matrix, in addition to the day of the week, holiday and special day variables and the time series of historical values with a lag of 24 hours. Table 2 and Figure 4 show the quality of the ANN predictions. For the DMA, the ANN model predicts the water consumption well, including the peak and minimum values, with R² = 0.825 for the ANN testing phase. The results obtained for the restaurant are also very good, with R² = 0.884, which is slightly higher than the value obtained for the DMA. This can be explained by the fact that the data from the restaurant's water meter represent a single consumption profile, whereas the DMA combines different water consumption behaviours.
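The hourly input matrix described above can be assembled as in the following sketch. The function names are hypothetical, the holidays are represented by a single 0/1 flag (an assumption, since the text does not state whether each holiday period has its own input), and the series is assumed to be gap-free so that the sample 24 positions earlier is the same hour of the previous day.

```python
from datetime import date, datetime, timedelta


def hourly_input_row(timestamp, q_lag_24, is_holiday, is_special_day):
    """One input row: 24 hour-of-day flags, 7 day-of-week flags, HO, SD, Q(h-24)."""
    hour_of_day = [0.0] * 24
    hour_of_day[timestamp.hour] = 1.0
    day_of_week = [0.0] * 7
    day_of_week[timestamp.weekday()] = 1.0  # Monday is index 0
    return (hour_of_day + day_of_week
            + [float(is_holiday), float(is_special_day), q_lag_24])


def build_hourly_dataset(timestamps, consumption, holidays, special_days):
    """Pair each hour with the consumption recorded 24 hours earlier.

    holidays and special_days are sets of datetime.date objects.
    """
    inputs, targets = [], []
    for h in range(24, len(consumption)):
        day = timestamps[h].date()
        inputs.append(hourly_input_row(timestamps[h], consumption[h - 24],
                                       day in holidays, day in special_days))
        targets.append(consumption[h])
    return inputs, targets


# Two days of made-up hourly readings starting on New Year's Day 2015.
start = datetime(2015, 1, 1, 0, 0)
timestamps = [start + timedelta(hours=h) for h in range(48)]
consumption = [float(h) for h in range(48)]
inputs, targets = build_hourly_dataset(
    timestamps, consumption, holidays=set(), special_days={date(2015, 1, 1)})
```

The first 24 hours produce no samples because no lagged value is available for them, so the dataset is 24 rows shorter than the recorded series.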

Conclusion
This paper presented the use of feed-forward back-propagation ANN models for the prediction of water consumption in the campus of Lille University at both daily and hourly time intervals. The analyses showed good performance of the ANN models at both time scales, including for peak values. For the hourly prediction, which considers the historical values with a lag of 24 hours, the hour of the day, the day of the week, holidays and special days, the root mean square error equals 0.07 at the DMA scale and 0.059 at the end-user scale. The ANN-based models could be easily implemented and used for short-term consumption prediction. Coupled with real-time consumption recording, they could indicate anomalies in water consumption, in particular those related to water leaks. This method could also be used to estimate ("recover") missing data.