Short-term Electricity Load Forecasting Model and Bayesian Estimation for Thailand Data

This paper proposes a multi-equation linear regression model with an autoregressive AR(2) component for modelling and forecasting day-ahead electricity load. The AR(2) terms capture the dependence of the next day's load on the load of the previous two days, because electricity consumption on the next day follows the pattern of the preceding days. Since we allocate one equation to each half hour, we need 48 separate equations to predict one complete day. The model parameters are estimated using two different approaches: (i) the classical approach and (ii) the Bayesian approach. The classical, or ordinary least squares (OLS), approach estimates each parameter as a single value, so its forecast is also a single value, whereas the Bayesian approach yields a predictive distribution of electricity load because it draws multiple values of the parameters. We can therefore forecast electricity load not only from the mean but also from the median and other percentile values. In this paper, we use the 70th percentile for forecasting because all models perform better at this percentile than with the mean or median forecast. Finally, we compare the performances and find that Bayesian estimation provides better and more consistent performance than OLS estimation.


Introduction
Accurate short-term load forecasting is crucial because the electricity distribution grid requires a proper balance between the electricity supplied by production companies and its consumption at every instant. If the electricity load is not balanced, the distribution system suffers unnecessary blackouts or load shedding, and the utility company suffers a large loss of revenue. Proper forecasting therefore helps to keep the power system secure, avoid the risk of blackouts, and provide an adequate electricity supply, which makes modelling and forecasting electricity load over a short horizon a pressing issue today. A forecasting system normally consists of several models, and historical observed data are normally provided as the main input variables. Many authors build univariate time series models without exogenous variables. For example, paper [1] applies double seasonal exponential smoothing to half-hourly data and predicts with a mean absolute percentage error (MAPE) of 1.25 to 2 percent. In paper [2] the overall forecast result is a MAPE of about 1.5 to 3 percent for a dataset of 10 European countries. For proper prediction, however, historical load data alone may not be sufficient, because several other important factors cause instant variation in load demand. Paper [3] thoroughly investigates the relationship between electricity demand and temperature in Great Britain. Harvey and Koopman [4] use a structural time series model and include temperature through a cubic spline to obtain a forecast MAPE of up to 3 percent, whereas the same model, extended by [5] with a Bayesian semi-parametric regression model on New South Wales data, gives a forecasting MAPE of around 2 percent. Since the load on special days and public holidays deviates significantly from the normal load, such days need to be treated separately.
Corresponding author: kamal_chapagain@wrc.edu.np
The multiple linear regression method is still an interesting forecasting option because of its simplicity. Papers [6][7][8][9] use regression-based methods for short-term load forecasting of normal days and use dummy variables for special days. The time series ARIMA process, which handles a nonstationary mean through differencing [10], is also a popular forecasting model. Paper [9] presents a state-space model for hourly electricity load forecasting based on stochastically time-varying processes that are designed to account for changes in customer behaviour and in utility production efficiency. For long-term forecasting, papers [11][12] forecast electricity demand with different methods (ARIMA, ANN, and MLR) and compare their performance on Thailand data. Similarly, [13] proposes a new Gene Expression Programming (GEP) approach for the prediction of electricity demand. Multivariate models based on weather variables such as temperature, wind speed, cloud cover, and humidity, together with historical load, are employed for modelling electricity load [12][3]. The classical approach to short-term forecasting normally employs regression methods with dummy variables, for example [6][7][8].
Paper [9] develops a regression model for each hour of the day and also incorporates dummies for special days such as bridge days. A bridge day is a normal working day adjacent to a holiday, such as a Monday before a holiday or a Friday after a holiday. On such special days, electricity consumption is lower than on other normal days, so they are treated separately. Paper [6] builds a separate regression model for each hour and includes the effects of special days and weather on load using dummy variables. Paper [7] uses a multiple-equation regression model, a vector autoregressive model, with 48 coefficients for each dummy variable to reflect intraday seasonality in half-hourly load, estimated using Markov chain Monte Carlo techniques. Paper [13] builds a dynamic multivariate periodic regression model for hourly data. Since the unrestricted model contains many unknown parameters, an effective methodology is developed within the state-space framework, whereas [8] develops a two-stage model for each hour of the day, in which the load of special days is modelled in the first stage using dummy variables. Any unexplained component of the load is then modelled in the second stage using either AR or ANN. We adopt a combination of the techniques of [6] and [7] as the basis for this research.
The plan of the paper is as follows. Section 2 presents the model and the modelling strategy. Two different approaches to parameter estimation are described in Section 3. Section 4 presents the forecasting results, and their performance is analysed in Section 5. Concluding remarks are given in Section 6, and the abbreviations used in this paper are listed in Section 7.

Model Specifications
We develop a multiple linear regression (MLR) model with an autoregressive AR(2) component, inspired by paper [6], which develops a multiple regression model with separate equations for each half hour of the day. Since the model predicts the electricity load for a whole day at half-hour intervals, we develop 48 separate prediction equations. Each equation is formulated as an AR(2) regression with its own coefficients and error term,

$Y_{dh} = X_{dh} + \epsilon_{dh}$ (1)

where $\epsilon_{dh} \sim N(0, \sigma^2)$ and $X_{dh}$ denotes the regression part of the model. In equation (1), $d$ denotes the day and $h$ the half hour. Some of the variables used are perfectly predictable, such as the day of the week and the month of the year. Weather variables such as temperature and the square of temperature are also used, together with lagged load variables, in this AR(2) model. The complete weekday/holiday model for a particular half hour is written as equation (2), in which $Y_{d-1}$ and $Y_{d-2}$ are the AR(2) terms that account for the dependence of the current value on the previous two days, and $Temp_d$ and $Temp_d^2$ are the current values of temperature and temperature squared. The weekday and holiday models in equation (2) share the same set of 20 coefficients, which we have to estimate. Similarly, the weekend model for a particular half hour is expressed as equation (3), and we use separate historical weekend data for the weekend prediction.
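The one-equation-per-half-hour layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's full specification: the function name `build_half_hour_design` and the `(n_days, 48)` array layout are hypothetical, and only the intercept, the two lagged loads, temperature, and temperature squared are included (the day-of-week and month dummies that make up the remaining coefficients are omitted).

```python
import numpy as np

def build_half_hour_design(load, temp, h):
    """Build (X, y) for the regression of half hour h (0..47).

    load, temp: arrays of shape (n_days, 48), one row per day.
    Assumed, simplified regressors: intercept, Y_{d-1}, Y_{d-2},
    Temp_d and Temp_d^2. Calendar dummies are left out of this sketch.
    """
    load = np.asarray(load, dtype=float)
    temp = np.asarray(temp, dtype=float)
    y = load[2:, h]                      # targets start at day 2 (two lags needed)
    X = np.column_stack([
        np.ones(len(y)),                 # intercept
        load[1:-1, h],                   # Y_{d-1}: same half hour, previous day
        load[:-2, h],                    # Y_{d-2}: same half hour, two days back
        temp[2:, h],                     # Temp_d
        temp[2:, h] ** 2,                # Temp_d^2
    ])
    return X, y

def build_all_designs(load, temp):
    """Return a list of 48 (X, y) pairs, one per half-hour equation."""
    return [build_half_hour_design(load, temp, h) for h in range(48)]
```

Each of the 48 `(X, y)` pairs can then be passed independently to whichever estimator is used, which is what keeps the equations separate.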

Classical Approach (OLS Technique)
We use ordinary least squares (OLS) estimation as the classical method for estimating the parameters. This method obtains the parameters by minimizing the sum of squared residuals (SSE). The load model for this method can be written as

$Y = X\beta + \epsilon$

where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times k$ matrix of predictors, $\beta$ is a $k \times 1$ vector of parameters, and $\epsilon$ is an $n \times 1$ vector of random errors. Since the parameters $\beta_i$ are unknown, they are estimated from the observed data $Y$ and the variables $X_i$ as

$\hat{\beta} = (X'X)^{-1} X'Y$
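As a minimal sketch of the least squares estimator above, the following computes $\hat{\beta}$ and a residual variance estimate for one half-hour equation. The function name `ols_fit` is hypothetical, and the use of `numpy.linalg.lstsq` instead of an explicit matrix inverse is a numerical choice made here, not something stated in the paper.

```python
import numpy as np

def ols_fit(X, y):
    """Least squares fit of y = X beta + eps.

    Returns (beta_hat, sigma2_hat). lstsq solves the same minimisation as
    (X'X)^{-1} X'y but avoids forming the inverse explicitly.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    dof = X.shape[0] - X.shape[1]        # n - k degrees of freedom
    sigma2_hat = float(resid @ resid / dof)
    return beta_hat, sigma2_hat
```

The point forecast for a new regressor row `x_new` is then simply `x_new @ beta_hat`, which is the single-valued forecast the classical approach produces.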

Bayesian Approach
Unlike OLS, we draw a batch of values of $\hat{\beta}$ and the corresponding $\hat{\sigma}^2$ from the posterior distribution, where the posterior distribution is proportional to the product of the prior belief and the likelihood function. The prior belief is the initial assumption about $\hat{\beta}$ with some amount of variation, and the likelihood function represents the observed data with some mean and variance. Because we take the prior belief to be normally distributed, the posterior distribution is also normally distributed, so that $P(\hat{\beta}, \hat{\sigma}^2 \mid Y) = N(\mathrm{Mean}^*, \mathrm{Var}^*)$, where $\beta_0$ and $\Sigma$ are the initial value of $\hat{\beta}$ and its variance under the prior belief.
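For orientation, the usual textbook form of the conditional posterior moments for a linear regression with a normal prior $N(\beta_0, \Sigma)$ and known $\sigma^2$ is shown here. Whether the paper uses exactly this parameterisation is an assumption; the symbols follow the ones used in this section.

```latex
\begin{aligned}
\mathrm{Var}^*  &= \left( \Sigma^{-1} + \frac{1}{\sigma^2} X'X \right)^{-1}, \\
\mathrm{Mean}^* &= \mathrm{Var}^* \left( \Sigma^{-1}\beta_0 + \frac{1}{\sigma^2} X'Y \right).
\end{aligned}
```

With a very diffuse prior (large $\Sigma$), $\mathrm{Mean}^*$ approaches the OLS estimate $(X'X)^{-1}X'Y$, which is why the two approaches differ mainly in how uncertainty is carried into the forecast.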

Forecasting Result
From the historical data set, 80 percent of the available data was used for training and the remaining 20 percent was set aside for validation, but here we predict only for five days and present the result as in Figure 4. In Figures 1-3, the prediction of electricity load from Bayesian estimation is plotted for each half hour and compared with the actual load. We exploit the advantage of the probabilistic approach and predict the next day's electricity load by taking the 70th percentile of the predictive distribution, because for every model the 70th-percentile prediction is more consistent with the actual load than the mean or median prediction. Even so, this performance is still not satisfactory for weekend and holiday forecasts. We also forecast the electricity load over the same horizon using OLS; the comparison with the Bayesian approach is tabulated in the performance analysis section. The consistency of the parameter estimates can be examined from the sequence of retained draws during Gibbs sampling. Figure 4 allows a visual check: all 20 weekday coefficients fluctuate randomly without any trend. Since they fluctuate randomly around a stationary mean, we can conclude that the parameter estimates from Gibbs sampling have converged.
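The percentile forecast described above can be sketched as follows, assuming retained posterior draws of the coefficients and the error variance are already available. The function name `predictive_percentile` and its arguments are hypothetical, and the predictive draws are formed by adding normal noise to each draw's fitted value, which is one common way to simulate a predictive distribution.

```python
import numpy as np

def predictive_percentile(x_new, beta_draws, sigma2_draws, q=70, seed=0):
    """Summarise the simulated predictive distribution for one half hour.

    x_new:        regressor row for the day to forecast, shape (k,)
    beta_draws:   retained coefficient draws, shape (n_draws, k)
    sigma2_draws: retained error-variance draws, shape (n_draws,)
    Returns (q-th percentile, mean, median) of the predictive draws.
    """
    rng = np.random.default_rng(seed)
    beta_draws = np.asarray(beta_draws, dtype=float)
    sigma2_draws = np.asarray(sigma2_draws, dtype=float)
    fitted = beta_draws @ np.asarray(x_new, dtype=float)   # one fitted value per draw
    noise = np.sqrt(sigma2_draws) * rng.standard_normal(len(fitted))
    y_draws = fitted + noise
    return (float(np.percentile(y_draws, q)),
            float(np.mean(y_draws)),
            float(np.median(y_draws)))
```

Because the 70th percentile sits above the median, choosing it shifts every forecast upward relative to the mean or median forecast, which is consistent with it helping when those forecasts tend to fall below the actual load.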

Performance Analysis
The performance of a model is judged by its ability to predict future values accurately. Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) are widely used measures of forecasting performance. MAPE measures the deviation of the predicted data from the real load in percentage terms. We also report RMSE, which measures the deviation in megawatts. The performance of the weekday predictions from both OLS and the Bayesian approach (Gibbs sampling) is better than that of the weekend and holiday predictions. Over the five-day prediction horizon, the Gibbs sampling estimation technique gives better performance on all five days, with the MAPE ranging from 0.54 to 3.68 for every half hour.
For weekends, the forecast performance of both techniques is not good, although the Bayesian approach provides more consistent MAPE and RMSE values than OLS. In the real data, electricity consumption on Sunday is much lower than on Saturday, but here we treat both days within the same model, which may be one cause of the inconsistent performance. So, the performance of the weekdays and holiday forecasts is really inconsistent for both techniques.
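The two error measures can be written down directly from their standard definitions. This is a minimal sketch; the function names `mape` and `rmse` are chosen here for illustration.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent. Assumes no zero actual loads."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))

def rmse(actual, forecast):
    """Root Mean Squared Error, in the units of the load (e.g. megawatts)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))
```

For example, with actual loads of 100 and 200 and forecasts of 110 and 190, the percentage errors are 10 and 5, so `mape` gives 7.5, while both absolute errors are 10, so `rmse` gives 10.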

Conclusion
This paper presents an MLR model with AR(2) terms for one-day-ahead forecasting of electricity load at every half hour. The model is treated separately for weekdays, weekends, and holidays. We estimate the coefficients with two different techniques and compare their performance for individual days. The overall MAPE for weekdays using the least squares method is 3.73 percent, whereas Gibbs sampling gives 1.77 percent for the same time horizon. Some inconsistency in the weekend performance is due to the different average load consumption on the two consecutive weekend days.

The prior belief consists of a value of $\hat{\beta}$ and its variance $\hat{\sigma}^2$. From $\mathrm{Mean}^*$ and $\mathrm{Var}^*$ we draw a number of values using Gibbs sampling. Samples are stored only after the Markov chain has passed the burn-in stage, and the earlier samples are discarded. In our algorithm, only 1000 of the 5000 samples are saved. Each stored draw is computed as $\hat{\beta} = \mathrm{Mean}^* + \left[ \hat{\beta} * (\mathrm{Var}^*)^{1/2} \right]$, and for $\hat{\sigma}^2$ we sample a scalar value from the inverse Gamma distribution with the corresponding degrees of freedom.
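The sampling scheme described above can be sketched as a two-block Gibbs sampler. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name, the inverse Gamma prior parameters `T0` and `D0`, the starting value of `sigma2`, and the use of a Cholesky factor as the square root of $\mathrm{Var}^*$ are all choices made here. Only the 5000 total draws with the last 1000 retained come from the text.

```python
import numpy as np

def gibbs_linear_regression(X, y, beta0, Sigma0, T0=1.0, D0=0.1,
                            n_draws=5000, n_keep=1000, seed=0):
    """Gibbs sampler for y = X beta + eps, eps ~ N(0, sigma2).

    Assumed priors: beta ~ N(beta0, Sigma0); sigma2 ~ inverse Gamma with
    prior degrees of freedom T0 and prior scale D0 (hypothetical defaults).
    Returns the last n_keep draws of beta and sigma2.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    Sigma0_inv = np.linalg.inv(Sigma0)
    sigma2 = 1.0                                   # arbitrary starting value
    betas, sigma2s = [], []
    for it in range(n_draws):
        # Block 1: beta | sigma2 ~ N(Mean*, Var*)
        var_star = np.linalg.inv(Sigma0_inv + XtX / sigma2)
        mean_star = var_star @ (Sigma0_inv @ beta0 + Xty / sigma2)
        # Mean* plus a standard normal vector scaled by a square root of Var*
        beta = mean_star + np.linalg.cholesky(var_star) @ rng.standard_normal(k)
        # Block 2: sigma2 | beta ~ inverse Gamma(shape, scale)
        resid = y - X @ beta
        shape = (T0 + n) / 2.0
        scale = (D0 + resid @ resid) / 2.0
        sigma2 = scale / rng.gamma(shape)          # scale / Gamma(shape, 1) draw
        if it >= n_draws - n_keep:                 # keep draws after burn-in only
            betas.append(beta)
            sigma2s.append(sigma2)
    return np.array(betas), np.array(sigma2s)
```

Plotting each column of the returned coefficient draws against the draw index gives the kind of trace used to check convergence: a chain that has settled fluctuates around a stable level without trend.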

Table 1. Overall performance of our models (weekdays, weekends, and holidays)

Table 2. List of abbreviations used in the paper