Modelling the Electricity Consumption in a Manufacturing Company through Probability Distribution

Abstract. The paper deals with a model of electricity consumption in a company providing hot and cold forming. The modelled data were hourly records of the electricity consumed in the manufacturing processes, covering the working hours over one year. The probability distributions chosen for modelling were the normal, gamma, logistic, Weibull and Rayleigh distributions. The goodness of fit of these distributions was judged by the values of the information criteria (AIC, AICc and BIC), the coefficient of determination ($R^2$) and the root mean square error (RMSE). According to these criteria, the best fit was achieved by the two-parameter Weibull distribution.


Introduction
The aim of the paper is to model the electricity consumption in a manufacturing company that provides hot and cold bending. Despite the growing electricity demand from the residential sector, commercial and public services, and transportation (partially caused by the spreading interest in electric cars), the industrial sector still holds the leading role as the main electricity end-user worldwide [1]. Electricity consumption by industry worldwide increased by about 0.9% annually between 2010 and 2018 [2]. Rapid growth has been observed in energy-intensive industrial subsectors, to which metal forming processes also belong. With diminishing supplies of fossil fuels and only slowly growing incorporation of renewable electricity sources, such as wind, solar, thermal and biomass, electricity shortages may occur in the future. To prevent such a scenario, companies need to make their processes more energy-efficient. Several possibilities have been proposed in the literature: rescheduling the processes in order to optimize the peak demand patterns [3][4][5][6], supplementing part of the energy supply by renewable energy sources [7][8][9][10], applying modern procedures and machines in manufacturing [11][12][13], etc. Such solutions need not be limited to industry; they are also applicable in households. However, optimization of electricity consumption requires an appropriate model of that consumption. Electricity consumption can be considered a variable of stochastic character. This assumption is supported by the variation of consumption in time and place, as well as by its dependence on a number of internal and external factors. For this reason, several statistical approaches have been adopted for modelling electricity consumption or electricity demand. The particular modelling method depends on the structure of the available data and on the purpose of the analysis.
Energy consumption models have been developed from the level of countries [14][15][16][17] down to the level of individual units, such as households, companies and machines [18][19][20]. Models of electricity consumption include those based on the analysis and forecasting of time series, probability distributions, regression analysis or Markov chains. For a review of modelling methods, see for example [21]. In this paper, we model electricity consumption in manufacturing processes recorded over a period of one year. Five probability distributions are fitted to the data: normal, Weibull, Rayleigh, logistic and gamma. The goodness of fit of each probability distribution is judged by the Kolmogorov-Smirnov test. The most appropriate probability distribution is chosen based on the values of the information criteria (Akaike's information criterion, corrected Akaike's information criterion and Bayesian information criterion), the coefficient of determination and the root mean square error. The paper is organized as follows. In section 2, the modelled data are described together with brief characteristics of the probability distributions chosen to fit the data; the estimation method for the distribution parameters is defined, and the model selection criteria and the goodness-of-fit test are outlined. Results of the fitting procedure are discussed in section 3. A summary of the paper is given in section 4.

Data and methodology
In the following section, we briefly characterize the data along with the probability distributions that are fitted to them. The maximum likelihood method is chosen for estimating the parameters of each probability distribution. Further in the section, the criteria for assessing goodness of fit are listed.

Characteristics of data
The data modelled in this paper represent the electricity consumed in manufacturing processes in a company providing hot and cold bending. The data were collected from four production halls and were recorded hourly. They cover the consumption of electric energy (in kW) during working hours throughout one year. The sample thus includes 3984 observations. Descriptive statistics of the data are given in Table 1. The coefficient of skewness shows that the data are positively skewed. This suggests that the probability distribution fitting the data will be asymmetrical. Such an assumption is not surprising, since low and moderate consumption values are more common, whereas high consumption values occur seldom.
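The sign of the skewness coefficient mentioned above can be checked with a few lines of code. The following is a minimal sketch, assuming the moment-based definition of sample skewness (the paper does not state which definition was used for Table 1); the readings are hypothetical values, not the company's data.

```python
def sample_skewness(values):
    """Moment-based sample skewness: third central moment over
    the second central moment raised to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical hourly readings in kW (illustrative only)
readings = [310.0, 420.0, 455.0, 500.0, 530.0, 610.0, 700.0, 1150.0]
skewness = sample_skewness(readings)
print(skewness)  # a positive value indicates a longer right tail
```

A positive result corresponds to the situation described in the text: many low and moderate values and a few high ones.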

Probability distributions
As the authors of [22] noted, a variety of probability distributions can be used to fit electricity consumption data. In [23], the authors fitted two-parameter Weibull and two-parameter lognormal distributions to electricity consumption in selected Swedish households. Electricity consumption on a university campus in Nigeria was modelled by the normal, two-parameter Weibull and Gumbel distributions [24]. Electricity loads have also been modelled by several probability distributions: in [25], the Weibull distribution was fitted to the energy consumption of customers in Northern Ireland, and the lognormal distribution was applied to electricity consumption data in [26]. In [27], a number of distributions (Weibull, normal, inverse-normal, Rayleigh, gamma, lognormal) were fitted to consumption data.
Similarly, in [28], electricity load data from two substations of the Venezuelan power system were fitted by the normal, Weibull, gamma, Rayleigh and exponential probability distributions. The probability distributions chosen for fitting to the data considered in this paper are summarized in Table 2.
Parameters in Table 2: shape parameter; scale parameter; location parameter; scale parameter.

Parameter estimation method
Among the methods for estimating the parameters of probability distributions, the maximum likelihood method (MLM) was chosen, since it is considered the most common one. Let $X_1, X_2, \dots, X_n$ be a random sample from a population with probability density function $f(x; \theta)$, where $\theta = (\theta_1, \theta_2, \dots, \theta_m)$ is a vector of parameters, $\theta \in \Omega$, and $\Omega$ is the parameter space. The likelihood function $L(x, \theta)$ is a function of the parameter vector $\theta$ defined as

$$L(x, \theta) = \prod_{i=1}^{n} f(x_i; \theta) \qquad (1)$$

with $x = (x_1, x_2, \dots, x_n)$ being the realization of the random sample. If there exists a value $\hat{\theta} \in \Omega$ such that for all $\theta \in \Omega$

$$L(x, \hat{\theta}) \ge L(x, \theta), \qquad (2)$$

then $\hat{\theta}$ is called the maximum likelihood estimate of the parameter $\theta$. When estimating the parameters of a particular probability distribution via MLM, we need to find the values that maximise the likelihood function (1). To achieve that, the system of equations

$$\frac{\partial L(x, \theta)}{\partial \theta_j} = 0, \quad j = 1, 2, \dots, m \qquad (3)$$

needs to be solved. However, it is more convenient to work with the logarithm of the likelihood function, called the log-likelihood function,

$$\ln L(x, \theta) = \sum_{i=1}^{n} \ln f(x_i; \theta). \qquad (4)$$

Substituting (4) in (3), i.e. replacing $L(x, \theta)$ by $\ln L(x, \theta)$, we obtain a system of likelihood equations for finding the maximum likelihood estimates of the parameters. The likelihood equations for each considered probability distribution are given in Table 3.
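As a worked illustration of how a likelihood equation yields an estimate, consider the one-parameter Rayleigh distribution. The derivation below assumes the common parameterisation with scale parameter $\sigma$; the notation used in Tables 2 and 3 of the paper may differ.

```latex
% Rayleigh density (assumed parameterisation with scale sigma)
f(x;\sigma) = \frac{x}{\sigma^{2}} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right), \qquad x \ge 0

% Log-likelihood of the observations x_1, ..., x_n
\ln L(x,\sigma) = \sum_{i=1}^{n} \ln x_i \;-\; 2n \ln \sigma \;-\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} x_i^{2}

% Likelihood equation and its closed-form solution
\frac{\partial \ln L(x,\sigma)}{\partial \sigma}
  = -\frac{2n}{\sigma} + \frac{1}{\sigma^{3}} \sum_{i=1}^{n} x_i^{2} = 0
\quad\Longrightarrow\quad
\hat{\sigma} = \sqrt{\frac{1}{2n} \sum_{i=1}^{n} x_i^{2}}
```

In this case the likelihood equation can be solved directly, so no iterative procedure is needed.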
Table 3 shows that the shape parameters of the Weibull and gamma distributions, as well as both parameters of the logistic distribution, cannot be calculated directly. In such cases, iterative methods are required to find the estimates.
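The need for an iterative method can be illustrated on the two-parameter Weibull distribution. The sketch below is not the Matlab routine used in the paper; it assumes the usual parameterisation with shape $k$ and scale $\lambda$, solves the likelihood equation for the shape by bisection, and then obtains the scale in closed form. The function name `weibull_mle` and the synthetic sample are hypothetical.

```python
import math
import random

def weibull_mle(sample, tol=1e-10):
    """Maximum likelihood estimates (shape, scale) of a two-parameter
    Weibull distribution; assumes positive, non-identical observations."""
    n = len(sample)
    logs = [math.log(x) for x in sample]
    mean_log = sum(logs) / n
    x_max = max(sample)
    scaled = [x / x_max for x in sample]  # rescaling avoids overflow in x**k

    def g(k):
        # Likelihood equation for the shape parameter; its root is the estimate.
        weights = [y ** k for y in scaled]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:           # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:       # bisection on the increasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # Closed-form scale estimate for the given shape
    scale = x_max * (sum(y ** shape for y in scaled) / n) ** (1.0 / shape)
    return shape, scale

# Synthetic data (not the paper's data): scale 500, shape 2
random.seed(0)
synthetic = [random.weibullvariate(500.0, 2.0) for _ in range(4000)]
shape_hat, scale_hat = weibull_mle(synthetic)
print(shape_hat, scale_hat)
```

Bisection is used here only because it is simple and robust; Newton-type iterations are a common alternative.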

Model selection criteria
To choose the probability distribution that best fits the data, we apply the information criteria, the coefficient of determination and the root mean square error (Table 4).

Criterion | Formula
Akaike's information criterion [29] | $AIC = -2 \ln L(x_1, x_2, \dots, x_n; \hat{\theta}) + 2m$
Akaike's information criterion corrected | $AICc = AIC + \frac{2m(m+1)}{n-m-1}$
Bayesian information criterion [30] | $BIC = -2 \ln L(x_1, x_2, \dots, x_n; \hat{\theta}) + m \ln n$

Here $X_1, X_2, \dots, X_n$ represent a random sample from a population with cumulative distribution function $F(x)$ and $x_1, x_2, \dots, x_n$ are the respective observations. The term $L(x_1, x_2, \dots, x_n; \hat{\theta})$ is the maximum value of the likelihood function (1) for the estimated model, $m$ is the number of estimated parameters and $n$ is the sample size. Further, for the coefficient of determination $R^2$ and RMSE, the term $\hat{F}(x)$ represents the estimated cumulative distribution function, with $\bar{F}$ being its average value, defined as

$$\bar{F} = \frac{1}{n} \sum_{i=1}^{n} \hat{F}(x_{(i)}).$$

The function $F_n(x)$ is the empirical distribution function,

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_{(i)} \le x),$$

where $I(x_{(i)} \le x) = 1$ if the inequality $x_{(i)} \le x$ holds and 0 otherwise. The values $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ are the observations in ascending order.

When choosing the best-fitting model among those considered, we select the model with the lowest values of AIC, AICc, BIC and RMSE and with $R^2$ closest to 1. However, these criteria provide only partial information. Their main disadvantage is that the value of a criterion alone does not express whether the probability distribution fits the data properly. Therefore, it is advisable to combine the information criteria with goodness-of-fit tests. Among the existing tests, we opt for the Kolmogorov-Smirnov goodness-of-fit test (KS test). The test determines whether the data follow a given probability distribution with cumulative distribution function $\hat{F}(x)$ (null hypothesis), against the alternative that they do not come from such a distribution. The null hypothesis is rejected at significance level $\alpha$ when the test statistic is greater than the critical value of the KS test. Usually, the p-value is used instead, with the null hypothesis being rejected at significance level $\alpha$ when $p\text{-value} < \alpha$.
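The selection criteria can be sketched in code as follows. The information criteria use the standard formulas, with the small-sample correction $2m(m+1)/(n-m-1)$ for AICc. The RMSE is computed here between the empirical and the fitted distribution functions at the ordered observations, which is an assumption about the exact form used in Table 4. The log-likelihood value in the example is hypothetical.

```python
import math

def information_criteria(log_likelihood, m, n):
    """Return (AIC, AICc, BIC) for a model with m estimated parameters
    fitted to n observations, given the maximised log-likelihood."""
    aic = -2.0 * log_likelihood + 2.0 * m
    aicc = aic + 2.0 * m * (m + 1) / (n - m - 1)
    bic = -2.0 * log_likelihood + m * math.log(n)
    return aic, aicc, bic

def cdf_rmse(sample, fitted_cdf):
    """Root mean square difference between the empirical distribution
    function and a fitted CDF, evaluated at the ordered observations."""
    xs = sorted(sample)
    n = len(xs)
    # Without ties, the empirical distribution function at the i-th
    # smallest observation equals i / n.
    squared = sum(((i + 1) / n - fitted_cdf(x)) ** 2 for i, x in enumerate(xs))
    return math.sqrt(squared / n)

# Hypothetical maximised log-likelihood for a two-parameter model, n = 3984
aic, aicc, bic = information_criteria(-28000.0, m=2, n=3984)
print(aic, aicc, bic)
```

For a sample of this size the correction term in AICc is very small, so AIC and AICc rank the models almost identically, while BIC penalises additional parameters more strongly.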

Results and Discussion
Parameter estimates obtained by MLM, along with the information criteria, the coefficient of determination and the root mean square error, are summarized in Table 5. All the calculations were done in Matlab R2019b. Results of the KS test are given in Table 6. The critical value of the test is 0.0215 at significance level $\alpha = 0.05$. The null hypothesis is not rejected for the Weibull distribution, i.e. the Weibull distribution fits the electricity consumption data at the given significance level. However, in the case of the Rayleigh distribution, we cannot unequivocally decide whether to reject the null hypothesis, because the value of the test statistic is very close to the critical value (the p-value is very close to the significance level). To decide on the goodness of fit, we apply the Anderson-Darling goodness-of-fit test at significance level $\alpha = 0.05$. As the p-value of this test is 0.0045, we conclude that the Rayleigh distribution does not fit the data. For the other three probability distributions, the KS test rejects the null hypothesis, so they do not fit the modelled data. The fit of all considered probability distributions to the energy consumption data is visualised in Figure 1.
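The reported critical value can be reproduced approximately with the large-sample formula $c_\alpha / \sqrt{n}$, where $c_{0.05} \approx 1.358$. Whether the paper's value was obtained from this approximation is an assumption; the sample size $n = 3984$ is taken from the data description. The sketch also shows how the KS test statistic is computed for an arbitrary fitted CDF.

```python
import math

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic: largest distance between the
    empirical distribution function and the hypothesised CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the CDF with the empirical step just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_critical_value(n, coefficient=1.358):
    """Large-sample approximation of the critical value for alpha = 0.05."""
    return coefficient / math.sqrt(n)

critical = ks_critical_value(3984)
print(round(critical, 4))  # 0.0215
```

The null hypothesis is rejected when `ks_statistic(sample, fitted_cdf)` exceeds this critical value.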

Conclusions
In the paper, electricity consumption data from manufacturing processes of hot and cold bending were modelled. The data covered hourly consumption during working hours throughout one year and were modelled by fitting probability distributions. The following distributions were tested: the two-parameter Weibull distribution, the two-parameter gamma distribution, the normal distribution, the logistic distribution and the one-parameter Rayleigh distribution. The criteria considered for selecting the best-fitting model were the information criteria (Akaike's information criterion, Akaike's information criterion corrected and Bayesian information criterion), the coefficient of determination and the root mean square error. To assess the goodness of fit of each distribution to the modelled data, the Kolmogorov-Smirnov test was carried out. According to the model selection criteria, the Weibull distribution provides the best fit, closely followed by the Rayleigh distribution, with the gamma distribution being the third best. However, only the Weibull distribution passed the KS test, i.e. the hypothesis that the data follow this distribution was not rejected. Thus, we may conclude that the electricity consumption data from manufacturing processes in the considered company follow the Weibull distribution with estimated scale parameter 565.796 and estimated shape parameter 1.952.
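To indicate how the fitted model could be used, the following sketch evaluates a few quantities of a Weibull distribution with the reported estimates. It takes 565.796 as the scale and 1.952 as the shape; this assignment is an assumption, made because a shape close to 2 is consistent with the Rayleigh distribution ranking second. The threshold of 1000 is a hypothetical value chosen for illustration.

```python
import math

SCALE = 565.796  # assumed Weibull scale, in the units of the data
SHAPE = 1.952    # assumed Weibull shape

def weibull_cdf(x, shape=SHAPE, scale=SCALE):
    """Cumulative distribution function of the two-parameter Weibull."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_quantile(p, shape=SHAPE, scale=SCALE):
    """Inverse of the CDF for a probability 0 <= p < 1."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

mean_consumption = SCALE * math.gamma(1.0 + 1.0 / SHAPE)
median_consumption = weibull_quantile(0.5)
exceed_1000 = 1.0 - weibull_cdf(1000.0)  # probability of exceeding the threshold

print(mean_consumption, median_consumption, exceed_1000)
```

Under these assumptions the median lies below the mean, in line with the positive skewness of the data.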