A Mobile-based Platform for Big Load Profiles Data Analytics in Non-Advanced Metering Infrastructures

With the rapid increase in electricity demand around the world due to industrialization and urbanization, precise knowledge of consumers’ consumption patterns has become a valuable asset for electricity providers in today’s competitive electricity market. Such knowledge allows them to provide satisfactory service during load peaks and to control fraud and abuse cases. Despite this crucial need, it is currently very hard to achieve in many developing countries, since smart meters and advanced metering infrastructures (AMIs) are not yet established there to monitor and report energy usage. Meanwhile, communication and information technologies have spread widely in such nations, leading to an enormous spread of smart devices among the population. In this paper, we present mobile-based BLPDA, a novel platform for big data analytics of consumers’ load profiles (LPs) in the absence of established AMIs. The proposed platform utilizes mobile computing to collect consumers’ consumption readings, build their LPs, and analyze the aggregated usage data, thus giving electricity providers better visibility for an enhanced decision-making process. The experimental results demonstrate the effectiveness of our platform as an adequate, low-cost alternative to AMIs in developing countries.


Introduction
Considerable attention has been directed in recent decades towards the utilization and optimization of electricity sources, especially with the tremendous increase in demand at different levels of granularity. Automatic meter reading (AMR) thus emerged in the mid-1980s as an automated mechanism for gathering basic electric meter-reading data, instead of depending on the electricity providers' individual collectors to visit households one by one to obtain the reading that represents the monthly energy consumption [1]. AMRs tended to be dedicated to collection only, on either a monthly or daily basis, without any facilities for control messages or broadcast commands. In 2005, advanced metering infrastructure (AMI) technology arose in many countries, evolving from the foundations of AMR [2]. AMIs allow automated, real-time recording and transmission of consumers' consumption data to a central computer in order to analyze and control user consumption.
The analysis of the high velocity, volume and variance of collected AMI data has allowed accurate aggregations of seasonal and long-term usage trends, and has provided a pattern of total demand with many associated analytical outcomes, including load profiling, peak identification, abnormal usage detection, demand forecasting, electricity theft detection, and intelligent pricing tariffs [3][4]. It thus allows operators to optimize the energy supply. Lacking the nation-wide infrastructure and budget needed to establish AMIs, many developing countries still rely on individual collectors to record consumers' consumption. Besides missing the advantages discussed above, this approach is subject to human error and depends on the physical presence of the consumer at the premises at the time of the collector's visit.
In this study, we present our novel mobile-based Big Load Profiles Data Analyzer (mBLPDA) platform, intended as a cost-effective alternative to AMIs in developing countries. The mBLPDA platform uses widespread smart mobile devices to allow each consumer to securely submit his own reading within a predefined time window, together with automatically captured data that is used to verify the submission. The submitted readings are verified and then aggregated per consumer to build his load profile (LP) on a central cloud-based server, where further analysis is performed using R language packages. The rest of the paper is organized as follows. Section 2 reviews studies related to LPs. Section 3 discusses our proposed platform, whereas Section 4 presents the experiments and results used to evaluate it. Finally, Section 5 concludes our study and highlights our future work.

Related Work
Much research has been directed towards the analysis of smart meter data for different purposes. Table 1 presents a comprehensive summary of the main studies related to electrical load profiles, identifying the study area and the applied analytical approaches. Electricity production and consumption were investigated to forecast electricity load consumption in [5] using a genetic algorithm, in [6] using a data-based methodology, and in [7] using support vector machine for regression (SVR) and multilayer perceptron (MLP) at the district or single-household level. In [8], a semi-parametric approach based on generalized additive models theory was suggested. In [9], an incremental time series clustering technique using the ARIMA time series forecasting model was proposed, along with a novel affinity score for determining the cluster similarity of time series datasets. An Incremental Summarization and Pattern Characterization (ISPC) framework was introduced in [10] to incrementally characterize patterns in a data stream of electric meter readings and correlate these across time through incremental learning. The authors of [11][12] proposed a threat model and a predictive usage analytical toolkit for smart meter management, applying time series predictive modeling and regression analysis respectively to data collected by AMIs in order to identify abnormal consumption trends and possible fraud, and thereby reduce electricity theft.
Another methodology, for distribution feeder identification, was presented in [13] to improve the correctness of distribution primary feeder connectivity by identifying the groups of customer meters connected to the same primary distribution feeder. K-means and self-organizing maps (SOMs) were applied in [14] to analyze load profiles in order to provide more customized tariff offers with respect to customer pricing and supply availability. A profile identification, short-term load forecasting, and customer segmentation methodology was proposed in [15] to compute the daily profiles from estimated models. A cloud-based Dynamic Demand Response (D2R) platform was introduced in [3] to perform curtailment strategy selection, intelligent demand-side management and forecasting, visualize LP patterns, and relieve peak load. In [16], Fourier transforms and Gaussian time series techniques were investigated to forecast domestic electricity system demand by analyzing LPs. Models were proposed in [17][18][19] to construct LPs for household electricity in order to determine energy use, forecast usage, predict change in future energy use, design the power network, and improve the efficiency of the energy build structure. Approaches were also developed in [21][22] to reduce the total energy consumption in periods of expected peak load.
On the other hand, in [23][24], Probabilistic Neural Networks (PNNs) and the Fuzzy C-Means (FCM) clustering algorithm were used to determine and allocate the typical LPs to consumers. Unsupervised learning based on SOMs has been used in [25][26][27] to classify, filter and identify customers' consumption patterns in order to learn both their distribution and topology, and to segment the demand patterns of electrical customers. Additionally, frequency-based indices and hourly LP curve methods were proposed in [28]. In [29][30], the authors modified the Follow-The-Leader algorithm and proposed a frequency-domain approach with SOMs for consumption pattern-based classification of electricity consumers in order to obtain accurate knowledge of the customers' consumption patterns. Hierarchical clustering was also considered in [31] to determine customers' daily LPs in order to cluster similar units of measured LPs. A hierarchical cluster tree was constructed using the distances between profiles to divide or group each pair of nearby LPs until coherent clusters were formed.
In [32], hierarchical clustering and FCM with PNNs were used to study the distribution of deviations between the estimated electricity purchased on the power market and the actual consumption. In [33][34], a probabilistic model using the beta probability distribution function (beta pdf) and a method for formulating energy LPs for domestic buildings were proposed respectively to analyze residential consumer loads. Clustering was applied based on the proposed scenarios of occupancy patterns. Hadoop was used in [35] to introduce an auto-regressive time series modelling technique, and in [36] to build a smart meter data analytics approach using OpenTSDB and HBase in order to store time series data and to compute electricity consumption profiles from residential smart meter data, combined with temperature data and the occupants' daily habits.
Although previous research has investigated various analytical studies of the LPs and consumption patterns of electricity consumers, all of these studies relied on the existence of an AMI to provide the data required for such analysis. This completely excludes regions without AMIs from benefiting from analytical studies of their LPs. The main novelty of the proposed research is that it provides an alternative both to AMIs and to individual collectors' visits, which require the consumer's physical presence to obtain the meter reading. It thereby allows developing regions to analyze and control their LPs for better decision making, optimized service during peaks, and energy theft detection.

Mobile-based BLPDA
In this section, we present our proposed mobile-based platform for big LPs data analytics (mBLPDA). This platform allows non-AMI environments to track and analyze their energy consumption for better decision making when AMIs are infeasible. A cloud-based server is utilized to manage and store the big LPs data, and all connections between the different sub-systems are made through JSON web services. A web-based interface provides the administrative services required at the energy provider's side, presenting the main analytical results with appropriate graphical representations. As shown in figure 1, the system architecture of our proposed mBLPDA platform is composed of two main components, explained as follows. This architecture can be applied to any energy domain that has metering devices, e.g., electricity, gas, or water.

Mobile-based Load Profiler (mLP)
A mobile-based application has been designed to build a secured LP for consumers who have non-smart meter devices. This mobile load profiler app (mLP) allows authenticated users to use their smart mobile devices to capture images with a pre-defined frame size that matches the size of the meter device's reading window. An OCR module is integrated into mLP in order to extract the reading digits and the meter device ID from the captured image. Once the reading digits and ID are extracted, mLP wraps them together with the date, timestamp, and exact longitude and latitude at which the reading image was captured, in addition to the digital signature of the authenticated user. The Advanced Encryption Standard (AES) algorithm is then applied to encrypt the wrapped content, which is sent directly to the BLPDA server [37]. Thus, readings are secured and associated with the user's authenticated identity during transmission. mLP also allows consumers to track their consumption, invoices, and payments on a monthly basis, and to get the main analytical insights about their usage patterns.
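The wrapping step described above can be illustrated with a minimal sketch. It is not the mLP implementation: the field names, the JSON serialization, and the helper names `wrap_reading` and `verify_reading` are all hypothetical, and HMAC-SHA256 is used only as a stand-in for the user's digital signature, whose scheme is not specified here. The AES encryption of the resulting envelope is deliberately left out, since it would require a third-party cryptography library.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _canonical_bytes(payload):
    # Canonical serialization so signer and verifier hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def wrap_reading(meter_id, digits, lat, lon, user_key, captured_at=None):
    """Bundle an OCR-extracted reading with capture metadata and a signature.

    All field names are hypothetical. HMAC-SHA256 stands in for the user's
    digital signature (an assumption, not the scheme used by mLP).
    """
    captured_at = captured_at or datetime.now(timezone.utc)
    payload = {
        "meter_id": meter_id,
        "reading": digits,
        "captured_at": captured_at.isoformat(),
        "latitude": lat,
        "longitude": lon,
    }
    signature = hmac.new(user_key, _canonical_bytes(payload), hashlib.sha256).hexdigest()
    # AES encryption of this envelope would follow here; it is omitted in this sketch.
    return {"payload": payload, "signature": signature}


def verify_reading(envelope, user_key):
    """Recompute the signature on the server side and compare in constant time."""
    expected = hmac.new(user_key, _canonical_bytes(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Under these assumptions, any change to the reading digits, meter ID, timestamp, or coordinates after signing causes `verify_reading` to return False, which is the property the platform relies on when it binds a reading to the authenticated user.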

Big Load Profiles Data Analyzer (BLPDA)
This represents the core engine of our proposed platform, where big data analytic approaches are investigated to deal with the huge volume of LPs data. It is responsible for the LPs management and the associated analysis per region. BLPDA consists of the following sub-modules.

Load Profiles (LPs) Manager
The LPs Manager receives the encrypted readings that are sent concurrently by authenticated users through their mLP apps. The content is first decrypted and then verified to confirm that the extracted meter device ID and location match the registered profile of the authenticated user. The verified data is then added to the consumer's LP according to the date of the reading in order to compute his energy consumption. Hence, LPs are constructed per consumer and distributed based on the geographical location of the meter devices. A payments and receivables management system can be integrated to generate invoices and handle the payment processes.
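A minimal sketch of the verification and accumulation steps follows. It rests on several assumptions that are not stated in the text: the location match is taken to be a great-circle distance check with a 50 m tolerance, the meter is assumed to display a cumulative register so that consumption is the difference between consecutive readings, and the dictionary keys and function names are hypothetical.

```python
import math


def within_radius(lat1, lon1, lat2, lon2, max_m=50.0):
    """Haversine distance check; the 50 m tolerance is an assumption."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= max_m


def accept_reading(profile, reading):
    """Return True when meter ID and capture location match the registered profile."""
    return (reading["meter_id"] == profile["meter_id"]
            and within_radius(profile["lat"], profile["lon"],
                              reading["lat"], reading["lon"]))


def monthly_consumption(readings):
    """Turn cumulative meter readings into per-period consumption.

    `readings` is a list of (iso_date_string, cumulative_value) pairs;
    ISO date strings sort chronologically.
    """
    ordered = sorted(readings)
    return [(cur_date, cur - prev)
            for (_, prev), (cur_date, cur) in zip(ordered, ordered[1:])]
```

Sorting by date before differencing means that readings arriving out of order from different submissions still produce the same profile.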

Regional Consumption Analyzer (RCA)
Given the LPs accumulated for consumers, RCA uses R language packages to perform two main categories of analysis: per region, and per household within a region [38]. (i) Per region: RCA clusters the huge volume of consumption rates within a region using the k-means approach, and forecasts the expected load demand for the next 12 months using time series analysis. Accordingly, usage peaks can be predicted in order to avoid service malfunctions under load pressure.
(ii) Per household within a region: RCA estimates an overall approximation of the population distribution within a region using correlation, and detects possible energy theft cases using an anomaly detection approach through the new R anomalous package [39][40]. This takes into consideration the LPs' patterns and the local average consumption per individual.
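The household-level check in (ii) can be illustrated with a deliberately simple rule. This is a stand-in for the R anomaly detection package used by RCA, not a port of it: a reading is flagged when it exceeds the mean of the household's other readings by a chosen ratio. The function name, the ratio of 1.5, and the first five monthly values in the usage note are hypothetical.

```python
def flag_anomalies(readings, ratio=1.5):
    """Return indices of readings that exceed the mean of the other readings by `ratio`.

    A simple threshold rule used for illustration only; the ratio is an assumption.
    """
    if len(readings) < 2:
        return []
    total = sum(readings)
    flagged = []
    for i, value in enumerate(readings):
        # Leave the candidate out so a single spike does not mask itself.
        others_mean = (total - value) / (len(readings) - 1)
        if others_mean > 0 and value / others_mean >= ratio:
            flagged.append(i)
    return flagged
```

For instance, with five hypothetical months averaging 120 followed by a reading of 243, `flag_anomalies([118, 122, 119, 121, 120, 243])` flags only the last index, since 243 is roughly twice the mean of the other months.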
Analytical reports and graphs are aggregated on a monthly and quarterly basis to visualize the outcomes at the energy provider's side, as well as at the consumer's side through mLP. Figure 2 shows the system flowchart of the mBLPDA platform.
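For the per-region clustering in (i), the following is a minimal sketch of k-means on scalar consumption rates, written in plain Python rather than the R packages used by RCA. The function name and the deterministic quantile-based seeding are assumptions made for reproducibility, and the input is assumed to contain at least k values.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar consumption rates with deterministic seeding."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced quantiles (an assumption, for reproducibility).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels, centroids
```

Because the seeds are taken in ascending order, label 0 ends up as the lowest-consumption cluster and label k-1 as the highest, which maps naturally onto consumption bands when k is small.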

Experiments and Results
In order to evaluate our proposed platform, we operated mBLPDA for 6 months in 4 different regions. A total of 1,648,576 electrical meter readings were collected through the mLP Android app. Although the mBLPDA architecture is generic for any energy domain that has meter devices, our experiments focus on the electricity domain so that the capture frame size can be predefined to match the local electric meter devices. We considered one of the most popular electrical meter device models, shown in figure 3 with the corner marks of the adjusted capture frame. Moreover, a customized OCR module is integrated into mLP to extract the reading digits and meter device ID from the captured image, tuned to the specific font type and size of the local electric meter devices. Experimental results indicate that the mBLPDA platform can provide most of the essential analytical insights available in AMIs.
Figure 4 shows the average time in seconds versus the size of the content in bytes needed to process a captured reading image by extracting and encrypting the associated content (i.e., reading digits, meter device ID, and the date, timestamp, longitude, and latitude of image capture), and then to transmit the processed data to the central cloud-based server. The mLP app takes an average of 12.57 seconds to process and transmit about 150 bytes, rising to around 25.11 seconds for an average size of 550 bytes. Overall, the mLP app takes an average of 20.63 seconds to process and transmit one reading. Figure 5 presents the consumers' LPs collected through the mLP app during the pilot period for the 4 regions, with average consumption ranging from 4,617,388 KW to 8,886,716.042 KW per month per region. We used the Rand Index (RI) as a measure to evaluate the quality of the consumption-rate clustering.
RI is used since it weights false positives and false negatives equally, as in the equation below [41]:

RI = (TP + TN) / (TP + FP + FN + TN)
Where TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Consumption rates are clustered into 3 main clusters (i.e., minimal, normal, and high consumption, thus k=3). As shown in figure 6, the mBLPDA platform achieved RI values ranging from 0.88 to 0.93, with an overall average RI of 0.9, which indicates coherent, high-quality clusters.
To evaluate peak prediction in the mBLPDA platform, we applied the F-measure criterion, which combines the recall and precision parameters to assess the prediction results according to the following equations respectively [42]:

Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
F-measure = (2 × Precision × Recall) / (Precision + Recall)
As shown in figure 7, the mBLPDA platform achieves F-measure results for consumption peak prediction with a minimum value of 0.71 and a maximum value of 0.95, and an overall F-measure of 0.83, which indicates a high prediction capability.
We also performed additional experiments to investigate anomaly detection in mBLPDA at the household level, in preparation for energy theft detection when combined with payment controls. A sample anomaly detected at the household level is presented in figure 8, where the consumer's average consumption rate over 5 months is 120 KW. The reading recorded for December is 243 KW, which is far beyond the normal average consumption rate.
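The two evaluation measures used in this section can be sketched as follows, assuming the usual pair-counting interpretation of the Rand Index (a pair of items is a true positive when both labelings place it in the same cluster) and the standard definitions of precision and recall. The function names are ours, `rand_index` expects at least two items, and `f_measure` expects at least one true positive.

```python
from itertools import combinations


def rand_index(truth, predicted):
    """Rand Index over all item pairs: (TP + TN) / (TP + FP + FN + TN)."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_pred and same_truth:
            tp += 1  # grouped together in both labelings
        elif not same_pred and not same_truth:
            tn += 1  # separated in both labelings
        elif same_pred:
            fp += 1  # grouped by the clustering only
        else:
            fn += 1  # grouped by the reference only
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the Rand Index compares pairs rather than label values, it is unaffected by how the clusters are numbered, so a clustering that swaps the cluster names but keeps the same groups still scores 1.0.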

Conclusion and Future Work
Managing and analyzing energy consumption has become an increasingly pressing need for energy providers around the world, and AMIs have helped in recent years by providing useful analytical insights. In this paper, we tackle the problem of building energy LPs in non-AMI environments. Many developing countries have not yet established AMIs and still depend on individual collectors to record readings.
We propose our mobile-based big LPs data analyzer (mBLPDA) platform, in which consumers can use the mLP smart app to securely submit their own meter readings, with an average of 20.63 seconds for the processing and transmission of one reading. All readings are aggregated on a cloud-based server that applies R language packages to analyze consumption rates, forecast load demand and peaks, and detect energy theft, along with an approximate estimation of the population distribution based on the built LPs. A popular design of local electric meter device was used in the experiments. Experiments show that our proposed mBLPDA platform successfully clusters LPs with an average RI of 0.9, and predicts peak loads with an average F-measure of 0.83.
In future work, we plan to support most of the designs of local electric meter devices, to include other energy meter devices for gas and water, and to add further analytical studies that increase the visibility of energy providers and support their decision-making processes for better service.