Comparison on the Analysis on PM10 Data based on Average and Extreme Series

The main concern in environmental issues is extreme (catastrophic) phenomena rather than common events. However, most statistical approaches are concerned primarily with the centre of a distribution, or the average value, rather than the tail of the distribution, which contains the extreme observations. Extreme value theory focuses on the tails of a distribution, where standard models prove unreliable for analysing extreme series. A high level of particulate matter (PM10) is a common environmental problem that harms human health and causes material damage. If the main concern is extreme events, then extreme value analysis gives the most appropriate results. The monthly average and monthly maxima PM10 data for Perlis from 2003 to 2014 were analysed. The average data are forecast with the Holt-Winters method, while the return level gives the value of an extreme event that is exceeded on average once in a given period. For the average data, the forecast from January 2015 to December 2016 peaks at 58.18 (standard deviation 18.45) in February 2016, while the return level for a 24-month (2015-2016) return period reaches 253.76 units.


Introduction
Environmental quality management is more concerned with extreme situations because of their various hazardous impacts. However, most statistical methods are concerned primarily with what goes on in the centre of a statistical distribution, and do not pay particular attention to the tails of the distribution. Statistical modelling of extreme air quality has a very practical motivation, since these events have major effects on humans and ecosystems. Degradation of ambient air quality reduces visibility, impairs air, land and water transportation, and seriously affects the economy.
The region's drier weather conditions have led to an escalation in hotspot activities caused mainly by land clearing and "slash and burn" agricultural practices [1]. Improper management of open burning in the commercial plantation sector, heavy industry and business activities has made the situation worse. High PM10 levels have been a common problem in Malaysia, especially in the dry season. During haze periods, PM10 was found to be the main pollutant, while the other air quality parameters remained within the permissible healthy standards [2]. A study by Dominick et al. [3] found that air pollution at eight selected air monitoring stations in Malaysia in 2008 and 2009 was predominantly influenced by PM10. Yusof et al. [4]
A time series is a collection of past values of the variable being predicted. The Holt-Winters method was first proposed in the early 1960s. It uses a process known as exponential smoothing. Exponential smoothing in its simplest form should only be used for a non-seasonal time series with no systematic trend, that is, one that fluctuates around a roughly constant level (a stationary time series). The smoothed series depends on all previous values, with the most weight given to the most recent values. The Holt-Winters model uses a modified form of exponential smoothing. It applies three exponential smoothing formulae to the series, governed by the hyperparameters α, β and γ. Firstly, the level (or mean) is smoothed to give a local average value for the series. Secondly, the trend is smoothed. Lastly, each seasonal sub-series (i.e. all the January values, all the February values and so on, for monthly data) is smoothed separately to give a seasonal estimate for each of the seasons. In this study, the Holt-Winters method is applied with the hyperparameter γ set to 0, which gives exponential smoothing with a trend but without a seasonal component [5].
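As a minimal sketch of the weighting idea behind simple exponential smoothing, the recursion s_t = α·y_t + (1 − α)·s_{t−1} can be written in a few lines of Python. The function name, the choice of initialising with the first observation, and the sample numbers are assumptions made for illustration; they are not taken from the study.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha*y_t + (1 - alpha)*s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Assumption: start the recursion at the first observation (one common choice).
    smoothed = [series[0]]
    for y in series[1:]:
        # Recent observations get weight alpha; older ones decay geometrically.
        smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly PM10 averages (illustrative numbers, not the Perlis data).
example = [40.0, 44.0, 38.0, 50.0]
print(simple_exponential_smoothing(example, alpha=0.5))  # [40.0, 42.0, 40.0, 45.0]
```

A larger α makes the smoothed series follow the latest observations more closely, while a smaller α gives a smoother series that reacts more slowly.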
Extreme value (EV) theory differs from other statistical approaches in that it focuses on the tail of a distribution, on either the maxima or the minima. The scope of EV theory has been widely explored in various fields, and it has recently become a vigorous research area because of its many applications. The literature on EV theory includes Coles [6] and Haan [7], which cover the basic theory through to applications of EVT in various fields. There are two main models for extreme series: the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD). This study focuses on extreme PM10 concentrations based on the GEV model. This paper aims to estimate the possible extreme PM10 levels in the future.

Methodology
This research takes two main approaches to the analysis of PM10 data. The first uses the monthly average data, which are forecast with a time series approach based on Holt's linear trend method. The second is based on EVT, where the extreme series is extracted from the original data as monthly maxima.

Holt-Winters Method
The Holt-Winters method is a simple yet effective forecasting procedure, based on exponential moving averages, that covers both trend and seasonal models [5]. The Holt-Winters model assumes that the seasonal pattern is relatively constant over the time period. The exponential smoothing formulae applied to a series with a trend and constant seasonal component using the Holt-Winters additive technique are:
\[ \hat{a}_t = \alpha\,(Y_t - \hat{s}_{t-p}) + (1-\alpha)\,(\hat{a}_{t-1} + b_{t-1}) \]
\[ b_t = \beta\,(\hat{a}_t - \hat{a}_{t-1}) + (1-\beta)\,b_{t-1} \]
\[ \hat{s}_t = \gamma\,(Y_t - \hat{a}_t) + (1-\gamma)\,\hat{s}_{t-p} \]
where $\alpha$, $\beta$ and $\gamma$ are the smoothing parameters, $Y_t$ is the observed value at time $t$, $\hat{a}_t$ is the smoothed level at time $t$, $b_t$ is the change in the trend at time $t$, $\hat{s}_t$ is the seasonal smooth at time $t$, and $p$ is the number of seasons per year.
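The additive Holt-Winters recursions for level, trend and seasonal component can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the function name, the initialisation from the first two seasons, and the sample data are all hypothetical.

```python
def holt_winters_additive(y, p, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns `horizon` forecasts beyond the end of y.

    Assumption: a simple initialisation that needs two full seasons of data.
    """
    if len(y) < 2 * p:
        raise ValueError("need at least two full seasons of data")
    first = sum(y[:p]) / p
    second = sum(y[p:2 * p]) / p
    level = first                               # initial level: first-season mean
    trend = (second - first) / p                # average per-step change
    season = [y[i] - first for i in range(p)]   # initial seasonal offsets
    for t in range(p, len(y)):
        s_old = season[t % p]                   # seasonal estimate from one season ago
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % p] = gamma * (y[t] - level) + (1 - gamma) * s_old
    n = len(y)
    # h-step-ahead forecast: level + h * trend + matching seasonal offset.
    return [level + h * trend + season[(n + h - 1) % p]
            for h in range(1, horizon + 1)]


# Hypothetical series with p = 4 seasons per cycle (not the Perlis data).
pm10 = [42.0, 45.0, 50.0, 47.0, 43.0, 46.0, 52.0, 48.0]
print(holt_winters_additive(pm10, p=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

A non-seasonal variant with a trend only (Holt's linear trend method) would drop the seasonal terms from the level update and from the forecast.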

Extreme Value Theory
EVT provides a firm theoretical foundation on which statistical models for describing extreme events are built. General discussions of EVT cover its three conventional forms, the Gumbel, Fréchet and Weibull distributions, which are unified in the generalized extreme value (GEV) distribution, and the more recent approach based on the generalized Pareto distribution (GPD), which depends on a threshold value chosen for the data. This paper analyses the data based on the GEV model, where the extreme series was extracted as monthly (block) maxima.
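Extracting block maxima amounts to grouping the observations by block and keeping the largest value in each. A minimal sketch for monthly blocks is given below; the function name and the sample readings are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date


def monthly_maxima(daily):
    """Map (year, month) -> largest value observed in that month.

    `daily` is an iterable of (date, value) pairs.
    """
    blocks = defaultdict(list)
    for day, value in daily:
        blocks[(day.year, day.month)].append(value)
    # Each month is one block; its maximum enters the extreme series.
    return {block: max(values) for block, values in sorted(blocks.items())}


# Hypothetical daily PM10 readings (illustrative only, not the Perlis data).
readings = [
    (date(2014, 1, 3), 41.0),
    (date(2014, 1, 17), 67.5),
    (date(2014, 2, 2), 38.2),
    (date(2014, 2, 20), 120.4),
]
print(monthly_maxima(readings))  # {(2014, 1): 67.5, (2014, 2): 120.4}
```

Replacing `max(values)` with `sum(values) / len(values)` would give the monthly average series instead.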

Generalized Extreme Value Distribution
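The cumulative distribution function (cdf) and probability density function (pdf) of the GEV distribution are given by (1) and (2) respectively:
\[ G(z) = \exp\left\{ -\left[ 1 + \xi\left( \frac{z-\mu}{\sigma} \right) \right]^{-1/\xi} \right\} \tag{1} \]
\[ g(z) = \frac{1}{\sigma} \left[ 1 + \xi\left( \frac{z-\mu}{\sigma} \right) \right]^{-1/\xi - 1} \exp\left\{ -\left[ 1 + \xi\left( \frac{z-\mu}{\sigma} \right) \right]^{-1/\xi} \right\} \tag{2} \]
defined on $1 + \xi(z-\mu)/\sigma > 0$. The GEV model has three parameters, $\mu$, $\sigma > 0$ and $\xi$, which are the location, scale and shape parameters respectively. The value of $\xi$ determines the type of GEV distribution: $\xi > 0$ gives the Fréchet type, $\xi < 0$ the Weibull type, and the limit $\xi \to 0$ the Gumbel type.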
Earlier works on EVT adopt one of the three distributions and then estimate the corresponding parameters. According to Coles [6], this approach has two weaknesses. First, a technique is required to choose the most appropriate distribution for the data analysed and, second, the inferences are made on the assumption that the choice is correct. A better analysis can therefore be done using the GEV, where the value of the shape parameter, $\xi$, itself determines the most suitable tail behaviour for the data.
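To make the role of the shape parameter concrete, the sketch below evaluates the GEV cdf and names the sub-family implied by the sign of the shape parameter. It assumes the parameterisation of Coles [6], with location μ, scale σ > 0 and shape ξ; the function names and the tolerance used to treat ξ as zero are illustrative choices.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV cumulative distribution function G(z), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:              # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:                       # z lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_type(xi, tol=1e-12):
    """Name the GEV sub-family implied by the sign of the shape parameter."""
    if abs(xi) < tol:
        return "Gumbel"
    return "Frechet" if xi > 0 else "Weibull"


# Hypothetical parameter values, chosen only to exercise the functions.
print(gev_type(0.2), gev_cdf(150.0, mu=100.0, sigma=20.0, xi=0.2))
```

A positive shape parameter corresponds to a heavy upper tail, while a negative one implies a finite upper bound at μ − σ/ξ.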

Return Level
Applications of EVT in air quality studies are concerned with how well the mathematical theory can answer questions about the probability that a pollutant concentration will exceed a certain level in a given period, which relates to the return level. Awareness of the return levels of extreme air pollution events could benefit the development of air pollution risk management practices. This motivates the need to estimate the worst PM10 level that will occur over a certain period in the future.
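Return levels describe the value of an extreme event that is exceeded on average once in a given period. For example, what is the PM10 level that will be exceeded on average once in the next 100 years? It is convenient to interpret extreme value models in terms of quantiles or return levels rather than individual parameter values [8]. The return level $z_p$ for the GEV with return period $1/p$ is defined by
\[ z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[ 1 - \{-\log(1-p)\}^{-\xi} \right], & \xi \neq 0, \\ \mu - \sigma \log\{-\log(1-p)\}, & \xi = 0, \end{cases} \]
where $\mu$, $\sigma$ and $\xi$ are the location, scale and shape parameters of the GEV and $z_p$ satisfies $G(z_p) = 1 - p$.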
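The return level is the (1 − p) quantile of the fitted GEV, so it can be computed directly once the parameters are estimated. The sketch below assumes the parameterisation of Coles [6] (location μ, scale σ, shape ξ). With monthly block maxima, a 24-month return period corresponds to p = 1/24. The parameter values are hypothetical and are not the estimates from this study.

```python
import math


def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p in any one block (return period 1/p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:              # Gumbel case, xi = 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical GEV parameters, for illustration only.
mu, sigma, xi = 100.0, 20.0, 0.2
for months in (12, 24, 120):
    # One block is one month, so the return period in months is 1/p.
    print(months, round(gev_return_level(1.0 / months, mu, sigma, xi), 2))
```

Longer return periods give higher return levels, and the increase is steeper when the shape parameter is positive.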

Data Description
The air quality level in Malaysia is described in terms of the Air Pollutant Index (API). The API is an indicator of air quality, developed from scientific assessment to indicate, in an easily understood manner, the presence of pollutants and their impact on health. The API scale and the terms used to describe air quality levels are categorized as in Table 1. In extreme value analysis, most of the data considered range from the moderate to the very unhealthy level, for which necessary actions need to be taken. The continuous air quality monitoring (CAQM) station measures concentrations of five major pollutants in the ambient air, namely PM10, sulphur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO) and ozone (O3). PM10 refers to solid or liquid aerosol particles with a diameter of less than 10 μm suspended in the atmosphere.
The API is calculated from the average daily concentrations of SO2, NO2, CO, O3 and PM10. The dominant air pollutant, the one with the highest concentration, determines the API. Typically, the concentration of PM10 is the highest among the pollutants, and this determines the API reading. PM10 concentration is related to gases and particulates that are expected to originate mostly from industrial and vehicle emissions, and also from some transboundary pollution involving Malaysia. The three major sources of air pollution, especially in urban areas, are mobile sources (motor vehicles), stationary sources (power stations, industrial fuel burning processes and domestic fuel burning) and the burning of municipal and industrial waste. During haze periods, PM10 was found to be the main pollutant, while the other air quality parameters remained within permissible healthy standards. Extreme PM10 levels are due particularly to haze and biomass burning, as well as to industrial and vehicle emissions.

Sampling Site
The data were collected for the Perlis region. Perlis is located in the north of Peninsular Malaysia, bounded by Thailand to the north and Kedah to the south, with a total area of around 795 km². Perlis is divided into three parliamentary constituencies, namely Kangar, Arau and Padang Besar, and is categorized as a sub-urban area. Figure 1 shows the continuous air quality monitoring station (CAQMS) used to measure air pollution, located at Institut Latihan Perindustrian, Kangar. The device must be sited at a 30-degree elevation from nearby buildings, which means that any taller building near it will interfere with the measurements recorded. In Perlis, however, the main source of pollution is probably open burning of paddy fields, which occurs across most of the state. The other source is haze from the neighbouring country, Indonesia.

PM10 Data
The notation PM10 describes solid or liquid aerosol particles with a diameter of less than 10 μm suspended in the atmosphere. The Malaysian guidelines for PM10 concentrations are 150 μg/m³ for the 24-hour average and 50 μg/m³ for the 12-month average. In air quality control, PM10 is recognized as the most influential atmospheric pollutant for the air quality index in the majority of cities in Malaysia. The adverse effects of PM10 on human health and the material damage it causes are the main reasons for extensive study of the behaviour of this pollutant. Prolonged exposure to high concentrations of PM10 can be harmful to health, causing in particular eye and throat irritation and respiratory problems among sensitive groups.

Results and Discussions
This section analyses the monthly average and monthly maxima PM10 data for 2003 to 2014. Results are compared for both series using descriptive statistics and forecast values for 2015 and 2016 based on the Holt-Winters method with the hyperparameter γ set to 0, which applies exponential smoothing with a trend and without a seasonal component.
Fig. 2 and Fig. 3 show plots of the monthly average and monthly maxima PM10 data, with the corresponding summary statistics in Table 2. In general, neither figure shows a seasonal pattern. The level of the average data between January 2004 and January 2006 differs clearly from that of the rest of the period. Transboundary haze from land and forest fires in Riau Province, Central Sumatra, Indonesia, is expected to have contributed to the higher PM10 values recorded intermittently. In addition, the pattern is quite persistent from July 2006 to July 2013. Most of the data lie between 30 and 50 units. The mean of the average data is 42.12 units, while that of the maxima data is 144.53 units. In Fig. 3, the obvious peaks mostly occur in July, especially in 2005, 2008 and 2013. This is probably due to very hazy weather conditions caused by forest fires in Sumatra. Haze is an annual problem during the monsoon season from May to September, as winds blow the smoke across the Malacca Strait to Malaysia.
Inferences on the extremes of environmental events are essential as guidelines for designing structures that survive the most extreme conditions. Extreme air pollution has various effects on human health and causes material damage. In many cases, the pollutants have a large impact on economic performance. EV theory is applied here to model extreme PM10 levels at the air monitoring station in Perlis. This study analyses the extreme PM10 data using maximum likelihood estimation and then estimates the return level. The estimated parameters of the GEV model are given in Table 3.

Forecasting
The monthly average data are forecast using the Holt-Winters method. The observed, fitted and predicted monthly average PM10 values are shown in Fig. 4 and Table 4. By comparison, the GEV model gives a return level of 253.76 units for a 24-month return period. This means that the extreme level estimated by the GEV model reaches the very unhealthy level within a two-year period.
Judged by the monthly average data, the PM10 level is fairly good except during periods of haze, as indicated by its API. However, the average of the monthly maxima series indicates that the pollutant is at a very unhealthy level. This obvious difference arises because the monthly average values describe the ordinary situation, while the monthly maxima refer to the extreme cases only. Thus, the forecasts for the average data reflect the predicted values for the next two years, while the return level describes the extreme level expected to be exceeded on average once in the next two years. Therefore, if the interest is in extreme occurrences or catastrophic events, extreme value theory is the most appropriate method to use.