Characteristic and Prediction of Carbon Monoxide Concentration using Time Series Analysis in Selected Urban Area in Malaysia

Carbon monoxide (CO) is a poisonous, colourless, odourless and tasteless gas. The main source of carbon monoxide is motor vehicles, and carbon monoxide levels in residential areas closely reflect traffic density. Prediction of carbon monoxide is important because it gives an early warning to sufferers of respiratory problems and helps the related authorities to be better prepared to prevent the problem and take suitable action to overcome it. This research was carried out using secondary data from the Department of Environment Malaysia from 2013 to 2014. The main objectives of this research are to understand the characteristics of CO concentration and to find the most suitable time series model to predict the CO concentration in Bachang, Melaka and Kuala Terengganu. Based on the lowest AIC value and several error measures, the results show that ARMA(1,1) is the most appropriate model to predict the CO concentration level in Bachang, Melaka, while ARMA(1,2) is the most suitable model, with the smallest error, to predict the CO concentration level for the residential area in Kuala Terengganu.


Introduction
Air pollution is one of the important issues to deal with. Air pollution today results from a natural cycle phenomenon and has a very large impact on health in particular and on economic development [1]. In Malaysia, the Department of Environment (DoE) is responsible for monitoring the level of air pollutants, and 52 monitoring locations belong to the Department. Five main pollutants are of significant concern to the Department of Environment Malaysia: ground-level ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), and fine dust with an aerodynamic diameter of less than 10 micrometres (PM10).
Carbon monoxide is one of the main pollutants in residential areas, since its main source is motor vehicles [2]. Generally, carbon monoxide (CO) is a byproduct of the incomplete combustion of hydrocarbon fuels. Two kinds of health effects are attributed to CO: long term and short term. Long periods of exposure to high concentrations of CO can raise the level of carboxyhemoglobin in the blood. With a high level of carboxyhemoglobin, the blood loses its capability to absorb oxygen, so the amount of oxygen carried by the blood to different parts of the body is reduced [3]. Short periods of exposure to very high concentrations of CO can precipitate or hasten the death of individuals with severely impaired heart or respiratory functions. This hazardous substance can harm us by disturbing the healthy functioning of the body. Every part of our body needs oxygen. Air is made up of approximately 20.9% oxygen and 79% nitrogen. The hemoglobin within our blood carries oxygen to every cell in our body. The hippocampus is the part of the brain that is involved in forming, organizing and storing memories. It is a limbic system structure that is particularly important in forming new memories and connecting emotions and senses, such as smell and sound, to memories [3].
The danger of carbon monoxide is heightened because it can cause health effects in patients suffering from respiratory problems. This is because it prevents the body from using oxygen, and it also affects potential cancer patients [4]. Therefore, prediction of carbon monoxide is important, especially as an initial step in giving an early warning to sufferers of respiratory problems such as asthma and wheezing, and as an early warning for the public and the related authorities so that they can be better prepared to prevent the problem and take suitable action to overcome it. The main focus of this research is to understand the characteristics of CO concentration and to find the most suitable time series model to predict the CO concentration in Bachang, Melaka and Kuala Terengganu.

Data and study area
Bachang and Kuala Terengganu are categorised by the Department of Environment Malaysia as urban areas located in Melaka and Terengganu, respectively. Previously, these two monitoring stations were classified as residential areas [5]. Geographically, Melaka state is situated on the western coast of Peninsular Malaysia, at latitude 02° 12.78'N and longitude 102° 14.05'E. Terengganu state is situated on the northeastern coast of Peninsular Malaysia, at latitude 05° 20.23'N and longitude 103° 09.45'E. Hourly average CO concentrations at these two locations for 2013 and 2014 were obtained from the Department of Environment Malaysia and were measured in parts per million (ppm) using a carbon monoxide gas detector at the continuous air quality monitoring stations. Monitoring data were computed by averaging direct measurements from the monitoring sites on a yearly basis and were cross-referenced with the Malaysia Ambient Air Quality Guidelines, as shown in Table 1.
In the ADF test equation, $\alpha_0$ is the drift component and $e_t$ is the error term [6]. The hypotheses used to determine the stationarity of the data are as follows:
$H_0$: the time series data is non-stationary
$H_1$: the time series data is stationary
The null hypothesis is rejected if the ADF statistic is more negative than the critical value, that is, greater than the critical value in absolute terms.
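To make the decision rule concrete, the following is a minimal Python sketch of the simplest Dickey-Fuller regression with drift and no lagged difference terms, $\Delta y_t = \alpha_0 + \gamma y_{t-1} + e_t$. It is an illustration only and not the procedure used in this study: the function name `df_statistic`, the two simulated series and the asymptotic 5% critical value of -2.86 (constant, no trend) are assumptions introduced here.

```python
import math
import random

def df_statistic(y):
    """t-statistic of gamma in dy_t = alpha0 + gamma * y_{t-1} + e_t (OLS)."""
    x = y[:-1]                                   # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # first difference
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx                            # slope estimate
    alpha0 = md - gamma * mx                     # drift estimate
    rss = sum((di - alpha0 - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)    # standard error of gamma
    return gamma / se_gamma

random.seed(0)
# Hypothetical stationary AR(1) series: y_t = 0.5 + 0.5 * y_{t-1} + noise
stationary = [1.0]
for _ in range(499):
    stationary.append(0.5 + 0.5 * stationary[-1] + random.gauss(0, 0.1))
# Hypothetical random walk, which has a unit root by construction
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0, 0.1))

CRITICAL_5PCT = -2.86  # assumed asymptotic 5% value (constant, no trend)
stat_stationary = df_statistic(stationary)
stat_walk = df_statistic(walk)
print("stationary series rejects H0:", stat_stationary < CRITICAL_5PCT)
print("random walk rejects H0:", stat_walk < CRITICAL_5PCT)
```

A full ADF test adds lagged differences to the regression and uses tabulated or package-computed critical values, so in practice a statistical package would normally be preferred over this hand-rolled version.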

Model Identification and Estimation
Each time series model was identified part by part, starting from Moving Average (MA), then Autoregressive (AR), and finally Autoregressive Moving Average (ARMA). In this study, the Akaike Information Criterion (AIC) was used as the criterion for choosing the lag length, that is, to check whether the error resulting from each process is too small or too large. In the estimation period, each model is fitted directly through its own equation. Their combination creates a new value that is used in the next stage of the analysis. The time series models used include:
i. Moving Average (MA) process
ii. Autoregressive Moving Average (ARMA) or Autoregressive Integrated Moving Average (ARIMA) process
The moving average process of order q is denoted MA(q) and defined by
$$X_t = \sum_{s=0}^{q} \theta_s \epsilon_{t-s}$$
where $\theta_1, \ldots, \theta_q$ are fixed constants, $\theta_0 = 1$, and $\{\epsilon_t\}$ is a sequence of independent (or uncorrelated) random variables with mean 0 and variance $\sigma^2$ [7].
The autoregressive process of order p is denoted AR(p) and defined by the following equation [7]:
$$X_t = \sum_{r=1}^{p} \phi_r X_{t-r} + \epsilon_t$$
where $\phi_1, \ldots, \phi_p$ are fixed constants and $\{\epsilon_t\}$ is a sequence of independent (or uncorrelated) random variables with mean 0 and variance $\sigma^2$.
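The autoregressive moving average process, ARMA(p, q), is a combination of the AR(p) model and the MA(q) model, and can be defined by
$$X_t - \sum_{r=1}^{p} \phi_r X_{t-r} = \sum_{s=0}^{q} \theta_s \epsilon_{t-s}$$
with $\theta_0 = 1$.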
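Rearranged for $X_t$, the ARMA(p, q) definition gives the recursion $X_t = \sum_{r=1}^{p} \phi_r X_{t-r} + \epsilon_t + \sum_{s=1}^{q} \theta_s \epsilon_{t-s}$. The Python sketch below applies that recursion to simulate a series and to form a one-step-ahead forecast. The function names and the coefficient values 0.6 and 0.3 are hypothetical and chosen only for illustration; they are not the coefficients estimated in this study.

```python
import random

def arma_next(phi, theta, past_x, past_eps, eps_t=0.0):
    """X_t = sum_r phi_r X_{t-r} + eps_t + sum_s theta_s eps_{t-s}, theta_0 = 1.

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q].
    past_x and past_eps are ordered oldest to newest. With eps_t = 0
    (the mean of the noise) the result is a one-step-ahead forecast.
    """
    ar = sum(p * x for p, x in zip(phi, reversed(past_x)))
    ma = sum(t * e for t, e in zip(theta, reversed(past_eps)))
    return ar + eps_t + ma

def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate n values with Gaussian white noise and zero start-up values."""
    rng = random.Random(seed)
    xs = [0.0] * len(phi)
    eps = [0.0] * len(theta)
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        xs.append(arma_next(phi, theta, xs, eps, e))
        eps.append(e)
    return xs[len(phi):], eps[len(theta):]

# Hypothetical ARMA(1,1) with phi_1 = 0.6 and theta_1 = 0.3
series, noise = simulate_arma([0.6], [0.3], n=200)
forecast = arma_next([0.6], [0.3], series, noise)
print(len(series), forecast)
```

Setting `theta` to an empty list reduces the recursion to AR(p), and setting `phi` to an empty list reduces it to MA(q), which mirrors how ARMA combines the two models.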

Validation
To judge the developed models, three error measures were used. The root mean square error (RMSE) is the most common indicator. For a good estimator, the RMSE value must approach zero; therefore, a smaller RMSE value means that the model is more appropriate [8]. In its standard form, it is defined by
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (P_i - O_i)^2}$$
The normalized absolute error (NAE) is a more sensitive measure of residual error than RMSE [8]. In its standard form, it is defined as
$$\mathrm{NAE} = \frac{\sum_{i=1}^{N} |P_i - O_i|}{\sum_{i=1}^{N} O_i}$$
The mean absolute percentage error (MAPE) is a relative error measure that uses absolute values to keep positive and negative errors from cancelling one another out, and uses relative errors so that forecast accuracy can be compared between time series models. In its standard form, the formula for calculating the MAPE is as follows:
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{O_i - P_i}{O_i} \right|$$
where $N$ is the number of monitoring records, $O_i$ is the observed monitoring record and $P_i$ is the predicted monitoring record [9].
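The three error measures can be computed in a few lines. The Python sketch below assumes the standard definitions (RMSE with a 1/N normalization, NAE as the sum of absolute errors divided by the sum of observations, and MAPE as a percentage); the function names and the sample values are hypothetical and are not data from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error with 1/N normalization (assumed form)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def nae(observed, predicted):
    """Normalized absolute error: sum |P - O| divided by sum O."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / sum(observed)

def mape(observed, predicted):
    """Mean absolute percentage error; undefined if any observation is zero."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))

# Hypothetical observed and predicted concentrations (ppm)
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
print(rmse(observed, predicted), nae(observed, predicted), mape(observed, predicted))
```

All three measures equal zero for a perfect prediction, so smaller values indicate a better model under each of them.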

Results and discussions
This section discusses the results obtained from the analysis of CO concentrations in Bachang, Melaka and Kuala Terengganu. Time series plots and descriptive statistics are used to give a general overview and to compare the level of CO concentration at both monitoring stations before the prediction procedure is carried out.

Time Series Plot and Descriptive Statistics
Fig. 1 and Fig. 2 show the time series plots of the CO concentration level for Bachang, Melaka and Kuala Terengganu, respectively. Missing values were replaced using the mean top bottom method. Descriptive statistics for both monitoring sites are shown in Table 2.
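As an illustration of the imputation step, the Python sketch below assumes that the mean top bottom method replaces each missing value with the mean of the nearest observed value above it and the nearest observed value below it in the record. That reading, the function name `mean_top_bottom`, the handling of gaps at the ends of the series and the sample values are assumptions made here, not details given in the text.

```python
def mean_top_bottom(values):
    """Fill None entries with the mean of the nearest observed neighbours."""
    filled = list(values)
    n = len(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed value before (top) and after (bottom) position i
        above = next((values[j] for j in range(i - 1, -1, -1)
                      if values[j] is not None), None)
        below = next((values[j] for j in range(i + 1, n)
                      if values[j] is not None), None)
        neighbours = [x for x in (above, below) if x is not None]
        if neighbours:  # at a series edge only one neighbour exists
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

# Hypothetical hourly CO readings (ppm) with missing records marked as None
hourly = [1.0, None, 3.0, None, None, 5.0]
print(mean_top_bottom(hourly))
```

Neighbours are always taken from the original record rather than from values already filled in, so a run of consecutive missing values receives the same replacement throughout.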
Basically, the mean and mode of CO concentration at both monitoring sites are almost the same. Both sets of data are positively skewed, meaning that most of the recorded data lie at lower values, with a tail towards higher concentrations.
For validation, predictions were made 14 days ahead, and the performance indicators for the time series models are summarized in Table 3. For clarity, the 14 days of observed and predicted CO concentration using the most appropriate time series model are plotted in Fig. 3 and Fig. 4 for Bachang and Kuala Terengganu, respectively.

Conclusion
A set of secondary data from the DoE was analysed using various descriptive statistics. The results show that the mean CO concentration in Bachang, Melaka is slightly lower than that in Kuala Terengganu. The skewness of CO concentration is positive for both urban areas, indicating that the data are skewed to the right, which means that most of the data lie at lower values. To predict the CO concentration level using time series analysis for both monitoring stations located in urban areas, only autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models were considered, since both sets of data are stationary. The quality and reliability of the developed models were assessed using AIC and three performance indicators (NAE, RMSE, MAPE), and the results of this study show that ARMA(1,1) is the most suitable model to predict the CO concentration for the urban area in Bachang, Melaka. Similarly, for Kuala Terengganu, ARMA(1,2) was the best-fitting model to predict the CO concentration level.

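Table 1. Malaysia Ambient Air Quality Guideline for Carbon Monoxide

Stationarity
The Augmented Dickey-Fuller (ADF) unit root test was used to test the stationarity of the data, since it is the most commonly used test in time series analysis for determining the stationarity of a data set. The ADF test uses the following Equation (1), given here in its standard form with a drift term:
$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + e_t \qquad (1)$$
Here $y_t$ is the observed series, $\Delta y_t = y_t - y_{t-1}$, $\gamma$ is the coefficient tested for a unit root, $\beta_i$ are the coefficients of the $k$ lagged differences, $\alpha_0$ is the drift and $e_t$ is the error term.

For the ARMA(p, q) process, $\{\epsilon_t\}$ is again white noise, and the process is stationary for appropriate $\phi$, $\theta$.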

Table 2. Characteristics of CO concentration level in Bachang and Kuala Terengganu

Results from the Augmented Dickey-Fuller test show that the monitoring records of CO concentration from 2013 to 2014 are stationary for both monitoring stations, with an ADF statistic of -4.562 for Bachang, Melaka and -4.106 for Kuala Terengganu. Since both p-values are less than 0.05, the time series data can be concluded to be stationary. The autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models were considered in order to find the most suitable model for these sets of data by comparing the Akaike Information Criterion (AIC) values. The smallest AIC value indicates the most appropriate model. For Bachang, Melaka, the most suitable model for prediction is ARMA(1,1), with the smallest AIC value of -1.2462, while for Kuala Terengganu, ARMA(1,2) is the best choice, with the smallest AIC value of -1.5151. The time series equations for Bachang, Melaka and Kuala Terengganu are based on these best models, ARMA(1,1) and ARMA(1,2) respectively.
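The AIC comparison described above can be sketched in Python as follows. The sketch assumes the textbook definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The function names, log-likelihood values and parameter counts are hypothetical and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (textbook form)."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(candidates):
    """candidates maps a model label to (log_likelihood, n_params)."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)  # smallest AIC wins
    return best, scores

# Hypothetical fitted models for one monitoring station
candidates = {
    "AR(1)": (-120.0, 2),
    "MA(1)": (-125.0, 2),
    "ARMA(1,1)": (-115.0, 3),
}
best, scores = best_by_aic(candidates)
print(best, scores)
```

Some software reports AIC scaled by the number of observations, which can produce small or negative values; this does not change the ranking when all candidate models are fitted to the same data.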