Statistical modeling of particle mater air pollutants in the city of Ruse , Bulgaria

This paper is devoted to examine the PM10 and PM2.5 pollution in Ruse region, Bulgaria. It is a continuation of our previous work [1] where we presented a statistical analysis and modelling of the level of PM10 air pollution in Ruse using data from one of two monitoring stations (station 2) in the city. Now in this paper we present statistical analysis of the level of PM10 pollutant on the basis of data from the another monitoring station (station 1). The measurements cover the period since 2015 up to now. The results from analysis and modelling of PM2.5 air pollutant are also presented and commented in the paper.


Introduction
Ruse is a city in the north-eastern part of Bulgaria on the Danube river coast, which is the north border between Bulgaria and Romania.The climate of the region of Ruse is temperate continental, characterized by cold winters and dry, warm summers.Spring and autumn are short.Recently it becomes clear that the mean values of the temperature in Ruse region are slightly goes up for the last 40 years and they are bigger than the mean temperature for Bulgaria [6].Air pollution especially by particle matter (PM10 and PM2.5), which in Ruse region is going up recently, maybe affects temperature, atmospheric pressure and humidity in the region.All this in our opinion may be a reason for the increase in average temperatures nowadays for Ruse region.That is why studies on the air pollution by particle matter is needed for better understanding of these processes and for taking adequate measures to protect environment.

Material and methods
In the town of Ruse, Bulgaria the regular measurement of air pollution starts on 24.10.2015[2,3].It is carried out by an automatic measuring station Vazrazhdane (station 1) and by a stationary station in the Zdravets -east district of the town (station 2).Since 30.09.2016 measurements are performed also by a mobile automatic station.The two stationary stations are located about 2.20 kilometers apart (the distance between them is measured by the Google Earth program) (Fig. 1).
A map of the location of the stations is published on the site [3] of the Regional Inspectorate of Environment and Water in Ruse, which shows that each one covers an area with a radius of about 1,7 km.
For this study we use official data for particle matter air pollutant PM10 from stations 1 and 2 and for PM2.5 -from station 1 [3].
We use SPSS software to analyze and model data from the measurement stations in Ruse.
The norm for PM2.5 is average annual rate under 25(µg/m 3 ).The average annual rate of PM2.5 for 2016 is 23.52, i.e. within the norm.For the one-year period 01.08.2016 -31.07.2017 that includes the last 365 days of the sample it is 26.05, i.e. above the norm.
Four of the values of PM10 and two of the measurements of PM2.5 in august 2016 are suspiciously recorded as zero.We treated them as missing as they probably are.
As it can be seen from Figure 2, the values of PM10, measured in Station 1 (blue line), Station 2 (red line) and PM2.5 (green line) in Ruse, are highly and positively correlated.The Pearson product-moment correlation coefficients for the three variables: PM10 at Station 1 (PM10_s1), PM2.5 (measured only at Station 1, PM2_5) and PM10 at Station 2 (PM10_s2) are given in Table 2.All are statistically significant (p < 0.0005).The distributions of PM10 at Stations 1 and PM2.5 are probably unimodal as can be seen in Figure 3.The Kolmogorov -Smirnov and Shapiro -Wilk tests for normality are both significant (p < 0.0005) so we have to reject the hypotheses of normality of the distributions of PM10 and PM2.5.

Analysis and modelling of PM10 values measured at Station 1
In this section we will look in more detail at the values of the dangerous air pollutant PM10 (µg/m 3 ) measured in Station 1 in Ruse for the period 24.10.2015-31.07.2017.The study covers 647 days.Values for 584 days were measured, i.e. for 90.26% of the days, and the values for 63, i.e. for 9.74% of them are missing.
The sample quartiles are Q1=27.10;Q2=35.70 and Q3=50.75.All 41 outliers lie above 86.225,i.e. 1.5*(Q3-Q1) from the third quartile Q3; 14 of them can be classified as extreme -they exceed 121.70, i.e. 3*(Q3-Q1) from the third quartile Q3.The outliers are distributed only in winter months: 21 in January (8 extreme), 9 in February (3 extreme), 6 in November (3 extreme) and 5 in December (0 extreme).Almost all values above the norm, in particular 118 out of 146, are measured in the winter months: 32 in January, 24 in February, 10 in March, 21 in November and 31 in December.In fact, 57.14% of the valid measurements in January are above the norm.For February, November and December these percentages are 42.11%,35% and 53.45% respectively.The monthly averages of PM10 are shown on Figure 4.Although seasonality is clearly expressed, the observation period is not long enough to examine seasonality with respect to the months of the year.
Figure 5 presents the mean values of PM10 for weekdays.The analysis of variance performed did not show statistically significant (p = 0.94) differences in the means over the days of the week.We will model the time series of values of PM10 at Station 1 using the method of ARIMA (Auto-Regressive Integrated Moving Average) as one of the better known shortterm forecasting methods (see [4,5]).We replace the 63 missing values using linear interpolation.Because in the future it is expected new extreme values to be observed, it is not appropriate to exclude the 41 outliers from the study.
The best reported model using Expert modeler mode in Forecasting procedure (Create Traditional Models) of SPSS gives the following autoregressive ARIMA(1,0,2)( and run again the Expert modeler, which gives a model without outliers, but having identical characteristics with Model 1, i.e. still not adequate.Undoubtedly, the natural logarithm transformation is necessary, because the variance of the time series is obviously non-stationary (see Fig. 3).Let us denote by Zt, Zt = ln(Yt), the transformed series of the original data Yt.The new series Zt are shown on Figure 6, it has already stationary variance.
The autocorrelation (ACF) and partial autocorrelation (PACF) functions of Zt are given on Figure 7.The autoregressive part of the model, i.e. the AR component is probably 1, since the PACF "cuts off" at lag 1.In [1] we show that the best model for PM10, measured at Station 2 in the period 24.10.2015-25.05.2017, is the AR(1) model for the natural logarithm transformed series Z't of the original data, 1 3.544 0.749 Analyzing the charts of the ACF and PACF of the residuals of this model (Figure 8), we see that at lags 2, 10 and 14 they are outside of the defined limits.The graphics of ACF on Figure 7 shows that the autocorrelations begin to decay toward zero only after the lag 13 (13=14-1).Based on these considerations, it can be assumed that a model ARIMA(1,0,14) with MA terms only at lags 2, 10 and 14 could be appropriate for describing the time series Zt.Really, when we start Expert modeler again, but now with the option to consider only no seasonal models, it reports this model as the best one.
The obtained Model 4 has the same number of estimated parameters as Model 1, but is already adequate.It has better characteristics than Model 1 at all points.To get rid of the additive outlier on 05.01.2017 we use again the intervention variable I in (1) and get the following Model 5: The Model 5 has identical characteristics with Model 4, but it does not contain outliers.Stationary R-squared equals to 0.607 indicating that the model explains 61% of the variation in the transformed series Zt of PM10 at Station 1.The forecasting for the next 10 days is given in table 3 below.The root-mean-square error for the next 7 days is 2.95 and for the next 10 days is 6.35.
Figure 9 shows the plot of the observed and fitted values of PM10 by Model 5.It can be well seen that the model gives good results and can be used for prediction of PM10 pollution in Ruse.

Analysis and modelling of PM2.5 values
The best reported model using Expert modeler is the ARIMA(1,0,2) model: The forecast of Model 7 for the next 10 days is given in The root-mean-square error of the prediction for the available real values of PM2.5 (the values for the period 07 -09 august are missing) for Model 7 is 3,555 and for Model 8 -2,430.Figure 10

Conclusions
The most dangerous air pollutants (PM10 and PM2.5) for Ruse region in Bulgaria measured in the period 24.10.2015-25.05.2017 by two different measuring stations were statistically analyzed in the paper.The measurements of PM10 show that during winter time the concentration of PM10 is above the norm, reaching a maximum value of 217.30 (µg/m 3 ) at norm 50 (µg/m 3 ) for the average daily rate.Next to examine better these air pollutants we create ARIMA statistical models of PM10 and PM2.5 measured by station1.The values of PM10 and PM2.5 given by the models are in a good agreement with the measurements.
The main atmospheric characteristics for Ruse region (temperature, atmospheric pressure and humidity) measured for more than 40 year period were examined in the paper [6].For the period 1988 -2012 the average temperature for Ruse region is higher than the mean temperature for Bulgaria.At the same time in Ruse region it is observed that water of river Danube and the air pollution increased [7,8,9].Obviously the atmospheric characteristics temperature, atmospheric pressure and humidity are interrelated and maybe they are correlated with the air pollution.Especially strong is the air pollution during the autumn-winter season with particulate matter in the last decade [1].The reasons for this are not only transportation and industry but the use of solid fuel heating.Air pollution maybe affects temperature, atmospheric pressure and humidity and this influence may cause the climate change in the Ruse region.This influence has to be studied in details further.This paper contains results of the work on project No 2017 -FNSE -05 and No 2017 -AIF -3 financed by "Scientific Research" Fund of Ruse University.

Fig 2 .
Fig 2 .Locations of the atmospheric air monitoring stations in the town of Ruse, Bulgaria [3]

Fig. 6
Fig. 6 Time series plot of Zt

Fig. 9
Fig. 9 Observed (red line) and fitted values (blue line) of PM10 at Station 1 shows the plot of the observed and fitted values of PM2.5 by Model 8.

Table 1 .
Descriptive statistics for pollutants monitored at Station 1

Table 2 .
Correlations between the three factors out of norm in Ruse Box test for randomness of the residuals we have evidence to conclude that the fitted Model 1 is not adequate.The model determines the observation on 05.01.2017 as an additive outlier, i.e. it corresponds to a surprisingly large (in our case) or small value, but the subsequent observations are unaffected by it.We introduce in the model one intervention variable I, MATEC Web of Conferences 145, 01010 (2018) https://doi.org/10.1051/matecconf/201814501010NCTAM 2017  is the error term at time t for the transformed series Zt of PM10 at Station 1.The model is not adequate.It identifies the observation on 05.01.2017 as an Additive outlier.
where t   is a shock, innovation, or random error at time t .This model has 3 outliers: Innovational at 12.01.2016,Additive at 15.12.2016 and Level Shift at 07.04.2017.We used three intervention variables The model is adequate since the Ljung-Box statistics is not significant at alpha = 0.05.Four of the observations on PM2.5 are identified as outliers: Transient at 1