Multisite daily precipitation simulation in Singapore

Stochastic precipitation simulation is of great importance for the design and operation of water infrastructures. The objective of this research is to develop a stochastic simulation method for daily precipitation. Daily precipitation generation needs special treatment because of many zero values appearing due to dry days. It is implemented for 26 rain gauge stations located in Singapore. This research follows three steps. First, a hidden autoregressive (AR) model is fitted to time series data at each gauging station using a power transformation. Zero precipitation amounts are treated as censored values of the power-transformed Gaussian process. The hidden AR has four parameters: mean, autocorrelation, power transformation, and variance of error. Second, a conditional multivariate Gaussian distribution is fitted to residuals of the AR models and used to fill in censored values corresponding to errors of the AR at dry events. Third, stochastic simulations from the created spatial-temporal model are carried out. Single and multi-site statistical characteristics such as empirical distribution function, cross-correlation coefficient and entropy are used for evaluation of the model. The results of this research show that the developed model produces synthetic precipitation amounts having statistical characteristics very similar to the observed ones.


Introduction
Stochastic precipitation simulation is of great importance for the design and operation of water infrastructure projects because precipitation is one of the key inputs to the hydrologic systems analysis required at every stage of a project, including planning, design, operation, and monitoring. The basic idea of a precipitation generator is to produce long synthetic precipitation series that preserve important statistical characteristics of the observed data. At a single station these include the mean, variance, median, minimum and maximum values, and important quantiles of rainfall amounts. Across multiple locations they include the interdependence between stations, namely the Pearson and Spearman rank correlations of precipitation values. Recently, entropy-based measures have also been used to evaluate the performance of precipitation simulations [2,6]. Entropy provides a measure of the dispersion, uncertainty, disorder and diversification of precipitation [5].
Developing a daily precipitation model is more demanding and needs special treatment because daily precipitation has unique characteristics, including zero-inflated data due to dry days. Unlike annual and monthly precipitation amounts, which can be modelled with simple autoregressive moving average (ARMA) processes, daily precipitation amounts cannot be modelled directly by an ARMA model [2,3]. The main difficulty arises from the intermittency of precipitation in both space and time. The frequent dry days can be modelled by a discrete distribution and the rainfall amounts on rainy days at a selected location by a continuous distribution [1], but the continuous distribution is usually skewed.
Two additional, contrasting properties that make developing a daily precipitation model more challenging are the long memory visible in the autocorrelation function and the sudden changes in the series [3]. Long memory refers to a non-negligible dependence between distant observations in a time series [7]. Precipitation intensities themselves follow an exponential decay, which indicates that a precipitation series can change rapidly.
The purpose of this study is to develop daily precipitation models at multiple locations that incorporate both temporal and spatial dependencies. Zero precipitation values on dry days are treated as latent variables and estimated using a Markov chain Monte Carlo (MCMC) approach. The models are implemented in Singapore. The paper is divided into six sections. After the introduction, the spatial and temporal model developments are described in section 2. Section 3 presents the data and study location. Section 4 explains the simulation steps. The empirical findings of the investigation in Singapore are discussed in section 5. Finally, a summary of the results is given in section 6.

Temporal model
A lag-1 autoregressive model, AR(1), is used to model daily precipitation occurrence and amount simultaneously. The general equation of the model is
$$Y_t = \mu + r\,Y_{t-1} + \varepsilon_t \tag{1}$$
where $Y_t$ is daily precipitation at time $t$, $Y_{t-1}$ is daily precipitation at time $t-1$, $\mu$ is the mean parameter, $r$ is the autocorrelation, and $\varepsilon_t$ is the residual error of the AR(1) process at time $t$. The residual errors $\varepsilon_t$ are independent, identically and normally distributed (a white noise process). Moving $\varepsilon_t$ to the left-hand side and $Y_t$ to the right-hand side, equation (1) can be rewritten as
$$\varepsilon_t = Y_t - (\mu + r\,Y_{t-1}) \tag{2}$$
Since $\varepsilon_t$ follows the normal distribution, its probability density function (PDF) is the normal density
$$f(\varepsilon_t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right) \tag{3}$$
where $\sigma^2$ is the variance of the residual error $\varepsilon_t$, and $\varepsilon_t$ is the residual error random variable, which is equal to $Y_t - (\mu + r\,Y_{t-1})$ and has mean zero. Substituting $Y_t - (\mu + r\,Y_{t-1})$ for $\varepsilon_t$, equation (3) becomes
$$f(Y_t \mid Y_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(Y_t - (\mu + r\,Y_{t-1})\right)^2}{2\sigma^2}\right) \tag{4}$$
Since the daily precipitation $Y_t$ follows the linear AR(1) model with white-noise errors, $Y_t$ is itself a Gaussian process. In reality, daily precipitation values have a skewed distribution characterized by zero-inflated data. To model the zeroes in the AR(1) process, a latent variable approach is employed: zeroes are treated as censored realizations from the negative part of the distribution, whereas non-zero precipitation values are treated as the positive realizations of the random variable. For the observed positive daily precipitation ($Y_t > 0$), the likelihood function derived from the density in equation (4) can be expressed as
$$L(\mu, r, \sigma^2) = \prod_{t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(Y_t - (\mu + r\,Y_{t-1})\right)^2}{2\sigma^2}\right) \tag{5}$$
Since negative values of $Y_t$ are treated as censored values, the likelihood function is modified to include them as unknowns. The likelihood function is therefore used not only for parameter estimation but also for filling in the censored values $Y_k$, for which it is only known that $Y_k \le 0$. These correspond to days on which no rain was observed.
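As an illustration of the AR(1) recursion and its Gaussian density, the following minimal Python sketch simulates a latent series and evaluates the corresponding log-likelihood. The function names, the starting value `y0` and the parameter values are hypothetical choices for illustration and are not taken from the study; censoring of non-positive values is not handled here.

```python
import math
import random


def simulate_ar1(n, mu, r, sigma, y0, seed=0):
    """Simulate n values of Y_t = mu + r * Y_{t-1} + eps_t with Gaussian white noise."""
    rng = random.Random(seed)
    y_prev = y0
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)   # white-noise residual with variance sigma^2
        y_t = mu + r * y_prev + eps   # AR(1) recursion
        series.append(y_t)
        y_prev = y_t
    return series


def ar1_log_likelihood(series, y0, mu, r, sigma):
    """Sum over t of log N(Y_t; mu + r * Y_{t-1}, sigma^2), conditional on y0."""
    total = 0.0
    y_prev = y0
    for y_t in series:
        eps = y_t - (mu + r * y_prev)  # residual implied by the AR(1) equation
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - eps ** 2 / (2.0 * sigma ** 2)
        y_prev = y_t
    return total
```

For example, `ar1_log_likelihood(simulate_ar1(1000, 0.2, 0.5, 1.0, 0.4), 0.4, 0.2, 0.5, 1.0)` evaluates the log-likelihood of a simulated series at the parameters used to generate it. In this parameterisation the long-run mean of the series is `mu / (1 - r)`.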

Power transformed AR (1) model
Daily precipitation amounts follow a skewed distribution rather than the normal distribution [1]. The skewed precipitation amounts are transformed to normality using a power transformation with parameter $\beta$. This power transformation is applied only to positive rainfall values ($Z_t > 0$). In the following, rainfall values are denoted by $Z_t$, whereas $Y_t$ denotes the underlying transformed, normalised AR(1) process. The transformed rainfall amounts are then plugged into the AR(1) model.
The probability density function of each residual error $\varepsilon_t$, expressed as a function of daily precipitation $Z_t$, changes to equations 8 and 9 because of the $\beta$ power transformation. The likelihood function of equations 8 and 9 can be expressed as shown in equation 10. The likelihood function is used for estimating the parameters on the basis of the rainfall values $Z_t > 0$ and the latent variables $Y_t$ corresponding to $Z_t = 0$.
Here $N_{neg} = \{k_1, \ldots, k_J\}$ denotes the times at which no rain was observed ($Z_{k_1} = \ldots = Z_{k_J} = 0$). This likelihood function is used both for parameter estimation and for filling in censored daily precipitation, by including the latent variables $Y_{k_1}, \ldots, Y_{k_J}$ corresponding to the dry days as arguments of the likelihood function.
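The role of the latent variables can be sketched in Python as follows. This is a hedged illustration, not the study's implementation: it assumes the transformation has the form $Y_t = Z_t^{1/\beta}$ for wet days (inferred from the simulation step that raises positive $Y_t$ to the power $\beta$), conditions on the first value of the series, and uses hypothetical function names. The Jacobian term follows from the standard change of variables under that assumed form.

```python
import math


def to_latent(z, beta):
    """Map rainfall to the Gaussian scale (assumed form Y = Z**(1/beta)); dry days are censored."""
    return z ** (1.0 / beta) if z > 0 else None


def to_rainfall(y, beta):
    """Map a latent value back to rainfall; non-positive latent values become dry days."""
    return y ** beta if y > 0 else 0.0


def augmented_log_likelihood(z, latent, mu, r, sigma, beta):
    """Log-likelihood of rainfall z given imputed latent values for the dry days.

    z      : list of observed rainfall amounts (0 on dry days)
    latent : dict {time index: imputed value Y_k <= 0} for every dry day
    """
    y = []
    for t, z_t in enumerate(z):
        if z_t > 0:
            y.append(z_t ** (1.0 / beta))
        else:
            if latent[t] > 0:
                return -math.inf  # a positive latent value contradicts a dry day
            y.append(latent[t])
    total = 0.0
    for t in range(1, len(y)):
        eps = y[t] - (mu + r * y[t - 1])
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - eps ** 2 / (2.0 * sigma ** 2)
        if z[t] > 0:
            # Jacobian of Y = Z**(1/beta) with respect to Z on wet days
            total += math.log(1.0 / beta) + (1.0 / beta - 1.0) * math.log(z[t])
    return total
```

An MCMC sampler of the kind described in the text would evaluate a function like this repeatedly while proposing new values for the parameters and for the entries of `latent`.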

Spatial Gaussian model
The residual errors obtained from the AR(1) time series models are modelled spatially by a multivariate Gaussian distribution. A vector-valued random variable $X$ is said to have a multivariate normal (or Gaussian) distribution with mean vector $\mu$ and covariance matrix $\Sigma$, denoted by $X \sim N_d(\mu, \Sigma)$, if its PDF is given by
$$f(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
where $\Sigma$ is a $d \times d$ symmetric, positive definite covariance matrix, $\mu$ is the expected value, $d$ is the dimension of the multivariate Gaussian model, and $T$ is the matrix transpose operator. The likelihood function is used to estimate the parameters $\mu$ and $\Sigma$. Given an independent and identically distributed (iid) sample of random vectors $x_1, \ldots, x_n$, the likelihood of the sample, assuming the data are normally distributed, is
$$L(\mu, \Sigma) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)\right)$$
where $n$ is the length of the data series and $\Sigma^{-1}$ is the inverse of the covariance matrix, also called the concentration or precision matrix. A zero mean vector and the covariance matrix $\Sigma$ are used to model the residuals of the fitted individual AR(1) models with the multivariate Gaussian distribution. Some of these residual errors are censored, since they come from censored (non-positive) daily precipitation. To fill in the censored residual errors, the conditional normal distribution is adopted:
$$X_1 \mid X_2 = x_2 \sim N\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$$
where $X_1$ is the vector of censored residual errors, $X_2$ is the vector of true residual errors, $\mu_1$ is the mean of the censored residual errors, $\mu_2$ is the mean of the true residual errors, $\Sigma_{11}$ is the covariance matrix of the censored residual errors, $\Sigma_{12}$ is the covariance matrix between the censored and the true residual errors, $\Sigma_{21}$ is the covariance matrix between the true and the censored residual errors, and $\Sigma_{22}$ is the covariance matrix of the true residual errors.
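For two stations the conditional normal distribution reduces to scalar formulas, which the following Python sketch illustrates. It is a simplified example under stated assumptions: station 1 is censored, station 2 is observed, and the residual means are zero. The censoring bound `upper` and the use of plain rejection sampling are illustrative substitutes for the MCMC procedure used in the study, and all function names are hypothetical.

```python
import random


def conditional_normal_2d(x2, mu1, mu2, s11, s12, s22):
    """Mean and variance of X1 | X2 = x2 for a bivariate normal distribution."""
    cond_mean = mu1 + s12 / s22 * (x2 - mu2)
    cond_var = s11 - s12 ** 2 / s22
    return cond_mean, cond_var


def impute_censored(x2, s11, s12, s22, upper, rng, max_tries=10000):
    """Draw X1 | X2 = x2 subject to X1 <= upper, using simple rejection sampling."""
    mean, var = conditional_normal_2d(x2, 0.0, 0.0, s11, s12, s22)
    sd = var ** 0.5
    for _ in range(max_tries):
        draw = rng.gauss(mean, sd)
        if draw <= upper:  # keep only draws consistent with the censoring bound
            return draw
    raise RuntimeError("no draw satisfied the censoring bound")
```

With unit variances and covariance 0.8, observing `x2 = 1.0` gives a conditional mean of 0.8 and a conditional variance of about 0.36, so the observed neighbour narrows the range of plausible values for the censored residual.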

Data and study location
The models are implemented in Singapore using daily precipitation amounts measured simultaneously at 26 stations over the period 1980-2010 (31 years). The stations cover a wide range of inter-station distances, from 1.9 to 38.7 km, as presented in Fig. 1.

Spatial model
The residual errors from the marginal AR(1) time series models are modelled spatially using a multivariate Gaussian distribution with a zero mean vector and covariance matrix $\Sigma$. The residual errors include censored values that arise from the censored data in the AR(1) time series. Hence, a conditional multivariate normal distribution is applied to impute the censored residual errors based on the true residual errors. The MCMC approach is used both to estimate the covariance matrix $\Sigma$ and to fill in the censored residual errors. The detailed steps for fitting the spatial model are as follows.
1. For each station, transform the observed daily precipitation $Z_t$ into normal variates $Y_t$ using the estimated power transformation parameter $\beta$, as in equation 7.
2. Simulate $N$ synthetic daily precipitation values using the fitted AR(1) temporal model, as in equation 1. $N$ is the length of the observed daily precipitation series minus one, because the first observation is taken as the initial value $Y_{t-1}$ in the first simulation step ($Y_{t=0}$).
3. If the daily precipitation observed at time $t$ is positive ($Z_t > 0$), calculate the true residual error $\hat{\varepsilon}$ by subtracting the simulated synthetic value (step 2) from the transformed observed value (step 1).
4. If the daily precipitation observed at time $t$ is non-positive ($Z_t \le 0$), initialise the censored error $\tilde{\varepsilon}$.
5. Estimate the parameters of the multivariate normal distribution (the covariance matrix $\Sigma$ and the correlation matrix) and impute the censored errors using the MCMC approach with the Metropolis-Hastings algorithm.

Model simulation
Based on the fitted temporal and spatial models, synthetic daily precipitation data can be generated. The detailed steps of the simulation are described below.

Results and discussion
To evaluate whether the model performs well, measures such as the empirical cumulative distribution function (ECDF), the cross-correlation coefficient (CCC) and an entropy-based measure [2,4] are used. A good model will reproduce synthetic daily precipitation series whose key statistical characteristics are similar to those of the observed data.

Single site evaluation
Mean daily precipitation is a basic measure for comparing simulated and observed daily precipitation. The mean is a measure of the central tendency of the data distribution. The developed model reproduces the mean daily precipitation very well at all stations. Like the mean, the standard deviation is used to assess whether simulated daily precipitation resembles observed daily precipitation. The standard deviation is defined as the positive square root of the variance, which measures the variability of the data about the mean. The model performs very well in this respect: there are no significant differences between simulated and observed daily precipitation.
The cumulative relative frequencies of both observed and simulated daily precipitation for all rain gauge stations are shown in Fig. 2. Overall, the simulated frequencies follow the behaviour of the observed ones satisfactorily. According to the Kolmogorov-Smirnov test at the 95% confidence level, there is no significant difference between the two datasets. Interestingly, for extreme values, some simulated cumulative relative frequencies lie slightly above the observed ones, indicating an underestimation of the probability of very high rainfall values. This may be due to unusual precipitation events that are influenced by many factors, for example climate change.
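The ECDF comparison can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names, not the code used in the study. The critical value uses the standard large-sample approximation for the two-sample Kolmogorov-Smirnov test, which assumes continuous data, so with many tied zero values it should be read as approximate.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return a function F with F(x) = fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)

    def cdf(x):
        return bisect_right(data, x) / n

    return cdf


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample critical value for sample sizes n and m."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))
```

The hypothesis that observed and simulated amounts share one distribution would be rejected at level `alpha` when `ks_statistic(observed, simulated)` exceeds `ks_critical_value(len(observed), len(simulated), alpha)`.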

Cross-correlation coefficient (CCC)-based evaluation
A good multi-site daily precipitation generation model should mimic the spatial correlation exhibited by observed daily precipitation. The cross-correlations between pairs of simulated daily precipitation series should be similar to those found in the real data. The correlations measure the strength of association between two continuous variables.
In this study, three different types of CCC, namely the Pearson, Spearman, and Kendall cross-correlations, are applied to evaluate the multi-site precipitation generation model. The Pearson correlation is also called the linear correlation coefficient because it measures the linear association between two variables. The Pearson correlation is not resistant to outliers because it is computed using non-resistant measures, namely means and standard deviations.
In contrast, the Spearman correlation is a rank correlation. It depends only on the ranks of the data and not on the values themselves. Thus, the Spearman correlation is more resistant to outliers than the Pearson correlation. Fig. 3 reveals that there is only a slight difference between simulated and observed daily precipitation for all CCCs. The simulated daily precipitation exhibits slightly lower cross-correlations than those observed in the data.
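The difference between the Pearson and Spearman correlations can be illustrated with a short Python sketch. The function names are hypothetical, and tied values (such as the many zeros in daily precipitation) are given average ranks, which is a common convention assumed here rather than a detail stated in the study.

```python
def pearson(x, y):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For the pair `[0, 1, 2, 50]` and `[0, 2, 3, 4]`, the Spearman correlation equals 1 because the ordering is identical, while the Pearson correlation is pulled well below 1 by the single large value, which illustrates the resistance to outliers described above.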

Entropy-based measure
Traditional measures such as the mean, variance, cross-correlations and other conventional criteria are not sufficient to evaluate a multi-site daily precipitation model. The entropy method, which measures uncertainty information and can be used to validate a daily precipitation model independently, is a breakthrough approach in hydrologic applications. Therefore, the entropy-based measure is also used for both observed and simulated daily

Fig. 2. The empirical cumulative distribution function for both observed and simulated precipitation amounts.
1. Set the number of values to simulate, $N$. For comparison purposes, this number should equal the number of observed values.
2. Transform the observed daily precipitation at time $t=1$ ($Z_{t=1}$) into the normal variate $Y$ using the estimated $\beta$ power transformation, as in equation 7. This value $Y$ is used as the starting value for the simulation ($Y_{t=0}$).
3. Generate residual errors from the fitted multivariate Gaussian distribution.
4. Simulate synthetic daily precipitation using the fitted AR(1) time series model at each station.
5. If the simulated value at time $t$ from step 4 is positive ($Y_t > 0$), raise $Y_t$ to the power of the fitted $\beta$.
6. If the simulated value at time $t$ from step 4 is non-positive ($Y_t \le 0$), set the simulated daily precipitation to zero.
7. Use the simulated $Y_t$ as the previous value $Y_{t-1}$ for the next simulation step.
8. Repeat steps 3-7 until $N$ values have been simulated.
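The simulation steps above can be sketched for two stations in plain Python. This is a minimal illustration under several assumptions that are not taken from the study: the function and parameter names are hypothetical, correlated residuals are generated with a hand-written Cholesky factor of a 2 x 2 covariance, the residual covariance already contains the error variances, and a dry first observation is given the starting latent value 0.

```python
import random


def simulate_two_sites(n, z_first, params, cov, seed=0):
    """Simulate n days of rainfall at two sites.

    z_first : first observed rainfall at each site (step 2)
    params  : two dicts with keys "mu", "r", "beta" (fitted per station)
    cov     : (s11, s12, s22), covariance of the AR(1) residuals
    """
    rng = random.Random(seed)
    s11, s12, s22 = cov
    # Cholesky factor of the 2 x 2 residual covariance matrix
    l11 = s11 ** 0.5
    l21 = s12 / l11
    l22 = (s22 - l21 ** 2) ** 0.5
    # Step 2: starting latent values (assumed 0.0 when the first day is dry)
    y_prev = [z ** (1.0 / p["beta"]) if z > 0 else 0.0 for z, p in zip(z_first, params)]
    rainfall = []
    for _ in range(n):  # step 8: repeat until n values are simulated
        # Step 3: spatially correlated residuals
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = (l11 * u1, l21 * u1 + l22 * u2)
        y_now, z_now = [], []
        for p, y0, e in zip(params, y_prev, eps):
            y = p["mu"] + p["r"] * y0 + e                     # step 4: AR(1) update
            y_now.append(y)
            z_now.append(y ** p["beta"] if y > 0 else 0.0)    # steps 5 and 6
        rainfall.append(tuple(z_now))
        y_prev = y_now  # step 7: carry the latent values forward
    return rainfall
```

Note that the latent values, including negative ones, are carried forward to the next step, so dry spells can persist through the AR(1) dependence even though the reported rainfall is zero.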