Application of Linear Moments and Uncertainty Analysis to Extreme Rainfall Events in Sabah

Linear moments (LM) have been applied in extreme rainfall studies in several countries, including China, the United States of America, and Peninsular Malaysia. In this study, the LM procedures were applied to extreme rainfall data corresponding to locations provided in the Malaysia Urban Stormwater Manual (MSMA) to derive new design rainfalls. Different record lengths were considered to assess the changes in design rainfall, and Monte Carlo simulations were carried out to compute the confidence intervals of the derived design rainfalls. Based on the Goodness-of-Fit (GoF) test results, the Generalized Extreme Value (GEV) probability distribution was chosen to derive the design rainfalls. The updated design rainfalls at all four locations showed a significant reduction relative to the MSMA values at average recurrence intervals (ARI) of 50 years and above. Neither the difference between the design rainfalls from shorter record lengths and those from the full record length, nor the confidence intervals, necessarily reduce with a longer record. In hypothetical cases where a 100-year ARI rainfall was added, the increased design rainfalls did not exceed the upper bound of the confidence intervals. The derived confidence intervals hence allow for better risk assessment, and should be considered in the design of critical structures, e.g. dams.


Introduction
Design rainfalls are often applied in the design of hydraulic structures, such as monsoon drains and detention ponds. The design rainfalls for Sabah in both MSMA [1] and HP No. 26 [2] are currently based on rainfall data prior to 1980. The main aim of this study is to update the design rainfalls in Sabah using LM procedures. The secondary aims are to assess the differences in the design rainfalls and confidence intervals derived from different record lengths, and to determine whether the derived confidence interval is sufficient to accommodate the presence of outliers.

Literature Review
The first stage in extreme rainfall studies is the compilation of extreme rainfall events, using either the Annual Maxima (AM) or Peak-Over-Threshold (POT) approach. The POT approach ensures that other extreme rainfall events apart from the annual maxima are included [3,4], but suffers from the subjective selection of a suitable threshold [5,6] and potential correlation between extreme events [7]. In contrast, the AM approach is simple, and hence remains popular in extreme rainfall studies [7].
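The difference between the two compilation approaches can be illustrated with a minimal sketch. The record format (pairs of year and rainfall depth), the depths themselves, and the 60 mm threshold are all hypothetical and are not taken from this study; no declustering is applied to the POT series, so the correlation issue noted above is not addressed here.

```python
from collections import defaultdict

def annual_maxima(records):
    """Return {year: largest rainfall depth} from (year, depth) pairs."""
    maxima = defaultdict(float)  # depths are non-negative, so 0.0 is a safe start
    for year, depth in records:
        maxima[year] = max(maxima[year], depth)
    return dict(maxima)

def peaks_over_threshold(records, threshold):
    """Return every depth that exceeds the threshold, in record order."""
    return [depth for _, depth in records if depth > threshold]

# Hypothetical 1-hour rainfall depths (mm); not data from this study.
records = [(2001, 42.0), (2001, 67.5), (2001, 61.0),
           (2002, 38.5), (2002, 44.0),
           (2003, 72.0), (2003, 55.5)]

am_series = annual_maxima(records)                 # exactly one value per year
pot_series = peaks_over_threshold(records, 60.0)   # zero or more values per year
```

In this illustration the year 2001 contributes two events to the POT series while 2002 contributes none, whereas the AM series always holds one value per year.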
Common probability distributions considered in extreme rainfall studies include the Gumbel, Generalized Extreme Value (GEV), Generalized Pareto (GPA), and Pearson III (P3) distributions. For the AM series, the GEV is the limiting distribution [8], with Gumbel being a special case of the GEV. Similarly, the POT series converges to the GPA [9]. P3 has been considered the most suitable distribution in several extreme rainfall studies [10,11]. The distributions are often fitted to the rainfall data by estimating the parameters via statistical procedures, such as Maximum Likelihood Estimation and Linear Moments (LM).
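For reference, the relationship between the GEV and Gumbel distributions can be written out explicitly. The parametrisation below, with location ξ, scale α, and shape k, follows the convention commonly used with L-moments and is an assumption here, since the text does not state which convention was adopted; other sources use a shape parameter of opposite sign.

```latex
F(x) =
\begin{cases}
\exp\left\{ -\left[ 1 - \dfrac{k\,(x - \xi)}{\alpha} \right]^{1/k} \right\}, & k \neq 0 \quad \text{(GEV)} \\[2ex]
\exp\left\{ -\exp\left[ -\dfrac{x - \xi}{\alpha} \right] \right\}, & k = 0 \quad \text{(Gumbel)}
\end{cases}
\qquad
x_T = \xi + \frac{\alpha}{k} \left\{ 1 - \left[ -\ln\left( 1 - \frac{1}{T} \right) \right]^{k} \right\}
```

Here x_T is the design rainfall for an ARI of T years under an AM series, obtained by setting F(x_T) = 1 - 1/T.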
The LM procedures were first published by Hosking [12], and are generally viewed as robust to outliers as no squaring or cubing of the variables is involved [13,14]. The LM procedures also produce only small bias in small samples [15], and have since been applied in several countries, including the USA, China, and Malaysia [3,16,17,18,19]. The design rainfalls in Peninsular Malaysia were updated using the LM procedures [20].
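The linearity referred to above can be seen in a minimal sketch of the sample L-moment calculation, based on the standard unbiased probability-weighted moment estimators. The function name and return layout are illustrative choices, not taken from the cited works.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): mean, L-scale, L-skewness, L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x):  # j = 0 for the smallest observation
        # Each observation enters only to the first power, weighted by its rank.
        b0 += xj
        b1 += xj * j / (n - 1)
        b2 += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2
```

Because every term is a rank-weighted sum of the observations themselves, a single large value shifts the estimates far less than it would shift the conventional skewness or kurtosis.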
GoF tests are used to assess the validity of the fitted distributions in representing the extreme rainfall events. The Kolmogorov-Smirnov (k-s) and Anderson-Darling (AD) tests perform better than the Chi-Squared test [21], with the AD test deemed more powerful as it places more emphasis on the tail ends of the distribution [22]. The critical values for the k-s and AD tests change based on the theoretical probability distribution considered and the number of parameters being estimated, with various works [23][24][25][26] published on this matter. GoF tests have been used to determine the best-fitting probability distribution in studies where multiple distributions are considered [27,28]. GoF test results should however be treated with caution, as the results depend not only on the choice of probability distribution, but also on the fitting procedures [4].
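As an illustration of why the AD test is more sensitive at the tails, the sketch below computes the unmodified k-s and AD statistics in their textbook forms for any fitted cumulative distribution function. The function names are illustrative, and the sample-size modifications and critical values from the cited works are not included.

```python
import math

def ks_statistic(data, cdf):
    """Largest gap between the empirical and the fitted CDF."""
    x = sorted(data)
    n = len(x)
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(x))
    d_minus = max(cdf(v) - i / n for i, v in enumerate(x))
    return max(d_plus, d_minus)

def ad_statistic(data, cdf):
    """Unmodified Anderson-Darling statistic A^2."""
    x = sorted(data)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        # The logarithms grow rapidly as F approaches 0 or 1,
        # which is what gives the tails their extra weight.
        total += (2 * i - 1) * (math.log(cdf(x[i - 1]))
                                + math.log(1.0 - cdf(x[n - i])))
    return -n - total / n
```

Both functions take the raw data and a callable returning F(x); the fitted distribution must give values strictly between 0 and 1 at every observation for the AD logarithms to be defined.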
In recent years, several extreme events deemed highly improbable based on standard design rainfall procedures have occurred, such as the 2003 storm at the Panama Canal [8] and the 1999 storm in Venezuela [29]. This indicates the importance of incorporating the uncertainty inherent in the derivation of design rainfalls via the construction of confidence intervals, which would allow the design risks to be properly assessed [12].

Method and Material
Rainfall data from the four (4) stations highlighted in Fig. 1 were considered in this study owing to their close proximity to the design rainfall locations in MSMA [1], with the record lengths listed in Table 1. The 'Base' scenario represents the full data record available. The maximum annual 1-hour rainfall was extracted from the existing rainfall records, with the L-moments and L-moment ratios computed using equations by Hosking [30]. L-moment estimators were applied for probability distribution fitting, considering the Gumbel, GEV, P3, GPA, Normal, and Exponential (Exp) probability distributions. The k-s and AD tests were applied to assess the GoF of the fitted probability distributions for each scenario. The k-s test statistic, D*, and the AD test statistic, A*, were computed using equations published by Stephens and others [23][24][25][26], as listed below, where n is the sample size, F(x) is the cumulative distribution function of the probability distribution, and x_i is the ordered data.
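For the GEV case, fitting by L-moments can be sketched as follows. The sketch uses Hosking's well-known rational approximation for the shape parameter and his sign convention for k; whether the study used this exact approximation is an assumption, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def fit_gev_lmom(l1, l2, t3):
    """Return GEV (xi, alpha, k) from the mean, L-scale, and L-skewness."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c       # approximate shape parameter
    if abs(k) < 1e-6:                     # Gumbel limit of the GEV
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k

def gev_quantile(p, xi, alpha, k):
    """Value with non-exceedance probability p under the fitted GEV."""
    y = -math.log(p)
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k

def design_rainfall(ari_years, xi, alpha, k):
    """Design rainfall for an annual-maxima series at the given ARI."""
    return gev_quantile(1.0 - 1.0 / ari_years, xi, alpha, k)
```

The inputs are the sample L-moments of the annual maxima, so the fit needs no iteration, which is one reason the procedure is practical in a spreadsheet.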
For the purposes of GoF testing, the P3 distribution was treated as an extreme value distribution. Critical values published by Stephens and others [23][24][25][26] were referred to in this study, and are listed in Tables 2 and 3. The distribution with the lowest test statistic value was deemed the best fit. The design rainfalls at 10-, 50-, and 100-year ARI were then extracted from the best-fit probability distribution, with the 95% confidence intervals constructed using the Monte Carlo procedures outlined by Silva et al. [31]. The differences of the design rainfalls in the various scenarios relative to the 'Base' scenario and the root mean square error (RMSE) of the Monte Carlo simulations were subsequently calculated and assessed.
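One common way to build such intervals is a parametric Monte Carlo simulation: synthetic records of the same length are drawn from the fitted distribution, each is re-fitted, and percentiles of the resulting design rainfalls give the bounds. The sketch below follows that generic scheme for the GEV; it is not claimed to reproduce the procedure of Silva et al. [31], and the rainfall values, seed, and function names are hypothetical.

```python
import math
import random

def l_moments(data):
    """Mean, L-scale, and L-skewness from unbiased PWMs."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(v * j / (n - 1) for j, v in enumerate(x)) / n
    b2 = sum(v * j * (j - 1) / ((n - 1) * (n - 2)) for j, v in enumerate(x)) / n
    l2 = 2 * b1 - b0
    return b0, l2, (6 * b2 - 6 * b1 + b0) / l2

def fit_gev(data):
    """GEV (xi, alpha, k) by L-moments, using Hosking's shape approximation."""
    l1, l2, t3 = l_moments(data)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        k = 1e-6  # avoids 0/0; numerically indistinguishable from Gumbel
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    return l1 - alpha * (1.0 - g) / k, alpha, k

def gev_quantile(p, xi, alpha, k):
    return xi + alpha * (1.0 - (-math.log(p)) ** k) / k

def monte_carlo_ci(data, ari, n_sim=1000, level=0.95, seed=1):
    """Return (estimate, lower, upper) for the design rainfall at the given ARI."""
    rng = random.Random(seed)
    p = 1.0 - 1.0 / ari
    params = fit_gev(data)
    estimate = gev_quantile(p, *params)
    sims = []
    for _ in range(n_sim):
        # Synthetic record of the same length, drawn from the fitted GEV.
        sample = [gev_quantile(min(max(rng.random(), 1e-12), 1.0 - 1e-12), *params)
                  for _ in data]
        sims.append(gev_quantile(p, *fit_gev(sample)))
    sims.sort()
    tail = (1.0 - level) / 2.0
    lower = sims[int(tail * n_sim)]
    upper = sims[int((1.0 - tail) * n_sim) - 1]
    return estimate, lower, upper

# Hypothetical annual maximum 1-hour rainfall (mm); not data from this study.
annual_max = [58.0, 72.5, 64.0, 81.5, 69.0, 55.5, 90.0, 77.5, 62.0, 66.5,
              84.0, 59.5, 73.0, 95.5, 68.0, 61.0, 79.0, 70.5, 88.0, 65.0,
              74.5, 57.0, 83.0, 67.5, 76.0]
estimate, lower, upper = monte_carlo_ci(annual_max, ari=100)
```

The upper value plays the role of the upper bound used for comparison later in the study, and the spread of the simulated values around the estimate is what an RMSE would summarise.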
Normal distribution: $A^* = A^2 \left(1 + 4/n + 25/n^2\right)$
Exponential distribution: $A^* = A^2 \left(1 + 0.6/n\right)$
Gumbel, GEV, and P3 distributions: $A^* = A^2 \left(1 + 0.2/n^{1/2}\right)$
Here $A^2$ denotes the unmodified AD statistic.
To further assess the confidence intervals, a 100-year ARI event derived from the fitted probability distribution was added to the original data series, and alternative 'outlier' design rainfalls at 10-, 50-, and 100-year ARI were derived by re-fitting the same probability distribution to the modified data series. The 'outlier' design rainfalls were then compared against the upper bound of the confidence intervals. All the procedures above were carried out using Microsoft Excel, which is capable of carrying out the required statistical analysis.
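The outlier check described above can be sketched for the GEV as follows. The study itself used Microsoft Excel; this Python version is only an illustration under the same assumptions as a standard L-moment GEV fit (Hosking's shape approximation and sign convention), with hypothetical rainfall values and function names.

```python
import math

def l_moments(data):
    """Mean, L-scale, and L-skewness from unbiased PWMs."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(v * j / (n - 1) for j, v in enumerate(x)) / n
    b2 = sum(v * j * (j - 1) / ((n - 1) * (n - 2)) for j, v in enumerate(x)) / n
    l2 = 2 * b1 - b0
    return b0, l2, (6 * b2 - 6 * b1 + b0) / l2

def fit_gev(data):
    """GEV (xi, alpha, k) by L-moments, using Hosking's shape approximation."""
    l1, l2, t3 = l_moments(data)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        k = 1e-6  # avoids 0/0; numerically indistinguishable from Gumbel
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    return l1 - alpha * (1.0 - g) / k, alpha, k

def gev_quantile(p, xi, alpha, k):
    return xi + alpha * (1.0 - (-math.log(p)) ** k) / k

def outlier_design_rainfalls(data, added_ari=100, aris=(10, 50, 100)):
    """Return (original, outlier) design rainfalls keyed by ARI in years."""
    base = fit_gev(data)
    # The added event is the design rainfall of the original fit itself.
    added_event = gev_quantile(1.0 - 1.0 / added_ari, *base)
    refit = fit_gev(list(data) + [added_event])
    original = {t: gev_quantile(1.0 - 1.0 / t, *base) for t in aris}
    outlier = {t: gev_quantile(1.0 - 1.0 / t, *refit) for t in aris}
    return original, outlier

# Hypothetical annual maximum 1-hour rainfall (mm); not data from this study.
annual_max = [58.0, 72.5, 64.0, 81.5, 69.0, 55.5, 90.0, 77.5, 62.0, 66.5,
              84.0, 59.5, 73.0, 95.5, 68.0, 61.0, 79.0, 70.5, 88.0, 65.0,
              74.5, 57.0, 83.0, 67.5, 76.0]
original, outlier = outlier_design_rainfalls(annual_max)
```

Each value in `outlier` would then be compared with the corresponding upper confidence bound; the bound is considered sufficient when the outlier value does not exceed it.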

Goodness-of-Fit
A summary of the best fit for each scenario and station based on the GoF tests is listed in Table 4, ranked in ascending order. The GEV distribution provides the overall best performance in both the k-s and AD tests, and was subsequently used in the derivation of the design rainfalls and confidence intervals. The Exp distribution was the worst performing, followed by the GPA distribution, with over half deemed as bad fits in the AD test. Although the Normal distribution shows good performance in the k-s test, its overall ranking falls behind both Gumbel and P3 in the AD test, indicating a poorer fit for the Normal distribution at the tail ends. The P3 distribution slightly outperforms the Gumbel distribution, but has more bad fits in the AD test.

Design Rainfalls and Upper Bounds
The design rainfalls and the upper bounds based on the existing rainfall records are listed in columns 'Ori' and 'UB' in Table 5. The 'outlier' design rainfalls based on the addition of the 100-year ARI event to the data series are listed in column 'Out' in the same table. The derived design rainfalls are similar to the MSMA design rainfalls only at 10-year ARI, becoming lower at ARIs of 50 years and above. The 'outlier' design rainfalls are higher owing to the addition of the 100-year ARI event to the data series. The upper bounds were sufficient to cover the 'outlier' design rainfalls at all rainfall stations.

Changes in Design Rainfall and Uncertainty
Changes in the design rainfall and the inherent uncertainty of the design rainfalls were quantified using the relative difference and RMSE, as listed in Table 6. The relative differences compared to the base scenario are generally not significant, with only several cases exceeding 10% at Inanam (scenario 3), and Kalabalakan (scenarios 1 & 2). There is no clear trend of reduction in both the relative difference and RMSE in relation to an increase in record length, with different trends occurring at each station. It may be surmised that longer record lengths do not necessarily lead to a reduction in design rainfall changes and its inherent uncertainty.
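The two measures used in this comparison can be sketched as below. The text does not give their exact formulas, so the definitions here are assumptions: the relative difference is taken as a percentage of the base-scenario design rainfall, and the RMSE is taken over the simulated design rainfalls about the fitted estimate.

```python
import math

def relative_difference(scenario_value, base_value):
    """Percentage difference of a scenario design rainfall from the base value."""
    return 100.0 * (scenario_value - base_value) / base_value

def rmse(simulated, reference):
    """Root mean square error of simulated values about a reference value."""
    return math.sqrt(sum((s - reference) ** 2 for s in simulated) / len(simulated))
```

Under these definitions, a scenario design rainfall of 88 mm against a base value of 80 mm gives a relative difference of 10%, which is the threshold referred to in the discussion above.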

Conclusion
The updated design rainfalls allow engineering structures to be designed with more recent data, and extend the range beyond 50-year ARI, which was previously not available in the existing design rainfalls. Derivation of design rainfalls using the L-moments procedure may be applied even to record lengths of 10 years, with generally no significant difference from the design rainfalls derived at longer record lengths. Although the inherent uncertainty in design rainfall does not necessarily reduce with an increasing record length, the full record length should still be used as it is the best representation of the rainfall characteristics. The design rainfalls become much higher when outliers are added to the existing data series, highlighting the dangers of carrying out engineering design without considering the inherent uncertainty of the design rainfalls. It is recommended that confidence bands be constructed in the development of design rainfall curves. The upper bound of these design rainfalls may then be applied for critical engineering structures, such as dams.