High Risk Flash Flood Rainstorm Area Mapping And Its Application in Jiangxi Province, China

Hydrologists around the world have worked to develop preventive measures that reduce the disastrous consequences of flash floods in advance. To this end, flash flood early-warning and forecasting systems that can accurately and promptly forecast an oncoming flash flood have been a research focus in the field, despite the difficulties and complexities involved. This paper proposes an approach for delineating, across a relatively large region, the areas at high risk of flash flooding in terms of precipitation intensity. It is accomplished through the design of the High Risk Flash Flood Rainstorm Area (HRFFRA) for a given duration and return period, based on an end-to-end Regional L-moments Approach to precipitation frequency analysis. An HRFFRA is defined as an area potentially subject to intense precipitation of a given duration and return period that may cause a flash flood disaster in the area. The development of the HRFFRA is demonstrated in detail through the application of the Regional L-moments Approach to precipitation frequency analysis in Jiangxi Province, South China Mainland. The high-risk areas that may be hit by a forthcoming flash flood can be displayed visually with the HRFFRA; with its help, hydrologists and governments can substantially reduce the disastrous outcome of a flash flood beforehand.


INTRODUCTION
A flash flood is a flood that rises and falls quite rapidly with little or no advance warning, usually caused by intense rainfall over a relatively small area. Flash floods and the mudflows they induce are among the major rainstorm-related disasters in China. The worst flash flood with mudflow, caused by extremely heavy rainfall acting on eroded soils, occurred on 8/8/2010 in Zhouqu, Gansu Province, and claimed 1,467 lives, with 198 missing. Another notable flash flood with mudflow, caused by record-breaking rainfall during Typhoon Morakot on 8/8/2009, struck Xiaolin Village, Taiwan, and took more than 460 lives. A flash flood may also occur after the collapse of a human-made structure: for instance, the South Fork Dam on the Little Conemaugh River, 14 miles upstream of the town of Johnstown, Pennsylvania, burst on 5/31/1889 after several days of extremely heavy rainfall, killing more than 2,200 people and causing US$17 million in damage. The most recent flash flood catastrophe worldwide occurred in the town of Kedarnath, India, which suffered extensive destruction, with approximately 5,000 people dead, from flash floods triggered by torrential rains during June 2013.
Changes in meteorological conditions and in land-use patterns have increased flood frequency across the globe [6]. Moreover, research by Mazzarana et al. has found that the extreme rainfall events which cause extreme flood events will become more frequent worldwide in the future [7]. This means we will face more frequent flash flood threats, and the prevention of flash floods is, and will remain, a tough challenge across the world, especially for China, where hilly areas and mountains cover as much as two thirds of the land surface [11]. Researchers in China began to study flood hazard mapping in the 1980s, but only recently has flash flood hazard mapping received attention. Mapping and managing the natural hazard caused by flash floods is therefore of significant importance.
The development of the HRFFRA, based on statistical analysis of historical rainfall data, synoptic analysis of prevailing storm rainfalls, and field surveys of historical flash flood events, is presented in this paper. An HRFFRA is defined as an area potentially subject to intense precipitation of a given duration and return period that may cause a flash flood disaster in the area. A Regional L-moments Method (RLMM) is used for the statistical analysis of the rainfall series, and a spatial interpolation scheme is combined with a contouring technique to form the HRFFRA.

METHODOLOGY
The frequency analysis of extreme events is often limited by data availability at the desired temporal and spatial scales. Regional Frequency Analysis (RFA), which involves ''trading time for space'' by pooling data from stations with similar statistical characteristics, is an alternative approach that yields more accurate estimates of extreme events even in ungauged areas or areas with short records. RFA is not intended to provide regional estimates [5]; it only helps to obtain more precise site-specific frequency estimates.

Regional L-Moments Analysis
• Probability distributions and parameter estimation. Frequency analysis employs a limited data sample to estimate its underlying population by selecting and parameterizing a probability distribution, which is uniquely characterized by a set of parameters. In hydrologic frequency analysis, the number of parameters of a plausible distribution ranges from 2 to 5. Three-parameter distributions, such as the Generalized Logistic (GLO), Generalized Extreme Value (GEV), Generalized Normal (GNO), Generalized Pareto (GPA), and Pearson Type III (PE3), behave both relatively reliably and flexibly and are often selected to represent the data. Sometimes the four-parameter Kappa and the five-parameter Wakeby distributions are used. The parameters of a probability distribution have traditionally been estimated by the Conventional Moments Method (CMM), whose rth central moment can be expressed as μ_r = E[(X − μ)^r], where μ is the mean of the sample data. The sample estimates based on the CMM have some undesirable properties: the higher sample moments associated with skewness and kurtosis can be severely biased, and are very sensitive, or unstable, in the presence of outliers in the data. If the data have a skewed distribution, the selection of the best-fitting distribution can be unreliable, and the quantile estimates are likely unreliable as well. Owing to the linearity of its moment statistics, the L-moments Method (LMM) has become accepted as a more robust method for selecting and parameterizing fitted probability distribution functions. L-moments are expectations of certain linear combinations of order statistics [1]. Being linear functions of the order statistics, L-moments are more robust than conventional moments to the presence of outliers in the data, and provide more robust parameter estimates than the CMM.
Letting X_{1:n}, X_{2:n}, …, X_{n:n} be the order statistics of a random sample drawn from the population of X, the rth L-moment can be expressed as λ_r = (1/r) Σ_{k=0}^{r−1} (−1)^k C(r−1, k) E[X_{r−k:r}], r = 1, 2, …, where C(r−1, k) is the binomial coefficient. The following example, taken from the 743 rain gauges in Jiangxi Province, China, demonstrates how differently the sample coefficient of skewness, Cs, and the sample coefficient of L-skewness, L-Cs, behave in terms of bias when used separately to estimate the population parameters. The at-site Cs and L-Cs were computed and used as population parameters, together with the other parameter estimates, to generate a large number of synthetic data series via Monte Carlo simulation. Cs and L-Cs were calculated for each repetition of the simulation and then averaged over the repetitions for each site. These averaged simulated Cs and L-Cs were separately compared with their original values obtained from the real data at each station. As shown in Figs. 1-1~1-2, L-Cs behaves much better than Cs in terms of bias; the sample Cs is severely biased relative to its population value. The figures shown here used the GEV distribution; the same properties have been observed for other distributions. In the original data set, station #30-6420 in Jiangxi Province contains an unusual outlier of 346 mm (7/03/2009) in the 6-hour AMS, compared with its average of 86 mm. The CMM could not handle such an outlier well enough to produce a credible estimate of Cs even with a synthetic data length of 500 years, while the LMM handled it very well. Figs. 2-1~2-2 provide a comparison when the GEV is fitted to the synthetic data; similar results were obtained for other distributions. Clearly, L-Cs estimated by L-moments is more robust to outliers than Cs estimated through the CMM. In fact, the study showed that, once the data size reached 100 years at all sites, the L-Cs estimates were quite stable regardless of outliers present in the data.
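The contrast between conventional and L-moment skewness can be sketched in a few lines of code. The helper below computes the first three sample L-moments from probability-weighted moments and compares Cs with L-Cs on a small hypothetical 6-hour AMS series before and after adding a single 346 mm outlier; the station values here are illustrative, not the actual Jiangxi records.

```python
import math

def sample_l_moments(data):
    """First three unbiased sample L-moments via probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * xi for i, xi in enumerate(x)) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0  # lambda1, lambda2, lambda3

def l_skewness(data):
    _, l2, l3 = sample_l_moments(data)
    return l3 / l2  # L-Cs = lambda3 / lambda2, bounded in (-1, 1)

def skewness(data):
    """Conventional (adjusted) sample coefficient of skewness, Cs."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((v - m) ** 2 for v in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in data)

# Hypothetical 6-hour AMS series averaging ~86 mm, then the same series with
# a single extreme value like the 346 mm observation at station #30-6420.
base = [62, 71, 73, 78, 80, 81, 84, 86, 88, 90, 92, 95, 99, 104, 110]
with_outlier = base + [346]

shift_cs = abs(skewness(with_outlier) - skewness(base))
shift_lcs = abs(l_skewness(with_outlier) - l_skewness(base))
```

A single outlier moves Cs by several units while L-Cs, being a bounded ratio of linear combinations of order statistics, shifts far less.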
• The index-flood procedure. In the NWS updates of precipitation frequency, a Regional L-moments Analysis (RLMA) with a so-called index-flood procedure [2][3] was used. The basis of the index-flood procedure is the assumption that the frequency distributions at the stations in a homogeneous region are identical apart from a site-specific scaling factor. The index flood is a location estimator, usually the at-site sample mean x̄_i. The stations are assumed to share a similar climatology with respect to extreme precipitation, so the stations in a predefined homogeneous region share a common dimensionless frequency distribution. The frequency values q_Tj of this dimensionless regional distribution at the desired return periods are called Regional Growth Factors (RGFs). The site-specific quantiles Q_Tj,i can then be written as Q_Tj,i = x̄_i · q_Tj, Tj = 2, 5, …, 100, …, 1000. Table 2 illustrates that RFA outperforms at-site analysis in terms of the uncertainty of the quantiles. The comparison was done as follows: the station with the longest record was selected from each of the 84 daily regions in the Ohio River Basin that had previously been developed and tested as homogeneous for NOAA Atlas 14 Volume 2. Each selected site underwent frequency analysis through both at-site analysis and RLMA. A large number of synthetic data series were then generated to investigate the variation of the quantiles under the two scenarios, with the coefficient of variation, Cv, used as an index of uncertainty. Table 2 demonstrates that the Cv of the quantiles from RFA is much smaller than from at-site analysis, proving that quantiles estimated through RFA are much more robust than those from at-site analysis.
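In code, the index-flood scaling is just a multiplication of the at-site mean by the regional growth factors. The sketch below uses hypothetical RGFs and station means for illustration, not values from the Jiangxi or Ohio studies.

```python
# Hypothetical regional growth factors q_Tj for one homogeneous region.
rgf = {2: 0.90, 5: 1.18, 10: 1.37, 25: 1.62, 50: 1.81, 100: 2.01, 1000: 2.70}

def site_quantiles(site_mean, rgf):
    """Q_Tj,i = x_bar_i * q_Tj: scale the dimensionless regional curve
    by the at-site index flood (the sample mean of the AMS)."""
    return {T: site_mean * q for T, q in rgf.items()}

# Two stations in the same region differ only by their index flood.
q_a = site_quantiles(86.0, rgf)   # e.g. a 6-hour AMS mean of 86 mm
q_b = site_quantiles(120.0, rgf)  # a wetter site in the same region
```

By construction the ratio of quantiles between two sites in the region equals the ratio of their index floods at every return period, which is exactly the scaling assumption stated above.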

• Homogeneous Regions
The key to RLMA is the construction of "homogeneous" regions, in which the frequency distributions at different sites are assumed to be identical apart from a scale factor. This assumption is rarely valid exactly in the real world, because it requires the same statistics at different sites. However, an acceptable threshold of variation for these statistics can be established to measure homogeneity. The variation of L-Cv over the stations in a region, V(L-Cv), is investigated via Monte Carlo simulation, and the standardized V(L-Cv), denoted H1, serves as an index of homogeneity.
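A minimal sketch of this homogeneity index follows. It computes the record-length-weighted dispersion V of the at-site L-Cv values and standardizes it against the mean and standard deviation of V under homogeneity; here those two simulation results are passed in as assumed, pre-computed values rather than produced by a full Monte Carlo run over a fitted regional distribution.

```python
import math

def lcv_dispersion(lcv, nrec):
    """V: record-length-weighted standard deviation of the at-site L-Cv values."""
    total = sum(nrec)
    wmean = sum(n * t for n, t in zip(nrec, lcv)) / total
    return math.sqrt(sum(n * (t - wmean) ** 2 for n, t in zip(nrec, lcv)) / total)

def h1(lcv, nrec, sim_mean_v, sim_std_v):
    """Standardized dispersion H1 = (V - mu_V) / sigma_V, where mu_V and sigma_V
    come from Monte Carlo simulations of an exactly homogeneous region."""
    return (lcv_dispersion(lcv, nrec) - sim_mean_v) / sim_std_v

# Illustrative at-site L-Cv values and record lengths for a five-station region,
# with assumed simulation results mu_V = 0.012 and sigma_V = 0.004.
index = h1([0.20, 0.22, 0.21, 0.19, 0.23], [30, 25, 40, 20, 35], 0.012, 0.004)
```

With these illustrative numbers, H1 falls well below the threshold of 2 discussed next, so the region would be judged acceptably homogeneous.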
Hosking suggests that a region is "acceptably homogeneous" if H1 < 2 [3]. In our studies this criterion has been found to be appropriate. However, in precipitation frequency studies, we found that the sole use of H1 was not enough; the effect of L-Cs on the formation of homogeneous regions was also considered, with attention paid to sites whose L-Cs values are much higher or lower than the regional average. Fig. 3 shows the regionalization of the 6-hour AMS in Jiangxi Province, with 17 homogeneous regions grouped. Note that, owing to data limitations, the regionalization of the 6-hour data was based only on the 743 rain gauges with 20-plus years of record inside Jiangxi Province, without creating a buffer zone around it.

• Goodness-of-fit. Three tests were applied to select the best distribution in each region.
The Monte Carlo simulation test. In Fig. 4, the curves of L-Cs vs. L-Ck for common theoretical distributions are plotted. The averaged L-Cs and L-Ck generated through 1,000 Monte Carlo simulations, plotted as a point in the diagram, is an example from the Ohio River Basin 24-hour data over 4,253 daily stations. The goodness-of-fit is then judged by the deviation of this mean point from each candidate distribution on the L-Ck scale. To account for sampling variability, the deviation is standardized and denoted Z^DIST. At a given confidence level, say 90%, a distribution is acceptable if |Z^DIST| ≤ 1.64; among the accepted distributions, the one with the smallest |Z^DIST| is the most appropriate [1]. Fig. 5 indicates that the GEV is the most appropriate distribution for the 4,253 stations as a whole in the Ohio River Basin.
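The acceptance rule for this test reduces to a filter-and-argmin over the candidate distributions. The Z values below are hypothetical; in practice each Z^DIST comes from the standardized L-Ck deviation described above.

```python
def select_distribution(z_scores, z_crit=1.64):
    """Keep distributions with |Z| <= z_crit (the 90% level) and return the one
    with the smallest |Z|, or None if every candidate is rejected."""
    accepted = {d: z for d, z in z_scores.items() if abs(z) <= z_crit}
    return min(accepted, key=lambda d: abs(accepted[d])) if accepted else None

# Hypothetical standardized deviations for the five three-parameter candidates.
z = {"GEV": 0.42, "GLO": 2.10, "GNO": -1.10, "PE3": 1.55, "GPA": -3.40}
best = select_distribution(z)  # GLO and GPA are rejected; GEV has the smallest |Z|
```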

Fig. 4. Monte Carlo simulation test for Ohio daily data stations
Root Mean Square Error (RMSE) of the sample L-moments. This test originates from the Four Criteria test [4]. Unlike the Monte Carlo simulation test, which emphasizes the deviation of the mean point, it uses the sample L-Cs and L-Ck at all sites to assess the variability. A weighted RMSE, calculated for each of the plausible distributions, serves as the index in the test: RMSE = [Σ_i w_i (S_{i,L-Ck} − D_{i,L-Ck})² / Σ_i w_i]^(1/2) (eq. 5), where S_{i,L-Ck} and D_{i,L-Ck} are the sample L-Ck at site i and the distribution's L-Ck, respectively, and w_i is the site weight. The distribution with the smallest RMSE is the most appropriate. Real-data-check test [4]. By comparing a quantile estimate from a fitted distribution at a given return period with the real data series, an empirical exceedance frequency F_{i,Tj} for the quantile can be calculated and then compared with its corresponding theoretical probability P_Tj. A relative error, RE, calculated over several return periods, reflects the degree of match between them: the smaller the RE, the better the fit. Owing to sampling error, the RE for a single site or a few sites is meaningless; however, the regionally averaged RE calculated over a number of sites is statistically meaningful and can be used as a goodness-of-fit index.
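The weighted RMSE criterion can be sketched directly from its definition. Record lengths are used as weights here as a natural assumption; the original test may weight sites differently, and all site values below are illustrative.

```python
import math

def weighted_rmse(sample_lck, dist_lck, weights):
    """Weighted RMSE between each site's sample L-Ck (S_i) and the candidate
    distribution's L-Ck (D_i); the smallest value flags the best-fitting candidate."""
    num = sum(w * (s - d) ** 2 for w, s, d in zip(weights, sample_lck, dist_lck))
    return math.sqrt(num / sum(weights))

# Illustrative four-site comparison of two candidate distributions.
s = [0.17, 0.20, 0.15, 0.22]   # sample L-Ck at each site
w = [30, 25, 40, 35]           # record lengths as weights (an assumption)
rmse_gev = weighted_rmse(s, [0.18, 0.18, 0.16, 0.19], w)
rmse_glo = weighted_rmse(s, [0.23, 0.23, 0.22, 0.24], w)
```

In this toy comparison the first candidate sits closer to the sample L-Ck values and therefore yields the smaller RMSE, so it would be preferred under this criterion.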
A final decision is made based on a summary of the above three tests, on a region-by-region basis. The final results provide a strong statistical basis for goodness-of-fit. Sensitivity tests are then used to ensure a smooth transition in the RGFs across regions. The best-fitting distribution of the 6-hour data for each region in Jiangxi is given in Fig. 3 above.

Internal Consistency Check and Adjustment
The precipitation frequency analysis is performed separately at each duration: 1-hr, 3-hr, …, 24-hr. Sometimes the estimated curves for two adjacent durations meet or cross at some frequency. This result, though based on sound statistical analysis, is physically unreasonable. The causes of such anomalies are primarily discontinuities in the selection and parameterization of distributions between durations, data sampling variability, and the application of average conversion factors to convert 1-hr data to 60-min and 1-day data to 24-hr at a specific site. Such anomalies, i.e. internal consistency violations, were removed by distributing the surplus of the longer-to-shorter-duration ratio at the last consistent frequency, at a constant slope, over the anomalous frequency and beyond through the 10,000-year level, keeping the pattern of the ratios consistent until it converges to 1.0 beyond 10,000 years. An example of the adjustment at site 30-9660 is shown in Table 2, and the frequency estimate curves for 1-hr, 3-hr, and 6-hr before and after the adjustment at site 30-9660 are shown in Fig. 5. Clearly, after the adjustment the anomalies were removed.
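The check itself is simple to express: at each return period, a longer-duration quantile must not fall below a shorter-duration one. The sketch below only detects violations and applies a minimal clamp; the actual adjustment described above redistributes the duration ratios at a constant slope, which is more involved. All quantile values are hypothetical.

```python
def find_violations(return_periods, q_short, q_long):
    """Return periods at which the longer-duration quantile drops below the shorter one."""
    return [T for T, qs, ql in zip(return_periods, q_short, q_long) if ql < qs]

def clamp_consistent(q_short, q_long):
    """Minimal repair: lift each longer-duration quantile to at least the
    shorter-duration value (a simpler stand-in for the slope-based adjustment)."""
    return [max(qs, ql) for qs, ql in zip(q_short, q_long)]

T = [2, 10, 100, 1000]
q3h = [50.0, 70.0, 90.0, 120.0]   # hypothetical 3-hr quantiles (mm)
q6h = [60.0, 80.0, 88.0, 115.0]   # hypothetical 6-hr quantiles: cross the 3-hr curve
bad = find_violations(T, q3h, q6h)
fixed = clamp_consistent(q3h, q6h)
```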

Spatial Consistency Check and Adjustment
Owing to the scarcity of rainfall stations, unrealistic gradients may appear in the estimated quantiles between regions, so an appropriate spatial adjustment is required to smooth the quantiles reasonably across regions. A two-way adjustment, the "two-time back-forth interpolation" scheme, is proposed: first, develop a computational grid covering the study area, with a resolution equivalent in density to the rainfall stations available in the area, as shown in Fig. 6; second, spatially interpolate the quantiles at the stations onto each grid point, treating the grid points as pseudo-stations; third, interpolate the quantiles at the grid points back to the stations. A study showed that the RE between the empirical frequencies and the quantiles is reduced, on average over the entire study area, after the spatial adjustment. Figs. 7-1~7-2 give a general view of the results of the spatial consistency check and adjustment for the 6-hour data; the quantiles in Fig. 7-2 look spatially smoother.
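The back-forth scheme can be sketched with any spatial interpolator; inverse-distance weighting is used below as an illustrative stand-in, since the paper does not specify the interpolation method, and the station coordinates and quantiles are hypothetical.

```python
def idw(x, y, points, power=2):
    """Inverse-distance-weighted estimate at (x, y) from points = [(px, py, value), ...]."""
    num = den = 0.0
    for px, py, v in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return v  # query point coincides with a data point
        w = d2 ** (-power / 2)
        num += w * v
        den += w
    return num / den

def back_forth(stations, grid):
    """One pass of the scheme: interpolate station quantiles onto the grid
    (pseudo-stations), then interpolate the grid values back to the stations."""
    grid_vals = [(gx, gy, idw(gx, gy, stations)) for gx, gy in grid]
    return [(sx, sy, idw(sx, sy, grid_vals)) for sx, sy, _ in stations]

# Hypothetical 100-yr quantiles (mm) at four stations and a small offset grid.
stations = [(0, 0, 100.0), (1, 0, 110.0), (0, 1, 90.0), (1, 1, 130.0)]
grid = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75), (0.5, 0.5)]
smoothed = back_forth(stations, grid)
```

Each smoothed station value is pulled toward its neighbours, which is the smoothing effect described for Fig. 7-2.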

HIGH RISK FLASH FLOOD RAINSTORM MAPPING
Based on the quantile results for various durations, such as 1-hour, 3-hour, and 6-hour, over a range of return periods from 1-yr and 2-yr up to 100-yr, 200-yr, and 1,000-yr, an interactive visualization product, a frequency data server, has been developed to deliver spatial interpolation results for a given duration and a desired return period. Figs. 8-1~8-2 show two example screens from the data server: Fig. 8-2 is a pop-up obtained instantaneously by moving the cursor, shown in Fig. 8-1 as a red circle, to a point inside the study area. This visual frequency data server is further employed to derive the HRFFRA for a given duration and a desired return period.

CONCLUSIONS
Other aspects, such as intersite dependency, confidence limits, and the ratios of PDS to AMS, were also extensively studied and developed for this work but are omitted here for compactness. Given the difficulties and complexities involved in the accurate and timely prediction of an oncoming flash flood, the HRFFRA atlas for short durations, derived through the spatial interpolation scheme and contouring technique and based on the quantiles calculated through RFA in combination with the LMM, provides a scientific and practical foundation for the early warning and forecasting of flash flood hazards.