Apply data mining to analyze the rainfall of landslide

Taiwan is listed as an extremely dangerous country that suffers from many disasters. Landslide disasters cause losses of agricultural production, life, property, and so on. Many researchers are concerned about landslide disasters, but there has been little discussion of the rainfall threshold for landslides. In this paper, data mining is applied to establish rules and a rainfall threshold for landslides at Huafan University, Taiwan. The variables used are rainfall, insolation, insolation rate, averaged humidity, averaged temperature, wind speed, and the tilt of the inclinometers. The inclinometer is an important instrument for measuring the tilt, elevation, or depression of an object with respect to gravity. There are 26 inclinometers in the Talun mountain area of Huafan University. The data used in this research were collected from January 2008 to July 2014. In the proposed approach, regression analysis is first used to predict rainfall. Then, a decision tree is used to obtain decision rules and set the rainfall threshold for landslides. The output of this approach provides more information for understanding changes in rainfall. The rainfall threshold could also provide useful information for maintaining the safety of Huafan University.


Introduction
The UK-based risk management consultancy Verisk Maplecroft announced that the United States, Japan, China, and Taiwan suffered extremely dangerous natural disasters in 2011 [1]. In 2015, these countries were still suffering severe damage from the impact of climate. Floods are one of the most severe types of damage, and they are concentrated in Asia, especially in India, Bangladesh, China, and Taiwan. These floods account for 75% of the risk of death worldwide, and about 2.2 million people are affected by mountain landslides every year. Additionally, about 78 million people are exposed to the danger of tropical cyclones every year [2]. Taiwan is listed as an extremely dangerous country that suffers from many natural disasters. The rainfall of tropical cyclones is one of the most important issues in Taiwan, because tropical cyclones are directly linked to heavy rainfall [3].
Recently, many studies have focused on rainfall. These studies use statistical approaches, ensemble methods, Bayesian models, and data mining [3][4][5][6][7][8][9][10][11][12]. Although these techniques provide useful information about rainfall, they do not use inclinometer data, and few of them discuss the rainfall threshold for landslides. The inclinometer is an important instrument for measuring the tilt, elevation, or depression of an object with respect to gravity. Through the slope of the tilt tube, it gives a direct response to settlement and displacement at various stratigraphic depths [13]. Huafan University, which is located in the Talun mountain area of Shihding District in New Taipei City, Taiwan, has 26 inclinometers. The Talun mountain area is slope land. Because data mining techniques provide useful information about the rainfall of landslides [14], analyzing the rainfall of landslides together with inclinometer data is an important issue for Huafan University. In this paper, data mining is applied to analyze the rainfall of landslides for Huafan University.
MATEC Web of Conferences
The rest of this paper is organized as follows. Because regression analysis and decision trees play important roles in the proposed approach, Section 2 gives a brief overview of multiple regression and decision trees. Section 3 describes the proposed approach. Section 4 presents the simulation results and detailed comparisons. Finally, conclusions are drawn in the last section.

Brief description of multiple regression analysis and decision trees
In this paper, multiple regression analysis is used to build a prediction model for the rainfall of landslides, and a decision tree is used to find the rainfall threshold. In this section, multiple regression analysis and decision trees are briefly described.
For multiple regression analysis, the data consist of m observations on a dependent variable Y and n independent variables $X_1, X_2, \ldots, X_n$. The relationship between Y and $X_1, X_2, \ldots, X_n$ is expressed as Eq. (1):

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{1}$$

where $\beta_0, \beta_1, \ldots, \beta_n$ are constants, $\beta_0$ is referred to as the intercept, $\beta_1, \beta_2, \ldots, \beta_n$ are the corresponding n regression coefficients, and $\varepsilon$ is the random error. In matrix form, Eq. (1) can be written as [15]

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{2}$$

where $\mathbf{Y}$ is the $m \times 1$ vector of observations, $\mathbf{X}$ is the $m \times (n+1)$ matrix whose first column is all ones, and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_n)'$. The normal equations can be expressed as

$$\mathbf{X}'\mathbf{X}\,\tilde{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y} \tag{3}$$

where $\tilde{\boldsymbol{\beta}}$ is the vector of least-squares estimates of $\boldsymbol{\beta}$. Provided that $\mathbf{X}'\mathbf{X}$ is invertible, the solution of this matrix equation is [16]

$$\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \tag{4}$$

A decision tree is built by a greedy algorithm that uses a divide-and-conquer strategy to recursively construct decision rules [17]. The tree-like structure consists of a root node, a set of internal nodes, branches, and a set of leaf nodes. Each path from the root to a leaf represents a rule that categorizes data according to their attributes [18]. A node specifies an attribute (feature) in the dataset. A branch connects either two nodes or a node and a leaf. The branches leaving a node are labeled with the possible values of that node's attribute, and leaves are labeled with the decision value of the classification. When applied to a set S of training patterns, Info(S) measures the average amount of information needed to identify the class of a pattern in S:

$$\mathrm{Info}(S) = -\sum_{j=1}^{k} \frac{\mathrm{freq}(C_j, S)}{|S|}\,\log_2\!\left(\frac{\mathrm{freq}(C_j, S)}{|S|}\right) \tag{5}$$

where $|S|$ is the number of cases in the training set, $C_j$ ($j = 1, 2, \ldots, k$) is a class, k is the number of classes, and $\mathrm{freq}(C_j, S)$ is the number of cases in S that belong to $C_j$. The expected information value $\mathrm{Info}_X(S)$ after partitioning S according to attribute X is

$$\mathrm{Info}_X(S) = \sum_{j=1}^{n} \frac{|S_j|}{|S|}\,\mathrm{Info}(S_j) \tag{6}$$

where n is the number of outcomes of attribute X, $S_j$ is the subset of S corresponding to the jth outcome, and $|S_j|$ is the number of cases in $S_j$. The information gain of attribute X is

$$\mathrm{Gain}(X) = \mathrm{Info}(S) - \mathrm{Info}_X(S) \tag{7}$$

To construct the decision tree, the root node is first split into child nodes, using the information gain to choose the splitting attribute. The process is repeated recursively on each child node until every leaf node contains patterns of only one class, or until no further improvement is possible [18].
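To make Info(S), Info_X(S), and Gain(X) concrete, the following minimal Python sketch computes them for a categorical attribute. It illustrates the formulas above and is not the authors' implementation; the function names, the class labels ("high", "low"), and the attribute values are hypothetical. Continuous attributes such as inclinometer tilts would first have to be split at candidate thresholds (as C4.5 does), which this sketch does not attempt.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def info(labels: Sequence[Hashable]) -> float:
    """Info(S): entropy (bits) of the class distribution in S."""
    total = len(labels)
    counts = Counter(labels)  # freq(C_j, S) for every class C_j
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_x(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Info_X(S): weighted entropy of the subsets S_j induced by attribute X."""
    total = len(labels)
    subsets: dict = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)  # S_j = cases whose X outcome is v
    return sum(len(s) / total * info(s) for s in subsets.values())


def gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain(X) = Info(S) - Info_X(S)."""
    return info(labels) - info_x(values, labels)


if __name__ == "__main__":
    labels = ["high", "high", "low", "low"]
    print(gain(["wet", "wet", "dry", "dry"], labels))  # 1.0: X separates the classes
    print(gain(["a", "b", "a", "b"], labels))          # 0.0: X carries no information
```

A greedy tree builder would evaluate `gain` for every candidate attribute at a node and split on the largest value.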

The proposed approach
The rainfall of Huafan University can be divided into two periods. (a) The dry season runs from December to the following April and has more rainy days than other periods; however, its rainfall intensity is lower. (b) The plum rain season is from May to June, and the typhoon season is from July to September. From September to November, the area is affected by both typhoons and the northeast monsoon. Because rainfall increases significantly in the typhoon season, rainfall intensity peaks from July to September each year. From October to November, occasional typhoons and the northeast monsoon bring rainfall, but the accumulated rainfall and the average rainfall intensity decrease. In this paper, multiple regression analysis and a decision tree are applied to analyze the rainfall of landslides at Huafan University.
In this study, the variables used are rainfall, insolation, insolation rate, averaged humidity, averaged temperature, wind speed, and the tilt of the inclinometers. The dependent (target) variable is rainfall, and the others are independent variables. The proposed approach is based on multiple regression analysis and a decision tree (DT). Its pseudocode is listed as follows.

Procedure:
    Pre-process data;
    Begin
        Apply multiple regression analysis;
        Apply decision tree;
        Output the multiple regression results, threshold of rainfall, and decision rules;
    End

In pre-processing, missing values in the inclinometer data are replaced with 0, which means that the corresponding inclinometers are treated as having no tilt. In the proposed approach, multiple regression analysis is first used to predict rainfall. Then, the decision tree uses Eq. (7), the information gain, to generate decision rules. Finally, the proposed approach outputs the multiple regression results, the rainfall threshold, and the decision rules for decision-making.
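The pre-processing step can be illustrated with a small Python sketch. It assumes that each record is a dictionary mapping column names to values, with None marking a missing reading; the record layout, the column names (such as "C5" and "C7"), the function name, and the sample values are hypothetical and are not data from the study. Missing inclinometer readings become 0.0, following the paper's "no tilt" assumption, while other columns are copied unchanged.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: column name -> value, None marks a missing reading.
Record = Dict[str, Optional[float]]


def impute_inclinometers(records: List[Record], inclinometer_cols: List[str]) -> List[Record]:
    """Return copies of the records with missing inclinometer readings set to 0.0 (no tilt)."""
    cleaned: List[Record] = []
    for rec in records:
        new_rec = dict(rec)  # copy so the caller's records are not modified
        for col in inclinometer_cols:
            # A column that is absent or None is treated as "no tilt".
            if new_rec.get(col) is None:
                new_rec[col] = 0.0
        cleaned.append(new_rec)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"rainfall": 250.0, "C5": None, "C7": 3.1},
        {"rainfall": 610.0, "C5": 1.2},  # C7 missing entirely
    ]
    for row in impute_inclinometers(raw, ["C5", "C7"]):
        print(row)
```

Only the listed inclinometer columns are imputed, so a missing value in a weather variable would still surface as None rather than being silently set to zero.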

Results
This study uses 79 collected datasets and 33 variables, collected at Huafan University from Jan. 2008 to Oct. 2014. The variables are rainfall, insolation, insolation rate, averaged humidity, averaged temperature, and wind speed; the others are inclinometers (C5~C32). First, multiple regression analysis is applied to predict the rainfall of landslides at Huafan University. The resulting prediction model is shown in Eq. (8).
The residual standard error of the multiple regression model is 132.1029. Thereafter, a decision tree is used to generate decision rules. In total, seven decision rules are generated in this study, and they are shown in Table 1.
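Under the usual definition, the residual standard error is $\sqrt{\mathrm{RSS}/(m-n-1)}$, where RSS is the residual sum of squares, m is the number of observations, and n is the number of independent variables. The paper does not state which software produced its fit, so the following is only a standard-library Python sketch of ordinary least squares through the normal equations of Section 2, together with that error measure. The function names and the toy data are hypothetical. When there are many correlated predictors, such as neighboring inclinometers, a QR-based solver is numerically safer than forming X'X explicitly.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bn] from the normal equations X'X b = X'y."""
    X = [[1.0, *r] for r in rows]  # prepend the intercept column of ones
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def residual_standard_error(rows: Sequence[Sequence[float]], y: Sequence[float],
                            beta: Sequence[float]) -> float:
    """sqrt(RSS / (m - p)), with m observations and p = n + 1 fitted coefficients."""
    rss = 0.0
    for r, yi in zip(rows, y):
        pred = beta[0] + sum(b * x for b, x in zip(beta[1:], r))
        rss += (yi - pred) ** 2
    return math.sqrt(rss / (len(y) - len(beta)))


if __name__ == "__main__":
    # Toy data (not from the study): one predictor, three observations.
    rows = [[0.0], [1.0], [2.0]]
    y = [0.0, 2.0, 1.0]
    beta = fit_ols(rows, y)
    print(beta)                                    # approximately [0.5, 0.5]
    print(residual_standard_error(rows, y, beta))  # approximately 1.2247
```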

In Table 1, the rainfall values for decision rules No. 3, 4, 5, 6, and 7 are not large enough, so decision rules No. 1 and 2 are used to set the thresholds. For decision rule No. 2 (C7 < 7.025, Insolation Rate < 12.75), the rainfall is 465 mm. Finally, decision rule No. 1 is set as the key threshold. Both threshold values call for close attention as warnings of disaster emergency and danger. Rainfall is considered safe when it does not exceed 465 mm. All necessary disaster-emergency plans should be prepared when the rainfall is above 465 mm and up to 616 mm. Rainfall greater than 616 mm is considered dangerous.
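The three rainfall levels can be expressed as a small lookup function. This is a hypothetical sketch: the function name, the level labels, and the boundary handling (a value exactly at a threshold falls in the lower level, reading 465 mm and 616 mm as upper bounds) are assumptions rather than part of the paper.

```python
SAFE_MAX_MM = 465.0     # upper bound of the "safe" level (rainfall of decision rule No. 2)
WARNING_MAX_MM = 616.0  # upper bound of the "prepare emergency plans" level


def rainfall_level(rainfall_mm: float) -> str:
    """Classify a rainfall amount in mm into the three levels described for Table 1.

    Assumption: a value exactly equal to a threshold belongs to the lower level.
    """
    if rainfall_mm < 0:
        raise ValueError("rainfall cannot be negative")
    if rainfall_mm <= SAFE_MAX_MM:
        return "safe"
    if rainfall_mm <= WARNING_MAX_MM:
        return "prepare emergency plans"
    return "dangerous"


if __name__ == "__main__":
    for mm in (120.0, 465.0, 500.0, 616.0, 700.0):
        print(mm, rainfall_level(mm))
```

Keeping the thresholds as named constants makes it easy to update them if the decision tree is retrained on newer data.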

Conclusions
In this paper, multiple regression analysis is applied to build a model for predicting the rainfall of landslides at Huafan University, and a decision tree is used to obtain decision rules and set the rainfall threshold for landslides. Seven decision rules were obtained for the rainfall of landslides at Huafan University. Based on these decision rules, the rainfall thresholds are set at 465 mm and 616 mm. These thresholds could provide useful information for maintaining the safety of Huafan University.

Table 1. The decision rules of rainfall.