The Preventive Maintenance of Highway Based on Data Mining

Judging from the current situation of highway maintenance in China, repairs are carried out only after highway distress has appeared, which results in poor maintenance efficiency. To improve the efficiency of highway maintenance, this paper uses data mining technology to predict highway pavement performance and to analyze the main factors of pavement performance attenuation, so that preventive maintenance can be carried out. We provide data support for preventive highway maintenance in three ways: the isolation Forest (iForest) anomaly detection algorithm is used for data preprocessing; the regression model and the time series GM(1,1) model are used to predict pavement performance; and association rule analysis together with iForest is used to analyze the main factors of pavement performance attenuation.


1. Introduction
In recent years, with the rapid development of information technology, highway maintenance management has become informatized. Technical condition assessments, traffic flow records and other information have accumulated a large amount of data over the course of highway operation. On the other hand, road maintenance management in China still relies on the traditional passive approach: a series of repair measures is carried out only after pavement performance has dropped to a fairly low level. This kind of maintenance does not make full use of the existing data, it improves pavement performance only slightly, and performance then decays again quickly. The result is very low maintenance efficiency and wasted maintenance funds.
To address these problems, this paper applies data mining technology to highway maintenance, fully exploring the potential value of existing technical data in order to improve the efficiency of highway maintenance [1].
In this paper, the iForest anomaly detection algorithm is used to preprocess the existing highway pavement performance data. Then we predict pavement performance indices such as the pavement condition index (PCI), riding quality index (RQI), rutting depth index (RDI) and comprehensive pavement quality index (PQI) by using the regression model and the time series model GM(1,1) [2]. Finally, the association rule algorithm and the iForest algorithm are used to analyze the main factors of pavement performance attenuation.

2. Key technology of preventive maintenance
2.1 iForest anomaly detection model
The iForest algorithm consists of a large number of binary trees called isolation trees (iTrees), which form the basis of the algorithm and are constructed as follows.
Assume that the data set has N records. ψ records are randomly selected from the data set, usually without replacement, as the training samples for building one iTree.
In the samples, an attribute q is randomly selected, and a split value p is randomly selected within the range of that attribute's values (between its minimum and maximum). The samples are then divided into two parts by q and p: instances whose value of q is smaller than p go to the left subtree, and instances whose value of q is greater than or equal to p go to the right subtree. This yields a splitting condition and a data set for each of the left and right subtrees. Each subtree is divided recursively in the same way until either (i) the tree reaches a height limit, (ii) the subtree contains only one instance, or (iii) all data in the subtree have the same values.
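The construction procedure, together with the path length h(x) needed for testing, can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the names `Node`, `build_itree`, `build_iforest`, `path_length` and `mean_path_length` are hypothetical, the height limit ceil(log2 ψ) is the usual choice from the original iForest work rather than a value given here, and only attributes whose values are not all equal are considered when choosing q.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    size: int                        # number of training instances that reached this node
    attr: Optional[int] = None       # split attribute q (None marks an external node)
    split: Optional[float] = None    # split value p
    left: Optional["Node"] = None    # instances whose value of q is < p
    right: Optional["Node"] = None   # instances whose value of q is >= p


def build_itree(X: List[Point], height: int, limit: int, rng: random.Random) -> Node:
    # Stop when (i) the height limit is reached, (ii) at most one instance is left,
    # or (iii) all remaining instances have the same values.
    if height >= limit or len(X) <= 1 or all(row == X[0] for row in X):
        return Node(size=len(X))
    # Assumption: only attributes that still vary are candidates for q.
    candidates = [q for q in range(len(X[0]))
                  if min(r[q] for r in X) < max(r[q] for r in X)]
    q = rng.choice(candidates)
    p = rng.uniform(min(r[q] for r in X), max(r[q] for r in X))
    left = [r for r in X if r[q] < p]
    right = [r for r in X if r[q] >= p]
    return Node(len(X), q, p,
                build_itree(left, height + 1, limit, rng),
                build_itree(right, height + 1, limit, rng))


def build_iforest(data: Sequence[Point], t: int = 100, psi: int = 256,
                  seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    psi = min(psi, len(data))
    limit = math.ceil(math.log2(psi))  # assumed height limit ceil(log2(psi))
    # Each iTree is trained on psi records drawn without replacement.
    return [build_itree(rng.sample(list(data), psi), 0, limit, rng) for _ in range(t)]


def path_length(x: Point, node: Node, depth: int = 0) -> int:
    # h(x): number of edges from the root to the external node reached by x.
    if node.attr is None:
        return depth
    child = node.left if x[node.attr] < node.split else node.right
    return path_length(x, child, depth + 1)


def mean_path_length(x: Point, forest: List[Node]) -> float:
    # E(h(x)): average of h(x) over all iTrees in the forest.
    return sum(path_length(x, tree) for tree in forest) / len(forest)


# Usage on synthetic 2-D data with one isolated point (not data from this paper).
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
forest = build_iforest(data, t=100, psi=128)
print(mean_path_length((10.0, 10.0), forest), mean_path_length((0.0, 0.0), forest))
```

The isolated point should reach an external node after far fewer splits on average than a point in the dense centre, which is exactly the property the anomaly score exploits.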
After t iTrees have been built, the iForest is ready to use. The degree of anomaly of a test instance x is evaluated with the generated forest as follows: (i) in each iTree, x starts at the root node and follows the corresponding branch conditions down until the traversal terminates at an external (leaf) node; (ii) the height h(x) of that end node, that is, the number of edges from the root to it, is recorded; (iii) the anomaly score of x is calculated using Equation (1):
$$ s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \qquad (1) $$
where E(h(x)) is the average of h(x) over the collection of isolation trees, and c(n) is the average path length of an unsuccessful search in a binary search tree built from n instances (n being the number of instances used to build each tree). c(n) normalizes h(x) and can be calculated as:
$$ c(n) = 2H(n-1) - \frac{2(n-1)}{n} \qquad (2) $$
where H(i) is the harmonic number, which can be estimated by ln(i) + 0.5772156649 (Euler's constant). Based on the score from Equation (1), the following assessment can be made: (a) if instances return s very close to 1, they are definitely anomalies [3]; (b) if instances have s much smaller than 0.5, they can quite safely be regarded as normal instances [3]; (c) if all instances return s ≈ 0.5, the entire sample does not really have any distinct anomaly [3].
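The normalization and scoring steps can be written as two small functions. This is a sketch assuming s(x, n) = 2^(-E(h(x))/c(n)) and c(n) = 2H(n-1) - 2(n-1)/n with H(i) ≈ ln(i) + 0.5772156649; the function names are hypothetical, and the special cases n ≤ 1 and n = 2 are handled explicitly because the logarithmic approximation of H is poor for very small arguments.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant, used in H(i) ~ ln(i) + 0.5772156649


def harmonic(i: float) -> float:
    """Approximate harmonic number H(i) = ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n instances."""
    if n <= 1:
        return 0.0
    if n == 2:
        # Exact value for n = 2 (H(1) = 1); the ln approximation would give
        # about 0.15 here, so the exact value is used instead.
        return 1.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_path: float, n: int) -> float:
    """s(x, n) = 2 ** (-E(h(x)) / c(n)), defined for n >= 2."""
    return 2.0 ** (-mean_path / c(n))


n = 256                                  # hypothetical sub-sample size
print(anomaly_score(c(n), n))            # E(h(x)) = c(n) gives s = 0.5
print(anomaly_score(2.0, n) > 0.5)       # a short average path pushes s toward 1
```

An average path equal to c(n) maps to exactly 0.5, which is why case (c) above corresponds to a sample without distinct anomalies.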

2.2 The model of regression prediction
Research and practical observation show that the attenuation of pavement performance is not linear: it is very slow in the early period of highway operation, but once performance falls to a certain level it begins to drop sharply.
Based on this characteristic, we use the nonlinear regression equation shown in Equation (3) as the model [4]:
$$ PPI = PPI_0\left\{1 - \exp\left[-\left(\frac{\alpha}{t}\right)^{\beta}\right]\right\} \qquad (3) $$
where PPI0 represents the initial value of pavement performance (usually 100), α is the parameter that controls the time at which pavement performance decays to 63.2% of the initial value, β is the parameter that controls the shape of the decay curve, and t is the age of the highway. Equation (3) can then be transformed into Equation (4):
$$ -\ln\left[\ln\frac{PPI_0}{PPI_0 - PPI}\right] = \beta\ln t - \beta\ln\alpha \qquad (4) $$
By a variable substitution, Equation (4) can be written in the linear form y = ax + b, whose parameters a and b are estimated by the least squares method [5]. After a and b are obtained, α and β are estimated as follows:
$$ \beta = a \qquad (7) $$
$$ \alpha = \exp\left(-\frac{b}{a}\right) \qquad (8) $$
Equation (3) can then be used to estimate the future trend of highway pavement performance.
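The fitting procedure can be sketched as follows, assuming the model form PPI = PPI0{1 - exp[-(α/t)^β]}, the substitution x = ln t and y = -ln[ln(PPI0/(PPI0 - PPI))], and an ordinary least squares line y = ax + b with a = β and b = -β ln α. The function names are hypothetical, and the example generates noise-free synthetic data rather than using the paper's measurements.

```python
import math
from typing import Sequence, Tuple


def predict_ppi(t: float, alpha: float, beta: float, ppi0: float = 100.0) -> float:
    """Assumed model form: PPI = PPI0 * {1 - exp[-(alpha / t) ** beta]}."""
    return ppi0 * (1.0 - math.exp(-((alpha / t) ** beta)))


def fit_decay_model(ages: Sequence[float], ppi: Sequence[float],
                    ppi0: float = 100.0) -> Tuple[float, float]:
    """Estimate (alpha, beta) by least squares on the linearized form y = a*x + b."""
    xs = [math.log(t) for t in ages]                               # x = ln t
    ys = [-math.log(math.log(ppi0 / (ppi0 - p))) for p in ppi]     # requires 0 < p < ppi0
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (m * sxy - sx * sy) / (m * sxx - sx * sx)   # least squares slope
    b = (sy - a * sx) / m                           # least squares intercept
    beta = a                                        # since a = beta
    alpha = math.exp(-b / a)                        # since b = -beta * ln(alpha)
    return alpha, beta


# Synthetic check with hypothetical ages; the parameters are the values reported in
# Section 3, used here only to generate noise-free data.
ages = [5.0, 6.0, 7.0, 8.0, 9.0]
pci = [predict_ppi(t, 26.0985, 0.4668) for t in ages]
alpha, beta = fit_decay_model(ages, pci)
print(round(alpha, 4), round(beta, 4))
```

Because the transformed data lie exactly on a line, the fit recovers the generating parameters; with real, noisy data the same code returns the least squares estimates.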

2.3 The model of time series prediction
Pavement performance is influenced by many uncertain factors [6], so grey theory can be used to predict and analyze it. GM(1,1) is a commonly used grey prediction model; its modeling process is as follows [7].

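Given the original data sequence $X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\}$, the accumulated generating sequence $X^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\}$ is obtained through Equation (9):
$$ x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n \qquad (9) $$
After the parameters a and b have been estimated, solve the whitening differential equation (12) and take $\hat{x}^{(1)}(1) = x^{(0)}(1)$, which gives the time response function:
$$ \hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}, \quad k = 0, 1, 2, \ldots \qquad (15) $$
According to the relationship between $X^{(1)}$ and $X^{(0)}$, the predicted value $\hat{x}^{(0)}(k+1)$ is estimated using:
$$ \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \quad k = 1, 2, \ldots \qquad (16) $$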
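The GM(1,1) steps (accumulation, mean sequence of consecutive neighbours, least squares estimation of a and b, time response and restoration) can be sketched in plain Python. This is an illustrative implementation under the standard GM(1,1) formulation, not the authors' code: `gm11` is a hypothetical name, the least squares problem with B = [-z(k), 1] and Y = x0(k), k = 2..n, is solved through its 2x2 normal equations, and a non-constant input (so that a ≠ 0) is assumed.

```python
import math
from typing import List, Sequence, Tuple


def gm11(x0: Sequence[float], horizon: int = 1) -> Tuple[float, float, List[float]]:
    """Fit GM(1,1) to x0 and return (a, b, fitted values plus `horizon` forecasts)."""
    n = len(x0)
    if n < 3:
        raise ValueError("need at least 3 observations for the least squares step")
    # Accumulated generating sequence X1.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Mean sequence of consecutive neighbours, z(k) for k = 2..n.
    z1 = [0.5 * x1[k] + 0.5 * x1[k - 1] for k in range(1, n)]
    y = list(x0[1:])
    m = n - 1
    # Least squares for [a, b]: solve (B^T B)[a, b]^T = B^T Y with B = [-z, 1].
    sz, szz = sum(z1), sum(z * z for z in z1)
    sy, szy = sum(y), sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    # Time response with x1_hat(1) = x0(1); list index k corresponds to x1_hat(k + 1).
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n + horizon)]
    # Restore: x0_hat(k + 1) = x1_hat(k + 1) - x1_hat(k).
    x0_hat = [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + horizon)]
    return a, b, x0_hat


# Usage on a synthetic, slowly decaying series (not data from this paper).
series = [100 * 0.97 ** k for k in range(6)]
a, b, fitted = gm11(series, horizon=1)
print(a, b, fitted[-1])   # fitted[-1] is the one-step-ahead forecast
```

For a geometric sequence GM(1,1) reproduces the data almost exactly, which makes it a convenient sanity check before applying the model to measured PCI series.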

3. Practical application of data mining in preventive maintenance
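Taking the data set from a common trunk highway section in Hubei Province as an example, we forecast its pavement performance with the regression model and the GM(1,1) model.
3.1 Pavement performance prediction based on regression model
The comparison between the predicted and actual values of PCI, together with the variation trend based on the regression model, is shown in Fig 1.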

Fig 1.Comparison of regression prediction and actual values
Table 2 and Fig 1 show that the values predicted by the regression model are basically consistent with the actual pavement condition index values, with a maximum difference of no more than 1%. Furthermore, the PCI is declining year by year. According to the current trend, it will fall below 80 next year, so preventive maintenance should be carried out in advance to prevent further deterioration of road performance [8].
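The "below 80 next year" judgement can be made explicit by inverting the decay model. This sketch assumes the model form PPI = PPI0{1 - exp[-(α/t)^β]} with PPI0 = 100 and t in years; `age_at_threshold` and `below_threshold` are hypothetical helper names, and the α, β values used are those reported for this section.

```python
import math


def age_at_threshold(threshold: float, alpha: float, beta: float,
                     ppi0: float = 100.0) -> float:
    """Age t at which the model predicts PPI = threshold (0 < threshold < ppi0)."""
    # From threshold / ppi0 = 1 - exp(-(alpha / t) ** beta), solved for t.
    return alpha / (-math.log(1.0 - threshold / ppi0)) ** (1.0 / beta)


def below_threshold(age: float, threshold: float, alpha: float, beta: float,
                    ppi0: float = 100.0) -> bool:
    """True if the model predicts PPI < threshold at the given age."""
    predicted = ppi0 * (1.0 - math.exp(-((alpha / age) ** beta)))
    return predicted < threshold


# Parameters reported for this section; t is assumed to be measured in years.
print(age_at_threshold(80.0, 26.0985, 0.4668))
```

Comparing next year's highway age with the returned age gives the same yes/no decision as reading the trend off Fig 1.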

3.2 Pavement performance prediction based on time series model
To ensure the credibility of the prediction results, the time series GM(1,1) model is also used to predict the pavement performance indicators, so that the two predictions can confirm each other. We again use the PCI values in Table 1 as an example to forecast the future PCI trend [9].
Table 2 and Figure 2 show that the values predicted by the GM(1,1) model are basically consistent with those of the regression model and quite close to the actual values, with a maximum difference of no more than 1%. The GM(1,1) predictions also show the PCI declining year by year and falling below 80 next year, so the GM(1,1) model leads to the same conclusion as the regression model: preventive maintenance should be carried out in advance to prevent further deterioration of road performance.
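The "maximum difference of no more than 1%" check can be stated as a one-line computation. This sketch assumes the difference is measured relative to the reference (actual) values, which the text does not specify; `max_relative_difference` is a hypothetical name and the numbers below are invented for illustration only.

```python
from typing import Sequence


def max_relative_difference(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Largest |predicted - reference| / |reference| over paired values."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return max(abs(p - r) / abs(r) for p, r in zip(predicted, reference))


# Hypothetical values for illustration only.
actual = [88.0, 86.5, 85.1]
gm_pred = [87.6, 86.9, 84.8]
print(max_relative_difference(gm_pred, actual) <= 0.01)
```

The same function can compare the GM(1,1) predictions with the regression predictions, which is how the two models "confirm each other".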

Conclusions
Combining data mining technology with highway maintenance can bring substantial benefits to China's highway maintenance and management system, shifting it from the traditional passive maintenance mode to active preventive maintenance.
By predicting the future trend of highway pavement performance and identifying the main factors that influence it, we can find potential highway distress in advance, locate the specific distress, and be well prepared before pavement performance reaches a poor state.
The research in this paper shows that the pavement performance values predicted by the regression model and the time series model are consistent with the actual values, with very small errors. To a certain degree, this shows that data mining technology can accurately predict the variation trend of pavement performance and provide reliable data support for preventive highway maintenance [10]. In short, preventive maintenance of highways can greatly improve maintenance efficiency and save a large amount of maintenance funds.
In this paper, the association rule algorithm and the iForest algorithm were also used to analyze the main factors of pavement performance decay, and the methods were applied to existing pavement performance data from Hubei Province. The pavement performance prediction results and the mined main factors of performance decay allow us to locate the sections that need maintenance and carry out preventive maintenance. Furthermore, preventive maintenance can prevent or delay the rapid deterioration of highway distress, effectively extend road life and save a large amount of road maintenance funds.
MATEC Web of Conferences 139, 00084 (2017)
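For the regression model, the variable substitution on Equation (4) replaces ln t with the variable x and $-\ln\left[\ln\frac{PPI_0}{PPI_0 - PPI}\right]$ with the variable y, so that Equation (4) can be expressed as y = ax + b. From the historical pavement performance data PPI and the corresponding ages t, we obtain m pairs (x_i, y_i) and estimate the parameters a and b by the least squares method:
$$ a = \frac{m\sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \sum_{i=1}^{m} y_i}{m\sum_{i=1}^{m} x_i^{2} - \left(\sum_{i=1}^{m} x_i\right)^{2}} \qquad (5) $$
$$ b = \frac{1}{m}\sum_{i=1}^{m} y_i - \frac{a}{m}\sum_{i=1}^{m} x_i \qquad (6) $$
For the GM(1,1) model, the mean sequence of consecutive neighbors of the accumulated sequence $X^{(1)}$, $Z^{(1)} = \{z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n)\}$, is generated by Equation (10):
$$ z^{(1)}(k) = 0.5\,x^{(1)}(k) + 0.5\,x^{(1)}(k-1), \quad k = 2, 3, \ldots, n \qquad (10) $$
The basic equation of the GM(1,1) model and its whitening (albinism) differential equation are:
$$ x^{(0)}(k) + a\,z^{(1)}(k) = b \qquad (11) $$
$$ \frac{dx^{(1)}}{dt} + a\,x^{(1)} = b \qquad (12) $$
where a and b are the parameters to be estimated. They can be estimated by the least squares method with Equations (13) and (14):
$$ [a, b]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y \qquad (13) $$
$$ B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix} \qquad (14) $$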
The regression model and the GM(1,1) model are applied to the pavement performance indices of this section from 2011 to 2016, which are listed in Table 1.

Table 1. The pavement performance index of one section
From these data, the values of a and b are estimated according to Equations (5) and (6) as a ≈ 0.4668 and b ≈ -1.5227. Next, α and β are estimated from their relation to a and b as α ≈ 26.0985 and β ≈ 0.4668. Finally, the predicted PCI values are calculated according to Equation (3). The prediction results of PCI from 2012 to 2016 are shown in Table 2.
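The step from (a, b) to (α, β) can be checked directly. This worked example assumes the relations β = a and α = exp(-b/a), which follow from writing the linearized model as y = ax + b with x = ln t, a = β and b = -β ln α; it recovers α ≈ 26.10, agreeing with the reported 26.0985 to within the rounding of a and b.

```python
import math

# Least squares estimates reported in the text.
a, b = 0.4668, -1.5227

beta = a                   # a = beta
alpha = math.exp(-b / a)   # b = -beta * ln(alpha)  =>  alpha = exp(-b / a)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```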

Table 2. Predictive values of PCI based on regression model

Table 3. Predictive values and differences based on GM(1,1)
The comparison between the predicted and actual values of PCI, together with the variation trend based on the GM(1,1) model, is shown in Fig 2.