Research on Prediction Method of Freeway Operation Situation Based on Short-Term Traffic Flow Multi-Parameters Regression

To realize short-term prediction of the traffic flow operation security situation and thereby enhance freeway operation safety, a security situation prediction model based on short-term multi-parameter regression was established using the traffic flow detection data and traffic accident data of the Beijing section of the Jing-ha freeway. Firstly, we extracted all the traffic flow data from microwave coils and videos, together with the traffic accidents recorded after the electromechanical system of the Beijing section was updated, to establish the risk prediction database, and developed pre-analysis software for the basic data. Secondly, the traffic flow data of the 30 minutes prior to each accident were divided into six slices at 5-minute level, and the volume, speed and occupancy, as well as their statistical parameters, were selected as the candidate parameters of the model. Next, single-parameter Logistic regression analysis was carried out in each time slice. Finally, based on the correlation analysis of the parameters and the significance of the single-parameter models, a multi-parameter Logistic regression model was established in each time slice, thus obtaining the short-term prediction model of the traffic flow operation security situation. The results indicate that the model of slice 1 fits best; that is, the traffic flow parameters and their statistics in the 5 minutes prior to the accident can effectively predict the possibility of the accident, and the average speed has a significant impact on the accident risk.


Introduction
Intelligent transportation systems (ITS) are widely used around the world, giving traffic managers large amounts of real-time traffic situation data. Many researchers and practitioners are fully aware that the advantages of ITS will not be realized without the ability to predict traffic flow in the short term [1] (Brian L. Smith, 2002). A traffic flow prediction model provides this ability and supports forward-looking traffic management as well as comprehensive travel information services.
At present, traffic forecasting mainly targets urban roads and focuses on the reliability and efficiency of unblocked flow [2]-[7], including the assessment of the traffic operation situation, short-term traffic forecasting, road traffic condition judgment and so on, whereas research on predicting the traffic flow operation security situation is rare.
In China, the lack of detailed accident data and microscopic traffic flow data has led to a serious shortage of theory for real-time traffic flow security analysis, so that freeway security management currently lags behind real-time traffic situation prediction. The United States and Canada began research on traffic accident detection algorithms and on the precursor characteristics of traffic flow before accidents in the 1990s [8][9][10][11][12][13][14][15]. Among them, Chris adopted the speed difference between upstream and downstream and the variance of the cross-section speed as the characterization factors for real-time risk discrimination of traffic flow [9], and the results were referenced by the Kansas state highway agency in the USA in 2006. However, the main shortcomings are that the cause of the risk remains unknown and the risk rank assessment is subjective, so the scientific validity of the approach needs further examination.
Therefore, this paper studies a prediction method for the freeway operation situation based on short-term traffic flow multi-parameter regression, in order to achieve short-term prediction of the traffic flow operation situation. The results help to reduce accident risk, decrease traffic accidents, and improve the operation security of freeways.
1.32 kilometers, and the average spacing of detectors in the direction away from Beijing is 1.53 kilometers. All the traffic flow data from microwave coils and videos, as well as the traffic accidents recorded after the electromechanical system of the Beijing section of the Jing-ha freeway was updated, are extracted in this paper. The traffic accident records include time, location, type, cause, etc., and the traffic flow data are the lane-level speed, volume and occupancy at 1-minute level.
First, for the first round of screening, we select the data of the nearest detector within 2 kilometers upstream or downstream of the accident location. The selected accidents should, as far as possible, exclude those caused by external factors such as weather, road alignment, drivers and vehicles; only in this way can we accurately uncover how traffic flow fluctuation affects accidents. Considering that the causal relationship between single-vehicle accidents and traffic flow may not be strong at a high service level, the selection of accident samples emphasizes multi-vehicle accidents under large traffic volume (below service level C). Then, in the second round of screening, we check the quality of the traffic flow data: singular values such as speeds of 0 km/h are deleted or processed, outliers are diagnosed by the data spatio-temporal graph method and the statistical method, and abnormal values are corrected by the simple difference method and the filtering method, in order to improve the accuracy and reliability of the analytical results. The speed, volume and occupancy of the detectors with better data quality are then paired with the accidents, and the traffic flow data of the control groups are extracted at a ratio of 1:4.
The control group data meet the following requirements: the date differs from that of the corresponding accident; the time of day, day of the week, and location are the same as those of the corresponding accident; and no accident occurred at this location on the control day. Then we select the control groups with better data quality by the same method as above, so as to establish the database required in this paper.
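The control-group requirements above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `candidate_control_times`, the strategy of stepping outward one week at a time, and the `max_weeks` limit are assumptions made for illustration, and the data quality screening is not modeled.

```python
from datetime import datetime, timedelta

def candidate_control_times(accident_time, accident_dates, n_controls=4, max_weeks=8):
    """Return up to n_controls control timestamps for one accident.

    Each control keeps the same time of day and day of week as the accident
    (whole-week offsets), falls on a different date, and skips any date on
    which an accident was recorded at the same location (accident_dates).
    The alternating before/after search order is an assumption of this sketch.
    """
    controls = []
    for week in range(1, max_weeks + 1):
        for sign in (-1, 1):
            candidate = accident_time + sign * timedelta(weeks=week)
            if candidate.date() in accident_dates:
                continue  # this location had an accident on that day
            controls.append(candidate)
            if len(controls) == n_controls:
                return controls
    return controls
```

With `n_controls=4` the sketch reproduces the 1:4 pairing ratio; candidates that later fail the data quality check would have to be replaced by further weeks.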
Because the accident location recorded by the police department is a cross-sectional stake number, it is necessary to aggregate the lane-level traffic flow data into cross-sectional traffic flow data, which serve as the foundation of the research. The lane-level data are transformed into cross-sectional data by a weighting method applied to the volume, occupancy and speed of each lane, where n is the number of lanes.
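As an illustration of such an aggregation, the following Python sketch combines the readings of n lanes into one cross-sectional record. The specific weights are an assumption of this sketch and are not taken from the paper: volume is summed, occupancy is averaged over the lanes, and speed is weighted by lane volume. The function name `aggregate_lanes` is hypothetical.

```python
def aggregate_lanes(volumes, occupancies, speeds):
    """Combine per-lane 1-minute readings into one cross-sectional record.

    Assumed weighting (not confirmed by the paper):
      volume    = sum of lane volumes
      occupancy = arithmetic mean over the n lanes
      speed     = volume-weighted mean of lane speeds
    """
    n = len(volumes)
    if n == 0 or not (n == len(occupancies) == len(speeds)):
        raise ValueError("need one volume, occupancy and speed per lane")
    total_volume = sum(volumes)
    mean_occupancy = sum(occupancies) / n
    if total_volume > 0:
        speed = sum(q * v for q, v in zip(volumes, speeds)) / total_volume
    else:
        speed = 0.0  # no vehicles observed, so no meaningful speed
    return total_volume, mean_occupancy, speed
```

Weighting speed by volume keeps a nearly empty lane from dominating the cross-sectional speed, which is one common reason for choosing this form.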

Logistic regression model
The binary Logistic regression model is commonly used to quantitatively analyze the impact of explanatory variables on a binary dependent variable, and can also be used to estimate the occurrence probability of one category of the dependent variable. In terms of traffic flow operation results, the dependent variable can be divided into exactly two categories: accident and non-accident. The probability of an accident corresponding to one sample is:

$$P(x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

The linear expression after the logit transform is:

$$\ln\frac{P(x_i)}{1 - P(x_i)} = x_i'\beta$$

where $P(x_i)$ represents the probability of a traffic accident and $x_i'\beta$ represents the linear combination of explanatory variables:

$$x_i'\beta = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
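A minimal Python sketch of the binary Logistic model and its logit transform is given below for illustration. The function names and the convention that the coefficient list starts with the intercept are assumptions of this sketch.

```python
import math

def accident_probability(x, beta):
    """Probability of the accident category for one sample.

    x    : list of explanatory variable values
    beta : list of coefficients, beta[0] being the intercept (assumed layout)
    """
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))  # equals exp(eta) / (1 + exp(eta))

def logit(p):
    """Log-odds of a probability; recovers the linear combination."""
    return math.log(p / (1.0 - p))
```

Applying `logit` to the output of `accident_probability` returns the linear combination of the explanatory variables, which is the property the logit transform relies on.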

Logistic model testing
In Logistic regression, the likelihood ratio test, the Akaike information criterion (AIC) and the Schwarz criterion can be used to reflect the goodness of fit of the model.
We adopt AIC to reflect the fitting effect of the final model in this paper, that is, $AIC = -2LL + 2(K + S)$, in which $K$ is the number of independent variables of the model and $S$ is the total number of response variable categories minus 1. The range of $-2LL$ is 0 to infinity, and the smaller it is the better. Under the same conditions, a smaller AIC indicates a better model fit.
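The criterion can be sketched in Python as follows, reading it as AIC = -2LL + 2(K + S), with LL the log-likelihood of the fitted model. The function names are hypothetical, and the default S = 1 reflects the two response categories (accident and non-accident) used in this paper.

```python
import math

def log_likelihood(y, p):
    """Binary log-likelihood LL for labels y (0 or 1) and fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(ll, k, s=1):
    """AIC = -2*LL + 2*(K + S).

    k : number of independent variables in the model
    s : number of response categories minus 1 (1 for a binary response)
    """
    return -2.0 * ll + 2.0 * (k + s)
```

For two models fitted to the same samples, the one with the smaller value returned by `aic` is preferred.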
The classification accuracy is usually used to reflect the prediction accuracy of the model. Using the Logistic model to predict classes requires specifying a probability threshold: when the probability calculated by the Logistic model is greater than the threshold, the sample is classified as a traffic accident, and when the probability is less than the threshold, it is classified as the safe, non-accident state. The threshold determines the prediction accuracy of each category and of the total samples. Current research commonly uses the proportion of a category in the whole sample as the threshold for predicting that category. Because this paper studies the short-term prediction of the freeway traffic operation situation, the proportion of accidents in the whole sample is adopted as the threshold.
DOI: 10.1051/ 0 (2016) matecconf/201 MATEC Web of Conferences

Data preparation
In order to predict traffic accidents in advance, the calibrated traffic flow data within the 30 minutes prior to each accident are extracted in this paper, and the traffic flow data of the control groups in the corresponding period are extracted for each accident. After data quality screening, we ultimately retain 112 accident samples and 448 non-accident samples as the control groups for modeling. The samples are divided into two categories, accident and non-accident: the dependent variable takes the value 1 for an accident and 0 for a non-accident.
Firstly, the original lane-level traffic flow data are aggregated into cross-sectional traffic flow data. Secondly, in order to avoid the data noise caused by the short acquisition interval, the data are aggregated at 5-minute level to obtain averages and standard deviations. With 5-minute aggregation, the half-hour period is divided into 6 time slices: the interval from 5 minutes prior to the accident to the time of the accident is named slice 1, the interval from 5 to 10 minutes prior to the accident is slice 2, the interval from 10 to 15 minutes prior to the accident is slice 3, and so on, up to slice 6, the interval from 25 to 30 minutes prior to the accident.
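The slicing described above can be sketched in Python as follows. The sketch assumes 30 one-minute cross-sectional values in chronological order (oldest first, ending at the time of the accident) and uses the sample standard deviation; the paper does not state which form of standard deviation it uses, and the function name is hypothetical.

```python
from statistics import mean, stdev

def slice_statistics(minute_values, slice_len=5, n_slices=6):
    """Mean and standard deviation of each 5-minute slice.

    minute_values : 30 one-minute values, oldest first
    Returns {slice_number: (mean, stdev)}, where slice 1 is the last
    5 minutes before the accident and slice 6 the earliest 5 minutes.
    """
    if len(minute_values) != slice_len * n_slices:
        raise ValueError("expected slice_len * n_slices one-minute values")
    stats = {}
    for k in range(1, n_slices + 1):
        end = len(minute_values) - (k - 1) * slice_len
        window = minute_values[end - slice_len:end]
        stats[k] = (mean(window), stdev(window))  # sample stdev (assumption)
    return stats
```

The same function would be applied separately to the speed, volume and occupancy series of each accident and each control sample.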

Model parameters selection
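We extract 9 statistical parameters as the candidate model parameters: the average speed as, the average volume av, the average occupancy ao, the standard deviation of speed ss, the standard deviation of volume sv, the standard deviation of occupancy so, the coefficient of variation of speed cvs, the coefficient of variation of volume cvv, and the coefficient of variation of occupancy cvo, where each coefficient of variation is the ratio of the standard deviation to the corresponding average.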
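For one time slice, the nine parameters can be computed as in the following Python sketch. It assumes that the coefficient of variation is the standard deviation divided by the average, that the sample standard deviation is used, and that an undefined coefficient (zero average) is reported as NaN; these choices and the function name are assumptions of the sketch.

```python
import math
from statistics import mean, stdev

def slice_parameters(speeds, volumes, occupancies):
    """Nine candidate parameters for one 5-minute slice of 1-minute records."""
    def cv(sd, avg):
        # coefficient of variation; undefined when the average is zero
        return sd / avg if avg != 0 else math.nan

    a_s, a_v, a_o = mean(speeds), mean(volumes), mean(occupancies)
    s_s, s_v, s_o = stdev(speeds), stdev(volumes), stdev(occupancies)
    return {
        "as": a_s, "av": a_v, "ao": a_o,
        "ss": s_s, "sv": s_v, "so": s_o,
        "cvs": cv(s_s, a_s), "cvv": cv(s_v, a_v), "cvo": cv(s_o, a_o),
    }
```

The dictionary keys follow the abbreviations used in this paper, so the output of one slice can be used directly as one row of the modeling data set.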

Modeling steps
The binary Logistic model for deducing the freeway traffic security operation situation based on short-term traffic flow multi-parameters is established in this paper using statistical analysis software with the R programming language. The steps are as follows:
(1) We use the correlation analysis method to select parameters in each time slice, so that highly correlated variables do not enter the Logistic model together.
(2) For each parameter, we fit a binary Logistic regression model in each time slice (single-variable Logistic regression), and select the parameters that have a significant impact on the freeway traffic accident risk as the candidate variables for the subsequent steps.
(3) Combining the analysis results of steps (1) and (2), we select the variables that are significant and not strongly correlated with each other, and establish the multi-parameter Logistic regression model in each time slice.
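The single-variable fit of step (2) can be illustrated with a short Python sketch. The paper performs the modeling in R; the code below is not the authors' code but a plain Newton-Raphson maximum-likelihood fit written for illustration, with a hypothetical function name and a fixed iteration count chosen for simplicity. It does not compute significance levels.

```python
import math

def fit_single_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1).

    x : values of one traffic flow parameter
    y : 1 for accident samples, 0 for control samples
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. b0
            g1 += (yi - p) * xi     # gradient w.r.t. b1
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break  # degenerate data (e.g. constant x); stop updating
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

In practice a statistical package also reports the standard errors and P values needed for the significance screening; this sketch only shows how the coefficients are obtained.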

Logistic model regression and analysis
Firstly, we use the correlation analysis method to select parameters in each time slice; the results for slice 1 are shown in Table 1.
Generally, if the correlation coefficient between two parameters is greater than 0.6 or less than -0.6, they are strongly related to each other and cannot enter the model at the same time. The results show that, for slice 1, cvv and cvo, sv and av, ao and as, and cvs and ss are strongly related to each other, so they cannot enter the model at the same time.
The results for the other slices are obtained by the same method. In each slice, the following pairs are strongly related to each other and therefore cannot enter the model at the same time:
Slice 2: cvv and cvo, sv and av, ao and as, ao and so, cvs and ss.
Slice 3: cvv and cvo, sv and av, ao and as, ao and so, cvs and ss.
Slice 4: cvv and cvo, ao and as, ao and so, cvs and ss.
Slice 5: cvv and cvo, sv and av, sv and so, ao and as, ao and so, cvs and ss.
Slice 6: cvv and cvo, sv and av, ao and as, ao and so, cvs and ss.
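The correlation screening can be sketched in Python as follows. The sketch assumes that the Pearson correlation coefficient is used, which the paper does not state explicitly, and applies the 0.6 threshold on its absolute value; the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def strongly_correlated_pairs(columns, threshold=0.6):
    """Pairs of parameters whose |r| exceeds the threshold.

    columns : dict mapping a parameter name to its values over all samples
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        if abs(pearson(a, b)) > threshold:
            pairs.append((name_a, name_b))
    return pairs
```

Each returned pair corresponds to two parameters that should not enter the same model in that slice.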
Secondly, to analyze accurately the impact of a single traffic flow parameter on the probability of an accident (single-factor analysis), we fit a binary Logistic regression model for each parameter in each time slice; with 5-minute aggregation there are 6 × 9 = 54 single-variable models. Because the modeling method is the same for all of them, only the Logistic regression results with as as the independent variable are shown here, in Table 2. It can be seen that, for parameter as, the significance levels in all six slices are $P < 0.001$, indicating that the average speed at 5-minute level has a significant effect on the freeway traffic accident risk, so as is selected as a candidate variable for the subsequent modeling process.

The odds ratio of a traffic flow parameter can be used to quantify its impact on the risk of accident. The odds of an event are defined as the ratio of the probability that it occurs to the probability that it does not occur; therefore, for the Logistic regression model:

$$\frac{P(x_i)}{1 - P(x_i)} = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})$$

It can be seen that when the $j$-th independent variable increases by one unit, the odds are multiplied by $\exp(\beta_j)$. If the coefficient of an independent variable is positive, the odds increase and this value is greater than 1; if the coefficient is negative, the odds decrease and this value is less than 1; when the coefficient is 0, this value is equal to 1. The percentage change of the odds is $(\exp(\beta_j) - 1) \times 100\%$. From Table 2, the odds ratios of as in the six slices are all less than 1, and the value increases with distance from the time of the accident, which shows that the probability of an accident decreases as as increases.
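The relation between a coefficient, its odds ratio and the percentage change of the odds can be sketched in Python as follows. The function names are hypothetical, and any coefficient passed to them in an example is illustrative rather than a value from Table 2.

```python
import math

def odds_ratio(beta_j):
    """Factor by which the odds change when the variable increases by one unit."""
    return math.exp(beta_j)

def odds_percent_change(beta_j):
    """Percentage change of the odds for a one-unit increase of the variable."""
    return (math.exp(beta_j) - 1.0) * 100.0
```

For a hypothetical coefficient of -0.05, `odds_ratio` returns a value below 1 and `odds_percent_change` a negative percentage, matching the interpretation that a negative coefficient lowers the odds of an accident.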
The Logistic regression analysis results of the other independent variables are obtained by the same modeling method:
av: the independent variable is not significant in slices 1~6, which shows that the correlation between the change of volume and the accident is relatively low.
ao: the significance levels in all six slices are $P < 0.001$, indicating that the average occupancy at 5-minute level has a significant effect on the freeway traffic accident risk, so ao is selected as a candidate variable for the subsequent modeling process. The odds ratios of ao in the six slices are all greater than 1 and decrease with distance from the time of the accident, which shows that the probability of an accident increases as ao increases.
ss: the independent variable is not significant in slices 1~6.
sv: the independent variable is significant in slice 1 and slice 4, at $P < 0.05$ and $P < 0.1$ respectively, and is not significant in the other slices. The odds ratios in slices 1~6 are all less than 1, which shows that the probability of an accident decreases as sv increases.
so: the significance levels in slices 1~6 are all $P < 0.01$, indicating that the standard deviation of occupancy at 5-minute level has a significant effect on the freeway traffic accident risk, so so is selected as a candidate variable for the subsequent modeling process. The odds ratios of so in slices 1~6 are clearly greater than 1, showing that the probability of an accident increases as so increases.
cvs: the significance is higher in slices 1, 2, 3 and 6, where the significance levels are $P < 0.01$, than in slices 4 and 5, where they are both $P < 0.1$. The odds ratios are far greater than 1, so the probability of an accident increases markedly as the coefficient of variation of speed increases.
cvv: the independent variable is significant in slice 3 and slice 4, at $P < 0.1$ and $P < 0.05$ respectively, and is not significant in the other slices. Some odds ratios are far less than 1 and some are slightly greater than 1, which shows that its impact on the possibility of an accident may not be consistent.
cvo: the independent variable is not significant in any of slices 1~6.
In summary, the parameters with the most significant single-parameter models are: in slice 1, sv, so, ao, cvs and as; in slice 2, so, ao, cvs and as; in slice 3, cvv, so, ao, cvs and as; in slice 4, cvv, sv, so, ao, cvs and as; in slice 5, so, ao, cvs and as; in slice 6, so, ao, cvs and as.
Combining the correlation analysis results with the single-parameter significance results, the variables that have a significant effect on the freeway accident risk and are uncorrelated with each other are selected for modeling. The final variables entering the multi-parameter Logistic regression model in each slice are: for slice 1, cvv, sv, so, cvs, as; for slice 2, sv, cvo, so, cvs, as; for slice 3, cvv, sv, so, cvs, as; for slice 4, cvv, sv, so, cvs, as; for slice 5, av, cvo, so, cvs, as; for slice 6, sv, cvo, so, cvs, as.
The final regression results of the models are shown in Table 3. From the results, the optimal final model in slices 1, 2, 3 and 6 retains only as; the optimal model of slice 4 retains two variables, cvv and as; and the final model of slice 5 retains five variables: cvo, av, so, cvs and as. Thus as (the average speed at 5-minute level) is the parameter most closely related to the probability of an accident, and its significance level in each slice reaches $P < 0.0001$, indicating that as has a remarkable influence on the freeway traffic accident risk. The odds ratios of the traffic flow parameters are also given in this table to quantify the impact of the different traffic flow parameters on the risk of accident. It can be seen that the odds ratios of as in the six slices are all less than 1 and increase with distance from the time of the accident, which shows that the probability of an accident decreases as as increases. Comparing the AIC values of the final models of the slices shows that AIC increases with distance from the time of the accident, illustrating that the closer the slice is to the time of the accident, the better the final model fits. Therefore, the average speed in the 5 minutes prior to the accident is the most effective for predicting the probability of an accident, and it is used as the risk characterization factor for real-time safety assessment of traffic flow operation. The final model of the security situation deduction is as follows:

$$P = \frac{\exp(3.21024 - 0.0577\,as)}{1 + \exp(3.21024 - 0.0577\,as)}$$

where as is the average speed in the 5 minutes prior to the accident.
After a reasonable threshold is specified, the calibrated model can predict the risk of freeway traffic accidents in real time. Because the proportion of accidents in the total samples is 20%, the threshold is set to 0.2 in this paper: when the probability output by the model is greater than 0.2, the sample is classified as a traffic accident, and when it is less than 0.2, it is classified as the safe, non-accident state. The comparison of the prediction accuracy of the models of slices 1~6 is shown in Table 4.
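The thresholded prediction can be sketched in Python as follows. The constants 3.21024 and 0.0577 are the values of the slice 1 model; their signs (positive intercept, negative speed coefficient) are an assumption of this sketch, inferred from the reported odds ratio of as being less than 1 and from the need for some probabilities to exceed 0.2. The speed is assumed to be in km/h, and the function names are hypothetical.

```python
import math

INTERCEPT = 3.21024   # slice 1 model constant (sign assumed positive)
SLOPE = -0.0577       # coefficient of as (sign assumed negative: odds ratio < 1)
THRESHOLD = 0.2       # proportion of accident samples, 112 / (112 + 448)

def accident_risk(avg_speed):
    """Predicted accident probability from the 5-minute average speed."""
    eta = INTERCEPT + SLOPE * avg_speed
    return 1.0 / (1.0 + math.exp(-eta))

def is_high_risk(avg_speed, threshold=THRESHOLD):
    """True when the predicted probability exceeds the threshold."""
    return accident_risk(avg_speed) > threshold
```

Under these assumptions, a lower average speed gives a higher predicted risk, so `is_high_risk(60)` is classified as an accident state while `is_high_risk(100)` is not.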
In slice 1, the model correctly predicts 60.71% of accidents and 69.87% of non-accidents, and the total prediction accuracy is 68.04%. In slice 2, it predicts 58.04% of accidents and 68.53% of non-accidents, and the total prediction accuracy is 66.43%. In slice 3, it predicts 57.14% of accidents and 67.86% of non-accidents, and the total prediction accuracy is 65.71%. In slice 4, it predicts 52.68% of accidents and 66.74% of non-accidents, and the total prediction accuracy is 63.93%. In slice 5, it predicts 52.68% of accidents and 66.52% of non-accidents, and the total prediction accuracy is 63.75%. In slice 6, it predicts 50.89% of accidents and 64.51% of non-accidents, and the total prediction accuracy is 61.79%. It can be seen that the total prediction accuracy decreases with distance from the time of the accident, and that the accident prediction accuracy of slice 1 is higher than that of the other slices. The accident risk prediction model established in slice 1 is therefore the best suited of the six to predicting the freeway traffic accident risk in real time from real-time traffic flow data.
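The total prediction accuracy follows from the two class accuracies and the sample sizes, as the following Python sketch shows. It uses the 112 accident samples and 448 non-accident samples reported above; the function name is hypothetical.

```python
def total_accuracy(accident_rate, non_accident_rate, n_accident=112, n_non_accident=448):
    """Overall accuracy as the sample-weighted mean of the two class accuracies."""
    correct = accident_rate * n_accident + non_accident_rate * n_non_accident
    return correct / (n_accident + n_non_accident)
```

For slice 1, `total_accuracy(0.6071, 0.6987)` gives about 0.6804, consistent with the reported 68.04%. Because non-accident samples are four times as numerous, the total accuracy is dominated by the non-accident accuracy.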

Conclusions
In this paper, the traffic flow data of the nearest detectors upstream or downstream of typical accident locations on the Beijing section of the Jing-ha freeway are extracted. On this basis, we use the binary Logistic regression method to establish the security risk prediction model and realize real-time prediction of the probability of an accident from the average speed at 5-minute level, which provides more scientific support for traffic control and traffic emergency management decisions on freeways.