on Short-term Prediction Model of Freeway Operation Situation

Based on traffic flow data and accident data from the Beijing-Tianjin-Tanggu freeway, this paper establishes a short-term prediction model of the freeway safety situation. First, we established a risk prediction database and developed pre-analysis software for the basic data; second, traffic flow data from 10 to 15 minutes before each accident were aggregated at the 5-minute level, and volume, speed, occupancy and their statistical parameters were selected; finally, based on the results of a correlation analysis of the parameters, a multi-parameter logistic regression model was established. The results indicate that changes in traffic flow parameters and their statistics can effectively predict the probability of an accident; in particular, the average speed of small cars, the standard deviation of large-car volume, and the average volume difference between large and small cars at the 5-minute level have a significant impact on accident risk.


Introduction
Intelligent transportation systems (ITS) are widely deployed around the world, giving traffic managers access to large volumes of real-time traffic data. Many researchers and practitioners recognize that the benefits of ITS cannot be fully realized without the ability to predict traffic flow in the short term [1] (Brian L. Smith, 2002). Traffic flow prediction models provide this ability and enable both proactive traffic management and comprehensive traveler information services.
At present, traffic forecasting mainly targets urban roads and focuses on the reliability and efficiency of unimpeded traffic [2][3][4][5][6][7], including assessment of the traffic operation situation, short-term traffic forecasting and identification of road traffic conditions, whereas research on predicting the safety situation of traffic flow operation is rare.
In China, the lack of detailed accident data and microscopic traffic flow data has left the theory of real-time traffic safety analysis seriously underdeveloped, so freeway safety management currently lags behind real-time traffic situation prediction. The United States and Canada began research on traffic accident detection algorithms and on the precursor characteristics of traffic flow before accidents in the 1990s [8][9][10][11][12][13][14][15]. Among these studies, Chris used the speed difference between upstream and downstream detectors and the variance of cross-sectional speed as factors for real-time risk discrimination of traffic flow [9], and the results were referenced by the Kansas state highway agency in the USA in 2006. However, the main shortcomings were that the causes of risk remained unknown and the risk-level assessment was subjective, so its scientific validity needs further examination.
Therefore, this paper studies a prediction method for the freeway operation situation based on multi-parameter regression of short-term traffic flow, achieving short-term prediction of the traffic flow operation situation. The results can help reduce accident risk, decrease traffic accidents, and improve the operational safety of freeways.

Data processing
The Beijing-Tianjin-Tanggu freeway is about 142.69 km long and has 28 microwave detectors with an average spacing of about 5 km; the detectors (coils) on the Beijing section are more densely spaced, but some of their data are missing. All traffic flow data and traffic accident data on the Beijing-Tianjin-Tanggu freeway are extracted in this paper. The accident records include time, location, type, cause, etc., and the traffic flow data are lane-by-lane speed, volume and occupancy at 1-minute intervals.
First, in the first round of screening, we select data from the nearest detector within 2 km upstream or downstream of each accident location. Accidents caused by external factors such as weather, road alignment, drivers and vehicles are excluded as far as possible, because only then can the effect of traffic flow fluctuations on accidents be identified accurately. Moreover, since the causal link between single-vehicle accidents and traffic flow may be weak at high levels of service, the accident samples emphasize multi-vehicle accidents under heavy traffic volume (below level of service C).
DOI: 10.1051/matecconf/201712402004 ICTTE 2017
Then, in the second round of screening, we check the quality of the traffic flow data: singular values such as speeds of 0 km/h are deleted or processed, outliers are diagnosed with spatio-temporal data plots and statistical methods, and abnormal values are corrected with a simple difference method and a filtering method, improving the accuracy and reliability of the analysis. The speed, volume and occupancy of detectors with better data quality are then paired with the accidents, and control-group traffic flow data are extracted at a 1:4 ratio (one accident to four controls).
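The zero-speed repair in the second screening round can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `repair_zero_speeds` is hypothetical, a 0 km/h reading is treated as a detector fault, and linear interpolation between the nearest valid readings stands in for the paper's simple difference and filtering methods, whose exact form is not given here.

```python
def repair_zero_speeds(speeds):
    """Replace 0 km/h readings by linear interpolation between valid neighbours.

    Hypothetical stand-in for the paper's cleaning step: zeros are treated as
    detector faults; leading/trailing faults copy the nearest valid value.
    """
    valid = [i for i, v in enumerate(speeds) if v > 0]
    if not valid:
        raise ValueError("no valid speed readings to interpolate from")
    repaired = list(speeds)
    for i, v in enumerate(speeds):
        if v > 0:
            continue  # valid reading, keep as-is
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            repaired[i] = speeds[right]
        elif right is None:
            repaired[i] = speeds[left]
        else:
            w = (i - left) / (right - left)  # relative position between neighbours
            repaired[i] = speeds[left] + w * (speeds[right] - speeds[left])
    return repaired


print(repair_zero_speeds([80.0, 0.0, 90.0, 0.0]))  # [80.0, 85.0, 90.0, 90.0]
```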
The control-group data meet the following requirements: the date differs from that of the corresponding accident; the time of day, day of week and location are the same as those of the corresponding accident; and no accident occurred at that location on the control day. Control groups with better data quality are then selected by the same method as above, establishing the database used in this paper.
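The control-group rules above can be expressed as a filter over candidate days. The sketch below is hypothetical: `control_dates`, the accident-log format, the stake label and the preference for candidate days closest to the accident date are all assumptions, and the caller is assumed to extract data for the same clock time and detector on each returned day.

```python
from datetime import date, timedelta


def control_dates(accident_day, location, accident_log, candidate_days, ratio=4):
    """Pick up to `ratio` control days for one accident (hypothetical helper).

    A control day must fall on the same weekday as the accident, differ from
    the accident date, and have no logged accident at `location`.
    Candidates closest in time to the accident are tried first (assumption).
    """
    days_with_accident = {d for d, loc in accident_log if loc == location}
    chosen = []
    for day in sorted(candidate_days, key=lambda d: abs((d - accident_day).days)):
        if day == accident_day or day.weekday() != accident_day.weekday():
            continue
        if day in days_with_accident:
            continue
        chosen.append(day)
        if len(chosen) == ratio:
            break
    return chosen


# Hypothetical example: weekly candidates around one accident at "stake_A".
acc_day = date(2015, 3, 10)
candidates = [acc_day + timedelta(weeks=k) for k in range(-3, 4)]
log = [(acc_day, "stake_A"), (acc_day + timedelta(weeks=1), "stake_A")]
controls = control_dates(acc_day, "stake_A", log, candidates)
print(controls)
```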
Because the police record the accident location as a cross-sectional stake number, the lane-by-lane traffic flow data must be aggregated into cross-sectional traffic flow data, which serve as the basis of this research. The lane data are converted into cross-sectional data by a weighting method, in which $q_i$, $k_i$ and $v_i$ are respectively the volume, occupancy and speed of lane $i$, and $n$ is the number of lanes.
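The weighting formula itself does not survive in this text. As an illustration only, the sketch below assumes a common convention: cross-sectional volume $Q=\sum_i q_i$, occupancy $K=\frac{1}{n}\sum_i k_i$, and volume-weighted speed $V=\sum_i q_i v_i / \sum_i q_i$. The function name is hypothetical, and the paper's actual weights may differ.

```python
def to_cross_section(lanes):
    """Aggregate per-lane (volume, occupancy %, speed km/h) for one interval.

    Assumed weighting (not confirmed by the paper): total volume,
    lane-averaged occupancy, volume-weighted mean speed.
    """
    n = len(lanes)
    q_total = sum(q for q, _, _ in lanes)
    k_mean = sum(k for _, k, _ in lanes) / n
    # Speed is undefined when no vehicle passed; NaN flags that case.
    v_weighted = sum(q * v for q, _, v in lanes) / q_total if q_total else float("nan")
    return q_total, k_mean, v_weighted


# Two hypothetical lanes in one 1-minute interval.
print(to_cross_section([(20, 10.0, 90.0), (10, 20.0, 60.0)]))  # (30, 15.0, 80.0)
```

Weighting speed by volume keeps a nearly empty lane from pulling the cross-sectional speed as much as a busy one.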
3 Mathematical model

Logistic regression model
Binary logistic regression is commonly used to quantify the effect of explanatory variables on a binary dependent variable, and it can also estimate the probability that the dependent variable falls in a given category. Here the traffic operation outcome naturally falls into two categories: accident and non-accident. The probability of an accident for sample $i$ is:
$$P(x_i) = \frac{\exp(x_i'\beta)}{1+\exp(x_i'\beta)} \quad (1)$$
The linear expression after the logit transform is:
$$\ln\frac{P(x_i)}{1-P(x_i)} = x_i'\beta \quad (2)$$
where $P(x_i)$ is the probability of a traffic accident; $x_i'\beta = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}$ is the linear combination of explanatory variables, and $x_{ki}$ is the value of variable $k$ in the $i$-th sample; $\beta_0$ is the regression intercept; $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients of the explanatory variables. The parameters $\beta_0, \beta_1, \ldots, \beta_k$ are estimated by maximum likelihood:
$$L(\beta) = \prod_{i=1}^{N} P(x_i)^{y_i}\,\left[1-P(x_i)\right]^{1-y_i} \quad (3)$$
where $y_i = 1$ for an accident and $y_i = 0$ otherwise, and $N$ is the number of samples.
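Equations (1) to (3) translate directly into code. The minimal sketch below evaluates the accident probability and the logarithm of the likelihood in Eq. (3) for a given coefficient vector; the function names are illustrative, and maximizing `log_likelihood` over `beta` is exactly what the maximum likelihood estimate does.

```python
import math


def accident_probability(beta, x):
    """Eq. (1): beta = [b0, b1, ..., bk], x = [x1, ..., xk]."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))  # x_i' beta
    return 1.0 / (1.0 + math.exp(-eta))  # same as exp(eta) / (1 + exp(eta))


def log_likelihood(beta, rows, labels):
    """ln L(beta) from Eq. (3); labels are 1 (accident) or 0 (non-accident)."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = accident_probability(beta, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# With all coefficients zero every sample gets probability 0.5.
print(accident_probability([0.0], []))
```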

Logistic model testing
In logistic regression, the likelihood ratio test, the Akaike information criterion (AIC) and the Schwarz criterion can be used to assess goodness of fit.
This paper uses AIC to evaluate the fit of the final model: $AIC = -2LL + 2(K+S)$, where $K$ is the number of independent variables in the model and $S$ is the number of response-variable categories minus 1. The term $-2LL$ ranges from 0 to infinity, and smaller values are better. Under the same conditions, a smaller AIC indicates a better model fit.
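The criterion is a one-line computation once $LL$ is known. In this sketch (hypothetical function name), `log_lik` is the maximized log-likelihood $LL$, `n_vars` is $K$, and `n_categories` fixes $S$.

```python
def aic(log_lik, n_vars, n_categories=2):
    """AIC = -2LL + 2(K + S), with K predictors and S = categories - 1."""
    s = n_categories - 1
    return -2.0 * log_lik + 2.0 * (n_vars + s)


# Same fit quality, one extra variable: the extra variable is penalized.
print(aic(-10.0, 3), aic(-10.0, 4))
```

For a binary outcome $S=1$, so the penalty $2(K+S)$ is twice the number of estimated coefficients including the intercept, which is the usual AIC penalty.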
Classification accuracy is usually used to reflect the predictive accuracy of the model. Classifying with a logistic model requires a probability threshold: when the probability computed by the model is greater than the threshold, the sample is classified as a traffic accident; when it is less than the threshold, the sample is classified as a safe, non-accident state. The threshold determines the prediction accuracy for each category and for the whole sample, and current studies commonly use the proportion of a category in the whole sample as the threshold for predicting that category. Since this paper studies short-term prediction of the freeway traffic operation situation, the proportion of accidents in the whole sample is adopted as the threshold.
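The prevalence-based threshold rule can be sketched as follows. The label counts (33 accidents, 132 controls) are the sample sizes reported in this paper; the function names are hypothetical, and assigning a probability exactly equal to the threshold to the non-accident class is an assumption, since the text leaves that case open.

```python
def prevalence_threshold(labels):
    """Share of accident samples (label 1), used as the cut-off probability."""
    return sum(labels) / len(labels)


def classify(probabilities, threshold):
    """1 = predicted accident if p > threshold, else 0 (ties -> non-accident)."""
    return [1 if p > threshold else 0 for p in probabilities]


labels = [1] * 33 + [0] * 132  # sample sizes reported in this study
threshold = prevalence_threshold(labels)
print(threshold)  # 0.2
print(classify([0.05, 0.35, 0.2], threshold))  # [0, 1, 0]
```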

Data preparation for the logistic regression model
To predict traffic accidents in advance, the calibrated traffic flow data from 10 to 15 minutes before each accident are extracted, together with the traffic flow data of the corresponding control groups for the same period. After data-quality screening, 33 accident samples and 132 non-accident control samples are retained for modeling and divided into two categories: the dependent variable is 1 for an accident and 0 for a non-accident.
First, the original lane-by-lane traffic flow data are aggregated into cross-sectional data; second, to avoid noise caused by the short acquisition interval, the data are aggregated at the 5-minute level to obtain means and standard deviations.
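The 5-minute aggregation can be illustrated with a small helper. The sketch assumes the window is the half-open interval of lead times $10 \le t < 15$ minutes and uses the sample standard deviation; both choices, the function name and the speed values are hypothetical, not taken from the paper.

```python
import statistics


def window_stats(readings, start=10, end=15):
    """Mean and sample SD of 1-minute readings in the assumed window
    start <= lead < end, where lead = minutes before the accident.

    readings: dict mapping lead time (minutes) -> cross-sectional value.
    """
    values = [v for lead, v in readings.items() if start <= lead < end]
    if len(values) < 2:
        raise ValueError("need at least two readings in the window")
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical small-car speeds (km/h) keyed by minutes before the accident;
# the readings at 9 and 15 minutes fall outside the window.
speeds = {9: 40.0, 10: 70.0, 11: 72.0, 12: 74.0, 13: 76.0, 14: 78.0, 15: 90.0}
print(window_stats(speeds))  # mean 74.0, SD = sqrt(10), about 3.162
```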

We extract 14 statistical parameters as candidate model variables, as shown in Tab. 1.

Modelling steps
The binary logistic model for deducing the freeway traffic safety operation situation from short-term multi-parameter traffic flow data is established with the R statistical software. The steps are as follows:
(1) Correlation analysis is used to examine the correlation between variables, so that highly correlated variables do not enter the logistic model together;
(2) Reasonable explanatory variables are selected by backward selection for logistic regression:
a) First, all variables are included in the model;
b) Then, the z-test values of all variables are calculated, together with the corresponding P values;
c) Next, the largest P value is found; if it is greater than the removal significance level $\alpha_{out}$, that variable is eliminated;
d) Return to step b) for the next round of elimination.
The significance level is set to 0.05 for including variables ($\alpha_{in}$) and 0.1 for excluding variables ($\alpha_{out}$).
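The backward-selection loop in step (2) can be sketched in plain Python. This is not the authors' R code: it is a minimal illustration that fits the logistic model by Newton-Raphson maximum likelihood, computes two-sided Wald z-test P values, and drops the variable with the largest P value while it exceeds $\alpha_{out}=0.1$. The $\alpha_{in}=0.05$ level is not needed in a pure backward pass. All function names and the synthetic data are hypothetical, and the fit can diverge if the data are perfectly separable.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_logit(X, y, iters=25):
    """Logistic MLE by Newton-Raphson; X rows start with 1.0 (intercept).

    Returns (beta, standard errors). SEs come from the inverse information
    matrix of the last Newton step, i.e. at (near) the converged optimum.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for r in range(p):
                grad[r] += (yi - mu) * xi[r]
                for c in range(p):
                    info[r][c] += w * xi[r] * xi[c]
        cov = invert(info)
        beta = [beta[j] + sum(cov[j][k] * grad[k] for k in range(p)) for j in range(p)]
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


def backward_eliminate(columns, y, names, alpha_out=0.1):
    """Drop the variable with the largest Wald P value while it exceeds alpha_out."""
    kept = list(names)
    while kept:
        X = [[1.0] + [columns[name][i] for name in kept] for i in range(len(y))]
        beta, se = fit_logit(X, y)
        # Two-sided P value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta[1:], se[1:])]
        worst = max(range(len(kept)), key=lambda j: pvals[j])
        if pvals[worst] <= alpha_out:
            break
        kept.pop(worst)
    return kept


# Synthetic demo data (not the paper's): y depends on x1 only.
rng = random.Random(1)
n = 500
data = {"x1": [rng.gauss(0.0, 1.0) for _ in range(n)],
        "x2": [rng.gauss(0.0, 1.0) for _ in range(n)]}
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(1.0 - 2.0 * data["x1"][i])) else 0
     for i in range(n)]
kept = backward_eliminate(data, y, ["x1", "x2"])
print(kept)
```

In R the same loop is usually run on a `glm` fit with a binomial family; the sketch spells out the z test so the elimination rule in steps b) and c) is visible.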

Logistic regression results and analysis
Following modeling step (1), the correlation analysis results are shown in Tab. 2.
Generally, if the correlation coefficient between two parameters is $\ge 0.6$ or $\le -0.6$, they are strongly correlated and cannot enter the model at the same time. By comparison, we select lvavg, svavg, lvsd, svsd, gsd, lqsd and qdavg as the model variables; following modeling step (2), the results are shown in Tab. 3. The z values show that the retained parameters are all significant at $P < 0.05$, indicating that svavg, lqsd and qdavg have a significant effect on the traffic accident risk of the freeway.
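The screening rule ($|r| \ge 0.6$) can be sketched as a pairwise scan. The helper names and the toy columns `a`, `b`, `c` are hypothetical; the sketch only flags strongly correlated pairs, while choosing which variable of each pair to keep remains the comparison step described above.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def strongly_correlated_pairs(columns, cutoff=0.6):
    """Return (name_a, name_b, r) for every pair with |r| >= cutoff."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= cutoff:
                pairs.append((a, b, r))
    return pairs


# Toy columns: b is a multiple of a (r = 1); c is only weakly related to both.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, 1, -1]}
pairs = strongly_correlated_pairs(columns)
print(pairs)
```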
The odds ratios of the traffic flow parameters can be used to quantify the effects of different parameters on accident risk. The odds of an event are defined as the ratio of the probability that it occurs to the probability that it does not occur; therefore, for the logistic regression model:
$$\text{Odds} = \frac{P(\text{event})}{P(\text{nonevent})} = \frac{P(x_i)}{1-P(x_i)} = e^{x_i'\beta} \quad (4)$$
It follows that when the $i$-th independent variable changes by one unit, the odds are multiplied by $\exp(\beta_i)$, that is:
$$\text{Odds ratio} = \frac{\text{Odds}(X_i+1)}{\text{Odds}(X_i)} = \exp(\beta_i) \quad (5)$$
If the coefficient of an independent variable is positive, the odds increase and the odds ratio is greater than 1; if the coefficient is negative, the odds decrease and the odds ratio is less than 1; when the coefficient is 0, the odds ratio equals 1. The percentage change in the odds is $(\exp(\beta_i) - 1) \times 100\%$. From Tab. 3, the odds ratios of svavg, lqsd and qdavg are 0.96659, 1.49283 and 0.84473 respectively, indicating that a one-unit increase in lqsd raises the odds of a traffic accident by 49.3%, whereas one-unit increases in svavg and qdavg lower the odds by 3.3% and 15.5% respectively.
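The odds ratios and percentage changes follow directly from Eq. (5). The sketch below uses the slope coefficients of the final model, with signs taken from the odds ratios in Tab. 3 (an odds ratio below 1 implies a negative coefficient); the helper names are illustrative.

```python
import math

# Slope coefficients of the final model; signs follow the Tab. 3 odds ratios.
coefficients = {"svavg": -0.03398, "lqsd": 0.40067, "qdavg": -0.16874}


def odds_ratio(beta):
    """Eq. (5): multiplicative change in the odds per one-unit increase."""
    return math.exp(beta)


def percent_change_in_odds(beta, delta=1.0):
    """(exp(beta * delta) - 1) * 100 for a change of `delta` units."""
    return (math.exp(beta * delta) - 1.0) * 100.0


for name, b in coefficients.items():
    print(f"{name}: OR = {odds_ratio(b):.5f}, change = {percent_change_in_odds(b):+.1f}%")
```

The computed odds ratios agree with Tab. 3 to rounding, and the `delta` argument shows that a change of several units compounds multiplicatively rather than adding percentages.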
According to this analysis, an increase in lqsd means that large-car volume deviates more from its mean, i.e., the distribution of large-car volume is more dispersed. This easily makes the traffic flow unstable, alternating between congested and free-flow states, and thereby increases the risk of a traffic accident. When vehicle speed near the accident location drops quickly, the drop propagates upstream as a shock wave, so the speed at the upstream detector changes abruptly and svavg decreases; a lower svavg therefore indicates a higher accident risk. When qdavg increases, i.e., the volume difference between large and small cars grows, the traffic flow easily becomes unstable, so the accident risk also increases. Therefore, the changes in svavg, lqsd and qdavg from 10 to 15 minutes before an accident are the most effective predictors of accident probability, and they are used as risk characterization factors for real-time safety assessment of traffic flow operation.
The final model for deducing the safety situation is:
$$P = \frac{1}{1+\exp\left[-(0.48457 - 0.03398\,svavg + 0.40067\,lqsd - 0.16874\,qdavg)\right]}$$
where svavg, lqsd and qdavg are, respectively, the average speed of small cars, the standard deviation of large-car volume, and the average volume difference between large and small cars, computed from 10 to 15 minutes before the accident.
After a reasonable threshold is specified, the calibrated model can predict freeway accident risk in real time. Since accidents account for 20% of all samples, the threshold is set to 0.2 in this paper: when the model output probability is greater than 0.2, the state is classified as a traffic accident; when it is less than 0.2, it is classified as a safe, non-accident state. The prediction accuracy of the model is shown in Tab. 4.
As shown in Tab. 4, the logistic model based on traffic flow data from the Beijing-Tianjin-Tanggu freeway correctly predicts 60.61% of accidents and 65.91% of non-accidents, with an overall prediction accuracy of 64.85%. Therefore, the accident risk prediction model established in this section can use real-time traffic flow data to predict freeway accident risk well. To reduce the false-positive rate in practical applications, the prediction threshold can be raised, for example to 0.4 or 0.5, according to the available human and material resources.
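The reported rates can be checked from confusion-matrix counts. With 33 accidents and 132 controls, the only whole-number counts consistent with the reported rates are 20 correctly predicted accidents and 87 correctly predicted non-accidents; the helper name `accuracy_report` is hypothetical.

```python
def accuracy_report(tp, fn, tn, fp):
    """Per-class and overall accuracy from confusion-matrix counts."""
    accident_acc = tp / (tp + fn)      # share of accidents flagged
    non_accident_acc = tn / (tn + fp)  # share of controls left unflagged
    overall = (tp + tn) / (tp + fn + tn + fp)
    return accident_acc, non_accident_acc, overall


# Counts implied by the reported rates: 20 of 33 accidents, 87 of 132 controls.
acc, non, total = accuracy_report(tp=20, fn=13, tn=87, fp=45)
print(f"{acc:.2%} {non:.2%} {total:.2%}")  # 60.61% 65.91% 64.85%
```

The 45 false alarms (about 34% of controls) are what raising the threshold to 0.4 or 0.5 would trade against missed accidents.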

Conclusions
This paper extracts traffic flow data from the detectors nearest to typical accident locations, upstream or downstream, on the Beijing-Tianjin-Tanggu freeway. On this basis, binary logistic regression is used to establish a safety risk prediction model that predicts the probability of an accident in real time from the average speed of small cars, the standard deviation of large-car volume, and the average volume difference between large and small cars from 10 to 15 minutes before the accident, providing more scientific support for freeway traffic control and emergency management decisions.