Identification of potential traffic accident hot spots based on accident data and GIS

Road traffic safety has attracted wide attention in recent years. Identifying traffic accident hot spots can effectively improve road traffic safety and help traffic managers formulate targeted improvement measures and suggestions. Traditional hot spot identification methods do not consider the spatial attributes of accidents, which limits their ability to identify traffic accident hot areas. This paper therefore proposes a method for identifying traffic accident hot spots based on a geographic information system (GIS). A mathematical model and a machine learning model are used to explore the correlation between traffic accidents and spatial characteristics at the macro and micro levels. Taking Beijing as an example, the feasibility of the method is demonstrated using the city's 2015 accident data and geographic information. The method converts accident records into spatial data, considers both the micro and macro attributes of the accidents, and identifies accident hot spots automatically and efficiently. In addition, the causality analysis between each attribute and the distribution of accident hot spots can help decision makers formulate safe and sustainable road strategies.


Introduction
China's traffic safety situation is grim. In recent years, the number of traffic accidents in China has increased year by year, and the number of accident deaths ranks first in the world. It is therefore very important to identify potential traffic accident hotspots in order to ensure a safe and smooth travel environment. Traditional traffic accident analysis usually relies on maps, engineering drawings and general information management systems, which lack effective spatial data analysis. As a result, the data processing workload is heavy, the analysis results are neither comprehensive nor accurate, and it is difficult to reveal the underlying patterns. It is therefore necessary to make full use of information, technology and scientific methods to analyze and identify dangerous road sections, so as to find the current dangerous areas for traffic accidents and take measures such as identification, rectification and prevention. Urban traffic accidents have clear spatial location characteristics, so the relationship between traffic accidents and geographical location must be analyzed when identifying traffic accident hot areas. Geographic Information System (GIS) is an important technical means for analyzing problems related to traffic accidents and geographical location, and it provides spatial database management functions with a visual interface. [1][2][3][4] The research significance of this paper is as follows:
(1) This paper uses ArcGIS software. ArcGIS is a complete GIS software system based on industry standards. It offers comprehensive functionality, good scalability and flexible user customization. It can combine a traditional traffic accident database with the visualization and spatial analysis capabilities of GIS, and use traditional mathematical model algorithms and machine learning algorithms to uncover the potential connections among the attributes hidden in traffic accident data and reveal their hidden regularities.
(2) Identifying potential accident hotspots effectively is of practical value for management decision-making, for reducing traffic accidents and for improving safety operation management.
(3) It provides a reference for the decisions of traffic management departments, which is of great significance for reducing the incidence of traffic accidents, reducing potential accident risks and making safety operation management more effective and practical.

Method
The purpose of this paper is to find out the relationship between the severity of vehicle accidents and macro and micro factors, and to provide the basis for the identification of traffic accident hot areas.

Binary logistic regression
Based on the accident data, this study uses binary logistic regression to model, at the micro level, the impact of different road facilities on the severity of traffic accidents. In the logistic regression model, the outcome takes one of two values. The relationship between the probability and the event is described by the following link function:

$$\operatorname{logit}(P_n) = \ln\left(\frac{P_n}{1 - P_n}\right) = f(x_n) \qquad (1)$$

where $P_n$ is the probability of occurrence of the event and $f(x_n)$ is a linear function of the explanatory variables. In the logistic regression model, the linear function is related to the expected value of the response and is composed of $k$ independent variables and their coefficients:

$$f(x_n) = \beta_0 + \beta_1 x_{1n} + \beta_2 x_{2n} + \cdots + \beta_j x_{jn} + \cdots + \beta_k x_{kn} \qquad (2)$$

where $x_{jn}$ is the $j$-th independent variable and $\beta_j$ is the corresponding coefficient.
When a study has two or more independent variables, the effect of one independent variable may differ across the levels of another. This phenomenon is called the interaction effect. This study considers first-order interaction effects, each of which is limited to two explanatory variables. Therefore, the function $f(x_n)$ can be expressed as:

$$f(x_n) = \beta_0 + \beta_1 x_{1n} + \beta_2 x_{2n} + \cdots + \beta_k x_{kn} + \beta_{k+1} x_{1n} x_{2n} + \cdots + \beta_m x_{(k-1)n} x_{kn} \qquad (3)$$

where $k$ is the number of independent variables and $m$ is the number of variables and interaction effects.
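The linear function with a first-order interaction term, and its conversion to a probability through the inverse of the logit link, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation procedure used in the study: the function names, the two indicator variables and all coefficient values are hypothetical.

```python
import math


def linear_predictor(x, beta, interactions):
    """Evaluate f(x) = beta_0 + sum_j beta_j * x_j + interaction terms.

    x            -- list of k explanatory variable values
    beta         -- intercept, then k main-effect coefficients,
                    then one coefficient per interaction term
    interactions -- list of (a, b) index pairs (0-based), one per
                    first-order interaction x_a * x_b
    """
    k = len(x)
    if len(beta) != 1 + k + len(interactions):
        raise ValueError("beta must hold intercept, main effects and interactions")
    value = beta[0]
    for j in range(k):
        value += beta[1 + j] * x[j]
    for t, (a, b) in enumerate(interactions):
        value += beta[1 + k + t] * x[a] * x[b]
    return value


def probability(f):
    """Inverse of the logit link: P = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical example: two binary road-facility indicators and their
# interaction. The coefficients are illustrative, not estimates from the paper.
beta = [-1.0, 0.8, 0.5, 0.3]
x = [1.0, 1.0]
f = linear_predictor(x, beta, [(0, 1)])
p = probability(f)
print(round(f, 3), round(p, 3))
```

Applying the logit to the returned probability recovers the value of the linear function, which is the relationship the link function expresses.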

Random forest (RF)
RF, proposed by Breiman, is one of the most commonly used classifiers for training on and predicting samples. Using bootstrap sampling, the RF algorithm varies the training set and builds a set of decision trees. Each classification tree is constructed from the data, and the candidate variables at each split are a random subset of all variables. In this paper, the Gini index of the random forest algorithm is used to analyze the influence of different factors on the severity of traffic accidents. [5][6] The Gini index is calculated as follows:

$$GI_m = \sum_{k=1}^{K} p_{mk}\,(1 - p_{mk}) = 1 - \sum_{k=1}^{K} p_{mk}^2 \qquad (4)$$

where $K$ is the number of categories and $p_{mk}$ is the sample weight of class $k$ in node $m$.
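The importance of feature $X_j$ on node $m$, that is, the change in the Gini index before and after node $m$ branches, is:

$$VIM_{jm} = GI_m - GI_l - GI_r \qquad (5)$$

where $GI_l$ and $GI_r$ are the Gini indices of the two new nodes after branching. If the nodes at which feature $X_j$ appears in decision tree $i$ form the set $M$, then the importance of $X_j$ in the $i$-th tree is:

$$VIM_{ij} = \sum_{m \in M} VIM_{jm} \qquad (6)$$

Suppose there are $n$ trees in the RF; then:

$$VIM_j = \sum_{i=1}^{n} VIM_{ij} \qquad (7)$$

Finally, all the obtained importance scores are normalized:

$$VIM_j^{\mathrm{norm}} = \frac{VIM_j}{\sum_{i=1}^{c} VIM_i} \qquad (8)$$

where the denominator is the sum of the gains of all $c$ features and the numerator is the Gini importance of feature $j$.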
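The Gini-based importance described above can be illustrated with a small pure-Python sketch. It follows the text literally (parent Gini index minus the Gini indices of the two child nodes, summed over nodes and trees, then normalized). The toy splits and the feature names are hypothetical, and the sketch does not train a forest; it only scores splits that are given to it.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum_k p_k^2, with p_k the share of class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def node_importance(parent, left, right):
    """Change in the Gini index caused by one split: GI_m - GI_l - GI_r."""
    return gini(parent) - gini(left) - gini(right)


def forest_importance(trees):
    """Sum node importances per feature over all trees, then normalize.

    trees -- list of trees; each tree is a list of splits given as
             (feature_name, parent_labels, left_labels, right_labels)
    """
    totals = {}
    for tree in trees:
        for feature, parent, left, right in tree:
            gain = node_importance(parent, left, right)
            totals[feature] = totals.get(feature, 0.0) + gain
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: value / grand_total for name, value in totals.items()}


# Hypothetical toy forest with two trees; labels 0/1 stand for two
# accident severity classes. Feature names are illustrative only.
tree_1 = [("bus_stop_density", [0, 0, 1, 1], [0, 0], [1, 1])]
tree_2 = [
    ("school_density", [0, 0, 1, 1], [0, 0, 1], [1]),
    ("bus_stop_density", [0, 0, 1], [0, 0], [1]),
]
scores = forest_importance([tree_1, tree_2])
print(scores)
```

Common library implementations weight each child's Gini index by its share of the samples before subtracting; the unweighted form is kept here only because it is the one stated in the text.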

Spatial analysis method
Compared with the traditional statistical (Poisson) model, the spatial identification of traffic accident hotspots uses the spatial attributes of accident points. Traditional identification of accident-prone sections labels a single intersection or section with a high accident rate as a dangerous section, whereas hot area identification treats multiple continuous road sections as one hot area by considering the spatial autocorrelation, that is, the spatial agglomeration, of accident points. The results of traditional spatial analysis (hotspot analysis) of traffic accidents may be affected by random factors rather than by the road environment. Therefore, in order to explore the main factors behind the distribution of traffic accident hot spots, this paper considers two aspects: (1) the historical locations of traffic accidents; (2) the spatial attributes of those historical locations. [7] In this paper, spatial analysis based on ArcGIS software is used to evaluate the potential mutual dependence among the attribute values of observations within a given analysis range. If the observed values at spatial points become more similar as the distance between them decreases, the spatial correlation is positive; if they become less similar, it is negative; if there is no obvious relationship between the observed values and their spatial arrangement, the data are spatially uncorrelated.
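The distinction between positive and negative spatial correlation can be made concrete with a small example. The text does not name the statistic that was used; the sketch below assumes Global Moran's I, one common measure of spatial autocorrelation, and applies it to four hypothetical zones arranged in a line with binary neighbour weights. The accident counts are invented for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for observations with a spatial weight matrix.

    values  -- list of n observed values (for example accident counts)
    weights -- n x n matrix; weights[i][j] > 0 when i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighbouring zones.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


# Four hypothetical zones in a row; each zone neighbours the adjacent ones.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 1, 2]     # similar values sit next to each other
alternating = [10, 1, 9, 2]   # high and low values alternate

i_clustered = morans_i(clustered, weights)      # positive
i_alternating = morans_i(alternating, weights)  # negative
print(round(i_clustered, 3), round(i_alternating, 3))
```

A value well above zero suggests that neighbouring zones carry similar accident counts, and a value well below zero suggests that they differ; a formal test would also need the expected value and variance under randomness, which this sketch omits.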

Study case
Most existing studies show that both macroscopic and microscopic considerations are conducive to the analysis of traffic accidents. Therefore, this study carries out the analysis at both levels. [8] The map of Beijing is divided into 2023 traffic analysis zones (TAZs) as analysis units. In addition to using ArcGIS software to obtain basic information about each zone, such as area, edge length and location, this paper obtains point of interest (POI) data on supermarkets, banks, schools and office buildings from Google Maps, and uses mobile phone signaling data to obtain the employment and residence conditions of each zone.
Using the spatial analysis tools of ArcGIS software, the POI data on supermarkets, banks, schools and office buildings, together with the number and density of residents and of the working population in each zone, are collected and calculated. Figure 1 shows the spatial distribution of the relevant POIs in this paper. The macro factors of traffic accidents studied in this paper are as follows:

Analysis of accident influencing factors
The graph shows the order of importance of the macro factors by Gini index. The top 10 most important factors include the density of restaurants, the density of bus stops, the area and length of the TAZ, the density of banks, the [...] Beijing have guardrails to protect waiting passengers, so there is a greater risk of traffic accidents. [9] Previous studies have discussed a positive correlation between population and traffic accidents. This paper confirms that a larger permanent population means more traffic activity. Therefore, in order to improve road safety, special attention should be paid to areas with relatively dense populations, which may become traffic accident hot spots. [10] In addition, school density is an important factor affecting traffic accidents: schools attract a large amount of traffic, and this relatively large traffic attraction is accompanied by traffic accidents in the suburbs. It is therefore necessary to treat school density as an important factor in the identification of traffic accident hot spots.
In addition, road type is very important among the micro factors, so the binary logistic model is used to analyze road type further. Different symbols indicate the range of Pr(>|z|): "**" represents 0.001-0.01, "*" represents 0.01-0.05, "." represents 0.05-0.1, and a blank represents 0.1-1, as shown in Table 2. Table 3 shows the relationship between class IV highways, general urban roads and serious pedestrian accidents. In addition, first-class highways and self-built roads are also related to accidents.
The previous section analyzed the influencing factors of traffic accidents in detail from the macro and micro perspectives, carried out correlation analysis on the related factors, and determined the spatial characteristics of dangerous road sections. This section introduces the process of hot zone identification based on these influencing factors, so as to provide a sound scientific basis for improving urban traffic safety.

Figure 3. Traffic accident distribution map of the study area
This paper selects the area inside the Fourth Ring Road as a case for further analysis of traffic accident hot area identification. A traffic accident hot zone is composed of a series of continuous road sections with a high number of traffic accidents. Based on the above analysis of influencing factors, the historical locations of traffic accidents, road type, supermarket density, school density and station density are selected as the indicators for identifying hot areas. The flow chart of the identification process is shown in Figure 4.
First, the kernel density and line density tools of ArcGIS software are used to compute the density of the POI points, and the density of each POI layer is then reclassified at equal intervals with the reclassification tool in ArcGIS software. Ten classes are used, and the higher the density, the higher the score. Finally, a weighted overlay is carried out. The weights are set according to the order of the Gini index, as shown in Table 4. The results of hot zone identification are shown in Figure 5.
It can be seen from Figure 5 that the traffic accident hot spots are scattered and relatively concentrated in the area between the Second Ring Road and the Third Ring Road in the urban area of Beijing. Figure 6 shows the kernel density map of traffic accidents. Traffic accidents are densely distributed, and the key dangerous areas cannot be identified from it. Comparing Figure 5 and Figure 6, it can be seen that the traffic accident hot areas in Figure 5 overlap with the areas where traffic accidents are concentrated in Figure 6, but Figure 6 shows no great degree of risk differentiation.
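The equal-interval reclassification into ten classes and the weighted overlay described above can be sketched in plain Python. This is an illustration of the idea under stated assumptions, not the ArcGIS tools themselves: each layer is a flat list of cell densities, and the layer names, density values and weights are hypothetical (they are not the weights of Table 4).

```python
def reclassify_equal_interval(values, classes=10):
    """Map each value to a score from 1 to `classes` using equal-width bins.

    Higher density gives a higher score; the maximum value falls in the
    top class.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [1 for _ in values]
    width = (highest - lowest) / classes
    scores = []
    for v in values:
        score = int((v - lowest) / width) + 1
        scores.append(min(score, classes))
    return scores


def weighted_overlay(layers, weights):
    """Combine reclassified layers cell by cell as a weighted average.

    layers  -- dict: layer name -> list of scores (all of equal length)
    weights -- dict: layer name -> non-negative weight
    """
    if set(layers) != set(weights):
        raise ValueError("layers and weights must have the same names")
    total = sum(weights.values())
    n_cells = len(next(iter(layers.values())))
    return [
        sum(weights[name] * layers[name][i] for name in layers) / total
        for i in range(n_cells)
    ]


# Hypothetical densities for three cells and two indicator layers.
densities = {
    "accident_density": [0, 50, 100],
    "school_density": [0, 100, 20],
}
weights = {"accident_density": 0.6, "school_density": 0.4}  # illustrative only

scored = {name: reclassify_equal_interval(v) for name, v in densities.items()}
hot_zone_score = weighted_overlay(scored, weights)
print(scored, hot_zone_score)
```

Dividing by the sum of the weights keeps the combined score on the same 1 to 10 scale as the input layers, so cells can be ranked directly by risk level.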
At the same time, if the spatial attributes of traffic accident points are not considered, almost all points with a high accident frequency will be regarded as accident hot areas. The results indicate that even an area where traffic accident points are relatively concentrated in the historical data is not necessarily a potential traffic accident hot area. Conversely, many areas without a high incidence of traffic accidents in the past may become high-risk areas, and further inspection is needed to eliminate potential safety hazards.
1) Considering both the attributes of the accident itself and its spatial attributes, this paper proposes a method for identifying traffic accident hot areas, which is of great significance for improving road traffic safety and formulating targeted improvement measures and suggestions.
2) In this paper, random forest and the binary logistic model are used to identify the macro and micro influencing factors of traffic accidents. The influence of road infrastructure, the road spatial environment and the socioeconomic environment on the distribution of accident hot spots is considered comprehensively. Spatial statistics and mathematical statistics are combined to take into account both the attributes of the accidents themselves and their spatial attributes. At the same time, a priority level is assigned to each hot area, with 10 grades ranking severity and risk. In general, to make investment in improvements more efficient, priority can be given to adjusting and optimizing the most dangerous accident hot areas.
3) The identification of traffic accident hot areas in this paper still has deficiencies and room for improvement. The weights assigned to each index in the weighted analysis can be explored further, and a scientific index system for identifying accident hot areas can be established. In addition, more factors can be considered to further explore the distribution of traffic accident hot spots.