Evaluation of various GIS-based methods for the analysis of road traffic accident hotspot

In order to establish objective criteria for road traffic accident (RTA) hotspots, this paper examines the application of three different hotspot analysis methods to both identify and rank RTA hotspots. The three methods selected are the network Kernel Density Estimation (KDE+) method, the Getis-Ord GI* method, and a recently proposed risk-based method that accounts for RTA frequency, severity and socioeconomic costs, the STAA method. The study road, Jalan Tutong, is a major dual carriageway connecting major residential and commercial areas from the west of Brunei-Muara district and beyond to the capital, Bandar Seri Begawan. The RTA data consist of cases reported to the police during a 5-year period from 2012 to 2016. The RTA data were digitised and prepared before being imported into ESRI ArcGIS 10.2 software for analysis using each of these methods. The outcomes, particularly the location, extent and priority of the RTA hotspots, are subsequently compared with results from road safety audits in order to determine the relative merits and drawbacks of each method. The findings from the comparative study are intended to support a recommendation of the most suitable method to identify and rank RTA hotspots for the study road.


Background
According to the World Health Organisation, 1.25 million people die from road traffic accidents (RTAs) every year and, in most countries, RTAs cost approximately 3% of the gross domestic product [1], with RTA deaths generally higher in low- and medium-income countries and lower in high-income countries. Brunei Darussalam is a country with a land area of 5,765 km², located on the north shore of Borneo Island in South East Asia; it shares land territorial borders with Malaysia and Indonesia and maritime borders with Malaysia and China. As of 2016, Brunei has a total population of 422,678 and a road network totalling 3,404.8 km [2], mostly concentrated in the Brunei-Muara district. It was previously reported that Brunei's vehicle fleet comprises 92% cars, 5% heavy goods vehicles and 3% motorcycles, with relatively few vulnerable road users such as motorcyclists, pedal cyclists and pedestrians [3]. A survey conducted in 2014 revealed that 98% of the surveyed trips primarily involved private cars, and there is a growing concern that RTAs in Brunei are related to the high dependency on private cars [3].
Hence, in order to establish objective criteria to reduce RTAs and improve road safety in the face of limited budgets, it is important to recognise how, where and when RTAs occur [4]. Understanding the spatial patterns of RTAs allows road authority engineers, design consultants and maintenance teams to implement appropriate RTA reduction measures [4] and prioritise them through a ranking scheme [5]. Identifying RTA hotspots or blackspots along a road has been made easier in recent years with the integrated application of Geographic Information System (GIS) software and Global Positioning System (GPS) devices.
RTAs seldom happen randomly but rather in clusters [4] which become more evident with a high number of accidents per kilometre on a given road. The fundamental concept is that the greater the cluster strength, the greater the urgency to undertake countermeasures. Although there are a good number of RTA hotspot identification approaches, it is better to have at least 2 systematic approaches for relative comparisons.

Brunei Historical Road Traffic Accident Data
Ref. [3] reported the characteristics of RTA cases in Brunei between 2010 and 2015 and found that the number of RTA cases and slightly injured casualties decreased, while the number of seriously injured and killed fluctuated, as illustrated in Fig. 1. Most of the reported RTA cases comprised multiple-vehicle and single-vehicle crashes involving private cars. This observation is consistent with the growing concern that RTAs in Brunei relate to the high dependency on private vehicles. Brunei's RTA fatality rate is generally lower than those of Malaysia and the USA, but higher than those of Australia, Singapore and the UK [3]. Some roads in Brunei appear to experience higher increases in RTAs compared to national trends. The RTA statistics for the 4.1 km section of Jalan Tutong in this study are shown in Fig. 2 and Fig. 3. From 2012 to 2016, this section of the road had 10.9 accidents per kilometre per year.
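As a quick check of the rate quoted above, the figure follows from dividing the accident count by the road length and the study period. The sketch below assumes the 223 geocoded RTA points reported later in the paper for this road section; the function name is illustrative only.

```python
def accident_rate(n_accidents, length_km, years):
    """Accidents per kilometre per year for a road section."""
    return n_accidents / (length_km * years)

# 223 RTA points (count taken from the KDE+ and STAA sections),
# 4.1 km of road, 5 years of records (2012 to 2016).
rate = accident_rate(223, 4.1, 5)
print(round(rate, 1))  # 10.9
```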

Jalan Tutong Historical RTA Data and Road Profile
The most common types of crash movement were rear-end (72 cases) and left-turn-in (28 cases) involving 2 vehicles. The 4.1-kilometre section of Jalan Tutong is a dual carriageway which carries traffic northeast towards Gadong, southeast towards Bandar Seri Begawan, north towards Jerudong and southwest to the Tutong district, as illustrated in Fig. 4. There are 3 signalised junctions, 85 unsignalised junctions, 6 U-turns, 4 right-turns and several stop- or yield-controlled intersections. On both sides of the road, there are high concentrations of residential and commercial areas. The posted speed along this road is 65 km/h but the 85th percentile operating speed was observed to be 86.9 km/h (southeast-bound). Stopping sight distance (SSD) refers to the minimum sight distance required by a driver travelling at design speed to stop safely without collision [8]. Fig. 5 and Fig. 6 illustrate the sight distances for a curved section and a 3-arm intersection of a road respectively. The SSD for Jalan Tutong can be determined using Equation (1). The resulting distance of 78 m is used as the analysis boundary to create the hotspot zones (buffers) for each of the three methods.

RTA Hotspot Identification Approach
RTAs occurring on a road section can be identified through spatial cluster detection. This paper examines the application of two GIS-based statistical methods to RTA data: network Kernel Density Estimation (KDE+) and Getis-Ord GI*. The third is a risk-based method that accounts for RTA frequency, severity and socioeconomic costs, known as the Spatial Traffic Accident Analysis (STAA) method. The output results are presented using 4 levels of risk exposure with 4 different colour codes: 'serious' in black, 'significant' in red, 'moderate' in yellow and 'minor' in green.

Digitising Jalan Tutong Road Centreline
The process of tracing a feature or features from an image into a vector data format is known as digitising. As shown in Fig. 7, the yellow line is the digitised road centreline of Jalan Tutong, stored as a polyline shapefile. The coordinate system used is Geocentric Datum Brunei Darussalam 2009 (GDBD2009). The centreline is taken as the centre of the dual carriageway, i.e. along the elevated road divider.

Geocoding Jalan Tutong RTA Locations
The process of converting locations into their geographical coordinates is termed geocoding. Fig. 8 shows the RTA locations along Jalan Tutong, stored as a point shapefile. The locations are recorded in eastings and northings in both the World Geodetic System 1984 (WGS 1984) and GDBD2009 coordinate systems. Using ArcMap, the spreadsheet is converted to a feature class in both the WGS 1984 and GDBD2009 coordinate systems. The point shapefile is then projected to the GDBD2009 coordinate system. Locations with adequate information for analysis are selected while the rest are discarded. Vital parameters are recorded, such as direction of traffic flow (i.e. BSB-bound and Tutong-bound), type of intersection where the accident takes place (e.g. T-junction, -junction, Y-junction, etc.), crash type (e.g. self-accident, 2-vehicle, 3-vehicle, vehicle-motorcycle-bus, etc.), crash movement type (e.g. head-on, lane change, lost control, etc.), weather and road conditions (e.g. luminance, roughness, etc.), time of accident, number of people involved and severity, and contributory factors (e.g. driving under the influence, sight distance issues, etc.). These allow investigators to distinguish the main causal factor: driver behaviour, sight distance, road geometric design, or infrastructure or vehicle defects.

RTA Hotspot Identification Method
In Section 1.4, the three methods of RTA hotspot identification were introduced: KDE+, Getis-Ord GI* and STAA. In the following sub-sections, the methodology and results for each method are discussed and presented.

Network Kernel Density Estimation (KDE+)
Network KDE analysis is considered more suitable for analysing event points (e.g. RTAs) which occur inside a one-dimensional linear space (e.g. a road), i.e. network-restricted incidents [9]. KDE+ is relatively new software developed by the Transport Research Centre (CDV) in the Czech Republic to handle RTA data [10]. The KDE+ method is an extended KDE approach which estimates the probability density function of the aggregated event points using a kernel function. The '+' denotes that it provides objective selection of significant clusters and hotspot ranking [10]. KDE+ can be operated either as a standalone program running JavaScript or as an ArcGIS toolbox.
KDE by itself produces a range of clusters (local maxima) and, without an objectively defined threshold, the clusters cannot be differentiated or ranked; see Fig. 9(a) and Fig. 9(b). To solve this, random simulations using the Monte Carlo method are repeated 400 times [10]. Subsequently, the 95th percentile is chosen as the level of significance, as shown in Fig. 9(c). Ref. [11] explains the concept of integrating planar KDE and statistical testing of cluster significance for analysing RTAs. The processing of RTA data for KDE+ requires that the event points intersect with the road centreline, as shown in Fig. 10. Using the standalone KDE+, only two input shapefiles are required: the point shapefile of RTAs and a polyline shapefile of the road centreline. The bandwidth (search radius) is set to 100 units (100 m) and the data accuracy is set to GPS. The 100 m bandwidth is considered reasonable for Jalan Tutong with respect to sight and stopping distances. In contrast, Fig. 13 displays the result from the planar KDE ArcGIS tool, which turns point events into a smooth density surface over the two-dimensional geographic space, covering 223 points. The use of KDE+ has a restriction: the analysis is more suitable for road sections between intersections and is not suitable for the analysis of intersections [11]. For instance, the study road had 223 RTA points, of which 34 were found within signalised intersections. Removing these 34 points (about 15% of the total) results in the cluster strength shown in Fig. 14, based on the remaining 189 RTA points.
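The idea of combining a kernel density along the road with a Monte Carlo significance threshold can be sketched as follows. This is a simplified illustration and not the KDE+ implementation: it assumes a single unbranched road, event locations expressed as chainage in metres, an Epanechnikov kernel, a uniform null model, one pooled percentile threshold, and ranking by peak density. All function names and the sample chainages are hypothetical.

```python
import random

def epanechnikov_density(x, events, bandwidth):
    """Kernel density (events per metre) at chainage x along the road."""
    total = 0.0
    for e in events:
        u = (x - e) / bandwidth
        if abs(u) <= 1.0:
            total += 0.75 * (1.0 - u * u)
    return total / bandwidth

def density_profile(events, grid, bandwidth):
    """Density evaluated at every grid chainage."""
    return [epanechnikov_density(x, events, bandwidth) for x in grid]

def significance_threshold(n_events, road_length, grid, bandwidth,
                           n_sim=400, percentile=95.0, seed=0):
    """Monte Carlo threshold: the chosen percentile of the densities obtained
    when the same number of events is scattered uniformly along the road."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_sim):
        fake = [rng.uniform(0.0, road_length) for _ in range(n_events)]
        simulated.extend(density_profile(fake, grid, bandwidth))
    simulated.sort()
    index = min(len(simulated) - 1, int(len(simulated) * percentile / 100.0))
    return simulated[index]

def clusters_above(grid, profile, threshold):
    """Contiguous stretches where density exceeds the threshold, returned as
    (start, end, peak density) and ranked by peak density, highest first."""
    clusters, current = [], None
    for x, d in zip(grid, profile):
        if d > threshold:
            if current is None:
                current = [x, x, d]
            else:
                current[1] = x
                current[2] = max(current[2], d)
        elif current is not None:
            clusters.append(tuple(current))
            current = None
    if current is not None:
        clusters.append(tuple(current))
    return sorted(clusters, key=lambda c: c[2], reverse=True)

# Hypothetical chainages (m) on a 4,100 m road: one dense group plus scattered events.
events = [2000.0 + 5.0 * i for i in range(-10, 11)] + [300.0, 900.0, 1500.0, 2800.0, 3300.0, 3900.0]
grid = [10.0 * i for i in range(411)]  # every 10 m from 0 to 4,100 m
profile = density_profile(events, grid, bandwidth=100.0)
# n_sim is reduced from 400 to 50 here only to keep the example quick to run.
threshold = significance_threshold(len(events), 4100.0, grid, 100.0, n_sim=50)
ranked = clusters_above(grid, profile, threshold)
print(ranked[0])  # the densest cluster, which contains the 2,000 m chainage
```

The bandwidth of 100 m mirrors the setting described above; the grid spacing and the ranking rule are choices made for this sketch only.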

Hot Spot Analysis (Getis-Ord GI*)
Spatial statistical mapping is a key component in understanding the spatial and temporal occurrences of event points [12], such as RTAs. Spatial statistics consist of techniques to describe and model spatial data, e.g. aggregated event points. Using ArcMap 10.2 extensions, in particular Spatial Analyst, spatial statistical analysis related to RTAs can be performed.
ArcMap is a powerful geospatial processing programme that allows geographic information and corresponding attributes to be stored in layers of shapefiles and performs GIS tasks according to the user's objectives. Before performing the "Hot Spot Analysis (Getis-Ord GI*)", several other geoprocessing tasks need to be performed on the aggregated event points, as described below and summarised in Fig. 19.

Integrate Event Points
There is a possibility that the geographical coordinates of the event points are inaccurate. Integrating the event points allows RTA locations within the assigned xy tolerance (30 ft = 9.144 m) to be treated as identical or coincident. This allows the integrity of the shared feature boundary to be maintained [13].

Collect Events
The Collect Events tool converts event data (i.e. event points) to weighted point data. It does so by combining coincident points into a weighted point feature class with a count field, ICOUNT, which holds the sum of all the event data at each unique location [14], as shown in Fig. 15.
Fig. 15. Weighted point data (field: ICOUNT).
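The combined effect of the two preparation steps above (snapping nearby points together, then counting coincident points) can be sketched as below. This is a simplified greedy illustration, not the algorithm used by the ArcGIS Integrate or Collect Events tools; the function name and sample coordinates are hypothetical, and coordinates are assumed to be projected, in metres.

```python
import math

def collect_events(points, tolerance=9.144):
    """Snap each point onto the first earlier point within `tolerance` metres,
    then count coincident points. Returns a list of (x, y, icount)."""
    collected = []  # each entry is [x, y, count]
    for x, y in points:
        for entry in collected:
            if math.hypot(x - entry[0], y - entry[1]) <= tolerance:
                entry[2] += 1
                break
        else:
            # No existing location is close enough, so start a new one.
            collected.append([x, y, 1])
    return [tuple(entry) for entry in collected]

pts = [(0.0, 0.0), (3.0, 4.0), (100.0, 0.0), (105.0, 0.0), (500.0, 20.0)]
weighted = collect_events(pts)
print(weighted)  # [(0.0, 0.0, 2), (100.0, 0.0, 2), (500.0, 20.0, 1)]
```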

Calculate Distance Band from Neighbour Count
In this task, an essential condition needs to be taken into account: each point should be assigned at least 1 neighbour. The distance method used is Euclidean distance. The results give the minimum, average and maximum distances for each point to have at least one neighbour, as displayed in Fig. 16. This task is an important prerequisite to spatial autocorrelation.
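For a neighbour count of one, the three reported distances are summary statistics of each point's nearest-neighbour distance. A minimal sketch, with hypothetical function names and sample coordinates in metres:

```python
import math

def nearest_neighbour_distances(points):
    """Euclidean distance from each point to its closest other point."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
        distances.append(nearest)
    return distances

def distance_band_summary(points):
    """Minimum, average and maximum nearest-neighbour distance."""
    d = nearest_neighbour_distances(points)
    return min(d), sum(d) / len(d), max(d)

pts = [(0.0, 0.0), (30.0, 40.0), (30.0, 100.0), (230.0, 100.0)]
print(distance_band_summary(pts))  # (50.0, 90.0, 200.0)
```

The maximum value is the smallest distance band that guarantees every point has at least one neighbour.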

Spatial Autocorrelation (Moran's I Method)
Moran's I is one of the oldest and most common indicators of global spatial autocorrelation and is used to determine if the patterns expressed by feature locations (e.g. event points) and feature values (e.g. ICOUNT generated by the Collect Events tool) are clustered, dispersed or random [12,15].
The Moran's I index and the z-score for the frequency of RTAs can be calculated using the Spatial Autocorrelation (Global Moran's I) tool, and the result is a report as shown in Fig. 17. As indicated, the z-score is -0.759443 and the corresponding Moran's Index is -0.113923. The report comment states that "the pattern does not appear to be significantly different than random". This is a clear-cut indication that there is no clustering and, therefore, it is not necessary to proceed with Getis-Ord GI* (Hot Spot Analysis). See Fig. 19 for the process flow chart of RTA hot spot analysis using the Getis-Ord GI* method. Alternatively, one can use the Incremental Spatial Autocorrelation tool to measure the spatial autocorrelation for a series of distances; it creates a line graph of the distances and corresponding z-scores, as shown in Fig. 18. The z-score reflects the intensity of spatial clustering, and statistically significant peak z-scores indicate distances where spatial processes promoting clustering are most pronounced. In some instances no single peak z-score emerges, which is itself an early indication of the absence of spatial clustering; such is the case for this RTA data.
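To make the index and z-score concrete, a minimal sketch of Global Moran's I is given below. It assumes binary weights within a fixed distance band and computes the z-score under the normality assumption, which may differ from the settings used by the ArcGIS tool; it is not intended to reproduce the values in Fig. 17. The function name and sample data are hypothetical.

```python
import math

def morans_i(points, values, band):
    """Global Moran's I and its z-score (normality assumption), using binary
    weights: w_ij = 1 if points i and j are within `band`, else 0.
    Requires at least one neighbouring pair and non-constant values."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w = [[1.0 if i != j and math.hypot(points[i][0] - points[j][0],
                                       points[i][1] - points[j][1]) <= band else 0.0
          for j in range(n)] for i in range(n)]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    index = (n / s0) * cross / sum(d * d for d in dev)
    expected = -1.0 / (n - 1)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    variance = ((n * n * s1 - n * s2 + 3.0 * s0 * s0)
                / ((n * n - 1) * s0 * s0)) - expected ** 2
    return index, (index - expected) / math.sqrt(variance)

# Six locations 1 unit apart along a line; low counts at one end, high at the other.
pts = [(float(i), 0.0) for i in range(6)]
icount = [1, 1, 1, 5, 5, 5]
index, z = morans_i(pts, icount, band=1.0)
print(round(index, 3), round(z, 3))  # 0.6 2.236
```

A positive index with a large z-score, as in this constructed example, indicates clustering; the near-zero negative values reported for Jalan Tutong indicate a pattern not distinguishable from random.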

STAA Method
This technique is a risk-based method that accounts for RTA frequency, severity and socioeconomic costs to analyse the recorded historical RTA data [5]. It was established by the Centre of Transport Research (CfTR) at Universiti Teknologi Brunei to identify RTA hotspots on Brunei's accident-prone roads and is named the Spatial Traffic Accident Analysis (STAA) method. Ref. [5] elaborates the procedures of this method; the following sub-sections highlight the key procedures.

RTA Hotspot Prioritisation
The first step is to convert the 223 RTA points into 223 polygons using a buffer radius of 78 m and merge the overlapping polygons into 8 hotspot polygons, as shown in Fig. 20. The next step is to use the Join Data function from the hotspot polygons to consolidate the RTA data into the polygons, so that each hotspot polygon holds the sum of fatality, serious injury, minor injury and no injury cases of the RTAs inside the polygon, as shown in Fig. 21. The STAA method identifies 3 parameters which contribute significantly to the magnitude of the hotspot: frequency (F), severity (S) and socioeconomic impact (SEI) [5]. The fourth and fifth parameters comprise two 4 × 4 matrices: normalised frequency (NF) versus normalised severity (NS), and normalised frequency (NF) versus normalised socioeconomic impact (NSEI) [5]. Equations (2), (4) and (6) were developed by CfTR [5]; Equation (3) is a weighting system adopted by the Belgian government as its hotspot detection method [16,17]; and the cost in USD for Equation (5) can be obtained from [5,18].
The third step is to create new fields in the hotspot polygon attribute table to calculate S, SEI, NF, NS and NSEI using Equations (3), (5), (2), (4) and (6) respectively, as shown in Fig. 22. The final step is to assign each of the hotspot polygons to one of the 4 levels of risk exposure according to the intervals of NF, NS and NSEI tabulated in Table 1 and Table 2. The tables are formulated from Brunei's most accident-prone major roads and are thus more suitable for high-volume traffic. They are less applicable to low-volume roads, as the RTAs for the latter may be too highly dispersed to form hotspot polygons.
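The first two steps (buffering, merging and consolidating casualty counts) can be sketched in one dimension as below. This is an illustration under stated assumptions, not the STAA procedure of Ref. [5]: accident locations are reduced to chainage along the road so that a 78 m buffer becomes an interval, and the severity weights (5 for a fatality, 3 for a serious injury, 1 for a minor injury, 0 for no injury) are assumed here because Equation (3) is not reproduced in this text. The normalisation steps of Equations (2), (4) and (6) are not covered. All names and sample records are hypothetical.

```python
# Assumed severity weights; Equation (3) of the paper is not reproduced here.
WEIGHTS = {"fatal": 5, "serious": 3, "minor": 1, "none": 0}

def merge_buffers(records, radius=78.0):
    """records: list of (chainage_m, severity_label).
    Buffers each accident by `radius` along the road, merges overlapping
    buffers into hotspots and tallies casualties by severity per hotspot."""
    hotspots = []
    for chainage, severity in sorted(records):
        lo, hi = chainage - radius, chainage + radius
        if hotspots and lo <= hotspots[-1]["end"]:
            hotspot = hotspots[-1]
            hotspot["end"] = max(hotspot["end"], hi)
        else:
            hotspot = {"start": lo, "end": hi,
                       "counts": {label: 0 for label in WEIGHTS}}
            hotspots.append(hotspot)
        hotspot["counts"][severity] += 1
    return hotspots

def frequency(hotspot):
    """F: number of accidents inside the hotspot."""
    return sum(hotspot["counts"].values())

def severity_score(hotspot):
    """S: weighted sum of accidents by severity, using the assumed weights."""
    return sum(WEIGHTS[label] * n for label, n in hotspot["counts"].items())

records = [(100.0, "none"), (180.0, "minor"), (250.0, "fatal"),
           (1000.0, "none"), (1100.0, "serious"), (3000.0, "none")]
hotspots = merge_buffers(records)
print([(h["start"], h["end"], frequency(h), severity_score(h)) for h in hotspots])
# [(22.0, 328.0, 3, 6), (922.0, 1178.0, 2, 3), (2922.0, 3078.0, 1, 0)]
```

The example also shows how chains of nearby accidents produce long merged zones, each far longer than a single 156 m buffer.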

Proactive Road Safety Audit
In principle, proactive road safety audits (RSAs) should be conducted periodically to ensure that potential road hazards are identified and addressed before a probable accident happens; this is a strength of RSA compared to traffic hotspot detection [19][20][21]. A reactive RSA is also essential, as the observations and deductions from a reactive RSA serve as precedent cases for subsequent proactive RSAs. Additionally, RSA costs significantly less than constructing, demolishing or reconstructing road infrastructure [19]. An RSA should entail both qualitative and quantitative assessments. An integrated RSA quantifies road hazards via, for example, a Concern Assessment Rating Matrix, where the level of risk of identified road hazards is 'quantified' as shown in Table 3. This matrix is produced by Ref. [22], which provides guidelines for RSA procedures in New Zealand. Fig. 26 indicates the locations where RSA has been conducted for 2 of the 6 hotspots, and the road hazards have been identified for both locations. The RSA rating for both locations was found to be at a 'Serious' level.

Discussion
A notable advantage of KDE+ is its simplicity of application, despite its complex underlying concepts and procedures. This is favourable for road authority engineers and design consultants who may have limited familiarity with spatial statistics. Another advantage of KDE+ over statistical clustering (e.g. GI*) is that uncertainty about the positional precision of the RTAs can be accommodated by the bandwidth (search radius) of the kernel [11]. Furthermore, KDE+ is able to specifically identify the segments of the road with significant clustering and prioritise the segments based on the cluster ranking procedure [11]. Planar KDE is perhaps a more suitable tool for visualising the spread of risk. However, a limitation of KDE+ is that it does not report the risk exposure of other road segments where clustering is not significant. Some segments of the road could be inherently dangerous, with one or two deaths, but because the frequency is so low the cluster strength does not surface. This could undermine the objective of reducing RTA deaths and the associated socioeconomic and grievance impacts. Getis-Ord GI*, like KDE+, takes into account the statistical significance of the RTA clusters, expressed using z-score, p-value or confidence level. A measure of statistical significance is absent in both the planar KDE and STAA methods. ArcGIS also offers other useful tools, such as the Ordinary Least Squares (OLS) tool, which can be used to determine the magnitude of correlations between RTAs and attributes such as socioeconomic costs, weather conditions, IRI, type of junction, crash type, crash movement type, etc. Apart from spatial analysis, temporal analysis can also be performed with ease. KDE+ does not inherently have these advantages.
It has been argued that Getis-Ord GI* may not be suitable for network-restricted incidents like RTAs, as already validated in Section 3.2, since statistically significant spatial clusters are more difficult to detect on a one-dimensional road network than on a two-dimensional planar area. Getis-Ord GI* is more suitable for area-wide incidents such as crimes, disease outbreaks, forest fires and floods, among others. A concern with using ICOUNT alone and ignoring the Weighted Severity Index (WSI) in Getis-Ord GI* analysis is that it may distort the results, as the analysis only takes into account frequency without considering the magnitude of severity.
One of the advantages of the STAA method is that the hotspots can be presented according to the individual parameters or according to the composite parameters, as shown in Fig. 23 and Fig. 24. It allows road authority engineers, design consultants and maintenance teams to give precedence to cost-effective countermeasures based on either severity or cost. It is worthwhile to note that the frequency ratio of death to no injury is 1:379 and the cost ratio of death to no injury is 1:430 for Jalan Tutong. However, the STAA method tends to overrate hotspot zones, and one distinctive issue is the long stretch of hotspot zones. Comparing Fig. 12, Fig. 13 and Fig. 23, road segments whose level of risk exposure is 'Moderate' under KDE+ are rated 'Serious' under the STAA method. This makes it challenging to decide which segment of such a hotspot should be assessed. Hence, micro-analysis (Fig. 25) was introduced to observe the spatial distribution of the RTAs in each hotspot zone [5].
Comparing the risk identified by the RSA (Fig. 26) with the hotspot results, both the KDE+ and STAA methods indicate that location 1 is 'Serious', which is a good indication that the two methods are consistent. However, in instances like location 2, where the result from KDE+ does not agree with STAA, the RSA investigation can provide a more reliable indication of the level of risk. This is because the scope of an RSA includes inspecting the road's operational characteristics: geometric parameters, road surface characteristics, visibility, signalisation, facilities, traffic control and other engineering aspects which affect road safety [21]. These are factors, apart from driver behaviour, that can be responsible for accidents and should thus be given due importance.
The merits and limitations of each method have been addressed, and it is evident that no single method outperforms the others. KDE+ helps to identify specific segments of the road which are critical, and STAA takes into account RTA frequency, severity and cost. In conclusion, the strengths of each method help to offset the weaknesses of another, as demonstrated in Fig. 27 and further supported by the reactive RSA.

Conclusion
Time-efficient and cost-effective methods to identify RTA hotspots are in demand worldwide and, with recent advances in GIS technology, transportation researchers are increasingly able to meet this demand. Planar KDE has long been used to detect RTA hotspots, but network KDE has been shown to be more appropriate for network-restricted incidents, as illustrated by KDE+. However, in addition to its discontinuous output, KDE+ tends to underrate segments of the road where frequency is low but severity may be high. Although Getis-Ord GI* provides a quantitative statistical assessment, which is lacking in the STAA method, it was unable to detect statistically significant clusters of RTAs along the one-dimensional road. Lastly, the STAA method accounts for the combined consequences of frequency, severity and cost instead of just one parameter. One drawback of the STAA method is that it may overestimate the level of risk at some segments of the road due to lengthy hotspot polygons. A micro-analysis may be required to look into these long hotspots. In spite of the benefits of quick RTA hotspot detection offered by the above methods, stopping at this stage is unwarranted. RSA should follow RTA hotspot identification to validate the output results, determine potential road hazards that may eventually cause accidents, and propose remedial actions for RTA hotspots.