Investigation of traffic conflicts at signalised intersections in Warsaw

Although the traffic safety situation in general is improving, the numbers of pedestrians and cyclists hit when crossing a road have not decreased significantly in recent years. According to police accident records for the years 2010-2014, some 735 pedestrians and 505 cyclists were hit by motor vehicles in Warsaw. The investigation reported in this paper is part of the European project InDeV. One aim of the project is to find a correlation between accidents and traffic conflicts and thus provide a solid base for using surrogate safety measures as safety diagnostic tools. Three typical signalised intersections in Warsaw were selected for video recording. Relevant encounters between motor vehicles and vulnerable road users (pedestrians and cyclists) were identified and analysed using the programs RUBA and T-Analyst. The paper describes the semi-automatic video data processing and problems regarding some technical and methodological aspects of conflict detection. Based on video analysis of 24 hours of recording for each intersection, preliminary characteristics of encounters between pedestrians/cyclists and motorised vehicles have been developed. Statistical distributions of encounter parameters such as time-to-collision (TTC) and post-encroachment time (PET) are presented. These will be used in the development of appropriate safety indicators.


Introduction
Vulnerable road users (VRUs), who include pedestrians, cyclists and powered two-wheeler users, constituted almost half of all road accident victims in the European Union in 2017 [1]. As active travel is being encouraged for reasons of health, the environment and congestion avoidance, the safety of walking and cycling must be addressed urgently. EU countries have agreed to prioritise actions for the safety of vulnerable road users and safety in urban areas [2].
Although traffic safety in Warsaw has been gradually improving, the numbers of pedestrians and cyclists hit when crossing a road have not decreased significantly over the last six years. Based on police accident records for the years 2010-2014, some 735 pedestrians and 505 cyclists were hit by motor vehicles in Warsaw. Among those accidents, about 33% of pedestrian crashes and 30% of cyclist crashes occurred at signalised intersections [3].
The investigation reported in this paper is part of the EU Horizon 2020 project InDeV (In-Depth understanding of accident causation for Vulnerable road users) [4]. One aim of the project is to find a correlation between accidents and traffic conflicts and thereby provide a solid base for using surrogate safety measures as safety diagnostic tools. The paper describes the semi-automatic video data processing and problems regarding selected technical and methodological aspects of conflict detection. Preliminary results of conflict investigations at two signalised intersections in Warsaw are also shown.

Background
Accident data analysis is the most commonly used method of evaluating traffic safety, but it suffers from certain limitations. First of all, accidents are relatively rare and happen randomly, so a significant amount of historical data is usually required before a reliable safety assessment can be made [5,6]. This implies that safety improvement measures can be introduced only after a long period of observing accidents, including a series of injuries or fatalities. It might take years to observe a conclusive number of crashes, especially as accident frequency is slowly but gradually declining. Over such a long time many conditions may change, including road geometry, speed limit, traffic signal program, or traffic volume, which can greatly distort the results of traffic safety analysis. In addition, not all accidents are properly reported and some of them are not reported at all. The latter most often concerns vulnerable road users who sustained slight injuries. This problem of accident under-reporting and incomplete reporting has been recognised and discussed in the literature [7][8][9][10].
Due to the above limitations, alternative methods of traffic safety assessment are being developed. One of them, investigated in this paper, is the use of surrogate safety indicators based on short-term filming of traffic and semi-automatic video analysis, which could give conclusive results after weeks rather than years of observation. The method is based on the detection and analysis of traffic conflicts, or near-accidents: events involving a vehicle and a VRU moving in such a way that a collision can be avoided only if at least one of them performs an evasive manoeuvre [11]. Naturally, such situations are not reported in any way and they are several hundred times more likely to happen than accidents.
The Swedish Traffic Conflict Technique, developed in the 1980s, uses the time-to-collision (TTC) value at the moment the first evasive action is started by one of the road users (TTC then becomes the time-to-accident, TA) to rate conflict severity [12]. The Dutch Objective Conflict Technique for Operation and Research (DOCTOR), developed at a similar time [13], defines a conflict as a situation with or without a collision course, provided that the time margin is small enough. Both methods involved recording traffic conflicts by human observers, which was very labour-intensive and time-consuming. This approach has also been criticised because human observers are prone to subjective assessments that are strongly influenced by their level of training, psychophysical state, distractions etc. [14].
Recently, there has been renewed interest in traffic conflict studies, as evidenced by the large number of scientific publications [14]. This has happened because of the recent development of various computer vision based methods for detection and tracking of road users [15][16][17][18]. These methods can help to detect and analyse conflicts automatically and thus eliminate the bias and uncertainty introduced by human observers. Nevertheless, surrogate safety methodologies taking advantage of automatic or semi-automatic video analysis are still under development [19][20][21]. It is hoped that they can enable unbiased safety assessment based on relatively short observation periods and will provide an objective evaluation of measures used to improve safety.

Traffic filming in Warsaw
A novel aspect of the InDeV project is the large-scale on-site data collection, which involves video recording of traffic at 27 signalised intersections and pedestrian crossings in 7 European countries (Belgium, Denmark, the Netherlands, Norway, Poland, Spain and Sweden). At 24 sites filming took place for three weeks, and three sites were designated for long-term filming over an entire year. This resulted in a total of over 220 weeks of video footage.
At each of the sites recordings were made by three synchronised cameras: two opposing high-resolution colour cameras and a third, thermal camera, as shown in Fig. 1. The two RGB cameras provide a better perspective of the filmed intersection, allowing for either a general view and a close-up of a pedestrian crossing or two different angles of observation of the same spot. The thermal camera is aligned with one of the RGB cameras and is used for easier detection of vehicles and pedestrians. Another advantage of using a thermal camera is that the acquired images allow recognition of neither people's faces nor vehicle number plates, thus alleviating privacy concerns.
As part of the traffic conflict investigations described in this paper, three signalised intersections in Warsaw were selected for video recording. This selection was based on available accident records and the following criteria:
- all the crossings were located on four-lane roads with a median or a pedestrian refuge island;
- during the green signal pedestrians were in conflict with turning vehicles;
- there were at least 2 registered crashes within the last five years at an intersection to be selected;
- these crashes involved a motor vehicle and a vulnerable road user;
- they resulted in a fatality or serious injury.
The most frequent types of crashes that meet the above criteria involve a right/left turning motor vehicle and a pedestrian/cyclist crossing the road at a zebra or cycle track crossing at an intersection [3], as shown in Fig. 2.
Two such intersections in Warsaw were filmed for 3 weeks (Fieldorfa-Meissnera, PL3 and Fieldorfa-Perkuna, PL4) and one was filmed for an entire year (Wałbrzyska-Harcerzy Rzeczpospolitej). The size of the recorded data for one site is approximately 60-100 GB per day depending on the particular site characteristics (filmed area, traffic intensity, weather etc.) as well as on video compression used.
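The daily data volume quoted above translates directly into storage requirements. The short calculation below is only an illustrative sketch: it assumes decimal units (1 TB = 1000 GB) and a constant daily volume, and the function name `storage_tb` is hypothetical.

```python
def storage_tb(gb_per_day, days):
    """Total storage in terabytes for a given daily volume (1 TB = 1000 GB assumed)."""
    return gb_per_day * days / 1000.0


# Daily volume range per site reported in the text: 60-100 GB.
for label, days in (("3 weeks", 21), ("1 year", 365)):
    low = storage_tb(60, days)
    high = storage_tb(100, days)
    print(f"{label}: {low:.2f} to {high:.2f} TB per site")
```

Under these assumptions a three-week recording occupies roughly 1.3 to 2.1 TB per site, and a year-long recording roughly 22 to 37 TB.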

Fig. 2.
Traffic movements investigated in the study; pedestrians are marked with blue, cyclists with green and motor vehicles with red arrows.

Identification of traffic encounters
Based on the video recordings, the relevant encounters between motor vehicles and VRUs (pedestrians and cyclists) were identified and analysed using two dedicated applications: RUBA and T-Analyst.
It is claimed that the causes and the process of development are similar for conflicts and accidents. As conflicts are much more frequent, it is easier to collect a dataset large enough for valid statistical analysis. The proposed approach is to identify factors related to the causes of accidents by focusing on the process of conflict formation, which should make it possible to extract the cause-and-effect chain leading to an accident.

RUBA
First, it is necessary to reduce hours of video footage to short clips showing only encounters between vehicles and VRUs. This can be accomplished with the help of RUBA (Road User Behaviour Analysis), which was developed within the InDeV project [22]. RUBA enables automatic detection of pedestrians, cyclists and motor vehicles with the help of virtual motion detectors (Fig. 3).
Detectors can be defined in any part of the image and can have any size. There are two general types of detectors: presence and motion. The first type is triggered when a road user appears within its bounds, while the second type detects road users passing through it [22]. The direction and the range of angles of motion to be detected can be preset when creating a motion detector. For example, in Fig. 3 two motion detectors were defined (red areas): a pedestrian/cyclist detector along the median and a vehicle detector for vehicles turning right. Detectors can be made to work in pairs: e.g. an encounter of interest takes place when the second detector is triggered not later than 5 seconds after the first one. This approach is convenient as it can detect more complex situations, such as a vehicle passing behind a VRU within a given time. At the same time, events that are too distant in time are ignored. It is important, however, to set the time gap of the paired detectors correctly, as changing it later requires reprocessing of the video recordings, which is time-consuming. Another option is to create a single detector for each of the road users, analyse them separately and combine the results later based on their time stamps. This method generates far more events but is much more flexible in further analysis, as it allows the time gap between consecutive detector triggers to be changed without processing the video recording again. It should also be noted that data processing is faster with single detectors than with double detectors (similarly, small detectors take less time to process than large ones).
This semi-automatic approach to conflict detection requires an expert to define detection zones and optimise the detectors' parameters. It is a tedious and time-consuming process, but once it is done and the detection results agree satisfactorily with the expert's results, the whole process of conflict detection is performed automatically. Depending on the CPU used, it runs several times faster than real time, and processing of a week-long recording can be accomplished overnight.
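The second option, combining the logs of two single detectors by time stamp, can be sketched as follows. This is a minimal illustration and not RUBA's own implementation or file format: the function `pair_triggers`, the plain lists of trigger times in seconds and the sample values are all assumptions.

```python
from bisect import bisect_left, bisect_right


def pair_triggers(first_times, second_times, max_gap_s=5.0):
    """Pair triggers of two single detectors by time stamp.

    Returns (t_first, t_second) tuples for which the second detector was
    triggered no earlier than, and at most max_gap_s seconds after, the first.
    """
    second_sorted = sorted(second_times)
    pairs = []
    for t1 in sorted(first_times):
        # Index range of second-detector triggers inside [t1, t1 + max_gap_s].
        lo = bisect_left(second_sorted, t1)
        hi = bisect_right(second_sorted, t1 + max_gap_s)
        pairs.extend((t1, t2) for t2 in second_sorted[lo:hi])
    return pairs


# Hypothetical trigger times (seconds from the start of the recording).
vru_triggers = [10.0, 42.5, 90.0]
vehicle_triggers = [12.3, 50.0, 91.0, 94.9]

print(pair_triggers(vru_triggers, vehicle_triggers, max_gap_s=5.0))
# The gap can be changed afterwards without reprocessing the video:
print(pair_triggers(vru_triggers, vehicle_triggers, max_gap_s=8.0))
```

Because the pairing works on the stored time stamps only, widening or narrowing the gap is a matter of re-running this step rather than the video analysis.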
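Comparing the detection results with the expert's results can be expressed as counting matched, false and missed detections. The sketch below is one possible way to do this and is not taken from the InDeV tools: it assumes both sets of events are given as time stamps in seconds, uses a hypothetical matching tolerance of 2 seconds, and applies a simple greedy one-to-one matching.

```python
def match_events(detected, reference, tolerance_s=2.0):
    """Greedily match detected event times to expert-labelled reference times.

    A detection counts as a true positive when an unmatched reference event
    lies within tolerance_s seconds; each reference event is used only once.
    """
    unmatched = sorted(reference)
    true_pos = 0
    false_pos = 0
    for t in sorted(detected):
        # Find the closest still-unmatched reference event within tolerance.
        best = None
        for r in unmatched:
            if abs(r - t) <= tolerance_s and (best is None or abs(r - t) < abs(best - t)):
                best = r
        if best is None:
            false_pos += 1  # detection with no corresponding expert event
        else:
            unmatched.remove(best)
            true_pos += 1
    return {
        "true_positive": true_pos,
        "false_positive": false_pos,
        "false_negative": len(unmatched),  # expert events that were missed
    }


# Hypothetical event times in seconds.
print(match_events(detected=[10.5, 30.0, 61.0, 61.5], reference=[10.0, 45.0, 60.0]))
```

Greedy matching is adequate when events are well separated in time; with dense events an optimal assignment method may be preferable.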

T-Analyst
At the next stage of the processing, encounters detected by RUBA are analysed in T-Analyst [23] in order to assess their severity. The program makes it possible to determine the trajectories of road users and their movement parameters such as speed and direction (Fig. 4).
The process of trajectory creation is semi-automatic: as consecutive video clip frames are displayed, an operator has to position a wireframe 'cage' of a car on top of the image of that car (see Fig. 4). The 'cage' can be scaled and rotated so as to be best aligned with the actual object in the video frame. This is done every 4 frames to achieve reasonable accuracy. The process has to be repeated for every road user involved in an encounter.
Once the trajectories of the two road users who are potentially in a conflict are determined, several indicators of their proximity in time and space are automatically calculated by the program. The most important of these are:
- TTC (time-to-collision): the time before two vehicles collide if they continue at their present speed and on the same path [24];
- PET (post-encroachment time): the time between the moment the first road user leaves the path of the second one and the moment the second one reaches the path of the first one; PET indicates the extent to which they missed each other [25];
- T2 min: the minimum expected arrival time of the second (later) road user at the potential collision point [19].
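The definitions of PET and TTC given above can be illustrated with a simplified calculation. This is only a sketch under stated assumptions and not the algorithm used by T-Analyst: each road user is approximated by a circle moving at constant velocity, and the function names, the `combined_radius_m` parameter and the sample values are hypothetical.

```python
import math


def post_encroachment_time(t_first_leaves_s, t_second_arrives_s):
    """PET: time between the first road user leaving the other's path
    and the second road user reaching that path (seconds)."""
    return t_second_arrives_s - t_first_leaves_s


def time_to_collision(p1, v1, p2, v2, combined_radius_m):
    """TTC for two road users that keep their present speed and direction.

    Positions are (x, y) in metres, velocities (vx, vy) in m/s. A collision
    is assumed when the centre distance drops to combined_radius_m.
    Returns None when the road users are not on a collision course.
    """
    # Relative position and velocity of road user 2 with respect to road user 1.
    px, py = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    # Solve |p + v*t| = R, i.e. a*t^2 + b*t + c = 0, for the earliest t >= 0.
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - combined_radius_m ** 2
    if c <= 0.0:
        return 0.0  # already within the collision distance
    if a == 0.0:
        return None  # identical velocities: the distance never changes
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None


# Hypothetical example: a car approaching a stationary pedestrian head-on.
print(time_to_collision((0.0, 0.0), (10.0, 0.0), (25.0, 0.0), (0.0, 0.0), 5.0))
print(post_encroachment_time(12.4, 13.1))
```

The `None` result corresponds to encounters for which no TTC can be calculated because the two road users are not on a collision course, whereas PET is defined whenever both road users pass the same area.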

Practical problems
Successful conflict detection depends primarily on proper positioning of the cameras. Cameras are mounted on lamp posts; if there is no lamp post in the preferred place on the pavement or in the median, a more distant lamp post has to be used, which may lead to numerous vehicle-pedestrian or vehicle-vehicle occlusions. As a result, one of the road users is not detected and, consequently, the actual event that took place is missed (false negative error).
Another reason why road users may not be detected is that vehicles and/or pedestrians move too slowly in the video. The minimum detectable speed in RUBA is related to video resolution. From this perspective, higher-resolution footage from RGB cameras seems to be a better choice than thermal footage. However, this also implies a greater sensitivity to noise, which in turn leads to an abundance of detections (false positive error). False triggering of detectors also occurs when vehicles or pedestrians stop within the detector area and trigger it several times. Detectors can also be triggered by image noise or irrelevant objects such as birds. It is therefore crucial to choose the right resolution and to set the noise level accordingly. Moreover, detectors applied to RGB footage can be falsely triggered by the shadows of people and vehicles; the use of thermal cameras eliminates this problem.
It is difficult to build detectors that register only vehicles moving from certain origins and/or to certain destinations. Setting up detectors in this manner is often impossible, and they are triggered by all vehicles, including those that are not of interest to the analysis. This happens despite the introduction of directional detectors and is particularly frequent when a relatively low noise level is set in RUBA. In addition, it should be taken into account that small detectors are more sensitive and thus more susceptible to false triggering than relatively large detectors. The list of sources of error resulting in false and missed detections is presented in Table 1.
If the camera is mounted too far away from the observed area, perspective distortion and the small size of the recorded vehicles make it quite difficult to align a wireframe model with the object's image. This is particularly evident for vehicles approaching the camera straight-on and causes a large alignment error, which significantly affects further results. In addition, instability of the lamp post or camera mast may cause vibrations and/or shifts of the image that need to be eliminated or taken into account, especially with long video footage spanning many months. Any errors in the recordings, missing frames, etc. may degrade the quality of the calculated parameters.
Such video-based applications and analyses require fast computers and adequate resources for data acquisition and storage (year-long Full HD RGB video footage from a single camera requires several terabytes of storage space). Video analysis is a CPU-intensive task and can take a very long time, especially in the case of high-resolution RGB footage. Colour video provides more visual information and turns out to be more reliable than thermal video, which may become useless in extreme weather conditions such as heat. During hot weather, heated asphalt is just as bright as pedestrians, which prevents automatic detection of people. Moreover, detectors with a large surface take longer to analyse than small ones. It is the object trajectory analysis, however, that is the most time-consuming stage of the entire processing chain. This is because the trajectories of conflict participants are identified and entered into the system manually.

Preliminary analysis results
Preliminary analysis was carried out on two sites in Warsaw that were filmed for three weeks. In the first stage of video processing, a sample 24-hour video recording from each site was analysed manually. The objective was to identify and characterise all vehicle-VRU encounters, not only conflicts, in order to create a reference base (called 'ground truth') for evaluating the results of future automated processing. This approach was used to ensure that the errors in automatic conflict detection are minimised. An encounter was defined in the following way: (a) a situation when a vehicle and/or a VRU perform an evasive manoeuvre; (b) a situation when a vehicle enters a crossing area while a pedestrian or cyclist is still on it.
Table 2 shows the results of the 24-hour 'ground truth' manual video analysis for two Warsaw intersections: PL3 and PL4. At the PL3 site 366 vehicle-pedestrian and 69 vehicle-cyclist encounters were identified. At the PL4 site the number of vehicle-pedestrian encounters was slightly smaller and amounted to 307. There were only 5 vehicle-cyclist encounters.
The selected video sequences were input to T-Analyst for determination of road user trajectories. For each encounter, wireframe car/cyclist/pedestrian models were aligned manually with objects on the screen (as described in section 4.2). Table 2 shows that T-Analyst was able to calculate TTC for about 40% of vehicle-pedestrian encounters. Apparently, in the remaining cases the two road users were not on a collision course. On the other hand, PET values were calculated for almost all the encounters. Fig. 5 shows the frequency distribution of TTC for vehicle-pedestrian encounters. There were no serious conflicts, as all the TTC values were greater than 2.0 seconds. Fig. 6 shows the cumulative distribution of PET values for both vehicle-pedestrian and vehicle-cyclist encounters.
There is a distinct shift between the observed pedestrian PET values at the two sites: the median value at PL4 is 0.34 seconds smaller than that at PL3. As expected, there are more conflicts at PL4 than at PL3. This may be due to the geometry of the intersection, but at the present stage of the research this cannot be clearly established. Apart from geometrical differences, other factors such as traffic intensity, speed of left-turning vehicles and pedestrian pace should be taken into account.
There is an even bigger difference between pedestrian and cyclist PET values at intersection PL3. Median PET value for cyclists is almost 0.5 seconds smaller than for pedestrians (Table 2) and the cumulative curves are clearly shifted. This corresponds with a higher proportion of dangerous situations for cyclists (7.2%) than for pedestrians (1.6%). A possible reason for this is that vehicles cross the cycle lane first (see Fig. 3) and then move across the pedestrian zebra. This means that they are likely to pass behind a cyclist within a shorter time than behind a pedestrian.
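The median PET and the proportion of dangerous situations can be derived from a list of PET values as sketched below. The example assumes that a dangerous situation is one with PET smaller than 1.0 second; the function name `summarise_pet` is hypothetical and the sample values are made up for illustration, not taken from the study data.

```python
from statistics import median


def summarise_pet(pet_values_s, threshold_s=1.0):
    """Return the number of encounters, the median PET and the share of
    encounters with PET below threshold_s (assumed criterion for a
    dangerous situation)."""
    if not pet_values_s:
        raise ValueError("at least one PET value is required")
    below = sum(1 for value in pet_values_s if value < threshold_s)
    return {
        "n": len(pet_values_s),
        "median_s": median(pet_values_s),
        "share_below_threshold": below / len(pet_values_s),
    }


# Illustrative PET values in seconds (not measured data).
sample_pet = [0.8, 1.5, 2.0, 3.0]
print(summarise_pet(sample_pet))
```

Applying the same function separately to pedestrian and cyclist encounters at each site gives directly comparable medians and proportions.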

Conclusions
RUBA and T-Analyst are useful tools for supporting the detection and analysis of conflicts registered by vision systems. Unfortunately, both applications require expert knowledge (RUBA) and significant human participation (T-Analyst). However, recent advances in the field of Artificial Intelligence, and particularly in Deep Learning and Convolutional Neural Networks, give reason to hope that the current semi-automatic conflict analysis will be fully automated in the near future [26].
Preliminary characteristics of encounters between pedestrians/cyclists and motorised vehicles have been developed. No serious conflicts were identified using TTC, as all the TTC values were greater than 2.0 seconds. However, when PET was used as the criterion (PET smaller than 1.0 second), quite a few dangerous situations indicating a possible conflict were found: between 1.6% and 7.2% of all encounters. The statistical distributions of encounter parameters such as time-to-collision (TTC) and post-encroachment time (PET) will be used in the development of appropriate safety indicators.
The study is part of research project InDeV sponsored by the European Commission under grant agreement No. 635895.