1. Introduction
According to previous research [1,2,3,4,5,6,7,8], traffic accidents are perceived as a social disaster that causes human, economic, and social damages. Every year, many people experience human losses such as death or injury due to traffic accidents, causing emotional distress to accident victims, their families, and society [9]. In addition, traffic accidents entail various expenses, such as medical bills, rehabilitation costs, and insurance payouts for accident settlement. Indirectly, traffic accidents can cause congestion, leading to fuel waste and secondary accidents for surrounding vehicles, further increasing the economic burden [10]. Moreover, traffic accidents generate recovery costs at the national level and require expenses for policy-making and implementation related to traffic safety, posing significant financial and social burdens to society [11]. Therefore, reducing the scale of damages through effective traffic accident prevention measures is crucial to alleviate the overall burden on society and the nation.
Numerous studies have been conducted to analyze factors related to traffic accidents. It is widely recognized that traffic accidents are associated with various factors, including environmental factors (vehicle and road conditions), road characteristics, human factors, weather, and lighting conditions. Ref. [12] identified that the severity of injuries among motorcycle riders is associated with factors such as speed limits, collision-related factors, driver-related factors, road characteristics, and weather conditions. In one paper [13], an analysis of the leading causes of road traffic accidents in Hail revealed that 67% were caused by human factors, 29% by road conditions, and 4% by vehicle defects. Ref. [14] analyzed the traffic accident mortality rate on roads in Selangor and Perlis states based on data from 2013 to 2017 and explained that traffic volume, physical road structure, and user concentration are the main factors leading to traffic accidents in certain areas. In another study [15], the authors explored the behavioral factors that pose a higher risk of causing severe or fatal injuries to drivers, selecting four factors related to traffic accidents: vehicle, road, environment, and human factors. Ref. [16] investigated the effects of various factors on nighttime traffic accidents and identified low-light conditions as a significant factor. Specifically, the severity of nighttime vehicle accidents was found to be influenced by accident location, accident type, and the presence of median barriers.
On the other hand, directionality can be considered one of the factors associated with traffic accidents. All roads have a Northbound and a Southbound direction, and traffic accidents can be biased toward one of these directions. Where there is a significant disparity in the ratio of traffic accidents between the two opposing directions (Northbound and Southbound) at a specific location, it provides an opportunity to analyze the factors contributing to the difference in the number of accidents by direction. This analysis helps improve traffic safety by exploring additional environmental factors. In certain advanced countries, such as the United States, investigators record directional information for all traffic accidents. However, many countries, including South Korea, still do not collect data on the driving direction of vehicles involved in traffic accidents. Consequently, the absence of such data prevents the establishment of traffic policies based on the Northbound and Southbound directions of roads. This can lead to inefficient allocation of national budgets and police resources, and it may delay policy-making for high-risk areas.
In this study, we propose the driving direction of traffic accident vehicles as a new factor. Currently, the Korean National Police Agency (KNPA) does not collect information on the driving direction of vehicles involved in traffic accidents. Therefore, the driving direction must be estimated from the collected data that relate to the directionality of traffic accidents. First, we set the latitude and longitude of the vehicle involved in the accident as the accident location. Second, since the KNPA traffic accident data does not include information on the origin or destination, we extract this information from the “Report” field, which records the accident details. Since the pattern of writing a report is generally fixed, we use Named Entity Recognition (NER) to extract the destination. Third, we convert the destination to latitude and longitude coordinates through geocoding. Finally, we estimate the directionality (Northbound or Southbound) of the vehicle involved in the traffic accident by calculating the angle between the accident location and the destination. In the experiments, we collected 5181 accident records reported to the police in the Chungcheongnam-do region, which had a high incidence of traffic accidents, and estimated the directionality of the incidents. We also visualized the direction-specific occurrence of traffic accidents in each Administrative District (AD) using a cumulative histogram on a map. Analyzing the histograms by AD showed that, in Chungcheongnam-do, accidents occurred more frequently in the Southbound direction.
The remainder of this paper is organized as follows. Section 2 explains the analysis of the KNPA’s traffic accident dataset. Section 3 describes the proposed method and each procedure in detail. The experimental results are described in Section 4, and the discussion and conclusions follow.
2. Analysis of Traffic Accident Data from KNPA
In collecting traffic accident data, the KNPA relies on traffic investigators who collect information related to the accident, such as the coordinates of the accident location, photographs, and a general summary. Given that traffic accidents are typically the result of a complex set of factors, the investigation process records data related to the environmental, temporal, vehicle, and human-related factors (Table 1).
First, environmental factors that may affect traffic accidents include road-related factors such as the absence of road markings, faulty signs or traffic lights, slippery road surfaces, and poor lighting. The conditions of the road, such as its width, length, slope, and position of intersections, can also influence the likelihood of an accident [17,18,19,20,21]. Weather conditions such as rain, snow, fog, wind, and dust can reduce driver visibility and make vehicle control difficult, ultimately increasing the risk of accidents [17,19,21]. Additionally, animals on the road, particularly at night, can create unpredictable situations for drivers and increase the risk of accidents. The KNPA records information related to environmental factors such as road surface conditions (paved-dry, paved-wet/moist), weather conditions (clear, rainy, cloudy), and others.
Second, temporal factors affect accident risk. When traffic volume is high during rush hour, the risk of accidents increases [22,23]. Moreover, drivers commuting during sunrise or sunset are vulnerable to accidents, because intense sunlight can shine directly into drivers’ eyes, reducing their visibility and increasing the likelihood of accidents. Additionally, nighttime driving reduces visibility, making it difficult for drivers to detect sudden obstacles, pedestrians, or other vehicles. Furthermore, drivers tend to be less focused and more stressed at night, which can lead to fatigue and impaired concentration, ultimately leading to an increased risk of accidents [24]. The KNPA records information related to time factors, such as the date, time, day of the week, and whether the accident occurred during the day or at night.
Third, the size and type of vehicles involved in accidents are also related to accident risk [25]. Collisions between large and small vehicles often result in more significant damage to the smaller vehicle. Additionally, technical failures such as faulty brakes, steering systems, tires, lights, seat belts, and other vehicle-related issues can lead to accidents. The KNPA records information related to the types of vehicles involved in accidents, including construction equipment, pedestrians, passenger cars, buses, bicycles, motorcycles, and others.
Finally, human factors related to the driver include physical, cognitive, and psychological factors. Physical factors include the driver’s health status, sleep deprivation, physical discomfort, and illness [26]. Cognitive factors include the driver’s attention, concentration, experience, and judgment. Psychological factors include stress, anxiety, depression, anger, etc. These factors interact in complex ways, ultimately increasing the risk of accidents. The KNPA records information related to human factors, including attention status, environmental factors, psychological factors, driving behavior, vehicle operation errors, and mental and physical health status.
3. Proposed Method
In this paper, we propose a method to estimate the driving direction of traffic accident vehicles. To estimate the driving direction, basic information such as the accident location and the destination of the vehicle is required. Currently, there are no separate fields related to origin and destination in the traffic accident data collected by the KNPA, except for the accident location information (address, latitude, longitude). Therefore, we use NER to extract the destination from the text content of the “Report” field written by the investigator. We also use geocoding to convert the destination in text format to latitude and longitude coordinates. Finally, we calculate the angle between the accident location coordinates and the destination coordinates to estimate whether the vehicle was traveling Northbound or Southbound. The overall process of the proposed method is shown in Figure 1.
3.1. Problem Definition
All roads have both Northbound and Southbound lanes, and depending on the environment of the road, traffic accidents can occur more frequently in specific directions. In Figure 2, the cars in the Southbound direction can drive without any problem. Conversely, in the Northbound direction, unpaved roads can serve as potential factors leading to traffic accidents. On paved roads, drivers rely on their line of sight to confirm the presence of vehicles on unpaved roads and make a right turn. Similarly, on unpaved roads, drivers must rely solely on their line of sight to merge onto paved roads. If obstacles such as trees obstruct the line of sight, seeing other vehicles on different roads can be challenging.
To prevent traffic accidents, departments in charge of road safety, such as the police, need to analyze which direction traffic accidents are more likely to occur based on the environmental conditions in the Northbound and Southbound directions.
Some jurisdictions record the directionality of traffic accidents based on compass direction (east, west, south, north); one example is the North Carolina Department of Transportation’s Division of Motor Vehicles in the United States [27]. On the other hand, some countries, including Korea, do not collect such directional information; they only collect data such as the date, location, environmental conditions, description, type, and cause of the accident in order to compile the statistics necessary for developing traffic safety measures and assessing the risk of traffic accidents. In these countries, as there is no policy requirement to collect directional data, the directionality must be estimated from the information available in the other fields being collected.
3.2. Data Preprocessing
Since investigators compile traffic accident data manually, it can contain duplicate entries, missing data, and erroneous information. In addition, even for a single accident, multiple entries may be recorded depending on the specific circumstances of the accident. To address these issues, duplicate records with the same “Case Number” are removed, and records with missing “Report” information are excluded from the analysis. For example, if the “Report” field is missing, information about the destination cannot be obtained, so the record must be excluded from the analysis.
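The two preprocessing rules above can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary keyed by the field names “Case Number” and “Report”; the sample case numbers are hypothetical, and keeping the first occurrence of a duplicated case is an assumption, since the text does not state which copy is retained.

```python
def preprocess(records):
    """Drop records without a usable "Report" and duplicate "Case Number" entries."""
    seen_cases = set()
    cleaned = []
    for record in records:
        # A missing or blank report gives no destination, so the record is excluded.
        report = (record.get("Report") or "").strip()
        if not report:
            continue
        # Assumption: keep the first record seen for each case number.
        case_number = record.get("Case Number")
        if case_number in seen_cases:
            continue
        seen_cases.add(case_number)
        cleaned.append(record)
    return cleaned


# Hypothetical sample records (case numbers are made up for illustration).
sample = [
    {"Case Number": "A-001", "Report": "Car #1 was traveling from Gongju toward Seocheon."},
    {"Case Number": "A-001", "Report": "Car #1 was traveling from Gongju toward Seocheon."},
    {"Case Number": "A-002", "Report": ""},
    {"Case Number": "A-003", "Report": None},
    {"Case Number": "A-004", "Report": "Car #1 collided with Car #2."},
]
cleaned = preprocess(sample)
print([r["Case Number"] for r in cleaned])
```

Filtering empty reports before deduplicating means that a case survives as long as at least one of its entries carries a report.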
3.3. Query for Information on the Location of Vehicle Traffic Accidents
The location of a traffic accident can be extracted from the “Address”, “Latitude”, and “Longitude” fields. The “Address” field contains the textual address of the actual location of the traffic accident. The “Latitude” and “Longitude” fields contain the coordinates of the accident location directly. Geocoding is required to convert the address information in the “Address” field into coordinates. Geocoding is a crucial part of text geoparsing, which involves several tasks. Specifically, it refers to converting the textual value of identified places into latitude and longitude coordinates on a map [28]. Geocoding obtains coordinate values in two steps: first, various address databases of cities and buildings are searched to find the most similar address; then, the geocoding engine converts that address to latitude and longitude coordinate values. To avoid this lookup, in this paper the traffic accident location is defined by directly using the “Latitude” and “Longitude” fields, which require no separate conversion process.
3.4. Rule-Based Named Entity Recognition of Arrival Destinations
When a traffic accident occurs, a police officer arrives at the scene, notes its coordinates, and records information related to the accident. The details of the accident are written in the “Report” field. The following are actual “Report” cases recorded by police officers:
(Case 1) Car #1 was traveling on a one-way road from the direction of the Yeokmal Intersection toward Bukil on the first lane of the road. At the accident site, it collided with the rear of Car #2, which was traveling ahead of it, with the right front of Car #1. As a result, Car #1 caused a traffic accident resulting in personal and property damage and then fled the scene without taking any action.
(Case 2) Car #1 was traveling from Gongju toward Seocheon on the second lane of the two-lane road when it collided with the left side of Car #2, which was parked on the second lane of the road due to a single preceding accident, with the front of Car #1. The accident caused the driver of Car #2, who was sitting in the driver’s seat outside the car, to be hit.
(Case 3) Car #1 was traveling on the first lane (excluding the left turn lane) from Cheonan toward Gongju, and when it collided with the rear of Car #2, which was traveling ahead of it, Car #2’s driver fell onto the first lane, causing Car #3 to collide with Car #2’s driver.
(Case 4) Car #1 was traveling on the third lane of the three-lane one-way road toward Seoul direction when it collided with the rear of Car #2, which was traveling ahead of it, and then came to a stop on the third lane and the shoulder of the road at around 1 o’clock (first accident). Car #3, traveling on the third lane toward the left side of Car #1 and its driver, collided with Car #1, who had exited the car and proceeded on foot.
(Case 5) Car #1 collided with Pedestrian #2, who was crossing the crosswalk toward the direction of K-Mart from the direction of Shin-gwan Elementary School, with its right front side while traveling toward the direction of OO Bank from the direction of OO Mart.
(Case 6) Car #1 collided with Car #2, coming from the opposite direction and traveling straight through the signal while turning left.
In the given cases, the “Report” includes keywords such as ‘banghyang (direction)’, ‘bangmyeon (direction)’, and ‘jjog (side)’, which allow us to estimate the accident vehicle’s path. Using these keywords, we can identify the starting and ending points of the accident vehicle in the sentence (Cases 1 to 3). The starting and ending points are expressed in various forms, such as location, business, building, and road names.
However, the reports are not always complete. First, there are cases where the starting or ending point is missing. In Case 4, for example, the starting point is omitted, although the ending point is included. Second, there are cases where the starting or ending point is ambiguous. In Case 5, for example, it is difficult to estimate the exact destination because many places across the country share the same name, such as “OO Mart”. Therefore, it is necessary to estimate the destination based on the location with that name nearest to the accident site. The last, extreme case is when important information is completely missing due to investigator negligence. In Case 6, all information related to the path is omitted. Cases where investigators write very brief reports or do not write reports at all should be excluded from the analysis.
In this study, we use rule-based Named Entity Recognition to estimate the destination. As seen in the case analysis, the destination is located between the ‘banghyang/bangmyeon/jjog’ (noun) + ‘eseo’ (postposition) associated with the starting point and the ‘banghyang/bangmyeon/jjog’ (noun) + ‘eulo’ (postposition) associated with the destination. Algorithm 1 shows the method for extracting the destination from the “Report”. First, after removing various symbols from the report, we search for candidate phrases that lie between an entry of the text-based list BD related to the starting point and an entry of the text-based list ED related to the destination. Finally, we set the shortest phrase among all candidates as the final destination.
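Algorithm 1 Rule-based Named Entity Recognition.
procedure NER(report)
    BD ← [“banghyang-eseo”, “bangmyeon-eseo”, “jjog-eseo”, “eseo”]
    ED ← [“banghyang-eulo”, “bangmyeon-eulo”, “jjog-eulo”, “eulo”]
    candidates ← [ ]
    for i ← 0 to length(report) − 1 do
        for j ← 0 to length(BD) − 1 do
            if report starts with BD[j] at position i then
                for k ← i + length(BD[j]) to length(report) − 1 do
                    for l ← 0 to length(ED) − 1 do
                        if report starts with ED[l] at position k then
                            append report[i + length(BD[j]) : k] to candidates
                        end if
                    end for
                end for
            end if
        end for
    end for
    destination ← none
    for each c in candidates do
        if destination = none or length(c) < length(destination) then
            destination ← c
        end if
    end for
    return destination
end procedure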
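The extraction rule described for Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it operates on the romanized markers listed in BD and ED (actual reports are written in Korean), the function name `extract_destination` is hypothetical, and the symbol-stripping regular expression is an assumption about what "removing various symbols" means.

```python
import re

# Marker lists from Algorithm 1 (romanized forms as given in the text).
BD = ["banghyang-eseo", "bangmyeon-eseo", "jjog-eseo", "eseo"]
ED = ["banghyang-eulo", "bangmyeon-eulo", "jjog-eulo", "eulo"]


def extract_destination(report):
    """Return the shortest phrase found between a BD marker and a later ED marker."""
    # Assumption: "symbols" are anything other than word characters,
    # whitespace, and the hyphen used in the romanized markers.
    text = re.sub(r"[^\w\s-]", "", report)
    candidates = []
    for begin_marker in BD:
        start = text.find(begin_marker)
        while start != -1:
            phrase_start = start + len(begin_marker)
            for end_marker in ED:
                end = text.find(end_marker, phrase_start)
                while end != -1:
                    phrase = text[phrase_start:end].strip()
                    if phrase:
                        candidates.append(phrase)
                    end = text.find(end_marker, end + 1)
            start = text.find(begin_marker, start + 1)
    # The shortest candidate is taken as the destination.
    return min(candidates, key=len) if candidates else None


# Romanized stand-in for "from the Gongju direction toward the Seocheon direction".
print(extract_destination("Gongju banghyang-eseo Seocheon banghyang-eulo"))
```

Choosing the shortest candidate matters because the short markers “eseo” and “eulo” also match inside the longer ones, so longer candidates such as “Seocheon banghyang-” are produced alongside the intended “Seocheon”. A report with no starting-point marker yields no candidate in this sketch.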
3.5. Extracting Accident Vehicle Destinations Using Geocoding
This paper estimates the destination coordinates from the text-based destination extracted by NER, using two stages of geocoding.
In the first step, if the destination extracted by NER is a specific place name such as ‘Busan’ or ‘Seoul’, the geocoding engine directly outputs the corresponding coordinates. However, when multiple locations share the same name, as is common for apartments or stores, the geocoding engine cannot return specific coordinates. In such cases, it is reasonable to assume that the destination is the location nearest to the accident location.
In the second step, geocoding is performed in a restricted area by providing additional information about the destination. The reverse-geocoding engine provides the text address of the accident location coordinates. By adding the “do” (province) and “si” (city) of that address to the destination and inputting the result into the geocoding engine, reliable location information can be obtained. For example, assuming a traffic accident at “358, Sagiso-dong, Dangjin-si, Chungcheongnam-do”, if the destination extracted by NER is “Himart”, the geocoding engine cannot return any results. Therefore, by reverse-geocoding the accident location coordinates, “Chungcheongnam-do” and “Dangjin-si” are additionally obtained, and a new query, “Himart, Dangjin-si, Chungcheongnam-do”, is created. If multiple candidate destination points are returned by this second geocoding, the candidate closest to the accident location is set as the final destination. Figure 3 illustrates the entire geocoding process.
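The two-stage lookup can be sketched as follows. Everything here is a stand-in under stated assumptions: `GAZETTEER`, `geocode`, and `reverse_geocode` are hypothetical in-memory substitutes for a real geocoding engine, the coordinates are placeholders rather than surveyed locations, and the haversine formula is one reasonable choice of distance that the text does not specify.

```python
import math

# Hypothetical gazetteer standing in for a geocoding engine.
# Coordinates are illustrative placeholders, not real data.
GAZETTEER = {
    "Seoul": [(37.57, 126.98)],
    "Himart, Dangjin-si, Chungcheongnam-do": [(36.895, 126.630), (36.950, 126.700)],
}


def geocode(query):
    """Return all (lat, lon) matches for a text query (empty list if none)."""
    return GAZETTEER.get(query, [])


def reverse_geocode(lat, lon):
    """Return the ("do", "si") of a coordinate; fixed here for illustration."""
    return ("Chungcheongnam-do", "Dangjin-si")


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve_destination(name, accident):
    """Stage 1: direct geocoding. Stage 2: region-restricted query, nearest hit."""
    hits = geocode(name)
    if len(hits) == 1:
        return hits[0]
    do, si = reverse_geocode(*accident)
    hits = geocode(f"{name}, {si}, {do}")
    if not hits:
        return None
    return min(hits, key=lambda point: haversine_km(accident, point))


accident = (36.890, 126.630)
print(resolve_destination("Seoul", accident))
print(resolve_destination("Himart", accident))
```

In this sketch the second stage runs whenever the first stage does not return exactly one match, which covers both the "no result" and the "several same-named places" situations described above.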
3.6. Driving Direction Estimation of Vehicles Involved in Traffic Accidents
Finally, we calculate the angle between the accident location $P_a = (x_a, y_a)$ and the destination $P_d = (x_d, y_d)$, where $x$ denotes longitude and $y$ denotes latitude, to estimate whether the driving direction is Northbound or Southbound.
The arctangent obtains the angle from the slope of the $x$, $y$ coordinates. However, when the angle is calculated from the slope $\Delta y / \Delta x$ alone, the relative position of the destination, which changes according to the signs of the coordinate differences, is not reflected. As a result, the relative position (quadrant) of $P_d$ with respect to $P_a$ is not known precisely, so it is impossible to distinguish between Northbound and Southbound. Therefore, we use the $\operatorname{atan2}$ function, a two-argument inverse of $\tan$, which provides the angle in the range of $-\pi$ to $\pi$ and thus accurately determines which quadrant the angle belongs to [29]. The angle $\theta$ between the two points is defined as follows:
$$\theta = \operatorname{atan2}(\Delta y, \Delta x) \times \frac{180}{\pi} \tag{1}$$
In Equation (1), the value obtained from $\operatorname{atan2}$ is in radians, so we multiply it by $180/\pi$ to convert it to degrees. The differences between the latitudes and longitudes of the two points, $\Delta y$ and $\Delta x$, are calculated as follows:
$$\Delta y = y_d - y_a, \qquad \Delta x = x_d - x_a \tag{2}$$
Finally, the final driving direction is determined as follows:
$$\text{Direction} = \begin{cases} \text{Northbound}, & 0 \le \theta \le 180 \\ \text{Southbound}, & -180 \le \theta < 0 \end{cases} \tag{3}$$
In Equation (3), if $\theta$ is between 0 and 180, it is Northbound; if it is between −180 and 0, it is Southbound. Figure 4 illustrates how the driving direction is determined based on the angle between $P_a$ and $P_d$, writing the relative position as the signs of $(\Delta x, \Delta y)$. The driving direction is Northbound when $P_d$ lies in the first quadrant relative to $P_a$, because the relative position is $(+, +)$, and when it lies in the second quadrant, because the relative position is $(-, +)$. On the other hand, the driving direction is Southbound when $P_d$ lies in the fourth quadrant, because its relative position to $P_a$ is $(+, -)$. Finally, when $P_d$ lies in the third quadrant, the relative position is $(-, -)$, so the driving direction is Southbound.
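The angle-based rule of Equations (1) to (3) can be sketched with Python's standard `math.atan2`. The function names are hypothetical, points are assumed to be (latitude, longitude) pairs, and assigning the boundary angles 0 and 180 (due east and due west) to Northbound is an assumption of this sketch.

```python
import math


def angle_deg(accident, destination):
    """Angle in degrees, in [-180, 180], from the accident point to the destination."""
    lat_a, lon_a = accident
    lat_d, lon_d = destination
    dy = lat_d - lat_a  # latitude difference (north-south)
    dx = lon_d - lon_a  # longitude difference (east-west)
    # atan2 uses the signs of both arguments, so the quadrant is preserved.
    return math.degrees(math.atan2(dy, dx))


def driving_direction(accident, destination):
    """Classify as Northbound (0 to 180 degrees) or Southbound (below 0)."""
    theta = angle_deg(accident, destination)
    return "Northbound" if 0 <= theta <= 180 else "Southbound"


accident = (36.0, 127.0)
print(driving_direction(accident, (37.0, 127.5)))  # destination to the north-east
print(driving_direction(accident, (36.5, 126.0)))  # destination to the north-west
print(driving_direction(accident, (35.0, 126.0)))  # destination to the south-west
print(driving_direction(accident, (35.0, 128.0)))  # destination to the south-east
```

A plain `math.atan(dy / dx)` would return the same angle for the north-east and south-west destinations above, which is why the two-argument form is needed.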
5. Discussion and Conclusions
When formulating traffic safety policies, various statistical data on traffic accidents can be considered, including road conditions, weather conditions, traffic volume, and vehicle types. The directional aspect of traffic accidents can also be critical in formulating traffic policies. Analyzing high-risk roads by direction for each local jurisdiction can prevent the waste of budget and law enforcement resources, allowing them to be used proactively and directly in traffic planning and in improving transportation facilities. It can also influence the formulation of traffic safety policies. For example, improvements can be made to driving conditions that reduce driver visibility, such as curvy roads or hills. Directional analysis can also be used to proactively remove obstacles, such as trees or facilities, that pose a threat to traffic safety.
This paper analyzes the likelihood of traffic accidents occurring based on the environment of the Northbound and Southbound directions. It proposes a method for estimating the driving direction of traffic accident vehicles using accident and destination information. We take the accident location directly from the latitude and longitude fields of the traffic accident data. To obtain the destination, we first use Named Entity Recognition to extract destination information from the “Report” field written by the investigator. We then use geocoding to convert the destination to latitude and longitude coordinates. Finally, we estimate whether the driving direction is Northbound or Southbound by calculating the angle between the accident location and destination coordinates. In the experiments, we collected 5181 traffic accident records from Chungcheongnam-do from 2016 to 2018, of which 3593 records remained for analysis after the preprocessing stage. We examined the directionality of traffic accidents by AD and visualized the directionality information of the ADs at the village/town/district level on a map. The analysis revealed that Southbound traffic accidents accounted for 57.25%, higher than Northbound traffic accidents. In some areas, Southbound traffic accidents were as high as 71.42%. However, there were limitations: in some areas the available direction information was not accurate enough, making it difficult to estimate the directionality of traffic accidents.
The method proposed in this study can be effectively utilized for traffic safety enhancement and accident analysis, even in countries where recording the direction of traffic accidents is not mandatory. The ideal solution is to establish a policy that obliges the recording of traffic accident directions. However, the proposed method is one of the alternatives available when such policies are not feasible. The proposed method can also be used to generate collision diagrams for traffic accident analysis. The crash diagram is an essential part of the crash report because it allows the investigator to show the relationship that existed during the crash between the vehicle and the surrounding environment. According to DMV-349, issued by the North Carolina Department of Transportation’s Division of Motor Vehicles, it is recommended to include the “direction of travel for each traffic lane” when drawing crash scene diagrams. Therefore, the proposed method can help create a collision diagram by estimating directions from accident points and destination points, even in scenarios where direction information is not documented.
On the other hand, the proposed method has some limitations. First, the use of estimated directional information raises concerns about accuracy. Specifically, the proposed method estimates the destination from the “Report” field and, when a precise location is not found, selects the location closest to the accident site as the final destination. As a result, errors propagate across multiple estimation stages and compromise the overall reliability. Second, unlike the recording method used by the North Carolina Department of Transportation’s Division of Motor Vehicles, which covers four directions (north, south, east, and west), the proposed method only considers the Northbound and Southbound directions, which limits the directional information available for traffic accidents. Third, the proposed method overlooks various parameters related to traffic accident risk because it is intended only to estimate the direction.
From this perspective, future work will involve estimating directionality in angle units in addition to the compass directions. To accomplish this, a road navigation API can be utilized to generate a repository of road journeys, enabling the determination of the precise direction of movement from the accident site. Furthermore, there is a need for research on traffic safety from various perspectives, exploring risk levels by integrating multiple parameters associated with traffic accidents. For instance, factors such as the speed of the vehicles, the time of day (off-peak and on-peak), and traffic volumes during peak periods can be incorporated in this regard. In addition, recent “autonomous driving scenarios” are being researched based on existing traffic accident overview data [31]. Therefore, the method defined in this paper, which uses rule-based NER, is expected to help develop new autonomous driving scenarios.