From Prediction to Prevention: Leveraging Deep Learning in Traffic Accident Prediction Systems

Jin, Zhixiong; Noh, Byeongjoon

doi:10.3390/electronics12204335


by Zhixiong Jin ^1,2 and Byeongjoon Noh ^3,*

¹ Univ. Gustave Eiffel, ENTPE, LICIT-ECO7, F-69675 Lyon, France
² Urban Transport Systems Laboratory (LUTS), École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
³ Department of AI and Big Data, Soonchunhyang University, 22 Soonchunhyang-ro, Asan-si 31538, Republic of Korea
* Author to whom correspondence should be addressed.

Electronics 2023, 12(20), 4335; https://doi.org/10.3390/electronics12204335

Submission received: 13 September 2023 / Revised: 17 October 2023 / Accepted: 17 October 2023 / Published: 19 October 2023

(This article belongs to the Special Issue Machine Learning in Recommender Systems and Prediction Model)


Abstract: We propose a novel system leveraging deep learning-based methods to predict urban traffic accidents and estimate their severity. The major challenge in traffic accident prediction is data imbalance: because traffic accidents are rare, the dataset contains numerous zero values. To address the issue, we propose a grid-clustered feature map built on the ideas of grids and cells. To predict the occurrence of accidents in a grid, we introduce an accident detector that combines a Convolutional Neural Network (CNN) with a Deep Neural Network (DNN). Then, hierarchical DNNs serve as an accident risk classifier to estimate the risk of each cell in a grid where an accident is predicted. The proposed system can effectively reduce the number of instances with no traffic accidents. Furthermore, we introduce the concept of the Accident Risk Index (ARI) to better represent the severity of risk at each cell. To improve the prediction accuracy, we further take into consideration all the explanatory variables that can be related to traffic accidents, such as dangerous driving behaviors, traffic mobility, and safety facility information. In the experiment, we highlight the benefits of our method for urban traffic accident management by significantly improving model performance compared to the baselines. The feasibility and applicability of the proposed system are validated on data from Daejeon City, Republic of Korea. The proposed prediction system can dynamically advise commuters, traffic management systems, and city planners on alternatives, optimizations, and interventions.

Keywords: traffic safety; traffic accident prediction system; accident severity estimation; deep learning

1. Introduction

With the rapid growth of urbanization, urban road safety has emerged as one of the most pressing social issues around the world [1,2]. According to the Organization for Economic Co-operation and Development (OECD) statistics, the road safety situation has exhibited a negative trend worldwide over the last two decades [3]. Road traffic accidents cause significant losses in both human life and the national economy [4,5]. In 2013, the World Health Organization (WHO) reported between 3.6 and 18.8 deaths per 100,000 individuals from vehicle crashes in China [6]. Likewise, in the Republic of Korea, approximately 200 thousand accidents and 7.3 casualties per 100,000 people were recorded in 2018, and these figures have continued to increase in recent years [7]. Therefore, the ability to understand and forecast potential accidents (e.g., where, when, or how) is very useful not only to public safety stakeholders (e.g., police), but also to transportation administrators and individual travelers. More specifically, accurate traffic accident prediction can dynamically advise commuters, traffic management systems, and city planners on alternatives, optimizations, and interventions. By helping road users avoid high-risk areas based on our predictions, we aim to reduce the number of accidents, moving from mere prediction to active prevention.

The severity of traffic accidents is influenced by various factors, including human factors and road environmental factors [8,9,10]. Extensive research has been conducted to identify the significant factors that contribute to the severity of traffic accident injuries [11,12,13,14,15,16]. For example, the authors in [12] proposed a framework for analyzing and predicting the injury severity of traffic accidents, considering factors such as road types, weather conditions, and lighting. They utilized a stacked sparse autoencoder to incorporate comprehensive factors in the analysis of traffic accidents. Similarly, the authors in [13] examined factors such as fatigue, gender, and internal/external distractions (e.g., rushing to a destination, listening to music) and assessed their impact on perceived and observed aggressive driving behaviors through surveys and simulations. Such analyses provide insights for appropriate responses, including the enactment of laws, infrastructure repairs, and the deployment of additional speed cameras. In this research, we also comprehensively address the various factors that contribute to traffic accidents and the severity of risks.

Although it is essential to determine the factors that influence risk, proactive actions to prevent traffic accidents should also be taken ahead of time. In recent years, deep learning methods have gained popularity as powerful techniques for extracting information from big data and have demonstrated their efficiency in several applications, typically in prediction tasks [17,18,19,20,21]. Accordingly, much research has been conducted on forecasting road traffic accidents and predicting injury severity in urban areas using numerous types of data [22,23,24,25,26,27,28,29,30]. For example, the authors in [22] proposed a traffic accident casualty prediction model using neural networks and data mining techniques. Specifically, they used historical data such as the floating population, number of registered cars, and number of accidents to predict the casualties of traffic accidents. Similarly, the authors in [23] presented a spatio-temporal deep learning model to predict citywide short-term crash risk using multiple data sources such as land use, weather, and crash risks. The authors in [24] also proposed a traffic accident count prediction model using a Bayesian hierarchical approach. The proposed model can rank the candidate sites, called hotspots, according to their potential risks for some future time period, and further provide simple diagnostics to validate its predictive capability. In [26], the authors introduced an end-to-end deep learning model that integrates satellite imagery, GPS trajectories, road maps, and accident histories to predict traffic accidents. The authors in [27] constructed a Long Short-Term Memory (LSTM) network-based model to predict the probability of traffic accidents based on spatio-temporal patterns of traffic accident frequency. The authors in [28] applied logistic regression analysis to 400 sets of accident data from 10 major roads in Beijing to identify significant factors influencing traffic accidents and to develop an accident hotspot prediction model. The authors in [30] predicted traffic risks as well as traffic speed and flow, demonstrating the broad potential of deep learning algorithms applied to mobility data such as traffic data from infrastructure, trajectory data from vehicles, and data from automatic fare collection devices widely deployed by urban transit systems.

In urban traffic accident prediction studies, one of the most crucial challenges is the sparsity of accident data, making it difficult to develop accurate prediction models [31,32]. Even though the global trend of urban traffic accidents tends to increase, traffic accidents are rare and infrequent events. Consequently, datasets for model training usually comprise a great portion of zero values, representing the non-occurrence of accidents. These data imbalances will cause a huge bias in model training and consequently affect the overall prediction performance [33,34]. Specifically, while such datasets might cause models to obtain high prediction accuracy, they also hide significant deficiencies in the actual prediction ability of the model. This imbalance not only misrepresents the efficiency of the model but also undermines its potential utility in real-world applications. Therefore, it is necessary for the prediction model to cope with such an imbalanced data problem to develop a high-performing model.

To address these challenges, this study introduces a novel system leveraging deep learning-based methods to predict urban traffic accidents and identify their severity, particularly under the imbalanced data environment. The proposed system not only forecasts the occurrence of urban traffic accidents but also estimates their severity as risk levels. In this study, we divided the whole target area into $100 \times 100$ cells to serve as the fundamental units, which is beneficial to reduce the excessive zero values in the input and enhance the predictability of the occurrence of traffic accidents. To further reduce the effect of zero values, we aggregated cells into grids and used the power of Convolutional Neural Networks (CNN) to filter the non-accident grids. The CNN module, renowned for its spatial feature extraction capabilities, discerns intricate spatial correlations and dependencies within the urban grid. This spatial understanding is pivotal, given the heterogeneous distribution of traffic accidents. Subsequent to the non-accident cell filtering, the DNN module, which shows high performance in data classification, was utilized to evaluate the risk severity of each cell. By combining these architectures, we aim to harness both the spatial understanding of CNNs and the deep feature interpretation capabilities of DNNs to develop a high-performing prediction system. To improve the model performance, we further utilize large-scale datasets from a variety of sources in the urban area including mobility data (e.g., digital tachograph (DTG)-based risky driving behaviors, traffic flow and speed, etc.), and road environment data (e.g., information related to safety facilities, road infrastructure, geometry, etc.).

The proposed deep learning-based system has two main objectives. First, it aims to predict the occurrence of traffic accidents accurately, especially forecasting potential accident “hotspots” in urban environments. To achieve this, we introduce a grid-clustered feature map using concepts of grids and cells to deal with the data imbalance problem in training datasets. Through the feature map, we mitigate the bias problem, especially the high frequency of zero values in training datasets. This approach can effectively capture the different characteristics of urban areas and improve the model performance. In addition, we leverage various types of data, including traffic accident, urban mobility, and road safety facility data, to enhance our model’s performance. While our primary emphasis remains on prediction capabilities, the potential applications within traffic recommendation systems are too significant to overlook. Second, the proposed system estimates the severity of risks using the Accident Risk Index (ARI), which is based on accident attributes such as the number of fatalities, serious injuries, and minor injuries. The ARI is used to categorize the risk levels associated with a given set of data, ranging from level 0 (no accidents) to levels 1–4, representing varying degrees of accident severity. In addition to classifying risk levels based on actual traffic safety data, the proposed ARI also gives road users a more intuitive and straightforward view of traffic safety conditions.

This paper is organized as follows. Section 2 presents the data used in this paper. Section 2.1 introduces the concept of cells to represent the urban area. Section 2.2 presents the urban road traffic accident data and the accident risk index. Section 2.3 and Section 2.4 describe the urban mobility data and the road safety facility information, respectively. In Section 3, we introduce the methodology of this study. In Section 3.1, the overall system architecture is introduced. Section 3.2 presents the method to predict traffic accidents using an urban grid-clustered feature map. Section 3.3 presents how to estimate the risk level in each cell. Section 4 describes the results of the paper. In Section 4.1, the experimental design is introduced. Section 4.2 describes the experiment results, and a related discussion is presented in Section 4.3. Finally, in Section 5, conclusions and future works are presented.

2. Data Description

This section explains how we handle and preprocess the various data for model training. To predict urban traffic accidents, we handle a variety of datasets, including traffic accidents, urban mobility, and road safety facility data. To be more specific, the Korean National Police Agency [35] released the statistics of Korean traffic accidents with severity information (e.g., death, serious injury, or slight injury). In addition, commercial vehicles (such as buses and taxis) registered with transportation corporations are required to be equipped with an Onboard Unit (OBU). This equipment enables the acquisition of the drivers’ Digital Tachograph (DTG) data, including trip, location, and speed, in real time. The top 11 dangerous driving behaviors, such as sudden start, sudden turn, and overspeed, have been identified by the Korea Transportation Safety Authority using the DTG data [36]. On the other hand, local governments are in charge of maintaining data on road geometry, such as the number of speed cameras, road signs, and school zone management. Some of them also run probe vehicles with OBUs deployed for the Cooperative Intelligent Transportation System (C-ITS). The proposed system is designed to predict traffic accidents and their severity by leveraging the data dispersed across these departments. Figure 1 depicts the overall data preprocessing process.

2.1. Cell Representation of Urban Area for Accident Analysis

In addressing the task of traffic accident prediction, the selection of an appropriate scale is vital to both the precision and computational efficiency of the training model. In this study, we adopt a cell-based approach, where the entire geographical scope is divided into a matrix of $N_{Grid} \times N_{Grid}$ cells, to serve as the units of analysis. This choice of scale offers several strategic advantages.

First, a much finer scale approach, such as a link-level analysis (the term link-level refers to a single road segment), although capable of providing detailed insights, comes with a significantly higher computational cost. Moreover, at the city scale, the link-based strategy may not be the most appropriate method of analysis, given that traffic accidents tend to concentrate in specific areas rather than being evenly distributed across the entire road network. Consequently, utilizing link-level data could potentially generate a vast amount of non-accident data points, resulting in an imbalanced dataset that might affect the predictive ability and reliability of the model. In contrast, the cell-based approach facilitates a more focused and efficient analysis by aggregating traffic data at the cell level. This not only improves the computational efficiency but also promotes a more balanced dataset. The choice of cell size, however, involves a trade-off. If the cell is too large, we reduce the number of cells with very little or no accident data and increase the computational efficiency, but we may lose spatial patterns and anomalies, since critical localized events or conditions might be averaged out. Smaller cells, on the other hand, can capture very localized patterns and provide high-resolution predictions, but they increase data sparsity, with many cells potentially having zero or near-zero traffic accident events. This sparsity can also harm predictability, possibly leading the model to overfit noise or specific anomalies.

Therefore, in light of the above considerations, this research leverages a cell-based approach with an appropriate cell size to address the complexities associated with traffic accident prediction at the city scale. The cell representation of the study area is shown in Figure 2. In this experiment, we discretized the study area into 100 by 100 cells ($N_{Grid} = 100$). The width and height of each cell are approximately 230 m. The features of the data used in the study are shown in Table 1. We assign these values to the respective cells. If there are several values in one cell, they are added together to represent the characteristics of the cell. For example, the total number of traffic accidents in a cell is determined by summing all the accidents that occurred in that specific cell.
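The per-cell aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow: it assumes records are already in a planar coordinate system measured in metres, and the origin (`x_min`, `y_min`), the exact 230 m cell size, and the function names are hypothetical choices made for the example.

```python
from collections import defaultdict

N_GRID = 100  # cells per side, as stated in the text


def cell_index(x, y, x_min, y_min, cell_size):
    """Map a planar coordinate (metres) to a (row, col) cell index.

    Returns None when the point falls outside the N_GRID x N_GRID study area.
    """
    col = int((x - x_min) // cell_size)
    row = int((y - y_min) // cell_size)
    if 0 <= row < N_GRID and 0 <= col < N_GRID:
        return row, col
    return None


def aggregate_to_cells(records, x_min=0.0, y_min=0.0, cell_size=230.0):
    """Sum the values of all records that fall into the same cell.

    records: iterable of (x, y, value) tuples, e.g. one tuple per accident
    with value 1 to obtain accident counts per cell.
    """
    totals = defaultdict(float)
    for x, y, value in records:
        idx = cell_index(x, y, x_min, y_min, cell_size)
        if idx is not None:
            totals[idx] += value
    return dict(totals)


# Hypothetical records: three accidents inside the area, one outside it.
records = [(10.0, 20.0, 1), (100.0, 200.0, 1), (250.0, 20.0, 1), (-5.0, 0.0, 1)]
cell_totals = aggregate_to_cells(records)
```

The same routine can be reused for any additive feature (dangerous driving events, facility counts), since the text states that multiple values in one cell are simply added together.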

2.2. Urban Road Traffic Accident and Accident Risk Index (ARI)

This subsection describes the measure we use to represent the severity of urban traffic accidents, which we refer to as the Accident Risk Index (ARI). The Korean National Police Agency keeps track of every road traffic accident and analyzes it with a range of criteria such as driver age, gender, and driving condition. Taking into account the statistics, we analyzed road traffic accidents in Daejeon City over the whole of 2019 (from 1 January to 31 December). In 2019, there were 8337 accidents in Daejeon City.

The number of accidents can be counted in each cell created using the Geographical Information System (GIS) from the previous step, and the associated spatial distribution of the traffic accidents is shown in Figure 3a. The figure shows that most traffic accidents happen in urban areas. Even though we can clearly identify the number of accidents on the map, it is difficult to clarify the accident risk on each cell. In other words, the total number of accidents in the cell might not be an appropriate measurement, since it does not reflect the severity of each accident and traffic volumes in certain cells. Therefore, we define the new measurement ARI to precisely measure the accident risk of the cells. The related expression is shown as follows:

$$ARI = \frac{w_{1} \cdot DEATH + w_{2} \cdot SERI + w_{3} \cdot SLTWD}{\bar{V}} \tag{1}$$

where $DEATH$, $SERI$, and $SLTWD$ represent the number of deaths, serious injuries, and slight injuries, respectively. We used a weighted sum so that each type of casualty contributes differently to the severity of a cell. According to the standards from the Korea Transportation Safety Authority [36], the appropriate values are $w_{1} = 1$, $w_{2} = 0.7$, and $w_{3} = 0.3$. Additionally, the volume of traffic at each cell ($\bar{V}$) is used to normalize the weighted sum. The GIS-based distribution of ARI values is shown in Figure 3b. Compared to Figure 3a, this figure better depicts the accident risk in the target area. To be more specific, in Figure 3a, traffic accidents are depicted as dispersed dots, which can make it challenging to identify consistent patterns or hotspots, especially when some accidents are rare occurrences. This scattered representation might not be as informative for decision-makers or urban planners aiming to prioritize areas for interventions. On the other hand, the ARI map aggregates this information, reducing data sparsity and providing a clearer depiction of accident-prone regions or “hotspots”. Therefore, the ARI map can offer a more focused and actionable insight into areas that consistently show higher accident rates.
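As a worked illustration of Equation (1), the following sketch computes the ARI of a single cell with the weights given in the text. The function name and the handling of cells with zero traffic volume are assumptions made for this example; the text does not say how such cells are treated.

```python
# Weights from the text: deaths, serious injuries, slight injuries.
W_DEATH, W_SERIOUS, W_SLIGHT = 1.0, 0.7, 0.3


def accident_risk_index(deaths, serious, slight, mean_volume):
    """Weighted casualty sum normalized by the cell's traffic volume (Eq. 1).

    Assumption: a cell with no recorded traffic volume gets ARI = 0.0 to
    avoid division by zero; the paper does not specify this case.
    """
    if mean_volume <= 0:
        return 0.0
    weighted = W_DEATH * deaths + W_SERIOUS * serious + W_SLIGHT * slight
    return weighted / mean_volume


# Hypothetical cell: 1 death, 2 serious and 3 slight injuries, volume 1000.
ari = accident_risk_index(deaths=1, serious=2, slight=3, mean_volume=1000.0)
```

For this hypothetical cell the weighted sum is 1 + 1.4 + 0.9 = 3.3, so the ARI is about 0.0033. A cell with the same casualties but ten times the traffic would score one tenth of that, which is the normalization effect the text describes.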

2.3. Urban Mobility Data

This subsection describes the urban mobility data used in our experiment. We utilized basic traffic information, Digital Tachograph (DTG) data, and the top 11 dangerous driving behaviors exhibited by both commercial and general probe vehicles in this study. Specifically, the average speed and traffic volume are basic traffic information. All commercial and probe vehicles are equipped with onboard units (OBUs) for the purpose of gathering DTG data, which includes details about the trip duration, location, and speed. The top 11 dangerous driving behaviors can be determined from the DTG data based on specific rules. We hypothesize that urban mobility data, especially information on dangerous driving behaviors, is a contributing component influencing urban traffic accidents. Consequently, we incorporated these variables as features in our model to forecast accidents and assess the risk severity estimation. The DTG data includes GPS details logged every second, allowing us to graphically represent vehicle pathways, as depicted in Figure 4.

The top 11 dangerous driving behaviors include overspeed, overspeed time, sudden acceleration, sudden deceleration, sudden start, sudden stop, sudden left turn, sudden right turn, sudden U-turn, sudden overtaking, and sudden lane change. Figure 5 represents the process of extracting the top 11 dangerous driving behaviors from DTG data. These behaviors are visualized in Figure 6, illustrating the aggregation of all dangerous driving behaviors occurring in each cell. Similar to the previous figures, dangerous driving behaviors are predominantly concentrated in the urban area.

2.4. Road Safety Facility Information

In this subsection, we describe the utilization of road safety facility information in our study, a necessity arising from the inadequacy of warning signs in the target area [8]. The dataset includes a variety of safety facilities and road information, such as the location of traffic signals, controllers, various categories of warning signs, and CCTV installations. Furthermore, the dataset integrates land use information such as residential, commercial, industrial, and green zones. Similar to the previous processing procedure, we categorize each type of facility and undertake a quantitative analysis within individual cells.

3. Methodology

3.1. System Architecture

In this study, the primary goal is to develop a system that can predict urban traffic accidents and identify the severity at the cell level. Specifically, it is beneficial to choose the cell as a basic unit due to enhanced predictability. As illustrated in Figure 7, the proposed system consists of two primary components: an accident detector and an accident risk classifier. The accident detector is mainly designed to assess the likelihood that traffic accidents occur. It takes as input the transformed traffic accident history, urban mobility data, and road safety facility information, which are described in detail in the previous section. The accident risk classifier is responsible for categorizing risk levels based on the predicted Accident Risk Index (ARI). With risk levels ranging from level 0 (no accidents) through levels 1–4, which represent increasing degrees of accident severity, the ARI is employed as a measure of the severity of traffic accidents. The accident risk classifier employs ARI to categorize the risk levels associated with the given input data.

3.2. Predicting Traffic Accident Using Urban Grid Clustered Feature Map

The main goal of the proposed accident detector model is to predict the occurrence of accidents and filter out the non-accident area. The proposed model leverages a CNN-based model paired with a multi-layered feed-forward neural network (FFNN), symbolized as $C_{m}(\cdot)$ and $f(\cdot)$, respectively. The related model structure is shown in the upper part of Figure 7. In previous studies, CNN has demonstrated remarkable efficacy in extracting features within the realm of computer vision [37,38,39,40]. Therefore, we applied the CNN-based model to extract features from the input data consisting of traffic accident history, urban mobility, and road safety facility data.

However, employing a conventional CNN-based model in traffic accident analysis presents two critical challenges. The first is the prevalence of zero values. If we predict traffic accidents directly from the input data, the large number of zero values might have a negative impact on the model performance. In other words, even when we utilize aggregated data at the cell level as our input data, there is still a high proportion of zero values, which correspond to non-accident cells. The second is the potential loss of local knowledge, because CNNs process input data from start to finish without taking specific local characteristics into account. Given that the proposed model targets urban areas, it is critical to take these factors into account during training.

To overcome these challenges, the accident detector adopts a new approach called a grid-clustered feature map. Similar to the previous approach described in Section 2, the urban area, $Z^{t}$, was divided into $N \times N$ grids, denoted $z_{1}, z_{2}, \dots, z_{N \times N}$, and each grid $z_{i}$ consisted of $n \times n$ clustered cells at time $t$. This approach can improve CNN model training effectiveness and further reduce the issue of data imbalance. In this study, the overall urban area was divided into 10 × 10 (=100) grids, and the time unit is a day. Each grid $z_{i}$ consists of 10 × 10 (=100) cells. The grid serves as the spatial unit for accident prediction, and all the features of the cells that make up the grid are utilized to define the grid’s features. This allows the model to capture specific regional characteristics and reduce the effect of training data that has an excessive number of zero values. The proposed model’s outputs can be successfully used to create the risk level estimation model.
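The clustering of cells into grids can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the authors' implementation: the row-major grid numbering, the single-channel accident map, and the function names are hypothetical, and a real feature map would carry one channel per feature.

```python
N_GRIDS = 10  # grids per side (10 x 10 = 100 grids)
N_CELLS = 10  # cells per grid side (10 x 10 = 100 cells per grid)


def cluster_cells_into_grids(cell_map):
    """Split a 100 x 100 cell map into 100 blocks of 10 x 10 cells.

    cell_map: nested list indexed as cell_map[row][col].
    Returns {grid_id: block}, with grid_id assigned in row-major order
    (an assumption made for this sketch).
    """
    side = N_GRIDS * N_CELLS
    if len(cell_map) != side or any(len(row) != side for row in cell_map):
        raise ValueError("cell_map must be 100 x 100")
    grids = {}
    for g_row in range(N_GRIDS):
        rows = cell_map[g_row * N_CELLS:(g_row + 1) * N_CELLS]
        for g_col in range(N_GRIDS):
            block = [r[g_col * N_CELLS:(g_col + 1) * N_CELLS] for r in rows]
            grids[g_row * N_GRIDS + g_col] = block
    return grids


def grid_labels(accident_grids):
    """Binary label per grid: 1 if any of its cells recorded an accident."""
    return {
        gid: int(any(v > 0 for row in block for v in row))
        for gid, block in accident_grids.items()
    }


# Hypothetical daily accident map with accidents in two cells.
accidents = [[0] * 100 for _ in range(100)]
accidents[5][17] = 2
accidents[99][99] = 1
grids = cluster_cells_into_grids(accidents)
labels = grid_labels(grids)
```

In this toy map only 2 of 10,000 cells are positive, whereas 2 of 100 grids are positive, which illustrates how labeling at the grid level reduces the share of zero-valued targets.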

The output of the accident detector provides a binary value in each grid, with 0 representing no accidents and 1 indicating at least one accident. If there are no accidents, the grid is rated as risk level 0. On the other hand, if the model predicts accidents, the specific risk level for the cells of that grid is determined by the accident risk classifier. The overall process can be written as follows:

$$\begin{aligned} Z^{t} &= \{ z_{i,t} \mid i = 1, 2, \dots, 10 \times 10,\ t = 1, 2, \dots, T \} \\ z_{i} &= \{ c_{1}^{i,t}, c_{2}^{i,t}, \dots, c_{10 \times 10}^{i,t} \} \\ c_{j}^{i,t} &= \{ X_{urban}^{i,t}, X_{road}^{i,t} \} \\ L(z_{i,t}) &= y_{grid}^{i,t} \in \{0, 1\} \end{aligned} \tag{2}$$

where $z_{i,t}$ represents the $i$th grid at time $t$. Each $z_{i,t}$ consists of 100 cells $c_{j}^{i,t}$. In addition, each $c_{j}^{i,t}$ includes the feature sets of urban mobility data and road safety facility information, denoted $X_{urban}^{i,t}$ and $X_{road}^{i,t}$, respectively. The ARI from traffic accident occurrence in the $i$th grid at time $t$ is denoted $L(z_{i,t})$ or $y_{grid}^{i,t}$.

3.3. Risk Level Estimation in Each Cell

Based on the severity of accidents, the proposed system provides classified risk levels for each cell. If the accident detector does not predict any traffic accidents in a certain grid, the cells in that grid are categorized as risk level 0. On the other hand, if it predicts an accident, we estimate the risk level of each cell in the grid from 1 to 4, which is defined by the quantiles of ARI values. In contrast to the accident detection model, the risk classifier divides each grid into cells and uses the features in each cell as input data to calculate the severity of the risk level.

The accident risk classifier is a predictive model comprising three deep neural networks (DNNs), each designed to estimate specific levels of accident risk. The classification process proceeds in a sequential manner. Initially, the first DNN evaluates whether an input cell belongs to risk level 1 by outputting a binary value, 0 or 1. If the output is 1, the instance is categorized under risk level 1. However, if the output is 0, the instance is passed on to the second DNN. This second DNN, in a similar fashion, determines whether the instance fits into risk level 2. If not, the data point proceeds to the final DNN, which then differentiates between risk levels 3 and 4. This hierarchical structure ensures a detailed and sequential classification of accident risk levels, ranging from 1 to 4. The overall process is depicted in the lower part of Figure 7.
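The sequential decision logic of the three-stage classifier can be sketched as follows. The three predictors here are simple threshold functions standing in for the trained DNNs, and the feature name `score`, the thresholds, and the convention that the third predictor outputs 1 for level 3 are all hypothetical; only the cascade order comes from the text.

```python
def classify_risk_level(features, is_level1, is_level2, is_level3):
    """Run the cascade: each predictor is a callable returning 0 or 1.

    A cell is assigned to the first level whose predictor outputs 1.
    Assumption: the final predictor outputs 1 for level 3 and 0 for level 4.
    """
    if is_level1(features) == 1:
        return 1
    if is_level2(features) == 1:
        return 2
    return 3 if is_level3(features) == 1 else 4


# Stand-in predictors (hypothetical thresholds on a hypothetical score).
def is_l1(x):
    return int(x["score"] < 0.25)


def is_l2(x):
    return int(x["score"] < 0.50)


def is_l3(x):
    return int(x["score"] < 0.75)


levels = [
    classify_risk_level({"score": s}, is_l1, is_l2, is_l3)
    for s in (0.1, 0.3, 0.6, 0.9)
]
```

One consequence of this design is that each later stage only sees instances the earlier stages rejected, so every binary model is trained and applied on a progressively narrower subset of cells.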

In summary, we first propose the accident detector to predict whether or not an accident occurs within a specific grid unit, employing a binary classification system. Once an accident is detected, the procedure transitions to the second step, leveraging the results from the previous step. This subsequent step engages the accident risk classifier, a series of three deep neural networks (DNNs), to further estimate the risk level of the accident within the basic unit cell. This classifier assigns a risk level ranging from 1 to 4, thus providing a detailed gradation of the potential danger associated with an identified accident, facilitating informed and precise responses.

$$[Z_{i,t}, c_{j}^{i,t}, L(z_{i,t})] = \begin{cases} [Z_{i,t}, \{X_{urban}^{i,t}, X_{road}^{i,t}\}, L(z_{i,t})|_{\in \{0, 1\}}] & \text{(in traffic accident prediction)} \\ [\{c_{1}^{i,t}, c_{2}^{i,t}, \dots, c_{10 \times 10}^{i,t}\}, \{X_{urban}^{i,t}, X_{road}^{i,t}\}, L(z_{i,t})|_{\in \{1, 2, 3, 4\}}] & \text{(in risk level estimation)} \end{cases} \tag{3}$$

Table 2 presents a comprehensive overview of the model architectures and hyperparameters employed for our proposed models. For the Accident Detector, the model architecture consists of three convolutional neural network (CNN) layers and two fully connected (FC) layers. The Accident Risk Identifier integrates a deep neural network (DNN) configuration with three distinct layers. Both models consistently implement a uniform dropout ratio of 0.2 to avoid overfitting. The optimization strategy employs the Adam optimizer, set with a learning rate of 0.001. In addition, binary cross entropy was selected as the loss function.

We employed a hyperparameter tuning process to identify the optimal settings that yielded the best performance for our model. Specifically, we conducted and iterated a series of experiments to find the most suitable values for each hyperparameter. The values listed in Table 2 represent the configurations that maximized our model’s predictive accuracy and minimized the loss during training.

4. Result

4.1. Experimental Design

This subsection describes the experimental design to evaluate the proposed system, which uses a variety of data sources, including traffic accident data, urban mobility data, and road safety facility information, to predict urban traffic accidents and estimate risk levels. The main objective of this experiment is to find optimal models for predicting traffic accidents and estimating risk levels in urban areas. We used data from every day of 2019 to predict traffic accidents and estimate daily risk levels. The basic unit of the time t is a day. Specifically, our dataset covers 365 days, with each day divided into 10,000 spatial cells, resulting in a total of 3,650,000 data points. We split the data into a training set and a test set with a ratio of 0.8 to 0.2. To ensure that the model is tested on unseen days, we split the dataset by day: 80% of the days (292 days) were used for training and the remaining 20% (73 days) for testing.
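A day-level split of this kind can be sketched as follows. The text does not state whether the test days were chosen at random or chronologically, so the seeded random shuffle below is an assumption, as are the function name and the seed.

```python
import random

CELLS_PER_DAY = 100 * 100  # 10,000 cells per day, as stated in the text


def split_days(n_days=365, train_ratio=0.8, seed=0):
    """Split day indices so that no day appears in both sets.

    Assumption: days are assigned at random with a fixed seed; a
    chronological split would be an equally plausible reading.
    """
    days = list(range(n_days))
    random.Random(seed).shuffle(days)
    n_train = round(n_days * train_ratio)
    return sorted(days[:n_train]), sorted(days[n_train:])


train_days, test_days = split_days()
n_train_points = len(train_days) * CELLS_PER_DAY
n_test_points = len(test_days) * CELLS_PER_DAY
```

Splitting by day rather than by individual cell-day record keeps all 10,000 cells of a given day on the same side of the split, which is what allows the model to be evaluated on days it has never seen.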

First, we evaluated the performance of the accident detector for urban traffic accident prediction. We compared the performance of the proposed model with other baseline models, including Support Vector Machine (SVM), Linear Regression (LR), Naïve Bayes Classification (NBC), and Multi-Layer Perceptron (MLP). The performance of the accident risk classifier was then assessed using the results obtained from the previous step. In this experiment, we classified the risk level on a scale of 0 to 4, using SVM and MLP as the baseline models.

The baseline models are described below.

Support Vector Machine (SVM) [41]: SVM is a supervised learning algorithm that aims to find the optimal hyperplane that best separates the data into classes. The method shows effectiveness in high-dimensional spaces;
Linear Regression (LR) [42]: LR is a linear approach to modeling the relationship between a dependent variable and one or more independent variables. It predicts the output based on the linear relationship with the input features;
Naïve Bayes Classification (NBC) [43]: NBC is a probabilistic classifier based on applying Bayes’ theorem with the “naïve” assumption of conditional independence between every pair of features;
Multi-Layer Perceptron (MLP) [44]: MLP is a class of feedforward artificial neural networks that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. It is known for its ability to capture non-linear relationships of input and output data.

To treat risk estimation as a classification problem, we converted the numerical ARI values into categorical risk levels from 0 to 4. The ARI values are grouped into quantiles, with risk level 0 corresponding to the lowest quantile (numerical value 0.0) and risk level 4 to the highest quantile.
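One possible reading of this binning is sketched below. The exact cut points are not given in the text, so the rule used here is an assumption: cells with an ARI of 0.0 form level 0, and the remaining positive values are cut into four equal-frequency bins for levels 1 to 4. The function name `ari_to_risk_levels` is hypothetical.

```python
def ari_to_risk_levels(ari_values, n_positive_levels=4):
    """Map numerical ARI values to integer risk levels 0..n_positive_levels.

    Assumption (not stated in the paper): ARI == 0.0 maps to level 0 and
    positive values are split into equal-frequency bins 1..n_positive_levels.
    """
    positives = sorted(v for v in ari_values if v > 0.0)
    if not positives:
        return [0 for _ in ari_values]
    # Lower edges of bins 2..n, taken from the sorted positive values.
    edges = [
        positives[(len(positives) * k) // n_positive_levels]
        for k in range(1, n_positive_levels)
    ]
    levels = []
    for v in ari_values:
        if v <= 0.0:
            levels.append(0)
        else:
            # Level 1 plus one for every bin edge the value reaches.
            levels.append(1 + sum(v >= e for e in edges))
    return levels


example = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(ari_to_risk_levels(example))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```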

In this study, we used accuracy, precision, recall, and F1 score as evaluation metrics. In binary classification, the F1 score, which is calculated as the harmonic mean of recall and precision, is a widely used statistical measurement. The related expression is shown below.

F_{score} = \frac{2}{\frac{1}{recall} + \frac{1}{precision}} = 2 \times \frac{precision \times recall}{precision + recall} \qquad (4)
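As a small worked illustration of Equation (4), the sketch below computes the four metrics from binary labels. The helper names and the toy labels are illustrative assumptions; the precision and recall in the final check (0.85 and 0.63) are the values reported for the proposed accident detector in Table 3.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, as in Equation (4)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


# Toy imbalanced sample: 10 accident grids out of 100, only 2 detected.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 2 + [0] * 8 + [0] * 90
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(acc, prec, rec, round(f1, 2))      # 0.92 1.0 0.2 0.33

# Precision and recall reported for the proposed detector in Table 3.
print(round(f1_score(0.85, 0.63), 2))    # 0.72
```

The toy sample shows why accuracy alone is a weak criterion on imbalanced accident data: a model that misses most accidents still reaches 0.92 accuracy, while its F1 score drops to about 0.33.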

4.2. Results of Traffic Accident Prediction and Risk Level Estimation

In this subsection, we present the results of two experiments conducted to evaluate the performance of our proposed system. The first experiment validates the performance of accident prediction against the baseline models. In this task, the target unit to predict is the grid z_{i}, which is one of the main contributions of this study. The training set consists of about 29,200 grids (365 days × 0.8 × (10 × 10) grids), and the test set has 7,300 grids (365 days × 0.2 × (10 × 10) grids) in the traffic accident prediction task. The results of this experiment are presented in Table 3. The empirical results reveal that the proposed model outperforms the comparative models on every metric. Although the baseline models reach high accuracy (0.88 to 0.90), their recall (0.15 to 0.32) and F1 scores (0.25 to 0.42) are low; the proposed accident detector, by contrast, achieves an accuracy of 0.94, a recall of 0.63, and an F1 score of 0.72. One key insight from these findings is that CNN models are particularly adept at processing grid-structured input. Moreover, the proposed model can capture the features of neighboring areas, which contributes to its enhanced predictive accuracy.
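The aggregation of cells into grids can be illustrated with a short sketch. It assumes that the 10,000 daily cells form a 100 × 100 lattice, that each grid covers 10 × 10 cells, and that a grid is labeled as an accident grid when any of its cells records an accident; the function name and the toy day are hypothetical.

```python
def cells_to_grid_labels(cell_counts, grid_size=10):
    """Aggregate a square lattice of per-cell accident counts into grid labels.

    cell_counts: square 2D list (assumed 100 x 100) for one day.
    Returns a 2D list of 0/1 labels, one per grid of grid_size x grid_size cells.
    Assumption: a grid is positive if at least one of its cells has an accident.
    """
    n = len(cell_counts)
    n_grids = n // grid_size
    labels = [[0] * n_grids for _ in range(n_grids)]
    for r in range(n):
        for c in range(n):
            if cell_counts[r][c] > 0:
                labels[r // grid_size][c // grid_size] = 1
    return labels


# Toy day with accidents in two cells.
day = [[0] * 100 for _ in range(100)]
day[3][7] = 1
day[55][42] = 2
labels = cells_to_grid_labels(day)

n_grids_per_day = sum(len(row) for row in labels)
n_accident_grids = sum(sum(row) for row in labels)
print(n_grids_per_day, n_accident_grids)  # 100 2
```

In this toy day, only 2 of 10,000 cells are non-zero, whereas 2 of 100 grids are positive, which is the sense in which grid-level targets are less dominated by zero values.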

In the next step, we estimate the risk levels exclusively for the cells in the accident-occurrence grids obtained from the previous step. We again divide the data with a ratio of 0.8 to 0.2 for model training and testing. Table 4 shows the results of risk level estimation in each cell against two comparative models. We first measure performance using accuracy. With accuracies of 0.34 (SVM) and 0.38 (MLP), the two baselines are ill-suited to the risk-level estimation task, whereas the proposed accident risk classifier reaches an accuracy of 0.82. We further evaluate the proposed model with the other metrics; the results are shown in Table 5. The proposed model is stable across them, with precision, recall, and F1 score all at 0.82.

These results indicate that the proposed system is effective in predicting traffic accidents and estimating risk levels. Its proficiency in analyzing grid-structured inputs and incorporating neighboring features enhances its predictive accuracy. In addition, the hierarchical accident risk classifier is beneficial in multi-level risk classification tasks. The proposed system can therefore be applied in the domain of traffic safety and management.

4.3. Discussion

The proposed system for predicting traffic accidents, estimating risk levels, and identifying risk sources is designed to provide valuable insights and tools for enhancing road safety. The system utilizes a variety of data sources, including traffic accident data, urban mobility data, and road safety facility information, to analyze and predict accident occurrences and assess risk levels. Although there are a variety of studies on predicting and analyzing traffic accidents and their severity, to the best of our knowledge, this work is a pioneering attempt to combine a large volume of data from various sources and to implement an actual system using deep learning-based models. Furthermore, using cells as the fundamental units instead of individual links can enhance the predictability of traffic accidents by reducing the number of zero values. To further reduce the number of zero values, we aggregated cells into grids and used a CNN to filter out the non-accident grids.

Through a series of experiments, the system’s effectiveness is evaluated and compared with baseline models. The first experiment focuses on predicting traffic accidents using grid feature maps and compares different models, including SVM, LR, NBC, and MLP. The basic unit in the experiment is the grid z_{i}. The results show that the proposed model utilizing grid feature maps outperforms the other models, indicating the effectiveness of incorporating grid-represented input data in the proposed model.

The second experiment focuses on estimating risk levels for each cell included in a grid z_{i} that has accidents. In this experiment, since our goal is to compare the hierarchical approach with direct classification, we use two classification models as baselines: SVM and MLP. These models were chosen for their widespread adoption and recognized performance in multi-class classification scenarios. The results show that our proposed model achieves high accuracy and performs consistently on the other metrics. These results suggest that hierarchical DNNs can simplify multi-class problems and improve overall performance.
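The contrast between a hierarchical approach and direct multi-class classification can be illustrated with a generic cascade of binary decisions. This is only a sketch of the general idea: how the authors chain their DNNs is not specified in this section, so the ordinal cascade, the threshold of 0.5, and the stand-in scoring functions below are all assumptions rather than the proposed architecture.

```python
from typing import Callable, List, Sequence

# A binary model maps a feature vector to the probability of the positive class.
BinaryModel = Callable[[Sequence[float]], float]


def cascade_predict(features: Sequence[float],
                    stages: List[BinaryModel],
                    threshold: float = 0.5) -> int:
    """Ordinal cascade: stage k answers "is the risk level greater than k?".

    Returns the index of the first stage that answers no, or len(stages)
    if every stage answers yes. Each stage solves only a binary problem.
    """
    for level, model in enumerate(stages):
        if model(features) < threshold:
            return level
    return len(stages)


def make_stub(cut: float) -> BinaryModel:
    """Hypothetical stand-in for a trained binary DNN (thresholds feature 0)."""
    return lambda x: 1.0 if x[0] > cut else 0.0


# Four binary stages yield five risk levels (0 to 4) in this toy setup.
stages = [make_stub(c) for c in (0.1, 0.3, 0.5, 0.7)]
print(cascade_predict([0.0], stages))  # 0
print(cascade_predict([0.4], stages))  # 2
print(cascade_predict([0.9], stages))  # 4
```

In such a design, each stage can be trained with a binary loss on its own subset of the data, instead of asking a single model to separate all five levels at once.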

Several challenges remain. The first concerns extending the proposed model to other cities, where the main difficulty is choosing the size of the cell and grid. While the initial focus has been on Daejeon City, other areas may need different standards, with cells and grids that are larger or smaller than those used for the target area. In addition, the proposed methodology is specifically focused on traffic accident prediction; other domains with different data characteristics might need to adjust the methodology or might not observe the same efficiency. The assumptions we made and the models we used are based on the nature and distribution of traffic accidents, so for domains where the underlying patterns, distributions, or influencing factors differ significantly, these assumptions and models might not hold. It is also necessary to collect data over longer time periods to further capture the time-dependent characteristics of accident data, since the current data size is limited. Moreover, the computational requirements of our approach, especially the handling and processing of spatial data, might not be suitable for real-time or resource-constrained applications. We also faced constraints in gathering comprehensive data on geography and weather conditions; therefore, the Accident Risk Index (ARI) calculation in this experiment did not directly incorporate these factors. Finally, we will consider employing feature selection techniques to further refine our model, optimizing the inclusion of relevant predictors and potentially enhancing overall predictive performance.

Overall, the proposed system offers a comprehensive approach to analyzing and addressing road safety issues. By integrating various data sources and advanced deep learning techniques, it shows the potential to be used for tasks ranging from accident prediction to risk assessment. The system’s outcomes can advance the field of road safety by informing decision-making processes, prioritizing interventions, and supporting effective measures to improve road safety in urban areas. In addition, the proposed system can help commuters, traffic management systems, and city planners make safer and better-optimized decisions.

5. Conclusions

This study introduces a comprehensive system to predict traffic accidents and estimate risk levels in urban areas. The system takes into account various data sources, including traffic accident data, urban mobility data, and road safety facility information. It also exploits deep learning methods, which are efficient at extracting valuable information from big data. The core ideas of the proposed system are to use the grid-represented map as input for a CNN-based accident detector and to use hierarchical DNNs to estimate multiple levels of risk. Specifically, the grid representation of the input effectively reduces the number of zero values in the input data and is well suited to CNN-based model training. In addition, the hierarchical DNNs reduce the complexity of the multi-class classification task and improve overall performance. We also propose the Accident Risk Index (ARI) to clearly measure the severity of risk at each cell.

In our experiments, we evaluate the performance of each component of the proposed system. The system outperforms other models in predicting traffic accidents and demonstrates high accuracy and effectiveness in risk estimation, especially with the multiple binary classification approach. Furthermore, we validate the feasibility and applicability of the proposed system by applying it to actual data from Daejeon City, Republic of Korea. The proposed system can provide valuable insights into the risk distribution across the urban area and facilitate targeted interventions.

Overall, the proposed system offers a novel and comprehensive approach to enhancing road safety in urban areas. It not only serves as a predictive tool but can also be adapted into a recommendation system that assists urban planners and authorities in implementing preventative measures efficiently. By integrating diverse data sources and utilizing advanced modeling techniques, the proposed system can facilitate the identification of high-risk zones and suggest targeted interventions based on analyzed patterns and trends. This makes it a useful asset for decision-makers and stakeholders who prioritize and implement strategies that focus on preventing accidents and reducing their severity when they occur. Furthermore, the system can recommend improvements in infrastructure and changes in traffic regulations, guided by insights drawn from real-time and historical data. These insights, consequently, support the creation of safer urban environments, guiding immediate responses and aiding the planning and development of long-term road safety strategies. The findings of this research contribute to the field of road safety, supporting further advances in accident prediction, risk assessment, and the formulation of more informed, data-driven road safety strategies. In addition, the proposed system can potentially be connected with real-world applications such as navigation and traffic management systems to actively recommend safer routes to road users, thus serving a preventive role.

Author Contributions

Conceptualization by B.N.; methodology by Z.J. and B.N.; formal analysis by Z.J.; data curation by Z.J. and B.N.; writing of the original draft by B.N.; review and editing by Z.J.; supervision by B.N.; funding acquisition by B.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Soonchunhyang University Research Fund.

Data Availability Statement

Data sharing is not applicable.

Acknowledgments

We thank the Transport Safety Authority in South Korea for providing the DTG data.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Cheng, Z.; Zu, Z.; Lu, J. Traffic Crash Evolution Characteristic Analysis and Spatiotemporal Hotspot Identification of Urban Road Intersections. Sustainability 2019, 11, 160.
[2] Yeo, J.; Lee, J.; Cho, J.; Kim, D.K.; Jang, K. Effects of speed humps on vehicle speed and pedestrian crashes in South Korea. J. Saf. Res. 2020, 75, 78–86.
[3] Janstrup, K.H. Road Safety Annual Report 2017; Technical University of Denmark: Lyngby, Denmark, 2017.
[4] Demasi, F.; Loprencipe, G.; Moretti, L. Road safety analysis of urban roads: Case study of an Italian municipality. Safety 2018, 4, 58.
[5] Goniewicz, K.; Goniewicz, M.; Pawłowski, W.; Fiedor, P. Road accident rates: Strategies and programmes for improving road traffic safety. Eur. J. Trauma Emerg. Surg. 2016, 42, 433–438.
[6] World Health Organization. Global Status Report on Road Safety 2015; World Health Organization: Geneva, Switzerland, 2015.
[7] Korea Index. KOREA INDEX. 2022. Available online: https://www.index.go.kr/potal/main/EachDtlPageDetail.do?idx_cd=1614&param=003 (accessed on 29 January 2022).
[8] Kopelias, P.; Papadimitriou, F.; Papandreou, K.; Prevedouros, P. Urban freeway crash analysis: Geometric, operational, and weather effects on crash number and severity. Transp. Res. Rec. 2007, 2015, 123–131.
[9] De Oña, J.; Mujalli, R.O.; Calvo, F.J. Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks. Accid. Anal. Prev. 2011, 43, 402–411.
[10] Chang, L.Y.; Wang, H.W. Analysis of traffic injury severity: An application of non-parametric classification tree techniques. Accid. Anal. Prev. 2006, 38, 1019–1027.
[11] Zheng, L.; Sayed, T.; Mannering, F. Modeling traffic conflicts for use in road safety analysis: A review of analytic methods and future directions. Anal. Methods Accid. Res. 2021, 29, 100142.
[12] Ma, Z.; Mei, G.; Cuomo, S. An analytic framework using deep learning for prediction of traffic accident injury severity based on contributing factors. Accid. Anal. Prev. 2021, 160, 106322.
[13] Fountas, G.; Pantangi, S.S.; Hulme, K.F.; Anastasopoulos, P.C. The effects of driver fatigue, gender, and distracted driving on perceived and observed aggressive driving behavior: A correlated grouped random parameters bivariate probit approach. Anal. Methods Accid. Res. 2019, 22, 100091.
[14] Hou, Q.; Huo, X.; Leng, J.; Mannering, F. A note on out-of-sample prediction, marginal effects computations, and temporal testing with random parameters crash-injury severity models. Anal. Methods Accid. Res. 2022, 33, 100191.
[15] Arun, A.; Haque, M.M.; Washington, S.; Sayed, T.; Mannering, F. A systematic review of traffic conflict-based safety measures with a focus on application context. Anal. Methods Accid. Res. 2021, 32, 100185.
[16] Alnawmasi, N.; Mannering, F. The impact of higher speed limits on the frequency and severity of freeway crashes: Accounting for temporal shifts and unobserved heterogeneity. Anal. Methods Accid. Res. 2022, 34, 100205.
[17] Jin, Z.; Kim, J.; Yeo, H.; Choi, S. Transformer-based map-matching model with limited labeled data using transfer-learning approach. Transp. Res. Part C Emerg. Technol. 2022, 140, 103668.
[18] Jin, Z.; Noh, B.; Cho, H.; Yeo, H. Deep Learning-based Approach on Risk Estimation of Urban Traffic Accidents. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1446–1451.
[19] Choi, S.; Kim, J.; Yeo, H. Attention-based recurrent neural network for urban vehicle trajectory prediction. Procedia Comput. Sci. 2019, 151, 327–334.
[20] Choi, S.; Kim, J.; Yeo, H. TrajGAIL: Generating urban vehicle trajectories using generative adversarial imitation learning. Transp. Res. Part C Emerg. Technol. 2021, 128, 103091.
[21] Noh, B.; Yeo, H. A novel method of predictive collision risk area estimation for proactive pedestrian accident prevention system in urban surveillance infrastructure. Transp. Res. Part C Emerg. Technol. 2022, 137, 103570.
[22] Ali, G.A.; Tayfour, A. Characteristics and prediction of traffic accident casualties in Sudan using statistical modeling and artificial neural networks. Int. J. Transp. Sci. Technol. 2012, 1, 305–317.
[23] Bao, J.; Liu, P.; Ukkusuri, S.V. A spatiotemporal deep learning approach for citywide short-term crash risk prediction with multi-source data. Accid. Anal. Prev. 2019, 122, 239–254.
[24] Fawcett, L.; Thorpe, N.; Matthews, J.; Kremer, K. A novel Bayesian hierarchical model for road safety hotspot prediction. Accid. Anal. Prev. 2017, 99, 262–271.
[25] Zhang, Y.; Cheng, T. Graph deep learning model for network-based predictive hotspot mapping of sparse spatio-temporal events. Comput. Environ. Urban Syst. 2020, 79, 101403.
[26] He, S.; Sadeghi, M.A.; Chawla, S.; Alizadeh, M.; Balakrishnan, H.; Madden, S. Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 11977–11985.
[27] Ren, H.; Song, Y.; Wang, J.; Hu, Y.; Lei, J. A deep learning approach to the citywide traffic accident risk prediction. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3346–3351.
[28] Lu, T.; Dunyao, Z.; Lixin, Y.; Pan, Z. The traffic accident hotspot prediction: Based on the logistic regression method. In Proceedings of the 2015 International Conference on Transportation Information and Safety (ICTIS), Wuhan, China, 25–28 June 2015; pp. 107–110.
[29] Al-Dogom, D.; Aburaed, N.; Al-Saad, M.; Almansoori, S. Spatio-temporal analysis and machine learning for traffic accidents prediction. In Proceedings of the 2019 2nd International Conference on Signal Processing and Information Security (ICSPIS), Dubai, United Arab Emirates, 30–31 October 2019; pp. 1–4.
[30] Liu, Z.; Li, Z.; Wu, K.; Li, M. Urban traffic prediction from mobility data using deep learning. IEEE Netw. 2018, 32, 40–46.
[31] Park, S.H.; Ha, Y.G. Large imbalance data classification based on mapreduce for traffic accident prediction. In Proceedings of the 2014 8th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, Birmingham, UK, 2–4 July 2014; pp. 45–49.
[32] Park, S.H.; Kim, S.M.; Ha, Y.G. Highway traffic accident prediction using VDS big data analysis. J. Supercomput. 2016, 72, 2815–2831.
[33] Akbani, R.; Kwek, S.; Japkowicz, N. Applying support vector machines to imbalanced datasets. In Proceedings of the Machine Learning: ECML 2004: 15th European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; Proceedings 15. Springer: Berlin/Heidelberg, Germany, 2004; pp. 39–50.
[34] Núñez, H.; Gonzalez-Abril, L.; Angulo, C. Improving SVM classification on imbalanced datasets by introducing a new bias. J. Classif. 2017, 34, 427–443.
[35] Korean National Police Agency. Available online: https://www.police.go.kr/eng/main.do (accessed on 27 April 2023).
[36] Korea Transportation Safety Authority. Available online: https://www.kotsa.or.kr/eng/engMain.do (accessed on 27 April 2023).
[37] Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
[38] Zoumpourlis, G.; Doumanoglou, A.; Vretos, N.; Daras, P. Non-linear convolution filters for CNN-based learning. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4761–4769.
[39] Zhang, K.; Zuo, W.; Gu, S.; Zhang, L. Learning deep CNN denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3929–3938.
[40] Sharif Razavian, A.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 806–813.
[41] Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Berlin/Heidelberg, Germany, 2016; pp. 207–235.
[42] Weisberg, S. Applied Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 528.
[43] Rosen, G.L.; Reichenberger, E.R.; Rosenfeld, A.M. NBC: The Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 2011, 27, 127–129.
[44] Taud, H.; Mas, J. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Berlin/Heidelberg, Germany, 2018; pp. 451–455.

Figure 1. Overall data preprocessing for model training.

Figure 2. Cell representation of Daejeon city.

Figure 3. Number of accidents and related ARI values in Daejeon City, 2019. (a) Number of accidents; (b) ARI values.

Figure 4. Illustration of the vehicle trajectory collected from the DTG data at different regions.

Figure 5. Strategy for extracting top 11 dangerous driving behaviors from DTG data (Brown line is the actual road).

Figure 6. Visualization for top 11 dangerous driving behaviors in Daejeon City, 2019.

Figure 7. Overall architecture of the proposed system.

Table 1. Description of feature set.

Dataset	Acronym	Feature Description (Abbreviation)
Road traffic accident data	ACC_YMD	Accident year and month
	ORG_CD	Organization code in charge
	ACC_TME	Accident time
	ACC_TYP_CD	Accident type code: (vehicle–pedestrian) on crossing; (vehicle–pedestrian) passing through the edge of road; (vehicle–vehicle) head-on collision; (vehicle–vehicle) collision on parking; (vehicle) off the road
	GPS_X, GPS_Y	X and Y-coordinates in GPS
	WEA_STA_CD	Weather status code: clear; rain
	DEATH_CNT	Number of deaths
	SERI_CNT	Number of seriously injured
	SLTWD_CNT	Number of slightly wounded
	WND_CNT	Number of casualties
	DMG_AMT	Damage amount
	NODE_ID	Node ID
	LINK_ID	Link ID
Basic traffic information	SPD	Average speed
	VOL	Average traffic volume
	LINK_ID	Link ID
	NODE_ID	Node ID
Digital tachometer graph (DTG)	TRIP_KEY	Trip key
	CAR_SPEED	Vehicle’s speed
	RPM	Vehicle’s revolutions per minute
	BRK_SGN	Brake sign; 0 or 1
	GPS_X, GPS_Y	X and Y-coordinates in GPS
	DGREE	Vehicle’s heading angle
	ACC_VX, ACC_VY	Vehicle’s X and Y-accelerations
Top 11 dangerous driving behaviors	OBU_ID	On-board unit ID
	OVERSPD_IS	Number of overspeed events when driving 20 km/h over the road speed limit
	OVERSPD_TM	Number of instances of keeping overspeed for more than 3 min and exceeding the road speed limit by 20 km/h
	SDN_ACCEL_IS	Number of sudden accelerations when accelerating about 5.0–8.0 km/h per second at a speed above 6.0 km/h
	SDN_DECEL_IS	Number of sudden decelerations when decelerating about 5.0–8.0 km/h per second at a speed above 6.0 km/h
	SDN_START_IS	Number of sudden starts when accelerating 8.0–10.0 km/h per second at a speed under 5.0 km/h
	SDN_STOP_IS	Number of sudden stops when reaching a speed under 5.0 km/h by decelerating 8.0–14.0 km/h per second
	SDN_LTURN_IS	Number of sudden left turns when reaching a cumulative direction angle of 60–120°
	SDN_RTURN_IS	Number of sudden right turns when reaching a cumulative direction angle of 60–120°
	SDN_UTURN_IS	Number of sudden U-turns when reaching a cumulative direction angle of 60–120°
	SDN_OVERTKG_IS	Number of sudden overtakes when driving 20 km/h over the road speed limit
	SDN_COURSE_CHG_IS	Number of sudden course changes when driving 20 km/h over the road speed limit

Table 2. Hyperparameters of the proposed models.

Model	Layer	Value
Accident Detector	CNN1	256 × 256
	CNN2	256 × 256
	CNN3	256 × 256
	FC1	1000 × 1000
	FC2	500 × 500
Accident Risk Identifier	DNN1	256 × 128 × 64
	DNN2	256 × 128 × 64 × 32
	DNN3	128 × 64
Uniform Dropout Ratio		0.2
Optimizer		Adam
Learning Rate		0.001
Loss		BCELoss

Table 3. Comparison of accuracy, precision, recall, and F1 score between the proposed accident detector and baselines on the Daejeon dataset.

Model	Accuracy	Precision	Recall	F1 Score
SVM	0.90	0.67	0.29	0.41
LR	0.88	0.44	0.29	0.35
NBC	0.90	0.58	0.32	0.42
MLP	0.90	0.82	0.15	0.25
Proposed accident detector	0.94	0.85	0.63	0.72

Table 4. Comparison of accuracy between the proposed accident risk classifier and baselines on the Daejeon dataset.

Model	Accuracy
SVM	0.34
MLP (Multi-class classification)	0.38
Proposed accident risk classifier	0.82

Table 5. Accuracy, precision, recall, and F1 score of the proposed accident risk classifier on the Daejeon dataset.

Model	Accuracy	Precision	Recall	F1 Score
Proposed accident risk classifier	0.82	0.82	0.82	0.82

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
