Evaluation of Contributing Factors Affecting Number of Vehicles Involved in Crashes Using Machine Learning Techniques in Rural Roads of Cosenza, Italy

Guido, Giuseppe; Shaffiee Haghshenas, Sina; Shaffiee Haghshenas, Sami; Vitale, Alessandro; Astarita, Vittorio; Park, Yongjin; Geem, Zong Woo

doi:10.3390/safety8020028


by Giuseppe Guido ¹, Sina Shaffiee Haghshenas ¹, Sami Shaffiee Haghshenas ¹, Alessandro Vitale ¹, Vittorio Astarita ¹,*, Yongjin Park ²,* and Zong Woo Geem ³

¹ Department of Civil Engineering, University of Calabria, Via Bucci, 87036 Rende, Italy

² Department of Transportation Engineering, Keimyung University, Daegu 42601, Korea

³ College of IT Convergence, Gachon University, Seongnam 13120, Korea

* Authors to whom correspondence should be addressed.

Safety 2022, 8(2), 28; https://doi.org/10.3390/safety8020028

Submission received: 17 December 2021 / Revised: 27 March 2022 / Accepted: 30 March 2022 / Published: 8 April 2022


Abstract:

Evaluating road safety is essential for successful safety management in road transport systems, yet safety management remains a challenging task because of the dynamic nature of the problem and the large number of parameters that influence road safety. The evaluation and analysis of the contributing factors affecting the number of vehicles involved in crashes therefore play a key role in improving road safety. For this purpose, two machine learning algorithms are employed in this work: the group method of data handling (GMDH)-type neural network and a combination of the support vector machine (SVM) and the grasshopper optimization algorithm (GOA). The number of vehicles involved in an accident is taken as the output, and seven factors affecting transport safety on the rural roads of Cosenza, southern Italy, are selected as the inputs: Daylight (DL), Weekday (W), Type of accident (TA), Location (L), Speed limit (SL), Average speed (AS), and Annual average daily traffic (AADT). In this study, 564 data sets from rural areas were investigated, and the relevant parameters were measured. Several models were then developed to investigate the parameters affecting the safety management of road transportation in rural areas. The results demonstrated that “Type of accident” has the highest importance and “Location” the lowest in the investigated rural area. Finally, although both algorithms led to the same results, the GOA-SVM model showed better accuracy and robustness than the GMDH model.

Keywords:

road safety; safety management; road transportation; GMDH; GOA-SVM; machine learning

1. Introduction

Road transport is one of the oldest methods of transporting goods and individuals. With the increase in the population and the development of towns, the expansion of road transport has become an inevitable issue that plays a significant role in increasing and improving economic and social development. Numerous issues affect the quality and quantity of road transport, and safety management is one of the most significant subjects. Therefore, identifying, studying, and evaluating the contributing factors affecting road safety is inevitable to increase the level of safety management. Several parameters affect road safety, and valuable studies have been conducted to investigate road safety, such as Driver Behavior [1,2,3,4,5,6,7], Age of Driver and Vehicle [8,9,10,11,12,13], Weather Conditions [14,15,16,17,18], Road Geometry [19,20,21,22,23], and Lighting Conditions [24,25,26,27,28,29]. Siliquini et al. (2010) investigated the link between driving performance and the use of psychoactive drugs in teen drivers who were enrolled at common consuming locations. For this purpose, they designed and implemented the TEND by Night project in several European countries. Based on their results, the key findings were the change in response time between entering and exiting the recreation site, as well as its relationship with psychoactive drug usage [30]. Alonso et al. (2017) studied a relatively large statistical community of drivers to explore the behavioral and representational characteristics of drivers that modulate the smoking–accidents relationship. They discovered that, despite being aware of the consequences of smoking while driving, few drivers thought it was a risky behavior. Finally, they suggested various recommendations based on their findings, such as increasing awareness and control of this habit [31]. Zoe et al. (2018) carried out an efficient analysis of the expansion trend in road safety research between 2000 and 2018. 
Their results indicated that road safety research focused on five major areas, including accident frequency data investigation, driver behavior questionnaire, safety in numbers for walkers and bicyclists, injury and prevention of road traffic, and driving speed and road accidents [32].

Moreover, other comprehensive studies have been performed to examine the role of other factors on road safety. Gichaga (2017) reviewed the historical and cultural backgrounds involving road development and road safety features in Kenya. Based on his results, he made some recommendations for improvements in aspects of road safety [33]. Elvik et al. (2019) conducted a review of the relationship between speed and road safety. They supported two mathematical models. Their results showed that the speed of individual drivers has a similar relationship with safety as the mean speed of traffic [34]. In another study, the effect of air quality on road safety was evaluated by Sager (2019). He investigated the impact of increased air pollution on the amount of road traffic crashes. Then, he found out that there is a relationship between the number of accidents and the amount of PM2.5 [35].

On the other hand, several techniques exist to evaluate the parameters affecting road safety. Traditionally, classical before–after research works, statistical modeling, and personal judgment-based approaches are applied to chronological data [36]. Zheng et al. (2018) investigated the proposal of a bivariate extreme value model. Their proposed model could reduce the uncertainty in crash estimates [37]. Moreover, according to a bivariate extreme value theory (EVT) framework, Wang et al. (2019) introduced an accident forecast method. Their results prove that the proposed crash prediction method can provide very promising results compared to univariate models [38]. In another research work, Zheng et al. (2020) carried out a comprehensive review of research based on traffic conflicts in the road safety analysis. Then, they discussed conceptual and methodological matters related to traffic conflict modeling. Based on their results, it has been determined that although suitable research studies have been performed on this issue, the need for more studies is necessary [39]. An overview of road traffic accidents was conducted in the Eastern Province, KSA, from 2009 to 2016 by Jamal et al. (2020). They developed logistic regression models to forecast crash severity. Finally, they recommended some suggestions to prevent road accidents [40].

Because the parameters affecting road safety are uncertain and subject to unforeseen conditions, and because machine learning algorithms are well suited to complex problems, these methods have recently been widely used, either in combination with classical methods or, in many studies, alone [41,42,43,44,45,46,47,48,49]. Xu et al. (2018) evaluated the effect of road lighting on road safety using an artificial neural network. The results clearly showed that road lighting greatly contributed to road safety levels [50]. In another study, Amiri et al. (2020) applied two types of artificial intelligence methods to estimate the severity of crashes, using crash information from California in 2012. Their results indicated that both artificial intelligence models were reliable for predicting the severity of crashes and that the light condition played an essential role in the severity level compared to the other contributing factors [51]. Guido et al. (2020) investigated some potential factors for road safety using two artificial intelligence methods. Their results indicated that the PSO algorithm performed better than the GA algorithm for evaluating factors in road accidents [52]. Shiran et al. (2021) carried out crash severity analysis by applying data mining approaches and multinomial logistic regression to an accident dataset of State Highways in California, USA. They found that the C5.0 model performed better in crash severity analysis than the other models [53]. A brief review of the literature is shown in Table 1.

The literature shows that, although valuable studies have been carried out, more research is needed to improve the quality of road transport as it continues to expand. Therefore, in this study, two machine learning algorithms are applied: the group method of data handling (GMDH)-type neural network and a combination of the support vector machine (SVM) and the grasshopper optimization algorithm (GOA). The results of the two algorithms are then compared using performance indicators to determine how well the models perform under the conditions and characteristics of the case study. Finally, a sensitivity analysis is conducted to prioritize the contributing factors affecting the number of vehicles involved in crashes in rural areas.

2. Site Description and Accident Monitoring

To evaluate the performance and usefulness of the proposed approaches for prioritizing the contributing factors affecting the number of vehicles involved in crashes in a rural area, a sample of 564 accident records, collected from 2017 to 2018 on the rural road network of Cosenza province, was acquired and analyzed (Figure 1). This period was chosen because it offered the most comprehensive data sets available for modeling.

The accident data has been acquired from the Automobile Club Italia (ACI) database, which collaborates with the National Institute of Statistics (ISTAT) in collecting road accident data in Italy. This database provides a lot of information about accidents, such as date and place, road category, pavement conditions, weather conditions, crash dynamics, the type of vehicle involved, the causes of the accident, and the consequences for the people involved (injuries or deaths). The information does not include Property Damage Only (PDO) events under existing Italian legislation, defining road accidents as accidents only when they cause at least one injury [54].

According to ISTAT, accidents are classified as fatal or with injuries; therefore, it is not possible to distinguish the injured according to the level of severity. For this reason, in this paper, the number of vehicles involved in an accident, as referred to in Section 4, is expressed as a dichotomous variable (0, 1):

- When only one vehicle is involved in an accident, label 0 is assigned.
- When more than one vehicle is involved in an accident, label 1 is assigned.

Other information on roads and traffic flows was considered to ensure a more detailed analysis, such as speed limits, average speed, and average annual daily traffic (AADT). The speed limits are regulated by the “Nuovo Codice della Strada” [55,56], and their values have been acquired from the database of Azienda Nazionale Autonoma delle Strade (ANAS). The average speed was obtained through the available data of the historical traffic statistics of TomTom (TomTom Move) and Octo Telematics (Octo IoT Cloud), referring to the road sections with the observed accidents. The AADT was acquired from the PANAMA system of ANAS (ANAS Platform for Monitoring and Analyzing).

This study considers seven independent variables (i.e., the factors affecting the number and severity of accidents), which were selected based on all the available data for the case study. These variables include four qualitative variables (Daylight (DL), Weekday (W), Type of accident (TA), and Location (L)) and three quantitative variables (Speed limit (SL), Average speed (AS), and Annual Average Daily Traffic (AADT)).

Table 2 shows the variables mentioned above, which are classified into various categories (codes) with their statements. The method of grouping variables into different categories depends on their characteristics and the number of observations involved in each study.

3. Methodology

The study of road safety is one of the inseparable issues of transportation engineering, which is usually defined by the absence of accidents and casualties. Lack of attention to road safety could impose irreparable financial and physical damages. Therefore, it is necessary to have a deep understanding of road safety, know all the effective components, and estimate the impact of each on this issue. Studying the literature reveals that most investigations in the field of road safety are based on logit models or regression models with artificial neural networks. On the one hand, the complexity and uncertainty of the factors affecting road safety, and on the other hand, the ability of machine learning algorithms to predict and navigate in the face of unexpected and uncertain issues, has resulted in the successful application of machine learning methods in road safety in recent years [57,58,59,60,61,62,63].

Accordingly, the main aim of this study is to investigate the factors affecting road safety in rural areas by using two machine learning approaches, namely the GMDH model and the hybrid GOA-SVM model. For this purpose, the GMDH model was developed, and its control parameters were tuned to obtain the best binary classification model. Moreover, the SVM was hybridized with the grasshopper optimization algorithm, a suitable evolutionary algorithm, which was used to optimize three parameters of the SVM. Finally, a sensitivity analysis was performed to evaluate and rank the factors affecting the number of vehicles involved in an accident in rural areas. The machine learning models are discussed in more detail in the following sections.

3.1. Group Method of Data Handling-Type Neural Network

In today’s science, artificial neural networks (ANNs), one of the branches of artificial intelligence, play a valuable role in the development of new technologies, and they are used to solve complex problems in many fields of science [64,65,66,67,68,69,70,71]. The group method of data handling (GMDH) is a type of artificial neural network that was introduced by Ivakhnenko (1971). The GMDH has been successfully used for computer-based mathematical modeling of complex systems, as well as for data mining, optimization, and pattern recognition problems [72]. The GMDH process resembles a self-organizing network [73,74]. The mapping between the input and output variables in a GMDH neural network is a nonlinear function. The Polynomial Neural Network (PNN) is one of the basic algorithms used to construct GMDH models. In GMDH modeling, input data are entered into the initial layer and, after processing, the outputs of that layer serve as the input for the second layer; this process continues layer by layer until the algorithm converges and stops. To decide on convergence, the results of layer (n + 1) are compared with those of layer (n): the network keeps growing while the new layer improves the results and stops once it no longer does. Equations (1) and (2) indicate the relationship between the approximate function \hat{f} and the predicted output \hat{y} for a multi-input, single-output dataset, together with the requirement of the least possible error between actual and predicted values [73,74].

\hat{y}_{i} = \hat{f} (x_{i 1}, x_{i 2}, x_{i 3}, \dots, x_{i m}), \quad i = 1, 2, 3, \dots, M

(1)

\sum_{i = 1}^{M} \left[ \hat{f} (x_{i 1}, x_{i 2}, x_{i 3}, \dots, x_{i m}) - y_{i} \right]^{2} \rightarrow \min

(2)

Equation (3) indicates a relationship between a single output (y) and a multi-input (Input vector, X = (x₁, x₂, x₃, …, x_m)) according to the Kolmogorov–Gabor polynomial [75,76].

y = a_{0} + \sum_{i = 1}^{m} a_{i} x_{i} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} a_{i j} x_{i} x_{j} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} \sum_{k = 1}^{m} a_{i j k} x_{i} x_{j} x_{k} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} \sum_{k = 1}^{m} \sum_{l = 1}^{m} a_{i j k l} x_{i} x_{j} x_{k} x_{l} + \dots

(3)

where a_{0}, a_{i}, a_{i j}, a_{i j k}, a_{i j k l}, \dots are the coefficients of the polynomial, and m is the number of input variables. Furthermore, for two inputs, Equation (3) can be truncated to the quadratic polynomial in Equation (4) [77].

\hat{y} = G (x_{i}, x_{j}) = a_{0} + a_{1} x_{i} + a_{2} x_{j} + a_{3} x_{i}^{2} + a_{4} x_{j}^{2} + a_{5} x_{i} x_{j}

(4)

The total error (E), i.e., the mean squared difference between the actual output y and the predicted output \hat{y} = G (x_{i}, x_{j}), is minimized for each pair of input variables x_{i} and x_{j} according to Equation (5). This minimization determines the optimal coefficients of each quadratic function [78].

E = \frac{\sum_{i = 1}^{M} (y_{i} - G_{i} (x_{i}, x_{j}))^{2}}{M} \rightarrow \min

(5)

In the elementary form of the GMDH algorithm, all possible pairs of two independent variables out of the total of m input variables are taken to build regression polynomials of the form of Equation (4) [79]. Therefore, \binom{m}{2} = \frac{m (m - 1)}{2} neurons are built in the first layer of the feedforward neural network from the observations \{ (y_{i}, x_{i p}, x_{i q}); \; i = 1, 2, 3, \dots, M \} for different p, q \in \{ 1, 2, 3, \dots, m \}; for instance, m = 7 inputs give 21 candidate pairs. Equation (4) can be written in matrix form as Equation (6), which is the main form of the GMDH [80].

Y = A a

(6)

Here, Y = \{ y_{1}, y_{2}, y_{3}, \dots, y_{M} \}^{T} and a = \{ a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5} \}^{T} are the observed output vector and the coefficient vector of the quadratic polynomial, respectively. A is computed based on Equation (7), with its columns ordered to match the coefficients in Equation (4) [81,82].

A = \begin{bmatrix} 1 & x_{1 p} & x_{1 q} & x_{1 p}^{2} & x_{1 q}^{2} & x_{1 p} x_{1 q} \\ 1 & x_{2 p} & x_{2 q} & x_{2 p}^{2} & x_{2 q}^{2} & x_{2 p} x_{2 q} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{M p} & x_{M q} & x_{M p}^{2} & x_{M q}^{2} & x_{M p} x_{M q} \end{bmatrix}

(7)

Finally, using the least-squares process from multiple regression analysis, a normal equation is achieved based on Equation (8), which calculates the vector of the best coefficients for Equation (4) [83].

a = (A^{T} A)^{- 1} A^{T} Y

(8)
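As an illustration of Equations (4), (7), and (8), the following minimal Python sketch fits the six coefficients of a single quadratic neuron by solving the normal equations. It is not the authors' MATLAB implementation: the function names (`design_row`, `solve_linear`, `fit_neuron`, `predict`) and the synthetic data are assumptions made for this example, and the columns of A follow the coefficient order of Equation (4).

```python
def design_row(xp, xq):
    """One row of A (Equation (7)), ordered like a_0..a_5 in Equation (4)."""
    return [1.0, xp, xq, xp * xp, xq * xq, xp * xq]


def solve_linear(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_neuron(xp_values, xq_values, y_values):
    """Return a = (A^T A)^{-1} A^T Y (Equation (8)) via the normal equations."""
    rows = [design_row(p, q) for p, q in zip(xp_values, xq_values)]
    k = 6
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, y_values)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(coeffs, xp, xq):
    """Evaluate the quadratic polynomial G(x_p, x_q) of Equation (4)."""
    return sum(a * t for a, t in zip(coeffs, design_row(xp, xq)))


# Synthetic, noise-free data generated from known (hypothetical) coefficients.
true_a = [1.0, 2.0, -1.0, 0.5, 0.25, -0.75]
grid = [(p, q) for p in (-2.0, -1.0, 0.0, 1.0, 2.0) for q in (-1.5, -0.5, 0.5, 1.5)]
xp_values = [p for p, _ in grid]
xq_values = [q for _, q in grid]
y_values = [predict(true_a, p, q) for p, q in grid]

coeffs = fit_neuron(xp_values, xq_values, y_values)
# Mean squared error of Equation (5) on the fitted data.
mse = sum((y - predict(coeffs, p, q)) ** 2 for (p, q), y in zip(grid, y_values)) / len(grid)
print(coeffs, mse)
```

Because the synthetic targets come from a known quadratic, the fitted coefficients should reproduce `true_a` up to rounding. In practice, a least-squares routine based on a QR or SVD factorization is usually preferred over forming A^T A explicitly, which can be ill-conditioned.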

3.2. Support Vector Machine

A support vector machine (SVM) is an effective machine learning method that was introduced by Cortes and Vapnik (1995) [84]. SVMs are supervised learning algorithms used in a wide range of modeling tasks, such as regression and classification. The SVM is a linear two-class classifier that aims to maximize the margin between the two classes, so that the classification hyperplane lies in the center of the maximum margin. Many separating hyperplanes may exist, and the goal of the support vector machine is to find the best one in n-dimensional space. Two labels are considered for this classification: label +1 is assigned to cases that lie above the hyperplane, and label −1 to cases that lie below it. Equation (9) shows the sample set used as training data for classification [85,86].

S = \{ (x_{i}, y_{i}) \mid x_{i} \in R^{N}, \; y_{i} \in \{ - 1, 1 \}, \; i = 1, 2, \dots, n \}

(9)

where y_{i} is the target variable (the sample category) of the i-th sample, and x_{i} is the i-th sample data. Among all separating hyperplanes, the one with the largest margin is called the optimal hyperplane. It is determined by the support vectors and by the optimization problem stated in Equations (10) and (11), which give the objective and the constraint, respectively [87,88].

\min \frac{1}{2} \| w \|^{2}

(10)

\text{s.t.} \quad y_{i} (w \cdot x_{i} + b) \geq 1

(11)

where w and b are the weight vector and the bias term, respectively. Then, by introducing slack variables ε_{i} that measure the classification error of each sample, the problem is rewritten according to Equations (12) and (13). The slack variables allow some samples to violate the margin, so that a classifier can still be found when the two classes are not perfectly separable [88].

\min \frac{1}{2} \| w \|^{2} + c \sum_{i = 1}^{n} ε_{i} \quad (ε_{i} \geq 0)

(12)

\text{s.t.} \quad \begin{cases} y_{i} (w \cdot x_{i} + b) \geq 1 - ε_{i} \\ c \geq 0 \end{cases} \quad (i = 1, 2, 3, \dots, n)

(13)

where c is the penalty coefficient. Then, using the Lagrange method, SVM classification problems are considered as the following dual optimization problem based on Equation (14) [88].

\begin{cases} W (a) = \sum_{i = 1}^{n} a_{i} - \frac{1}{2} \sum_{i, j = 1}^{n} a_{i} a_{j} y_{i} y_{j} K (x_{i}, x_{j}) \\ \text{s.t.} \quad \sum_{i = 1}^{n} a_{i} y_{i} = 0 \quad (0 \leq a_{i} \leq c; \; i = 1, 2, 3, \dots, n) \end{cases}

(14)

where K is the kernel function. There are different types of kernel functions, including linear (LIN), radial basis function (RBF), and polynomial (POL), and their expressions are shown in Table 3. The parameters gamma (γ) and d are needed to define the kernel types: γ is used for the RBF and POL kernels, and d is the polynomial degree, used only for the POL kernel [89,90]. The main role of the kernel function is to map the input dataset into the form required for separation. Choosing a kernel function suited to the problem at hand can affect the quality of the classification.
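The three kernel types named above can be sketched in a few lines of Python. Since Table 3 is not reproduced in this section, the expressions below are the commonly used textbook forms and should be read as an assumption rather than as the exact entries of Table 3; in particular, the offset `coef0` in the polynomial kernel and the function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    """LIN: plain inner product of the two sample vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma):
    """RBF: exp(-gamma * ||x - z||^2); gamma controls the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, z, gamma, d, coef0=1.0):
    """POL: (gamma * <x, z> + coef0)^d; d is the polynomial degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** d


# Two hypothetical, already normalized samples.
x = [1.0, 2.0]
z = [3.0, 4.0]
print(linear_kernel(x, z))
print(rbf_kernel(x, z, gamma=0.5))
print(polynomial_kernel(x, z, gamma=0.5, d=2))
```

Only the RBF and polynomial kernels take γ, and only the polynomial kernel takes d, which matches the roles of the two parameters described in the text.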

3.3. Grasshopper Optimization Algorithm

In recent years, meta-heuristic algorithms have played a very important role in dealing with complex and uncertain problems [91,92,93,94,95,96]. These algorithms have made significant progress in both academia and industry. The grasshopper optimization algorithm (GOA) is one of the newer meta-heuristic algorithms and was presented by Saremi et al. (2017) [97]. It is a population-based algorithm built on swarm intelligence and inspired by the group behavior of grasshoppers. Grasshoppers are usually seen individually or in groups. One of the essential features of a grasshopper swarm is its type of movement, which is slow and proceeds in small steps. In the GOA, the search is divided into two phases: exploration and exploitation. In exploration, search agents are encouraged to move abruptly, whereas in exploitation they tend to move locally. The swarming behavior of grasshoppers is simulated by the mathematical model in Equation (15) [97,98].

X_{i} = S_{i} + G_{i} + A_{i}

(15)

where X_{i} represents the position of the i-th grasshopper, and S_{i}, G_{i}, and A_{i} are the social interaction, the gravity force on the i-th grasshopper, and the wind advection, respectively. To represent the random behavior of grasshoppers, Equation (15) is rewritten as Equation (16) with the random factors r₁, r₂, and r₃, which take values within [0, 1] [97,99].

X_{i} = r_{1} S_{i} + r_{2} G_{i} + r_{3} A_{i}

(16)

The social interaction of the i-th grasshopper (S_{i}) is given by Equation (17) [97].

S_{i} = \sum_{\substack{j = 1 \\ j \neq i}}^{N} s (d_{i j}) \hat{d}_{i j}

(17)

where d_{i j} is the distance between the i-th and j-th grasshoppers, computed as d_{i j} = | x_{j} - x_{i} |, and \hat{d}_{i j} = \frac{x_{j} - x_{i}}{d_{i j}} is a unit vector from the i-th grasshopper to the j-th grasshopper. The function s, defined in Equation (18), describes the strength of the social forces. The motion of the grasshoppers is affected by the repulsion and attraction between them, which s captures in the mathematical model of the GOA, as illustrated in Figure 2. According to Figure 2, there is a comfort zone (comfortable distance) in which neither attraction nor repulsion takes place between two grasshoppers [97].

s (r) = f e^{\frac{- r}{l}} - e^{- r}

(18)

where f and l are the intensity of attraction and the attractive length scale, respectively. Changing f and l changes the behavior of the grasshoppers: the value of s changes, and so can the final results. Equations (19) and (20) give the gravity force on the i-th grasshopper and the wind advection, respectively [97].

G_{i} = - g {\hat{e}}_{g}

(19)

A_{i} = u {\hat{e}}_{w}

(20)

where \hat{e}_{g} is a unit vector towards the center of the earth, and g is the gravitational constant. Moreover, u and \hat{e}_{w} are a constant drift and a unit vector in the direction of the wind, respectively. It should be noted that nymph grasshoppers do not have wings, so their movements are strongly influenced by the wind. Substituting these definitions into Equation (15) gives the expanded mathematical model in Equation (21) [97,98,99,100].

X_{i} = \sum_{\substack{j = 1 \\ j \neq i}}^{N} s (| x_{j} - x_{i} |) \frac{x_{j} - x_{i}}{d_{i j}} - g \hat{e}_{g} + u \hat{e}_{w}

(21)

It should be noted that if the grasshopper population reaches the comfort zone quickly, the swarm will not converge to a specific point, so Equation (21) cannot be used to solve optimization problems directly. Therefore, the model is modified as Equation (22), which introduces an upper bound and a lower bound as well as two coefficients to balance the motion between the comfort, attraction, and repulsion zones [97,101].

X_{i}^{d} = c \left( \sum_{\substack{j = 1 \\ j \neq i}}^{N} c \frac{u b_{d} - l b_{d}}{2} s (| x_{j} - x_{i} |) \frac{x_{j} - x_{i}}{d_{i j}} - g \hat{e}_{g} + u \hat{e}_{w} \right) + \hat{T}_{d}

(22)

where u b_{d} and l b_{d} represent the upper bound and lower bound in the d-th dimension, respectively. \hat{T}_{d} is the value of the d-th dimension in the target (the best solution found so far), and c is a decreasing coefficient that shrinks the comfort zone, repulsion zone, and attraction zone, as shown in Equation (23) [97,102].

c = c_{\max} - l \frac{c_{\max} - c_{\min}}{L}

(23)

where c_{\max} and c_{\min} are the maximum and minimum values of c, respectively. Here, l denotes the current iteration (not the attractive length scale of Equation (18)), and L is the maximum number of iterations. Parameter c is reduced in proportion to the number of iterations in order to balance exploration and exploitation. For more information and explanations regarding the grasshopper optimization algorithm, refer to Saremi et al. (2017) [97].
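To make the roles of Equations (18) and (23) more concrete, the short Python sketch below evaluates the social force s(r) and the decreasing coefficient c. The parameter values (f = 0.5, l = 1.5, c_max = 1, c_min = 0.00001) are assumptions chosen for illustration and are not stated in this paper; the function names are likewise hypothetical. The comfort distance follows from setting s(r) = 0 in Equation (18), which gives r = -l ln(f) / (l - 1) for l ≠ 1.

```python
import math


def social_force(r, f=0.5, l=1.5):
    """Equation (18): s(r) = f * exp(-r / l) - exp(-r)."""
    return f * math.exp(-r / l) - math.exp(-r)


def comfort_distance(f=0.5, l=1.5):
    """Distance where s(r) = 0, i.e., neither attraction nor repulsion (l != 1)."""
    return -l * math.log(f) / (l - 1.0)


def shrink_coefficient(iteration, max_iterations, c_max=1.0, c_min=0.00001):
    """Equation (23): c decreases linearly from c_max to c_min."""
    return c_max - iteration * (c_max - c_min) / max_iterations


r0 = comfort_distance()
print(r0)                      # comfort distance for the assumed f and l
print(social_force(1.0))       # negative value: repulsion at short range
print(social_force(3.0))       # positive value: attraction at longer range
print([shrink_coefficient(t, 100) for t in (0, 50, 100)])
```

With these assumed values, s is negative below the comfort distance and positive above it, and c moves linearly from c_max at the first iteration to c_min at the last, which shrinks the zones as the search shifts from exploration to exploitation.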

4. Results and Discussion

As mentioned earlier, two machine learning algorithms, namely GMDH and the hybrid GOA-SVM, were developed for binary classification modeling of the number of vehicles involved in accidents. For this purpose, a dataset was collected and, as described in Section 2, seven factors affecting the number of vehicles involved in crashes in the rural areas of Cosenza in southern Italy were considered: Daylight (DL), Weekday (W), Type of accident (TA), Location (L), Speed limit (SL), Average speed (AS), and Annual average daily traffic (AADT). Initially, a comprehensive literature study was conducted, and a set of about 18 parameters affecting road safety was identified. Then, because of limitations in data access, including incomplete, missing, and incorrect data, these seven parameters were selected. In this study, all the data were divided into two classes according to the number of vehicles involved in an accident. The first class, labeled “0”, covers cases where at most one vehicle was involved in an accident. The second class, labeled “1”, covers cases in which two or more vehicles were involved in a crash. This classification rests on the assumption that the main criterion for class separation is the minimum number of vehicles involved in an accident. The aim was then to construct the classification model that assigns the correct class with the highest possible accuracy by determining a mapping between the input and output data.

For each method, several models are developed, the best one is identified, and the results of the two methods are then compared. Finally, a sensitivity analysis determines the importance of each factor. In binary classification modeling, the accuracy and error derived from the confusion matrix are considered the most practical performance indicators. Therefore, the performance of the models is compared using the confusion matrix according to Figure 3 and Equations (24) and (25). Moreover, because the studied parameters differ in range and measurement scale, normalizing the data plays a significant role in the accuracy of data-driven modeling: without normalization, a factor on a larger scale may bias the computation. Therefore, all data are normalized using min-max normalization before modeling.

\text{Accuracy} = \frac{T P + T N}{T P + F P + T N + F N}

(24)

\text{Error} = \frac{F P + F N}{T P + F P + T N + F N} = 1 - \text{Accuracy}

(25)
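The min-max normalization mentioned before Equations (24) and (25) can be sketched as follows. This is a minimal Python illustration, assuming each variable is rescaled to [0, 1] column by column; the sample values for the speed limit and AADT are hypothetical and are not taken from the study data.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw values on very different scales.
speed_limit_kmh = [50, 70, 90, 110]
aadt = [1200, 4800, 15000, 30000]

speed_limit_scaled = min_max_normalize(speed_limit_kmh)
aadt_scaled = min_max_normalize(aadt)
print(speed_limit_scaled)
print(aadt_scaled)
```

After rescaling, both variables span the same interval, so a variable such as AADT no longer dominates distance-based or kernel-based computations merely because of its larger magnitude.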

4.1. GMDH Modeling

In this work, GMDH was employed in the MATLAB environment to construct a binary classification model for assessing the number of vehicles involved in an accident. The number of vehicles involved in an accident was the dependent variable, and DL, W, TA, L, SL, AS, and AADT were the independent variables. The architecture of the GMDH network strongly affects model performance, so the accurate determination of its control parameters is essential. Since there are no specific relationships for determining these parameters accurately, most of them are obtained from past studies, expert opinions, and trial and error. Accordingly, a range was considered for two control parameters: the maximum number of layers (MNL), set to 5, 10, 20, 40, and 50, and the maximum number of neurons in a layer (MNNL), set to 5, 10, 20, 40, and 50. The Selection Pressure (SP) is another control parameter of the GMDH model; it is a dimensionless number that influences the sensitivity of the modeling error, and it was set to 0.5. Different training/testing proportions have been used in modeling, and several studies have addressed this choice [103,104]. Following the suggestions of Looney, 75% of the data (423 cases) were used for training and 25% (141 cases) for testing [105]. In total, 25 models were constructed, one for each combination of MNL and MNNL, and their results are presented in Table 4.
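The parameter grid and the data split described above can be enumerated with a few lines of Python. This is only a bookkeeping sketch: the variable names are hypothetical, and the ordering of the combinations (MNL varying slowest) is an assumption, since the layout of Table 4 is not reproduced here.

```python
from itertools import product

MNL_VALUES = [5, 10, 20, 40, 50]    # maximum number of layers
MNNL_VALUES = [5, 10, 20, 40, 50]   # maximum number of neurons in a layer
SELECTION_PRESSURE = 0.5            # fixed for all models

# One configuration per (MNL, MNNL) pair, numbered from 1.
configs = [
    {"model": k, "MNL": mnl, "MNNL": mnnl, "SP": SELECTION_PRESSURE}
    for k, (mnl, mnnl) in enumerate(product(MNL_VALUES, MNNL_VALUES), start=1)
]

# 75% / 25% split of the 564 records.
n_total = 564
n_train = round(0.75 * n_total)
n_test = n_total - n_train

print(len(configs), n_train, n_test)
print(configs[14])  # the 15th configuration under the assumed ordering
```

Under this assumed ordering, the 15th configuration has MNL = 20 and MNNL = 50, which agrees with the structure of the best model reported later in this section.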

After the different models had been constructed and their accuracy determined, all models were ranked with the simple ranking method suggested by Zorlu et al. (2008) [106]. Table 5 shows the ranking results.

According to Table 5, seven of the 25 developed models shared the lowest rank, equal to 31, whereas the 15th model performed best. Its training and testing accuracies were 83.2% and 81.6%, respectively. The 15th model was built with MNL, MNNL, and SP equal to 20, 50, and 0.5, respectively. The confusion matrices for the training, testing, and total data are shown in Figure 4a–c, respectively.

As mentioned earlier, 75% of the data set (423 cases) formed the training dataset, and the rest (141 cases) formed the testing dataset. According to Figure 4a, the 15th binary classification model correctly identified 80 cases of the first class, labeled “0” (at most one vehicle was involved in the accident), while 18 cases of the second class, labeled “1” (two or more vehicles were involved in the accident), were incorrectly assigned to the first class. The model classified the training dataset with an accuracy of 83.2%. For the testing data (Figure 4b), 36 cases of the first class and 79 cases of the second class were correctly classified, while the model wrongly predicted 5 and 21 cases of the first and second classes, respectively. Consequently, the 15th model achieved an acceptable testing accuracy of 81.6% (115 of 141 cases). Finally, for the whole data set (Figure 4c), the model correctly classified 467 cases across the two classes and misclassified 97 cases. Consequently, the total accuracy of the binary classification modeling was 82.8%. This analysis indicates that GMDH is a reliable modeling method for predicting the number of vehicles involved in an accident.
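The reported accuracies follow directly from the confusion-matrix counts. The short Python sketch below recomputes them; the helper name `accuracy` is hypothetical, and the placement of the 5 and 21 misclassified test cases in the off-diagonal cells is an assumption (the accuracy does not depend on which off-diagonal cell holds which count).

```python
def accuracy(confusion):
    """Accuracy from a 2x2 confusion matrix [[c00, c01], [c10, c11]].

    Rows are true classes and columns are predicted classes, so the
    diagonal holds the correctly classified cases.
    """
    correct = confusion[0][0] + confusion[1][1]
    total = sum(sum(row) for row in confusion)
    return correct / total


# Test data (Figure 4b): 36 and 79 correct, 5 and 21 wrong.
# Which off-diagonal cell holds 5 and which holds 21 is assumed here.
test_cm = [[36, 5], [21, 79]]
test_acc = round(100 * accuracy(test_cm), 1)

# Whole data set (Figure 4c): 467 correct and 97 wrong out of 564.
total_acc = round(100 * 467 / (467 + 97), 1)

print(test_acc, total_acc)  # 81.6 82.8
```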

4.2. GOA-SVM Modeling

GOA and SVM were combined to develop a predictive model in the MATLAB environment. The GOA was used to optimize some of the SVM parameters so that the SVM model achieves its highest performance. The GOA-SVM modeling used the same datasets as the GMDH analysis. For modeling, 75% of the dataset (423 cases) was randomly assigned to the training dataset, and the remaining 25% (141 cases) was used as the testing dataset [105]. As in the GMDH modeling, all data were assigned to one of two classes, labeled “0” and “1”. To optimize the parameters of the SVM model with the GOA, the control parameters of the GOA must first be determined, since they play an important role in the rapid and appropriate convergence of the model. Although there are no specific relationships for determining these parameters, a range was set for each of them based on previous studies and experts’ opinions: the number of grasshopper populations (5, 10, 15, 20, 30, and 40) and the number of iterations (10, 20, 40, 50, and 100). The most appropriate values were then selected by a trial-and-error approach [107].

Moreover, to further evaluate the model, k-fold cross-validation was used, in which the data were subdivided into K subsets. One subset was used for validation at each step, while the other K − 1 subsets were used for training. This procedure was carried out K times, so that each subset was used exactly once for validation and K − 1 times for training. Finally, the average of the K validation results was taken as the final estimate. There is no specific method for choosing the number of folds; it is determined according to the amount of data and the opinion of experts. In this study, the number of folds was set to 3 [107]. In addition, three different kernel functions, namely RBF, POL, and LIN, were used. Given the number of control-parameter combinations, 30 models were built for each kernel function of GOA-SVM, which gives 90 GOA-SVM models in total. Preliminary analyses were performed, and a comparison of the performance indicators for the best-developed models with the three kernels is shown in Figure 5.
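The cross-validation procedure and the model count can be illustrated with a minimal Python sketch. It is not the authors' implementation: `k_fold_indices` and `cross_val_score` are hypothetical helper names, the folds are taken as contiguous blocks without shuffling for simplicity, and the scoring function is left to the caller rather than tied to any SVM library.

```python
# Control-parameter values reported in the text for GOA-SVM.
POPULATIONS = [5, 10, 15, 20, 30, 40]
ITERATIONS = [10, 20, 40, 50, 100]
KERNELS = ["RBF", "POL", "LIN"]

models_per_kernel = len(POPULATIONS) * len(ITERATIONS)
total_models = models_per_kernel * len(KERNELS)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_score(score_fn, n_samples, k):
    """Average a caller-supplied score over the k validation folds."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(423, 3))
print(models_per_kernel, total_models, [len(val) for _, val in folds])
```

With 423 training cases and K = 3, each fold holds 141 cases for validation and 282 for training. In practice the data would normally be shuffled before splitting; that step is omitted here to keep the sketch short.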

Although the GOA was able to train the SVM well with all three kernels, Figure 5 shows that the best-developed model with the RBF kernel had the highest accuracy in training, validation, and testing compared with the POL and LIN kernel functions. Therefore, the model with the RBF kernel function was taken as the best developed GOA-SVM model, and its optimal control parameters are shown in Table 6. Moreover, the error in each iteration was calculated based on Equation (25), and the result is shown in Figure 6. According to Figure 6, modeling started with an error of about 0.195, which varied until the 18th iteration and reached 0.153 in the 19th iteration, where it remained constant until the last (40th) iteration. The obtained values of accuracy and error indicate proper convergence of the model.

4.3. Comparison of Models’ Performance and Sensitivity Analysis

Each year, many people die in road traffic accidents. Therefore, knowing the impact of the various contributing factors on road accidents and taking the necessary measures to reduce accidents can significantly increase the level of road safety. This research employed two machine learning methods, namely the GMDH-type neural network and GOA-SVM, to conduct binary classification modeling. Based on modeling accuracy, the best GMDH and GOA-SVM models were chosen after multiple modeling runs. The best GMDH model and the best GOA-SVM model were then compared in terms of training and testing accuracy, as shown in Figure 7. As explained in the previous section, the validation accuracy is used in place of the training accuracy for the GOA-SVM model. Hence, the validation accuracy of the GOA-SVM model should be compared with the training accuracy of the GMDH model.

According to Figure 7, the best GOA-SVM model performed better than the best GMDH model in predicting the number of vehicles involved in an accident, with training and testing accuracies of 84.6% and 83.4%, compared with 83.2% and 81.6% for the GMDH model. Nevertheless, both models had acceptable degrees of accuracy and robustness, so it can be concluded that both are reliable modeling systems for predicting the number of vehicles involved in an accident and can serve as useful tools for road safety modeling in transportation engineering.

Road accidents can cause considerable economic and human losses to society. Therefore, assessing the impact of the parameters affecting the number of vehicles involved in an accident can provide in-depth insight for engineers involved in road safety management. A sensitivity analysis was performed to assess the impact of DL, W, TA, L, SL, AS, and AADT on the predicted number of vehicles involved in the accident. This sensitivity analysis is based on the cosine amplitude method according to Equation (26), in which r_ij is the strength of the relationship, n is the number of datasets, and x_ik and y_jk denote the input variables and the predicted output, respectively [108,109].

r_{i j} = \frac{\sum_{k = 1}^{n} (x_{i k} \times y_{j k})}{\sqrt{\sum_{k = 1}^{n} x_{i k}^{2} \sum_{k = 1}^{n} y_{j k}^{2}}}

(26)
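As an illustration of Equation (26), the cosine amplitude between one input variable and the predicted output can be computed as in the following Python sketch. The function name `cosine_amplitude` and the toy vectors are hypothetical and are not taken from the study's data.

```python
import math


def cosine_amplitude(x, y):
    """Strength of relation r between an input vector x and an output vector y.

    Implements r = sum(x_k * y_k) / sqrt(sum(x_k^2) * sum(y_k^2)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("x and y must each contain a non-zero value")
    return numerator / denominator


# Toy values for illustration only (not data from the study).
r_proportional = cosine_amplitude([1, 2, 3], [2, 4, 6])
r_orthogonal = cosine_amplitude([1, 0], [0, 1])
print(r_proportional, r_orthogonal)
```

For non-negative data, r lies between 0 and 1: values near 1 indicate that the input and the predicted output vary together closely, and values near 0 indicate a weak relationship. Ranking the inputs by r gives the kind of importance ordering reported in Figure 8.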

According to Figure 8, the analyses of both models gave similar results for the impact of the factors under consideration, which supports the reliability of the results. Furthermore, the following remarks can be made:

-: The type of accident was the most significant of the contributing factors affecting the number of vehicles involved in the crashes. In general, certain types of accidents can be caused by a variety of issues, including a lack of traffic signs and poor road quality.
-: The next factor is the average speed, which can increase the risk of accidents. Some researchers found that controlling for other factors, such as traffic volume, road geometry, and the number of lanes, can reduce or eliminate the effects of average speed [110,111]. When the average speed is higher, the driver has less time to respond, which can lead to an accident. Therefore, the impact of average speed can be controlled through measures such as improved placement of road signs, speed limit enforcement, pavement markings, and vertical centerline treatments.
-: The third factor influencing the number of vehicles involved in an accident, after the type of accident and the average speed, is the annual average daily traffic, which plays a key role in setting the needs and priorities of road development in transportation planning. Moreover, some studies indicate that an increase in AADT can lead to an increase in the frequency of accidents [112,113]. Therefore, to reduce the effects of AADT in this case study, it is recommended to consider other intercity transportation systems, such as trains, which are being considered in the review of upcoming urban development plans.
-: The next contributing factor is the speed limit, which has a significant role in the behavior and decisions of drivers. Generally, the speed limit is determined by the road conditions. If the speed limit is set incorrectly on part of the route and the driver recognizes this error from the road conditions, they may lose confidence in the speed limits on other sections of the road and increase or decrease their speed based on their own interpretation [114,115]. Hence, given the impact of this factor in this case study, a general review of speed limit selection for rural roads in Cosenza is suggested.
-: Several studies have been conducted on the relationship between weekdays and accidents [116,117]. In line with national statistics, road accidents on the road network of the province of Cosenza are more concentrated on holidays. In the present study, 65% (367) of the 564 accidents occurred on holidays. Due to the geographical location of Cosenza, traffic volumes increase on holidays. To reduce the exposure to risk, intensified controls and monitoring of roads during the holidays, through increased police enforcement, would mitigate the effect of this factor.
-: Extensive studies have been conducted on the effect of daylight on the number of vehicles involved in accidents, which shows the high importance of this factor. Many studies have given it priority over other factors [50,118]. The impact of this factor is heavily influenced by geographic location and road lighting systems. Here it was the sixth most effective of the seven factors, a result consistent with the location and road lighting system of the rural roads in Cosenza.
-: The last factor studied was the location, which had the smallest impact on the rate of vehicles involved in an accident according to both artificial intelligence models. Given the structure of the rural roads in Cosenza, the location has had little effect on the rate of accidents. Therefore, in future studies of this case, the effect of this factor can be ignored.

Human behavior is jointly responsible for road accidents, together with the vehicle and the environment (of which the infrastructure is part). It is strictly correlated with the speed and the type of accident, whose impacts were determined by the importance analysis of the proposed methodology. Moreover, these two factors have the highest impact among the parameters on the rate of vehicles involved in accidents on the rural roads of Cosenza.

Therefore, based on the results obtained and after consulting experts, it can be concluded that improving roadway geometry, pavement markings, and vertical centerline treatments are the most significant measures to consider when reviewing the plan for rural roads in Cosenza. Finally, it should be noted that the presented models, with their specific structures, are location-sensitive and cannot be directly used on other rural roads.

5. Conclusions

The evaluation and analysis of the contributing factors affecting the number of vehicles involved in crashes can lead to a deeper understanding of the existing situation and can increase road safety through planning and a series of necessary measures. Therefore, this study attempted to predict the number of vehicles involved in crashes using the GMDH-type neural network and GOA-SVM methods. The study used 564 accident cases from rural roads in Cosenza in southern Italy. Several crash-related parameters, namely DL, W, TA, L, SL, AS, and AADT, were set as input variables, and the number of vehicles involved in the crash was the output. According to the number of control parameters of each technique, 25 models were built for GMDH and 30 models for each kernel function of GOA-SVM, giving a total of 90 GOA-SVM models. After modeling, the models of each technique were compared with each other in terms of performance. Among the GMDH binary classification models, the 15th model was selected with the highest score, and among the GOA-SVM binary classification models, the model with the RBF kernel function was chosen. In the comparison between the GMDH and GOA-SVM models, the GOA-SVM model had a higher capability for predicting the number of vehicles involved in the accident, although the performance of both models indicates that both can be used as useful tools for modeling the number of vehicles involved in an accident in transportation engineering. Subsequently, a sensitivity analysis was performed based on the results obtained from both models. In both models, the type of accident had the highest impact and the location had the lowest impact among the parameters on the rate of vehicles involved in accidents on the rural roads of Cosenza.
The results of this study identified the role of humans in causing accidents as the largest contributing factor in road safety, which is consistent with previous studies in this field. As a result, it is recommended that organizations concerned with road safety, in addition to taking the required steps to improve road conditions, establish a long-term strategy for raising awareness and assessing driver performance.

For future work, it is recommended to examine other factors that can impact the number of vehicles involved in an accident, such as the age of drivers, the age of cars, gender, and the geometry of roads.

Author Contributions

Conceptualization, G.G. and S.S.H. (Sina Shaffiee Haghshenas); methodology, S.S.H. (Sina Shaffiee Haghshenas) and S.S.H. (Sami Shaffiee Haghshenas); software, S.S.H. (Sina Shaffiee Haghshenas); formal analysis, S.S.H. (Sina Shaffiee Haghshenas); investigation, S.S.H. (Sina Shaffiee Haghshenas) and S.S.H. (Sami Shaffiee Haghshenas); resources, G.G., V.A. and A.V.; writing—original draft preparation, S.S.H. (Sina Shaffiee Haghshenas); writing—review and editing, G.G., S.S.H. (Sina Shaffiee Haghshenas), S.S.H. (Sami Shaffiee Haghshenas), V.A., Y.P. and Z.W.G.; supervision, G.G., V.A., A.V., Y.P. and Z.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to express our deepest thanks to Mahdi Ghaem for his excellent advice.

Conflicts of Interest

The authors declare no conflict of interest.

References

Nordfjærn, T.; Rundmo, T. Road traffic safety beliefs and driver behaviors among personality subtypes of drivers in the Norwegian population. Traffic Inj. Prev. 2013, 14, 690–696.
Ellison, A.B.; Greaves, S.P.; Bliemer, M.C. Driver behaviour profiles for road safety analysis. Accid. Anal. Prev. 2015, 76, 118–132.
Scott-Parker, B. Emotions, behaviour, and the adolescent driver: A literature review. Transp. Res. F Traffic Psychol. Behav. 2017, 50, 1–37.
Farooq, D.; Moslem, S.; Duleba, S. Evaluation of driver behavior criteria for evolution of sustainable traffic safety. Sustainability 2019, 11, 3142.
Farooq, D.; Juhasz, J. Statistical Evaluation of Risky Driver Behavior Factors That Influence Road Safety Based on Drivers Age and Driving Experience in Budapest and Islamabad. Eur. Transp. 2020, 80, 1–18.
Rosli, N.; Ambak, K.; Shahidan, N.N.; Sukor, N.S.A.; Osman, S.; Yei, L.S. Driving behaviour of elderly drivers in Malaysia. Int. J. Integr. Eng. 2020, 12, 268–277.
Alonso, B.; Astarita, V.; Dell’Olio, L.; Giofrè, V.P.; Guido, G.; Marino, M.; Sommario, W.; Vitale, A. Validation of simulated safety indicators with traffic crash data. Sustainability 2020, 12, 925.
Christoforou, Z.D.; Karlaftis, M.G.; Yannis, G. Heavy vehicle age and road safety. In Proceedings of the Institution of Civil Engineers-Transport, London, UK, 22–23 November 2010; Thomas Telford Ltd.: London, UK, 2010; Volume 163, pp. 41–48.
Russo, F.; Biancardo, S.A.; Dell’Acqua, G. Road safety from the perspective of driver gender and age as related to the injury crash frequency and road scenario. Traffic Inj. Prev. 2014, 15, 25–33.
Kim, S.; Kim, J.K. Road safety for an aged society: Compliance with traffic regulations, knowledge about traffic regulations, and risk factors of older drivers. Transp. Res. Rec. 2017, 2660, 15–21.
Casado-Sanz, N.; Guirao, B. Analysis of the impact of population ageing and territorial factors on crosstown roads safety: The Spanish case study. Transp. Res. Procedia 2018, 33, 283–290.
Török, Á. A Novel Approach in Evaluating the Impact of Vehicle Age on Road Safety. Promet Traffic Transp. 2020, 32, 789–796.
Lyon, C.; Mayhew, D.; Granie, M.A.; Robertson, R.; Vanlaar, W.; Woods-Fry, H.; Thevenetb, C.; Furian, G.; Soteropoulos, A. Age and road safety performance: Focusing on elderly and young drivers. IATSS Res. 2020, 44, 212–219.
Theofilatos, A.; Yannis, G. A review of the effect of traffic and weather characteristics on road safety. Accid. Anal. Prev. 2014, 72, 244–256.
Lazarev, Y.; Medres, C.; Raty, J.; Bondarenko, A. Method of Assessment and Prediction of Temperature Conditions of Roadway Surfacing as a Factor of the Road Safety. Transp. Res. Procedia 2017, 20, 393–400.
Malin, F.; Norros, I.; Innamaa, S. Accident risk of road and weather conditions on different road types. Accid. Anal. Prev. 2018, 122, 181–188.
Ivajnšič, D.; Horvat, N.; Žiberna, I.; Kotnik, E.; Davidović, D. Revealing the Spatial Pattern of Weather-Related Road Traffic Crashes in Slovenia. Appl. Sci. 2021, 11, 6506.
Abdella, G.M.; Shaaban, K. Modeling the impact of weather conditions on pedestrian injury counts using LASSO-based poisson model. Arab. J. Sci. Eng. 2021, 46, 4719–4730.
Bajwa, S.; Warita, M.; Kuwahara, M. Effects of road geometry, weather and traffic flow on road safety. In Proceedings of the 15th International Conference of Hong Kong Society for Transportation Study, Hong Kong, China, 11–14 December 2010; pp. 773–780.
Orfila, O.; Coiret, A.; Do, M.; Mammar, S. Modeling of dynamic vehicle–road interactions for safety-related road evaluation. Accid. Anal. Prev. 2010, 42, 1736–1743.
Alian, S.; Baker, R.; Wood, S. Rural casualty crashes on the Kings Highway: A new approach for road safety studies. Accid. Anal. Prev. 2016, 95, 8–19.
Ewan, L.; Al-Kaisy, A.; Hossain, F. Safety Effects of Road Geometry and Roadside Features on Low-Volume Roads in Oregon. Transp. Res. Rec. 2016, 2580, 47–55.
Gooch, J.P.; Gayah, V.V.; Donnell, E.T. Quantifying the safety effects of horizontal curves on two-way, two-lane rural roads. Accid. Anal. Prev. 2016, 92, 71–81.
Roslak, J.; Wallaschek, J. Active lighting systems for improved road safety. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 682–685.
Magar, S.G. Adaptive Front Light Systems of Vehicle for Road Safety. In Proceedings of the 2015 International Conference on Computing Communication Control and Automation, Pune, India, 6–27 February 2015; pp. 551–554.
Aldulaimi, M.; AmadorJimenez, L. 759 Road lighting and safety: A pilot study of Arthabaska region. Inj. Prev. 2016, 22, A272.
Tetervenoks, O.; Avotins, A.; Fedorjana, N.; Kluga, A.; Krasts, V. Potential Role of Street Lighting System for Safety Enhancement on the Roads in Future. In Proceedings of the 2019 IEEE 60th International Scientific Conference on Power and Electrical Engineering of Riga Technical University (RTUCON), Riga, Latvia, 7–9 October 2019; pp. 1–5.
Liu, J.; Li, J.; Wang, K.; Zhao, J.; Cong, H.; He, P. Exploring factors affecting the severity of night-time vehicle accidents under low illumination conditions. Adv. Mech. Eng. 2019, 11, 1687814019840940.
Saljoqi, M.; Behnood, H.R.; Mirbaha, B. Developing the crash modification model for urban street lighting. Innov. Infrastruct. Solut. 2021, 6, 59.
Siliquini, R.; Piat, S.C.; Alonso, F.; Druart, A.; Kedzia, M.; Mollica, A.; Siliquini, V.; Vankov, D.; Villerusa, A.; Manzoli, L.; et al. A European study on alcohol and drug use among young drivers: The TEND by Night study design and methodology. BMC Public Health 2010, 10, 205.
Alonso, F.; Esteban, C.; Useche, S.A.; Faus, M. Smoking while driving: Frequency, motives, perceived risk and punishment. World J. Prev. Med. 2017, 5, 1–9.
Zou, X.; Yue, W.L.; Vu, H.L. Visualization and analysis of mapping knowledge domain of road safety studies. Accid. Anal. Prev. 2018, 118, 131–145.
Elvik, R.; Vadeby, A.; Hels, T.; van Schagen, I. Updated estimates of the relationship between speed and road safety at the aggregate and individual levels. Accid. Anal. Prev. 2019, 123, 114–122.
Gichaga, F.J. The impact of road improvements on road safety and related characteristics. IATSS Res. 2017, 40, 72–75.
Sager, L. Estimating the effect of air pollution on road safety using atmospheric temperature inversions. J. Environ. Econ. Manag. 2019, 98, 102250.
Mahmud, S.M.S.; Ferreira, L.; Tavassoli, A. Traditional approaches to Traffic Safety Evaluation (TSE): Application challenges and future directions. In Proceedings of the 11th Asia Pacific Transportation Development Conference and 29th ICTPA Annual Conference, Hsinchu, Taiwan, 27–29 May 2016; pp. 242–262.
Zheng, L.; Ismail, K.; Sayed, T.; Fatema, T. Bivariate extreme value modeling for road safety estimation. Accid. Anal. Prev. 2018, 120, 83–91.
Wang, C.; Xu, C.; Dai, Y. A crash prediction method based on bivariate extreme value theory and video-based vehicle trajectory data. Accid. Anal. Prev. 2019, 123, 365–373.
Zheng, L.; Sayed, T.; Mannering, F. Modeling traffic conflicts for use in road safety analysis: A review of analytic methods and future directions. Anal. Methods Accid. Res. 2020, 29, 100142.
Jamal, A.; Rahman, M.T.; Al-Ahmadi, H.M.; Mansoor, U. The Dilemma of Road Safety in the Eastern Province of Saudi Arabia: Consequences and Prevention Strategies. Int. J. Environ. Res. Public Health 2019, 17, 157.
Mussone, L.; Ferrari, A.; Oneta, M. An analysis of urban collisions using an artificial intelligence model. Accid. Anal. Prev. 1999, 31, 705–718.
Halim, Z.; Kalsoom, R.; Bashir, S.; Abbas, G. Artificial intelligence techniques for driving safety and vehicle crash prediction. Artif. Intell. Rev. 2016, 46, 351–387.
Castro, Y.; Kim, Y.J. Data mining on road safety: Factor assessment on vehicle accidents using classification models. Int. J. Crashworthiness 2015, 21, 104–111.
De Luca, M. A comparison between prediction power of artificial neural networks and multivariate analysis in road safety management. Transport 2017, 32, 379–385.
Shah, S.A.R.; Brijs, T.; Ahmad, N.; Pirdavani, A.; Shen, Y.; Basheer, M.A. Road safety risk evaluation using gis-based data envelopment analysis—Artificial neural networks approach. Appl. Sci. 2017, 7, 886.
Liu, M.; Wu, Z.; Chen, Y.; Zhang, X. Utilizing Decision Tree Method and ANFIS to Explore Real-Time Crash Risk for Urban Freeways. In Proceedings of the CICTP 2020, Xi’an, China, 14–16 August 2020; pp. 2495–2508.
Guido, G.; Haghshenas, S.; Haghshenas, S.; Vitale, A.; Gallelli, V.; Astarita, V. Development of a Binary Classification Model to Assess Safety in Transportation Systems Using GMDH-Type Neural Network Algorithm. Sustainability 2020, 12, 6735.
Mokhtarimousavi, S.; Anderson, J.C.; Hadi, M.; Azizinamini, A. A temporal investigation of crash severity factors in worker-involved work zone crashes: Random parameters and machine learning approaches. Transp. Res. Interdiscip. Perspect. 2021, 10, 100378.
Kitali, A.E.; Mokhtarimousavi, S.; Kadeha, C.; Alluri, P. Severity analysis of crashes on express lane facilities using support vector machine model trained by firefly algorithm. Traffic Inj. Prev. 2020, 22, 79–84.
Xu, Y.; Ye, Z.; Wang, Y.; Wang, C.; Sun, C. Evaluating the influence of road lighting on traffic safety at accesses using an artificial neural network. Traffic Inj. Prev. 2018, 19, 601–606.
Amiri, A.M.; Sadri, A.; Nadimi, N.; Shams, M. A comparison between Artificial Neural Network and Hybrid Intelligent Genetic Algorithm in predicting the severity of fixed object crashes among elderly drivers. Accid. Anal. Prev. 2020, 138, 105468.
Guido, G.; Haghshenas, S.; Haghshenas, S.; Vitale, A.; Astarita, V.; Haghshenas, A. Feasibility of Stochastic Models for Evaluation of Potential Factors for Safety: A Case Study in Southern Italy. Sustainability 2020, 12, 7541.
Shiran, G.; Imaninasab, R.; Khayamim, R. Crash Severity Analysis of Highways Based on Multinomial Logistic Regression Model, Decision Tree Techniques, and Artificial Neural Network: A Modeling Comparison. Sustainability 2021, 13, 5670.
Mussone, L.; Bassani, M.; Masci, P. Analysis of factors affecting the severity of crashes in urban road intersections. Accid. Anal. Prev. 2017, 103, 112–122.
Ministero delle Infrastrutture e dei Trasporti. Nuovo Codice della Strada, Decreto Legislativo N. 285 del 30/4/1992, G.U. n. 114 del 18/5/1992. Gazzetta Ufficiale, 1992.
Ministero delle Infrastrutture e dei Trasporti. Disposizioni urgenti per la sicurezza della circolazione dei veicoli e di specifiche categorie di utenti, Modifiche al Nuovo Codice della Strada, Decreto Legislativo n.121 del 10/9/2021, G.U. n. 267 del 9/9/2021. Gazzetta Ufficiale, 2021.
Pan, G.; Fu, L.; Thakali, L. Development of a global road safety performance function using deep neural networks. Int. J. Transp. Sci. Technol. 2017, 6, 159–173.
Peng, Z.; Gao, S.; Li, Z.; Xiao, B.; Qian, Y. Vehicle Safety Improvement through Deep Learning and Mobile Sensing. IEEE Netw. 2018, 32, 28–33.
Silva, P.B.; Andrade, M.; Ferreira, S. Machine learning applied to road safety modeling: A systematic literature review. J. Traffic Transp. Eng. 2020, 7, 775–790.
Hernández, H.; Alberdi, E.; Pérez-Acebo, H.; Álvarez, I.; García, M.; Eguia, I.; Fernández, K. Managing Traffic Data through Clustering and Radial Basis Functions. Sustainability 2021, 13, 2846.
Amiri, A.M.; Naderi, K.; Cooper, J.F.; Nadimi, N. Evaluating the impact of socio-economic contributing factors of cities in California on their traffic safety condition. J. Transp. Health 2021, 20, 101010.
Jamal, A.; Zahid, M.; Tauhidur Rahman, M.; Al-Ahmadi, H.M.; Almoshaogeh, M.; Farooq, D.; Ahmad, M. Injury severity prediction of traffic crashes with ensemble machine learning techniques: A comparative study. Int. J. Inj. Contr. Saf. Promot. 2021, 28, 408–427.
Hosseinzadeh, A.; Moeinaddini, A.; Ghasemzadeh, A. Investigating factors affecting severity of large truck-involved crashes: Comparison of the SVM and random parameter logit model. J. Saf. Res. 2021, 77, 151–160.
Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 2014, 512, 332–343.
Jahed Armaghani, D.; Asteris, P.G.; Askarian, B.; Hasanipanah, M.; Tarinejad, R.; Huynh, V.V. Examining hybrid and single SVM models with different kernels to predict rock brittleness. Sustainability 2020, 12, 2229.
Mikaeil, R.; Haghshenas, S.S.; Ozcelik, Y.; Gharehgheshlagh, H.H. Performance evaluation of adaptive neuro-fuzzy inference system and group method of data handling-type neural network for estimating wear rate of diamond wire saw. Geotech. Geol. Eng. 2018, 36, 3779–3791.
Dormishi, A.R.; Ataei, M.; Khaloo Kakaie, R.; Mikaeil, R.; Shaffiee Haghshenas, S. Performance evaluation of gang saw using hybrid ANFIS-DE and hybrid ANFIS-PSO algorithms. JME 2019, 10, 543–557.
Naderpour, H.; Mirrashid, M. Moment capacity estimation of spirally reinforced concrete columns using ANFIS. Complex Intell. Syst. 2019, 6, 97–107.
Shirani Faradonbeh, R.; Shaffiee Haghshenas, S.; Taheri, A.; Mikaeil, R. Application of self-organizing map and fuzzy c-mean techniques for rockburst clustering in deep underground projects. Neural Comput. Appl. 2019, 32, 8545–8559.
Noori, A.M.; Mikaeil, R.; Mokhtarian, M.; Haghshenas, S.S.; Foroughi, M. Feasibility of Intelligent Models for Prediction of Utilization Factor of TBM. Geotech. Geol. Eng. 2020, 38, 3125–3143.
Golafshani, E.M.; Behnood, A. Predicting the mechanical properties of sustainable concrete containing waste foundry sand using multi-objective ANN approach. Constr. Build. Mater. 2021, 291, 123314.
Ivakhnenko, A.G. Polynomial Theory of Complex Systems. IEEE Trans. Syst. Man Cybern. 1971, SMC-1, 364–378.
Sezavar, R.; Shafabakhsh, G.; Mirabdolazimi, S.M. New model of moisture susceptibility of nano silica-modified asphalt concrete using GMDH algorithm. Constr. Build Mater. 2019, 211, 528–538.
Dag, O.; Kasikci, M.; Karabulut, E.; Alpar, R. Diverse classifiers ensemble based on GMDH-type neural network algorithm for binary classification. Commun. Stat. Simul. Comput. 2019, 1–17.
Armaghani, D.J.; Hasanipanah, M.; Amnieh, H.B.; Bui, D.T.; Mehrabi, P.; Khorami, M. Development of a novel hybrid intelligent model for solving engineering problems using GS-GMDH algorithm. Eng. Comput. 2019, 36, 1379–1391.
Harandizadeh, H.; Armaghani, D.J.; Mohamad, E.T. Development of fuzzy-GMDH model optimized by GSA to predict rock tensile strength based on experimental datasets. Neural Comput. Appl. 2020, 32, 14047–14067.
Jeddi, S.; Sharifian, S. A hybrid wavelet decomposer and GMDH-ELM ensemble model for Network function virtualization workload forecasting in cloud computing. Appl. Soft Comput. 2019, 88, 105940.
Morosini, A.F.; Haghshenas, S.S.; Geem, Z.W. Development of a Binary Model for Evaluating Water Distribution Systems by a Pressure Driven Analysis (PDA) Approach. Appl. Sci. 2020, 10, 3029.
Khandezamin, Z.; Naderan, M.; Rashti, M.J. Detection and classification of breast cancer using logistic regression feature selection and GMDH classifier. J. Biomed. Inform. 2020, 111, 103591.
Fiorini Morosini, A.; Shaffiee Haghshenas, S.; Shaffiee Haghshenas, S.; Choi, D.Y.; Geem, Z.W. Sensitivity Analysis for Performance Evaluation of a Real Water Distribution System by a Pressure Driven Analysis Approach and Artificial Intelligence Method. Water 2021, 13, 1116.
Pusat, S.; Akkaya, A.V. Explicit equation derivation for predicting coal moisture content in convective drying process by GMDH-type neural network. Int. J. Coal Prep. 2020, 1–14.
Vissol-Gaudin, E.; Kotsialos, A.; Groves, C.; Pearson, C.; Zeze, D.A.; Petty, M.C. Computing based on material training: Application to binary classification problems. In Proceedings of the 2017 IEEE International Conference on Rebooting Computing (ICRC), Washington, DC, USA, 8–9 November 2017; pp. 1–8.
Li, D.; Armaghani, D.J.; Zhou, J.; Lai, S.H.; Hasanipanah, M. A GMDH predictive model to predict rock material strength using three non-destructive tests. J. Nondestruct. Eval. 2020, 39, 81.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Chen, W.H.; Hsu, S.H.; Shen, H.P. Application of SVM and ANN for intrusion detection. Comput. Oper. Res. 2005, 32, 2617–2634.
Yan, X.; Jia, M. A novel optimized SVM classification algorithm with multi-domain feature and its application to fault diagnosis of rolling bearing. Neurocomputing 2018, 313, 47–64.
Zhou, S.; Chu, X.; Cao, S.; Liu, X.; Zhou, Y. Prediction of the ground temperature with ANN, LS-SVM and fuzzy LS-SVM for GSHP application. Geothermics 2019, 84, 101757.
Maldonado, S.; López, J.; Jimenez-Molina, A.; Lira, H. Simultaneous feature selection and heterogeneity control for SVM classification: An application to mental workload assessment. Expert Syst. Appl. 2020, 143, 112988.
Zeng, J.; Roussis, P.C.; Mohammed, A.S.; Maraveas, C.; Fatemi, S.A.; Armaghani, D.J.; Asteris, P.G. Prediction of Peak Particle Velocity Caused by Blasting through the Combinations of Boosted-CHAID and SVM Models with Various Kernels. Appl. Sci. 2021, 11, 3705.
Zhou, J.; Qiu, Y.; Zhu, S.; Armaghani, D.J.; Li, C.; Nguyen, H.; Yagiz, S. Optimization of support vector machine through the use of metaheuristic algorithms in forecasting TBM advance rate. Eng. Appl. Artif. Intell. 2021, 97, 104015.
Mikaeil, R.; Haghshenas, S.S.; Hoseinie, S.H. Rock penetrability classification using artificial bee colony (ABC) algorithm and self-organizing map. Geotech. Geol. Eng. 2018, 36, 1309–1318.
Salemi, A.; Mikaeil, R.; Haghshenas, S.S. Integration of finite difference method and genetic algorithm to seismic analysis of circular shallow tunnels (Case study: Tabriz urban railway tunnels). KSCE J. Civ. Eng. 2018, 22, 1978–1990.
Aryafar, A.; Mikaeil, R.; Haghshenas, S.S.; Haghshenas, S.S. Application of metaheuristic algorithms to optimal clustering of sawing machine vibration. MEAS 2018, 124, 20–31.
Mikaeil, R.; Haghshenas, S.S.; Sedaghati, Z. Geotechnical risk evaluation of tunneling projects using optimization techniques (case study: The second part of Emamzade Hashem tunnel). Nat. Hazards 2019, 97, 1099–1113.
Mikaeil, R.; Bakhshinezhad, H.; Haghshenas, S.S.; Ataei, M. Stability analysis of tunnel support systems using numerical and intelligent simulations (case study: Kouhin Tunnel of Qazvin-Rasht Railway). Rud. Geol. Naft. Zb. 2019, 34, 1–10.
Haghshenas, S.S.; Faradonbeh, R.S.; Mikaeil, R.; Haghshenas, S.S.; Taheri, A.; Saghatforoush, A.; Dormishi, A. A new conventional criterion for the performance evaluation of gang saw machines. MEAS 2019, 146, 159–170.
Saremi, S.; Mirjalili, S.; Lewis, A. Grasshopper optimisation algorithm: Theory and application. Adv. Eng. Softw. 2017, 105, 30–47.
Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Ala’M, A.Z.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286.
Saxena, A. A comprehensive study of chaos embedded bridging mechanisms and crossover operators for grasshopper optimisation algorithm. Expert Syst. Appl. 2019, 132, 166–188.
Pan, J.S.; Wang, X.; Chu, S.C.; Nguyen, T.T. A multi-group grasshopper optimisation algorithm for application in capacitated vehicle routing problem. Pattern Recognit. 2020, 4, 41–56.
Goodarzizad, P.; Mohammadi Golafshani, E.; Arashpour, M. Predicting the construction labour productivity using artificial neural network and grasshopper optimisation algorithm. Int. J. Constr. Manag. 2021, 1–17.
Meraihi, Y.; Gabis, A.B.; Mirjalili, S.; Ramdane-Cherif, A. Grasshopper optimization algorithm: Theory, variants, and applications. IEEE Access 2021, 9, 50001–50024.
Nelson, M.M.; Illingworth, W.T. A Practical Guide to Neural Nets; Addison-Wesley Longman Publishing Co., Inc.: Reading, MA, USA, 1991.
Swingler, K. Applying Neural Networks: A Practical Guide; Morgan Kaufmann: London, UK, 1996.
Looney, C.G. Advances in feedforward neural networks: Demystifying knowledge acquiring black boxes. IEEE Trans. Knowl. Data Eng. 1996, 8, 211–226.
Zorlu, K.; Gokceoglu, C.; Ocakoglu, F.; Nefeslioglu, H.A.; Acikalin, S. Prediction of uniaxial compressive strength of sandstones using petrography-based models. Eng. Geol. 2008, 96, 141–158.
Ataei, M.; Mohammadi, S.; Mikaeil, R. Evaluating performance of cutting machines during sawing dimension stones. J. Cent. South Univ. 2019, 26, 1934–1945.
Mikaeil, R.; Mokhtarian, M.; Haghshenas, S.S.; Careddu, N.; Alipour, A. Assessing the System Vibration of Circular Sawing Machine in Carbonate Rock Sawing Process Using Experimental Study and Machine Learning. Geotech. Geol. Eng. 2021, 40, 103–119.
Yang, Y.; Zhang, Q. A hierarchical analysis for rock engineering using artificial neural networks. Rock Mech. Rock Eng. 1997, 30, 207–222.
Quddus, M. Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS. J. Transp. Saf. 2013, 5, 27–45.
Borsati, M.; Cascarano, M.; Bazzana, F. On the impact of average speed enforcement systems in reducing highway accidents: Evidence from the Italian Safety Tutor. Econ. Transp. 2019, 20, 100123.
Dong, C.; Dong, Q.; Huang, B.; Hu, W.; Nambisan, S.S. Estimating Factors Contributing to Frequency and Severity of Large Truck–Involved Crashes. J. Transp. Eng. Part A Syst. 2017, 143, 4017032.
Chang, G.L.; Xiang, H. The Relationship between Congestion Levels and Accidents (No. MD-03-SP 208B46); The National Academies of Sciences, Engineering, and Medicine: Washington, DC, USA, 2003.
Lee, Y.M.; Chong, S.Y.; Goonting, K.; Sheppard, E. The effect of speed limit credibility on drivers’ speed choice. Transp. Res. F Traffic Psychol. Behav. 2017, 45, 43–53.
Vadeby, A.; Forsman, Å. Traffic safety effects of new speed limits in Sweden. Accid. Anal. Prev. 2018, 114, 34–39.
Yu, R.; Abdel-Aty, M. Investigating the different characteristics of weekday and weekend crashes. J. Saf. Res. 2013, 46, 91–97.
Mokhtarimousavi, S.; Anderson, J.; Azizinamini, A.; Hadi, M. Factors affecting injury severity in vehicle-pedestrian crashes: A day-of-week analysis using random parameter ordered response models and Artificial Neural Networks. Int. J. Transp. Sci. Technol. 2020, 9, 100–115.
Asgarzadeh, M.; Fischer, D.; Verma, S.K.; Courtney, T.K.; Christiani, D.C. The impact of weather, road surface, time-of-day, and light conditions on severity of bicycle-motor vehicle crash injuries. Am. J. Ind. Med. 2018, 61, 556–565.

Figure 1. Rural roads accident map in the province of Cosenza (Italy) for the years 2017 and 2018 (source: Regional Center for the Collection of Data on Road Accidents in Calabria—CRISC).

Figure 2. Attraction force, repulsion force, and comfort zone in primary corrective patterns among individuals in a swarm of grasshoppers.

Figure 3. A basic form of a confusion matrix.

Figure 4. Confusion matrices of training data (a), testing data (b), and total data (c).

Figure 5. Comparison between the values of training accuracy, validation accuracy, and testing accuracy for the best-developed models.

Figure 6. Variations of error value in each iteration during GOA-SVM modeling.

Figure 7. Comparison between the best GMDH and GOA-SVM models for training and testing accuracies.

Figure 8. Comparison between the results of sensitivity analysis obtained from the GMDH model (a) and GOA-SVM model (b).
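
As a minimal sketch of the confusion-matrix layout named in Figures 3 and 4, the snippet below counts actual versus predicted classes and derives the accuracy from the main diagonal. The helper names (`confusion_matrix`, `accuracy`) and the sample labels are hypothetical and are not the study's data or code.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of samples on the main diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels, for illustration only (not the study's data).
actual = [1, 1, 2, 2, 2, 1, 2, 1]
predicted = [1, 2, 2, 2, 1, 1, 2, 1]
cm = confusion_matrix(actual, predicted, labels=[1, 2])
print(cm, accuracy(cm))
```

Applying the same two helpers separately to the training, testing, and total samples would give the three panels of the kind shown in Figure 4.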

Table 1. A brief overview of some studies that employed machine learning techniques [41,42,43,44,45,46,47,48,49,50,51,52,53].

Researcher(s)	Type of Techniques	Description
Mussone et al. [41]	ANN	Modeling of urban vehicular accidents using ANN with an assessment of the main circumstances and causes of accidents.
Halim et al. [42]	Review of AI techniques such as: GA, GP, CRF, ANN, PCA, Fuzzy Logic, TD Learning, SVM	Assessment of studies based on AI approaches for accident prediction and identifying dangerous driving situations.
Castro et al. [43]	Bayesian Network, Decision Trees, and ANN	Evaluation of the impact of various factors on injury risk in order to improve the road safety level.
De Luca [44]	MVA, ANN	A comparison of road safety management prediction models on two-lane highways.
Shah et al. [45]	DEA-ANN	Identification and evaluation of the most important criteria in determining the level of road risk.
Liu et al. [46]	ANFIS, Logistic Regression, Decision Tree, and SVM	Examining real-time crash risk for urban freeways as a means of assessing road safety and traffic control decisions.
Guido et al. [47]	GMDH	Assessment of the effective parameters affecting accidents for the urban and rural areas.
Mokhtarimousavi et al. [48]	SVM, CS-SVM, and Logit Model	Temporal examination of accident severity determinants in worker-involved work zone crashes based on random parameters and machine learning methodologies.
Kitali et al. [49]	SVM-FA	Examination of the elements that influence the severity of injuries in crashes on express lanes facilities.
Xu et al. [50]	ANN	Study on the impact of road lighting on traffic safety.
Amiri et al. [51]	ANN, GA-ANN	Forecasting the severity of fixed object accidents among elderly drivers using two types of AI techniques.
Guido et al. [52]	GA, PSO	The use of clustering models to evaluate potential safety factors.
Shiran et al. [53]	ANN-MLP, CHAID, C5.0, and MLR	A comparison of collision severity analysis models for highways.

Table 2. Independent variables (qualitative and quantitative).

Data Field Type	Variable	Code/Unit	Description
Traffic flow characteristics	AADT (veh/day)	1; 2; 3; 4	<5000; 5000–9999; 10,000–14,999; >14,999
Traffic flow characteristics	Avg Speed (km/h)	Not coded	Min 28; Max 122; Avg 91.43
Road environment	Location	0; 1	Non-intersection; Intersection
Road environment	Speed Limit (km/h)	1; 2; 3; 4; 5	50; 70; 90; 110; 130
Environment characteristics	DayLight	0; 1	Daylight; Nighttime
Environment characteristics	Weekday	0; 1	Weekend or Holiday; Weekday
Accident characteristic	Accident Type	1; 2; 3; 4	Collision with vehicle; Collision with pedestrian; Collision with obstacle; Other
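
The coding scheme in Table 2 can be sketched as a small encoder that turns one crash record into the seven model inputs. This is an illustrative sketch only: the function name `encode_record`, the argument names, the string keys for accident type, and the order of the output vector are assumptions, not the authors' preprocessing code.

```python
from bisect import bisect_right

# Class boundaries for AADT (veh/day) from Table 2:
# 1: <5000, 2: 5000-9999, 3: 10,000-14,999, 4: >14,999.
AADT_BOUNDS = [5000, 10000, 15000]
# Speed-limit codes from Table 2 (km/h -> code).
SPEED_LIMIT_CODES = {50: 1, 70: 2, 90: 3, 110: 4, 130: 5}
# Accident-type codes from Table 2; the string keys are hypothetical.
ACCIDENT_TYPE_CODES = {"vehicle": 1, "pedestrian": 2, "obstacle": 3, "other": 4}


def encode_record(aadt, avg_speed, at_intersection, speed_limit,
                  nighttime, weekday, accident_type):
    """Return the seven inputs in a fixed (assumed) order."""
    return [
        bisect_right(AADT_BOUNDS, aadt) + 1,  # AADT class 1..4
        float(avg_speed),                     # km/h, not coded
        1 if at_intersection else 0,          # Location
        SPEED_LIMIT_CODES[speed_limit],       # Speed limit class 1..5
        1 if nighttime else 0,                # DayLight: 0 daylight, 1 nighttime
        1 if weekday else 0,                  # Weekday: 0 weekend/holiday, 1 weekday
        ACCIDENT_TYPE_CODES[accident_type],   # Accident type 1..4
    ]


# Hypothetical record: 12,000 veh/day, 91.43 km/h, non-intersection,
# 90 km/h limit, nighttime, weekday, collision with vehicle.
example = encode_record(12000, 91.43, False, 90, True, True, "vehicle")
print(example)
```

Average speed is passed through unchanged because Table 2 lists it as not coded, while every other field maps to the integer codes in the table.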

Table 3. Equations of different kernel functions.

No	Type of Kernel Function	Equations
1	Linear (LIN)	$G (x_{i}, x_{j}) = x_{i}^{t} x_{j}$
2	Radial basis function (RBF)	$G(x_{i}, x_{j}) = \exp(-γ ‖x_{i} - x_{j}‖^{2})$
3	Polynomial (POL)	$G(x_{i}, x_{j}) = (γ x_{i}^{t} x_{j} + 1)^{d}$
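
The three kernels in Table 3 can be sketched in plain Python as follows. This is a minimal illustration, assuming the common polynomial form $(γ x_{i}^{t} x_{j} + 1)^{d}$; the function names and the sample vectors are hypothetical and do not come from the study.

```python
import math


def dot(x, y):
    """Inner product x^t y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, gamma, degree):
    """(gamma * x^t y + 1)^degree."""
    return (gamma * dot(x, y) + 1.0) ** degree


# Hypothetical two-dimensional inputs, for illustration only.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(rbf_kernel(u, u, gamma=6.17))
print(polynomial_kernel(u, v, gamma=0.5, degree=2))
```

The value `gamma=6.17` is the RBF setting listed in Table 6 for the best GOA-SVM model; the polynomial `gamma` and `degree` above are arbitrary. An RBF kernel always returns 1 for identical inputs and decays toward 0 as the inputs move apart, and a larger γ makes that decay faster.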

Table 4. The accuracies of training and testing of models based on the various control parameters.

Models No.	SP	MNL	MNNL	Training Accuracy (%)	Testing Accuracy (%)
1	0.5	5	5	78.5	78
2	0.5	5	10	79.9	73.8
3	0.5	5	20	79.7	74.5
4	0.5	5	40	80.1	73.8
5	0.5	5	50	80.9	77.3
6	0.5	10	5	78.3	75.2
7	0.5	10	10	77.8	76.6
8	0.5	10	20	77.8	76.6
9	0.5	10	40	79.2	75.9
10	0.5	10	50	82.7	80.1
11	0.5	20	5	78	75.9
12	0.5	20	10	80.9	73.8
13	0.5	20	20	79.9	77.3
14	0.5	20	40	82	80.9
15	0.5	20	50	83.2	81.6
16	0.5	40	5	79.9	77.3
17	0.5	40	10	80.1	78.7
18	0.5	40	20	82.7	78.7
19	0.5	40	40	81.3	79.4
20	0.5	40	50	80.1	76.6
21	0.5	50	5	77.8	76.6
22	0.5	50	10	80.9	74.5
23	0.5	50	20	77.8	76.6
24	0.5	50	40	78.3	75.2
25	0.5	50	50	82	77.3

Table 5. Ranking of developed models.

Models No.	SP	MNL	MNNL	Rating for Training Accuracy	Rating for Testing Accuracy	Total Rank
1	0.5	5	5	16	20	36
2	0.5	5	10	19	14	33
3	0.5	5	20	18	15	33
4	0.5	5	40	20	14	34
5	0.5	5	50	21	19	40
6	0.5	10	5	15	16	31
7	0.5	10	10	13	18	31
8	0.5	10	20	13	18	31
9	0.5	10	40	17	17	34
10	0.5	10	50	24	23	47
11	0.5	20	5	14	17	31
12	0.5	20	10	21	14	35
13	0.5	20	20	19	19	38
14	0.5	20	40	23	24	47
15	0.5	20	50	25	25	50
16	0.5	35	5	19	19	38
17	0.5	35	10	20	21	41
18	0.5	35	20	24	21	45
19	0.5	35	40	22	22	44
20	0.5	35	50	20	18	38
21	0.5	50	5	13	18	31
22	0.5	50	10	21	15	36
23	0.5	50	20	13	18	31
24	0.5	50	40	15	16	31
25	0.5	50	50	23	19	42
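
The ratings in Table 5 appear consistent with a simple dense-ranking rule applied to the accuracies in Table 4: the highest accuracy receives a rating of 25, tied models share a rating, and the total rank is the sum of the two ratings. The sketch below reproduces that rule. It is inferred from the tabulated values rather than taken from the authors' code, and the names `ACCURACIES` and `ratings` are hypothetical.

```python
# (training accuracy %, testing accuracy %) for models 1..25, copied from Table 4.
ACCURACIES = [
    (78.5, 78.0), (79.9, 73.8), (79.7, 74.5), (80.1, 73.8), (80.9, 77.3),
    (78.3, 75.2), (77.8, 76.6), (77.8, 76.6), (79.2, 75.9), (82.7, 80.1),
    (78.0, 75.9), (80.9, 73.8), (79.9, 77.3), (82.0, 80.9), (83.2, 81.6),
    (79.9, 77.3), (80.1, 78.7), (82.7, 78.7), (81.3, 79.4), (80.1, 76.6),
    (77.8, 76.6), (80.9, 74.5), (77.8, 76.6), (78.3, 75.2), (82.0, 77.3),
]


def ratings(values):
    """Dense ranking: the best value gets len(values); ties share a rating."""
    distinct_desc = sorted(set(values), reverse=True)
    top = len(values)
    rating_of = {v: top - i for i, v in enumerate(distinct_desc)}
    return [rating_of[v] for v in values]


train_rating = ratings([tr for tr, _ in ACCURACIES])
test_rating = ratings([te for _, te in ACCURACIES])
total_rank = [a + b for a, b in zip(train_rating, test_rating)]

# Model numbers are 1-based; max() returns the first model with the top total.
best_model = max(range(len(total_rank)), key=total_rank.__getitem__) + 1
print(best_model, total_rank[best_model - 1])
```

Under this rule, model 15 receives the maximum rating of 25 on both criteria and therefore the highest total rank of 50, matching Table 5.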

Table 6. Control parameters of the best developed GOA-SVM model.

No	Control Parameters	Values
1	Grasshopper population size	40
2	Number of iterations	40
3	k-fold	3
4	C	897.25
5	Gamma ( $γ$ ) of RBF kernel	6.17

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Guido, G.; Shaffiee Haghshenas, S.; Shaffiee Haghshenas, S.; Vitale, A.; Astarita, V.; Park, Y.; Geem, Z.W. Evaluation of Contributing Factors Affecting Number of Vehicles Involved in Crashes Using Machine Learning Techniques in Rural Roads of Cosenza, Italy. Safety 2022, 8, 28. https://doi.org/10.3390/safety8020028
