Article

Determination of Optimal Spatial Sample Sizes for Fitting Negative Binomial-Based Crash Prediction Models with Consideration of Statistical Modeling Assumptions

by Mohammadreza Koloushani 1,*,†, Seyed Reza Abazari 2, Omer Arda Vanli 2,†, Eren Erman Ozguven 1,†, Ren Moses 1,†, Rupert Giroux 3 and Benjamin Jacobs 3
1 Department of Civil and Environmental Engineering, FAMU–FSU College of Engineering, Tallahassee, FL 32310, USA
2 Department of Industrial and Manufacturing Engineering, FAMU–FSU College of Engineering, Tallahassee, FL 32310, USA
3 Florida Department of Transportation, State Safety Office, Central Office, Tallahassee, FL 32399, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sustainability 2023, 15(20), 14731; https://doi.org/10.3390/su152014731
Submission received: 17 August 2023 / Revised: 4 October 2023 / Accepted: 9 October 2023 / Published: 11 October 2023

Abstract

Transportation authorities aim to boost road safety by identifying risky locations and applying suitable safety measures. The Highway Safety Manual (HSM) is a vital resource for US transportation professionals, aiding in the creation of Safety Performance Functions (SPFs), which are predictive models for crashes. These models rely on negative binomial distribution-based regression, and misinterpreting them due to unmet statistical assumptions can lead to erroneous conclusions, including inaccurately assessing crash rates or missing high-risk sites. The Florida Department of Transportation (FDOT) has introduced context classifications to HSM SPFs, complicating the identification of assumption violations. This study, part of an FDOT-sponsored project, investigates established statistical diagnostic tests to identify model violations and proposes a novel approach to determine the optimal spatial regions for empirical Bayes adjustment. This adjustment aligns HSM SPFs with regression assumptions. The approach is demonstrated through a case study involving Florida roads. Results indicate that a 20-mile radius offers an optimal spatial sample size for modeling crashes of all injury levels while keeping the modeling assumptions satisfied. For severe-injury crashes, which are less frequent and harder to predict, a 60-mile radius is suggested to fulfill the statistical modeling assumptions. This methodology guides FDOT practitioners in assessing the conformity of HSM SPFs with the intended assumptions and in determining appropriate region sizes.

1. Introduction

Predictive crash models are used extensively by local and state transportation officials to understand the effects of various roadway and traffic-based factors on roadway crash rates, frequencies, and severities and to determine and assess the effectiveness of potential safety countermeasures. Previous studies have delved deeply into the examination of how environmental factors influence the frequency of traffic crashes within specific geographical areas. These investigations have aimed to uncover the intricate relationships between various environmental elements, such as road design, weather conditions, land use patterns, pavement condition, and geographic features, and their collective impact on road safety [1,2]. Lord et al. also conducted a study in 2005 that examined the performance of different regression models in predicting crash frequency, whereas the current study intends to focus primarily on a particular type of predictive model used by the Highway Safety Manual (HSM) [3]. The HSM includes specialized quantitative analysis tools known as safety performance functions (SPFs), which utilize existing data about road infrastructure to forecast the annual average number of accidents occurring at various types of road facilities, such as intersections and road segments, through the use of mathematical equations [4].
SPFs in the HSM have been developed through extensive research and statistical analysis of data collected across the United States’ roadways and are valuable to state and local transportation agencies for their ability to detect areas with safety concerns [5,6,7,8]. SPFs are mathematical models utilizing log-linear regression and adhering to the principles of the negative binomial distribution to estimate the probability of accidents occurring at roadway segments or intersections [9]. SPFs for segments are the subject of this study, and the basic form of the SPF equation uses the Annual Average Daily Traffic (AADT) and segment length as the regressors. Due to the noticeable spatial differences across the US in terms of driving behavior, environmental characteristics, and roadway conditions, the American Association of State Highway and Transportation Officials (AASHTO) recommends that the HSM SPFs be calibrated to better represent regional conditions using multiplicative Crash Modification Factors (CMFs) and Calibration Factors (CFs) [4]. While a considerable number of studies have focused on developing calibration methods to improve the effectiveness of SPFs, limited effort has been devoted to investigating the statistical reasons behind inaccurate crash frequency predictions, which may also lead to incorrect conclusions about how effectively safety countermeasures reduce crash frequency. Depending on the observed data, certain assumptions of the statistical crash prediction model may not be met in practice; while this may not invalidate the analysis results, it is crucial that the practitioner is aware of the limitations [10]. To diagnose common modeling violations that may occur when using negative binomial-based SPFs with geocoded crash data, this paper examines several well-known statistical tests.
It also presents a new approach to determining the optimal spatial region size (which determines how many segments are contained within the historical data) for implementing empirical Bayes adjustments for SPF crash prediction, in order to satisfy the regression assumptions of the model. The issues with model form adequacy (linearity), overdispersion, and the undercounting of zeros in modeling crash data with SPF models are studied, and explicit diagnostic tests are developed. It is worth emphasizing that the adoption of the negative binomial regression model was not a matter of our discretion but an inherent aspect of the methodology endorsed by the Highway Safety Manual (HSM) for the modeling of crash frequencies. In this paper, our principal objective is not the selection of a regression model; instead, it is to offer guidance for determining the optimal spatial sample size for preexisting models, including those prescribed by the HSM. This methodology will help transportation practitioners understand whether the intended modeling assumptions of the HSM SPF equations are in accordance with the crash data observed in the field.
To demonstrate the methodology, a case study focused on modeling roadway segment crashes in FDOT District 4 (including Broward, Indian River, Martin, Palm Beach, and St. Lucie counties) is presented. Multivehicle non-driveway crashes from the years 2015, 2016, 2017, and 2018 occurring on divided two-way four-lane urban and suburban arterial segments (U4D) of the study region are studied. The SPF crash models are estimated and assessed using generalized linear models. In the following sections, we discuss past calibration studies of SPFs, potential statistical violations in the crash models, the data used for our research, the methodology, the results, and the conclusions.

2. Literature Review

Although numerous studies extensively explore the effectiveness of various spatial models like Geographically Weighted Poisson Regression, Bayesian Conditional Autoregressive models (CAR), and various iterations of Extreme Gradient Boosting (XGBoost), this study has a distinct focus [11]. Furthermore, the adoption of innovative data collection techniques, such as Airborne LiDAR [12] and the Aerial Photography Look-Up System (APLUS) [13], facilitated the acquisition of high spatial resolution aerial imagery. This imagery, in turn, enabled the extraction of detailed information regarding drivable spaces and roadway geometry within intricate terrains. However, it is essential to note that the existing HSM SPFs models warrant further investigation for potential statistical violations that may not necessarily align with the requisite assumptions. Hence, the current study intends to introduce statistical diagnostic tests for the detection of model violations and put forward an innovative methodology for determining the optimal spatial regions for the empirical Bayes adjustment used in HSM SPFs. The capability of HSM SPFs has been validated for forecasting the number of accidents based on roadway facility classification, accident severity, and accident type. However, their overall efficiency in various regions compared to the ones similar to the base conditions is still under debate. To aid inter-jurisdictional transferability, recent studies have investigated the possibility of developing new SPF calibration methods instead of redeveloping SPFs [14]. For instance, Srinivasan and Carter conducted extensive research to calibrate the SPFs suggested by the HSM for North Carolina for nine crash types that occurred on 16 roadway types [7]. They also proposed a method to re-develop or re-calibrate SPFs in the future due to the expected changes in vehicle technology, engineering treatments, reporting practices, etc. 
In another research project, Sun et al. calibrated the HSM functions for local conditions in Missouri due to the significant variation in gathering the required data across the state [15]. A highway safety management tool, called Safety Analyst™, was adopted by Kweon and Lim using SPFs developed and calibrated for multilane highway and freeway segments in Virginia [16]. Moreover, SPFs provided in the HSM have been regionalized by Donnell et al. for (1) rural two-lane highway segments and intersections; (2) rural multilane highway segments and intersections; and (3) urban and suburban arterial (non-freeway) segments and intersections in Pennsylvania [17]. Due to the ample dataset available for segments and intersections, they were able to fine-tune the SPFs at various levels including county, planning organization (both metropolitan and rural), and engineering district levels [17]. Khattak et al. [18] and Al-Deek et al. [19] also created customized calibrated models to predict the expected frequency of accidents for different types of road facilities for Tennessee and Florida, respectively.
Crash modification factors (CMFs) are employed to adjust for the influences of site-specific geometric design elements (e.g., lane width, shoulder width, horizontal curves) and traffic control features (e.g., automated speed enforcement). These adjustments are made to estimate accident frequencies for facilities that exhibit variations in design parameters from the baseline conditions upon which the safety performance functions (SPFs) were originally developed [20]. SPFs were established for roadway facilities based on fundamental criteria, including the number of lanes, lane widths, median widths, lighting conditions, and other relevant factors. Adjustments should be made as necessary when a roadway facility exhibits a design that deviates from the base conditions. The CMF for each geometric design or traffic control feature based on the SPF base condition is one, whereas features associated with higher crash frequencies than the base condition have CMFs that are greater than one, and features associated with lower crash frequencies than the base condition have CMFs that are less than one. Despite substantial endeavors in developing and calibrating SPFs within the HSM framework, discrepancies in prediction accuracy were noted in states where substantial variations exist in data collection practices, as well as in officer instructions for completing crash reports, among other factors. For example, Brimley et al. investigated the prediction ability of the models and identified that the SPFs in the HSM typically under-predicted the crash counts for rural two-lane two-way roadway segments based on their study in Utah [21]. Moreover, Gross et al. proposed a guideline in developing CMFs based on the available data and discussed the process for selecting an appropriate evaluation methodology [22]. 
Some states in the US calibrated the prediction models in Part C of the HSM using their own data, including crash frequency and the traffic and geometric features of the roadway. Some other jurisdictional agencies would like to develop their own SPFs instead of calibrating the existing ones from elsewhere to represent crash characteristics better [14,23,24]. In order to avoid the additional costs and efforts of developing local SPFs, Srinivasan et al., in 2016, proposed a methodology to develop a calibration function in case individual calibration factors were unable to provide a proper estimation for actual local crash data [25]. Subsequently, Farid et al. confirmed the proficiency of an innovative approach to developing calibration functions by employing K-Nearest Neighbor (KNN) regression [26].
In addition to the aforementioned CMFs, a multiplicative calibration factor (CF) has been defined by the HSM to improve SPF crash predictions while maintaining the original form of the model and the relation between independent variables and crashes. The HSM recommends that agencies use an unbiased sample of 30 to 50 sites to determine the jurisdiction calibration factor [4]. Srinivasan et al. offer an exhaustive, systematic guide for the development of SPFs and CFs, detailing each step of the process [27]. Certain jurisdictions may exhibit significant disparities in conditions within their boundaries, such as challenging snowy winter driving conditions or variations in the driving population, among other factors. Hence, any SPF calibration must consider these localized variations. The objective of calibration factors is to adjust the predicted average crash frequency obtained from the default manual predictions so that it reflects specific local conditions (e.g., Florida conditions), including climate, driver demographics, animal populations, crash reporting criteria, and crash reporting system protocols [4].
Aside from the application of the CMFs and CFs to the base SPF equation, which enables practitioners to predict the number of crashes at sites with similar characteristics, the HSM recommends applying an empirical Bayes (EB) adjustment, recognizing that the safety of a particular site can be assessed more accurately by taking into account the number of crashes previously observed at that location. This process mitigates the potential regression-to-the-mean effects that can arise from the inherent inclination to choose high-crash sites for treatment. The EB method has been applied to the HSM SPFs for many years and continues to attract attention in the literature [21]. As part of our study, we also reviewed the empirical Bayes method for dealing with overdispersed counts in crash prediction, including Hauer et al. [28], Hauer et al. [29], and Lord and Mannering [30]. More recently, Farid et al. proposed the modified empirical Bayes (MEB) method to develop segment-specific calibration factors for calibrating SPFs [31]. The results indicated that the MEB method outperformed the calibration factor [31]; however, MEB’s practicality remains limited, since it requires sufficient observed crash data to provide a reliable prediction. Furthermore, Persaud et al. (2010) introduced a fully Bayesian (FB) method for conducting before–after treatment assessments in cases where obtaining a substantial set of reference observations, essential for calibrating conventional EB approaches, is challenging. Their approach enhances flexibility in the utilization of crash frequency distributions [32]. While their method gives traffic safety analysts an improved capability to address uncertainty in the sample data compared to the EB approach, it has been acknowledged as a more intricate alternative [32].
In the current approach to the EB adjustment of SPFs, the study region from which the data are collected is assumed to be known and is typically an entire state or a district. Das et al. developed new rules-based SPFs for low-volume rural local roadways on the basis of segment length and AADT as the most contributing factors and proposed a method to improve the model accuracy in terms of R-square and a cumulative residual (CURE) plot [33]. However, choosing a study region that is too large or too small can cause issues with the validity of the HSM-specified model form (goodness of fit) or the overdispersion parameter. Several authors have studied the effects of sample size with respect to goodness of fit [34] and the overdispersion assumption [35]; however, a systematic approach to determining the spatial sample size for a specific SPF model has not been studied. To address this gap, this research effort proposes a new method to determine an optimal spatial sample size for applying the EB adjustment in SPF crash prediction analyses.

3. Study Area and Data Sources

The application of the diagnostic tests and the proposed spatial sample selection method is illustrated on crash data collected from the Florida Department of Transportation (FDOT) District 4 (including Broward, Indian River, Martin, Palm Beach, and St. Lucie counties). The selection of District 4 in Florida was based on several key factors, including its sponsorship by the FDOT, the need to scrutinize model violations, and the alignment of the district’s data and spatial variability with the objectives of our research. The proposed diagnostic tests can be extended to other states and study regions with the necessary localized customizations. The current study explores crashes involving multiple vehicles that occurred away from all types of intersections and were not under their influence. Accordingly, the crash data were filtered using the attributes that represent the number of vehicles involved in the crash, removing crashes involving a single vehicle, pedestrians and bicycles, and intersections. The required crash dataset for the study area was acquired and assembled according to the following steps: (1) excluding crashes associated with intersections by filtering on a crash-related attribute that flags crashes taking place at intersections, crashes influenced by intersections, incidents at driveway accesses, and occurrences at railroad crossings; (2) establishing a 250-foot-radius buffer around the central point of each signalized intersection and eliminating crashes that fall entirely within these buffer zones (the HSM establishes a standard intersection influence zone with a default radius of 250 feet, measured from the intersection’s center [4]); (3) extracting multivehicle non-driveway-related and non-pedestrian/bicyclist-related crash types occurring on the divided two-way four-lane urban and suburban arterial segments (U4D); and (4) utilizing the AADT line feature shapefile for segmentation and assigning crash counts to 50-foot buffer areas along the divided two-way four-lane urban and suburban arterial segments.
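The four filtering steps can be illustrated with a minimal Python sketch. It is not the GIS workflow used in the study: it assumes crash and intersection locations are already projected to planar coordinates in feet, that each segment is a straight line between two end points, and that the 50-foot buffer means a maximum distance of 50 feet from the segment line. The field names (`xy`, `vehicles`, `intersection_related`) are hypothetical, and the single `intersection_related` flag stands in for the crash-report attributes covering intersections, driveways, railroad crossings, and pedestrian/bicyclist involvement.

```python
import math

INTERSECTION_BUFFER_FT = 250.0  # HSM default intersection influence radius
SEGMENT_BUFFER_FT = 50.0        # assumed: maximum distance from the segment line


def point_to_segment_distance(p, a, b):
    """Shortest planar distance (feet) from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_segment_crashes(crashes, signalized_intersections, segments):
    """Apply filtering steps (1)-(4) and return crash counts per segment id."""
    counts = {seg_id: 0 for seg_id in segments}
    for crash in crashes:
        # Steps (1) and (3): drop flagged crashes and single-vehicle crashes.
        if crash["intersection_related"] or crash["vehicles"] < 2:
            continue
        x, y = crash["xy"]
        # Step (2): drop crashes inside any 250-ft intersection buffer.
        if any(math.hypot(x - ix, y - iy) <= INTERSECTION_BUFFER_FT
               for ix, iy in signalized_intersections):
            continue
        # Step (4): assign the crash to the nearest segment within the buffer.
        nearest_id, nearest_dist = None, SEGMENT_BUFFER_FT
        for seg_id, (start, end) in segments.items():
            dist = point_to_segment_distance((x, y), start, end)
            if dist <= nearest_dist:
                nearest_id, nearest_dist = seg_id, dist
        if nearest_id is not None:
            counts[nearest_id] += 1
    return counts
```

In practice these operations would be carried out with buffer and spatial-join tools on the shapefiles; the sketch only makes the order and logic of the filters explicit.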
SPFs for two separate crash severities were calibrated for the final crash data set: (1) all crash types occurring on the U4D arterial segments, and (2) Fatal-and-Injury crash types occurring on the U4D arterial segments. Fatal-and-Injury (FI) crashes involve all levels of injury severity, i.e., fatalities (K), incapacitating injuries (A), non-incapacitating injuries (B), and possible injuries (C). A crash count can be determined by implementing the EB-based method on appropriate SPF data by counting the crash points and assigning them to their associated U4D arterial segments. Multivehicle non-driveway crashes from the years 2015, 2016, 2017, and 2018 are obtained from crash reports maintained by the Florida Department of Highway Safety and Motor Vehicles (FLHSMV). The crash data consists of individual points distributed across the road network, with each point denoting a vehicle crash and geospatially linked to the GIS shapefile through longitude and latitude coordinates. Table 1 summarizes the crash data categorized with respect to their associated KABCO scale that occurred in Florida District 4 during the study period. Furthermore, Figure 1 illustrates how the aforementioned crashes are distributed throughout the study area.
To follow the objectives of the research, the HSM SPFs constructed for multiple-vehicle non-driveway crashes on U4D arterial segments were subjected to the diagnostic tests. These SPFs formulate the predicted crash frequency based on several traffic and roadway geometric factors, including Annual Average Daily Traffic (AADT), segment length in miles, number of lanes, etc. The traffic and geometric attributes are typically furnished by the relevant divisions within transportation departments in the form of shapefiles or as-constructed drawings illustrating the roadway’s geometric characteristics. The current research validates the SPFs using data from the following databases: (1) historical AADT volume measurements for state roadways from the FDOT Telemetered Traffic Monitoring Sites (TTMS) databases maintained by the Transportation Data and Analytics (TDA) office and (2) the FDOT Geographic Information System (GIS) for roadway variables (e.g., speed limit, number of lanes, intersect angle) [36]. For the District 4 study area, we utilized the AADT shapefile for segmentation, with AADT calculated as the average over the study period, i.e., 2015 to 2018. According to this segmentation criterion, District 4 has 1067 roadway segments. Based on the methodology described in the following section, the size of the geographical sub-region to be used in SPF modeling was identified.

4. Methodology

The Highway Safety Manual (HSM) provides a procedure in which 18 steps can be followed to estimate the expected average crash frequency using SPF crash prediction models [4]. The main objective of this study was to develop diagnostic tests for identifying modeling violations that may be encountered in using the negative binomial-based HSM SPFs in crash count modeling. As such, we intend to propose a methodology to determine the optimal size of the crash data set according to the associated level of injury in order to ensure that the modeling assumptions for SPF crash prediction are reasonably accurate when implementing the empirical Bayes (EB) method. Figure 2 provides a schematic overview of the steps for creating the data set and applying the SPF for crash prediction within the proposed methodology for the case study of multivehicle non-driveway crashes occurring along U4D arterial segments.
Accordingly, a subset of crash data including the multiple-vehicle crashes that were not affected by intersections and driveways was prepared. To simplify the estimation and avoid applying Crash Modification Factors (CMFs) associated with base conditions, SPFs were constructed for multiple-vehicle non-driveway crashes for divided two-way (two lanes in each direction) four-lane urban and suburban arterial segments in the study regions under the base conditions (i.e., the absence of automated speed enforcement, prohibited on-street parking, no lighting, no roadside fixed object density). As per the HSM’s description, U4D segments are characterized by four lanes with uninterrupted cross-sectional spaces, featuring two lanes allocated to each direction of travel, with a physical separation between them, which may include a distance or barrier [4]. The FDOT GIS database provides a shapefile containing a spatial attribute representing the number of lanes [36]. The HSM recommends the following negative binomial regression model for predicting multiple-vehicle non-driveway crashes:
$$N_{SPF} = \exp\left(\beta_0 + \beta_1 \ln(AADT) + \ln(L)\right) \tag{1}$$
where $AADT$ is the annual average daily traffic, $L$ is the segment length (miles), and $\beta_0$ and $\beta_1$ are the regression coefficients. The HSM formulates regression coefficients for different crash categories based on the most severe level of injury [4]. Hence, we classify the crash data according to the KABCO scale and apply the relevant regression coefficients proposed by the HSM for the purpose of predicting crash frequencies. Table 2 presents the values of the coefficients $\beta_0$ and $\beta_1$ reported in the HSM to be used in applying Equation (1) for U4D segments.
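As an illustration of Equation (1), the following minimal Python sketch evaluates the SPF for a single segment. The function name `predict_spf` and the coefficient values in the usage lines are assumptions made for this example; the actual coefficients should be taken from Table 2 for the crash severity of interest.

```python
import math


def predict_spf(aadt, length_mi, beta0, beta1):
    """Predicted crash frequency from Equation (1):
    exp(beta0 + beta1 * ln(AADT) + ln(L))."""
    if aadt <= 0 or length_mi <= 0:
        raise ValueError("AADT and segment length must be positive")
    return math.exp(beta0 + beta1 * math.log(aadt) + math.log(length_mi))


# Illustrative placeholder inputs (hypothetical, not taken from Table 2).
example_prediction = predict_spf(aadt=30000, length_mi=0.5, beta0=-12.34, beta1=1.36)
```

Because the coefficient of $\ln(L)$ is fixed at one, the prediction is proportional to segment length: doubling $L$ doubles the predicted crash frequency, so segment length acts as an exposure offset rather than an estimated effect.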
In order to enhance the accuracy of the SPFs’ prediction results, a calibration is performed by applying a multiplicative Calibration Factor (CF), so that the aggregate crash prediction within a whole jurisdiction is equal to the aggregate number of observed crashes. Accordingly, the observed crashes that occurred in the study area between 2015 and 2018 are counted by creating a 50-foot-wide buffer along the homogeneous roadway segments, counting the crashes located within the buffer, and assigning them to the segments. Calibrating the model preserves the original model form and modifies the predicted average crash frequency from the default manual predictions to account for local characteristics (i.e., Florida). According to the FDOT’s recommendations, the CF for an urban four-lane divided roadway (U4D) is equal to 1.63 [37].
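The calibration idea described above, matching aggregate predictions to aggregate observations, can be sketched as follows. This is a minimal illustration assuming the CF is computed as the ratio of total observed to total predicted crashes over the calibration sites; the function name and the input numbers are hypothetical and are not the FDOT procedure or data.

```python
def calibration_factor(observed_counts, predicted_counts):
    """Ratio of total observed crashes to total uncalibrated SPF predictions."""
    total_predicted = sum(predicted_counts)
    if total_predicted <= 0:
        raise ValueError("total predicted crash frequency must be positive")
    return sum(observed_counts) / total_predicted


# Hypothetical per-segment values for illustration only.
observed = [3, 5, 2]
predicted = [2.0, 2.5, 1.5]
cf = calibration_factor(observed, predicted)
calibrated = [cf * value for value in predicted]  # sums to the observed total
```

Multiplying every prediction by the same factor rescales the totals without changing the relative ranking of segments, which is why calibration preserves the original model form.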
Given the precision of our crash data, the site-specific empirical Bayes (EB) method is a suitable choice. Consequently, Equation (2) has been incorporated to implement the empirical Bayes method. This decision is grounded in the understanding that the most accurate estimation of a site’s safety takes into account both the actual crash count at the site and the expected crash count at sites possessing comparable attributes as predicted by the SPFs. For District 4, the average crash frequency is predicted based on the crashes that occurred between 2015 and 2017 and validated using the observed crashes in 2018.
$$N_{Expected} = w \times N_{SPF} + (1.00 - w) \times N_{Observed} \tag{2}$$
where $w$ is a weight factor, defined as a function of the SPF overdispersion parameter $k$ (see Table 2), that combines the two estimates:
$$w = \frac{1}{1 + k \sum_{i \in \text{All study years}} N_{predicted,i}} \tag{3}$$
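A minimal Python sketch of the empirical Bayes adjustment in Equations (2) and (3) is given below. It assumes that $N_{SPF}$ and $N_{Observed}$ are both totals over the same study years and that yearly predictions are supplied as a list; the function names and the numbers in the usage lines are illustrative assumptions.

```python
def eb_weight(k, predicted_by_year):
    """Equation (3): w = 1 / (1 + k * sum of yearly predicted crashes)."""
    return 1.0 / (1.0 + k * sum(predicted_by_year))


def eb_expected(k, predicted_by_year, observed_total):
    """Equation (2): weighted average of the SPF prediction and the observed count."""
    w = eb_weight(k, predicted_by_year)
    n_spf = sum(predicted_by_year)
    return w * n_spf + (1.0 - w) * observed_total


# Hypothetical segment: 2 predicted crashes per year for 3 years, 9 observed.
expected = eb_expected(k=1.32, predicted_by_year=[2.0, 2.0, 2.0], observed_total=9)
```

A larger overdispersion parameter or a larger predicted total reduces $w$, so the estimate leans more on the observed count; with $k = 0$ the weight equals one and the estimate reduces to the SPF prediction.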
To test the adequacy of the SPF model, we fit a negative binomial (NB) regression model to the data using the functional form and the variables specified in the SPF for each crash severity and assess the adequacy of the model using statistical diagnostic tests. In particular, we employ three model adequacy tests: a test for linearity (adequacy of the functional form) of the NB model, a test for the closeness of the estimated overdispersion parameter to the HSM value, and a test for excess zeros in the NB model. For linearity, we employ a chi-square goodness of fit (GOF) test, which asks whether the assumed linear model functional form is adequate. Assuming that the model is valid, the deviance of the NB model follows a chi-square distribution with $n - p$ degrees of freedom. For an NB regression [38], the deviance is calculated using Equation (4); the test rejects the hypothesis that the model functional form is adequate if the p-value of the test, found using Equation (5), is below some level of significance (typically 0.05).
$$D = 2 \sum_{i=1}^{n} \left[ y_i \log\left(y_i / \hat{\mu}_i\right) - \left(y_i - \hat{\mu}_i\right) \right] \tag{4}$$
$$p_{GOF} = P\left(\chi^2_{n-p} > D\right) \tag{5}$$
Therefore, for a given geographical region, if $p_{GOF}$ is close to one, the functional form of the HSM-specified NB model is adequate. The second test checks whether the overdispersion parameter estimated from the data agrees well with the overdispersion value published in the HSM for a given facility (roadway segment or intersection) and crash type (see Table 2). The overdispersion parameter is estimated based on Equation (6).
$$\hat{k} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{\left(y_i - \hat{\mu}_i\right)^2}{\hat{\mu}_i} \tag{6}$$
where $n$ is the number of years of data used to fit the model, $p$ is the number of explanatory variables in the SPF, and $y_i$ and $\hat{\mu}_i$ are the observed crash counts and the crash counts predicted by the SPF, respectively. For the HSM-specified NB model to be adequate in terms of the overdispersion parameter, we want $\hat{k}$ to be close to $k_{HSM}$, the overdispersion parameter published in the HSM for a given facility (see Table 2). The third test determines whether there is a larger number of zeros or small counts in the crash data than the NB regression model can represent. If the data set contains a large number of zeros (the zero-inflation problem), the predictive capability of the NB regression model can be adversely impacted. The likelihood ratio test compares the fit of a zero-inflated NB regression model (which contains a hurdle part) to an NB regression for the given data [38]. The two parts of the zero-inflated model, the hurdle (the zero) model and the count (the nonzero) model, can have different sets of explanatory variables. The likelihood ratio test based on Equation (7) determines whether a hurdle part, which models only the zeros using a binary logit, is needed in the NB regression model.
$$LLR = 2\left(\ln L_1 - \ln L_2\right) \tag{7}$$
where $L_1$ is the maximized likelihood of a zero-inflated NB regression model (model 1), which includes the hurdle part, and $L_2$ is the maximized likelihood of the NB regression model (model 2), which does not include the hurdle part. Because the two models are hierarchical, with model 1 encompassing model 2, the likelihood ratio test statistic will conform to a $\chi^2$ distribution, where the degrees of freedom are determined by the difference in the number of parameters, denoted as $p_1 - p_2$. This holds true if both models 1 and 2 exhibit equivalent goodness-of-fit, meaning that the inclusion of the hurdle part does not enhance the model’s fit. If the p-value of the likelihood ratio test, defined as Equation (8), is significant (e.g., smaller than 0.05), then we conclude that there are excess zeros or small counts in the data and a zero-inflated NB should instead be used to model the crash data.
$$p_{ZI} = P\left(\chi^2_{p_1 - p_2} > LLR\right) \tag{8}$$
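The three diagnostic quantities in Equations (4) to (8) can be sketched in plain Python as follows. The sketch assumes that the fitted means $\hat{\mu}_i$ and the two maximized log-likelihoods have already been obtained from a regression package; it takes $n$ as the number of observations entering the sums, which is an interpretation made for this example. The chi-square tail probability is computed with a standard incomplete-gamma routine (series expansion and continued fraction) so that the block has no external dependencies; all function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0.0:
        return 1.0
    a, hx = df / 2.0, x / 2.0
    log_prefactor = -hx + a * math.log(hx) - math.lgamma(a)
    if hx < a + 1.0:
        # Series expansion of the regularized lower incomplete gamma function.
        term = total = 1.0 / a
        ap = a
        for _ in range(10000):
            ap += 1.0
            term *= hx / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper incomplete gamma.
    tiny = 1e-300
    b = hx + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefactor) * h


def deviance(y, mu):
    """Equation (4); the term y*log(y/mu) is taken as zero when y is zero."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def gof_pvalue(y, mu, n_params):
    """Equation (5): chi-square tail probability of the deviance with n - p df."""
    return chi2_sf(deviance(y, mu), len(y) - n_params)


def estimate_overdispersion(y, mu, n_params):
    """Equation (6): scaled sum of squared residuals divided by the fitted mean."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (len(y) - n_params)


def zero_inflation_pvalue(loglik_zero_inflated, loglik_nb, extra_params):
    """Equations (7) and (8): likelihood ratio test for the hurdle part."""
    llr = 2.0 * (loglik_zero_inflated - loglik_nb)
    return chi2_sf(llr, extra_params)
```

A value of `gof_pvalue` near one supports the SPF functional form, `estimate_overdispersion` is compared against the HSM value, and a small `zero_inflation_pvalue` (for example below 0.05) points to excess zeros.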
Utilizing the statistical diagnostic tests, this paper proposes a new spatial scan method to determine the best region size to use in estimating an EB-adjusted SPF model. The method considers a sequence of overlapping circular subregions and uses all the historical crash counts observed for the segments within each subregion to fit an SPF model. For a given subregion radius $R$, subregion $i$ is constructed by pooling the crash data for 3 years (2015–2017), the AADT, and the length data for the $i$th segment and all segments contained within a radius of $R$ miles. A negative binomial model (both ordinary and zero-inflated) of the form given in Equation (1) is fitted to the data, the linearity test p-value $p_{GOF,i}$ is computed using Equation (5), and the overdispersion parameter $\hat{k}_i$ is computed using Equation (6). The scanning is repeated for $n$ overlapping circular subregions in the study region and the metrics are computed for $i = 1, 2, \ldots, n$. To combine the metrics obtained from all $n$ subregions in the entire study region into a single number, the following summary metrics are defined:
$$S_{overdisp} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\hat{k}_i}{k_{HSM}} - 1 \right)^2 \qquad (9)$$
$$S_{nonlinear} = \frac{1}{n} \sum_{i=1}^{n} \left( p_{GOF,i} - 1 \right)^2 \qquad (10)$$
$$S_{overall} = S_{overdisp} + S_{nonlinear} \qquad (11)$$
$S_{overdisp}$ combines the deviations of the overdispersion parameters $\hat{k}_i$, estimated using subregions with a radius of $R$ miles, from the HSM overdispersion parameter $k_{HSM}$, which is taken as the ideal value. $S_{nonlinear}$ combines the deviations of the goodness-of-fit p-values $p_{GOF,i}$, obtained using subregions with a radius of $R$ miles, from the ideal value of 1, which implies that the linear SPF model form is a perfect fit. The analysis is repeated and $S_{overall}$ is calculated for increasing values of $R$. The goal of the spatial scan analysis is to identify a subregion size that satisfies the statistical modeling assumptions and to identify the subregions within the study region that cannot be adequately represented using an NB regression model. A large value of the combined metric $S_{overall}$ therefore indicates that the subregion radius $R$ should be revised (to a smaller or larger value) so that the modeling assumptions are better satisfied. The empirical results from the study area reveal that the subregion radius $R$ has little impact on the zero-inflation test results; for this reason, the method does not include a summary metric for the zero-inflation test.
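The summary metrics can be sketched in a few lines of Python. This is an illustration only: it reads Equations (9) to (11) as mean squared deviations from the ideal values, and the function name and argument names are assumptions rather than part of the study's software. The example call uses the single-segment values reported in Table 3 for the 10-mile radius, so it shows the arithmetic for $n = 1$ rather than a study-wide result.

```python
from typing import Sequence, Tuple


def summary_metrics(k_hat: Sequence[float], p_gof: Sequence[float],
                    k_hsm: float) -> Tuple[float, float, float]:
    """Return (S_overdisp, S_nonlinear, S_overall) for n subregions of one radius."""
    if len(k_hat) != len(p_gof) or len(k_hat) == 0:
        raise ValueError("k_hat and p_gof must be non-empty and of equal length")
    n = len(k_hat)
    # Mean squared relative deviation of the estimated overdispersion from the HSM value.
    s_overdisp = sum((k / k_hsm - 1.0) ** 2 for k in k_hat) / n
    # Mean squared deviation of the goodness-of-fit p-values from the ideal value of 1.
    s_nonlinear = sum((p - 1.0) ** 2 for p in p_gof) / n
    return s_overdisp, s_nonlinear, s_overdisp + s_nonlinear


# Single sample segment, 10-mile radius (values from Table 3), HSM k = 1.32.
s_od, s_nl, s_all = summary_metrics(k_hat=[0.6469], p_gof=[0.0204], k_hsm=1.32)
print(f"S_overdisp={s_od:.4f}  S_nonlinear={s_nl:.4f}  S_overall={s_all:.4f}")
```

Both components are zero for an ideal model, so smaller values of the combined metric indicate a radius at which the assumptions are better satisfied.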

5. Results and Discussion

In this section, we present the diagnostic results of the SPFs with data extracted from Florida roadways. The objective of the diagnostic tests is to determine whether the $N_{SPF}$ functional form given earlier in Equation (1) represents the observed crash data well. To compute the diagnostic test statistics, the coefficients are estimated from the observed crash data using the generalized linear modeling suite in the R statistical computing language. Depending on the observed crash counts, the estimated coefficients may therefore differ from the values given in the HSM (see Table 2). In addition, the overdispersion parameter estimated from the data using Equation (6) is compared to the HSM value (k = 1.32) provided in Table 2.
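For orientation, the snippet below evaluates an SPF with the HSM coefficients listed in Table 2. Equation (1) is not reproduced in this section, so the sketch assumes it follows the HSM log-linear form $N_{SPF} = \exp(\beta_0 + \beta_1 \ln(AADT) + \ln(L))$ with segment length $L$ in miles; the function name and the example AADT and length are hypothetical.

```python
import math

# (beta_0, beta_1, overdispersion k) for multiple-vehicle non-driveway crashes on U4D,
# as listed in Table 2.
HSM_U4D_COEFFS = {
    "KABCO": (-12.34, 1.36, 1.32),
    "KABC": (-12.76, 1.28, 1.31),
    "PDO": (-12.81, 1.38, 1.34),
}


def spf_prediction(aadt: float, length_miles: float, severity: str = "KABCO") -> float:
    """Predicted crash frequency per year under the assumed log-linear SPF form."""
    if aadt <= 0 or length_miles <= 0:
        raise ValueError("AADT and segment length must be positive")
    beta0, beta1, _k = HSM_U4D_COEFFS[severity]
    return math.exp(beta0 + beta1 * math.log(aadt) + math.log(length_miles))


# Hypothetical segment: 20,000 vehicles per day, 1 mile long.
print(f"Predicted total crashes per year: {spf_prediction(20000, 1.0):.2f}")
```

Under this form the prediction scales linearly with segment length, which is why length enters the model as an offset rather than as an estimated coefficient.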
As an initial illustration, negative binomial regression models are fitted to the crash count, AADT, and segment length data for 2015, 2016, and 2017 in the subregion around a sample segment. The residual diagnostic test results for the fitted models are shown in Table 3. The results indicate that the linearity of the NB model with a 10-mile radius is better than that of the NB model with a 15-mile radius (the goodness-of-fit p-value of the 10-mile-radius model is larger). The likelihood ratio test (LRT) for zero inflation also favors the 10-mile radius (its p-value is larger than that of the 15-mile-radius model, indicating weaker evidence of excess zeros). The estimated overdispersion parameter of the 15-mile-radius model is closer to the HSM overdispersion parameter than that of the 10-mile-radius model.
From this single case, it appears that a smaller subregion is better in terms of functional form accuracy and excess zeros, whereas a larger subregion is better in terms of overdispersion parameter estimation. To extend the analysis beyond a single roadway segment, the model fitting and diagnostic testing steps are repeated for all roadway segments within the study region, considering the roadway network distance between the centroids of adjacent segments. A spatial scan of all roadway segments in the study region is conducted, and the diagnostic metrics are computed from overlapping circular subregions centered at the roadway segments. The scanning analysis is conducted for each of the 1068 roadway segments in the study area and for all crash counts. Each summary metric, defined by Equations (9)–(11), combines the deviations of the subregion models from an ideal model, so a value close to 0 indicates good overall performance. Figure 3 illustrates the segments within the study area that violate the diagnostic criteria based on the test statistics, for a variety of radius values. For nonlinearity in negative binomial regression (Figure 3a), the segments for which $p_{GOF,i} < 0.015$ are highlighted. For low overdispersion (Figure 3b), the segments for which $\hat{k}_i < 0.55\,k_{HSM}$ are highlighted (i.e., 55% of the HSM value is used as the cut-off). For zero inflation (Figure 3c), the segments for which $p_{ZI,i} < 0.05$ are highlighted. For radii of less than 20 miles, zero-inflated models are not estimable, so the maps of the zero-inflation test's p-values are not shown for these radius values.
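The pooling step of the spatial scan can be sketched as follows. This is a simplified illustration, not the study's implementation: it measures straight-line (great-circle) distance between segment centroids, whereas the text above refers to roadway network distance, and the `Segment` class, function names, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Segment:
    segment_id: str
    lat: float  # centroid latitude in degrees
    lon: float  # centroid longitude in degrees


def haversine_miles(a: Segment, b: Segment) -> float:
    """Great-circle distance between two segment centroids, in miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def build_subregions(segments: Sequence[Segment], radius_miles: float) -> Dict[str, List[str]]:
    """For each segment, list the IDs of all segments whose centroid lies within the radius."""
    return {
        center.segment_id: [
            s.segment_id for s in segments
            if haversine_miles(center, s) <= radius_miles
        ]
        for center in segments
    }


# Hypothetical centroids, for illustration only.
segments = [
    Segment("A", 26.0, -80.2),
    Segment("B", 26.1, -80.2),
    Segment("C", 27.0, -80.2),
]
print(build_subregions(segments, radius_miles=10.0))
```

Each subregion includes its own center segment, so the subregions overlap, and the data pooled for segment $i$ would then be passed to the model fitting and diagnostic tests. The pairwise loop is quadratic in the number of segments, which is manageable for roughly a thousand segments; a spatial index would be a natural replacement for larger networks.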
These results indicate that the number of segments violating the assumptions depends on the subregion radius. The edges of the study region also appear to affect the diagnostic metric values. With a very large radius, the linear model is unable to capture the variation in the crash counts and nonlinearity becomes an issue; with too small a radius, the overdispersion is represented inaccurately. Increasing the radius generally improves the overdispersion estimate but degrades linearity. Subregions with overdispersion issues are usually less problematic when a larger radius is used; that is, a region identified as less overdispersed than the HSM value may no longer be identified as such with a larger radius. Conversely, subregions with nonlinearity issues tend to be less problematic when a smaller radius is used, so a region identified as nonlinear may not be so with a smaller radius. The zero-inflation issue remains essentially unchanged as the radius increases.
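The cut-offs used to highlight segments in Figure 3 ($p_{GOF,i} < 0.015$, $\hat{k}_i < 0.55\,k_{HSM}$, and $p_{ZI,i} < 0.05$) can be expressed as a small check. The function below is a sketch with hypothetical names; the zero-inflation flag is left undefined (`None`) when no zero-inflated model could be estimated. The example inputs are the sample-segment values from Table 3.

```python
from typing import Dict, Optional


def flag_violations(p_gof: float, k_hat: float, k_hsm: float,
                    p_zi: Optional[float] = None) -> Dict[str, Optional[bool]]:
    """Apply the three highlighting criteria to the diagnostics of one subregion."""
    return {
        # Severe nonlinearity: goodness-of-fit p-value below 0.015.
        "nonlinearity": p_gof < 0.015,
        # Low overdispersion: estimate below 55% of the HSM value.
        "low_overdispersion": k_hat < 0.55 * k_hsm,
        # Zero inflation: likelihood ratio test p-value below 0.05, if available.
        "zero_inflation": None if p_zi is None else p_zi < 0.05,
    }


# Sample segment from Table 3, HSM k = 1.32.
print("10-mile:", flag_violations(p_gof=0.0204, k_hat=0.6469, k_hsm=1.32, p_zi=0.9851))
print("15-mile:", flag_violations(p_gof=0.0054, k_hat=0.6600, k_hsm=1.32, p_zi=0.2623))
```

With these inputs the sample segment is flagged for low overdispersion at both radii (the cut-off is 0.55 × 1.32 = 0.726) and for nonlinearity only at the 15-mile radius.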
The same procedure was followed for KABC crashes. The results indicate that the zero-inflation problem is more prominent than for all crashes, since KABC crashes occur less frequently (see Table 1); however, the linearity-in-parameters and overdispersion assumptions are more closely satisfied. The summary measures (Equations (9)–(11)) were computed for a range of radius values and are shown in Figure 4. For all crashes, the summary measures were computed for radii of 7, 10, 13, 16, 20, and 25 miles (Figure 4a), and the 20-mile radius provides the best tradeoff between nonlinearity and overdispersion. Because the summary metrics are defined as averages of the model results from all segments in the study area, the recommended 20-mile radius is an adequate subregion size for analyzing any roadway segment regardless of its location within District 4. Our recommendation is to use the HSM-recommended SPF with a 20-mile radius for all segments except those shown in purple in Figure 3c, for which a zero-inflated NB regression is recommended. For KABC crashes, the summary measures were computed for radii of 20, 30, 40, 50, 60, and 80 miles (Figure 4b). The overall summary metric continues to decrease with increasing subregion radius, but the rate of decrease tapers off at around a 60-mile radius. Therefore, for modeling KABC crashes, a region with a radius of at least 60 miles is recommended to satisfy the assumptions of the SPF model.
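The two selection patterns described above (a minimum of the overall metric for all crashes, and a tapering decrease for KABC crashes) could be formalized as in the sketch below. This is one possible formalization rather than the procedure used in the study, which reads the radii from Figure 4: the elbow rule, its 5% tolerance, and the function names are assumptions, and the metric values are placeholder numbers chosen only to mimic the described curve shapes, not values from Figure 4.

```python
from typing import Dict


def radius_at_minimum(s_overall: Dict[float, float]) -> float:
    """Radius with the smallest overall summary metric."""
    return min(s_overall, key=s_overall.get)


def radius_at_elbow(s_overall: Dict[float, float], rel_tol: float = 0.05) -> float:
    """First radius after which the next step improves the metric by less than rel_tol."""
    radii = sorted(s_overall)
    for current, nxt in zip(radii, radii[1:]):
        drop = s_overall[current] - s_overall[nxt]
        if drop < rel_tol * s_overall[current]:
            return current
    # The metric is still improving at the largest radius examined.
    return radii[-1]


# Placeholder values (hypothetical), keyed by radius in miles.
all_crashes = {7: 0.90, 10: 0.80, 13: 0.72, 16: 0.66, 20: 0.60, 25: 0.70}
kabc_crashes = {20: 0.80, 30: 0.60, 40: 0.48, 50: 0.42, 60: 0.39, 80: 0.385}

print("All crashes:", radius_at_minimum(all_crashes), "miles")
print("KABC crashes:", radius_at_elbow(kabc_crashes), "miles")
```

An elbow rule of this kind is sensitive to the tolerance and to the spacing of the radii examined, so it is best treated as a screening aid alongside a visual check of the curve.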
While the proposed radii of 20 and 60 miles ensure that the statistical assumptions associated with HSM SPFs are met, such relatively large regions probably contribute to unexplained heterogeneity within the specified region in terms of its spatial, geometric, and traffic characteristics. To address this issue, the FDOT developed and implemented a classification method that categorizes roadway segments, based on the existing land use and development pattern, into eight main categories: C1-Natural, C2-Rural, C2T-Rural Town, C3R-Suburban Residential, C3C-Suburban Commercial, C4-Urban General, C5-Urban Center, and C6-Urban Core [19,39,40]. The recommended diagnostic examination can therefore be applied as a segmented analysis by leveraging the FDOT's context classification system, which classifies a diverse range of roadway network facilities.

6. Conclusions and Future Work

This paper developed a diagnostic procedure to determine how and to what extent crash data violate the statistical assumptions associated with linearity in negative binomial regression, overdispersion, and zero inflation. Based on the methodology developed in this research, the size of the geographical subregion to be used in SPF modeling is identified. We conclude that if the interest is in the SPF prediction of crashes of all injury levels, crash data for 3 years from all roadway segments within a 20-mile radius of the roadway segment under study should be used to implement the EB method. By contrast, if the interest is in predicting injury crashes (KABC), which are rarer events, data should be gathered from a larger region with a radius of at least 60 miles. The 20-mile and 60-mile radii were selected through a quantitative analysis of crash data and statistical modeling diagnostics. These radii are chosen to ensure that the statistical assumptions are fulfilled, particularly those of negative binomial regression, despite the inherent variability within transportation networks. Transportation systems naturally exhibit variations in road attributes, weather conditions, and traffic, even within relatively small areas. Our approach acknowledges these real-world complexities and guides practitioners in assessing SPF conformity with statistical assumptions while allowing flexibility for adjustments to suit specific conditions and goals. The method balances statistical rigor and practical applicability, offering insights for road safety assessments within the dynamic context of transportation systems. Knowledge gained from this study can help practitioners decide how to take corrective action to satisfy the assumptions.
If the assumption of linearity in the model parameters is in question, different algebraic transformations of AADT and/or length, such as the square root or reciprocal, may be tried instead of the logarithmic transformation. If the overdispersion parameter is in question, the overdispersion parameter estimated from the data may be used instead of the HSM recommendation. If excess zeros are in question, a zero-inflated NB regression may be used. In practice, some of the modeling assumptions may not be met, depending on the observed data; this does not invalidate the analysis results as long as the practitioner is aware of the limitations. Some violations have a small impact on crash modeling, while others are more detrimental. The suggested diagnostic test can also be applied in a segmented analysis using the context classification system implemented by the FDOT, which categorizes various types of roadway network facilities. Furthermore, the proposed method for determining an appropriate sample size has the potential to be integrated into machine learning-based validation techniques, thereby enhancing the accuracy and consistency of predictive tools for network screening [41]. In summary, the benefits of our method include more accurate road safety assessments, improved resource allocation, context-aware analysis, and the potential for integration with machine learning techniques. These advantages contribute to the overall goal of reducing traffic accidents and enhancing road safety, which is of paramount importance for transportation authorities and society at large. Although our case study was conducted in Florida, the methodology and principles established here are transferable and can be employed by transportation authorities in regions outside of Florida.
Through the adoption of our approach, authorities have the potential to enhance the precision of their road safety assessments, optimize resource allocation, and ultimately make strides in reducing traffic accidents and elevating the road safety conditions within their specific jurisdictions.

Author Contributions

Conceptualization, M.K., O.A.V., E.E.O., R.M., R.G. and B.J.; methodology, M.K., S.R.A., O.A.V., E.E.O., R.M., R.G. and B.J.; software, M.K. and O.A.V.; validation, M.K., S.R.A., O.A.V., E.E.O. and R.M.; formal analysis, M.K. and O.A.V.; investigation, M.K., O.A.V., E.E.O. and R.M.; resources, O.A.V., R.G. and B.J.; data creation, M.K.; writing—original draft preparation, M.K., O.A.V., E.E.O. and R.M.; writing—review and editing, M.K., O.A.V., E.E.O. and R.M.; visualization, M.K.; supervision, O.A.V., E.E.O., R.M., R.G. and B.J.; project administration, O.A.V.; funding acquisition, O.A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This study was sponsored by the State of Florida Department of Transportation (FDOT) grant BDV30-945-001. The opinions, findings, and conclusions expressed in this paper are those of the authors and not necessarily those of the State of Florida’s Department of Transportation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The necessary crash data were acquired from the FDOT Safety Office through the Department of Highway Safety and Motor Vehicles (DHSMV) Crash Analysis Reporting (CAR) system. Access to this source is limited to authorized users, including FDOT staff, consultants, governmental agencies, and universities, pending approval by the FDOT. As this paper is affiliated with the FDOT project (grant BDV30-945-001), the authors possess access to the database.

Conflicts of Interest

The authors declare that they have no competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Bavar, M.S.; Naderan, A.; Saffarzadeh, M. Evaluating the spatial effects of environmental influencing factors on the frequency of urban crashes using the spatial Bayes method based on Euclidean distance and contiguity. Transp. Eng. 2023, 12, 100181.
  2. Jaber, A.; Juhász, J.; Csonka, B. An Analysis of Factors Affecting the Severity of Cycling Crashes Using Binary Regression Model. Sustainability 2021, 13, 6945.
  3. Lord, D.; Washington, S.P.; Ivan, J.N. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: Balancing statistical fit and theory. Accid. Anal. Prev. 2005, 37, 35–46.
  4. AASHTO. Highway Safety Manual; AASHTO: Washington, DC, USA, 2010.
  5. Abdel-Aty, M.A.; Lee, C.; Park, J.; Wang, J.H.; Abuzwidah, M.; Al-Arifi, S. Validation and Application of Highway Safety Manual (Part D) in Florida. 2014. Available online: https://rosap.ntl.bts.gov/view/dot/27272 (accessed on 8 October 2023).
  6. Alluri, P.; Saha, D.; Liu, K.; Gan, A. Improved Processes for Meeting the Data Requirements for Implementing the Highway Safety Manual (HSM) and Safety Analyst in Florida. 2014. Available online: https://rosap.ntl.bts.gov/view/dot/27226 (accessed on 8 October 2023).
  7. Srinivasan, R.; Carter, D. Development of Safety Performance Functions for North Carolina. 2011. Available online: https://rosap.ntl.bts.gov/view/dot/23607 (accessed on 8 October 2023).
  8. Wang, J.H.; Abdel-Aty, M.; Lee, J. Examination of the Transferability of Safety Performance Functions for Developing Crash Modification Factors: Using the Empirical Bayes Method. Transp. Res. Rec. J. Transp. Res. Board 2016, 2583, 73–80.
  9. Poch, M.; Mannering, F. Negative Binomial Analysis of Intersection-Accident Frequencies. J. Transp. Eng. 1996, 122, 105–113.
  10. Dong, Q.; Jiang, X.; Huang, B.; Richards, S.H. Analyzing Influence Factors of Transverse Cracking on LTPP Resurfaced Asphalt Pavements through NB and ZINB Models. J. Transp. Eng. 2013, 139, 889–895.
  11. Ziakopoulos, A.; Vlahogianni, E.; Antoniou, C.; Yannis, G. Spatial predictions of harsh driving events using statistical and machine learning methods. Saf. Sci. 2022, 150, 105722.
  12. Dow, R.; Zhang, S.; Bogus, S.M.; Han, F. Drivable Space Extraction from Airborne LiDAR and Aerial Photos. In Construction Research Congress 2022; American Society of Civil Engineers: Reston, VA, USA, 2022; pp. 154–163.
  13. Karaer, A.; Kaczmarek, W.; Mank, E.; Ghorbanzadeh, M.; Koloushani, M.; Dulebenets, M.A.; Moses, R.; Sando, T.; Ozguven, E.E. Traffic Data on-the-Fly: Developing a Statewide Crosswalk Inventory Using Artificial Intelligence and Aerial Images (AI2) for Pedestrian Safety Policy Improvements in Florida. Data Sci. Transp. 2023, 5, 7.
  14. Lu, J.; Haleem, K.; Alluri, P.; Gan, A.; Liu, K. Developing local safety performance functions versus calculating calibration factors for SafetyAnalyst applications: A Florida case study. Saf. Sci. 2014, 65, 93–105.
  15. Sun, C.; Brown, H.; Edara, P.; Claros, B.; Nam, K. Calibration of the Highway Safety Manual for Missouri; Mid-America Transportation Center: Lincoln, NE, USA, 2013.
  16. Kweon, Y.; Lim, I. Development of Safety Performance Functions for Multilane Highway and Freeway Segments Maintained by the Virginia Department of Transportation. 2014. Available online: https://trid.trb.org/view/1311856 (accessed on 8 October 2023).
  17. Donnell, E.T.; Gayah, V.V.; Li, L. Regionalized Safety Performance Functions. Final Report, Pennsylvania Department of Transportation, FHWA-PA-2016-001-PSU WO 17. 2016. Available online: https://rosap.ntl.bts.gov/view/dot/39904 (accessed on 8 October 2023).
  18. Khattak, A.; Ahmad, N.; Mohammadnazar, A.; MahdiNia, I.; Wali, B.; Arvin, R. Highway Safety Manual Safety Performance Functions & Roadway Calibration Factors: Roadway Segments Phase 2, Part; Department of Transportation: Nashville, TN, USA, 2020.
  19. Al-Deek, H.; Sandt, A.; Gamaleldin, G.; McCombs, J.; Blue, P. A Roadway Context Classification Approach for Developing Safety Performance Functions and Determining Traffic Operational Effects for Florida Intersections; Department of Transportation: Tallahassee, FL, USA, 2020.
  20. Kitali, A.E.; Sando, T.; Castro, A.; Kobelo, D.; Mwakalonge, J. Using Crash Modification Factors to Appraise the Safety Effects of Pedestrian Countdown Signals for Drivers. J. Transp. Eng. Part A Syst. 2018, 144, 4018011.
  21. Brimley, B.K.; Saito, M.; Schultz, G.G. Calibration of highway safety manual safety performance function: Development of New Models for Rural Two-Lane Two-Way Highways. Transp. Res. Rec. 2012, 2279, 82–89.
  22. Gross, F.; Persaud, B.; Lyon, C. A Guide to Developing Quality Crash Modification Factors. 2010. Available online: http://www.cmfclearinghouse.org/collateral/cmf_guide.pdf (accessed on 8 October 2023).
  23. Young, J.; Park, P.Y. Benefits of small municipalities using jurisdiction-specific safety performance functions rather than the Highway Safety Manual’s calibrated or uncalibrated safety performance functions. Can. J. Civ. Eng. 2013, 40, 517–527.
  24. Ulak, M.B.; Ozguven, E.E.; Karabag, H.H.; Ghorbanzadeh, M.; Moses, R.; Dulebenets, M. Development of Safety Performance Functions for Restricted Crossing U-Turn Intersections. J. Transp. Eng. Part A Syst. 2020, 146, 04020038.
  25. Srinivasan, R.; Colety, M.; Bahar, G.; Crowther, B.; Farmen, M. Estimation of Calibration Functions for Predicting Crashes on Rural Two-Lane Roads in Arizona. Transp. Res. Rec. J. Transp. Res. Board 2016, 2583, 17–24.
  26. Farid, A.; Abdel-Aty, M.; Lee, J. A new approach for calibrating safety performance functions. Accid. Anal. Prev. 2018, 119, 188–194.
  27. Srinivasan, R.; Bauer, K. Safety Performance Function Development Guide: Developing Jurisdiction-Specific SPFs; Federal Highway Administration, Office of Safety: Washington, DC, USA, 2013.
  28. Hauer, E. Overdispersion in modelling accidents on road sections and in Empirical Bayes estimation. Accid. Anal. Prev. 2001, 33, 799–808.
  29. Hauer, E.; Harwood, D.W.; Council, F.M.; Griffith, M.S. Estimating Safety by the Empirical Bayes Method: A Tutorial. Transp. Res. Rec. J. Transp. Res. Board 2002, 1784, 126–131.
  30. Lord, D.; Mannering, F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transp. Res. Part A Policy Pract. 2010, 44, 291–305.
  31. Farid, A.; Abdel-Aty, M.; Lee, J.; Eluru, N.; Wang, J.H. Exploring the transferability of safety performance functions. Accid. Anal. Prev. 2016, 94, 143–152.
  32. Persaud, B.; Lan, B.; Lyon, C.; Bhim, R. Comparison of empirical Bayes and full Bayes approaches for before–after road safety evaluations. Accid. Anal. Prev. 2010, 42, 38–43.
  33. Das, S.; Tsapakis, I.; Khodadadi, A. Safety performance functions for low-volume rural minor collector two-lane roadways. IATSS Res. 2021, 45, 347–356.
  34. Wood, G.R. Generalised linear accident models and goodness of fit testing. Accid. Anal. Prev. 2002, 34, 417–427.
  35. Lord, D. Modeling motor vehicle crashes using Poisson-gamma models: Examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accid. Anal. Prev. 2006, 38, 751–766.
  36. Florida Department of Transportation (FDOT). Statewide Traffic Data Files. Available online: https://www.fdot.gov/statistics/trafficdata/default.shtm (accessed on 8 October 2023).
  37. FDOT Safety Office. FDOT Highway Safety Manual User Guide 2015. Available online: https://www.fdot.gov/safety/safetyengineering/publications-and-manuals.shtm (accessed on 8 October 2023).
  38. Faraway, J.J. Extending the Linear Model with R; Chapman and Hall/CRC: New York, NY, USA, 2016.
  39. Florida Department of Transportation. FDOT Context Classification Guide. 2020. Available online: https://fdotwww.blob.core.windows.net/sitefinity/docs/default-source/roadway/completestreets/files/fdot-context-classification.pdf?sfvrsn=12be90da_2 (accessed on 8 October 2023).
  40. Gamaleldin, G.; Al-Deek, H.; Sandt, A.; McCombs, J.; El-Urfali, A.; Uddin, N. Developing context-specific safety performance functions for Florida intersections to more accurately predict intersection crashes. J. Transp. Saf. Secur. 2020, 14, 607–629.
  41. Tayebikhorami, S.; Sacchi, E. Validation of Machine Learning Algorithms as Predictive Tool in the Road Safety Management Process: Case of Network Screening. J. Transp. Eng. Part A Syst. 2022, 148, 04022068.
Figure 1. Crash distribution for all levels of injury in District 4 during 2015–2018.
Figure 2. Schematic overview of the methodology.
Figure 3. Map of subregion centroids that do not meet statistical assumptions: (a) segments with severe nonlinearity in negative binomial regression, (b) segments with low overdispersion, and (c) segments with zero inflation (for radius < 20 miles zero-inflated models are not estimable).
Figure 4. Summary measures for increasing subregion radii in District 4: (a) all crashes and (b) KABC crashes.
Table 1. Annual crash counts for District 4 during 2015 to 2018.

| Crash Type | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|
| All Crashes in Florida | 374,342 | 395,785 | 402,385 | 403,626 |
| All Crashes in District 4 | 76,025 | 85,611 | 83,688 | 86,596 |
| All Considered Crashes * | 31,013 | 37,003 | 34,538 | 38,630 |
| All Considered Crashes on U4D | 6154 | 7538 | 6474 | 6289 |
| U4D KABC Crashes ** | 1334 | 1647 | 1440 | 1371 |
| U4D PDO Crashes *** | 3870 | 4881 | 4163 | 4113 |

* Multiple Vehicle, Non-Driveway, Not at Intersection, No Pedestrian, and No Bicyclist. ** KABC Crash: Fatal (K), Incapacitating (A), Non-Incapacitating (B), and Possible Injury (C). *** PDO Crash: Property Damage Only Crash.
Table 2. HSM SPF coefficients for multiple-vehicle non-driveway crashes on U4D [4].

| KABCO Scale | Crash Type (Level of Injury) | $\beta_0$ | $\beta_1$ | Overdispersion Parameter |
|---|---|---|---|---|
| KABCO | Total Crashes | −12.34 | 1.36 | 1.32 |
| KABC | Fatal-and-Injury Crashes | −12.76 | 1.28 | 1.31 |
| PDO | PDO Crashes | −12.81 | 1.38 | 1.34 |
Table 3. Diagnostic tests for 10-mile and 15-mile radius for a sample segment.

| Diagnostic Test | 10 Miles | 15 Miles |
|---|---|---|
| GOF test p-value | 0.0204 | 0.0054 |
| Estimated k | 0.6469 | 0.6600 |
| Zero inflation p-value | 0.9851 | 0.2623 |
| HSM k-value | 1.32 | 1.32 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
