Traffic Status Prediction Based on Multidimensional Feature Matching and 2nd-Order Hidden Markov Model (HMM)

Li, Fei; Liu, Kai; Chen, Jialiang

doi:10.3390/su152014671


by Fei Li, Kai Liu and Jialiang Chen *

School of Transportation and Logistics, Dalian University of Technology, Dalian 116024, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(20), 14671; https://doi.org/10.3390/su152014671

Submission received: 9 September 2023 / Revised: 3 October 2023 / Accepted: 8 October 2023 / Published: 10 October 2023

(This article belongs to the Special Issue Sustainable Transportation and Urban Planning)


Abstract:

Spatiotemporal data from urban road traffic are pivotal for intelligent transportation systems and urban planning. Nonetheless, missing data in traffic datasets is a common challenge due to equipment failures, communication issues, and monitoring limitations, especially the missing not at random (MNAR) problem. This research introduces an approach to address MNAR-type missing data in traffic status prediction, utilizing a multidimensional feature sequence and a second-order hidden Markov model (2nd-order HMM). First, this approach involves extracting spatiotemporal features for the preset data sections and spatial features for the sections to be predicted based on the traffic spatiotemporal characteristics. Second, using the extracted features, distinctive road traffic features are generated for each section. Furthermore, at specific intervals within the defined time period, nearest distance feature matching is introduced to ascertain the traffic attributes of the road section under prediction. Finally, relying on the matched status results, a 2nd-order HMM is employed to forecast the traffic status for subsequent moments within the defined time period. Experiments were carried out using datasets from Shenzhen City and compared against the hidden Markov models and contrast measure (HMM-C) method to affirm the efficacy of the proposed approach.

Keywords:

intelligent transportation; traffic status; multidimensional feature sequences; feature matching; 2nd-order HMM

1. Introduction

The spatiotemporal data of urban road traffic are crucial for intelligent transportation systems and urban planning and are the basis for identifying traffic status and building sustainable transportation networks. Traffic status is computed through the use of V2I or V2V roadside Internet of Things devices or in-vehicle equipment, which collect data such as vehicle speed and traffic flow [1]. These computations yield metrics such as congestion mileage ratios, travel time ratios, and comprehensive evaluations, all of which serve as indicators for assessing traffic conditions [2]. Traffic status prediction can help urban traffic management authorities take measures to optimize traffic flow, improve traffic efficiency, and reduce traffic congestion, thereby reducing energy waste and environmental pollution. On the other hand, predicting traffic conditions helps plan more efficient transportation routes, encourages people to use sustainable modes of transportation, and consequently reduces the necessity for vehicles to come to a halt for extended periods of congestion. This, in turn, diminishes the emission of carbon dioxide and other pollutants from vehicles and contributes to mitigating the impact of transportation on climate change.

However, due to various reasons, such as equipment failures, communication issues, and monitoring range limitations, traffic data often suffer from missing values. Depending on the mechanism of data missingness, missing data can be categorized into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [3]. In the realm of traffic analysis, MCAR denotes missing traffic data that occur randomly and independently of other observed values, such as the occasional data point missing due to equipment malfunctions. MAR, on the other hand, signifies that traffic data are missing randomly, but the missingness is related to one or several other observed values, for example, data missing during specific time intervals related to time observations. MNAR, which stands apart from MCAR and MAR, represents missing data where the mechanism of missingness is related to unobserved spatiotemporal information, as in the case of block missing data [4,5,6].

In the context of MNAR, the probability of missing data is not random but rather related to the traffic flow itself or correlated with the distribution of monitoring devices. Presently, most approaches for handling MNAR data are based on neural network methods [7]. However, deep learning models come with challenges such as computational complexity, the need for tuning numerous hyperparameters, and the risk of overfitting [8]. The hidden Markov model (HMM) establishes probabilistic relationships between hidden status and observations, making it well suited for handling discrete traffic status sequences. Nevertheless, traditional HMM methods require substantial labeled data for training, and they assume that the current status only depends on the previous status, limiting their ability to capture longer-term dependencies, especially when handling extended temporal relationships.

To address these limitations, we propose a traffic status prediction method specifically designed for MNAR-type missing data based on multidimensional feature sequences and a 2nd-order hidden Markov model (2nd-order HMM). The research’s key contributions are summarized as follows.

(1): Extraction and Matching of Multidimensional Spatiotemporal Features in Urban Road Traffic. In this step, we extract intricate traffic features from various sections of urban roads, considering both the temporal and spatial dimensions. Utilizing multidimensional spatiotemporal data, we capture a wide array of attributes that influence traffic conditions in these sections, accounting for their mutual correlations and impacts. These attributes undergo quantification through normalization, and we employ nearest-neighbor matching techniques to mitigate the influence of long-term cyclic patterns on prediction outcomes. Moreover, this approach comprehensively considers the interconnections among these factors, effectively addressing the issue of MNAR in traffic data.
(2): Integration of Feature Matching and 2nd-order HMM for Traffic Status Prediction. To enhance the efficiency and accuracy of traffic status prediction, we adopt a “match first, then predict” strategy within specific time intervals. Within a designated timeframe, we obtain the traffic status for certain preceding timeslices based on spatiotemporal traffic features using feature matching techniques. Building upon this initial matching, we predict the traffic status for subsequent timeslices, utilizing a 2nd-order HMM to forecast forthcoming statuses.

The remainder of this article is structured as follows. Section 2 presents state-of-the-art related studies on traffic status recognition and prediction and imputation techniques. In Section 3, the details of the traffic status prediction method based on the multidimensional feature sequence and 2nd-order HMM are explained. Following this, Section 4 presents the evaluation of the proposed method for traffic status prediction. Finally, Section 5 presents the conclusions and suggestions for future work.

2. Literature Review

State-of-the-art related studies on traffic status recognition and prediction and imputation techniques are discussed in this section.

2.1. Traffic Status Recognition

Traffic status recognition involves various methods and techniques, including traditional statistical approaches, machine learning-based methods, and deep learning-based methods.

Traditional methods for traffic status recognition include statistical and mathematical model-based approaches. Among them, time series analysis is a commonly used method. By analyzing the temporal patterns of traffic data, it is possible to assess traffic flow, congestion levels, etc.—for instance, annual average daily traffic [9]. Another approach is rule-based, where traffic status is determined based on predefined rules and standards. Examples include the volume-to-capacity ratio [10], travel time reliability [2], recurrent congestion index [11], and traffic performance index (TPI) [12]. While these traditional methods are effective in certain scenarios, they are limited by the constraints of rules and may struggle to cope with complex and dynamic traffic situations.

Statistical methods are more suitable for long-term trend analysis of traffic flow, while machine learning models can be used for real-time congestion detection. In recent years, with the advancement of machine learning technology, data-driven machine learning methods have been widely applied in traffic status recognition. Among them, decision trees [13], support vector machines [14], random forests [15], and others are common methods. These approaches learn patterns and rules of traffic status from training data and then apply them to determine new data. Additionally, deep learning models have found relevance in traffic status recognition. Convolutional neural networks (CNNs), long short-term memory (LSTM) networks, recurrent neural networks (RNNs), and improved transformers [16,17,18,19] are deep learning models capable of automatically extracting features from raw data, enabling more accurate recognition. For example, in traffic camera image recognition, CNNs can identify vehicles and pedestrians and assess traffic congestion [16].

Among the methods mentioned for traffic status recognition, deep learning approaches, in contrast to traditional methods, require less involvement of specialized personnel in analyzing the underlying causes of traffic dynamics. This is advantageous for interdisciplinary integration and applications. However, deep learning models exhibit issues related to the interpretability and fuzziness of hyperparameters.

2.2. Prediction and Imputation Algorithms

In cases of data scarcity, reliance on existing spatiotemporal traffic information becomes crucial for data prediction and imputation to ensure accurate analysis and management of traffic conditions. For scenarios involving MCAR and MAR, statistical metrics or regression models can be employed [20]. However, when dealing with MNAR scenarios, more intricate methods are necessary for imputation and prediction, and most approaches for handling MNAR data are based on neural network methods [7]. This includes techniques that combine generative adversarial networks (GANs) [21,22], LSTM [21], CNNs [23], and AGNPs [24] and utilize probabilistic models to estimate the distribution of missing values [25,26]. Moreover, there exists a region-based approach for handling urban similarity matching [27]. This method employs deep learning to compute the similarity of service data between two cities, facilitating the replication of complete data from one city to another. However, this approach typically employs a relatively simplistic method for dissecting road network shapes from map images, which might result in a loss of detail to some extent. Simultaneously, it may not seamlessly integrate with more refined spatiotemporal data information, and its interpretability may not be sufficiently clear.

HMM is a classical sequential modelling method applicable to sequential data with temporal evolution, such as traffic status [28,29,30]. The fundamental concept behind HMM is the temporal transition of status, where each status corresponds to a probability distribution of observed values. This characteristic makes HMM well suited for short-term traffic status prediction, particularly when dealing with temporal relationships and evolutions of status. Additionally, the hidden status in HMM corresponds to the real-world system status, while observed values are associated with each of these statuses. In traffic status prediction, hidden status can represent real-world traffic conditions, providing strong interpretability. Hence, the HMM can effectively model these real-world statuses.

3. Methods

The schematic diagram of the algorithm in this paper is shown in Figure 1. First, urban road traffic statuses are preset, and road traffic status labels are assigned to the sections with known data. Second, based on an analysis of the temporal and spatial characteristics of traffic flow in the urban road network, multidimensional spatiotemporal features are extracted for road traffic. Third, the road traffic features are normalized, and nearest distance feature matching is carried out for the first few moments of the cycle time period. Finally, based on the status matching results, the 2nd-order HMM traffic status prediction model is used to predict the traffic status at the subsequent moments within the time period. A moment here denotes one timeslice of data collection, for example, 5 min.

The variables and their interpretations in this paper are shown in Table 1.

3.1. Traffic Status Labeling

For road sections with available traffic data, a status label is assigned to each section based on the TPI. TPI is a comprehensive index that quantitatively assesses the overall operational level of the road, providing a traffic intensity rating from 1 to 5. A higher value indicates more severe congestion. The formula for calculating TPI, used as a predefined indicator for the traffic condition of urban roadways, is as follows [32]:

R_{T, segment} = {\bar{T}}_{segment} / T_{d} = \frac{l_{segment} / {\bar{v}}_{segment}}{l_{segment} / v_{d}} = v_{d} / {\bar{v}}_{segment},

(1)

TPI_{segment} = F_{l} ({\bar{T}}_{segment} / T_{d}) = F_{l} (R_{T, segment}) =
\begin{cases}
0 & R_{T, segment} \in (0, 1] \\
4 - 4 / R_{T, segment} & R_{T, segment} \in (1, 4 / 3] \\
4.75 - 5 / R_{T, segment} & R_{T, segment} \in (4 / 3, 20 / 11] \\
(17 - 20 / R_{T, segment}) / 3 & R_{T, segment} \in (20 / 11, 2.5] \\
5 - 5 / R_{T, segment} & R_{T, segment} \in (2.5, + \infty)
\end{cases}

(2)

Here, $v_{d}$ is the desired speed (km/h) and ${\bar{v}}_{segment}$ is the average speed of the road section. It is generally believed that vehicles travel at the desired speed in the early morning hours (3:00 a.m.–4:00 a.m.); the specific range should be defined according to the actual situation of the city. For example, the average early-morning speed of vehicles on the urban road network can be computed separately for each road grade and used as the desired speed of roads of that grade.
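As a rough illustration of Equations (1) and (2), the sketch below maps a section's average speed to its TPI. The function names `travel_time_ratio` and `tpi` are hypothetical, and treating a non-positive average speed as an infinite ratio is an assumption of this sketch rather than part of the method.

```python
import math


def travel_time_ratio(v_avg_kmh: float, v_desired_kmh: float) -> float:
    """Eq. (1): ratio of average to desired travel time, i.e. v_d / v_avg."""
    if v_avg_kmh <= 0:
        # Assumption of this sketch: a stopped section has an infinite ratio.
        return math.inf
    return v_desired_kmh / v_avg_kmh


def tpi(ratio: float) -> float:
    """Eq. (2): piecewise mapping from the travel time ratio to the TPI."""
    if ratio <= 1:
        return 0.0
    if ratio <= 4 / 3:
        return 4 - 4 / ratio
    if ratio <= 20 / 11:
        return 4.75 - 5 / ratio
    if ratio <= 2.5:
        return (17 - 20 / ratio) / 3
    return 5 - 5 / ratio


# Example: a section averaging 20 km/h with a desired speed of 60 km/h.
example_tpi = tpi(travel_time_ratio(20.0, 60.0))
```

The pieces join continuously at the interval boundaries (TPI values 1, 2, and 3), which gives a quick sanity check on any implementation.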

The indicators describing the status of urban road traffic are the classes corresponding to the TPI, the range of which is shown in Table 2.

Specifically, the congestion degree in Table 2 denotes the description of congestion status corresponding to the cases of TPI and R results. In traffic status labeling, the TPI results are used to label the traffic status instead of the textual descriptions of the congestion degree.

3.2. Multidimensional Feature Extraction

Since matrix-structured data can be better used for imputing incomplete datasets [33], this section delves into the spatiotemporal characteristics of traffic flow to extract multidimensional features, thus constructing a spatiotemporal feature matrix that describes the road section’s status.

First, we define the urban road topology unit, as illustrated in Figure 2. In the diagram, the direction of the traffic flow trajectory is one-way. The topology unit comprises the middle road section and its adjacent inflow and outflow road sections. Inflow and outflow road sections are the immediate neighbors of the middle road section. The intersection points between the middle road section and the inflow road sections constitute inflow nodes, while the intersection points with outflow road sections form outflow nodes. Both inflow and outflow nodes are adjacent to the middle road section.

In the urban road traffic operation system, due to the randomness of travel behavior, the vehicle generation on the road is also random, so in each timeslice, the urban road topology units produce their own unique features. Hence, within each topology unit, as depicted in Figure 2, we can construct a feature vector comprising nine features. This feature vector structurally dissects the operational parameters of the spatiotemporal characteristics of traffic flow, as illustrated in Figure 3.

The operational parameters shown in Figure 3 include static and dynamic parameters. The static parameters include topological structure parameters and road section parameters. Topological structure parameters refer to the parameters contained in the topological structure with directional connectivity formed by the traffic flow shown in Figure 3, including two features: neighboring levels and the number of neighboring road sections; road section parameters refer to the attribute parameters of the road section itself, including three features: road class, road section length, and the number of road section lanes. On the other hand, the dynamic parameters include time series as well as traffic parameters. Among them, the traffic parameter refers to the parameter that describes the vehicle operation performance of the road section, which contains three features: road section flow, road section average speed, and road traffic status. It should be noted that the distinction between weekdays and non-weekdays needs to be considered in the temporal aspect [34].

Next, by considering the present road section as the middle road section and incorporating the features from neighboring road sections as composite attributes, these features can be interwoven. Consequently, each road section slated for prediction can be formulated as a feature matrix with dimensions of $n \times 10$, encompassing the road section identifier. This is articulated as follows:

P = {[TS, AH, RN, RC, L, LS, F, V, S, LinkID]}_{n \times 10},

(3)

where the road section number $LinkID$ is the tail of the multidimensional feature sequence; it is not involved in the nearest distance calculation and serves only for retrieval during data matching. The traffic status of the middle road section serves both as a matching feature and as the result set: for a section with data, it is part of the matching dataset, and for a section to be predicted, it is a matching feature. When used as a matching feature, any missing element in the matrix is denoted by NULL.

The feature matrix shown in Equation (3) that describes the traffic status of a topology unit consists of multiple feature vectors, where the matrix elements represent the factors that affect the traffic status.

3.3. Feature Normalization and Matching

During the matching process, the combination possibility of each static feature of the topology unit grows exponentially; the combination of static and dynamic features may make the matching dataset explode exponentially, which is not conducive to distinguishing the effects of different feature sequences on the matching results. In the case of constant matching accuracy, to improve the matching efficiency, each feature data point is normalized to make the road traffic feature sequences comparable. A linear method is used to complete the normalized mapping in the interval [0, 1]. Hence, the values of the static and dynamic features, the corresponding intervals, and the normalized values are shown in Table 3.

Regarding time, other algorithms generally limit their search to the current and previous timeslices during status matching to keep real-time matching fast, and so do not account for the cyclical variation of traffic flow. We use the time series as the foundation for storing the traffic flow cycle, expand it into three dimensions of time data, organized as day/hour/minute, and store them in a normalized format. As a result, the feature matrix in Equation (3) expands to 12 dimensions. Keeping these features independent takes into account the proximity of day and hour, enhancing the fault tolerance of the algorithm.
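The linear normalization and the day/hour/minute expansion can be sketched as follows. The value ranges used here (day of week 0 to 6, hour 0 to 23, minute 0 to 59) and the helper names `min_max` and `time_features` are assumptions for illustration; the actual intervals are those listed in Table 3.

```python
from datetime import datetime


def min_max(value: float, lower: float, upper: float) -> float:
    """Linear mapping of value from [lower, upper] onto [0, 1]."""
    if upper == lower:
        raise ValueError("interval must have non-zero width")
    return (value - lower) / (upper - lower)


def time_features(ts: datetime) -> tuple:
    """Expand one timestamp into normalized (day, hour, minute) features.

    Assumption: "day" is the day of the week, with Monday mapped to 0.
    """
    day = min_max(ts.weekday(), 0, 6)
    hour = min_max(ts.hour, 0, 23)
    minute = min_max(ts.minute, 0, 59)
    return day, hour, minute


# Example: a Monday morning timeslice.
features = time_features(datetime(2023, 10, 9, 8, 30))
```

Storing the three components separately means that two timeslices on the same weekday and hour stay close in feature space even when their minutes differ.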

The spatiotemporal feature sequence matching of urban roads is a real-time network-wide matching in timeslice to obtain the traffic status of the road section with the closest distance, i.e., the highest similarity. On the basis of feature normalization, we use the Euclidean distance to match the features of road sections without detection data to obtain the status prediction of the road sections to be predicted at the current moment. This Euclidean distance is calculated as follows:

d (P_{Link1}, P_{Link2}) = \sqrt{\sum_{i = 1}^{N_{P}} \sum_{j = 1}^{12} {(p_{1, ij} - p_{2, ij})}^{2}},

(4)

where, in the matrices $P_{Link1}$ and $P_{Link2}$, the $LinkID$ column is only used as a data location label, not as a parameter for the Euclidean distance calculation, so the dimension of the feature matrix involved in the Euclidean distance calculation is $n \times 11$, where $n$ is determined by the number of neighboring road sections of the road section to be predicted. The distance calculation first checks whether the dimensions of the two matrices are equal, and the matrix operation is executed only if they are. Equal dimensions also imply that the number of adjacent segments is roughly the same, i.e., the two topology units to be matched have a basically similar spatial structure. Therefore, Equation (4) can describe the distance relationship between the factors affecting the traffic status and address the MNAR problem effectively.
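A minimal sketch of the distance in Equation (4), including the dimension check described above. It assumes the caller has already dropped the $LinkID$ column and that the matrices contain no NULL entries; the function name `feature_distance` is hypothetical.

```python
import math


def feature_distance(p1, p2):
    """Euclidean distance between two normalized feature matrices.

    Each matrix is a list of rows of floats. Returns None when the shapes
    differ, i.e. when the two topology units are not structurally comparable.
    """
    if len(p1) != len(p2):
        return None
    if any(len(r1) != len(r2) for r1, r2 in zip(p1, p2)):
        return None
    squared = sum(
        (x - y) ** 2
        for r1, r2 in zip(p1, p2)
        for x, y in zip(r1, r2)
    )
    return math.sqrt(squared)
```

Returning `None` for mismatched shapes lets a caller skip such candidates instead of comparing units with a different number of neighboring sections.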

In the specific algorithm implementation, nearest distance matching can be carried out over the whole road network on the $n \times 11$ dimensional matrices, i.e., the distance is calculated between the topology unit feature matrix $P_{predicted}$ of the current road section to be predicted and the topology unit matrices of all road sections with known data, and the nearest distance gives the matching result. This method seems to be effective in smaller cities. However, in large-scale cities, especially megacities, whole road network matching is less efficient due to the large volume of traffic trips. Therefore, we propose two matching strategies.

Topology-first matching strategy. First, find the set of topology units closest to the topology unit containing the current road section to be predicted. Second, from that set, extract the time series data closest to the current timeslice. Finally, extract all traffic statuses of the middle road section from the matched data and use their average as the matched traffic status.

Time-first matching strategy. First, find the time series dataset closest to the current timeslice. Second, within that dataset, find the set of topology units closest to the topology unit containing the current road section to be predicted. Finally, extract all traffic statuses of the middle road section from the matched data and use their average as the matched traffic status.

Since the topology is a static feature, the unchanged static features can be saved to construct the topology unit search table during the computation process, so we adopt the topology-first matching strategy.

Algorithm 1 depicts the proposed feature matching algorithm. In Algorithm 1, the set of closest topology units is the set of feature matrices $P_{matched}^{topology}$ with the smallest differences in $RC$, $RN$ of inflow, $RN$ of outflow, and road length $L$ of the road section, taking the five results with the smallest standard deviation; the set of closest time series data is the set of feature matrices $P_{matched}^{time}$ with the smallest day, hour, and minute differences, with the threshold set to 0 (s).

Algorithm 1 Topology-first Feature Matching Algorithm
Input:
$P_{all}$ : Feature matrix of all topology units
$LinkID_{predicted}$ : Section number of the section to be predicted
$TS_{predicted}$ : Timeslice to be predicted
Output:
$S$ : Traffic status matched to the section to be predicted
1:	Retrieve the feature matrix $P_{predicted}$ corresponding to the target road section $LinkID_{predicted}$
2:	// Search for the topology units in the feature matrix set $P_{all}$ that have similar road section features to $P_{predicted}$:
3:	Initialize an empty list similar_topologies and $P_{topology}$ as None
4:	for each matrix in $P_{all}$:
5:	    Calculate the standard deviation of $AH, RN, RC, L, LS$ between $P_{predicted}$ and the matrix
6:	    similar_topologies ← the matrix and the calculated standard deviation
7:	end for
8:	$P_{topology}$ ← top-5 matrices of similar_topologies with the smallest standard deviation
9:	// Search for the matrix in $P_{topology}$ that best matches the timeslice $TS$ of $P_{predicted}$:
10:	Initialize $P_{time}$ as None and min_time_diff as 0 s
11:	for each matrix in $P_{topology}$:
12:	    Calculate the standard deviation between $TS_{predicted}$ and $TS$ of the matrix
13:	    if the standard deviation ≤ min_time_diff then:
14:	        $P_{time}$ ← the matched matrix and the calculated standard deviation
15:	    end if
16:	end for
17:	Calculate the average traffic status for middle road sections in the matrix $P_{time}$
18:	$S$ ← the average traffic status
19:	return $S$
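The two-stage search of Algorithm 1 could look roughly like the following Python sketch. It is not the authors' implementation: the record layout (dictionaries with `link_id`, `static`, `time`, and `status` keys) and the function name are hypothetical, and the standard-deviation comparison is replaced by a plain Euclidean distance between normalized feature tuples for brevity.

```python
import math
from statistics import mean


def topology_first_match(records, target_static, target_time,
                         top_k=5, time_tol=0.0):
    """Return the average matched status, or None if nothing matches.

    records: dicts with keys "link_id", "static" (normalized static
    features), "time" (normalized day/hour/minute) and "status".
    """
    # Static features do not change over time, so one entry per link suffices.
    static_by_link = {r["link_id"]: r["static"] for r in records}

    # Stage 1: keep the top_k links whose static features are closest.
    nearest = set(sorted(
        static_by_link,
        key=lambda link: math.dist(static_by_link[link], target_static),
    )[:top_k])

    # Stage 2: among those links, keep records within the time tolerance.
    statuses = [
        r["status"]
        for r in records
        if r["link_id"] in nearest
        and math.dist(r["time"], target_time) <= time_tol
    ]
    return mean(statuses) if statuses else None
```

Because the stage 1 ranking depends only on static features, it can be cached per target section, which is the efficiency argument for the topology-first strategy.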

3.4. Traffic Status Prediction Based on 2nd-Order HMM

After matching the traffic status of the section to be predicted at the most recent moments, the traffic status at subsequent moments needs to be predicted. A feasible approach is to estimate the traffic status of the road section without data in the next moment cycle using the feature matching method described above. However, due to the exponential growth of feature combinations and the large size of the road network, continuous status matching would be extremely time-consuming. Moreover, the traffic status does not change suddenly; its change is recursive in time, i.e., the traffic status at the current moment is related not only to the previous moment but also to the moment before it. Thus, in our algorithm, after obtaining the matched status of the road section to be predicted at the first two moments, we use the 2nd-order HMM to predict the traffic status of the no-data road section at the subsequent moments.

Two basic assumptions are first made about the 2nd-order HMM traffic status prediction model:

Assumption 1.

The status of the hidden Markov chain at any moment t depends only on the statuses at the two preceding moments; it is independent of the statuses and observations at other moments and of the moment t itself.

Assumption 2.

The model is observationally independent, i.e., the observation at any moment depends only on the status at that moment and is independent of other observations and statuses.

Next, the variables are defined. Let $Q$ denote the set of all possible traffic statuses:

Q = \{q_{i}\} = \{0, 0.25, 0.5, 0.75, 1\}, i = 1, 2, 3, 4, 5 .

(5)

Taking velocity as the observation $V$, for example, it is difficult to enumerate all possible velocity observations, so $V$ is assumed to obey a normal distribution $V \sim N (μ, σ^{2})$ based on the statistical properties of velocity, where the mean $μ$ and variance $σ^{2}$ are calibrated from the velocities of sections with known data. An HMM-based traffic status prediction model can then be defined as $λ = (Π, A, B)$. The traffic status sequence $TPI_{sequence}$ of the section to be predicted is

TPI_{sequence} = (I_{1}, I_{2}, \dots, I_{t}, \dots, I_{T}) .

(6)

The sequence of observations $O$ of the speed of the section to be predicted is

O = (o_{1}, o_{2}, \dots, o_{t}, \dots, o_{T}) .

(7)

The initial traffic status probability vector $Π$ is

Π_{i} = P (I_{1} = q_{i}), i = 1, 2, 3, 4, 5,

(8)

where, in practical applications, the sections with known observed data are grouped by road classification, and the initial status probabilities are estimated for each road classification from the frequencies of the different traffic statuses.
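The frequency estimate of $Π$ per road classification might be computed as in the sketch below. The input layout (pairs of a road class label and a status index 0 to 4) and the function name `initial_probabilities` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def initial_probabilities(samples, n_states=5):
    """Estimate the initial status distribution for each road class.

    samples: iterable of (road_class, status_index) pairs, where
    status_index is in range(n_states).
    """
    by_class = defaultdict(Counter)
    for road_class, status in samples:
        by_class[road_class][status] += 1

    result = {}
    for road_class, counts in by_class.items():
        total = sum(counts.values())
        # Relative frequency of each status within this road class.
        result[road_class] = [counts[s] / total for s in range(n_states)]
    return result
```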

The traffic status transfer probability matrix $A$ is estimated from the frequencies of status transfers among the section traffic statuses in the set $Q$. The current status $q_{k}$ at moment $t$ is determined by the statuses $q_{i}$ and $q_{j}$ at the two moments $t - 2$ and $t - 1$. The status transfer probability matrix $A$ is therefore a three-dimensional matrix of size 5 × 5 × 5 with elements:

a_{i j k} = P (I_{t} = q_{k} | I_{t - 1} = q_{j}, I_{t - 2} = q_{i}), i, j, k = 1, 2, 3, 4, 5 .

(9)
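A frequency-based estimate of the 5 × 5 × 5 transfer matrix in Equation (9) can be sketched as follows. Statuses are encoded as indices 0 to 4, and the uniform fallback for status pairs that never occur in the training data is an assumption of this sketch, not something specified in the method.

```python
def estimate_transitions(sequences, n_states=5):
    """Estimate a[i][j][k] = P(I_t = k | I_{t-1} = j, I_{t-2} = i).

    sequences: iterable of status index sequences, one per road section.
    """
    counts = [[[0] * n_states for _ in range(n_states)]
              for _ in range(n_states)]
    for seq in sequences:
        # Slide a window of three consecutive statuses over the sequence.
        for i, j, k in zip(seq, seq[1:], seq[2:]):
            counts[i][j][k] += 1

    probs = []
    for i in range(n_states):
        plane = []
        for j in range(n_states):
            total = sum(counts[i][j])
            if total == 0:
                # Assumption: unseen (i, j) pairs get a uniform distribution.
                plane.append([1.0 / n_states] * n_states)
            else:
                plane.append([c / total for c in counts[i][j]])
        probs.append(plane)
    return probs
```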

Theoretically, the speed observation probability matrix $B$ has a many-to-one relationship with the set of all possible traffic statuses $Q$: observations within a certain interval would always fall into a particular status. In practice, however, this is not the case. For various possible reasons, the expectation and variance of the observed data differ between road sections of the same road class/topology, so the observed data for different road sections exhibit different attributes. Thus, when the status of a road section with unknown data is predicted using road sections with known data, the prediction may not match people's actual perception. To try to avoid this, we need to estimate the correspondence between observations and statuses. First, the observed data are aggregated over sections of the same topology to find the desired speed at 3:00–4:00 a.m. for that topology type of roadway. Second, the speed intervals corresponding to each traffic status are calculated based on this desired speed. Third, for all observation and traffic status pairs $(O, I)$ of sections with the same topology, the probability that $O$ falls in the speed interval corresponding to $I$ is calculated for each value of $I$. Finally, the sequence consisting of these probabilities for each status is the observation probability matrix. Since a particular observed speed corresponds to a particular interval of traffic status, $B$ is a one-dimensional vector, and each probability represents the probability of the expected value of the speed over the traffic status.

As shown in Figure 4, to effectively describe the temporal relationship of traffic flow operation, based on the traffic statuses of the previous two moments $t - 2$ and $t - 1$, the forward probability of the observed sequence $O$ is calculated from the probability $P (I_{t + 1} | I_{t}, I_{t - 1})$ of the occurrence of each hidden status at the current moment $t$ and the observation probability $P (o_{t + 1} | I_{t})$. The hidden status corresponding to the maximum forward probability is used as the prediction result of the traffic status. Therefore, the 2nd-order HMM traffic status prediction problem is a maximum forward probability estimation problem with known 2nd-order HMM model parameters and observation sequences.

The traffic status prediction problem at the current moment $t$ is then expressed through the forward probability. The forward probability at the initial 1st moment (1st timeslice) is:

α_{1} (i) = P (o_{1} | λ) = Π_{i} b_{i} (o_{1}), i = 1, 2, 3, 4, 5 .

(10)

According to Assumption 2, the observation at the current moment is only related to the status at the current moment, so the forward probability at the 2nd moment (2nd timeslice) is:

\begin{aligned} α_{2} (j) & = P (o_{2} | λ) \\ & = [\sum_{i = 1}^{5} α_{1} (i) a_{i j}] b_{j} (o_{2}), \quad i, j = 1, 2, 3, 4, 5 . \end{aligned}

(11)

The recursive formula for the forward probability at moment $t$ ($t \geq 3$) is:

α_{t} (k) = [\sum_{j = 1}^{5} \sum_{i = 1}^{5} α_{t - 2} (i) α_{t - 1} (j) a_{i j k}] b_{k} (o_{t}), \quad i, j, k = 1, 2, 3, 4, 5 .

(12)

Then, the maximum forward probability at the current moment $T$ is:

P (o_{T} | λ) = \arg \max_{k} α_{T} (k), \quad k = 1, 2, 3, 4, 5 .

(13)

The hidden status corresponding to the maximum forward probability at the current moment $T$ is the status prediction result for that moment. According to Equation (13), it is the most probable hidden status given the observation $o_{T}$, that is, the traffic status at the current moment $T$ predicted from the traffic statuses at $T - 2$ and $T - 1$.
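The recursion in Equations (10)–(13) can be sketched in a few lines of Python. This is a minimal illustration that transcribes the equations as written, not the authors' implementation. The function names, the two-status toy parameters, and the per-status emission table `emit[k][o]` over discrete observation symbols are all assumptions made for the example (the paper uses five statuses and a one-dimensional $B$).

```python
def forward_second_order(pi, a1, a2, emit, obs):
    """Forward probabilities following Equations (10)-(12) as written.

    pi[i]       : initial status probabilities
    a1[i][j]    : 1st-order transfer probabilities, used only at the 2nd moment
    a2[i][j][k] : 2nd-order transfer probabilities a_ijk
    emit[k][o]  : probability of observation symbol o in status k (hypothetical layout)
    obs         : list of observation symbol indices o_1 ... o_T
    """
    n = len(pi)
    # Equation (10): first moment
    alphas = [[pi[i] * emit[i][obs[0]] for i in range(n)]]
    # Equation (11): second moment uses the 1st-order transfer matrix
    if len(obs) > 1:
        first = alphas[0]
        alphas.append([
            sum(first[i] * a1[i][j] for i in range(n)) * emit[j][obs[1]]
            for j in range(n)
        ])
    # Equation (12): moments t >= 3 combine the two previous forward vectors
    for t in range(2, len(obs)):
        two_back, one_back = alphas[t - 2], alphas[t - 1]
        alphas.append([
            sum(two_back[i] * one_back[j] * a2[i][j][k]
                for i in range(n) for j in range(n)) * emit[k][obs[t]]
            for k in range(n)
        ])
    return alphas


def predict_status(alphas):
    """Equation (13): index of the status with the largest forward probability."""
    last = alphas[-1]
    return max(range(len(last)), key=last.__getitem__)


# Toy example with two statuses and two observation symbols (illustrative values only).
pi = [0.6, 0.4]
a1 = [[0.7, 0.3], [0.4, 0.6]]
a2 = [[a1[j] for j in range(2)] for _ in range(2)]  # here a_ijk depends only on j
emit = [[0.9, 0.1], [0.2, 0.8]]
alphas = forward_second_order(pi, a1, a2, emit, obs=[0, 0, 1])
predicted = predict_status(alphas)
```

Because the values are not rescaled between steps, the forward probabilities shrink quickly over long sequences; rescaling each vector by a constant would not change which status attains the maximum.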

4. Experiment

The experimental data were obtained from the public data of the Shenzhen government. The dataset lacks section traffic volume and the number of lanes, both of which are correlated with section speeds, and it therefore provides a context for the MNAR problem. We divided the dataset into a 70% training set and a 30% test set; the training set provided the matching objects for the matching algorithm and the database for training the 2nd-order HMM, and the test set was used to evaluate precision. The data used for the experiment were preprocessed by cleaning and multisource fusion. All experiments were conducted on the same computer hardware, configured with a 12th Gen Intel (R) Core (TM) i7-1255U CPU at 1.70 GHz and 16 GB of RAM; Windows was the development platform, and the code was implemented in Python.

The experiment took Section 97301 (Middle Dongmen Road, Shenzhen, China, from Shennan East Road to Wenjin North Road) and its neighboring sections in the Shenzhen City Data as data examples. The topology unit of Section 97301 is shown in Figure 5; the inflow road section IDs are 87402, 78201, and 79401, and the outflow road section IDs are 84103, 84203, and 96301.

4.1. Traffic Status Labeling and Road Feature Construction

The experiment uses one week of traffic operation data, from 14 June 2021 to 20 June 2021. Based on these data, the early-morning (3:00–4:00 a.m.) traffic speeds over the week were collected for each road classification, and the average early-morning speed was taken as the desired speed for that road classification. The calculation results are shown in Table 4.
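The desired-speed calculation described above amounts to a grouped average. The following is a minimal sketch under assumed inputs: the record layout `(road_class, hour_of_day, speed_kmh)` and the sample values are hypothetical and are not taken from the Shenzhen dataset.

```python
from collections import defaultdict
from statistics import mean


def desired_speeds(records, start_hour=3, end_hour=4):
    """Average early-morning speed (km/h) per road classification.

    records: iterable of (road_class, hour_of_day, speed_kmh) tuples,
    where hour_of_day is a number such as 3.5 for 3:30 a.m. (assumed layout).
    """
    speeds_by_class = defaultdict(list)
    for road_class, hour, speed in records:
        # Keep only observations inside the early-morning window.
        if start_hour <= hour < end_hour:
            speeds_by_class[road_class].append(speed)
    return {road_class: mean(values) for road_class, values in speeds_by_class.items()}


# Hypothetical sample records for illustration only.
sample = [
    ("Highway", 3.0, 80.0),
    ("Highway", 3.5, 84.0),
    ("Highway", 8.0, 40.0),      # outside the window, ignored
    ("Trunk Road", 3.25, 40.0),
]
result = desired_speeds(sample)
```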

The desired speed and the 5 min average speed data of Section 97301 and its neighboring sections were substituted into the TPI calculation formula to obtain the corresponding traffic status labels, with the ranges of the traffic status preset values given in Table 2. The resulting 24 h traffic conditions for the seven sections (Section 97301 and its neighboring sections) on a typical weekday (17 June 2021) and a typical non-weekday (19 June 2021) are shown in Figure 6.

From Figure 6, it can be seen that the traffic flow in the sections neighboring Section 97301 follows different patterns of change: the inflow sections peak in the morning on weekdays and in the evening on non-weekdays, while most of the outflow sections peak in the evening.

After obtaining the traffic status of the road sections with data, the multidimensional features of each road can be constructed, and multidimensional feature matching can then be used to obtain the traffic status of a road section in the first 20 min of a 1 h period. The road sections in the road network were encoded as the multidimensional feature sequence of urban roads shown in Figure 3 and represented in matrix form. The feature sequence constructed for the road section under prediction lacked the flow (midstream), speed (midstream), and traffic status (midstream) data, i.e., the traffic flow parameters were missing.

4.2. Traffic Status Matching and 2nd-Order HMM Prediction

Specific data-matching experiments were conducted by assuming that flow or speed data were missing for Section 97301. First, using the available temporal and static parameters, the traffic status of the first four timeslices of the road section was obtained by nearest distance matching, with each timeslice covering 5 min. Second, 2nd-order HMM predictions of traffic status at future timeslices were performed. Finally, the TPI obtained from matching and prediction was graded according to Table 2, and the resulting status was compared with the true status of the road section: a result consistent with the true status was counted as correct and any other result as incorrect, which gives the correctness rate of the method. To improve the overall prediction efficiency, matching was performed over the first four timeslices of each hour, the remaining eight timeslices of each hour were predicted using the 2nd-order HMM, and matching was resumed at the start of the next hour. That is, in this experiment, the traffic status for the first 20 min of each hour was obtained by matching, and that for the last 40 min was obtained by prediction.
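The hourly alternation between matching and prediction can be written down as a small scheduling rule. The sketch below only illustrates the 20 min / 40 min split described above; the function and constant names are assumptions, and the 0-based timeslice index over a 288-slice day follows the 5 min division used in Figure 6.

```python
SLICE_MINUTES = 5
SLICES_PER_HOUR = 60 // SLICE_MINUTES  # 12 timeslices per hour
MATCH_SLICES_PER_HOUR = 4              # first 20 min of each hour
SLICES_PER_DAY = 24 * SLICES_PER_HOUR  # 288 timeslices per day


def stage_for_timeslice(slice_index):
    """Return "match" or "predict" for a 0-based timeslice index within the day."""
    position_in_hour = slice_index % SLICES_PER_HOUR
    return "match" if position_in_hour < MATCH_SLICES_PER_HOUR else "predict"


# Stage assigned to every timeslice of one day.
daily_plan = [stage_for_timeslice(i) for i in range(SLICES_PER_DAY)]
```

Under this rule, one day contains 96 matched timeslices and 192 predicted timeslices.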

The experimental data are one week of detection data from Shenzhen, with approximately 1 million records per day on average. After the daily data were matched to the road sections according to the detector equipment number and other information, the method described in this paper was applied to Section 97301 on two days: a typical working day and a typical nonworking day. Considering the stability of static features, the experiment adopted the “topology-first matching strategy” for status matching, and the information of the nearest topology set is shown in Table 5.

To reduce the matching time consumption and improve the prediction efficiency, the subsequent periods within 1 h were predicted using a 2nd-order HMM. In the prediction stage using a 2nd-order HMM, in cases where data were assumed missing for Section 97301, the status of the section with complete data for the corresponding road classification in the road network was utilized for statistical purposes. The initial status probability statistics for the secondary trunk road in the road network are shown in Table 6. The probabilities shown in Table 6 were used as the initial status probabilities for Section 97301.

The traffic status transfer probability matrix $A$, a three-dimensional matrix, was also estimated from the data of the secondary trunk roads in the road network, as shown in Figure 7. In Figure 7, the darker the color is, the lower the weight value. The probability of transferring from status $i$ at moment $t - 2$ and status $j$ at moment $t - 1$ to status $z$ at the current moment was the greatest.
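A frequency-count estimate of the initial status probabilities and of the three-dimensional transfer matrix might look as follows. This is a sketch under stated assumptions rather than the authors' procedure: statuses are coded as integers starting at 0, the function names are hypothetical, and status pairs that never occur in the data fall back to a uniform row, a choice the paper does not specify.

```python
from collections import Counter


def estimate_initial_probs(sequences, n_status):
    """Relative frequency of each status over all labeled timeslices."""
    counts = Counter(status for seq in sequences for status in seq)
    total = sum(counts.values())
    return [counts[q] / total for q in range(n_status)]


def estimate_second_order_transfer(sequences, n_status):
    """a[i][j][k]: frequency of status k following the pair (i at t-2, j at t-1)."""
    counts = [[[0] * n_status for _ in range(n_status)] for _ in range(n_status)]
    for seq in sequences:
        # Slide a window of three consecutive statuses over each sequence.
        for i, j, k in zip(seq, seq[1:], seq[2:]):
            counts[i][j][k] += 1
    transfer = []
    for i in range(n_status):
        rows = []
        for j in range(n_status):
            total = sum(counts[i][j])
            if total:
                rows.append([c / total for c in counts[i][j]])
            else:
                # Assumption: unseen (i, j) pairs get a uniform distribution.
                rows.append([1 / n_status] * n_status)
        transfer.append(rows)
    return transfer


# Toy two-status sequence for illustration only.
toy_sequences = [[0, 0, 0, 1, 0, 0, 1]]
pi_hat = estimate_initial_probs(toy_sequences, 2)
a_hat = estimate_second_order_transfer(toy_sequences, 2)
```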

In addition, the speed intervals were obtained from the desired vehicle speed by inverting the TPI formula, so that, consistent with Table 2, higher speeds correspond to smoother traffic:

v_{O} \in \begin{cases} [3 v_{d} / 4, + \infty), & \text{Smooth} \\ [11 v_{d} / 20, 3 v_{d} / 4), & \text{Basic Smooth} \\ [2 v_{d} / 5, 11 v_{d} / 20), & \text{Light Congestion} \\ [v_{d} / 5, 2 v_{d} / 5), & \text{Moderate Congestion} \\ [0, v_{d} / 5), & \text{Severe Congestion} \end{cases} \quad v_{d} = 28.61 \text{ km/h},

(14)

where $v_{d}$ is the average early-morning speed of all road sections in the road network classified as secondary trunk roads. The speed data of the corresponding road class in the whole road network were then used to calculate the observation probability corresponding to each status, with the frequency used to estimate the probability:

B = {(\begin{matrix} 0.568 & 0.252 & 0.113 & 0.054 & 0.013 \end{matrix})}^{T} .

(15)
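The interval lookup and the frequency estimate of $B$ can be illustrated as follows. The sketch assumes the interval boundaries of Equation (14), with speeds at or above $3 v_{d} / 4$ labeled Smooth and speeds below $v_{d} / 5$ labeled Severe Congestion, as Table 2 implies. The function names and the sample speeds are hypothetical; only $v_{d} = 28.61$ km/h comes from the paper.

```python
from bisect import bisect_right
from collections import Counter

# Statuses ordered by increasing speed, so the list index equals the number
# of thresholds that the speed has reached.
STATUSES_BY_SPEED = [
    "Severe Congestion",
    "Moderate Congestion",
    "Light Congestion",
    "Basic Smooth",
    "Smooth",
]


def speed_thresholds(v_d):
    """Lower bounds of the four upper intervals of Equation (14)."""
    return [v_d / 5, 2 * v_d / 5, 11 * v_d / 20, 3 * v_d / 4]


def speed_to_status(speed, v_d):
    """Map a speed (km/h) to a traffic status; intervals are closed on the left."""
    return STATUSES_BY_SPEED[bisect_right(speed_thresholds(v_d), speed)]


def estimate_observation_probs(speeds, v_d):
    """Share of speed observations per status, ordered from Smooth to Severe Congestion."""
    counts = Counter(speed_to_status(s, v_d) for s in speeds)
    return [counts[name] / len(speeds) for name in reversed(STATUSES_BY_SPEED)]


V_D = 28.61  # desired speed for secondary trunk roads (Table 4)
sample_speeds = [25, 25, 18, 13, 8, 3, 30, 22, 19, 26]  # hypothetical values
b_hat = estimate_observation_probs(sample_speeds, V_D)
```

With $v_{d} = 28.61$ km/h, the thresholds are approximately 5.72, 11.44, 15.74, and 21.46 km/h.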

The experimental results are shown in Figure 8. On the Shenzhen City dataset, the predicted traffic status contains some errors, but the accuracy reached 88.866%, and every misjudged status differs from the true status by no more than one level, so the errors would not create a large gap between the prediction and the conditions perceived by the traffic management department or drivers in practical application.
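The two quantities reported here, the share of correctly identified statuses and the largest level difference among the misjudged ones, can be computed as sketched below. The status coding (integers 1 to 5, from Smooth to Severe Congestion), the function names, and the sample sequences are assumptions for illustration; they are not the experimental data.

```python
def accuracy(predicted, actual):
    """Fraction of timeslices whose predicted status equals the true status."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def max_level_gap(predicted, actual):
    """Largest absolute difference in congestion level over all timeslices."""
    return max(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical status levels (1 = Smooth ... 5 = Severe Congestion).
predicted_levels = [1, 1, 2, 3, 5, 4, 1, 1]
actual_levels = [1, 1, 2, 2, 5, 5, 1, 1]
acc = accuracy(predicted_levels, actual_levels)
gap = max_level_gap(predicted_levels, actual_levels)
```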

4.3. Comparison Experiment

We evaluated the model on the Shenzhen City dataset by road classification and calculated the average status recognition precision for each road classification. The evaluation formula is as follows:

\text{Precision} = 1 - \frac{\sum_{K} {(\bar{S} - S_{k})}^{2}}{{(\bar{S} - S_{k})}^{2}},

(16)

where $K$ denotes the number of road sections of each road class in the test set, and $S_{k}$ denotes the corresponding status result of the road section in the test set.

To evaluate the method in this paper, we compared it with the hidden Markov models and contrast measure (HMM-C) method [29]. We applied the HMM-C method to the Shenzhen City dataset and calculated the precisions shown in Table 7. The original HMM-C method does not divide the dataset by road classification, so for this experiment we divided the data by road classification and ran the method in batches to obtain the corresponding evaluation results (the average of the results over the timeslices on 17 June 2021 and 19 June 2021).

From Table 7, we can see that our method has higher precision for higher road classifications. By analysis, we found that the lower precision on secondary trunk roads may be caused by omitting the travel time occupancy of inflow or outflow nodes from the traffic status influencing factors. Traffic signal timing is an important factor affecting the traffic status in cities, and node travel time occupancy is an important constraint on traffic flow from the inflow road section to the outflow road section; therefore, the precision of our algorithm is lower for trunk and secondary trunk roads, which have more nodes. Because the HMM-C method considers the “traffic contrast”, it can measure the overall level of change in traffic. Thus, the precision of HMM-C fluctuates less across road classifications, but it is lower than that of our algorithm for the highway classification.

5. Conclusions

In this paper, by analyzing the influencing factors of traffic flow in urban road networks, we constructed an urban traffic status prediction model based on spatiotemporal feature sequences to address the MNAR problem in cities. First, using the urban road TPI calculation formula, road sections with dynamic traffic parameter data are assigned traffic status labels. Then, the influencing factors of urban road traffic status are analyzed, and the feature sequences of urban road traffic status are extracted and encoded as multidimensional feature sequences. Next, for road sections without dynamic traffic records, the traffic status of the first four timeslices is obtained by nearest distance matching. Additionally, to improve the operational efficiency of the method, a 2nd-order HMM is used to predict the traffic status at future moments. To reduce the cumulative error of the 2nd-order HMM prediction, the prediction cycle is 1 h: in the next cycle, nearest distance matching is used again to obtain the traffic status at the first four timeslices, and the 2nd-order HMM is then used for prediction, and so on cyclically. Experiments on the urban traffic data of Shenzhen City show that the multidimensional feature sequence matching model for urban roads can obtain the traffic status of road sections without detector data, which extends the intelligent control range of the urban road network, and that it performs well, with a high discrimination rate of the traffic status and a low severity of misjudgment.

In future studies, the following will be further investigated:

(1) The signal timing effects at intersections are not considered in our study. However, signal timing is an important constraint on trunk and secondary trunk roads in urban road networks. In the next step, signal timing factors will be introduced to construct a traffic status prediction model.

(2) We adopt topology matching to consider the effect of the spatial structure of the road network on traffic status. On the other hand, traffic interruptions and the land aggregation effect (the POI (point of interest) of traffic trips) will have an impact on the direction of traffic trips, thus indirectly affecting the traffic status. In the next step, we will collect interruption data (traffic accidents, traffic occupancy construction, etc.) or urban travel POI data (cell phone signaling density data, etc.) and constrain the matching by combining it with the established spatial road network structure to further improve the matching accuracy.

(3) City size is an important constraint on matching efficiency, and, in the next study, we will further consider the effect of city size on the model.

Author Contributions

Conceptualization and methodology, F.L.; validation and formal analysis, J.C.; writing—original draft preparation, review, and editing, F.L. and J.C.; supervision, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key R&D Program of Ningxia, grant number 2023BBF01004.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the Shenzhen Government Data Open Platform Dataset and are available at https://opendata.sz.gov.cn/data/dataSet/toDataDetails/29200_03103690 (accessed on 1 August 2023) with the permission of the Shenzhen Government.

Acknowledgments

We thank the staff who organized the dataset for public consumption.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sharma, A.; Sharma, A.; Nikashina, P.; Gavrilenko, V.; Tselykh, A.; Bozhenyuk, A.; Masud, M.; Meshref, H. A Graph Neural Network (GNN)-Based Approach for Real-Time Estimation of Traffic Speed in Sustainable Smart Cities. Sustainability 2023, 15, 11893.
2. Wang, J.; Wang, C.; Lv, J.; Zhang, Z.; Li, C. Modeling Travel Time Reliability of Road Network Considering Connected Vehicle Guidance Characteristics Indexes. J. Adv. Transp. 2017, 2017, 2415312.
3. Van Buuren, S. Flexible Imputation of Missing Data, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 2012; ISBN 978-0-429-06540-8.
4. Chen, X.; He, Z.; Wang, J. Spatial-Temporal Traffic Speed Patterns Discovery and Incomplete Data Recovery via SVD-Combined Tensor Decomposition. Transp. Res. Part C Emerg. Technol. 2018, 86, 59–77.
5. Bae, B.; Kim, H.; Lim, H.; Liu, Y.; Han, L.D.; Freeze, P.B. Missing Data Imputation for Traffic Flow Speed Using Spatio-Temporal Cokriging. Transp. Res. Part C Emerg. Technol. 2018, 88, 124–139.
6. Li, L.; Zhang, J.; Wang, Y.; Ran, B. Missing Value Imputation for Traffic-Related Time Series Data Based on a Multi-View Learning Method. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2933–2943.
7. Sun, Y.; Li, J.; Xu, Y.; Zhang, T.; Wang, X. Deep Learning versus Conventional Methods for Missing Data Imputation: A Review and Comparative Study. Expert Syst. Appl. 2023, 227, 120201.
8. Soumare, H.; Benkahla, A.; Gmati, N. Deep Learning Regularization Techniques to Genomics Data. Array 2021, 11, 100068.
9. Harleman, M.; Harris, L.; Willis, M.D.; Ritz, B.; Hystad, P.; Hill, E.L. Changes in Traffic Congestion and Air Pollution Due to Major Roadway Infrastructure Improvements in Texas. Sci. Total Environ. 2023, 898, 165463.
10. Janwari, M.M.; Tiwari, G.; Popli, S.K.; Mir, M.S. Traffic Analysis of Srinagar City. Transp. Res. Procedia 2016, 17, 3–15.
11. Ukam, G.; Adams, C.; Adebanji, A.; Ackaah, W. Factors Affecting Paratransit Travel Times at Route and Segment Levels. Int. J. Transp. Sci. Technol. 2023.
12. Cui, S.; Gu, X.; Xie, W.; Wu, D. Research on Cold Chain Routing Optimization of Multi-Distribution Center Considering Traffic Performance Index. Procedia Comput. Sci. 2023, 221, 1343–1350.
13. Tamir, T.S.; Xiong, G.; Li, Z.; Tao, H.; Shen, Z.; Hu, B.; Menkir, H.M. Traffic Congestion Prediction Using Decision Tree, Logistic Regression and Neural Networks. IFAC Pap. 2020, 53, 512–517.
14. Saleem, M.; Abbas, S.; Ghazal, T.M.; Adnan Khan, M.; Sahawneh, N.; Ahmad, M. Smart Cities: Fusion-Based Intelligent Traffic Congestion Control System for Vehicular Networks Using Machine Learning Techniques. Egypt. Inform. J. 2022, 23, 417–426.
15. Afandizadeh Zargari, S.; Amoei Khorshidi, N.; Mirzahossein, H.; Heidari, H. Analyzing the Effects of Congestion on Planning Time Index—Grey Models vs. Random Forest Regression. Int. J. Transp. Sci. Technol. 2023, 12, 578–593.
16. Gao, Y.; Li, J.; Xu, Z.; Liu, Z.; Zhao, X.; Chen, J. A Novel Image-Based Convolutional Neural Network Approach for Traffic Congestion Estimation. Expert Syst. Appl. 2021, 180, 115037.
17. Guo, J.; Liu, Y.; Yang, K.Q.; Wang, Y.; Fang, S. GPS-Based Citywide Traffic Congestion Forecasting Using CNN-RNN and C3D Hybrid Model. Transp. A Transp. Sci. 2021, 17, 190–211.
18. Narmadha, S.; Vijayakumar, V. Spatio-Temporal Vehicle Traffic Flow Prediction Using Multivariate CNN and LSTM Model. Mater. Today Proc. 2023, 81, 826–833.
19. Zheng, W.; Yang, H.F.; Cai, J.; Wang, P.; Jiang, X.; Du, S.S.; Wang, Y.; Wang, Z. Integrating the Traffic Science with Representation Learning for City-Wide Network Congestion Prediction. Inf. Fusion 2023, 99, 101837.
20. Fowe, A.J.; Chan, Y. A Microstate Spatial-Inference Model for Network-Traffic Estimation. Transp. Res. Part C Emerg. Technol. 2013, 36, 245–260.
21. Weerakody, P.B.; Wong, K.W.; Wang, G. Cyclic Gate Recurrent Neural Networks for Time Series Data with Missing Values. Neural. Process Lett. 2023, 55, 1527–1554.
22. Yang, B.; Kang, Y.; Yuan, Y.; Huang, X.; Li, H. ST-LBAGAN: Spatio-Temporal Learnable Bidirectional Attention Generative Adversarial Networks for Missing Traffic Data Imputation. Knowl. Based Syst. 2021, 215, 106705.
23. Guo, Z.; Yang, C.; Wang, D.; Liu, H. A Novel Deep Learning Model Integrating CNN and GRU to Predict Particulate Matter Concentrations. Process Saf. Environ. Prot. 2023, 173, 604–613.
24. Xu, M.; Di, Y.; Ding, H.; Zhu, Z.; Chen, X.; Yang, H. AGNP: Network-Wide Short-Term Probabilistic Traffic Speed Prediction and Imputation. Commun. Transp. Res. 2023, 3, 100099.
25. Haliduola, H.N.; Bretz, F.; Mansmann, U. Missing Data Imputation Using Utility-Based Regression and Sampling Approaches. Comput. Methods Programs Biomed. 2022, 226, 107172.
26. Huang, L.; Li, Z.; Luo, R.; Su, R. Missing Traffic Data Imputation with a Linear Generative Model Based on Probabilistic Principal Component Analysis. Sensors 2023, 23, 204.
27. Wang, L.; Geng, X.; Ma, X.; Liu, F.; Yang, Q. Cross-City Transfer Learning for Deep Spatio-Temporal Prediction. arXiv 2018, arXiv:1802.00386.
28. Qi, Y.; Ishak, S. A Hidden Markov Model for Short Term Prediction of Traffic Conditions on Freeways. Transp. Res. Part C Emerg. Technol. 2014, 43, 95–111.
29. Zaki, J.F.; Ali-Eldin, A.; Hussein, S.E.; Saraya, S.F.; Areed, F.F. Traffic Congestion Prediction Based on Hidden Markov Models and Contrast Measure. Ain Shams Eng. J. 2020, 11, 535–551.
30. Raskar, C.; Nema, S. Metaheuristic Enabled Modified Hidden Markov Model for Traffic Flow Prediction. Comput. Netw. 2022, 206, 108780.
31. Wang, Y.; Chen, Y.; Li, G.; Lu, Y.; He, Z.; Yu, Z.; Sun, W. City-Scale Holographic Traffic Flow Data Based on Vehicular Trajectory Resampling. Sci. Data 2023, 10, 57.
32. Chen, J.; Hu, Z.; Li, F. An Estimation Method of Traffic Flow State Based on Matching of Temporal-spatial Feature Sequences. J. Transp. Inf. Saf. 2021, 39, 68–76+120.
33. Tang, J.; Zhang, G.; Wang, Y.; Wang, H.; Liu, F. A Hybrid Approach to Integrate Fuzzy C-Means Based Imputation Method with Genetic Algorithm for Missing Traffic Volume Data Estimation. Transp. Res. Part C Emerg. Technol. 2015, 51, 29–40.
34. Duan, Y.; Lv, Y.; Liu, Y.-L.; Wang, F.-Y. An Efficient Realization of Deep Learning for Traffic Data Imputation. Transp. Res. Part C Emerg. Technol. 2016, 72, 168–181.

Figure 1. Schematic diagram of the algorithm in this paper.

Figure 2. Morphology schematic diagram of the urban road topology unit.

Figure 3. Multidimensional feature sequence delineation of urban road sections.

Figure 4. Schematic diagram of 2nd-order HMM-based traffic status prediction.

Figure 5. Topological relationships of Section 97301 and its neighboring sections.

Figure 6. Typical weekday and non-weekday one-day traffic conditions on some sections. (a) Typical weekday one-day traffic conditions on some sections; (b) Typical non-weekday one-day traffic conditions on some sections. The horizontal coordinate indicates the timeslice formed after dividing the day by every 5 min, and there are 288 timeslices in 24 h.

Figure 7. Status transfer probability matrix for secondary trunk roads.

Figure 8. Typical weekday vs. typical non-weekday traffic condition congestion level comparison for Section 97301. (a) Typical weekday traffic condition congestion level; (b) Typical non-weekday traffic condition congestion level.

Table 1. Variables and their interpretations.

Variables	Interpretations
$\bar{v}_{segment}$	The average speed of the section at the current moment, taking the average of the sample speeds within the timeslice, km/h.
$l_{segment}$	The length of the road section, m.
$R_{T_{segment}}$	The ratio of section travel time within the timeslice.
$\bar{T}_{segment}$	The actual travel time on the section within the timeslice, s.
$T_{d}$	The travel time at the desired vehicle speed $v_{d}$, s.
$TPI_{segment}$	The traffic performance index of the road section in the timeslice.
$F_{l} (\cdot)$	The conversion relation.
$LinkID$	The road section number from which each feature sequence was generated to perform a road network topology search in a GIS mapping system.
$n$	The number of all road sections grouped into a topology unit, arranged in the order inflow road sections, current road, outflow road sections; the data within the inflow or outflow road sections are sorted by $LinkID$.
$TS$	The time series.
$AH$	Adjacency hierarchy. Since the strong correlation at an adjacency hierarchy of 1 has been demonstrated [31], and in order to reduce the matrix dimension, we only consider the influence of the adjacent level of road traffic on the current road section to be predicted; therefore, this study adopts the road sections that are adjacent to each other at the first level of the road network topology as the features in the adjacency hierarchy part, i.e., the adjacency hierarchy is labeled as 1.
$RN$	Number of road sections, i.e., the number of adjacent road sections.
$RC$	Road classification, i.e., the urban road classification of the road section, including highway, expressway, trunk road, secondary trunk road, etc.
$L$	Road length.
$LS$	Number of lanes of the road section.
$F$	Traffic flow of road sections within timeslice.
$V$	Average speed within timeslice of road section.
$S$	Traffic status within timeslice of road section.
$q_{1} ~ q_{5}$	The five congestion degrees in Table 2.
$V$	The set of all possible observations, which in our algorithm can be either traffic flow or velocity.
$Q$	The set of all possible traffic statuses.
$Π$	The initial probability of each traffic status $q_{1} ~ q_{5}$ , estimated with a frequency on the set $Q$ .
$A$	The status transfer probability matrix.
$B$	The observation probability matrix.
$a_{i j}$ or $a_{j k}$	The probability that the first moment status is transferred to the second moment status in $A$ .

Table 2. Classification of road traffic performance index [32].

TPI	R	Congestion Degree
[0, 1]	(0, 4/3]	Smooth
(1, 2]	(4/3, 20/11]	Basic Smooth
(2, 3]	(20/11, 2.5]	Light Congestion
(3, 4]	(2.5, 5]	Moderate Congestion
(4, 5]	(5, +∞)	Severe Congestion

Table 3. Normalized table of multidimensional spatiotemporal features.

Features	Values	Topology Structure	Normalized Value
Time series	Weekday, non-weekday	—	0.5, 1
Time series	Light periods, rush periods	—	0.5, 1
Time series	1st, 2nd, 3rd, 4th 5 min (in an hour)	—	0.25, 0.5, 0.75, 1
Number of adjacent road sections	0, 1, 2, 3, 4	Inflow road sections	0, 0.25, 0.5, 0.75, 1
Number of adjacent road sections	0, 1, 2, 3, 4	Outflow road sections	0, 0.25, 0.5, 0.75, 1
Urban road classification	Highway, Freeway, Trunk Road, Secondary Trunk Road	Inflow road sections	0.25, 0.5, 0.75, 1
Urban road classification	Highway, Freeway, Trunk Road, Secondary Trunk Road	Middle road sections	0.25, 0.5, 0.75, 1
Urban road classification	Highway, Freeway, Trunk Road, Secondary Trunk Road	Outflow road sections	0.25, 0.5, 0.75, 1
Road length	—	Inflow road sections	Linear normalization over $[0, l_{\max}]$ ¹
Road length	—	Middle road sections	Linear normalization over $[0, l_{\max}]$ ¹
Road length	—	Outflow road sections	Linear normalization over $[0, l_{\max}]$ ¹
Lanes number of road section	0, 1, 2, 3, 4, 5	Inflow road sections	0, 0.2, 0.4, 0.6, 0.8, 1
Lanes number of road section	0, 1, 2, 3, 4, 5	Middle road sections	0, 0.2, 0.4, 0.6, 0.8, 1
Lanes number of road section	0, 1, 2, 3, 4, 5	Outflow road sections	0, 0.2, 0.4, 0.6, 0.8, 1
Traffic flow of road section	—	Inflow road sections	Linear normalization over $[0, C_{\max}]$ ²
Traffic flow of road section	—	Outflow road sections	Linear normalization over $[0, C_{\max}]$ ²
Average speed of road section	—	Inflow road sections	Linear normalization over $[0, v_{\max}]$ ³
Average speed of road section	—	Outflow road sections	Linear normalization over $[0, v_{\max}]$ ³
Road traffic status of road section	Smooth, Basic Smooth, Light Congestion, Moderate Congestion, Severe Congestion	Inflow road sections	0, 0.25, 0.5, 0.75, 1
Road traffic status of road section	Smooth, Basic Smooth, Light Congestion, Moderate Congestion, Severe Congestion	Outflow road sections	0, 0.25, 0.5, 0.75, 1

¹ $l_{\max}$ represents the maximum road length. ² $C_{\max}$ represents the maximum road capacity. ³ $v_{\max}$ represents the road speed limit.

Table 4. Desired speeds for different classifications of road sections.

Road Class	Average Speed (km/h)
Highway	81.60
Freeway	69.55
Trunk Road	39.58
Secondary Trunk Road	28.61

Table 5. Closest set of topologies matched by Section 97301.

ID	Road Classification	UpNum ¹	DownNum ²	Road Length (m)	Diff (m) ³
443301	Secondary Trunk Road	3	1	1366	6
43203	Secondary Trunk Road	3	1	1348	24
681302	Secondary Trunk Road	3	1	1399	27
604301	Secondary Trunk Road	3	1	1331	41
593104	Secondary Trunk Road	3	1	1416	44

¹ UpNum is the number of inflow sections. ² DownNum is the number of outflow sections. ³ Diff is the difference between the length of individual sections and the length of Section 97301.
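As a worked check of Table 5, the ranking by length difference can be reproduced in a few lines. The target length of 1372 m is not stated in the text; it is inferred here from the table, since every listed road length and Diff pair is consistent with that value when Diff is read as an absolute difference. The variable and function names are hypothetical, and this sketch covers only the final length comparison, not the full topology-first matching strategy.

```python
# Road lengths (m) of the candidate sections listed in Table 5.
CANDIDATE_LENGTHS_M = {
    "443301": 1366,
    "43203": 1348,
    "681302": 1399,
    "604301": 1331,
    "593104": 1416,
}

# Assumption: length of Section 97301 inferred from the Diff column of Table 5.
TARGET_LENGTH_M = 1372


def rank_by_length_diff(candidate_lengths, target_length):
    """Return (section_id, diff) pairs sorted by absolute length difference."""
    diffs = [
        (section_id, abs(length - target_length))
        for section_id, length in candidate_lengths.items()
    ]
    return sorted(diffs, key=lambda pair: pair[1])


ranking = rank_by_length_diff(CANDIDATE_LENGTHS_M, TARGET_LENGTH_M)
```

The resulting order and differences (6, 24, 27, 41, and 44 m) agree with the Diff column of Table 5.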

Table 6. Initial status probabilities.

Congestion Degree	Smooth	Basic Smooth	Light Congestion	Moderate Congestion	Severe Congestion
Initial Status Probabilities	0.607	0.264	0.079	0.043	0.007

Table 7. Comparison with another reported result.

Dataset	Road Classification	Average Precision, Ours (%)	Average Precision, HMM-C (%)
Data of Shenzhen City	Highway	92.685	91.823
Data of Shenzhen City	Freeway	91.750	91.663
Data of Shenzhen City	Trunk Road	91.364	91.315
Data of Shenzhen City	Secondary Trunk Road	90.559	92.219

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
