Article

Probabilistic Traffic Motion Labeling for Multi-Modal Vehicle Route Prediction

by Alberto Flores Fernández 1,2,*, Jonas Wurst 1, Eduardo Sánchez Morales 1, Michael Botsch 1, Christian Facchi 1 and Andrés García Higuera 2,3

1 Fakultät Elektro- und Informationstechnik, Technische Hochschule Ingolstadt, Esplanade 10, 85049 Ingolstadt, Germany
2 Escuela Técnica Superior de Ingeniería Industrial, Universidad de Castilla-La Mancha, Calle Altagracia 50, 13001 Ciudad Real, Spain
3 European Parliamentary Research Service, Rue Wiertz 60, B-1047 Brussels, Belgium
* Author to whom correspondence should be addressed.
Sensors 2022, 22(12), 4498; https://doi.org/10.3390/s22124498
Submission received: 20 May 2022 / Revised: 11 June 2022 / Accepted: 11 June 2022 / Published: 14 June 2022
(This article belongs to the Special Issue Robust Multimodal Sensing for Automated Driving Systems)

Abstract:
The prediction of the motion of traffic participants is a crucial aspect for the research and development of Automated Driving Systems (ADSs). Recent approaches are based on multi-modal motion prediction, which requires the assignment of a probability score to each of the multiple predicted motion hypotheses. However, there is a lack of ground truth for this probability score in the existing datasets. This implies that current Machine Learning (ML) models evaluate the multiple predictions by comparing them with the single real trajectory labeled in the dataset. In this work, a novel data-based method named Probabilistic Traffic Motion Labeling (PROMOTING) is introduced in order to (a) generate probable future routes and (b) estimate their probabilities. PROMOTING is presented with the focus on urban intersections. The generation of probable future routes is (a) based on a real traffic dataset and consists of two steps: first, a clustering of intersections with similar road topology, and second, a clustering of similar routes that are driven in each cluster from the first step. The estimation of the route probabilities is (b) based on a frequentist approach that considers how traffic participants will move in the future given their motion history. PROMOTING is evaluated with the publicly available Lyft database. The results show that PROMOTING is an appropriate approach to estimate the probabilities of the future motion of traffic participants in urban intersections. In this regard, PROMOTING can be used as a labeling approach for the generation of a labeled dataset that provides a probability score for probable future routes. Such a labeled dataset currently does not exist and would be highly valuable for ML approaches with the task of multi-modal motion prediction. The code is made open source.

1. Introduction

Urban mobility and transportation are cornerstones of society. Due to the high socio-economic impact of road accidents, there is a motivation to continuously improve automotive safety. This motivation has driven the development of modern road infrastructure, which has brought major advances in terms of road safety and traffic-flow efficiency. Recent European Union (EU) road safety statistics [1] show, however, that these improvements stagnated in 2019. Specifically, they quantify a decrease in fatal accidents of 23% when compared to 2010, but of only 2% when compared to 2018. For this reason, the EU has launched an ambitious initiative called “Vision Zero” [2], which establishes the goal of reducing fatalities caused by traffic accidents to near zero by 2050 and sets the target of halving the number of severe accidents by 2030. To this end, the EU initiative highlights the role that vehicle automation and connectivity play in increasing safety. Given that the majority of accidents (94%) are caused by human error [3], the ADSs under development are mainly focused on improving safety by assisting drivers with the early recognition and avoidance of dangerous situations, while also considering other aspects such as emissions reduction, driving efficiency, and improved passenger comfort. The deployment of automated driving functions in traffic scenarios in open environments is being carried out progressively. The Society of Automotive Engineers defines six levels of automation, from level 0 to level 5 [4], where level 5 corresponds to full and unsupervised autonomy. A level 5 automated vehicle demands very high technological complexity, and, to date, the driving functions required for this level of automation do not have the necessary robustness for deployment in open traffic environments.
According to [5], the main aspects and systems related to ADSs can be summarized using ten categories: (1) connected systems, (2) end-to-end driving, (3) localisation, (4) perception, (5) assessment and motion prediction, (6) planning, (7) control and dynamics, (8) human–machine interface, (9) datasets and software, and (10) implementation. In this work, multidisciplinary research is performed, covering mainly aspects from categories 5 and 9.
A recent line of research [6,7,8,9,10,11,12] focuses on multi-modal motion prediction. This is based on the consideration that traffic motion is multi-modal in nature, meaning that each traffic participant is not bound to follow a single trajectory in the future but can instead choose from a wide variety of possible trajectories. In this way, not just one but multiple probable motion hypotheses are predicted for each traffic participant, allowing researchers to capture the different options a driver may take, such as turning left, making a U-turn, or continuing straight ahead, among others. In the following, the term mode refers to a specific estimation of future motion within a finite set of possibilities, and the likelihood that a given mode will be selected is denoted as the mode score or mode probability. One prominent approach to multi-modal motion prediction makes use of ML methods based on the supervised learning paradigm. This requires a labeled dataset, i.e., the label associated with each sample must be known. If the dataset is generated from real traffic data, only a single real trajectory per traffic participant can be labeled, namely the one that has been driven. This shows the challenge of (a) predicting multiple motion hypotheses for each traffic participant out of a single labeled one. In addition, the prediction of multiple motion hypotheses implies the assignment of a probability score to each one with respect to the total number of hypotheses. However, labeled datasets with probabilities for routes are not available, (b) resulting in a lack of ground truth for these probability scores.
These aspects ((a) and (b)) motivate the investigation of a method that addresses the following research questions: (1) how to extract the route (certain sections of the road) that represents each possible mode from real traffic datasets, (2) how to estimate the probability that a vehicle will drive a certain mode, and (3) how to generate an adequate multi-modal labeled dataset so that an ML model can learn from it the intrinsic multi-modal motion of traffic scenarios.
In this regard, this work introduces a novel data-based method named PROMOTING that allows the estimation of multiple routes for each traffic participant and provides a probability score for each of the possible future routes. In this way, PROMOTING can be used as a labeling approach for the generation of a labeled dataset that contains not only single trajectories as its ground truth but also the multiple estimated routes. The smart intersection is a concept aimed at improving the safety and traffic flow of intersections; it is based on the use of sensors and communication systems that capture and analyze traffic to support ADS functions. Given that early smart intersections will operate with mixed traffic, i.e., automated and non-automated vehicles driving together, modeling the traffic flow in such scenarios will be significant. Thus, PROMOTING focuses on urban traffic scenarios, paying special attention to urban intersections.
Therefore, this work makes a contribution to the improvement of multi-modal motion predictions by introducing the PROMOTING method, highlighting the following. First, the method is able to extract multiple motion hypotheses for each traffic participant. Second, the method is able to estimate the probability that a vehicle will drive following a specific motion hypothesis. Third, the method may be used for the generation of a labeled dataset that provides extra information that is useful for a multi-modal prediction task. Fourth, the method is evaluated using real-world traffic scenarios from a database, which allows us to obtain a realistic representation of the traffic’s behavior in urban traffic scenarios.
The rest of the paper is structured as follows: in Section 2, related works are presented. In Section 3, the methodology of PROMOTING is detailed. In Section 4, the evaluation of PROMOTING is presented, and the associated results are shown and described. In Section 5, the main findings of the work are discussed. The paper is summarized in Section 6.

2. Related Works

According to [6], the motion prediction of traffic participants can be grouped into the following categories:
(1) physics-based methods (an engineering approach),
(2) planning-based methods, and
(3) pattern-based methods.
Over the last few years, research into motion prediction has shifted its focus from the physics-based generation of trajectories to the use of ML methods for the same purpose. The authors of [13] proposed the Attention mechanism, which marked a shift in how typical natural language processing, time-series forecasting, and sequence-to-sequence problems are approached. Along with the Attention mechanism, Transformer Networks are also finding their way into motion prediction tasks. In [6], Multiple Attention Heads (MAH) are implemented together with a Long Short-Term Memory Encoder–Decoder architecture to predict multiple trajectories, thus addressing the multi-modality of the motion of traffic participants and considering cross-agent interaction modeling. A similar approach is taken in [14]. The difference between [6] and [14] is that the latter adds map-related information that is learned by the Attention mechanism, which assists in modeling the agent–map interaction and improves the system performance. In [15], an architecture based on an Encoder–Decoder structure is proposed, where both parts are based exclusively on MAH. This model achieves a better performance than the one proposed in [14]. Other recent approaches [9,16,17] build on the work of [13] and use Transformer Networks based on MAH. In [16], pedestrian trajectory prediction is investigated, where the behavior of the pedestrians is modeled without taking into account any interaction with other traffic participants or with the map information. This approach is able to closely predict the motion of pedestrians, highlighting the suitability of Transformer Networks for motion prediction tasks. A similar method is presented in [17], where, compared to [16], the orientation of the traffic participants is considered as an additional input feature.
Furthermore, whereas in [16] only pedestrians are considered, in [17] the performance of the ML model is evaluated for different types of traffic scenarios and different types of road users. A more complex ML architecture than those of [16,17] is used in [9], consisting of three stacked Transformer Networks: vehicle motion, vehicle–map interaction, and vehicle–vehicle interaction. The networks are trained sequentially in each epoch, where the vehicle–vehicle interaction network receives the output of the vehicle–map interaction network, and the vehicle–map interaction network receives the output of the vehicle motion network. In addition to receiving the output of the previous one, each network receives additional inputs, which allows each network to specialize in a particular task.
In order for an ML model to learn something as complex as urban traffic, a large amount of data captured from real-world driving scenes is necessary. To prevent over-fitting, the data should have large variability; in this way, the ML model is able to capture as many variations of the relevant features as possible.
In the case of urban intersections, for example, the behavior of the traffic participants varies depending on the time of day, working/non-working days, and construction sites, among others. All these situations influence the behavior of the traffic participants, and their consideration provides extra knowledge that must be taken into account by ADSs. On the other hand, capturing real traffic data with these characteristics is a major challenge because of the required financial, computational, and time resources. One strategy to overcome this is to constrain the research and development of ADSs to bounded driving environments, such as smart urban corridors [18]. To this end, it is relevant to use appropriate databases for the training of the ML models.
Current research works [6,7,8,9,10,11,12] that focus on multi-modal motion prediction evaluate their performance in terms of the Average Displacement Error (ADE), the Final Displacement Error (FDE), or the Root Mean Square Error (RMSE). That is, they consider a single labeled real trajectory and measure the Euclidean distances between the reference trajectory and each of the predicted ones. The best trajectory is then chosen based on the minimum ADE, the minimum FDE, or the minimum RMSE. The main problem with using these metrics both to reduce training losses and to evaluate the model during the inference phase is that they force ML models to generate trajectories close to the reference trajectory. This may result in a subset of the predicted trajectories not being drivable, not following the road infrastructure, or colliding with other traffic participants. Furthermore, the prediction of multiple motion hypotheses for each traffic participant entails assigning a probability score that indicates the likelihood of selecting a hypothesis within the set of multiple hypotheses; however, existing datasets containing real traffic data, such as [19,20,21,22,23,24,25], do not provide this score, as there is only a single real trajectory labeled for each traffic participant.
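The displacement metrics above can be sketched in a few lines; this is a minimal illustration under the assumption that all trajectories share the same length and sampling rate (the function name `min_ade_fde` is illustrative, not taken from the cited works):

```python
import math

def min_ade_fde(predictions, reference):
    """Minimum Average/Final Displacement Error over K predicted
    trajectories against the single labeled reference trajectory.

    predictions: list of K trajectories, each a list of (x, y) points
    reference:   the labeled real trajectory, a list of (x, y) points
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    ades, fdes = [], []
    for traj in predictions:
        # Pointwise Euclidean distances to the reference trajectory
        ds = [dist(p, q) for p, q in zip(traj, reference)]
        ades.append(sum(ds) / len(ds))  # Average Displacement Error
        fdes.append(ds[-1])             # Final Displacement Error
    return min(ades), min(fdes)
```

Note that the minimum is taken over the hypotheses, so only the prediction closest to the single real trajectory is rewarded; the other modes and their probabilities are not assessed at all, which is precisely the shortcoming discussed here.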
In [26], the graphs of road topologies are used to identify similar examples through their isomorphism. This is required to shape the latent space for proper novelty detection. Moreover, in [27], the isomorphisms are used to identify similar traffic scenarios, also including the trajectories as paths inside the graphs. As before, this is used for shaping a latent space. However, in the present work, isomorphisms are used to identify similar intersections and routes in the intersections in order to identify similar modes.
Relevant work on the representation of motion hypotheses in traffic scenarios is presented in [28], with the introduction of the Predicted-Occupancy Grids (POGs). These represent the future traffic scenarios in the form of grid cells, where the confidence about the motion of dynamic agents is represented. This approach considers a spectrum of expected occupancy values beyond the simplistic binary approach, i.e., occupied or not occupied. This type of representation is used for the prediction of complex traffic scenarios in [29,30], where different types of machine-learning-based architectures for POG estimation are presented. However, there are three notable differences between the work of [28] and the present work. In [28], the approach is based on expert knowledge (it assumes physical models of vehicles and motion hypotheses), makes use of simulation data, and outputs POGs. In contrast, the present work proposes a methodology based on a frequentist approach (recorded traffic data is analyzed without assuming a motion hypothesis), uses real-world traffic data, and the presented method (PROMOTING) outputs the modes, in the form of routes, and the mode probabilities.
With regard to all the above, the present research work addresses the shortcomings of multi-modal motion prediction research by proposing the novel PROMOTING method. This serves as the methodology for the generation of a labeled dataset that extracts information about the modes of traffic participants based on conditional prior information. The method is able to extract the number and route of the modes, as well as to estimate the probability that a traffic participant will drive a specific mode. To the best of the authors’ knowledge, this is the first work seeking to estimate the modes together with their probabilities from real-world data for the purpose of labeling multi-modal motion hypotheses.

3. Materials and Methods

In order to estimate the modes and the probabilities of each mode, PROMOTING requires (1) historical traffic data and (2) topological information of the road map. To cover these requirements, the publicly available Lyft database [25] is selected, so PROMOTING is evaluated in this work by making use of this database. This database contains traffic motion information that is captured by a vehicle equipped with exteroceptive sensors. It contains a large amount of real-world trajectory data of dynamic participants, including urban intersections, and detailed map information covering the urban area where the traffic scenes were recorded. The methodology of PROMOTING is composed of five steps (see Figure 1), and each step is explained in a subsection of this section.

3.1. Road Infrastructure Description

The first step of the PROMOTING method, see Figure 2, aims to describe static traffic information: the road infrastructure. This is described by the road map information contained in the Lyft database on the basis of the map description (see Section 3.1.1) and the intersection description (see Section 3.1.2).
A visual representation of the road infrastructure generated from information from the Lyft database is shown in Figure 3.

3.1.1. Map Description

The road map information contained in the Lyft database divides the road space into so-called ways, which are road sections of finite length representing an individual lane in a given direction. In this work, each way is referred to as a vertex. Thus, the set $V$ of vertices $\nu_i$ of the map is defined as
$$V = \{\nu_1, \nu_2, \ldots, \nu_i, \ldots, \nu_{n_\nu}\},$$
where $n_\nu$ indicates the order of the map graph $G$ (defined below), i.e., the number of vertices contained in the map. Each vertex $\nu_i \in V$ is characterized by a number of features that allow its geometric and connectivity definition, for example:
  • centreLine: (x, y) coordinates of the vertex center line in the global coordinate frame.
  • turnDirection: indicates the type of change of direction of the vertex: “1” for straight, “2” for left turns, and “3” for right turns.
  • intersectionId: unique identifier $\kappa_i \in \mathbb{N}$ of the intersection of which the vertex forms a part; the value is “−1” if the vertex is not part of an intersection.
  • predecessors: set $A_{i,\mathrm{pre}}$ containing the immediately preceding vertices with respect to the driving direction, such that $A_{i,\mathrm{pre}} \subseteq V$.
  • successors: set $A_{i,\mathrm{suc}}$ containing the immediately following vertices with respect to the driving direction, such that $A_{i,\mathrm{suc}} \subseteq V$.
  • leftNeighbours: set $A_{i,\mathrm{left}}$ containing the immediate left-neighbor vertices with respect to the driving direction, such that $A_{i,\mathrm{left}} \subseteq V$.
  • rightNeighbours: set $A_{i,\mathrm{right}}$ containing the immediate right-neighbor vertices with respect to the driving direction, such that $A_{i,\mathrm{right}} \subseteq V$.
Thus, the set $A_i \subseteq V$ that contains the adjacent vertices of the $i$th vertex is defined as a union of sets, so that
$$A_i := A_{i,\mathrm{pre}} \cup A_{i,\mathrm{suc}} \cup A_{i,\mathrm{left}} \cup A_{i,\mathrm{right}}.$$
The connection between the different vertices $\nu_i \in V$ provides valuable information for vehicle motion prediction. In this paper, the connectivity information of the vertices is used to derive a graph-based model that represents the topology of the urban road network. The map topology $G$ is then defined as a directed graph, so that
$$G = (V, E),$$
where $E$ denotes the set of edges $\epsilon_k$ of the map, with
$$E = \{\epsilon_1, \epsilon_2, \ldots, \epsilon_k, \ldots, \epsilon_{n_\epsilon}\},$$
where $n_\epsilon$ indicates the size of $G$, i.e., the number of edges contained in the graph.
Each edge $\epsilon_k$ represents the connection between two adjacent vertices, so that
$$\epsilon_k = \{(\nu_i, \nu_j) \mid \nu_i, \nu_j \in V \wedge \nu_j \in A_i\}, \quad i \neq j, \quad i, j \in \{1, 2, \ldots, n_\nu\}.$$
The order of the vertex pair indicates the driving direction on the edge: the first element is the “source vertex” and the second one is the “target vertex”. For example, $\epsilon_k = (\nu_i, \nu_j) \in E$ indicates that the driving direction on the $k$th edge is from the $i$th vertex to the $j$th vertex.
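The directed map graph defined above can be sketched as follows, assuming the per-vertex adjacency sets $A_i$ have already been read from the map (the data layout is illustrative, not the Lyft map API):

```python
def build_map_graph(adjacency):
    """Build the directed map graph G = (V, E).

    adjacency: dict mapping a vertex id i to the set A_i of its adjacent
    vertex ids (the union of predecessors, successors, and left/right
    neighbours, as in the definition of A_i above).
    An edge (i, j) is created for every j in A_i with i != j.
    """
    V = set(adjacency)
    E = {(i, j) for i, a_i in adjacency.items()
         for j in a_i if j in V and i != j}
    return V, E
```

Representing $E$ as a set of ordered vertex-id pairs keeps the driving direction explicit, matching the source/target convention above.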

3.1.2. Intersection Description

Similarly, the road topology of an intersection contained in the map, denoted as the $\iota$th intersection, is modeled as the graph $G_\iota = (V_\iota, E_\iota)$ with information from $G$, so that $G_\iota \subseteq G$. That is, $G_\iota$ is a sub-graph of $G$; therefore, $V_\iota \subseteq V$ and $E_\iota \subseteq E$.
To model the graph of each intersection, it is necessary to identify which vertices belong to the same intersection and how they are connected to each other. In this sense, three types of vertex are differentiated for each intersection:
(1) Incoming vertex: a vertex at the entrance of an intersection. These vertices are grouped in sets with the sub-index “in”.
(2) Crossing vertex: a vertex on the intersection. These vertices are grouped in sets with the sub-index “x”.
(3) Outgoing vertex: a vertex at the exit of an intersection. These vertices are grouped in sets with the sub-index “out”.
Thus, incoming vertices precede crossing vertices, and crossing vertices precede outgoing vertices. With this, the graph $G_\iota$ of the $\iota$th intersection is generated as described in Algorithm 1, and a graphic depiction is shown in Figure 4.
Algorithm 1: Intersection graph generation
   Input: directed graph $G$ of the map and the unique intersection identifier $\iota$.
   Output: directed graph $G_\iota$ of the $\iota$th intersection, formed by the edge set $E_\iota$ and the vertex set $V_\iota$ with the incoming, crossing, and outgoing vertices of the intersection.
1   $V_{\iota,\mathrm{x}} := \{\nu_j \in V \mid \kappa_j = \iota\}$
2   $V_{\iota,\mathrm{in}} := \{\nu_i \in V \mid (\nu_i, \nu_j) \in E \wedge \nu_j \in V_{\iota,\mathrm{x}}\}$
3   $V_{\iota,\mathrm{out}} := \{\nu_k \in V \mid (\nu_j, \nu_k) \in E \wedge \nu_j \in V_{\iota,\mathrm{x}}\}$
4   $V_\iota = V_{\iota,\mathrm{in}} \cup V_{\iota,\mathrm{x}} \cup V_{\iota,\mathrm{out}}$
5   $E_\iota := \{(\nu_i, \nu_j) \in E \mid \nu_i, \nu_j \in V_\iota \wedge i \neq j\}$
6   $G_\iota = (V_\iota, E_\iota)$
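A minimal Python sketch of Algorithm 1, assuming the map is available as plain vertex and edge sets (the data layout is illustrative; the released code may differ):

```python
def intersection_graph(V, E, kappa, iota):
    """Directed graph of the iota-th intersection (Algorithm 1).

    V:     set of vertex ids of the map
    E:     set of directed edges (i, j)
    kappa: dict mapping vertex id -> intersection id (-1 if none)
    iota:  unique intersection identifier
    """
    V_x = {j for j in V if kappa[j] == iota}        # line 1: crossing vertices
    V_in = {i for (i, j) in E if j in V_x}          # line 2: incoming vertices
    V_out = {k for (j, k) in E if j in V_x}         # line 3: outgoing vertices
    V_iota = V_in | V_x | V_out                     # line 4
    E_iota = {(i, j) for (i, j) in E               # line 5: induced edges
              if i in V_iota and j in V_iota and i != j}
    return V_iota, E_iota                           # line 6: G_iota
```

Each comprehension mirrors one line of the algorithm: membership in the intersection is decided via $\kappa_j$, and incoming/outgoing vertices are found through the directed edges into and out of the crossing set.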
Algorithm 1 can be used for as many intersections as required to generate the set of intersection graphs $S_\mathrm{int}$, so that
$$S_\mathrm{int} = \{G_1, G_2, \ldots, G_{n_\mathrm{int}}\},$$
where $n_\mathrm{int}$ indicates the number of intersection graphs generated from the map.

3.2. Vehicle Intersection Data Extraction

Once the intersection graphs $S_\mathrm{int}$ and the map vertex set $V$ are generated, the next step is the extraction of the list of Vehicle Intersection Data (VID), $X_\mathrm{VID}$, i.e., the route information (sequence of vertices) of each vehicle that crosses an intersection, together with the graph of the crossed intersection. To accomplish this, the motion history of the vehicle is required, in addition to the intersection graphs and the vertex set obtained in the previous step; see Figure 5.
A detailed description of the VID extraction process is depicted in Figure 6.
As depicted in Figure 6, the VID extraction starts by iterating over all $n_\mathrm{scenes}$ traffic scenes contained in the Lyft database. Each $i$th scene contains a record of the motion of all $n_\mathrm{obj}$ registered objects. For each $i$th scene, the motion information of each $j$th object with the “car” label is extracted. Next, for each $j$th vehicle, its (x, y) coordinates are read and, together with $V$, associated with vertices so as to generate the vertex sequence $Q_{i,j}$, as detailed in Section 3.2.1. Later, $Q_{i,j}$ is used to extract the $n_\mathrm{routes}$ routes that cross intersections, as detailed in Section 3.2.2. Then, for each route $R_k$ that crosses an intersection with graph $G_\iota$, a new VID, denoted by $X_{i,j,k}$, is generated. Hence, the VID $X_{i,j,k}$ for the $k$th route of the $j$th vehicle in the $i$th scene is determined as
$$X_{i,j,k} = (R_k, G_\iota),$$
where the route $R_k$ is represented by a sequence of vertices and is denoted as
$$R_k = [r_1, r_2, \ldots, r_q, \ldots], \quad r_q \in V_\iota,$$
and the intersection graph $G_\iota$ is generated as indicated by Algorithm 1.
Thus, the VID list $X_\mathrm{VID}$ is defined as the list whose elements are the extracted $X_{i,j,k}$ and is denoted as
$$X_\mathrm{VID} = (X_1, X_2, \ldots, X_{n_\mathrm{VID}}).$$

3.2.1. Coordinate–Vertex Association

The first step to extract the route $R_k$ is to obtain the vertex sequence $Q_{i,j}$. For this, the (x, y) coordinates of the $j$th vehicle in the $i$th scene at each time instance are associated with vertices contained in $V$. This results in the vertex sequence $Q_{i,j}$, which represents the vertices that the vehicle has driven on. One should note that the association is not unique: a set of (x, y) coordinates may be associated with multiple vertices, and a vertex may be associated with multiple sets of (x, y) coordinates, which results in multiple vertex associations. This occurs frequently when the (x, y) coordinates are located at intersections where different crossing vertices overlap. This means that $Q_{i,j}$ must be processed.
First, the invalid (empty) vertex associations are removed from the sequence. An invalid association can happen, for example, when the vehicle moves on “non-drivable” sections of the map. Second, duplicated vertex associations are unified. A duplicated association occurs when a vertex appears in $Q_{i,j}$ in two or more consecutive time instances; by unifying the duplicated vertex associations, only unique ones remain. Finally, $Q_{i,j}$ is filtered according to the intersection topology. This handles multiple vertex associations that can occur when various vertices overlap; see vertices 7, 8, and 9 in Figure 7. Filtering according to the intersection means that only the vertices included and connected in the intersection are kept.
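The first two cleaning steps can be sketched as follows; the topology filter is omitted here, since it needs the intersection graphs, and the use of `None` to mark an empty association is an assumed encoding:

```python
def clean_vertex_sequence(seq):
    """Clean a raw coordinate-to-vertex association sequence:
    (1) drop invalid (empty) associations and
    (2) unify consecutive duplicate associations.

    seq: list of vertex ids, with None marking an empty association
    """
    cleaned = []
    for v in seq:
        if v is None:                      # invalid association, e.g. the
            continue                       # vehicle was on a non-drivable area
        if cleaned and cleaned[-1] == v:   # consecutive duplicate: the vehicle
            continue                       # stayed on the same vertex
        cleaned.append(v)
    return cleaned
```

Note that duplicates separated only by removed invalid entries are also unified, since the comparison is against the last kept vertex.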

3.2.2. Extraction of Intersection Routes

Once the vertex sequence Q i , j of the jth vehicle in the ith scene is extracted and processed, the next step is to extract the routes that cross intersections. It is possible for a single vehicle to contain more than one intersection route. An intersection route should fulfill the following two characteristics:
1. The route must contain at least one crossing vertex.
2. The route must contain at least one incoming vertex or at least one outgoing vertex.
This approach allows us to differentiate four categories of intersection routes:
1. Complete: The route contains a full description of how the vehicle approaches, crosses, and leaves the intersection. The route starts with incoming vertices, continues with crossing vertices, and ends with outgoing vertices. An example of a complete route is the vertex sequence [2, 9, 15]; see Figure 4.
2. Entering: The route contains a description of how the vehicle approaches and crosses the intersection. The route starts with incoming vertices and ends with crossing vertices. An example of an entering route is the vertex sequence [2, 9]; see Figure 4.
3. Leaving: The route contains a description of how the vehicle crosses and leaves the intersection. The route starts with crossing vertices and ends with outgoing vertices. An example of a leaving route is the vertex sequence [9, 15]; see Figure 4.
4. Other: Routes that do not belong to any of the three previous categories. One such route would be that of a vehicle that stands still during the complete scene, thus remaining at a single vertex. These routes are omitted, as they do not provide information on how the vehicle approaches or leaves the intersection.
An example of this process is shown in Figure 7. There, the vertex sequence $Q_{i,j}$ of the $j$th vehicle in the $i$th scene is given by
$$Q_{i,j} = [4, 11, 17, 22, 25, 30].$$
In $Q_{i,j}$, 11 and 22 are crossing vertices, and the intersection IDs $\kappa$ are taken from them: $\kappa_{11} = 8$ indicates that vertex 11 belongs to the 8th intersection, and $\kappa_{22} = 9$ indicates that vertex 22 belongs to the 9th intersection. Then, the remaining vertices of $Q_{i,j}$ that belong to these intersections are extracted. In this example, the 30th vertex is neglected, as it does not belong to any intersection of this vertex sequence. The remaining vertices are split into as many routes as there are unique intersection IDs. In this example, two routes are created: one for the 8th intersection and one for the 9th intersection. The elements of the vertex sequence are assigned to a route according to the intersection they belong to. In this example, as the 17th vertex belongs to both the 8th and the 9th intersection, it is assigned to both routes. Next, the type of each vertex of each route is assigned according to the intersection topology. This is why the 17th vertex is assigned as “outgoing” for the route corresponding to the 8th intersection and as “incoming” for the route corresponding to the 9th intersection.
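The four route categories above can be expressed as a small classification over the vertex types of a route; the strings `'in'`, `'x'`, and `'out'` mirror the sub-indices of Section 3.1.2, and the function name is illustrative:

```python
def route_category(vertex_types):
    """Categorize an intersection route from the type of each of its
    vertices ('in' = incoming, 'x' = crossing, 'out' = outgoing)."""
    has_in = 'in' in vertex_types
    has_x = 'x' in vertex_types
    has_out = 'out' in vertex_types
    if not has_x:                 # no crossing vertex: never a valid route
        return 'other'
    if has_in and has_out:        # approaches, crosses, and leaves
        return 'complete'
    if has_in:                    # approaches and crosses only
        return 'entering'
    if has_out:                   # crosses and leaves only
        return 'leaving'
    return 'other'                # e.g. standing still on a crossing vertex
```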

3.3. Vehicle Intersection Data Clustering

Once the set of intersection graphs $S_\mathrm{int}$ and the map vertex set $V$ are generated (see Section 3.1) and the VID list $X_\mathrm{VID}$ is obtained (see Section 3.2), the next step is the VID clustering. This step aims to cluster the elements of the VID list $X_\mathrm{VID}$ with respect to their graphs; specifically, graph isomorphism represents the similarity criterion. The output of this step is $n_\mathrm{clusters}$ clusters, where each cluster $c$ is denoted by $X_{c,\mathrm{VID}}$. A graphical depiction of this process is shown in Figure 8. The process consists of two steps: a pre-clustering of graphs (see Section 3.3.1) and an isomorphic clustering (see Section 3.3.2). These steps are detailed in what follows.
The list $X_\mathrm{VID}$ contains the $n_\mathrm{VID}$ routes of vehicles crossing the intersections, classified as defined in Section 3.2.2. It should be noted that only complete routes are selected, because they are the only type of route that contains a full description of the intersection crossing from the entrance to the exit.

3.3.1. Pre-Clustering

The process of clustering based on isomorphism is computationally expensive. This is especially relevant for large databases, where graph-wise and vertex-wise associations must be verified. A brute-force search over the $n_\nu!$ possible bijective functions that satisfy the definition of isomorphism between all extracted graphs is not practical.
For this reason, pre-clustering the graphs prior to the isomorphic clustering (Section 3.3.2) is proposed. This is performed by examining a series of preconditions that two graphs must possess in order to be isomorphic. The preconditions are evaluated in a hierarchical manner, allowing us to structure the database in the form of a tree. This database tree allows further analysis of the distribution of the data in terms of graph properties. Then, the first four hierarchical levels of the database tree are detailed in what follows. Alongside this, an example slice of such a database tree is shown in Figure 9.
  • Level 0: The root node of the database tree is located at this level and is the highest hierarchical level from which all branches emerge. All VIDs are inside the root node.
  • Level 1: The graphs are grouped by their order, i.e., the number of vertices contained in the graph. Hence, only VIDs with the same graph order are part of the same node. In Figure 9, A and B are two example nodes at that level, with graph orders 20 and 21, respectively.
  • Level 2: The graphs are grouped by their size, i.e., the number of edges contained in the graph. VIDs with the same graph order and size are part of the same node. In Figure 9, the node C groups VIDs with a graph order equal to 20 and a graph size equal to 24.
  • Level 3: The graphs are grouped by their matrix degree:
    \Theta_{\mathrm{seq}} = \begin{pmatrix} n_{0,\mathrm{in}} & n_{1,\mathrm{in}} & n_{2,\mathrm{in}} & \cdots & n_{n_\epsilon,\mathrm{in}} \\ n_{0,\mathrm{out}} & n_{1,\mathrm{out}} & n_{2,\mathrm{out}} & \cdots & n_{n_\epsilon,\mathrm{out}} \end{pmatrix} ,
    where the first row refers to the in-degrees of the graph, and the second row refers to the out-degrees of the graph, i.e., the number of incoming and outgoing edges to/from the vertices, respectively. With this, n 2 , in indicates the number of vertices in the graph whose in-degree is equal to 2, and n 2 , out indicates the number of vertices in the graph whose out-degree is equal to 2. Therefore, at this level, only VIDs with the same graph order, the same graph size, and the same matrix degree are grouped. In Figure 9, the node E groups VIDs with a graph order equal to 20, a graph size equal to 24, and the matrix degree Θ 1 .
Levels 0–3 describe the pre-clustering, which creates smaller groups according to the graph properties, such that the computationally expensive isomorphism check only needs to be performed within the nodes of level 3.
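The hierarchical grouping of levels 1–3 can be sketched as a single grouping key, here assuming each intersection graph is given as a set of directed edge pairs; the function names `preclustering_key` and `precluster` are illustrative and not taken from the released code:

```python
from collections import Counter, defaultdict

def preclustering_key(edges):
    """Grouping key for levels 1-3: (graph order, graph size, matrix degree).

    `edges` is a set of directed (u, v) pairs describing one intersection graph.
    """
    vertices = {u for u, v in edges} | {v for u, v in edges}
    in_deg = Counter(v for _, v in edges)    # in-degree per vertex
    out_deg = Counter(u for u, _ in edges)   # out-degree per vertex
    # Matrix degree: how many vertices have in-/out-degree equal to 0, 1, 2, ...
    in_row = Counter(in_deg.get(v, 0) for v in vertices)
    out_row = Counter(out_deg.get(v, 0) for v in vertices)
    return (len(vertices), len(edges),
            tuple(sorted(in_row.items())), tuple(sorted(out_row.items())))

def precluster(graphs):
    """Group graphs into level-3 nodes; isomorphism is checked only within a node."""
    nodes = defaultdict(list)
    for g in graphs:
        nodes[preclustering_key(g)].append(g)
    return nodes
```

Two isomorphic graphs always share the same key, so graphs with different keys never need an isomorphism check.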

3.3.2. Isomorphic Clustering

Given the database tree from the pre-clustering, the aim is to identify VIDs with similar graphs. Only level 3 needs to be taken into consideration, since isomorphism between the graphs is only possible within nodes of level 3.
Two graphs G 1 and G 2 are said to be isomorphic if
G 1 ≅ G 2 ,
where Equation (12) holds true if a bijective function f : V G 1 V G 2 exists, such that
∀ v i , v j ∈ V G 1 : ( v i , v j ) ∈ E G 1 ⇔ ( f ( v i ) , f ( v j ) ) ∈ E G 2 .
This means that every vertex and edge of G 1 has a unique mapping to a vertex and edge of G 2 . All isomorphic graphs are then clustered in level 4 nodes. Nodes H, I, J, and K of Figure 9 are level 4 nodes.
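A minimal sketch of the isomorphic clustering, again with graphs as sets of directed edge pairs. The brute-force search over bijections is only practical because the pre-clustering keeps the candidate groups small; a production implementation would use a VF2-style matcher instead. The function names are illustrative:

```python
from itertools import permutations

def are_isomorphic(edges_1, edges_2):
    """Brute-force check for a bijection f with (u, v) in E1 <=> (f(u), f(v)) in E2."""
    v1 = sorted({u for e in edges_1 for u in e})
    v2 = sorted({u for e in edges_2 for u in e})
    if len(v1) != len(v2) or len(edges_1) != len(edges_2):
        return False
    for perm in permutations(v2):
        f = dict(zip(v1, perm))
        # The bijection must map the edge set of G1 exactly onto that of G2.
        if {(f[u], f[v]) for u, v in edges_1} == set(edges_2):
            return True
    return False

def isomorphic_clusters(graphs):
    """Level-4 nodes: partition one level-3 node into isomorphism classes."""
    clusters = []
    for g in graphs:
        for cluster in clusters:
            if are_isomorphic(cluster[0], g):  # compare against the cluster's first graph
                cluster.append(g)
                break
        else:
            clusters.append([g])
    return clusters
```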

3.4. Route-Type Counting

Once the set of intersection graphs S int and the map vertex set V are generated (as per Section 3.1), the VID list X VID is obtained (see Section 3.2), and the VIDs are clustered (see Section 3.3), the next step is the counting of route types. For this, each cth cluster of X VID is analyzed in order to extract (1) the set of route types R ^ c and (2) the counting list ρ R ^ c , whose elements indicate how often each route type appears in the cluster. A graphical depiction of this process is shown in Figure 10.
Since the names of the vertices are unique, two intersections cannot be compared by the vertex names alone. Therefore, a common vertex representation per cluster is needed. This common representation is achieved in the form of a template graph that is created for each cluster. The graph of the first VID of each cluster is taken as the template of that node. Then, the bijective function (Equations (12) and (13)) is used to map the vertices of the remaining routes within the cluster. A graphical depiction of this process is shown in Figure 11. There, the graph G * is the template graph. The route R G 1 is mapped to R G * using the bijective function f : V G 1 → V G * .
Once the vertices of the routes within the cluster are mapped to those of the template graph, the route types are extracted. Each route type is a specific vertex sequence in the cluster. Then, the set of route types R ^ c is generated for each cth cluster as follows:
R ^ c = R ^ c , 1 , R ^ c , 2 , , R ^ c , n , ,
where the first sub-index of the elements of R ^ c indicates the cluster to which the route type belongs, and the second sub-index is an identifier for the type of route within the cluster.
For each identified route type R ^ c , n , the frequency ρ R ^ c , n is computed. This frequency represents how often the route type R ^ c , n appears in the cluster according to the dataset. This information is relevant for the estimation of the probability that a traffic participant will drive a given route. Then, the counting list of the route types ρ R ^ c is generated for each cth cluster as follows:
ρ R ^ c = ( ρ R ^ c , 1 , ρ R ^ c , 2 , , ρ R ^ c , n , ) .
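The route-type counting can be sketched as follows, assuming the routes have already been expressed in template-graph vertices via the bijective function of Equations (12) and (13); the function name is illustrative:

```python
from collections import Counter

def count_route_types(mapped_routes):
    """Build the set of route types and their counting list for one cluster.

    `mapped_routes` are vertex sequences in template-graph vertices;
    identical sequences belong to the same route type.
    """
    counts = Counter(tuple(route) for route in mapped_routes)
    route_types = list(counts)                 # unique vertex sequences (R^_c)
    rho = [counts[t] for t in route_types]     # per-type frequency (rho_{R^_c})
    return route_types, rho
```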

3.5. Mode Estimation

Once the set of intersection graphs S int and the map vertex set V are generated (as per Section 3.1), the VID list X VID is obtained (see Section 3.2), the VIDs are clustered (see Section 3.3), and the set of route types R ^ c , and the counting list ρ R ^ c are extracted (as per Section 3.4), the next step is to generate the modes and estimate the mode probability. That is, to create a set of routes that a traffic participant can drive for a given intersection type (cluster) and motion history, and to estimate the probability that a given mode will be driven. Thus, for each cth cluster, this process extracts the mode data M c . A graphical depiction of this process is shown in Figure 12.
First, a set of sub-routes is generated for each route type in R ^ c in the cth cluster. For example, for the first route type R ^ c , 1 in the cth cluster, the set of sub-routes S c , 1 is generated, such that
S c , 1 ← R ^ c , 1 .
Since a route is a vertex sequence, each sub-route s S c , 1 is defined as a coherent sub-sequence of vertices of the corresponding route.
As an example of the creation of the set of sub-routes, let the map topology correspond to Figure 4 and let the intersection belong to the cth cluster. Given the set of route types R ^ c ,
R ^ c = R ^ c , 1 , R ^ c , 2 , R ^ c , 3 ,
R ^ c , 1 = [ 1 , 7 , 14 ] ,
R ^ c , 2 = [ 1 , 7 , 9 , 15 ] ,
R ^ c , 3 = [ 2 , 9 , 15 ] ;
the corresponding sets of sub-routes S c , 1 , S c , 2 , and S c , 3 for each route type R ^ c , 1 , R ^ c , 2 , and R ^ c , 3 can be generated, so that
S c , 1 = { [ 1 ] , [ 7 ] , [ 14 ] , [ 1 , 7 ] , [ 7 , 14 ] , [ 1 , 7 , 14 ] } ,
S c , 2 = { [ 1 ] , [ 7 ] , [ 9 ] , [ 15 ] , [ 1 , 7 ] , [ 7 , 9 ] , [ 9 , 15 ] , [ 1 , 7 , 9 ] , [ 7 , 9 , 15 ] , [ 1 , 7 , 9 , 15 ] } ,
S c , 3 = { [ 2 ] , [ 9 ] , [ 15 ] , [ 2 , 9 ] , [ 9 , 15 ] , [ 2 , 9 , 15 ] } .
Second, the set S c * that contains all unique sub-routes of the cth cluster is then defined as
S c * = S c , 1 S c , 2 S c , 3 .
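The construction of the sub-route sets follows directly from the definition of a sub-route as a coherent, i.e., contiguous, vertex sub-sequence; a minimal sketch with illustrative function names:

```python
def sub_routes(route):
    """All coherent (contiguous) vertex sub-sequences of a route."""
    n = len(route)
    return [route[i:j] for i in range(n) for j in range(i + 1, n + 1)]

def unique_sub_routes(route_types):
    """The union of the sub-route sets of all route types in a cluster (S_c*)."""
    return {tuple(s) for r in route_types for s in sub_routes(r)}
```

Applied to the three example route types above, `sub_routes([1, 7, 14])` yields the six elements of S c , 1 , and the union over all three route types yields the 16 unique sub-routes of S c * .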
The mode data M c of the cth cluster contain as many elements as there are sub-routes s driven in the cluster. This means that, for each sub-route s ∈ S c * , an element of M c is computed. Each element of M c contains (1) the set of modes μ c , s and (2) the estimated probabilities P ( μ m | c , s ) of each mode μ m | c , s , and is computed as follows:
1.
The set of modes μ c , s is used to forecast the possible modes that a vehicle can drive, (1) given the observation of the sub-route s, (2) where each mode ends with an outgoing vertex, and (3) where each mode is part of S c * . For this, a set S ^ c , s is created, so that
S ^ c , s = { s ^ 1 | c , s , s ^ 2 | c , s , , s ^ m | c , s , } ,
with
S ^ c , s ⊆ S c * : s ⊑ s ^ m | c , s , s ^ m | c , s ∩ V c , out ≠ ∅ , ∀ m ,
where the set V c , out contains the outgoing vertices of the template graph of the cth cluster. Since the observed sub-route is not part of the modes, i.e., of the future motion, the observed sub-route s is removed from each of the mth sub-sequences s ^ m | c , s , generating the corresponding mth mode μ m | c , s . This allows the definition of the set of modes μ c , s as follows:
μ c , s = { μ 1 | c , s , μ 2 | c , s , , μ m | c , s , } ,
where each element of μ c , s represents a unique mode of completing the crossing of an intersection with the template graph of the cth cluster according to the recorded data and the observation s.
2.
The conditional probability P ( μ m | c , s ) of the mth mode μ m | c , s ∈ μ c , s is estimated. This represents the probability that a traffic participant will drive the mth mode given the cth cluster and the sth observed sub-route in this cluster. The conditional probability is given by
P ( μ m | c , s ) = ρ m | c , s ρ c , s .
On the one hand, ρ m | c , s indicates how often a vehicle travels a route type in the cth cluster with the initial sequence-part defined by the sth observed sub-route and the final sequence-part defined by the mth mode μ m | c , s . On the other hand, ρ c , s indicates how often a vehicle travels the sth observed sub-route in the cth cluster of the dataset. Then, ρ c , s is defined by
ρ c , s = q = 1 | R ^ c | ρ R ^ c , q · z q ,
where
z_q = \begin{cases} 1, & s \in S_{c,q} \\ 0, & \text{otherwise} \end{cases} .
The frequency ρ R ^ c , q was introduced in Section 3.4 and indicates how often a vehicle travels the route type R ^ c , q in the cth cluster. The Boolean z q allows us to select only those route types R ^ c , q whose sequence contains the sub-route s. Given the above, the sum of the probabilities of all modes is then given by
m = 1 | μ c , s | P ( μ m | c , s ) = 1 .
These two steps (Equations (25)–(30)) are applied for each observed sub-route s in the cth cluster in order to generate each element of the mode data M c .
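The frequentist estimate P ( μ m | c , s ) = ρ m | c , s / ρ c , s can be sketched as below, reusing the example route types of this section. The counts in the usage note are hypothetical, and, for simplicity, only the first occurrence of the observation s within a route type is considered:

```python
def mode_probabilities(route_types, rho, s):
    """Estimate P(mu_m | c, s) for an observed sub-route s in one cluster.

    `route_types` are vertex sequences in template-graph vertices, `rho` their
    frequencies. Each mode is the remainder of a route type after the
    observation, ending at the route's outgoing vertex.
    """
    rho_cs = 0          # how often s was observed in the cluster
    mode_counts = {}    # rho_{m|c,s} per mode
    for t, r in zip(route_types, rho):
        t = list(t)
        for i in range(len(t) - len(s) + 1):
            if t[i:i + len(s)] == s:
                mode = tuple(t[i + len(s):])   # future part after the observation
                mode_counts[mode] = mode_counts.get(mode, 0) + r
                rho_cs += r
                break
    return {m: r / rho_cs for m, r in mode_counts.items()}
```

For instance, with the hypothetical counts `rho = [6, 2, 2]` for the route types `[1, 7, 14]`, `[1, 7, 9, 15]`, and `[2, 9, 15]`, the observation `[1, 7]` yields the modes `(14,)` and `(9, 15)` with probabilities 0.75 and 0.25, which by construction sum to 1.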

4. Evaluation and Results

In this section, the evaluation procedure and evaluation results are detailed. The proposed methodology is evaluated with respect to its ability to generate similar modes, mode probabilities, route types, graphs, and database trees, given similar datasets as inputs. For this, the Lyft database is used as the data source, because it contains map information, as well as data about the motion of traffic participants. The data from the traffic participants are divided into two independent datasets ( D 1 and D 2 ), where D 1 is the small training dataset provided by Lyft for the Kaggle Challenge https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles (accessed on 11 October 2021) and D 2 is the validation dataset provided by Lyft, while the map information remains the same for both datasets. An overview of the evaluation process is shown in Figure 13, and each step of the PROMOTING method is detailed in what follows.
The first step of the PROMOTING method (as per Section 3.1) describes the static traffic information (map vertex set V and intersection graphs S int ). Given that this information does not vary over time and is shared among datasets, the outputs of the first step for each given dataset are not compared. A summary of the road infrastructure description of the Lyft database is shown in Table 1.
The second step of the PROMOTING method (as per Section 3.2) extracts the VID list X VID . Given that each X VID is generated from a unique set of traffic scenes, the VIDs from different datasets are inherently different. In this step, the routes contained in the VIDs from D 1 and D 2 cannot be compared, because the vertices that compose each route have different names and are not yet standardized to a template graph. However, the details of each VID (number of scenes, objects, vehicles, etc.) can be compared, which allows us to corroborate that both D 1 and D 2 are similar in size. This is important, because datasets of different sizes would imply different numbers of clusters, types of clusters, modes, and so on. Specifically, a total of ≈1.6 million routes of vehicles crossing intersections are extracted from the Lyft database [25]. Approximately 50.5% of the routes belong to D 1 , while the remaining ≈49.5% belong to D 2 . The route distribution according to Section 3.2 is shown in Figure 14, and a summary of the details of the VID of each dataset is shown in Table 2.
As can be inferred from Figure 14 and Table 2, both datasets, D 1 and D 2 , are similar in size, thus aiding in a fair evaluation of the method. Further, as mentioned in Section 3.2, only “complete” routes have been selected in the output of the second step of PROMOTING. The reason for this is that these routes are the only type that contains a full description of the intersection crossing from the entrance to the exit.
The third step of the PROMOTING method (as per Section 3.3) focuses on the clustering of the VIDs according to their graph isomorphism. The comparison metric is the structure of the database trees T D 1 and T D 2 that are generated when the datasets D 1 and D 2 are used as inputs. The node generation of both trees is analyzed, that is, how the database tree was generated for each input dataset. If the trees are similar, it is an indication that the method is able to cluster similar routes, even when they come from different datasets. The common tree T com is defined as the tree whose lineage is present in both T D 1 and T D 2 , i.e., each node of T com within each level of the tree has a counterpart in both T D 1 and T D 2 . T com can be expressed as follows:
T com = T D 1 T D 2 .
The comparison of the structure of the database tree of both T D 1 and T D 2 with T com is shown in Table 3.
Given that both datasets D 1 and D 2 are similar in size, from the results shown in Table 3, it can be inferred that the method is able to comparably cluster the dynamic data from different datasets.
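The common tree T com can be sketched as a per-level set intersection, assuming each database tree is modelled as a mapping from level to the set of its node keys, and that each key (e.g., a tuple of order, size, and matrix degree) encodes its full lineage; this representation is an assumption for illustration:

```python
def common_tree(tree_d1, tree_d2):
    """Nodes (per level) present in both database trees (T_com = T_D1 ∩ T_D2)."""
    levels = tree_d1.keys() & tree_d2.keys()
    return {lvl: tree_d1[lvl] & tree_d2[lvl] for lvl in levels}
```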
The fourth step of the PROMOTING method (as per Section 3.4) consists of the counting of route types within each cluster. Given a cluster a from T D 1 , its equivalent cluster b from T D 2 is the one with an equivalent template graph. The comparison metric is the number of routes in cluster a that have an equivalent (same route type) in cluster b, normalized by the overall number of routes in cluster a. For this, let n c ( D 1 ) be the number of routes of the cth cluster of T D 1 and n c , e ( D 1 ) be the number of similar routes, given the cth cluster of T D 1 and its equivalent cluster in T D 2 . Then, the comparison metric is given by
η c , e = n c , e ( D 1 ) n c ( D 1 ) .
Then, the metric η ˜ c , e that represents the average of the ratio of equivalent routes between all common cth clusters from T D 1 and T D 2 is estimated as follows
η ˜ c , e = 1 n clusters c = 1 n clusters η c , e .
For this comparison, η ˜ c , e = 95.82 % was achieved. This indicates that common cth clusters from T D 1 and T D 2 contain mostly the same route types, and that the method is able to cluster the routes of traffic participants from different datasets in a similar manner.
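The metrics η c , e and η ˜ c , e can be sketched as follows, assuming each cluster is given as a list of routes already expressed in template-graph vertices; the function names are illustrative:

```python
def route_equivalence_ratio(cluster_a, cluster_b):
    """Share of routes in cluster a whose route type also appears in cluster b."""
    types_b = {tuple(t) for t in cluster_b}
    matched = sum(1 for t in cluster_a if tuple(t) in types_b)
    return matched / len(cluster_a)

def mean_equivalence_ratio(cluster_pairs):
    """Average ratio over all pairs of equivalent clusters from the two trees."""
    ratios = [route_equivalence_ratio(a, b) for a, b in cluster_pairs]
    return sum(ratios) / len(ratios)
```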
The fifth step of the PROMOTING method (as per Section 3.5) performs the mode estimation. Therefore, the comparison metric is based on the generated modes and their estimated probabilities. For this, let P ( μ m | c , s ( D 1 ) ) be the probability that a vehicle will drive the mth mode given the cth cluster and the sth observed sub-route, considering the dataset D 1 . Similarly, let P ( μ m e | c e , s e ( D 2 ) ) be the probability that a vehicle will drive the m e th mode given the c e th cluster and the s e th observed sub-route, considering the dataset D 2 . Here, the subscript e indicates that the corresponding equivalence is used, i.e., the m e th mode is the equivalence of the mth mode. Therefore, only equivalent modes in equivalent clusters are considered.
Then, the relative difference η m | c , s between the probabilities of equivalent modes of both trees with respect to the probability P ( μ m | c , s ( D 1 ) ) is given by
η m | c , s = | P ( μ m e | c e , s e ( D 2 ) ) P ( μ m | c , s ( D 1 ) ) | P ( μ m | c , s ( D 1 ) ) .
Equation (35) is then estimated for all equivalent modes given all equivalent observations in all equivalent clusters. Then, the metric η ˜ m | c , s that represents the average of η m | c , s of all equivalent modes for all observations in all common clusters from T D 1 and T D 2 is computed as follows:
η ˜ m | c , s = 1 n m , e m = 1 n m , e η m | c , s ,
where n m , e indicates the total number of equivalent modes between T D 1 and T D 2 . For the used datasets, η ˜ m | c , s = 0.39 % . This shows that the mode probabilities, when estimated from two different datasets, are similar to each other, and that mode probabilities calculated from a large dataset generalize to similar datasets from the same distribution. Even when PROMOTING uses different datasets, it is able to estimate the modes and the probability of each mode in a similar fashion for equivalent sub-route observations in equivalent intersections.
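The relative-difference metric of Equations (35) and (36) can be sketched as below, assuming the mode probabilities of both trees are given as mappings from an equivalence key (cluster, observation, mode) to a probability; this key representation is an assumption for illustration:

```python
def mean_relative_mode_difference(probs_d1, probs_d2):
    """Mean relative difference between probabilities of equivalent modes.

    Only keys present in both mappings (the equivalent modes) are compared,
    and differences are taken relative to the D1 probability.
    """
    common = probs_d1.keys() & probs_d2.keys()
    diffs = [abs(probs_d2[k] - probs_d1[k]) / probs_d1[k] for k in common]
    return sum(diffs) / len(diffs)
```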
The main results of the evaluation of steps 4 and 5 of PROMOTING are summarized in Table 4.
A representative graphical example of the extraction of modes and the estimation of the mode probabilities is shown in Figure 15.

5. Discussion

A common challenge of multi-modal motion prediction is to determine the “optimal” number of modes to predict, that is, how many trajectories per traffic participant should be predicted in order to comprehensively model a given traffic scene. This question has to take into consideration the amount of computational resources available, the time constraints, and the number of traffic participants, among other factors. Not only is the number of trajectories important, but also what the trajectories should look like. The PROMOTING method serves as a reference that shows both what the modes in a given intersection look like and what the probability is that a traffic participant will drive a specific mode. That is, the proposed method aids in the trajectory-prediction task. The method has the potential to be highly valuable for both the training and inference phases of ML methods for multi-modal motion prediction.
Along with the trajectory prediction that each traffic participant performs, the PROMOTING method could also prove to be useful at smart intersections with Vehicle-to-everything (V2X) capabilities. In that scenario, an automated vehicle could receive the information of the crossing (graphs, modes, etc.) from the infrastructure, so that the traffic participant could perform a better prediction of its own motion according to different parameters, such as efficiency or traffic load. This can be extended to all traffic participants, where each one knows where all the other traffic participants are and can predict the motion of the others with the help of the crossing information. This is relevant in the case of mixed traffic, where automated and human-driven vehicles coexist at the same intersection. Even when no V2X is present, the PROMOTING method could still be on board the EGO-vehicle, and, together with the information from exteroceptive sensors, the relationship between the surrounding traffic participants and their possible routes can be generated.
The PROMOTING method was evaluated in this work using the Lyft database. However, the method is not dependent on this database; instead, it can be used together with other map representations, as long as the required map properties are present, that is, the method is not limited to certain types of intersections but can instead generate the information from many different sources.
The method can be extended using real-time traffic information, as already provided by many navigation tools. The constant update of the traffic conditions (flow, weather, construction works, etc.) can provide an extra benefit for traffic analysis, as well as for the trajectory planning of traffic participants. This real-time traffic information does not necessarily have to come from navigation tools or infrastructure but could also be transmitted by other vehicles in the vicinity that have already crossed the intersection.
It should be noted that the mode probability estimation presented in this work does not take into account the interaction between traffic participants. In this paper, only the past sub-route, not the state of the other objects, is considered in the conditioning. This is a point for future research, with a special focus on the exchange of intentions between traffic participants via V2X. In addition, the investigation of abnormal behavior of traffic participants is also envisaged.

6. Conclusions

In this research work, a novel method named PROMOTING is proposed that is able to generate the modes (probable routes) of traffic participants, as well as estimate the probability that a traffic participant will drive a specific mode. This is done with the aim of supporting ADSs in their task of multi-modal motion prediction.
Mode generation is performed by clustering intersections based on the isomorphisms of their road topology. This allows us to cluster together equivalent intersections and, as a consequence, the equivalent routes of vehicles that crossed the isomorphic intersections. The probability of each mode is estimated based on the frequency with which each route is driven and a given observation (sub-route within the intersection).
The method is evaluated using the Lyft database. The results confirm that the method is able to cluster equivalent intersections and modes. The estimated probabilities of equivalent modes are almost identical, which also corroborates that the method estimates similar probabilities for similar crossings given similar observations. Therefore, PROMOTING provides a methodology that makes it possible to generate a labeled dataset that allows researchers to estimate multiple routes for each traffic participant and provides a probability score for each of the estimated routes. This labeled dataset has the potential to be highly valuable for ML models aimed at the task of motion prediction.
The method could be improved with the inclusion of real-time traffic information that can be sent via V2X communication, including information about the road infrastructure, cellular networks, or other traffic participants. The method is not limited to the used dataset but could also be implemented for other map sources.
Interested readers are referred to the repository [31], where the code that implements the methodology proposed in PROMOTING is made publicly available.

Author Contributions

A.F.F. contributed to the investigation and conceptualization of the methodology, its implementation, and its evaluation. J.W. and E.S.M. contributed to the conceptualization of the methodology. M.B. and C.F. contributed to the conceptualization and supervision of the methodology, the funding acquisition, and the project administration. A.G.H. contributed to the supervision of the methodology. All authors contributed to the formal analysis, writing, and review and editing. A.G.H. is a policy analyst in the European Parliamentary Research Service (EPRS), the internal research service and think-tank of the European Parliament. He is writing in a personal capacity, and any views expressed do not represent an official position of the European Parliament. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by the Federal Ministry of Transport and Digital Infrastructure of Germany (Bundesministerium für Digitales und Verkehr), project number 01MM20012J.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code is public under [31].

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations and notation are used in this manuscript:
ADEAverage Displacement Error
ADSsAutomated Driving Systems
EPRSEuropean Parliamentary Research Service
EUEuropean Union
FDEFinal Displacement Error
MAHMultiple Attention Heads
MLMachine Learning
POGsPredicted-Occupancy Grids
PROMOTINGProbabilistic Traffic Motion Labeling
RMSERoot Mean Square Error
V2XVehicle-to-everything
VIDVehicle Intersection Data
A i Set of the adjacent vertices of the ith vertex with respect to the driving direction
A i , left Set of the immediate vertices to the left of the ith vertex with respect to the driving direction
A i , pre Set of the immediate previous vertices of the ith vertex with respect to the driving direction
A i , right Set of the immediate vertices to the right of the ith vertex with respect to the driving direction
A i , suc Set of the immediate vertices following the ith vertex with respect to the driving direction
D 1 Training dataset provided by the Lyft database
D 2 Validation dataset provided by the Lyft database
ESet of edges ϵ k of the map
E ι Set of edges of the ι th intersection
GDirected graph of the road topology of the map
G * Template graph
M c Mode data of the cth cluster
P ( μ m | c , s ( D 1 ) ) Probability that a vehicle will drive the mth mode given the cth cluster and the sth observed sub-route for the dataset D 1
P ( μ m e | c e , s e ( D 2 ) ) Probability that a vehicle will drive the equivalent m e th mode given the c e th cluster and the s e th observed sub-route for the dataset D 2
Q i , j Sequence of vertices that follows the jth vehicle in the ith traffic scene
R ^ c Set of route types in the cth cluster
R ^ c , n Sequence of vertices that represents the nth route type in the c-cluster
R k Sequence of vertices that represents the k-route of a vehicle that cross an intersection
S ^ c , s Set of vertex sequences that allows the extraction of the modes given the cth cluster and the sth observed sub-route
S c , n Set of sub-routes of the c-cluster generated from R ^ c , n
S c * Set that contains unique sub-routes given all the nth sub-routes sets S c , n
S int Set of intersection graphs generated from the map
T com Database tree whose lineage is present in both T D 1 and T D 2
T D 1 Database tree that distributes the VIDs extracted from the database D 1
T D 2 Database tree that distributes the VIDs extracted from the database D 2
VSet of vertices ν i of the map
V ι Set of vertices of the ι th intersection
V ι , in Set of incoming vertices of the ι th intersection
V ι , out Set of outgoing vertices of the ι th intersection
V ι , x Set of crossing vertices of the ι th intersection
X i , j , k VID extracted at the ith scene for the jth vehicle that drives the kth route.
X VID List of VIDs extracted from a given dataset
X c , VID List of VIDs grouped in the cth cluster given the list X VID
n 1 , in Number of vertices of a graph whose in-degree is equal to 1
n 1 , out Number of vertices of a graph whose out-degree is equal to 1
n c ( D 1 ) Number of routes in the cth cluster of the tree T D 1
n c , e ( D 1 ) Number of similar routes in the cth cluster of the tree T D 1 , given the cth cluster of T D 1 and its equivalent cluster in T D 2
n ϵ Number of edges contained in the map
n clusters Number of VID clusters given a specific dataset
n int Number of intersection graphs contained in the map
n int , veh Number of intersections crossed by “complete” routes
n obj Number of traffic participants in a specific traffic scene
n obj , total Number of traffic participants in the corresponding dataset
n routes Number of routes that cross an intersection for a given vehicle
n scenes Number of traffic scenes contained in a specific dataset
n veh , total Number of vehicles contained in a specific dataset
n VID Number of extracted routes from a specified dataset
n VID , compl Number of “complete” routes
n VID , enter Number of “entering” routes
n VID , leave Number of “leaving” routes
n VID , other Number of “other” routes
n m , e Number of equivalent modes between T D 1 and T D 2
n ν Number of vertices contained in the map
r q Element of the route R k which represents a vertex of the map
sSub-route represented by a vertex or a sequence of vertices
s ^ m | c , s Sequence of vertices used to extract the corresponding mode μ m | c , s given the cth cluster and the sth observed sub-route
z q Boolean that indicates if an observation is part of the sub-route set S c , q or not
ϵ k kth edge of the map
η c , e Ratio of equivalent routes between all common cth clusters from T D 1 and T D 2
η ˜ c , e Average of the ratio η c , e of equivalent routes between all common cth clusters from T D 1 and T D 2
η m | c , s Relative difference between the probabilities of equivalent modes of the trees T D 1 and T D 2 with respect to the probability P ( μ m | c , s ( D 1 ) )
η ˜ m | c , s Average of η m | c , s of all equivalent modes for all observations in all common clusters from T D 1 and T D 2
Θ seq Matrix degree of a graph formed by all its in-degrees and out-degrees
κ i Unique identifier of the intersection of which the ith vertex is part of
μ c , s Set of modes extracted given the cth cluster and the sth observed sub-route
μ m | c , s Mode m given the cth cluster and the sth observed sub-route
ν i ith vertex of the map
ρ c , s Number of times the vehicles are traveling the sth observed sub-route in the cth cluster of a specific dataset
ρ m | c , s Number of times the vehicles are traveling a route type in the cth cluster of a specific dataset, with the initial sequence-part defined by the sth observed sub-route and the final sequence-part defined by the mth mode μ m | c , s
ρ R ^ c List whose elements indicate how often each route type appears in the cth cluster
ρ R ^ c , n Number of times that the route type R ^ c , n is registered

References

  1. European Commision. Road Safety: Europe’s Roads Are Getting Safer but Progress Remains Too Slow; European Commission: Brussels, Belgium, 2020. [Google Scholar]
  2. European Commision. EU Road Safety Policy Framework 2021–2030—Next Steps towards “Vision Zero”; Publications Office of the European Union: Luxembourg, 2020. [Google Scholar] [CrossRef]
  3. Singh, S. Critical Reasons for Crashes Investigated in the National Motor Vehicle Crash Causation Survey; Technical Report, Traffic Safety Facts Crash Stats Report No. DOT HS 812 506; National Highway Traffic Safety Administration: Washington, DC, USA, 2018. [Google Scholar]
  4. On-Road Automated Driving (ORAD) Committee. Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles; SAE International: Warrendale, PA, USA, 2018. [Google Scholar] [CrossRef]
5. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. arXiv 2019, arXiv:1906.05113.
6. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Attention Based Vehicle Trajectory Prediction. IEEE Trans. Intell. Veh. 2021, 6, 175–185.
7. Cui, H.; Radosavljevic, V.; Chou, F.C.; Lin, T.H.; Nguyen, T.; Huang, T.K.; Schneider, J.; Djuric, N. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks. arXiv 2019, arXiv:1809.10732.
8. Deo, N.; Trivedi, M.M. Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018.
9. Liu, Y.; Zhang, J.; Fang, L.; Jiang, Q.; Zhou, B. Multimodal Motion Prediction with Stacked Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 7573–7582.
10. Dong, B.; Liu, H.; Bai, Y.; Lin, J.; Xu, Z.; Xu, X.; Kong, Q. Multi-Modal Trajectory Prediction for Autonomous Driving with Semantic Map and Dynamic Graph Attention Network. arXiv 2021, arXiv:2103.16273.
11. Deo, N.; Wolff, E.; Beijbom, O. Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals. In Proceedings of the 5th Annual Conference on Robot Learning, London, UK, 8–11 November 2021.
12. Luo, C.; Sun, L.; Dabiri, D.; Yuille, A.L. Probabilistic Multi-Modal Trajectory Prediction with Lane Attention for Autonomous Vehicles. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 2370–2376.
13. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
14. Messaoud, K.; Deo, N.; Trivedi, M.M.; Nashashibi, F. Multi-Head Attention with Joint Agent-Map Representation for Trajectory Prediction in Autonomous Driving. arXiv 2020, arXiv:2005.02545.
15. Kim, H.; Kim, D.; Kim, G.; Cho, J.; Huh, K. Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction. arXiv 2020, arXiv:2004.03842.
16. Giuliari, F.; Hasan, I.; Cristani, M.; Galasso, F. Transformer Networks for Trajectory Forecasting. arXiv 2020, arXiv:2003.08111.
17. Quintanar, A.; Fernandez-Llorca, D.; Parra, I.; Izquierdo, R.; Sotelo, M.A. Predicting Vehicles Trajectories in Urban Scenarios with Transformer Networks and Augmented Information. arXiv 2021, arXiv:2106.00559.
18. Neemann, A. Autonomous Shuttles: Riding Past the Traffic Jam. Available online: https://www.zf.com/mobile/en/technologies/domains/autonomous_driving/stories/20211124_autonomousshuttle.html (accessed on 19 April 2022).
19. Ettinger, S.; Cheng, S.; Caine, B.; Liu, C.; Zhao, H.; Pradhan, S.; Chai, Y.; Sapp, B.; Qi, C.R.; Zhou, Y.; et al. Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset. arXiv 2021, arXiv:2104.10133.
20. Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The ApolloScape Dataset for Autonomous Driving. arXiv 2018, arXiv:1803.06184.
21. Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 Year, 1000 km: The Oxford RobotCar Dataset. Int. J. Robot. Res. 2017, 36, 3–15.
22. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multi-modal dataset for autonomous driving. arXiv 2019, arXiv:1903.11027.
23. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
24. Chang, M.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3D Tracking and Forecasting with Rich Maps. arXiv 2019, arXiv:1911.02620.
25. Houston, J.; Zuidhof, G.; Bergamini, L.; Ye, Y.; Chen, L.; Jain, A.; Omari, S.; Iglovikov, V.; Ondruska, P. One Thousand and One Hours: Self-driving Motion Prediction Dataset. arXiv 2020, arXiv:2006.14480.
26. Wurst, J.; Balasubramanian, L.; Botsch, M.; Utschick, W. Novelty Detection and Analysis of Traffic Scenario Infrastructures in the Latent Space of a Vision Transformer-Based Triplet Autoencoder. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 1304–1311.
27. Wurst, J.; Balasubramanian, L.; Botsch, M.; Utschick, W. Expert-LaSTS: Expert-Knowledge Guided Latent Space for Traffic Scenarios. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 5–9 June 2022.
28. Nadarajan, P.; Botsch, M. Probability estimation for Predicted-Occupancy Grids in vehicle safety applications based on machine learning. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 1285–1292.
29. Nadarajan, P.; Botsch, M.; Sardina, S. Predicted-occupancy grids for vehicle safety applications based on autoencoders and the Random Forest algorithm. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1244–1251.
30. Nadarajan, P.; Botsch, M.; Sardina, S. Machine Learning Architectures for the Estimation of Predicted Occupancy Grids in Road Traffic. J. Adv. Inf. Technol. 2018, 9, 1–9.
31. Flores Fernández, A. PROMOTING. Available online: https://github.com/albertofloresfernandez/PROMOTING (accessed on 9 May 2022).
Figure 1. Sequence of operation of the PROMOTING method.
Figure 2. Overview of step 1 of the PROMOTING method: the road infrastructure description.
Figure 3. A visual representation of the road infrastructure generated from the Lyft database [25].
Figure 4. The graphical representation of an intersection. On the (left), the vertices are coloured polygons, where the circles represent the first point of the centreLine feature, and the arrows connecting the circles represent the edges. On the (right) are the edge matrix with the edge list and the corresponding vertices. In this case, V ι , in = { 1 , 2 , 3 , 4 , 5 , 6 } , V ι , x = { 7 , 8 , 9 , 10 , 11 , 12 , 13 } , and V ι , out = { 14 , 15 , 16 , 17 , 18 , 19 } .
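The vertex partition and edge list described in the caption of Figure 4 can be sketched in code. This is a hypothetical illustration only — the class and method names are not taken from the PROMOTING implementation — showing an intersection graph with its vertices split into incoming, crossing, and outgoing sets, plus an edge list as in the edge matrix of Figure 4.

```python
# Illustrative sketch of the intersection graph of Figure 4: vertices are
# lane segments partitioned into incoming (V_in), crossing (V_x), and
# outgoing (V_out) sets; edges connect lane segments that follow each other.
# All names here are assumptions, not the paper's actual data structures.

class IntersectionGraph:
    def __init__(self, incoming, crossing, outgoing):
        self.v_in = set(incoming)   # V_in: vertices entering the intersection
        self.v_x = set(crossing)    # V_x: vertices crossing the intersection
        self.v_out = set(outgoing)  # V_out: vertices leaving the intersection
        self.edges = []             # edge list, as in the edge matrix

    def add_edge(self, u, v):
        self.edges.append((u, v))

# Example with the vertex sets from Figure 4
g = IntersectionGraph(incoming=range(1, 7),
                      crossing=range(7, 14),
                      outgoing=range(14, 20))
g.add_edge(1, 7)   # an incoming lane feeding a crossing lane
g.add_edge(7, 14)  # a crossing lane feeding an outgoing lane
```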
Figure 5. Overview of step 2 of the PROMOTING method: the route extraction.
Figure 6. The process to extract the VID.
Figure 7. Graphical depiction of the method used to extract intersection routes. Crossing vertices are marked in blue, incoming vertices in green, and outgoing vertices in red.
Figure 8. Overview of step 3 of the PROMOTING method: the VID clustering.
Figure 9. Slice of a database tree for the clustering of the VID based on isomorphic graphs. Each leaf node at level 4 groups the routes of vehicles crossing intersections whose graphs are isomorphic.
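The grouping behind Figure 9 — routes of vehicles end up in the same leaf cluster when their intersections' graphs are isomorphic — can be sketched as follows. For illustration, graphs are undirected edge lists over nodes 0..n−1 and isomorphism is checked by brute force, which is feasible only for tiny graphs; the multi-level database tree of the actual method, which prunes such comparisons, is omitted, and all function names are assumptions.

```python
# Sketch of clustering intersections by graph isomorphism (cf. Figure 9).
# Brute-force isomorphism test for small undirected graphs; the real
# method organizes clusters in a database tree instead.
from itertools import permutations

def is_isomorphic(n, edges_a, edges_b):
    """Brute-force isomorphism test for two undirected graphs on n nodes."""
    if len(edges_a) != len(edges_b):
        return False
    target = {frozenset(e) for e in edges_b}
    for perm in permutations(range(n)):
        if {frozenset((perm[u], perm[v])) for u, v in edges_a} == target:
            return True
    return False

def cluster_by_isomorphism(n, graphs):
    """Group graph indices into clusters of pairwise-isomorphic graphs."""
    clusters = []
    for i, g in enumerate(graphs):
        for cluster in clusters:
            if is_isomorphic(n, graphs[cluster[0]], g):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

triangle = [(0, 1), (1, 2), (2, 0)]
path = [(0, 1), (1, 2)]
relabeled_triangle = [(1, 2), (2, 0), (0, 1)]
print(cluster_by_isomorphism(3, [triangle, path, relabeled_triangle]))  # [[0, 2], [1]]
```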
Figure 10. Overview of step 4 of the PROMOTING method: the route-type counting.
Figure 11. Overview of step 4 of the PROMOTING method: the route-type counting.
Figure 12. Overview of step 5 of the PROMOTING method: the mode estimation.
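The frequentist idea behind the mode estimation of Figure 12 can be sketched as follows: the probability of each full route (mode) is the relative frequency with which vehicles that shared the observed sub-route continued along that route. The route encoding, counts, and function name below are illustrative assumptions, not values or code from the Lyft database or the PROMOTING implementation.

```python
# Minimal frequentist sketch of step 5 (mode estimation):
# P(route | observed sub-route) = count(route) / count(routes sharing the prefix).
from collections import Counter

def mode_probabilities(observed, route_counts):
    """Estimate mode probabilities by relative frequency among matching routes."""
    matching = {r: c for r, c in route_counts.items()
                if r[:len(observed)] == observed}
    total = sum(matching.values())
    return {r: c / total for r, c in matching.items()} if total else {}

# Routes as vertex tuples (incoming -> crossing -> outgoing), with made-up counts
counts = Counter({(1, 7, 14): 60, (1, 8, 15): 30, (1, 9, 16): 10,
                  (2, 10, 17): 50})
probs = mode_probabilities((1,), counts)  # vehicle observed on vertex 1
print(probs)  # {(1, 7, 14): 0.6, (1, 8, 15): 0.3, (1, 9, 16): 0.1}
```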
Figure 13. An overview of the evaluation process of the PROMOTING method.
Figure 14. Route distribution according to Section 3.2: complete, outgoing, entering, and other.
Figure 15. Example of the extraction of modes and estimation of the probability of each mode for four different types of intersections of the Lyft database. In the first column, the intersection is represented by the vertices that compose its graph. The second, third, and fourth columns represent the most probable modes (from highest to lowest probability), given the observed sub-route coloured in yellow and the history of the motion.
Table 1. Summary of the road infrastructure description of the Lyft database.
Feature | Name | Value
Graph order (number of map vertices) | nν | 8506
Graph size (number of map edges) | nϵ | 12,185
Number of intersections contained in the map | n_int | 909
Table 2. Summary of the details of the VIDs generated from D 1 and D 2 .
Feature | Name | Value (D_1/D_2)
Number of traffic scenes | n_scenes | 16,265/16,220
Number of traffic participants | n_obj,total | 20,320,381/19,557,084
Number of vehicles | n_veh,total | 4,710,949/4,621,107
Number of routes | n_VID | 801,612/786,919
Number of “complete” routes | n_VID,compl | 349,322/342,003
Number of “entering” routes | n_VID,enter | 202,260/199,359
Number of “leaving” routes | n_VID,leave | 207,595/205,122
Number of “other” routes | n_VID,other | 42,435/40,435
Number of intersections crossed by “complete” routes | n_int,veh | 250/250
Table 3. Comparison of the database trees of both T D 1 and T D 2 with T com .
Feature | T_D1: T_com | T_D1: Total | T_D2: T_com | T_D2: Total
Clusters (n_clusters) | 168 (97.1%) | 173 | 168 (97.1%) | 173
Routes (n_VID,compl) | 349,313 (99.99%) | 349,322 | 341,997 (99.99%) | 342,003
Table 4. Main results of the evaluation of steps 4 and 5 of PROMOTING.
Feature | Name | Value
Average ratio of equivalent routes | η̃_c,e | 95.82%
Average relative difference between equivalent modes | η̃_m|c,s | 0.39%