# A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving


## Abstract


## 1. Introduction

- Tracking: identifying traffic participants, such as cars, pedestrians, and obstacles, from sequences of images, sensor data, or observations. It is assumed that some preprocessing of the sensor data and/or input images has already been done;
- Prediction: assessing the future motion of surrounding vehicles in order to navigate through various traffic scenarios. Besides predicting the simple physical behavior of the agents from a set of past observations, an important issue is to take their possible interactions into account.

## 2. Tracking Methods

### 2.1. Methods Using Neural Networks

#### 2.1.1. Learning Features from Convolutional Layers

#### 2.1.2. High-Level Features, Occlusion Handling, and Feature Fusion

#### 2.1.3. Ensuring Temporal Coherence

#### 2.1.4. LSTM-Based Methods

#### 2.1.5. Miscellaneous Neural Network-Based Methods

### 2.2. Other Techniques

#### 2.2.1. Traditional Algorithms and Methods Focusing on High-Performance

#### 2.2.2. Methods Based on Graphs and Flow Models

### 2.3. Discussion

## 3. Trajectory Prediction Methods

- Predicting only a short time into the future. Given the probabilistic nature of trajectory prediction, the farther one predicts into the future, the less certain the results become. Moreover, the probability distribution of the predicted trajectories spreads out and thus becomes less useful;
- Relying on physics where possible. Machine learning models, e.g., deep networks, can be used to predict trajectories, but they can suffer from approximation errors. Especially in simple, non-interactive scenarios, e.g., when the vehicles have constant speeds or accelerate/decelerate in a foreseeable manner, using physics-based extrapolations can provide more precise results. Also, each type of vehicle may have its own dynamics, so the identification of the vehicle class before prediction is a necessary initial step;
- Considering whether road users obey traffic rules. The autonomous car may plan as if the other traffic participants observed the imposed traffic rules, e.g., cars stopping at red lights or pedestrians not crossing the street in forbidden areas. However, defensive safety measures must be in place to prevent accidents with the so-called “vulnerable” road users;
- Recognizing particular traffic situations. For example, the behavior of traffic participants caught in a traffic jam differs from their behavior in flowing traffic.
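The first two guidelines can be illustrated with a physics-based extrapolation. The sketch below assumes a one-dimensional constant-velocity model with white-noise acceleration; the noise level `accel_var`, the initial covariance, and the time step are illustrative values, not taken from the cited works. It repeatedly applies a Kalman-filter prediction step without measurements: the predicted position follows the physics, while its variance grows with every step, which is why long horizons become uncertain. A constant-acceleration point extrapolation is included for vehicles that accelerate or decelerate in a foreseeable manner.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (intended for small matrices only)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def extrapolate_const_accel(p: float, v: float, a: float, t: float) -> float:
    """Point prediction p(t) = p + v*t + a*t^2/2 under constant acceleration."""
    return p + v * t + 0.5 * a * t * t


def predict_const_velocity(x: List[float], P: Matrix, dt: float,
                           accel_var: float) -> Tuple[List[float], Matrix]:
    """One Kalman-filter prediction step for the state [position, velocity].

    accel_var is the assumed variance of the unmodelled (white-noise) acceleration.
    """
    F = [[1.0, dt], [0.0, 1.0]]
    Q = [[accel_var * dt ** 4 / 4, accel_var * dt ** 3 / 2],
         [accel_var * dt ** 3 / 2, accel_var * dt ** 2]]
    x_new = [x[0] + dt * x[1], x[1]]                  # constant-velocity mean
    FPFt = mat_mul(mat_mul(F, P), transpose(F))       # propagate uncertainty
    P_new = [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    return x_new, P_new


def rollout(x: List[float], P: Matrix, dt: float, accel_var: float,
            steps: int) -> List[Tuple[float, float]]:
    """Return (predicted position, position variance) for each future step."""
    out = []
    for _ in range(steps):
        x, P = predict_const_velocity(x, P, dt, accel_var)
        out.append((x[0], P[0][0]))
    return out


if __name__ == "__main__":
    # Hypothetical example: car at 10 m/s, 0.1 s steps, 3 s horizon.
    traj = rollout([0.0, 10.0], [[0.25, 0.0], [0.0, 0.5]], 0.1, 1.0, 30)
    for k, (pos, var) in enumerate(traj, start=1):
        if k % 10 == 0:
            print(f"t={k * 0.1:.1f} s  position={pos:.1f} m  std={var ** 0.5:.2f} m")
```

Printing the standard deviation at 1, 2, and 3 s makes the loss of certainty over the horizon directly visible.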

### 3.1. Problem Description

### 3.2. Classification of Methods

1. Model-based approaches. They identify common behaviors of the vehicle, e.g., changing lanes, turning left, turning right, or determining the maximum turning speed. A model is created for each possible trajectory the vehicle can follow, and probabilities are then computed for all these models. One of the simplest approaches to computing these probabilities is the autonomous multiple model (AMM) algorithm. First, the states of the vehicle at times $t-1$ and $t$ are observed. Then, each process model is applied to the state at time $t-1$, yielding the expected state at time $t$. Next, the expected state is compared with the observed state, and the resulting likelihood is used to update the probability of the model at time $t$. Finally, the model with the highest probability is selected;
2. Data-driven approaches. In these approaches, a black-box model (usually a neural network) is trained on a large quantity of data. After training, the model is applied to the observed behavior to make the prediction. Training is usually computationally expensive and is performed offline. On the other hand, once the model is trained, predicting trajectories is fast and can be performed online, i.e., in real time. Some of these methods also employ unsupervised clustering of trajectories, e.g., spectral or agglomerative clustering, and define a trajectory pattern (prototype) for each cluster. In the prediction stage, the partial trajectory of the vehicle is observed and compared with the prototype trajectories, and the most similar prototype is used as the predicted trajectory.
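The AMM steps described for model-based approaches can be sketched as a recursive Bayesian update over a small bank of maneuver models. Everything concrete here is an assumption for illustration: the three hypothetical process models (keep lane, change left, change right), the speeds, the sampling period, and the isotropic Gaussian likelihood with standard deviation `sigma`. A practical implementation would typically run a full Kalman filter per model instead of a bare process function.

```python
import math
from typing import Callable, Dict, Tuple

State = Tuple[float, float]  # (longitudinal x, lateral y) in metres

DT = 0.1         # assumed sampling period [s]
SPEED = 20.0     # assumed longitudinal speed [m/s]
LAT_SPEED = 1.0  # assumed lateral speed during a lane change [m/s]

# Hypothetical process models: each maps the state at t-1 to the expected state at t.
MODELS: Dict[str, Callable[[State], State]] = {
    "keep_lane": lambda s: (s[0] + SPEED * DT, s[1]),
    "change_left": lambda s: (s[0] + SPEED * DT, s[1] + LAT_SPEED * DT),
    "change_right": lambda s: (s[0] + SPEED * DT, s[1] - LAT_SPEED * DT),
}


def gaussian_likelihood(observed: State, expected: State, sigma: float) -> float:
    """Isotropic 2D Gaussian density of the observation around the expected state."""
    d2 = (observed[0] - expected[0]) ** 2 + (observed[1] - expected[1]) ** 2
    return math.exp(-0.5 * d2 / sigma ** 2) / (2.0 * math.pi * sigma ** 2)


def amm_update(prev: State, curr: State, models: Dict[str, Callable[[State], State]],
               probs: Dict[str, float], sigma: float) -> Dict[str, float]:
    """One AMM step: run every model independently from the state at t-1,
    weight its previous probability by the likelihood of the observed state
    at t, then renormalise."""
    weights = {name: probs[name] * gaussian_likelihood(curr, f(prev), sigma)
               for name, f in models.items()}
    total = sum(weights.values())
    if total == 0.0:  # every likelihood underflowed; keep the previous probabilities
        return dict(probs)
    return {name: w / total for name, w in weights.items()}


def select_model(probs: Dict[str, float]) -> str:
    """Pick the model with the highest probability."""
    return max(probs, key=probs.get)
```

Starting from uniform probabilities and feeding a track that drifts 0.1 m to the left per step, repeated calls to `amm_update` concentrate the probability mass on `change_left`. Note that in this plain form a model whose probability reaches zero can never recover; flooring the probabilities is a common remedy.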

1. Physics-based motion models. They apply the laws of physics to estimate the trajectory of a vehicle, considering inputs such as steering, acceleration, weight, and even the friction coefficient of the pavement in order to predict outputs such as position, speed, and heading. Challenges are related to noisy sensors and sensitivity to initial conditions. Methods include Kalman filters and Monte Carlo sampling. Advantages. Such models are very often used in safety-related contexts, as classic fail-safe backups when more sophisticated approaches, such as those based on machine learning, are used for prediction. They can also be employed in situations that lack intricate interactions between road users. These models need not be very simple, as they can include detailed representations of vehicle kinematics, road geometry, etc. Disadvantages. They are usually appropriate only for short-term predictions, e.g., less than a second, because they cannot predict maneuvers that aim to accomplish higher-level goals, e.g., slowing down to prepare for a turn at an intersection, or because the vehicle in front anticipates a pedestrian crossing the street;
2. Maneuver-based motion models. They try to estimate the series of maneuvers that the cars perform along their way, but consider each vehicle to decide independently of the other traffic participants. These models attempt to identify such maneuvers as early as possible, then assume that the maneuvers continue into the near future and estimate the corresponding trajectories. They use either prototype trajectories or maneuver intention estimation. Challenges are related to occlusions and the complexity of intentions. Methods include clustering, hidden Markov models, and reinforcement learning. Advantages. The identified maneuvers serve as a priori information, or evidence, that conditions future motion. Therefore, these models tend to be more reliable than physics-based ones, and their predictions remain relevant for longer periods of time. Disadvantages. Because of the independence assumption, these models cannot capture how the maneuvers of one car influence the behavior of its neighbors. The interactions between traffic participants can be strong in scenarios with a high density of agents, e.g., intersections, possibly with specific priority rules. By ignoring inter-agent interactions, these models tend to provide less accurate interpretations of such situations;
3. Interaction-aware motion models. This is the most general class of models, in which the maneuvers of each vehicle are considered to be influenced by those of neighboring road users. These models use prototype trajectories or dynamic Bayesian networks. Challenges include detecting interactions and a possible combinatorial explosion. Methods include coupled hidden Markov models, dynamically-linked hidden Markov models, and even rule-based systems. Advantages. Including inter-agent dependencies contributes to a better understanding of the situation. On the one hand, these models facilitate longer-term predictions than physics-based models; on the other hand, they can be more reliable than maneuver-based models. Disadvantages. Because they often have to compute all possible trajectories, they can be computationally inefficient and therefore may not be appropriate for real-time use cases.
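Both data-driven approaches and maneuver-based models that rely on prototype trajectories share a matching step: the observed partial trajectory is compared with each prototype, and the best match supplies the prediction. The sketch below assumes the clusters have already been formed (e.g., by agglomerative clustering) and that all trajectories are equally long and equally sampled. It builds each prototype as the point-wise mean of its cluster members, measures similarity as the mean Euclidean distance over the observed prefix, and shifts the remainder of the winning prototype so that it starts at the last observed position. These are illustrative choices, not those of a specific cited method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def mean_trajectory(members: List[Trajectory]) -> Trajectory:
    """Point-wise mean of equally long, equally sampled trajectories (one cluster)."""
    n = len(members)
    return [(sum(t[i][0] for t in members) / n, sum(t[i][1] for t in members) / n)
            for i in range(len(members[0]))]


def prefix_distance(observed: Trajectory, prototype: Trajectory) -> float:
    """Mean Euclidean distance between the observed points and the prototype's first points."""
    k = len(observed)
    return sum(math.dist(o, p) for o, p in zip(observed, prototype[:k])) / k


def predict_from_prototypes(observed: Trajectory,
                            prototypes: Dict[str, Trajectory]) -> Tuple[str, Trajectory]:
    """Return the best-matching prototype name and its continuation.

    Each prototype must be longer than the observed prefix. The continuation is
    shifted so that it starts from the last observed position (an assumed
    alignment rule).
    """
    best = min(prototypes, key=lambda name: prefix_distance(observed, prototypes[name]))
    proto = prototypes[best]
    k = len(observed)
    dx = observed[-1][0] - proto[k - 1][0]
    dy = observed[-1][1] - proto[k - 1][1]
    return best, [(x + dx, y + dy) for x, y in proto[k:]]
```

With a "straight" and a "left_turn" prototype, an observed prefix that already bends left is matched to "left_turn", and the remaining points of that prototype become the prediction.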

1. Learning-based motion prediction: learning from the observation of the past movements of vehicles in order to predict the future motion;
2. Model-based motion prediction: using motion models;
3. Motion prediction with a cognitive architecture: trying to reproduce human behavior.

### 3.3. Methods Using Neural Networks

1. Diverse sample generation: A conditional variational auto-encoder (CVAE) is used to capture the multi-modal nature of future trajectories. It uses stochastic latent variables that can be sampled to generate multiple possible future hypotheses from the same past information. The CVAE is combined with an RNN that encodes the past trajectories, and the hypotheses are generated by another RNN;
2. Inverse optimal control (IOC)-based ranking and refinement: After incorporating the context and the interactions, the most likely trajectories are identified using potential future rewards, similar to inverse optimal control (IOC) or inverse reinforcement learning (IRL); the agents are assumed to maximize long-term rewards. The authors argue that this improves the generalization capabilities of the model and makes it more reliable for longer-term predictions. Since a reward function is difficult to design for general traffic scenarios, it is learned by means of IOC. The RNN model assigns rewards to each prediction hypothesis and assesses its quality based on the accumulated long-term rewards. At test time, this refinement is iterated to obtain increasingly accurate predictions;
3. Scene context fusion: This module aggregates the agent interactions and the context encoded by a CNN. This information is then passed to an RNN scoring module, which computes the rewards.
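The sample-then-rank idea behind the first two modules can be illustrated as follows. This is a minimal sketch under strong assumptions: the latent-to-trajectory decoder is a hypothetical one-line stand-in for the CVAE and RNN decoder, and the reward is hand-crafted (deviation from the lane centre plus an obstacle-proximity penalty) rather than learned by IOC. It only shows how hypotheses are scored by their discounted accumulated reward and ranked; no iterative refinement is performed.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_hypotheses(last: Point, speed: float, k: int, horizon: int, dt: float,
                      rng: random.Random) -> List[Trajectory]:
    """Draw k future hypotheses, one latent sample z ~ N(0, 1) per hypothesis.

    The decoder is a hypothetical stand-in: z simply scales a lateral velocity.
    """
    hypotheses = []
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)
        lateral_speed = 0.5 * z  # hypothetical latent-to-motion mapping
        hypotheses.append([(last[0] + speed * dt * (i + 1),
                            last[1] + lateral_speed * dt * (i + 1))
                           for i in range(horizon)])
    return hypotheses


def step_reward(p: Point, lane_y: float, obstacle: Point, safe_dist: float) -> float:
    """Hand-crafted per-step reward (an assumption; the described method learns it)."""
    reward = -abs(p[1] - lane_y)            # stay close to the lane centre
    gap = math.dist(p, obstacle)
    if gap < safe_dist:
        reward -= 10.0 * (safe_dist - gap)  # penalise approaching the obstacle
    return reward


def accumulated_reward(traj: Trajectory, gamma: float, lane_y: float,
                       obstacle: Point, safe_dist: float) -> float:
    """Discounted sum of per-step rewards along one hypothesis."""
    return sum(gamma ** t * step_reward(p, lane_y, obstacle, safe_dist)
               for t, p in enumerate(traj))


def rank_hypotheses(hyps: List[Trajectory], gamma: float, lane_y: float,
                    obstacle: Point, safe_dist: float) -> List[Trajectory]:
    """Sort hypotheses from highest to lowest accumulated reward."""
    return sorted(hyps,
                  key=lambda tr: accumulated_reward(tr, gamma, lane_y, obstacle, safe_dist),
                  reverse=True)
```

Seeding the `random.Random` instance makes the sampled hypotheses reproducible, which is convenient when comparing reward designs.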

_{IRL}[127] uses similar techniques, i.e., attention, GRU, CNN, but it conditions trajectories by means of a policy learned with inverse reinforcement learning (IRL) on a grid that represents the scene.
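Conditioning on a grid-based policy can be illustrated with value iteration over a reward map, followed by a greedy rollout of the resulting policy. In the method above the policy is learned with IRL; here, as an assumption for illustration, the per-cell rewards are written by hand (mildly negative for drivable cells, strongly negative for off-road cells, positive for the goal), and the policy is deterministic rather than stochastic. The resulting grid path is the kind of plan that could then condition a trajectory decoder.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Grid = List[List[float]]

# Four-connected moves (no "stay" action, so the agent keeps moving).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def neighbours(cell: Cell, rows: int, cols: int) -> List[Cell]:
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def value_iteration(reward: Grid, goal: Cell, gamma: float = 0.95,
                    iters: int = 200) -> Grid:
    """V(s) = max over neighbours s' of reward(s') + gamma * V(s'); the goal is absorbing."""
    rows, cols = len(reward), len(reward[0])
    V = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_V = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue  # value of the absorbing goal stays 0
                new_V[r][c] = max(reward[nr][nc] + gamma * V[nr][nc]
                                  for nr, nc in neighbours((r, c), rows, cols))
        V = new_V
    return V


def greedy_path(reward: Grid, V: Grid, start: Cell, goal: Cell,
                gamma: float = 0.95, max_steps: int = 100) -> List[Cell]:
    """Follow the greedy policy induced by V from start until the goal is reached."""
    rows, cols = len(reward), len(reward[0])
    path = [start]
    cell = start
    while cell != goal and len(path) <= max_steps:
        cell = max(neighbours(cell, rows, cols),
                   key=lambda n: reward[n[0]][n[1]] + gamma * V[n[0]][n[1]])
        path.append(cell)
    return path
```

On a small hypothetical map where the only cheap route to the goal bends around off-road cells, the greedy rollout follows that route instead of cutting across the penalised cells.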

### 3.4. Methods Using Stochastic Techniques

### 3.5. Mixed Methods

### 3.6. Discussion

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Chen, L.; Ai, H.; Shang, C.; Zhuang, Z.; Bai, B. Online multi-object tracking with convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 645–649.
- Liu, Q.; Lu, X.; He, Z.; Zhang, C.; Chen, W.S. Deep Convolutional Neural Networks for Thermal Infrared Object Tracking. Knowl.-Based Syst. **2017**.
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Convolutional Features for Correlation Filter Based Visual Tracking. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 621–629.
- Mozhdehi, R.J.; Reznichenko, Y.; Siddique, A.; Medeiros, H. Deep Convolutional Particle Filter with Adaptive Correlation Maps for Visual Tracking. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 798–802.
- Song, Y.; Ma, C.; Gong, L.; Zhang, J.; Lau, R.W.H.; Yang, M.H. CREST: Convolutional Residual Learning for Visual Tracking. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2574–2583.
- Wang, C.; Galoogahi, H.K.; Lin, C.H.; Lucey, S. Deep-LK for Efficient Adaptive Object Tracking. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 627–634.
- Du, M.; Ding, Y.; Meng, X.; Wei, H.L.; Zhao, Y. Distractor-Aware Deep Regression for Visual Tracking. Sensors **2019**, 19, 387.
- Zhou, H.; Ummenhofer, B.; Brox, T. DeepTAM: Deep Tracking and Mapping. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 851–868.
- Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of the Computer Vision—ECCV 2016—14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6931–6939.
- Nam, H.; Han, B. Learning Multi-domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4293–4302.
- Fan, H.; Ling, H. SANet: Structure-Aware Network for Visual Tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 2217–2224.
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
- Li, K.; Kong, Y.; Fu, Y. Multi-stream Deep Similarity Learning Networks for Visual Tracking. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17, Melbourne, Australia, 19–25 August 2017; pp. 2166–2172.
- Chu, Q.; Ouyang, W.; Li, H.; Wang, X.; Liu, B.; Yu, N. Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4846–4855.
- Cui, Z.; Lu, N.; Jing, X.; Shi, X. Fast Dynamic Convolutional Neural Networks for Visual Tracking. In Proceedings of the 10th Asian Conference on Machine Learning, Beijing, China, 14–16 November 2018.
- Fabbri, M.; Lanzi, F.; Calderara, S.; Palazzi, A.; Vezzani, R.; Cucchiara, R. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; pp. 450–466.
- Cui, H.; Radosavljevic, V.; Chou, F.; Lin, T.; Nguyen, T.; Huang, T.; Schneider, J.; Djuric, N. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 2090–2096.
- Chu, P.; Fan, H.; Tan, C.C.; Ling, H. Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; pp. 161–170.
- Shahian Jahromi, B.; Tulabandhula, T.; Cetin, S. Real-Time Hybrid Multi-Sensor Fusion Framework for Perception in Autonomous Vehicles. Sensors **2019**, 19, 4357.
- Bhat, G.; Johnander, J.; Danelljan, M.; Khan, F.S.; Felsberg, M. Unveiling the Power of Deep Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Qi, Y.; Zhang, S.; Qin, L.; Yao, H.; Huang, Q.; Lim, J.; Yang, M. Hedged Deep Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4303–4311.
- Ristani, E.; Tomasi, C. Features for Multi-target Multi-camera Tracking and Re-identification. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6036–6046.
- Teng, Z.; Xing, J.; Wang, Q.; Lang, C.; Feng, S.; Jin, Y. Robust Object Tracking Based on Temporal and Spatial Deep Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1153–1162.
- Yoon, Y.C.; Boragule, A.; Yoon, K.; Jeon, M. Online Multi-Object Tracking with Historical Appearance Matching and Scene Adaptive Detection Filtering. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
- Sun, S.; Akhtar, N.; Song, H.; Mian, A.S.; Shah, M. Deep Affinity Network for Multiple Object Tracking. IEEE Trans. Pattern Anal. Mach. Intell. **2021**, 43, 104–119.
- Li, S.; Ma, B.; Chang, H.; Shan, S.; Chen, X. Continuity-Discrimination Convolutional Neural Network for Visual Object Tracking. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018.
- Hahn, M.; Chen, S.; Dehghan, A. Deep Tracking: Visual Tracking Using Deep Convolutional Networks. CoRR **2015**, arXiv:1512.03993v1 [cs.CV].
- Yang, L.; Liu, R.; Zhang, D.; Zhang, L. Deep Location-Specific Tracking. In Proceedings of the 25th ACM International Conference on Multimedia, MM ’17, Mountain View, CA, USA, 23–27 October 2017; ACM: New York, NY, USA, 2017; pp. 1309–1317.
- Feichtenhofer, C.; Pinz, A.; Zisserman, A. Detect to Track and Track to Detect. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3057–3065.
- Jiang, X.; Zhen, X.; Zhang, B.; Yang, J.; Cao, X. Deep Collaborative Tracking Networks. In Proceedings of the 29th The British Machine Vision Conference, Newcastle, UK, 3–6 September 2018.
- Chen, L.; Lou, J.; Xu, F.; Ren, M. Grid-based multi-object tracking with Siamese CNN based appearance edge and access region mechanism. Multimed. Tools Appl. **2020**, 79.
- Zhang, W.; Du, Y.; Chen, Z.; Deng, J.; Liu, P. Robust adaptive learning with Siamese network architecture for visual tracking. Vis. Comput. **2020**.
- Son, J.; Baek, M.; Cho, M.; Han, B. Multi-object Tracking with Quadruplet Convolutional Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3786–3795.
- Schulter, S.; Vernaza, P.; Choi, W.; Chandraker, M.K. Deep Network Flow for Multi-object Tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2730–2739.
- Wang, N.; Zou, Q.; Ma, Q.; Huang, Y.; Luan, D. A light tracker for online multiple pedestrian tracking. J. Real-Time Image Process. **2021**, 18, 1–17.
- Feng, W.; Hu, Z.; Wu, W.; Yan, J.; Ouyang, W. Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification. CoRR **2019**, arXiv:1901.06129v1 [cs.CV].
- Zhu, J.; Yang, H.; Liu, N.; Kim, M.; Zhang, W.; Yang, M.H. Online Multi-Object Tracking with Dual Matching Attention Networks. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 379–396.
- Pu, S.; Song, Y.; Ma, C.; Zhang, H.; Yang, M.H. Deep Attentive Tracking via Reciprocative Learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Montréal, QC, Canada, 3–8 December 2018.
- Wang, Y.; Zhang, Z.; Zhang, N.; Zeng, D. Attention Modulated Multiple Object Tracking with Motion Enhancement and Dual Correlation. Symmetry **2021**, 13, 266.
- Meng, F.; Wang, X.; Wang, D.; Shao, F.; Fu, L. Spatial–Semantic and Temporal Attention Mechanism-Based Online Multi-Object Tracking. Sensors **2020**, 20, 1653.
- Milan, A.; Rezatofighi, S.H.; Dick, A.; Reid, I.; Schindler, K. Online Multi-Target Tracking using Recurrent Neural Networks. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Sadeghian, A.; Alahi, A.; Savarese, S. Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 300–311.
- Kim, B.; Kang, C.M.; Kim, J.; Lee, S.H.; Chung, C.C.; Choi, J.W. Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 399–404.
- Deo, N.; Trivedi, M.M. Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1179–1184.
- Ma, C.; Yang, C.; Yang, F.; Zhuang, Y.; Zhang, Z.; Jia, H.; Xie, X. Trajectory Factory: Tracklet Cleaving and Re-Connection by Deep Siamese Bi-GRU for Multiple Object Tracking. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018.
- Kim, C.; Li, F.; Rehg, J.M. Multi-object Tracking with Neural Gating Using Bilinear LSTM. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Ondrúška, P.; Posner, I. Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, Phoenix, AZ, USA, 12–17 February 2016.
- Dequaire, J.; Ondruska, P.; Rao, D.; Wang, D.Z.; Posner, I. Deep tracking in the wild: End-to-end tracking using recurrent neural networks. Int. J. Robot. Res. **2018**, 37, 492–512.
- Chen, C.; Zhao, P.; Lu, C.; Wang, W.; Markham, A.; Trigoni, N. Deep-Learning-Based Pedestrian Inertial Navigation: Methods, Data Set, and On-Device Inference. IEEE Internet Things J. **2020**, 7, 4431–4441.
- Du, Y.; Yan, Y.; Chen, S.; Hua, Y. Object-adaptive LSTM network for real-time visual tracking with adversarial data augmentation. Neurocomputing **2020**, 384, 67–83.
- Huang, Z.; Hasan, A.; Shin, K.; Li, R.; Driggs-Campbell, K. Long-Term Pedestrian Trajectory Prediction Using Mutable Intention Filter and Warp LSTM. IEEE Robot. Autom. Lett. **2021**, 6, 542–549.
- Quan, R.; Zhu, L.; Wu, Y.; Yang, Y. Holistic LSTM for Pedestrian Trajectory Prediction. IEEE Trans. Image Process. **2021**, 30, 3229–3239.
- Song, X.; Chen, K.; Li, X.; Sun, J.; Hou, B.; Cui, Y.; Zhang, B.; Xiong, G.; Wang, Z. Pedestrian Trajectory Prediction Based on Deep Convolutional LSTM Network. IEEE Trans. Intell. Transp. Syst. **2020**, 1–18.
- Huang, R.; Zhang, W.; Kundu, A.; Pantofaru, C.; Ross, D.A.; Funkhouser, T.; Fathi, A. An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 266–282.
- Dan, X. Spatial-Temporal Block and LSTM Network for Pedestrian Trajectories Prediction. arXiv **2020**, arXiv:2009.10468.
- Zhou, Y.; Wu, H.; Cheng, H.; Qi, K.; Hu, K.; Kang, C.; Zheng, J. Social graph convolutional LSTM for pedestrian trajectory prediction. IET Intell. Transp. Syst. **2021**, 15, 396–405.
- Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Tracking by prediction: A deep generative model for multi-person localisation and tracking. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2018), Lake Tahoe, NV, USA, 12–15 March 2018; IEEE: Lake Tahoe, NV, USA, 2018; pp. 1122–1132.
- Ren, L.; Lu, J.; Wang, Z.; Tian, Q.; Zhou, J. Collaborative Deep Reinforcement Learning for Multi-Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Li, J.; Yao, L.; Xu, X.; Cheng, B.; Ren, J. Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving. Inf. Sci. **2020**, 532, 110–124.
- Bae, S.; Yoon, K. Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1218–1225.
- Receveur, J.B.; Victor, S.; Melchior, P. Autonomous car decision making and trajectory tracking based on genetic algorithms and fractional potential fields. Intell. Serv. Robot. **2020**, 13.
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
- Bergmann, P.; Meinhardt, T.; Leal-Taixé, L. Tracking without bells and whistles. arXiv **2019**, arXiv:1903.05625.
- Mogelmose, A.; Trivedi, M.M.; Moeslund, T.B. Trajectory analysis and prediction for improved pedestrian safety: Integrated framework and evaluations. In Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Korea, 28 June–1 July 2015; pp. 330–335.
- Xiang, Y.; Alahi, A.; Savarese, S. Learning to Track: Online Multi-object Tracking by Decision Making. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4705–4713.
- Rangesh, A.; Trivedi, M.M. No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras and LiDARs. arXiv **2019**, arXiv:1802.08755.
- Deo, N.; Rangesh, A.; Trivedi, M.M. How Would Surround Vehicles Move? A Unified Framework for Maneuver Classification and Motion Prediction. IEEE Trans. Intell. Veh. **2018**, 3, 129–140.
- Maksai, A.; Wang, X.; Fleuret, F.; Fua, P. Non-Markovian Globally Consistent Multi-object Tracking. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2563–2573.
- Milan, A.; Roth, S.; Schindler, K. Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI
**2014**, 36, 58–72. [Google Scholar] [CrossRef] [PubMed] - Solera, F.; Calderara, S.; Cucchiara, R. Learning to Divide and Conquer for Online Multi-target Tracking. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV ’15, Santiago, Chile, 7–13 December 2015; IEEE Computer Society: Washington, DC, USA, 2015; pp. 4373–4381. [Google Scholar] [CrossRef] [Green Version]
- Dicle, C.; Camps, O.I.; Sznaier, M. The Way They Move: Tracking Multiple Targets with Similar Appearance. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2304–2311. [Google Scholar] [CrossRef] [Green Version]
- Kim, C.; Li, F.; Ciptadi, A.; Rehg, J.M. Multiple Hypothesis Tracking Revisited. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4696–4704. [Google Scholar] [CrossRef]
- Sharma, S.; Ansari, J.A.; Murthy, J.K.; Krishna, K.M. Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 3508–3515. [Google Scholar]
- Andersen, H.; Chong, Z.J.; Eng, Y.H.; Pendleton, S.; Ang, M.H. Geometric path tracking algorithm for autonomous driving in pedestrian environment. In Proceedings of the 2016 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), Banff, AB, Canada, 12–15 July 2016; pp. 1669–1674. [Google Scholar] [CrossRef]
- Manjunath, A.; Liu, Y.; Henriques, B.; Engstle, A. Radar Based Object Detection and Tracking for Autonomous Driving. In Proceedings of the 2018 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), Munich, Germany, 15–17 April 2018; pp. 1–4. [Google Scholar] [CrossRef]
- Cao, J.; Song, C.; Peng, S.; Song, S.; Zhang, X.; Xiao, F. Trajectory Tracking Control Algorithm for Autonomous Vehicle Considering Cornering Characteristics. IEEE Access
**2020**, 8, 59470–59484. [Google Scholar] [CrossRef] - Kampker, A.; Sefati, M.; Rachman, A.S.A.; Kreisköther, K.; Campoy, P. Towards Multi-Object Detection and Tracking in Urban Scenario under Uncertainties. arXiv
**2018**, arXiv:1801.02686.
- Bochinski, E.; Eiselein, V.; Sikora, T. High-Speed tracking-by-detection without using image information. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Shan, Y.; Zheng, B.; Chen, L.; Chen, L.; Chen, D. A Reinforcement Learning-Based Adaptive Path Tracking Approach for Autonomous Driving. IEEE Trans. Veh. Technol. **2020**, 69, 10581–10595.
- Ristani, E.; Solera, F.; Zou, R.S.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. arXiv **2016**, arXiv:1609.01775.
- Pi, W.; Yang, P.; Duan, D.; Chen, C.; Cheng, X.; Yang, L.; Li, H. Malicious User Detection for Cooperative Mobility Tracking in Autonomous Driving. IEEE Internet Things J. **2020**, 7, 4922–4936.
- Chari, V.; Lacoste-Julien, S.; Laptev, I.; Sivic, J. On pairwise costs for network flow multi-object tracking. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5537–5545.
- Berclaz, J.; Fleuret, F.; Turetken, E.; Fua, P. Multiple Object Tracking Using K-Shortest Paths Optimization. IEEE Trans. Pattern Anal. Mach. Intell. **2011**, 33, 1806–1819.
- Pirsiavash, H.; Ramanan, D.; Fowlkes, C.C. Globally-optimal greedy algorithms for tracking a variable number of objects. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1201–1208.
- Ristani, E.; Tomasi, C. Tracking Multiple People Online and in Real Time. In Proceedings of the 12th Asian Conference on Computer Vision, Singapore, 1–5 November 2014.
- Roshan Zamir, A.; Dehghan, A.; Shah, M. GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs. In Computer Vision—ECCV 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Heidelberg/Berlin, Germany, 2012; pp. 343–356.
- Naiel, M.A.; Ahmad, M.O.; Swamy, M.; Lim, J.; Yang, M.H. Online multi-object tracking via robust collaborative model and sample selection. Comput. Vis. Image Underst. **2017**, 154, 94–107.
- Wen, L.; Li, W.; Yan, J.; Lei, Z.; Yi, D.; Li, S.Z. Multiple Target Tracking Based on Undirected Hierarchical Relation Hypergraph. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1282–1289.
- Wen, L.; Du, D.; Li, S.; Bian, X.; Lyu, S. Learning Non-Uniform Hypergraph for Multi-Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 8981–8988.
- Dolatabadi, M.; Elfring, J.; van de Molengraft, R. Multiple-Joint Pedestrian Tracking Using Periodic Models. Sensors **2020**, 20, 6917.
- Schreier, M.; Willert, V.; Adamy, J. An Integrated Approach to Maneuver-Based Trajectory Prediction and Criticality Assessment in Arbitrary Road Environments. IEEE Trans. Intell. Transp. Syst. **2016**, 17, 2751–2766.
- Houenou, A.; Bonnifait, P.; Cherfaoui, V.; Yao, W. Vehicle trajectory prediction based on motion model and maneuver recognition. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 4363–4369.
- Casas, S.; Luo, W.; Urtasun, R. IntentNet: Learning to Predict Intention from Raw Sensor Data. In Proceedings of the 2nd Annual Conference on Robot Learning, CoRL 2018, Zürich, Switzerland, 29–31 October 2018; pp. 947–956.
- Nikhil, N.; Morris, B.T. Convolutional Neural Network for Trajectory Prediction. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; pp. 186–196.
- Aptiv; Audi; Baidu; BMW; Continental; Daimler; Fiat Chrysler Automobiles; HERE; Infineon; et al. Safety First for Automated Driving. Available online: https://www.daimler.com/documents/innovation/other/safety-first-for-automated-driving.pdf (accessed on 2 July 2019).
- Djuric, N.; Radosavljevic, V.; Cui, H.; Nguyen, T.; Chou, F.; Lin, T.; Schneider, J. Motion Prediction of Traffic Actors for Autonomous Driving using Deep Convolutional Networks. CoRR **2018**, arXiv:1808.05819v3 [cs.LG].
- Ward, E. Models Supporting Trajectory Planning in Autonomous Vehicles. Ph.D. Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2018.
- Lee, N.; Choi, W.; Vernaza, P.; Choy, C.B.; Torr, P.H.S.; Chandraker, M.K. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2165–2174.
- Singh, A. Prediction in Autonomous Vehicle–All You Need to Know. Available online: https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fprediction-in-autonomous-vehicle-all-you-need-to-know-d8811795fcdc (accessed on 28 January 2021).
- Lefèvre, S.; Vasquez, D.; Laugier, C. A survey on motion prediction and risk assessment for intelligent vehicles. Robomech J. **2014**, 2014, 1–14.
- Lawitzky, A.; Althoff, D.; Passenberg, C.F.; Tanzmeister, G.; Wollherr, D.; Buss, M. Interactive scene prediction for automotive applications. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, QLD, Australia, 23–26 June 2013; pp. 1028–1033.
- Woo, H.; Sugimoto, M.; Wu, J.; Tamura, Y.; Yamashita, A.; Asama, H. Trajectory Prediction of Surrounding Vehicles Using LSTM Network. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, QLD, Australia, 26–28 June 2013.
- Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971.
- Bojarski, M.; Yeres, P.; Choromanska, A.; Choromanski, K.; Firner, B.; Jackel, L.D.; Muller, U. Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car. CoRR **2017**, arXiv:1704.07911v1 [cs.CV].
- Phan-Minh, T.; Grigore, E.C.; Boulton, F.A.; Beijbom, O.; Wolff, E.M. CoverNet: Multimodal Behavior Prediction using Trajectory Sets. arXiv **2020**, arXiv:1911.10298.
- Mangalam, K.; An, Y.; Girase, H.; Malik, J. From Goals, Waypoints and Paths to Long Term Human Trajectory Forecasting. arXiv **2020**, arXiv:2012.01526.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- MacQueen, J.B. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
- Chandra, R.; Bhattacharya, U.; Bera, A.; Manocha, D. TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8475–8484.
- Li, J.; Yang, F.; Tomizuka, M.; Choi, C. EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning. arXiv **2020**, arXiv:2003.13924.
- Zhao, H.; Gao, J.; Lan, T.; Sun, C.; Sapp, B.; Varadarajan, B.; Shen, Y.; Shen, Y.; Chai, Y.; Schmid, C.; et al. TNT: Target-driveN Trajectory Prediction. arXiv **2020**, arXiv:2008.08294.
- Gao, J.; Sun, C.; Zhao, H.; Shen, Y.; Anguelov, D.; Li, C.; Schmid, C. VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation. arXiv **2020**, arXiv:2005.04259.
- Rhinehart, N.; McAllister, R.; Kitani, K.; Levine, S. PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
- Bhattacharyya, A.; Hanselmann, M.; Fritz, M.; Schiele, B.; Straehle, C.N. Conditional Flow Variational Autoencoders for Structured Sequence Prediction. arXiv **2020**, arXiv:1908.09008.
- Mangalam, K.; Girase, H.; Agarwal, S.; Lee, K.H.; Adeli, E.; Malik, J.; Gaidon, A. It Is Not the Journey But the Destination: Endpoint Conditioned Trajectory Prediction. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 759–776.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Advances in Neural Information Processing Systems; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27.
- Gupta, A.; Johnson, J.; Li, F.F.; Savarese, S.; Alahi, A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018.
- Lai, W.C.; Xia, Z.X.; Lin, H.S.; Hsu, L.F.; Shuai, H.H.; Jhuo, I.H.; Cheng, W.H. Trajectory Prediction in Heterogeneous Environment via Attended Ecology Embedding. In Proceedings of the 28th ACM International Conference on Multimedia, MM ’20, Seattle, WA, USA, 12–16 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 202–210.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Amirian, J.; Hayet, J.B.; Pettre, J. Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16–20 June 2019.
- Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 1349–1358.
- Messaoud, K.; Deo, N.; Trivedi, M.M.; Nashashibi, F. Trajectory Prediction for Autonomous Driving based on Multi-Head Attention with Joint Agent-Map Representation. arXiv **2020**, arXiv:2005.02545.
- Monti, A.; Bertugli, A.; Calderara, S.; Cucchiara, R. DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting. arXiv **2020**, arXiv:2005.12661.
- Liang, J.; Jiang, L.; Murphy, K.; Yu, T.; Hauptmann, A. The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction. arXiv **2020**, arXiv:1912.06445.
- Salzmann, T.; Ivanovic, B.; Chakravarty, P.; Pavone, M. Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
- Deo, N.; Trivedi, M.M. Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans. arXiv **2020**, arXiv:2001.00735.
- Zhou, B.; Schwarting, W.; Rus, D.; Alonso-Mora, J. Joint Multi-Policy Behavior Estimation and Receding-Horizon Trajectory Planning for Automated Urban Driving. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 2388–2394.
- Suraj, M.S.; Grimmett, H.; Platinský, L.; Ondrúška, P. Predicting trajectories of vehicles using large-scale motion priors. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1639–1644.
- Hoermann, S.; Stumper, D.; Dietmayer, K. Probabilistic long-term prediction for autonomous vehicles. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 237–243.
- Xu, Y.; Zhao, T.; Baker, C.; Zhao, Y.; Wu, Y.N. Learning Trajectory Prediction with Continuous Inverse Optimal Control via Langevin Sampling of Energy-Based Models. CoRR **2019**, arXiv:1904.05453v1 [cs.LG].
- Andersson, J. Predicting Vehicle Motion and Driver Intent Using Deep Learning. Master’s Thesis, Chalmers University of Technology, Göteborg, Sweden, 2018.
- Silver, D.; van Hasselt, H.; Hessel, M.; Schaul, T.; Guez, A.; Harley, T.; Dulac-Arnold, G.; Reichert, D.; Rabinowitz, N.; Barreto, A.; et al. The Predictron: End-To-End Learning and Planning. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3191–3199.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature **2016**, 529, 484–489.
- Guez, A.; Weber, T.; Antonoglou, I.; Simonyan, K.; Vinyals, O.; Wierstra, D.; Munos, R.; Silver, D. Learning to Search with MCTSnets. CoRR **2018**, arXiv:1802.04697v2 [cs.AI].
- Schwarting, W.; Alonso-Mora, J.; Rus, D. Planning and Decision-Making for Autonomous Vehicles. Annu. Rev. Control Robot. Auton. Syst. **2018**, 1, 187–210.
- Zhou, Y.; Hu, H.; Liu, Y.; Lin, S.W.; Ding, Z. A distributed method to avoid higher-order deadlocks in multi-robot systems. Automatica **2020**, 112, 108706.
- Foumani, M.; Moeini, A.; Haythorpe, M.; Smith-Miles, K. A cross-entropy method for optimising robotic automated storage and retrieval systems. Int. J. Prod. Res. **2018**, 56, 6450–6472.
- Foumani, M.; Gunawan, I.; Smith-Miles, K. Resolution of deadlocks in a robotic cell scheduling problem with post-process inspection system: Avoidance and recovery scenarios. In Proceedings of the 2015 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 6–9 December 2015; pp. 1107–1111.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning; MIT Press: Cambridge, MA, USA, 2018.
- Lapan, M. Deep Reinforcement Learning Hands-On; Packt Publishing: Birmingham, UK, 2018.
- Grondman, I.; Busoniu, L.; Lopes, G.A.D.; Babuska, R. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) **2012**, 42, 1291–1307.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature **2015**, 518, 529–533.
- Konda, V.R.; Tsitsiklis, J.N. Actor-Critic Algorithms. SIAM **2000**, 42, 1008–1014.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. arXiv **2016**, arXiv:1602.01783.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. CoRR **2017**, arXiv:1707.06347v2 [cs.LG].
- Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust Region Policy Optimization. CoRR **2015**, arXiv:1502.05477v5 [cs.LG].
- Weber, T.; Racanière, S.; Reichert, D.P.; Buesing, L.; Guez, A.; Rezende, D.J.; Badia, A.P.; Vinyals, O.; Heess, N.; Li, Y.; et al. Imagination-Augmented Agents for Deep Reinforcement Learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5694–5705.
- Liu, B.; Ghavamzadeh, M.; Gemp, I.; Liu, J.; Mahadevan, S.; Petrik, M. Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. J. Artif. Intell. Res. **2018**, 63, 461–494.

| Category | Strengths/Weaknesses | Overall Approach | Contributions |
|---|---|---|---|
| Methods based on convolutional neural networks | Strengths: good at learning spatial, shape and geometric features; reduced computational load compared to regular (fully connected) neural networks; translation invariance. Weaknesses: cannot determine temporal dependencies in sequence data without additional mechanisms. | Tracking from features directly learned by simple single-stream convolutional layers | [8,16,18] |
| | | Dual-stream CNNs with data associations performed by additional model components | [6,19,28,37] |
| | | Tracking using responses from convolutional features processed through correlation filters | [2,3,4,5,22,23] |
| | | Multistream CNNs that determine similarities between multiple ROIs and target templates | [14] |
| | | Models that determine appearance descriptors or that generate appearance representations from convolutional features | [1,13,25,27,32] |
| | | Convolutional models that account for temporal coherence using a multi-network pipeline | [24] |
| | | Tracking from features learned by fusing responses from dual-stream convolutional layers (Siamese CNNs) | [26,30,31,38] |
| | | Models that generate features from convolutional layers and use attention mechanisms for temporal coherence and matching | [15,39,40,41] |
| | | Multi-stream convolutional layers used for detecting pedestrian poses | [17] |
| | | Siamese networks combining convolutional features with complementary features from image processing | [33] |
| Methods based on recurrent neural networks | Strengths: good for processing data series; good for learning temporal features and dependencies, and for ensuring temporal coherence. Weaknesses: cannot deduce interactions without additional mechanisms; generally more difficult to train. | Models based on LSTM cell configurations that directly predict vehicle/obstacle occupancy | [44,49] |
| | | Model for motion prediction based on vehicle maneuvers | [45] |
| | | LSTM-based architectures that generate appearance and motion models and learn interaction information over extended sequences | [43,47,51] |
| | | Multi-layer GRU-based architecture that splits and reconnects tracklets generated from convolutional features | [46] |
| | | Basic RNN that encodes information from multiple frame sequences | [42,48] |
| | | LSTM layers that focus on learning and interpreting actor intentions | [52,53] |
| | | LSTM-based object detection and tracking adapted for sequences of higher-dimensional data | [55] |
| | | LSTM models that encode relationships between actors using graph representations | [56,57] |
| | | LSTM model that uses multidimensional internal representations of data sequences | [54] |
| Methods not relying on neural networks | Strengths: designing a working model is more straightforward than with neural networks; most are not dependent on training data. Weaknesses: traditional, classic methods do not model sequence dependencies as effectively as many RNN-based solutions. | Models that represent and predict actor relationships using flow networks and graphs | [83,85,89,90] |
| | | Models relying on geometric representations, kinematics and pose estimations | [74,75,77,91] |
| | | Models that ensure detection coherence using adaptive partitioning of the problem space | [71,86] |
| | | Methods relying on Markov models and Markov decision processes | [66,67,68] |
| | | Methods that build appearance models and/or use appearance similarity metrics | [72,73,87,88] |
| | | Methods using a multi-stage tracking pipeline incorporating filtering, segmentation, clustering and/or data association | [76,78] |
| | | Methods relying on lightweight filtering and optimization for high-speed, high-performance applications | [63,64,80,84] |

Contribution | Year | Datasets | Multi-Modal | Social Context | Methods |
---|---|---|---|---|---|

[93] | 2013 | physics model (CYRA), maneuver recognition | |||

[102] | 2013 | Yes | interaction-based, risk estimation, discrete maneuvers | ||

[92] | 2016 | Yes | Bayesian networks, maneuver-based, risk estimation, Monte Carlo simulation | ||

Social LSTM [104] | 2016 | ETH, UCY | Yes | LSTM | |

DESIRE [99] | 2017 | KITTI, Stanford Drone | Yes | Yes | CVAE, GRU, IRL |

[44] | 2017 | Yes (probabilities of occupancy grid cells) | Yes | LSTM, occupancy grid, softmax | |

PilotNet [105] | 2017 | CNN, outputs steering angles | |||

[130] | 2017 | Yes | particle filter, IDM, driving style estimation, Monte Carlo simulation | ||

Predictron [133] | 2017 | Markov reward process, DNN (fully-connected deep neural network) | |||

[103] | 2018 | I-80 | Yes (only four neighbors) | LSTM | |

[45] | 2018 | NGSIM, I-80 | Yes | Yes | LSTM, maneuver-based |

[97] | 2018 | CNN | |||

[18] | 2018 | Yes | CNN | ||

IntentNet [94] | 2018 | Yes | CNN, intention-based, discrete intention set | ||

[95] | 2018 | ETH, UCY | CNN | ||

[128] | 2018 | Yes | POMDP, particle filter, discrete states and actions | ||

[129] | 2018 | probabilistic sampling | |||

[68] | 2018 | Yes | Gaussian mixture models, HMM, discrete maneuvers | ||

[132] | 2018 | NGSIM | DNN, MDN, LSTM | ||

MCTSNet [135] | 2018 | DNN, vector embeddings, Monte Carlo tree search | |||

Social GAN [118] | 2018 | ETH, UCY | Yes | Yes | GAN, LSTM |

[131] | 2019 | NGSIM | IRL, Langevin sampling, DNN | ||

PRECOG [114] | 2019 | nuScenes | Yes | Yes | GRU, CNN, generative model |

Social Ways [121] | 2019 | ETH, UCY | Yes | Yes | GAN, LSTM, generative model, attention |

SoPhie [122] | 2019 | ETH, UCY, Stanford Drone | Yes | Yes | GAN, attention, CNN, LSTM |

TraPHic [110] | 2019 | NGSIM | Yes | LSTM-CNN hybrid network | |

MHA-JAM [123] | 2020 | nuScenes | Yes | Yes | multi-head attention, LSTM, CNN |

AEE-GAN [119] | 2020 | Waymo, Stanford Drone, ETH, UCY | Yes | Yes | attention, GAN, LSTM |

CF-VAE [115] | 2020 | Stanford Drone | Yes | Yes | CF-VAE |

CoverNet [106] | 2020 | nuScenes | Yes | softmax for discrete trajectory set | |

DAG-NET [124] | 2020 | Stanford Drone, SportVU | Yes | Yes | VAE, RNN, attention, GNN |

EvolveGraph [111] | 2020 | Honda 3D, Stanford Drone, SportVU | Yes | Yes | graphs, GRU, Gaussian mixture |

Multiverse [125] | 2020 | VIRAT/ActEV | Yes | ConvRNN, occupancy grid, graph attention network | |

P2T_{IRL} [127] | 2020 | Stanford Drone | Yes | IRL, attention, GRU, CNN | |

PECNet [116] | 2020 | Stanford Drone, ETH, UCY | Yes | Yes | CVAE (Endpoint VAE), truncation trick |

TNT [112] | 2020 | Argoverse, Interaction, Stanford Drone | Yes | Yes | VectorNet, MLP (classic multilayer perceptron) |

Trajectron++ [126] | 2020 | ETH, UCY, nuScenes | Yes | Yes | LSTM, attention, GRU, CVAE, Gaussian mixture model |

Y-Net [107] | 2020 | Stanford Drone, ETH, UCY | Yes | U-Net, k-means |
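The "physics model (CYRA)" entry for [93] refers to a constant yaw rate and acceleration motion model, the kind of physics-based extrapolation that Section 3 recommends for simple, non-interactive scenarios over short horizons. The following is a minimal sketch, not the implementation used in [93]. It assumes a planar state (position, heading θ, speed v) with constant yaw rate ω and constant longitudinal acceleration a, so that θ(t) = θ₀ + ωt and v(t) = v₀ + at, and it integrates v(t)·(cos θ(t), sin θ(t)) in closed form. The names `CyraState`, `cyra_position` and `predict_cyra` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CyraState:
    """Hypothetical planar vehicle state for a CYRA motion model."""
    x: float         # position [m]
    y: float         # position [m]
    heading: float   # yaw angle theta [rad]
    speed: float     # longitudinal speed v [m/s]
    yaw_rate: float  # constant yaw rate omega [rad/s]
    accel: float     # constant longitudinal acceleration a [m/s^2]


def cyra_position(s: CyraState, t: float, eps: float = 1e-5) -> Tuple[float, float]:
    """Position after t seconds, integrating v(t)*(cos, sin)(theta(t)) in closed form.

    Assumes theta(t) = theta0 + omega*t and v(t) = v0 + a*t. Speed is not clamped,
    so with a < 0 a long horizon eventually makes the vehicle move backwards.
    """
    w, a = s.yaw_rate, s.accel
    if abs(w) < eps:
        # Near-zero yaw rate: fall back to straight-line constant acceleration
        # to avoid dividing by a vanishing omega^2.
        d = s.speed * t + 0.5 * a * t * t
        return s.x + d * math.cos(s.heading), s.y + d * math.sin(s.heading)
    th0, th = s.heading, s.heading + w * t
    v0, v = s.speed, s.speed + a * t
    # Antiderivatives of (v0 + a*t)*cos(theta) and (v0 + a*t)*sin(theta).
    dx = (v * math.sin(th) - v0 * math.sin(th0)) / w + a * (math.cos(th) - math.cos(th0)) / w**2
    dy = (v0 * math.cos(th0) - v * math.cos(th)) / w + a * (math.sin(th) - math.sin(th0)) / w**2
    return s.x + dx, s.y + dy


def predict_cyra(s: CyraState, horizon: float, dt: float) -> List[Tuple[float, float]]:
    """Sample the extrapolated path every dt seconds, up to and including the horizon."""
    steps = int(round(horizon / dt))
    return [cyra_position(s, k * dt) for k in range(1, steps + 1)]


# Usage: a car at 15 m/s turning gently while braking slightly, predicted for 3 s.
path = predict_cyra(CyraState(0.0, 0.0, 0.0, 15.0, 0.1, -0.5), horizon=3.0, dt=0.5)
```

Because the model is deterministic, it yields a single trajectory; consistent with Section 3, such extrapolations are only trustworthy over short horizons and in situations where the constant yaw rate and acceleration assumptions roughly hold.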

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Leon, F.; Gavrilescu, M.
A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving. *Mathematics* **2021**, *9*, 660.
https://doi.org/10.3390/math9060660

**AMA Style**

Leon F, Gavrilescu M.
A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving. *Mathematics*. 2021; 9(6):660.
https://doi.org/10.3390/math9060660

**Chicago/Turabian Style**

Leon, Florin, and Marius Gavrilescu.
2021. "A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving" *Mathematics* 9, no. 6: 660.
https://doi.org/10.3390/math9060660