Communication

Transformer-Based Maneuvering Target Tracking

1 School of Artificial Intelligence, Xidian University, Xi’an 710071, China
2 School of Electronic Confrontation, National University of Defense, Hefei 230037, China
* Author to whom correspondence should be addressed.
Sensors 2022, 22(21), 8482; https://doi.org/10.3390/s22218482
Submission received: 14 October 2022 / Revised: 30 October 2022 / Accepted: 1 November 2022 / Published: 4 November 2022
(This article belongs to the Section Sensing and Imaging)

Abstract
When tracking maneuvering targets, recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, are widely applied to sequentially capture the motion states of targets from observations. However, LSTMs can only extract features of trajectories stepwise; thus, their modeling of maneuvering motion lacks globality. Meanwhile, trajectory datasets are often generated within a large but fixed distance range. Therefore, the uncertainty of the initial position of targets increases the complexity of network training, and the fixed distance range reduces the generalization of the network to trajectories outside the dataset. In this study, we propose a transformer-based network (TBN) that consists of an encoder part (transformer layers) and a decoder part (one-dimensional convolutional layers) to track maneuvering targets. Assisted by the attention mechanism of the transformer network, the TBN can capture the long- and short-term dependencies of target states from a global perspective. Moreover, we propose a center–max normalization to reduce the complexity of TBN training and improve its generalization. The experimental results show that our proposed methods outperform the LSTM-based tracking network.

1. Introduction

With the rapid development of the electronic information industry, target tracking technology has been increasingly used in military and civilian fields. The target tracking task aims to estimate the state of a target based on data measured by sensors. It can be classified into maneuvering and non-maneuvering target tracking, where “maneuvering” refers to the case in which the target suddenly changes its motion state. For tracking maneuvering targets, the interacting multiple model (IMM) algorithm, which uses multiple models to fit complex motion states, was proposed [1], and many subsequent tracking algorithms were based on the IMM [2,3,4]. However, IMM-based algorithms suffer from a mismatch between the model set and the actual target motion states. Furthermore, when the motion state of the target changes, a certain number of observations must be accumulated before the estimate adapts, resulting in the model estimation delay problem [5].
The development of deep neural networks, especially recurrent neural networks (RNNs) with memory ability, provides novel ideas for solving the problems of IMM-based algorithms [6,7,8,9]. The RNN [10] and long short-term memory (LSTM) networks [11] can estimate the state from the observation at each time step [6,12]. Nevertheless, the LSTM and RNN can only process the input sequence sequentially, resulting in long-distance memory fading [9]. Thus, they may weaken the correlation between trajectory points at distant locations, which adversely affects the modeling of maneuvering states. In addition, trajectory datasets are usually collected in a fixed-range coordinate system and preprocessed with min–max normalization [8,13,14]. However, the same maneuvering state in the dataset may correspond to trajectories with different initial positions, which increases the complexity of network learning. Moreover, the fixed distance range reduces the generalization of the network.
In this study, to accurately model and estimate the states of maneuvering targets, we propose a transformer-based network (TBN). Specifically, our proposed network applies the transformer network as an encoder to extract global features of the observation sequence, while 1D convolutional layers are applied as a decoder to estimate the state sequence from these features. Compared with the LSTM network, which processes observations sequentially, the TBN associates the observations at all positions and applies an attention mechanism to model their dependencies [15]. Thus, the features of the observations can be represented independently of their position in the sequence [16,17,18], giving the TBN better feature representation and global memory ability than the LSTM [18]. Moreover, a learnable positional embedding is added to the input of the TBN to exploit the temporal features of the observation sequence. Finally, a novel center–max normalization is applied by the TBN to improve generalization. Compared with min–max normalization, our proposed center–max normalization transforms the trajectories from a fixed coordinate system to a relative one with the initial observation point as the origin. The experimental results demonstrate that center–max normalization considerably increases the generalization of the TBN to trajectories with different distance ranges and also improves the tracking performance of the TBN by reducing the complexity of trajectory learning.

2. Problem Formulation

Based on the previous research on maneuvering target tracking [4,7,8,19], we mainly considered point targets tracked by radar in the X–Y plane. Meanwhile, the problem of target birth and death was not considered in this study. We assume that $z_k$ is the observation vector and $x_k$ is the state vector at the $k$th time step. Specifically, $x_k = [c_{x,k}, c_{y,k}, v_{x,k}, v_{y,k}]^\top$ denotes the coordinates and corresponding velocities in the two-dimensional scene, and $z_k = [\theta_k, d_k]^\top$ denotes the azimuth and distance of the radar observation.
We intend to build a maneuvering target tracking model based on a deep neural network. The input to the model is the observation sequence $z_{1:K} = \{z_1, z_2, \dots, z_K\}$, and the output is the estimated state sequence $\hat{x}_{1:K} = \{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_K\}$, where $K$ is the total number of time steps. Given that target tracking is a regression problem, we used the root-mean-squared error (RMSE) between the normalized ground-truth sequence $x^*_{1:K} = \{x^*_1, x^*_2, \dots, x^*_K\}$ and the estimated sequence $\hat{x}^*_{1:K} = \{\hat{x}^*_1, \hat{x}^*_2, \dots, \hat{x}^*_K\}$ as the loss function [9] to evaluate the model:
$$\mathrm{Loss} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left\| \hat{x}^*_k - x^*_k \right\|^2}. \tag{1}$$
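As a concrete illustration, Equation (1) can be implemented in a few lines. The following PyTorch sketch (our own; it assumes batched tensors of shape (batch, K, 4)) computes the per-trajectory RMSE and averages it over the batch:

```python
import torch

def rmse_loss(x_hat, x_star):
    """RMSE loss of Equation (1) for tensors of shape (batch, K, 4)."""
    # Squared Euclidean norm over the four state components -> (batch, K)
    sq = ((x_hat - x_star) ** 2).sum(dim=-1)
    # Square root of the mean over the K time steps, averaged over the batch
    return torch.sqrt(sq.mean(dim=-1)).mean()
```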
In practice, obtaining a sufficient number of trajectories is difficult. Thus, we simulated segmented trajectories based on the state-space model (SSM) [20].
The SSM defines the state transition equation and observation equation as:
$$x_k = F x_{k-1} + n_k, \qquad z_k = h(x_k) + u_k \tag{2}$$
where $F$ is the transition matrix, $n_k$ is the transition noise, $h(\cdot)$ is the nonlinear observation function, and $u_k$ is the observation noise.
In this study, two motion states were considered: constant velocity (CV) and constant turn (CT), as mentioned in [8]. The transition matrices of the CV and CT models are defined as:
$$F_{CV} = \begin{bmatrix} 1 & 0 & \tau & 0 \\ 0 & 1 & 0 & \tau \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{3}$$
$$F_{CT} = \begin{bmatrix} 1 & 0 & \dfrac{\sin w\tau}{w} & \dfrac{\cos w\tau - 1}{w} \\ 0 & 1 & \dfrac{1 - \cos w\tau}{w} & \dfrac{\sin w\tau}{w} \\ 0 & 0 & \cos w\tau & -\sin w\tau \\ 0 & 0 & \sin w\tau & \cos w\tau \end{bmatrix} \tag{4}$$
where $w$ is the turn rate of the maneuvering target and $\tau$ is the sampling interval of the observations. According to [21], the transition noise $n_k = [n_{c_x,k}, n_{c_y,k}, n_{v_x,k}, n_{v_y,k}]^\top$ is calculated from:
$$\begin{bmatrix} n_{c_x,k} \\ n_{c_y,k} \\ n_{v_x,k} \\ n_{v_y,k} \end{bmatrix} = \begin{bmatrix} \frac{\tau^2}{2} & 0 \\ 0 & \frac{\tau^2}{2} \\ \tau & 0 \\ 0 & \tau \end{bmatrix} \cdot \begin{bmatrix} \alpha_{x,k} \\ \alpha_{y,k} \end{bmatrix} \tag{5}$$
where $\alpha_{x,k}, \alpha_{y,k} \sim \mathcal{N}(0, \sigma_a^2)$ are the Gaussian noise terms caused by the maneuvering acceleration, with zero mean and standard deviation $\sigma_a$.
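The transition step of Equations (2)–(5) is straightforward to simulate. The following NumPy sketch (the function names are our own) builds $F_{CV}$ and $F_{CT}$ and propagates a state with the maneuvering acceleration noise:

```python
import numpy as np

def f_cv(tau):
    """CV transition matrix of Equation (3)."""
    return np.array([[1.0, 0.0, tau, 0.0],
                     [0.0, 1.0, 0.0, tau],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def f_ct(w, tau):
    """CT transition matrix of Equation (4); w is the turn rate in rad/s."""
    s, c = np.sin(w * tau), np.cos(w * tau)
    return np.array([[1.0, 0.0, s / w, (c - 1.0) / w],
                     [0.0, 1.0, (1.0 - c) / w, s / w],
                     [0.0, 0.0, c, -s],
                     [0.0, 0.0, s, c]])

def transition(x, F, tau, sigma_a, rng):
    """One state transition x_k = F x_{k-1} + n_k with the noise of Equation (5)."""
    G = np.array([[tau**2 / 2, 0.0],
                  [0.0, tau**2 / 2],
                  [tau, 0.0],
                  [0.0, tau]])
    alpha = rng.normal(0.0, sigma_a, size=2)  # acceleration noise per axis
    return F @ x + G @ alpha
```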
For radar tracking, $z_k$ is defined as:
$$\begin{bmatrix} \theta_k \\ d_k \end{bmatrix} = \underbrace{\begin{bmatrix} \arctan(c_{y,k} / c_{x,k}) \\ \sqrt{c_{x,k}^2 + c_{y,k}^2} \end{bmatrix}}_{h(x_k)} + \begin{bmatrix} u_{\theta,k} \\ u_{d,k} \end{bmatrix}, \quad u_{\theta,k} \sim \mathcal{N}(0, \sigma_\theta^2), \quad u_{d,k} \sim \mathcal{N}(0, \sigma_d^2) \tag{6}$$
where $\sigma_\theta$ is the standard deviation of the azimuth noise and $\sigma_d$ is the standard deviation of the distance noise.
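A matching observation function might be sketched as follows; note that we use np.arctan2 instead of a plain arctangent so that the azimuth covers the full 0° to 360° range of the dataset (Table 1):

```python
import numpy as np

def observe(x, sigma_theta, sigma_d, rng):
    """Radar observation z_k = h(x_k) + u_k of Equation (6)."""
    cx, cy = x[0], x[1]
    theta = np.arctan2(cy, cx) + rng.normal(0.0, sigma_theta)  # azimuth (rad)
    d = np.hypot(cx, cy) + rng.normal(0.0, sigma_d)            # distance (m)
    return np.array([theta, d])
```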

3. Proposed Model

In this section, we discuss the components of the TBN in detail. In Section 3.1, we introduce a trajectory normalization method named center–max normalization to improve generalization. In Section 3.2, the structure of the TBN is presented. In Section 3.3, we summarize the overall process of applying the TBN for maneuvering target tracking.

3.1. Center–Max Normalization

A trajectory of a maneuvering target is exhibited in Figure 1. The left of Figure 1 shows an observation sequence, which contains the distance and azimuth. To eliminate the dimensional difference between the observations, the polar-coordinate observations $z_{1:K}$ are converted to $\tilde{z}_{1:K}$ in the X–Y plane coordinates:
$$\tilde{z}_k = \begin{bmatrix} \tilde{z}_{x,k} \\ \tilde{z}_{y,k} \end{bmatrix} = \begin{bmatrix} d_k \cos\theta_k \\ d_k \sin\theta_k \end{bmatrix}. \tag{7}$$
Figure 1c shows a trajectory in the X-Y plane coordinates. The distance range and initial position of the targets may vary extensively; thus, we propose a center–max normalization mechanism to improve the generalization of the model and reduce the training complexity, as shown in Figure 2. This can be formulated as follows:
$$\tilde{z}^*_k = \frac{\tilde{z}_k - \tilde{z}_1}{D_{\max}}, \quad k = 1, \dots, K \tag{8}$$
where $\tilde{z}^*_k$ is the normalized observation at the $k$th time step, $\tilde{z}_1$ is the initial value of $\tilde{z}_{1:K}$, and $D_{\max}$ denotes the maximum distance that the targets can move within $K$ time steps. In Equation (8), the observation sequence is normalized to $[-1, 1]$ by dividing by $D_{\max}$. By subtracting $\tilde{z}_1$, the observation sequence $\tilde{z}_{1:K}$ is represented in a relative coordinate system with $\tilde{z}_1$ as the origin. Benefiting from center–max normalization, the TBN only needs to focus on learning the different maneuvers of the target without considering the influence of the initial position. Therefore, the tracking of maneuvering targets by the TBN is not limited by the detection range. Correspondingly, the ground-truth state sequence $x_{1:K}$ is normalized as follows:
$$x^*_{1:K} = \frac{x_{1:K} - c_x}{X_{\max}}, \quad c_x = [\tilde{z}_{x,1}, \tilde{z}_{y,1}, 0, 0]^\top, \quad X_{\max} = [D_{\max}, D_{\max}, V_{\max}, V_{\max}]^\top \tag{9}$$
where $x^*_{1:K}$ is the normalized state sequence, $c_x$ is the centering vector corresponding to $x_{1:K}$, $[\tilde{z}_{x,1}, \tilde{z}_{y,1}]$ is the position component of $\tilde{z}_1$, and $V_{\max}$ is the maximum speed of the simulated targets. The division in Equation (9) is elementwise.
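For illustration, a minimal NumPy implementation of Equations (8) and (9) could look like this (the constants follow Section 4.1; the helper names are our own):

```python
import numpy as np

D_MAX, V_MAX = 3000.0, 300.0  # normalization constants of Section 4.1 (m, m/s)

def center_max_normalize(z_xy):
    """Equation (8): shift by the first observation, then scale by D_MAX.

    z_xy: observations in X-Y coordinates, shape (K, 2).
    Returns the normalized window and the origin z_1, kept for denormalization.
    """
    z1 = z_xy[0]
    return (z_xy - z1) / D_MAX, z1

def normalize_states(x, z1):
    """Equation (9): normalize a ground-truth state sequence of shape (K, 4)."""
    c_x = np.array([z1[0], z1[1], 0.0, 0.0])
    x_max = np.array([D_MAX, D_MAX, V_MAX, V_MAX])
    return (x - c_x) / x_max
```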

3.2. Proposed Network

In sequence modeling tasks, the LSTM network extracts features sequentially, whereas the transformer network uses the self-attention mechanism to process the input data in parallel, capturing both local and global dependencies. Therefore, we introduced it to the target tracking task to comprehensively capture the internal laws of target maneuvering. Our proposed TBN consists of positional encoding, $N$ stacked transformer encoder layers, and one convolutional decoder layer. Each transformer encoder layer contains multi-head self-attention and a feedforward fully connected network, with a residual connection after each of these two blocks. For an intuitive understanding, the entire architecture of the TBN is shown in Figure 3.

3.2.1. Positional Encoding

In natural language processing tasks, the transformer network adds positional encoding to the input tokens to represent their relative or absolute positions in the sequence [15]. However, in this study, the input to the TBN is numeric. Therefore, the learnable positional encoding mentioned in [22] is added to the input of the network as follows:
$$s^{*(i)}_{1:K} = \begin{cases} w_i \tilde{z}^*_{1:K} + \varphi_i, & \text{if } i = 0 \\ F\left(w_i \tilde{z}^*_{1:K} + \varphi_i\right), & \text{if } 1 \le i \le E \end{cases} \tag{10}$$
where $F$ is the sine function, $w_i$ and $\varphi_i$ are learnable parameters that map $\tilde{z}^*_k$ to an $E$-dimensional representation space, and $s^{*(i)}_{1:K}$ is the encoding result of the $i$th subspace.
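A possible PyTorch realization of this Time2Vec-style encoding [22] is sketched below (class and layer names are our own assumptions): a single linear layer holds the parameters $w_i$ and $\varphi_i$, the first output channel stays linear, and the remaining channels pass through the sine function.

```python
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    """Learnable positional encoding of Equation (10), Time2Vec style."""

    def __init__(self, in_dim=2, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)  # holds w_i and phi_i

    def forward(self, z):
        # z: (batch, K, in_dim) normalized observations
        s = self.proj(z)  # (batch, K, embed_dim)
        # Channel 0 is linear; the remaining channels go through the sine
        return torch.cat([s[..., :1], torch.sin(s[..., 1:])], dim=-1)
```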

3.2.2. Multi-Head Self-Attention

Self-attention is the core of the TBN. First, the input encoding sequence $s^*_{1:K} \in \mathbb{R}^{E \times K}$ of self-attention is linearly mapped into the “query” (Q), “key” (K), and “value” (V) sequences as follows:
$$Q = W_Q \cdot s^*_{1:K}, \quad K = W_K \cdot s^*_{1:K}, \quad V = W_V \cdot s^*_{1:K} \tag{11}$$
where $W_Q$, $W_K$, and $W_V \in \mathbb{R}^{E \times E}$ are learnable matrices.
Furthermore, Q, K, and V are split into M subsequences along dimension E, and M attention heads are obtained by the interaction of the elements at any two positions in each subsequence:
$$\mathrm{head}_m = \mathrm{softmax}\!\left(\frac{Q_m^\top K_m}{\sqrt{d_M}}\right) V_m, \quad m = 1, \dots, M \tag{12}$$
where $Q_m, K_m, V_m \in \mathbb{R}^{E_m \times K}$ and $d_M = E_m = E / M$.
Finally, M attention heads are concatenated to compose the multi-head self-attention:
$$s_{\mathrm{attention}} = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_M). \tag{13}$$
Thus, the network is allowed to capture more information from different representation subspaces at different positions.
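Equations (11)–(13) describe standard multi-head self-attention (available in PyTorch as nn.MultiheadAttention); the following sketch of ours makes the split-and-concatenate structure explicit for a single unbatched sequence stored time-major:

```python
import torch
import torch.nn.functional as F

def multi_head_self_attention(s, w_q, w_k, w_v, num_heads):
    """Equations (11)-(13) for one sequence s of shape (K, E);
    w_q, w_k, w_v are learnable (E, E) matrices."""
    K_len, E = s.shape
    d_m = E // num_heads
    q, k, v = s @ w_q, s @ w_k, s @ w_v  # Equation (11), each (K, E)
    # Split into heads: (num_heads, K, d_m)
    q = q.view(K_len, num_heads, d_m).transpose(0, 1)
    k = k.view(K_len, num_heads, d_m).transpose(0, 1)
    v = v.view(K_len, num_heads, d_m).transpose(0, 1)
    # Scaled dot-product attention per head, Equation (12)
    attn = F.softmax(q @ k.transpose(-2, -1) / d_m**0.5, dim=-1)  # (heads, K, K)
    heads = attn @ v  # (heads, K, d_m)
    # Concatenate the heads along the feature dimension, Equation (13)
    return heads.transpose(0, 1).reshape(K_len, E)
```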

3.2.3. Feedforward Layer

After the multi-head self-attention, a feedforward layer consisting of two fully connected layers is used to transform each position of $s_{\mathrm{attention}}$.
In the decoder part, two 1D convolutional layers are used to output the final trajectory estimation $\hat{x}_{1:K}$, and the parameters of the network are trained by minimizing Equation (1) using mini-batch gradient descent.
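Putting the pieces together, the overall TBN might be assembled as follows. This is our own reconstruction that reuses the LearnablePositionalEncoding sketch above and the hyper-parameters of Section 4.1; the kernel size of the convolutional decoder is an assumption, as only its output dimensions (64 and 4) are specified:

```python
import torch.nn as nn

class TBN(nn.Module):
    """Sketch of the TBN: positional encoding, stacked transformer encoder
    layers, and a two-layer 1D convolutional decoder."""

    def __init__(self, embed_dim=512, num_layers=4, num_heads=8):
        super().__init__()
        self.pos_enc = LearnablePositionalEncoding(2, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.decoder = nn.Sequential(  # assumed pointwise convolutions
            nn.Conv1d(embed_dim, 64, kernel_size=1), nn.ReLU(),
            nn.Conv1d(64, 4, kernel_size=1),
        )

    def forward(self, z):
        # z: (batch, K, 2) normalized observations
        s = self.encoder(self.pos_enc(z))    # (batch, K, E) features
        x = self.decoder(s.transpose(1, 2))  # convolve over time: (batch, 4, K)
        return x.transpose(1, 2)             # (batch, K, 4) normalized states
```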

3.3. Maneuvering Target Tracking Based on the TBN

When the well-trained TBN is applied to track a complete trajectory, all observations are first segmented with window length $K = 10$ and step size $P = 5$. These segmented observation sequences are then normalized sequentially and passed to the TBN to estimate the corresponding set of state sequences $\{\hat{x}^{*(1+rP)}_{1:K}, r \in \{0, \dots, R-1\}\}$, where $\hat{x}^{*(1+rP)}_{1:K}$ denotes the normalized state sequence output at time step $(1+rP)$ and $R$ is the number of sequences. Subsequently, $\hat{x}^{*(1+rP)}_{1:K}$ needs to be denormalized as follows:
$$\hat{x}^{(1+rP)}_{1:K} = \hat{x}^{*(1+rP)}_{1:K} \cdot X_{\max} + c^r_x, \quad r = 0, \dots, R-1 \tag{14}$$
where $c^r_x$ is the centering vector of the $r$th state sequence. In addition, adjacent state sequences $\hat{x}^{(1+rP)}_{1:K}$ and $\hat{x}^{(1+(r+1)P)}_{1:K}$ are merged together. Let $\bar{x}^{(1+rP)}_{1:2K-P}$ denote the merged result of the two above-mentioned state sequences, whose length is $2K-P$. The overlapped region of $\bar{x}^{(1+rP)}_{1:2K-P}$ is calculated as follows:
$$\bar{x}^{(1+rP)}_{P:K} = 0.5 \left( \hat{x}^{(1+rP)}_{P:K} + \hat{x}^{(1+(r+1)P)}_{1:K-P} \right). \tag{15}$$
Finally, all state sequences in the set $\{\hat{x}^{(1+rP)}_{1:K}, r \in \{0, \dots, R-1\}\}$ are merged in turn to obtain the complete state estimate. Figure 4 illustrates the overall tracking process, including observation segmentation, center–max normalization, network estimation, denormalization, and concatenation of the segmented state sequences; the whole procedure is sketched below.
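The windowing, denormalization (Equation (14)), and overlap averaging (Equation (15)) can be combined into one routine. The sketch below (our own; it reuses center_max_normalize and the constants from Section 3.1) averages each point over all windows covering it, which reduces to the 0.5 weighting of Equation (15) when $P = K/2$:

```python
import numpy as np

def track_full_trajectory(model_fn, z_xy, K=10, P=5):
    """Sliding-window tracking of Section 3.3. model_fn maps one normalized
    (K, 2) observation window to a (K, 4) normalized state estimate."""
    x_max = np.array([D_MAX, D_MAX, V_MAX, V_MAX])
    R = (len(z_xy) - K) // P + 1                   # number of windows
    est = np.zeros((K + (R - 1) * P, 4))
    count = np.zeros(len(est))
    for r in range(R):
        window = z_xy[r * P : r * P + K]
        z_norm, z1 = center_max_normalize(window)  # Equation (8)
        c_x = np.array([z1[0], z1[1], 0.0, 0.0])
        x_hat = model_fn(z_norm) * x_max + c_x     # denormalize, Equation (14)
        est[r * P : r * P + K] += x_hat
        count[r * P : r * P + K] += 1.0
    return est / count[:, None]  # average the overlaps, Equation (15)
```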

4. Experiments and Results

In this section, we list the parameters of the trajectory dataset and the TBN. Several experiments were designed to test the tracking performance of our proposed model.

4.1. Implementation Details

Dataset: We generated 300,000 trajectories based on the SSM as a dataset. The parameters of the trajectory dataset are listed in Table 1. In addition, we set the normalization parameters to $D_{\max} = 3$ km and $V_{\max} = 300$ m/s, and the targets were observed every 1 s.
Hyper-parameters: Our network consists of four encoder layers, each with eight attention heads. The embedding dimension $E$ was 512. The output dimensions of the two 1D convolutional layers in the decoder were 64 and 4, respectively. The model was trained using the Adam optimizer [23] with $\beta_1 = 0.9$, $\beta_2 = 0.98$, and $\varepsilon = 10^{-9}$. The learning rate was linearly warmed up for the first 10 epochs and decayed subsequently according to the dynamic adjustment strategy in [15]. We trained for 300 epochs with a batch size of 64 on a single NVIDIA TITAN Xp GPU.
Baseline: We compared the TBN+center–max normalization (TBN+CM) model with the IMM algorithm [19] and the LSTM+min–max normalization (LSTM+MM) tracking model [8]. As a comparison, we also built the LSTM+center–max normalization (LSTM+CM) model. The LSTM network consisted of four hidden layers with a dimension of 128, as mentioned in [8]. The same dataset was used to train the above networks.

4.2. Results

We first compared the performance of the LSTM+MM, LSTM+CM, and TBN+CM models on a test set containing 20,000 segmented trajectories. The tracking results are listed in Table 2. The position and velocity RMSEs of the LSTM+CM are smaller than those of the LSTM+MM, which shows that our proposed center–max normalization improves the tracking capability of the network by reducing the complexity of trajectory learning. Moreover, the TBN+CM achieved the smallest position and velocity RMSEs. Thus, the TBN yields better performance than the LSTM when tracking segmented trajectories.
We then simulated a target with the initial state [2 km, 2 km, 50 m/s, 0 m/s] and a turn rate of 0°/s and conducted Monte Carlo simulations to generate a 60-step trajectory named A1. The target maneuvers with turn rates of 1°/s and 3°/s at the 10th and 40th steps, respectively. In addition, the standard deviations of the acceleration, azimuth, and distance noise were set to 5 m/s², 0.2°, and 5 m, respectively. We evaluated the TBN+CM, LSTM+MM, LSTM+CM, and IMM algorithms on trajectory A1. The tracking results are presented in Table 3 and Figure 5.
Figure 5a shows how well the algorithms tracked the target, and Figure 5b,c show the pointwise RMSEs along trajectory A1. The average RMSEs are listed in Table 3: the RMSEs of the LSTM+CM are again smaller than those of the LSTM+MM, confirming the benefit of center–max normalization, and the TBN+CM had the smallest tracking error overall. These experiments demonstrate the superiority of the TBN+CM in tracking maneuvering targets.
In addition, the initial position of trajectory A1 was moved to [12 km, 12 km] and [15 km, 15 km] to obtain trajectories A2 and A3, on which we conducted generalization experiments (Table 4). The results in Table 4 demonstrate that our proposed TBN+CM generalizes to trajectories beyond the preset distance range, whereas the LSTM+MM fails to track them because of its fixed normalization mechanism.

5. Conclusions

In this study, we employed the attention mechanism of the transformer network to extract comprehensive features of trajectories and developed a novel network, named the TBN, for radar target tracking missions. Furthermore, our proposed center–max normalization improved the generalization of the network by processing observations in a relative coordinate system. The experimental results show that, when tracking maneuvering targets, our proposed TBN model obtained lower position and velocity RMSEs than the LSTM model; moreover, the TBN model can still work normally when parts of the observation sequence are missing, whereas the LSTM model cannot. Therefore, our algorithm outperforms existing LSTM-based tracking networks and traditional algorithms.

Author Contributions

Investigation, G.Z., Z.W., Y.H. and H.Z.; methodology, G.Z., Z.W. and Y.H.; validation, Z.W. and Y.H.; visualization, Z.W.; writing—review and editing, Z.W., Y.H. and H.Z.; writing—original draft preparation, Z.W.; supervision, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Blom, H.A.; Bar-Shalom, Y. The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control. 1988, 33, 780–783. [Google Scholar] [CrossRef]
  2. Pulford, G.W.; La Scala, B.F. MAP estimation of target manoeuvre sequence with the expectation-maximization algorithm. IEEE Trans. Aerosp. Electron. Syst. 2002, 38, 367–377. [Google Scholar] [CrossRef]
  3. Chen, H.; Chang, K. Novel nonlinear filtering & prediction method for maneuvering target tracking. IEEE Trans. Aerosp. Electron. Syst. 2009, 45, 237–249. [Google Scholar]
  4. Ning, X.H.; Hui, X. Algorithm of maneuvering target tracking for video based on UKF and IMM. In Proceedings of the IEEE Conference Anthology, China, 1–8 January 2013; pp. 1–4. [Google Scholar]
  5. Li, B.; Pang, F.; Liang, C.; Chen, X.; Liu, Y. Improved interactive multiple model filter for maneuvering target tracking. In Proceedings of the 33rd IEEE Chinese Control Conference, Nanjing, China, 28–30 July 2014; pp. 7312–7316. [Google Scholar]
  6. Gao, C.; Yan, J.; Zhou, S.; Varshney, P.K.; Liu, H. Long short-term memory-based deep recurrent neural networks for target tracking. Inf. Sci. 2019, 502, 279–296. [Google Scholar] [CrossRef]
  7. Liu, J.; Wang, Z.; Xu, M. DeepMTT: A deep learning maneuvering target-tracking algorithm based on bidirectional LSTM network. Inf. Fusion 2020, 53, 289–304. [Google Scholar] [CrossRef]
  8. Yu, W.; Yu, H.; Du, J.; Zhang, M.; Liu, J. DeepGTT: A general trajectory tracking deep learning algorithm based on dynamic law learning. IET Radar Sonar Navig. 2021, 15, 1125–1150. [Google Scholar] [CrossRef]
  9. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  10. Zimmermann, H.G.; Grothmann, R.; Schafer, A.M.; Tietz, C. Dynamical consistent recurrent neural networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 3, pp. 1537–1541. [Google Scholar]
  11. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  12. Gao, C.; Liu, H.; Zhou, S.; Su, H.; Chen, B.; Yan, J.; Yin, K. Maneuvering target tracking with recurrent neural networks for radar application. In Proceedings of the 2018 IEEE International Conference on Radar (RADAR), Oklahoma City, OK, USA, 23–27 April 2018; pp. 1–5. [Google Scholar]
  13. Ma, L.; Tian, S. A hybrid CNN-LSTM model for aircraft 4D trajectory prediction. IEEE Access 2020, 8, 134668–134680. [Google Scholar] [CrossRef]
  14. Zhang, Z.; Ni, G.; Xu, Y. Ship trajectory prediction based on LSTM neural network. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 1356–1364. [Google Scholar]
  15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30; MIT Press: Cambridge, MA, USA, 2017. [Google Scholar]
  16. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  17. Kim, Y.; Denton, C.; Hoang, L.; Rush, A.M. Structured attention networks. arXiv 2017, arXiv:1702.00887. [Google Scholar]
  18. Shi, H.; Gao, S.; Tian, Y.; Chen, X.; Zhao, J. Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations. Proc. Aaai Conf. Artif. Intell. 2022, 36, 8267–8276. [Google Scholar] [CrossRef]
  19. Magill, D. Optimal adaptive estimation of sampled stochastic processes. IEEE Trans. Autom. Control. 1965, 10, 434–439. [Google Scholar] [CrossRef]
  20. Li, X.R.; Bar-Shalom, Y. Design of an interacting multiple model algorithm for air traffic control tracking. IEEE Trans. Control. Syst. Technol. 1993, 1, 186–194. [Google Scholar] [CrossRef]
  21. Liu, J.; Wang, Z.; Xu, M. A Kalman estimation based rao-blackwellized particle filtering for radar tracking. IEEE Access 2017, 5, 8162–8174. [Google Scholar] [CrossRef]
  22. Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M. Time2vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321. [Google Scholar]
  23. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. The observation and ground-truth of a trajectory. (a) The azimuth observation sequence of the trajectory. (b) The distance observation sequence of the trajectory. (c) The observation and ground-truth sequences in the X–Y plane coordinate system.
Figure 2. Center–max normalization. After center–max normalization, the distance ranges of the trajectories are transformed to $[-1, 1]$ and the differences in the initial positions of the trajectories are removed.
Figure 3. Architecture of the TBN. The input data are the normalized observation sequences $\tilde{z}^*_{1:K}$, which are first mapped by positional encoding to $s^*_{1:K}$ of dimension $E \times K$. The encoder consists of $N$ stacked multi-head self-attention and fully connected feedforward layers, which extract the features of $s^*_{1:K}$. The decoder maps the $E$-dimensional feature vectors to the normalized state sequence $\hat{x}^*_{1:K}$ through two 1D convolutional layers.
Figure 4. Structure of the transformer-based maneuvering target tracking. The observation sequences of targets are first segmented into subsequences $z_{1:K}$ of length $K$ with step size $P$. After that, $z_{1:K}$ are converted to $\tilde{z}^*_{1:K}$ by center–max normalization. Then, the TBN infers the normalized trajectory $\hat{x}^*_{1:K}$ from $\tilde{z}^*_{1:K}$, and $\hat{x}^*_{1:K}$ is denormalized to $\hat{x}_{1:K}$. Finally, the overlapped regions of $\hat{x}_{1:K}$ are averaged and concatenated to obtain the estimate of the complete state sequences.
Figure 5. The results of tracking a maneuvering target with the TBN+CM, LSTM+MM, LSTM+CM, and IMM algorithms. (a) Tracking trajectories in the X–Y plane. (b) Pointwise position RMSE. (c) Pointwise velocity RMSE.
Table 1. Parameters of the trajectory dataset.

Parameter                                                 Value
Distance range                                            1 km to 10 km
Angle range                                               0° to 360°
Velocity range                                            −300 m/s to 300 m/s
Turn rate ($w$)                                           −10°/s to 10°/s
Standard deviation of acceleration noise ($\sigma_a$)     2 m/s² to 8 m/s²
Standard deviation of azimuth noise ($\sigma_\theta$)     0.1° to 0.3°
Standard deviation of distance noise ($\sigma_d$)         5 m to 8 m
Table 2. Numerical results of several methods for tracking segmented trajectories.

Method       RMSE of Position (m)     RMSE of Velocity (m/s)
LSTM+MM      16.27                    6.75
LSTM+CM      14.43                    5.14
TBN+CM       13.50                    3.64
Table 3. Numerical results of several methods for tracking trajectory A1.

Method       RMSE of Position (m)     RMSE of Velocity (m/s)
IMM          14.54                    6.35
LSTM+MM      11.82                    4.64
LSTM+CM      10.30                    3.47
TBN+CM       9.33                     2.04
Table 4. Results of tracking trajectories at different initial positions.

              RMSE of Position (m)          RMSE of Velocity (m/s)
Trajectory    TBN+CM       LSTM+MM          TBN+CM       LSTM+MM
A2            9.94         146.14           2.08         56.19
A3            9.15         295.71           2.03         78.92
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
