# Bidirectional Tracking Method for Construction Workers in Dealing with Identity Errors


## Abstract


## 1. Introduction

## 2. Related Works

- Feature extraction (appearance model): employing a person re-identification (ReID) [8] network to extract a one-dimensional feature vector from the ROI.

- TBD treats object detection as a standalone detector, while feature extraction and data association together form the tracker [8]. TBD offers the advantage of flexibility: modules can be replaced with better DNNs or association methods. However, the detector and the tracker cannot enhance each other's performance. If the detector misses targets or produces false bounding boxes, the tracker fails to track or correctly identify them.
- JDT integrates the detector and tracker into one unified network that can be trained end-to-end, such as Siamese [22] or Transformer networks [23]. JDT relies exclusively on appearance features, and training such DNNs demands powerful GPUs and significant time. For instance, TransTrack [24] requires 16.1 GB of GPU memory for inference, making it incompatible with the NVIDIA Tesla T4 (15 GB) on Google Colaboratory [25]. To provide a simple and training-free tracking method, TBD is the better choice.
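The TBD association step described above can be sketched in a few lines: ReID embeddings for tracks and detections are compared by cosine distance, and the Hungarian algorithm assigns detections to tracks. The embeddings below are hypothetical random vectors, not the output of the paper's ReID network:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_cost(track_feats, det_feats):
    """Cosine-distance cost matrix between track and detection embeddings."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return 1.0 - t @ d.T  # 0 = identical appearance

# Hypothetical 128-d ReID embeddings for 2 tracks and 2 detections.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(2, 128))
dets = np.vstack([tracks[1] + 0.01 * rng.normal(size=128),   # looks like track 1
                  tracks[0] + 0.01 * rng.normal(size=128)])  # looks like track 0

cost = cosine_cost(tracks, dets)
rows, cols = linear_sum_assignment(cost)  # Hungarian solve
print(list(zip(rows, cols)))              # → [(0, 1), (1, 0)]
```

In a full TBD pipeline this cost would be gated and combined with motion cues, as in the algorithms of Section 4.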

## 3. Motion Estimation with KF

#### 3.1. Basic KF Formula

The state-transition matrix ${F}_{t}$ governs the system dynamics, while ${Q}_{t}$ (8 × 8) represents the process noise, which follows a normal distribution. The measurement noise is ${R}_{t}$ (a 4 × 4 matrix), and the observation matrix $H$ performs the dimensional transformation between the state and measurement vectors.

The state vector ${x}_{t}$ is shown in Equation (3). Since the elements of the vector can vary widely, a covariance matrix is used to quantify the variation between elements. ${P}_{t}$ (8 × 8) is updated regardless of the presence of a tracking ID, with a "dummy update" [9] occurring during tracking, which can lead to error accumulation and non-convergence.
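The matrix shapes above can be made concrete with a minimal constant-velocity KF sketch. The 8-dimensional state holds the four box parameters plus their velocities (one common ordering is assumed here), $H$ is 4 × 8, and the noise values are illustrative rather than the paper's tuning:

```python
import numpy as np

dim_x, dim_z = 8, 4                           # state: box params + velocities; measurement: box params
F = np.eye(dim_x); F[:4, 4:] = np.eye(4)      # constant-velocity transition (dt = 1 frame)
H = np.hstack([np.eye(4), np.zeros((4, 4))])  # observation matrix: keep positions, drop velocities
Q = 1e-2 * np.eye(dim_x)                      # process noise Q_t (8 x 8), illustrative
R = 1e-1 * np.eye(dim_z)                      # measurement noise R_t (4 x 4), illustrative

def predict(x, P):
    """KF predict: propagate state and covariance through the dynamics."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """KF update: correct the prediction with a measurement z."""
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(dim_x) - K @ H) @ P
    return x, P

x, P = np.zeros(dim_x), np.eye(dim_x)
x, P = predict(x, P)
x, P = update(x, P, np.array([1.0, 2.0, 0.5, 3.0]))
print(np.round(x[:4], 3))  # → [0.953 1.905 0.476 2.858]
```

When no detection is associated with a track, only `predict` runs (the "dummy update"), so `P` grows every frame, which is exactly the divergence mechanism analyzed in Section 3.2.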

#### 3.2. KF Divergence Proof

- If the KF diverges → elements in the covariance matrix ${P}_{t}$ grow larger → elements in the inverse covariance matrix ${{P}_{t}}^{-1}$ become smaller → the Mahalanobis distance between two different persons' IDs falls below the threshold → leading to "accepting false ID" errors.

For a scalar element ${p}_{t}$ of the covariance matrix, assuming there is no relevant prior knowledge of the noise and of the initial value at t = 0, let ${r}_{0}={\sigma}^{2}$ and ${p}_{0}=\infty$. For the constant-velocity model with ${f}_{t}={h}_{t}=1$, Equation (10) is obtained:
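The accumulation behind this divergence can be reproduced numerically. With ${f}_{t}={h}_{t}=1$, each "dummy update" (a predict step with no measurement) adds the process-noise variance q to p, so p grows linearly with the number of occluded frames and the squared Mahalanobis distance of even a clearly wrong detection shrinks below any fixed gate. All values below are illustrative, not the paper's parameters:

```python
q, r = 1.0, 1.0          # process / measurement noise variances (illustrative)
p0 = 10.0                # posterior variance when the track was last measured

def dummy_update(p, steps):
    """Predict-only steps during occlusion: p <- f*p*f + q with f = 1."""
    for _ in range(steps):
        p = p + q
    return p

dz = 30.0                # pixel gap between a *wrong* detection and the prediction
for steps in (0, 50, 500):
    p_occ = dummy_update(p0, steps)
    maha2 = dz * dz / (p_occ + r)   # squared Mahalanobis distance of the wrong ID
    print(steps, p_occ, round(maha2, 2))
```

After 0 steps the wrong ID scores about 81.8 and is rejected by any reasonable chi-squared gate; after 500 dummy updates it scores about 1.76 and would be accepted, which is the "accepting false ID" error described above.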

#### 3.3. KF Processing Example

## 4. Methods

- Head_tracker for tracking heads (in Section 4.1).
- Body_tracker for tracking bodies (in Section 4.1).
- Intra-frame processing to delete false positives of heads and bodies (in Section 4.2).
- Inter-frame matching to find the pairing relationship between heads and bodies (in Section 4.3).

#### 4.1. Head Tracker and Body Tracker

Algorithm 1: Two Stages of the Matching Algorithm for the Trackers

Input: M ← number of workers; A ← number of frames; two detection sets:
$D=\left\{{d}_{i}^{j}\left|1\le i\le M;\text{}1\le j\le A\right.\right\}$; ${D}_{remain}=\left\{{d}_{i,remain}^{j}\left|1\le i\le {M}_{remain};\text{}1\le j\le A\right.\right\}$
Output: two track sets:
${T}_{matched}=\left\{{t}_{i}^{j}\left|1\le i\le {N}_{matched};\text{}1\le j\le A\right.\right\}$; ${T}_{un\_matched}=\left\{{t}_{i}^{j}\left|1\le i\le {N}_{un\_matched};\text{}1\le j\le A\right.\right\}$

1: for frame j in A do:
2:   ${D}^{j}=\left\{{d}_{1}^{j},{d}_{2}^{j},{d}_{3}^{j},\dots {d}_{{M}^{j}}^{j}\right\}$ /* observations from the detector and ReID at frame j */
     ${x}^{j}=\left\{{x}_{1}^{j},{x}_{2}^{j},{x}_{3}^{j},\dots {x}_{{N}^{j}}^{j}\right\}$ /* posterior states from the KF up to frame j − 1 */
3:   /* First match: assignment to the lists of matches, unmatched_tracks, unmatched_detections */
     ${C}^{j}\leftarrow {C}_{cosine}\left({D}^{j},{x}^{j}\right)+{C}_{Euclidean}\left({D}^{j},{x}^{j}\right)$ /* cost matrix at frame j */
     Solve ${C}^{j}$ with the linear Hungarian algorithm.
     if (${c}_{cosine}$ > 0.2 and ${c}_{Euclidean}$ > 2 × head_width):
       ${T}_{matched}\leftarrow {t}_{i}^{j}$ that matched with ${d}_{i}^{j}$
       ${T}_{un\_matched}\leftarrow {t}_{i}^{j}$ that is not matched with ${d}_{i}^{j}$
       ${D}_{remain}\leftarrow {d}_{i}^{j}$ that is not matched with ${t}_{i}^{j}$
4:   /* Second match: for the remaining detection IDs in ${D}_{remain}$ */
     ${C}^{j}\leftarrow {C}_{cosine}\left({{D}_{remain}}^{j},{{x}_{un\_matched}}^{j}\right)+{C}_{1-IoU}\left({{D}_{remain}}^{j},{{x}_{un\_matched}}^{j}\right)$
     Solve ${C}^{j}$ with the linear Hungarian algorithm.
     if (${c}_{1-IoU}$ > 0.7) or (match_times = 0 and $0.1\le {c}_{cosine}\le 0.2$):
       ${T}_{matched}\leftarrow {t}_{i}^{j}$ that matched with ${d}_{i,remain}^{j}$
       ${T}_{un\_matched}\leftarrow {t}_{i}^{j}$ that is not matched with ${d}_{i,remain}^{j}$
       ${D}_{remain}\leftarrow {d}_{i,remain}^{j}$ that is not matched with ${t}_{i}^{j}$
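The cascade structure of Algorithm 1 — a first gated Hungarian match, with the leftovers retried on a second cost matrix — can be sketched generically. The cost values and gates below are hypothetical placeholders for the paper's combined cosine/Euclidean and 1 − IoU costs:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(cost, gate):
    """Hungarian assignment; pairs whose cost exceeds the gate are rejected."""
    rows, cols = linear_sum_assignment(cost)
    matched = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    un_t = sorted(set(range(cost.shape[0])) - {r for r, _ in matched})
    un_d = sorted(set(range(cost.shape[1])) - {c for _, c in matched})
    return matched, un_t, un_d

# Stage 1: appearance + center-distance cost for 2 tracks x 2 detections (hypothetical).
cost1 = np.array([[0.05, 0.90],
                  [0.80, 0.70]])
m1, un_t, un_d = match(cost1, gate=0.2)   # only the (0, 0) pair passes the gate

# Stage 2: retry the leftovers with a 1 - IoU cost (hypothetical value).
cost2 = np.array([[0.3]])                 # remaining track 1 vs remaining detection 1
m2, _, _ = match(cost2, gate=0.7)
m2 = [(un_t[r], un_d[c]) for r, c in m2]  # map back to global indices

print(m1 + m2)  # → [(0, 0), (1, 1)]
```

The second stage rescues the track/detection pair that the stricter appearance gate rejected, which is the purpose of the two-stage design.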

#### 4.2. Intra-Frame Processing

Algorithm 2: Filter out False Positives of Detected Heads

Input: M ← number of detected heads; N ← number of remaining heads; j = frame;
${D}_{head}=\left\{{d}_{i}^{j}\left|1\le i\le M;\text{}1\le j\le A\right.\right\}$; ${d}_{i}^{j}={\left[{x}_{c},{y}_{c},w,h,confidence,class\_id=1\right]}_{i=1}^{M}$
Output: ${D}_{head}^{del}=\left\{{d}_{i}^{j}\left|1\le i\le N;\text{}1\le j\le A\right.\right\}$; ${D}_{head}^{remain}=\left\{{d}_{i}^{j}\left|1\le i\le N;\text{}1\le j\le A\right.\right\}$

1: ${D}_{head}\leftarrow sort\left\{{d}_{i}^{j},{y}_{c}\right\}$ /* sorted by head center ${y}_{c}$ */
2: while i + 1 < M − 1 do:
   ${\Delta}_{i}\leftarrow {d}_{i}^{j}-{d}_{i+1}^{j}$
   if ${\Delta}_{i}$ < −5: /* two heads are very close; possible FP here */
     if ${d}_{i}^{j}\left[4\right]$ ≥ 99%: /* confidence ≥ 0.99 */
       ${D}_{head}^{del}\leftarrow {d}_{i}^{j}$ /* delete i */
     elif ${d}_{i+1}^{j}\left[4\right]\ge {d}_{i}^{j}\left[4\right]$:
       ${D}_{head}^{del}\leftarrow {d}_{i+1}^{j}$ /* delete i + 1 */
     else:
       ${D}_{head}^{del}\leftarrow {d}_{i}^{j}\text{}and\text{}{d}_{i+1}^{j}$ /* delete i and i + 1 */
   i += 1
3: ${D}_{head}^{remain}\leftarrow {D}_{head}-{D}_{head}^{del}$
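A runnable sketch of Algorithm 2 follows. The "close" test is read here as adjacent sorted y-centers differing by less than 5 px (one plausible interpretation of the delta condition); the three deletion branches mirror the algorithm's listing:

```python
def filter_head_fps(heads, min_gap=5.0, hi_conf=0.99):
    """Sketch of Algorithm 2: suppress near-duplicate head detections.

    heads: list of [xc, yc, w, h, confidence]. Returns the surviving boxes,
    sorted by center y. The closeness test (|delta y| < min_gap) is an
    interpretation; the deletion branches follow the paper's listing.
    """
    heads = sorted(heads, key=lambda d: d[1])              # sort by head center y
    deleted = set()
    for i in range(len(heads) - 1):
        if abs(heads[i][1] - heads[i + 1][1]) < min_gap:   # two heads very close
            if heads[i][4] >= hi_conf:
                deleted.add(i)                             # branch 1: delete i
            elif heads[i + 1][4] >= heads[i][4]:
                deleted.add(i + 1)                         # branch 2: delete i + 1
            else:
                deleted.update((i, i + 1))                 # branch 3: delete both
    return [h for k, h in enumerate(heads) if k not in deleted]

heads = [[100, 50, 40, 45, 0.99],   # overlaps the next head, confidence >= 0.99
         [105, 52, 40, 45, 0.60],
         [300, 200, 40, 45, 0.97]]  # isolated head, kept
print(filter_head_fps(heads))
```

On this toy input the first branch fires for the overlapping pair and the isolated head survives untouched.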

Algorithm 3: Filter out False Positives of Detected Bodies

Input: M ← number of detected bodies; N ← number of remaining bodies; j = frame;
${D}_{body}=\left\{{d}_{i}^{j}\left|1\le i\le M;\text{}1\le j\le A\right.\right\}$; ${d}_{i}^{j}={\left[{x}_{c},{y}_{c},w,h,confidence,class\_id=0\right]}_{i=1}^{M}$
Output: ${D}_{body}^{del}=\left\{{d}_{i}^{j}\left|1\le i\le N;\text{}1\le j\le A\right.\right\}$; ${D}_{body}^{remain}=\left\{{d}_{i}^{j}\left|1\le i\le N;\text{}1\le j\le A\right.\right\}$

1: for i, k in M do:
   $Io{U}_{i}\leftarrow Compute\_IoU\left({d}_{i}^{j},{d}_{k}^{j}\right)$
   if $Io{U}_{i}$ ≥ 60%: /* two bodies are very close; possible FPs here */
     ${D}_{body}^{del}\leftarrow \mathrm{min}\left({d}_{i}^{j}\left[4\right],\text{}{d}_{k}^{j}\left[4\right]\right)$ /* lower-confidence box deleted */
   i += 1; k += 1
2: for i in length(keypoints) do:
   /* if the body has no more than two effective keypoints in total */
   if torch.sum(one_key[:, −1] ≥ 0.05) < 2:
     ${D}_{body}^{del}\leftarrow {d}_{i}^{j}$ /* body with too few keypoints deleted */
   i += 1
3: ${D}_{body}^{remain}\leftarrow {D}_{body}-{D}_{body}^{del}$
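Algorithm 3's two filters — pairwise IoU suppression of overlapping bodies and removal of bodies with fewer than two effective keypoints — can be sketched as follows (thresholds from the algorithm; the data are toy values):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_body_fps(boxes, confs, keypoints, iou_thr=0.6, kp_score=0.05):
    """Sketch of Algorithm 3. boxes: (x1, y1, x2, y2); keypoints: (N, K, 3)
    with a visibility score in the last column. Returns surviving indices."""
    deleted = set()
    for i in range(len(boxes)):                          # step 1: IoU suppression
        for k in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[k]) >= iou_thr:       # near-duplicate bodies
                deleted.add(i if confs[i] < confs[k] else k)  # drop lower confidence
    for i, kp in enumerate(keypoints):                   # step 2: keypoint check
        if np.sum(kp[:, -1] >= kp_score) < 2:            # < 2 effective keypoints
            deleted.add(i)
    return [i for i in range(len(boxes)) if i not in deleted]

boxes = [(0, 0, 10, 20), (1, 0, 10, 20), (50, 50, 60, 70)]
confs = [0.9, 0.7, 0.95]
kps = np.array([[[0, 0, 0.9]] * 5, [[0, 0, 0.9]] * 5, [[0, 0, 0.01]] * 5])
print(filter_body_fps(boxes, confs, kps))  # → [0]
```

Box 1 is suppressed by box 0 (IoU 0.9, lower confidence) and box 2 is dropped for having no effective keypoints, leaving only box 0.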

#### 4.3. Inter-Frame Matching

Algorithm 4: Matching of "Head to Body" across Frames

Input: ${M}_{T}$ ← number of tracked heads; ${N}_{T}$ ← number of tracked bodies; j = frame;
${T}_{body}=\left\{{t}_{i,body}^{j}\left|1\le i\le {N}_{T};\text{}1\le j\le A\right.\right\}$; ${t}_{i,body}^{j}={\left[{x}_{c},{y}_{c},w,h,confidence,class\_id=0\right]}_{i=1}^{{N}_{T}}$
${T}_{head}=\left\{{t}_{i,head}^{j}\left|1\le i\le {M}_{T};\text{}1\le j\le A\right.\right\}$; ${t}_{i,head}^{j}={\left[{x}_{c},{y}_{c},w,h,confidence,class\_id=1\right]}_{i=1}^{{M}_{T}}$
${L}_{body,head}=\left\{\left({t}_{i,body}^{j},{t}_{a,head}^{j}\right)\left|1\le i\le {N}_{T};\text{}1\le j\le A;1\le a\le {N}_{T}\right.\right\}$
Output: ${L}_{body,head}$

1: if $\exists I{D}_{head}\notin {T}_{head}$:
   for k in ${L}_{body,head}$ do:
     /* find the closest head and compute the Euclidean distance of the centers */
     if Euclidean $\left|I{D}_{head}-{t}_{a,head}^{j}\right|$ > 3 × w and $I{D}_{head}\left[4\right]$ > 0.95:
       ${T}_{head}\leftarrow I{D}_{head}$
2: if $\exists I{D}_{body}\notin {T}_{body}$:
   for k in ${L}_{body,head}$ do:
     /* find the closest body and compute the IoU */
     if IoU$\left(I{D}_{body},\text{}{t}_{a,head}^{j}\right)$ > 0.8 and $I{D}_{a,head}^{j}\ne None$:
       ${T}_{body}\leftarrow I{D}_{body}$
3: if ${T}_{head}\leftarrow I{D}_{head}$ or ${T}_{body}\leftarrow I{D}_{body}$: /* matching of newly added head–body pairs */
   $\forall {C}^{j}\in \mathrm{cost\_matrix}$: ${C}^{j}\leftarrow \frac{1-\mathrm{modified\_IoU}}{\mathrm{confidence}}$ when ${C}^{j}\le 1.0$; ${C}^{j}\leftarrow 100000$ when ${C}^{j}>1.0$
   Solve ${C}^{j}$ with the linear Hungarian algorithm. /* row_indices, col_indices = linear_assignment(cost_matrix) */
   ${L}_{body,head}\leftarrow \left(I{D}_{body},\text{}I{D}_{head}\right)$
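The pairing step of Algorithm 4 can be sketched directly: the cost of a body–head pair is (1 − modified_IoU)/confidence, cells above 1.0 are replaced with the 100,000 sentinel, the Hungarian algorithm solves the matrix, and bodies left unassigned are paired with None. The modified-IoU and confidence values below are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 100000.0   # sentinel used in Algorithm 4 for inadmissible pairs

def pair_heads_to_bodies(mod_iou, conf):
    """mod_iou[i, j]: modified IoU of body i vs head j; conf[i, j]: confidence.
    Returns a (body, head-or-None) list, mirroring self.match_body_head."""
    cost = (1.0 - mod_iou) / conf
    cost = np.where(cost <= 1.0, cost, BIG)       # gate: cost must stay <= 1.0
    rows, cols = linear_sum_assignment(cost)      # Hungarian solve
    pairs = {r: c for r, c in zip(rows, cols) if cost[r, c] < BIG}
    return [(b, pairs.get(b)) for b in range(mod_iou.shape[0])]

# Hypothetical: 3 bodies, 2 heads; body 1 overlaps neither head.
mod_iou = np.array([[0.95, 0.10],
                    [0.00, 0.00],
                    [0.08, 0.90]])
conf = np.full((3, 2), 0.99)
print(pair_heads_to_bodies(mod_iou, conf))  # → [(0, 0), (1, None), (2, 1)]
```

Body 1's cells exceed 1.0 and are set to the sentinel, so it ends up paired with None, matching the `(1, None)` entries in the worked example of Appendix B.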

#### 4.4. Evaluation Metrics

## 5. Results and Discussion

#### 5.1. Quantitative Results

#### 5.2. Comparison with Other SOTA Methods

#### 5.3. Discussion

- (1) Low dependency on the detection DNN
- (2) Use of head tracking to aid body tracking
- (3) Avoidance of the impact of KF divergence issues
- (4) Focus on metrics performance under ID errors

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A

## Appendix B

- detector calculation → Intra-frame processing → head_tracker → body_tracker → Inter-frame matching.

**Input frame #1**

1. Output of the detector (head-integrated Keypoint R-CNN), rows in (x1, y1, x2, y2, confidence, class_id):

x1 | y1 | x2 | y2 | confidence | class_id | note
---|---|---|---|---|---|---
1102.4229 | 275.0947 | 1219.5381 | 563.0864 | 0.9966 | 0 | body ID = 0
733.7047 | 1.0617 | 789.8318 | 97.3516 | 0.9864 | 0 | body ID = 1
1312.4270 | 197.8513 | 1434.9780 | 470.1823 | 0.9817 | 0 | body ID = 2
601.3785 | 0.0000 | 654.7200 | 104.9832 | 0.9561 | 0 | body ID = 3
1148.3512 | 274.9141 | 1189.9109 | 322.3585 | 0.9955 | 1 | head ID = 0
1382.5782 | 198.2840 | 1422.7518 | 249.7969 | 0.9886 | 1 | head ID = 1

Number of body IDs = 4; number of head IDs = 2.

2. Intra-frame processing deletes the false positives. Algorithm 3 yields delta_del_body = [3], so the detection [601.3785, 0.0000, 654.7200, 104.9832, 0.9561, 0] is deleted, and the remaining bounding boxes (boxes_xyxy) are:

x1 | y1 | x2 | y2 | confidence | class_id
---|---|---|---|---|---
1102.422852 | 275.094666 | 1219.538086 | 563.086426 | 0.996561 | 0
733.704712 | 1.061707 | 789.831787 | 97.351639 | 0.986442 | 0
1312.427002 | 197.851257 | 1434.978027 | 470.182251 | 0.98167 | 0
1148.351318 | 274.914093 | 1189.910889 | 322.35849 | 0.995536 | 1
1382.578247 | 198.283997 | 1422.751831 | 249.796875 | 0.988596 | 1

3. Head_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [0, 1]; matches, unmatched_tracks, unmatched_detections = [], [], [0, 1]. The tracking IDs in the head sequence set are 0 and 1.

4. Body_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [0, 1, 2]. The tracking IDs in the body sequence set are 0, 1, and 2.

5. Inter-frame matching (Algorithm 4). The cost matrix (rows = bodies, columns = heads) is:

 | head 0 | head 1
---|---|---
body 0 | 0.003819 | 100,000
body 1 | 100,000 | 100,000
body 2 | 100,000 | 0

row_indices (bodies) = [0, 2]; col_indices (heads) = [0, 1]. The matched body and head IDs are self.match_body_head = [(0, 0), (2, 1), (1, None)].

**Input frame #2**

1. Bounding boxes in (x1, y1, x2, y2, confidence, class_id):

x1 | y1 | x2 | y2 | confidence | class_id
---|---|---|---|---|---
1100.2812 | 274.1615 | 1218.8927 | 563.6195 | 0.9977 | 0
1317.1239 | 199.0990 | 1435.9489 | 468.3447 | 0.9848 | 0
733.1741 | 0.5491 | 783.7123 | 96.3040 | 0.9834 | 0
598.9022 | 0.0000 | 644.2981 | 103.0769 | 0.9701 | 0
1148.8118 | 275.4363 | 1190.7026 | 322.5742 | 0.9915 | 1
1383.2902 | 199.0062 | 1423.4681 | 248.5701 | 0.9893 | 1

Number of body IDs = 4; number of head IDs = 2.

2. Intra-frame processing deletes the false positives. Algorithm 3 yields delta_del_body = [3]; [598.9022, 0.0000, 644.2981, 103.0769, 0.9701, 0] is deleted.

3. Head_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].

4. Body_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].

5. Inter-frame matching (Algorithm 4): body-np_xyxy_final =

x1 | y1 | x2 | y2 | confidence | track ID
---|---|---|---|---|---
1100.495617 | 274.254816 | 1218.957032 | 563.566199 | 0.997711 | 0
731.006493 | 0.616912 | 786.759376 | 96.442545 | 0.983394 | 1
1315.511496 | 198.934008 | 1436.811807 | 468.587705 | 0.984758 | 2

⋮

**Input frame #7**

1. Bounding boxes in (x1, y1, x2, y2, confidence):

x1 | y1 | x2 | y2 | confidence
---|---|---|---|---
1103.0753 | 278.1667 | 1218.2045 | 573.1316 | 0.9963
1327.7861 | 199.1935 | 1435.1885 | 466.8471 | 0.9849
576.0401 | 0.9442 | 636.5553 | 104.2498 | 0.9785
721.5332 | 2.7390 | 797.3731 | 105.7765 | 0.9664
646.1354 | 0.9752 | 769.4316 | 103.6911 | 0.8158
1153.6139 | 279.1069 | 1196.0039 | 327.2707 | 0.9968
1388.5093 | 198.7742 | 1425.3273 | 243.2225 | 0.9838

Number of body IDs = 5; number of head IDs = 2.

2. Intra-frame processing deletes the false positives. Algorithm 3 yields delta_del_body = [4]; [646.1354, 0.9752, 769.4316, 103.6911, 0.8158] is deleted.

3. Head_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].

4. Body_tracker (Algorithm 1), second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [3], [].

5. Inter-frame matching (Algorithm 4): head-candidates_tlwh =

x | y | w | h
---|---|---|---
1153.614397 | 278.977509 | 42.391931 | 48.180364
1388.411052 | 198.606596 | 37.066294 | 44.685888

body_id_list = [0, 2, 1]; head_id_list = [0, 1, None]. For the match between the one newly added body ID and the two existing head IDs, cost_matrix = [[100,000 100,000]]; since 100,000 exceeds the threshold, body ID = 3 is paired with head ID = None, giving self.match_body_head = [(0, 0), (2, 1), (1, None), (3, None)]. The newly added track ID's bounding box is not in the current frame but will appear in the next frame. body-np_xyxy_final =

x1 | y1 | x2 | y2 | confidence | track ID
---|---|---|---|---|---
1102.914528 | 278.100212 | 1218.567574 | 572.995262 | 0.996286 | 0
570.531019 | 0.777104 | 631.442964 | 104.358195 | 0.978527 | 1
1322.21706 | 198.937752 | 1440.727778 | 466.620414 | 0.984871 | 2

**Input frame #8**

5. Inter-frame matching (Algorithm 4): body-np_xyxy_final =

x1 | y1 | x2 | y2 | confidence | track ID
---|---|---|---|---|---
1104.565028 | 282.556513 | 1212.796813 | 574.978288 | 0.99675 | 0
563.08197 | 1.296683 | 623.345229 | 103.507251 | 0.983013 | 1
1323.317717 | 198.276943 | 1441.398572 | 466.717254 | 0.983189 | 2
722.836121 | 2.213905 | 787.554428 | 106.905021 | 0.941132 | 3

The last row is the newly added body ID = 3.

## References

1. Teizer, J. Status quo and open challenges in vision-based sensing and tracking of temporary resources on infrastructure construction sites. Adv. Eng. Inform. **2015**, 29, 225–238.
2. Xiao, B.; Xiao, H.; Wang, J.; Chen, Y. Vision-based method for tracking workers by integrating deep learning instance segmentation in off-site construction. Autom. Constr. **2022**, 136, 104148.
3. Golizadeh, H.; Hon, C.K.H.; Drogemuller, R.; Hosseini, M.R. Digital engineering potential in addressing causes of construction accidents. Autom. Constr. **2018**, 95, 284–295.
4. Freimuth, H.; Koenig, M. Planning and executing construction inspections with unmanned aerial vehicles. Autom. Constr. **2018**, 96, 540–553.
5. Guo, H.; Yu, Y.; Skitmore, M. Visualization technology-based construction safety management: A review. Autom. Constr. **2017**, 73, 135–144.
6. Luiten, J.; Ošep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. Int. J. Comput. Vis. **2021**, 129, 548–578.
7. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. arXiv **2022**, arXiv:2110.06864.
8. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
9. Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv **2022**, arXiv:2203.14360.
10. Liu, Y.; Zhou, Z.; Wang, Y.; Sun, C. Head-Integrated Detecting Method for Workers under Complex Construction Scenarios. Buildings **2024**, 14, 859.
11. Dendorfer, P.; Ošep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking. arXiv **2020**, arXiv:2010.07548.
12. Leal-Taixé, L.; Milan, A.; Reid, I.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv **2015**, arXiv:1504.01942.
13. Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep Learning in Video Multi-Object Tracking: A Survey. arXiv **2019**, arXiv:1907.12740.
14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv **2016**, arXiv:1506.01497.
15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv **2015**, arXiv:1506.02640.
16. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv **2019**, arXiv:1904.08189.
17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv **2017**, arXiv:1706.03762.
18. Girshick, R. Fast R-CNN. arXiv **2015**, arXiv:1504.08083.
19. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
20. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv **2018**, arXiv:1703.06870.
21. Bashar, M.; Islam, S.; Hussain, K.K.; Hasan, M.B.; Rahman, A.B.M.A.; Kabir, M.H. Multiple Object Tracking in Recent Times: A Literature Review. arXiv **2022**, arXiv:2209.04796.
22. Shuai, B.; Berneshawi, A.; Li, X.; Modolo, D.; Tighe, J. SiamMOT: Siamese Multi-Object Tracking. arXiv **2021**, arXiv:2105.11595.
23. Meinhardt, T.; Kirillov, A.; Leal-Taixé, L.; Feichtenhofer, C. TrackFormer: Multi-Object Tracking with Transformers. arXiv **2021**, arXiv:2101.02702.
24. Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple Object Tracking with Transformer. arXiv **2020**, arXiv:2012.15460.
25. Google. Google Colaboratory. Available online: https://colab.research.google.com/ (accessed on 29 August 2023).
26. Maggiolino, G.; Ahmad, A.; Cao, J.; Kitani, K. Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification. arXiv **2023**, arXiv:2302.11813.
27. Yang, M.; Han, G.; Yan, B.; Zhang, W.; Qi, J.; Lu, H.; Wang, D. Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking. arXiv **2023**, arXiv:2308.00783.
28. Duan, P.; Zhou, J.; Goh, Y.M. Spatial-temporal analysis of safety risks in trajectories of construction workers based on complex network theory. Adv. Eng. Inform. **2023**, 5, 101990.
29. Aharon, N.; Orfaig, R.; Bobrovsky, B.-Z. BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv **2022**, arXiv:2206.14651.
30. Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. arXiv **2022**, arXiv:2202.13514.
31. Wang, Z.; Zhao, H.; Li, Y.L.; Wang, S.; Torr, P.; Bertinetto, L. Do Different Tracking Tasks Require Different Appearance Models? arXiv **2021**, arXiv:2107.02156.
32. Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. arXiv **2020**, arXiv:2004.01888.
33. Chu, P.; Wang, J.; You, Q.; Ling, H.; Liu, Z. TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. arXiv **2021**, arXiv:2104.00194.
34. Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Bu, J.; Tian, Q. Person Re-identification Meets Image Search. arXiv **2015**, arXiv:1502.02171.
35. Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. EURASIP J. Image Video Process. **2008**, 2008, 246309.
36. Ristani, E.; Solera, F.; Zou, R.S.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. arXiv **2016**, arXiv:1609.01775.
37. KubaRurak. Detectron2-Deepsort-Repo. 2021. Available online: https://github.com/KubaRurak/detectron2-deepsort-repo (accessed on 31 August 2023).
38. Konstantinou, E.; Lasenby, J.; Brilakis, I. Adaptive computer vision-based 2D tracking of workers in complex environments. Autom. Constr. **2019**, 103, 168–184.
39. JonathonLuiten. TrackEval. 2021. Available online: https://github.com/JonathonLuiten/TrackEval (accessed on 31 August 2023).
40. Mikel-Brostrom. YOLO_Tracking. 2023. Available online: https://github.com/mikel-brostrom/yolo_tracking#real-time-multi-object-segmentation-and-pose-tracking-using-yolov8--yolo-nas--yolox-with-deepocsort-and-lightmbn (accessed on 31 August 2023).
41. pmj110119. YOLOX_Deepsort_Tracker. 2021. Available online: https://github.com/pmj110119/YOLOX_deepsort_tracker (accessed on 31 August 2023).
42. Xiao, B.; Kang, S.-C. Vision-Based Method Integrating Deep Learning Detection for Tracking Multiple Construction Machines. J. Comput. Civ. Eng. **2021**, 35, 04020071.
43. Xiao, B.; Lin, Q.; Chen, Y. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement. Autom. Constr. **2021**, 127, 13.
44. Google Drive. Deepsort_Parameters. 2023. Available online: https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6 (accessed on 31 August 2023).

No. | SOTA Methods | Year | Information Types | Advantages (A) & Shortcomings (S)
---|---|---|---|---
1 | SORT [19] | 2016 | O: Faster R-CNN; A: None; M: KF + IoU + Hungarian. | A: presented as the baseline; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},s,r,{v}_{x},{v}_{y},{v}_{s}\right]}^{T}$, where s = area and r = aspect ratio. S: highly dependent on detection performance, with many IDSWs; no occlusion handling.
2 | DeepSORT [8] | 2017 | O: Faster R-CNN; A: ReID (128-d); M: KF + cosine distance + IoU + Hungarian. | A: presented as the baseline that integrates appearance information; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},\gamma ,h,{v}_{x},{v}_{y},{v}_{\gamma},{v}_{h}\right]}^{T}$, where γ = aspect ratio and h = height. S: detection performance dependency; occlusion-related IDSWs reduced but still frequent; constant-velocity model.
3 | ByteTrack [7] | 2022 | O: re-trained YOLOX-x on 1400 videos; A: ReID (1024-d); M: KF + cosine distance + IoU + Hungarian + low-score re-match. | A: best performance on MOT20, with published code; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},a,h,{v}_{x},{v}_{y},{v}_{a},{v}_{h}\right]}^{T}$, where a = aspect ratio and h = height. S: highly dependent on detection performance, with many IDSWs; no occlusion handling; constant-velocity model.
4 | OC_SORT [9] | 2022 | O: baseline detections in MOTChallenge; A: None; M: KF + IoU + Hungarian + re-update of KF + motion direction difference. | A: first to explain KF prediction error accumulation in detail; motion direction difference added to the association cost matrix; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},a,s,{v}_{x},{v}_{y},{v}_{a}\right]}^{T}$, where a = area and s = aspect ratio. S: not a truly online KF update, since future frames are needed; the constant-velocity assumption during occlusion cannot remain effective during long-term occlusions; detection performance dependency.
5 | Deep OC_SORT [26] | 2023 | O: YOLOX; A: ReID (SBS50, 287 MB) + camera motion compensation + dynamic appearance; M: KF + IoU + Hungarian + re-update of KF + motion direction difference. | A: applies camera motion compensation to correct the KF state for better bounding-box locations; applies detection confidence to modify ReID output vectors; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},a,s,{v}_{x},{v}_{y},{v}_{a}\right]}^{T}$, where a = area and s = aspect ratio. S: the same as OC_SORT; constant-velocity model.
6 | BoTSORT [29] | 2022 | O: Faster R-CNN; A: ReID + camera motion compensation; M: KF + cosine distance + IoU + Hungarian. | A: modifies the KF state to ${x}_{t}={\left[{x}_{c},{y}_{c},s,a,{v}_{x},{v}_{y},{v}_{s}\right]}^{T}$, where s = area and a = aspect ratio; applies camera motion compensation to reduce errors of moving cameras; applies a new cost matrix weighting appearance and motion costs. S: the same as OC_SORT; constant-velocity model; slow when working with sparse optical flow.
7 | Strong_SORT [30] | 2022 | O: YOLOX-x; A: ReID (BoT) + camera motion compensation; M: NSA-KF + cosine distance + IoU + Hungarian. | A: applies a new cost matrix weighting appearance and motion costs; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},a,h,{v}_{x},{v}_{y},{v}_{a},{v}_{h}\right]}^{T}$, where a = aspect ratio and h = height. S: MOTA is slightly lower, mainly because the high detection score threshold leads to many missing detections; low working speed.
8 | TransTrack [24] | 2020 | O: re-trained Transformer; A: None; M: None. | A: self-attention mechanism and query-key pipeline. S: hard to train; no use of motion information; JDT does not outperform TBD in performance.
9 | UniTrack [31] | 2021 | O: ResNet-50; A: ImageNet-supervised appearance model; M: KF + cosine distance + IoU + Hungarian. | A: supports different tracking tasks and leverages many existing general appearance models; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},a,h,{v}_{x},{v}_{y},{v}_{a},{v}_{h}\right]}^{T}$, where a = aspect ratio and h = height. S: not better in terms of metrics performance.
10 | FairMOT [32] | 2020 | O: encoder–decoder; A: encoder–decoder; M: KF + cosine distance + IoU + Hungarian. | A: one encoder–decoder network obtains observations and appearance at the same time, so no independent ReID model is needed. S: needs about 30 h of training on two RTX 2080 Ti GPUs; still a SORT-style method.
11 | TransMOT [33] | 2021 | O: spatial–temporal graph Transformer; A: None; M: KF + cosine distance + IoU + Hungarian. | A: a cascaded association structure handles low-confidence detections and long-term occlusion. S: relatively large computing resources and data; no public code.
12 | Hybrid-SORT [27] | 2023 | O: YOLOX-x; A: ReID + camera motion compensation; M: KF + cosine distance + IoU + Hungarian + weak cues. | A: applies a new cost matrix weighting appearance cost, motion cost, four corners' velocity direction, and height-modulated IoU; KF state is ${x}_{t}={\left[{x}_{c},{y}_{c},s,c,r,{v}_{x},{v}_{y},{v}_{s},{v}_{c}\right]}^{T}$, where r = aspect ratio, s = area, and c = confidence score. S: detection performance dependency.
13 | Xiao et al. [2] | 2023 | O: Mask R-CNN; A: ReID (128-d); M: KF + cosine distance + IoU + Hungarian. | A: baseline in worker tracking. S: needs retraining on a new dataset; does not address severe occlusions; no public code.

No. | Metrics | Video-1 | Video-2 | Video-3 | Video-4 | Video-5 | Video-6 | Video-7 | Video-8 | Video-9 | Combined
---|---|---|---|---|---|---|---|---|---|---|---
1 | MOTA↑ (%) | 100 | 94.08 | 93.943 | 93.582 | 94.833 | 98.859 | 95.44 | 96.843 | 93.972 | 95.191
2 | IDF1↑ (%) | 100 | 97.04 | 96.966 | 96.824 | 97.468 | 99.431 | 97.73 | 98.422 | 97.007 | 97.609
3 | HOTA↑ (%) | 93.023 | 77.601 | 81.446 | 75.7 | 74.801 | 76.911 | 78.273 | 81.799 | 77.446 | 78.884
4 | AssA↑ (%) | 93.023 | 79.784 | 86.066 | 78.962 | 81.28 | 85.451 | 80.484 | 83.341 | 78.149 | 83.296
5 | AssRe↑ (%) | 94.298 | 84.026 | 89.916 | 84.525 | 87.219 | 89.145 | 85.352 | 86.762 | 84.28 | 87.83
6 | AssPr↑ (%) | 94.298 | 84.026 | 90.052 | 83.247 | 84.142 | 88.895 | 84.634 | 86.689 | 83.016 | 86.976
7 | LocA↑ (%) | 92.568 | 83.922 | 86.915 | 82.642 | 82.643 | 81.798 | 84.326 | 85.815 | 85.614 | 84.725
8 | IDSW↓ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
9 | Frag↓ | 0 | 4 | 7 | 21 | 22 | 4 | 6 | 10 | 12 | 86
10 | Hz↑ | 9.05 | 6.01 | 5.69 | 5.96 | 7.11 | 6.69 | 6.62 | 5.33 | 6.82 | 6.58
11 | MT | 1 | 3 | 9 | 5 | 5 | 4 | 4 | 6 | 4 | 41

No. | Metrics | DeepSORT | ByteTrack | Deep OC_SORT | BoTSORT | OC_SORT | Strong_SORT | TransTrack | UniTrack
---|---|---|---|---|---|---|---|---|---
1 | MOTA↑ (%) | 69.914 | 62.515 | 56.429 | 60.699 | 66.002 | 56.698 | 39.345 | 27.441
2 | IDF1↑ (%) | 79.771 | 81.829 | 74.471 | 77.365 | 80.633 | 77.065 | 70.459 | 61.887
3 | HOTA↑ (%) | 68.418 | 66.915 | 62.425 | 63.208 | 66.596 | 63.893 | 57.607 | 49.201
4 | AssA↑ (%) | 74.585 | 77.995 | 71.394 | 73.429 | 76.166 | 74.7 | 72.685 | 63.686
5 | AssRe↑ (%) | 78.896 | 82.717 | 75.462 | 76.777 | 80.316 | 78.679 | 76.712 | 68.208
6 | AssPr↑ (%) | 84.488 | 83.637 | 83.043 | 85.385 | 86.221 | 84.348 | 82.495 | 74.389
7 | LocA↑ (%) | 85.353 | 80.694 | 82.01 | 82.833 | 82.657 | 82.058 | 80.222 | 69.791
8 | IDSW↓ | 21 | 5 | 51 | 8 | 10 | 12 | 7 | 15
9 | Frag↓ | 101 | 107 | 178 | 180 | 168 | 140 | 161 | 268
10 | Hz↑ | 1.77 | 9.09 | 8.33 | 7.14 | 8.47 | 7.69 | 5.13 | 4.30


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Liu, Y.; Wang, Y.; Zhou, Z.
Bidirectional Tracking Method for Construction Workers in Dealing with Identity Errors. *Mathematics* **2024**, *12*, 1245.
https://doi.org/10.3390/math12081245
