Head-Integrated Detecting Method for Workers under Complex Construction Scenarios
Abstract
1. Introduction
- (1) Adapting a head branch to body-detection models in the construction field. Each branch can work with or without the other one and can be replaced easily.
- (2) Demonstrating that a body branch designed for pedestrian detection can be directly used for worker detection without re-training on a dataset specific to construction scenarios.
- (3) Addressing the worker detection challenges of inter-/intra-class occlusions, low confidence, and unusual behaviors by means of joint coordinates, trajectories, and anthropometric data.
2. Related Works
2.1. Object Detection with DNN
2.2. Unique Aspects of Worker Detection
- Different purposes. Pedestrian detection is largely driven by benchmark competitions [21], whereas worker detection serves safety management. Worker detection therefore needs to recognize positions effectively under complex conditions, especially occlusions.
- Dissimilar durations. Pedestrian detection datasets can be collections of single frames from different scenes, while worker detection relies on continuous recording of a given project. This provides the motion relationship between adjacent frames of the same worker.
- Various densities. Some pedestrian detection datasets exhibit high density (62.1–226.6 persons per frame) [30], whereas worker datasets usually have lower density owing to limited operating space. This yields a relatively simple computing environment, avoiding the need to handle more than 100 workers in one frame.
- Fixed perspectives. Worker datasets [11] are often captured from a top-down perspective to avoid hindering labor movements, offering better opportunities for detecting worker heads.
- Behavior variations. Pedestrians mainly walk, whereas workers' behaviors are more complex: they may bend, kneel, or squat [11]. This leads to poor performance (low confidence scores or missed targets) of universal detection DNN models for non-standing postures.
2.3. Head Related Detection
2.4. Strengths and Limitations of SOTA Models
3. Methods
- (1) Download the Keypoint R-CNN [38] and a head-detection model from GitHub. If a suitable head model cannot be found, a fast-trained head model can be deployed, as described in Section 3.2. Extract the region proposal network (RPN) and Fast R-CNN parts from the head model and rename them RPN_head and head_branch.
- (2) Add RPN_head and head_branch to the original Keypoint R-CNN model without training, so that the final DNN model detects both the head and the entire body simultaneously; the results are saved as bounding boxes (a minimal sketch follows this list).
- (3) Feed all head and body bounding boxes into the post-processing module and apply ID association, linear interpolation, and refinement sequentially. The ID association relates to the object tracking method, which will be discussed in subsequent papers.
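As a quick, hedged illustration of step (2), the sketch below approximates the head-integrated model by running a COCO-pretrained Keypoint R-CNN (body/keypoint branch) and a separately trained head detector side by side with Detectron2 and collecting both sets of bounding boxes. It is not the authors' merged single-model implementation, and the head-branch config/weights file names and frame path are hypothetical.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

def make_predictor(config_file, weights, score_thresh=0.5):
    cfg = get_cfg()
    cfg.merge_from_file(config_file)
    cfg.MODEL.WEIGHTS = weights
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_thresh
    return DefaultPredictor(cfg)

# Body/keypoint branch: COCO-pretrained Keypoint R-CNN, used without re-training.
body_predictor = make_predictor(
    model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"),
    model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"),
)
# Head branch: a fast-trained Faster R-CNN head detector (hypothetical file names).
head_predictor = make_predictor("configs/head_branch.yaml", "weights/head_branch.pth")

frame = cv2.imread("frames/frame_0000.jpg")        # hypothetical frame path
bodies = body_predictor(frame)["instances"].to("cpu")
heads = head_predictor(frame)["instances"].to("cpu")

body_boxes = bodies.pred_boxes.tensor.numpy()      # (N2, 4) XYXY body boxes
body_keypoints = bodies.pred_keypoints.numpy()     # (N2, 17, 3) COCO keypoints (x, y, score)
head_boxes = heads.pred_boxes.tensor.numpy()       # (N1, 4) XYXY head boxes
```

The post-processing module then consumes head_boxes, body_boxes, and body_keypoints as described in step (3).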
Algorithm 1: Pseudo-code of the proposed method

Preconditions: the head-integrated Keypoint R-CNN model has been assembled following the first procedure, and the ".pth" weights file has been downloaded.
1: Input: the current frame #t (M workers actually appear from frame #0 to #t−1).
2: Feed frame #t into the model; let the outputs contain N1 heads and N2 bodies:
   frame resolution = 1080 × 1920 × 3;
   Feature Pyramid Network (FPN) sub-networks: five 256-channel feature maps;
   Region proposal network (RPN) sub-networks: 257,796 anchors/proposals for heads or bodies;
   Region-of-interest (ROI) sub-networks: N1 head and N2 body bounding boxes, respectively.
   Suppose N1 ≠ N2: head-index list = (0, 1, 2, 3, 4, 5, 6, 7, 8); body-index list = (0, 1, 2, 3, 4, 5).
3: ID association for (N2 bodies, N1 heads), detailed in our other papers:
   Suppose the (body, head) pairs = ((None, 3), (5, 6), (0, 0), (1, None), (3, 2), (2, 5), (4, 7), (None, 4), (None, 8));
   the 3rd, 4th, and 8th heads find no matched body; the 1st body finds no matched head.
4: Linear interpolation:
   for i in (3rd, 4th, 8th, ... heads):
       if head[i] is in the view:
           head_bounding_box[i, t] = function(head[i, t−1], head[i, t−2])
   for j in (1st, ... bodies):
       if body[j] is in the view:
           body_bounding_box[j, t] = function(body[j, t−1], body[j, t−2]; head[i, t−1], head[i, t−2])
5: Refinement networks: feed the features and proposals of frame #t into the head or body networks:
   # features = P2 to P5, 83.6 MB in total; proposals = array of shape (1000, 4) with (t, l, w, h)
   def forward(features: List[torch.Tensor], proposals: List[np.ndarray])
   def predict_boxes(predictions: Tuple[torch.Tensor, torch.Tensor], proposals: List[Boxes])
   def predict_probs(predictions: Tuple[torch.Tensor, torch.Tensor], proposals: List[Boxes])
   Output the final bounding boxes with the confidence of the head or body.
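Step 5 reuses the ROI heads of the Detectron2 Keypoint R-CNN. The following is a minimal sketch (not the authors' released code) that re-scores externally supplied proposals, such as interpolated boxes, through the backbone features, box pooler/head, and the predict_boxes/predict_probs calls of FastRCNNOutputLayers. It assumes a COCO-pretrained model from the Detectron2 model zoo, and the per-config input resizing normally applied by DefaultPredictor is omitted for brevity.

```python
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.structures import Boxes, Instances

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"                        # assumption: run on CPU for portability
model = build_model(cfg).eval()
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)

@torch.no_grad()
def refine(frame_bgr, proposal_xyxy):
    """Re-score and re-regress externally supplied proposals with the ROI heads.

    frame_bgr     : HxWx3 uint8 image (resize step of DefaultPredictor is skipped here)
    proposal_xyxy : (N, 4) float array of proposal boxes in image coordinates
    """
    h, w = frame_bgr.shape[:2]
    img = torch.as_tensor(frame_bgr.astype("float32").transpose(2, 0, 1))
    images = model.preprocess_image([{"image": img, "height": h, "width": w}])
    features = model.backbone(images.tensor)    # dict of FPN maps ("p2" ... "p6")

    inst = Instances(images.image_sizes[0])
    inst.proposal_boxes = Boxes(torch.as_tensor(proposal_xyxy, dtype=torch.float32,
                                                device=model.device))
    proposals = [inst]

    feats = [features[f] for f in model.roi_heads.box_in_features]
    box_feats = model.roi_heads.box_head(
        model.roi_heads.box_pooler(feats, [p.proposal_boxes for p in proposals]))
    predictions = model.roi_heads.box_predictor(box_feats)            # (scores, deltas)
    boxes = model.roi_heads.box_predictor.predict_boxes(predictions, proposals)
    probs = model.roi_heads.box_predictor.predict_probs(predictions, proposals)
    return boxes[0], probs[0]
```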
3.1. The Size of the Head and the Range of Motion
3.2. Fast-Trained Head Branch
3.3. Head-Integrated Keypoint R-CNN
3.4. Post-Processing
3.4.1. Interpolation of Center Points
- (1) Not stationary and not out of view: in this most common state, simple linear interpolation is appropriate.
- (2) Stationary and not out of view: this state applies to workers standing still with minimal movement of the head position. If the head trajectory displacement is ≤1 pixel and the head bounding box confidence is ≥0.0001 in successive frames, the head is considered stationary.
- (3) Not stationary and out of view: workers may exit the frame or be obscured by large objects. If the head confidence is ≤0.0001, regardless of the head motion displacement, the worker is considered to have left the field of view, and all values are set to NaN.
- (1) When there is a related head detection, the body can be judged to be stationary or leaving according to the head trajectory.
- (2) When head detection is absent, the stationarity judgment must rely on keypoints, as in the following situations: the worker is unlikely to move smoothly owing to a rarer behavior (bending, squatting, lying down, and so on); the worker is walking toward the peripheral edges of the view; or the body bounding box confidence score is ≤0.01. (A sketch of the head-center interpolation logic follows these lists.)
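A minimal numpy sketch of the head-center logic above, using the thresholds stated in the text; function and variable names are illustrative, not the authors' code.

```python
import numpy as np

CONF_GONE = 1e-4      # head confidence at or below this: worker treated as out of view
STATIONARY_PX = 1.0   # displacement (pixels) at or below this: head treated as stationary

def next_head_center(c_prev2, c_prev1, conf_prev1):
    """Predict the head center at frame t from frames t-2 and t-1.

    c_prev2, c_prev1 : np.ndarray (x, y) head centers at t-2 and t-1
    conf_prev1       : head bounding-box confidence at t-1
    """
    if conf_prev1 <= CONF_GONE:
        # State (3): out of view -- mark the values as NaN
        return np.array([np.nan, np.nan])
    if np.linalg.norm(c_prev1 - c_prev2) <= STATIONARY_PX:
        # State (2): stationary -- keep the last position
        return c_prev1.copy()
    # State (1): moving and in view -- constant-velocity linear prediction
    return c_prev1 + (c_prev1 - c_prev2)

# Example: a head moving ~3 px/frame to the right
print(next_head_center(np.array([100.0, 50.0]), np.array([103.0, 50.0]), 0.90))  # [106.  50.]
```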
3.4.2. Refinement of Widths and Heights
- (1) In adjacent frames and under polar coordinates, keep the change in modulus and argument of the body-to-head center vector (green arrow in Figure 7) within 0.5× head width and 5°, respectively. If this cannot be guaranteed, reposition the center of the body bounding box based on the center of the head (a sketch of this constraint follows this list).
- (2) Keep the body's center unchanged and adjust the position of the body's top-left vertex, limiting its movement to approximately 0.25× head width.
- (3) If both foot keypoints are present, the height may change significantly; it is adjusted slightly according to the "near-big, far-small" principle and the head size.
- (4) The top edge of the bounding box can be adjusted using the head data. Because the head cannot be separated from the whole body, the whole-body top-edge position is rectified according to changes in the relative position of the head center; the displacement of the body's top edge from the head center does not change significantly between adjacent frames.
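A sketch of constraint (1) under the stated thresholds (0.5× head width for the modulus, 5° for the argument). The helper names and the re-anchoring choice (reusing the previous frame's relative offset) are illustrative assumptions; constraints (2) to (4) would follow the same pattern.

```python
import numpy as np

MAX_ANGLE_DEG = 5.0  # allowed change of the vector's argument between adjacent frames

def box_center(box):
    """Center (x, y) of an XYXY bounding box (np.ndarray of 4 floats)."""
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def refine_body_center(body_t, head_t, body_t1, head_t1):
    """Constraint (1): keep the body-to-head center vector consistent across frames.

    body_t, head_t   : XYXY boxes at frame t
    body_t1, head_t1 : XYXY boxes at frame t-1
    Returns the body box at frame t, shifted if the vector changed too much.
    """
    head_w = head_t[2] - head_t[0]
    v_t  = box_center(head_t)  - box_center(body_t)    # body-to-head vector, frame t
    v_t1 = box_center(head_t1) - box_center(body_t1)   # same vector, frame t-1

    d_mod = abs(np.linalg.norm(v_t) - np.linalg.norm(v_t1))
    d_arg = np.degrees(np.arctan2(v_t[1], v_t[0]) - np.arctan2(v_t1[1], v_t1[0]))
    d_arg = abs((d_arg + 180.0) % 360.0 - 180.0)        # wrap the angle to [0, 180]

    if d_mod <= 0.5 * head_w and d_arg <= MAX_ANGLE_DEG:
        return body_t                                   # change is acceptable, keep the box
    # Otherwise re-anchor the body center on the detected head center,
    # reusing the previous frame's relative offset.
    new_center = box_center(head_t) - v_t1
    shift = new_center - box_center(body_t)
    return body_t + np.concatenate([shift, shift])      # shift the box, size unchanged
```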
3.5. Evaluation Metrics
4. Results
4.1. Experimental Details
4.2. Quantitative Results
4.3. Qualitative Results
5. Discussion
5.1. Ground-Truth Annotations Standard
5.2. Comparison with Other Methods
5.3. Effectiveness and Limitations
5.4. Theoretical and Practical Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Occupational Safety and Health Administration. Construction Industry. 2023. Available online: https://www.osha.gov/construction (accessed on 29 August 2023).
- U.S. Bureau of Labor Statistics. Census of Fatal Occupational Injuries Summary. 2021. Available online: https://www.bls.gov/news.release/cfoi.nr0.htm (accessed on 29 August 2023).
- Xu, W.; Wang, T.-K. Dynamic safety prewarning mechanism of human–machine–environment using computer vision. Eng. Constr. Archit. Manag. 2020, 27, 1813–1833. [Google Scholar] [CrossRef]
- Occupational Safety and Health Administration, OSH Act of 1970. Available online: https://www.osha.gov/laws-regs/oshact/toc (accessed on 29 August 2023).
- Mokhtari, F.; Cheng, Z.; Wang, C.H.; Foroughi, J. Advances in Wearable Piezoelectric Sensors for Hazardous Workplace Environments. Glob. Chall. 2023, 7, 2300019. [Google Scholar] [CrossRef] [PubMed]
- Duan, P.; Zhou, J.; Tao, S. Risk events recognition using smartphone and machine learning in construction workers’ material handling tasks. Eng. Constr. Archit. Manag. 2023, 30, 3562–3582. [Google Scholar] [CrossRef]
- Cai, J.; Zhang, Y.; Yang, L.; Cai, H.; Li, S. A context-augmented deep learning approach for worker trajectory prediction on unstructured and dynamic construction sites. Adv. Eng. Inform. 2020, 46, 101173. [Google Scholar] [CrossRef]
- Zhang, M.Y.; Cao, T.Z.; Zhao, X.F. Applying Sensor-Based Technology to Improve Construction Safety Management. Sensors 2017, 17, 1841. [Google Scholar] [CrossRef] [PubMed]
- Gondo, T.; Miura, R. Accelerometer-Based Activity Recognition of Workers at Construction Sites. Front. Built Environ. 2020, 6, 563353. [Google Scholar] [CrossRef]
- Ning, C.; Menglu, L.; Hao, Y.; Xueping, S.; Yunhong, L. Survey of pedestrian detection with occlusion. Complex Intell. Syst. 2021, 7, 577–587. [Google Scholar] [CrossRef]
- Konstantinou, E.; Lasenby, J.; Brilakis, I. Adaptive computer vision-based 2D tracking of workers in complex environments. Autom. Constr. 2019, 103, 168–184. [Google Scholar] [CrossRef]
- Li, F.; Li, X.; Liu, Q.; Li, Z. Occlusion Handling and Multi-Scale Pedestrian Detection Based on Deep Learning: A Review. IEEE Access 2022, 10, 19937–19957. [Google Scholar] [CrossRef]
- Zhan, G.; Xie, W.; Zisserman, A. A Tri-Layer Plugin to Improve Occluded Detection. arXiv 2022, arXiv:2210.10046. [Google Scholar] [CrossRef]
- Ke, L.; Tai, Y.-W.; Tang, C.-K. Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers. arXiv 2021, arXiv:2103.12340. [Google Scholar] [CrossRef]
- Chi, C.; Zhang, S.; Xing, J.; Lei, Z.; Li, S.Z.; Zou, X. PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes. arXiv 2019, arXiv:1909.06826. [Google Scholar] [CrossRef]
- Wang, Q.; Chang, Y.-Y.; Cai, R.; Li, Z.; Hariharan, B.; Holynski, A.; Snavely, N. Tracking Everything Everywhere All at Once. arXiv 2023, arXiv:2306.05422. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
- Facebookresearch, Detectron2. 2023. Available online: https://github.com/facebookresearch/detectron2 (accessed on 30 August 2023).
- Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. arXiv 2014, arXiv:1409.0575. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2015, arXiv:1506.02640. [Google Scholar] [CrossRef]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Rekavandi, A.M.; Rashidi, S.; Boussaid, F.; Hoefs, S.; Akbas, E. Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art. arXiv 2023, arXiv:2309.04902. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870. [Google Scholar] [CrossRef]
- MOTChallenge, Pedestrian Detection Challenge. 2014. Available online: https://motchallenge.net/data/MOT20/ (accessed on 29 August 2023).
- Lin, C.Y.; Xie, H.X.; Zheng, H. PedJointNet: Joint Head-Shoulder and Full Body Deep Network for Pedestrian Detection. IEEE Access 2019, 7, 47687–47697. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, H.; Bao, W.; Lai, Z.; Zhang, Z.; Yuan, D. Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads. arXiv 2023, arXiv:2304.07705. [Google Scholar] [CrossRef]
- Chi, C.; Zhang, S.; Xing, J.; Lei, Z.; Li, S.Z.; Zou, X. Relational Learning for Joint Head and Human Detection. arXiv 2019, arXiv:1909.10674. [Google Scholar] [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
- Park, M.W.; Brilakis, I. Continuous localization of construction workers via integration of detection and tracking. Autom. Constr. 2016, 72, 129–142. [Google Scholar] [CrossRef]
- Xiao, B.; Xiao, H.; Wang, J.; Chen, Y. Vision-based method for tracking workers by integrating deep learning instance segmentation in off-site construction. Autom. Constr. 2022, 136, 104148. [Google Scholar] [CrossRef]
- Xiao, B.; Lin, Q.; Chen, Y. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement. Autom. Constr. 2021, 127, 103721. [Google Scholar] [CrossRef]
- Wu, A.K.Y.; Massa, F.; Lo, W.-Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 30 August 2023).
- Kim, M.; Ham, Y.; Koo, C.; Kim, T.W. Simulating travel paths of construction site workers via deep reinforcement learning considering their spatial cognition and wayfinding behavior. Autom. Constr. 2023, 147, 104715. [Google Scholar] [CrossRef]
- njvisionpower, Safety-Helmet-Wearing-Dataset. 2019. Available online: https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset (accessed on 30 August 2023).
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2013, arXiv:1311.2524. [Google Scholar]
- Dendorfer, P.; Ošep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking. arXiv 2020, arXiv:2010.07548. [Google Scholar] [CrossRef]
- Leal-Taixé, L.; Milan, A.; Reid, I.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv 2015, arXiv:1504.01942. [Google Scholar] [CrossRef]
- Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. Eurasip J. Image Video Process. 2008, 2008, 246309. [Google Scholar] [CrossRef]
- Xiao, B.; Kang, S.-C. Vision-Based Method Integrating Deep Learning Detection for Tracking Multiple Construction Machines. J. Comput. Civ. Eng. 2021, 35, 04020071. [Google Scholar] [CrossRef]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. arXiv 2022, arXiv:2110.06864. [Google Scholar] [CrossRef]
- Maggiolino, G.; Ahmad, A.; Cao, J.; Kitani, K. Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification. arXiv 2023, arXiv:2302.11813. [Google Scholar] [CrossRef]
- Aharon, N.; Orfaig, R.; Bobrovsky, B.-Z. BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv 2022, arXiv:2206.14651. [Google Scholar] [CrossRef]
- Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv 2022, arXiv:2203.14360. [Google Scholar] [CrossRef]
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. arXiv 2022, arXiv:2202.13514. [Google Scholar] [CrossRef]
- Mikel-Brostrom, Yolo_Tracking. 2023. Available online: https://github.com/mikel-brostrom/yolo_tracking#real-time-multi-object-segmentation-and-pose-tracking-using-yolov8--yolo-nas--yolox-with-deepocsort-and-lightmbn (accessed on 31 August 2023).
- Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple Object Tracking with Transformer. arXiv 2020, arXiv:2012.15460. [Google Scholar] [CrossRef]
- Wang, Z.; Zhao, H.; Li, Y.-L.; Wang, S.; Torr, P.H.S.; Bertinetto, L. Do Different Tracking Tasks Require Different Appearance Models? arXiv 2021, arXiv:2107.02156. [Google Scholar] [CrossRef]
- pmj110119, YOLOX_deepsort_tracker. 2021. Available online: https://github.com/pmj110119/YOLOX_deepsort_tracker (accessed on 31 August 2023).
Index | SOTA Models | Strengths | Limitations |
---|---|---|---|
1 | Faster R-CNN [24] | Anchor-based, two-stage, low-coupling, easy to deploy. | Limited performance when heavily occluded. |
2 | CenterNet [26] | Anchor-free, fast speed, high accuracy, and low memory usage. | One stage, high-coupling, more False Positives (FP). |
3 | Transformer [27] | CNN backbone, encoder–decoder, and no post-processing. | Billions of parameters, several days’ training, CNN backbone, not good in dealing with occlusions. |
4 | YOLOX-x [34] | Many models for download, outstanding in tracking field, good to address occlusions (but supposes all occluded persons are standing). | One stage, high-coupling, better GPU needed (>16 GB), hard to fine-tune. |
5 | Joint SimOTA [32] | YOLOX-x-based, head and body features are detected simultaneously. | One stage, high-coupling, the head and body must belong to the same person in re-training datasets, standing persons are in larger proportion. |
6 | Park et al. [35] | Early studies, but good baseline for future researchers. | Limited to the upper body, 64 × 128 HOG template to locate a worker, not DNN. |
7 | Konstantinou et al. [11] | First to propose workers’ videos, a good baseline for future researchers. | Heuristic computer vision model, not DNN. |
8 | Xiao et al. [36] | First to align with [11] datasets, two-stage, low-coupling, Mask R-CNN [29] variation. | Needs a manual collection of workers’ images, needs re-training for Mask R-CNN, ignores heavy occlusions (>70% [37]), non-public annotation files. |
9 | Our model | Preserves alignment with [11] datasets, two-stage, low-coupling, public annotation files on GitHub, and integrates head and motion information. Head and body features can be detected separately. | Needs trained head branch, head and body features have no relations during detections. |
Branch | AP | AP50 | AP75 | APS | APM | APL | Notes
---|---|---|---|---|---|---|---
Head | 40.0 | 81.2 | 33.6 | 29.9 | 55.2 | 71.4 | Fast-trained in 1.25 h |
Body | 53.6 | 82.1 | 58.0 | 70.8 | 61.4 | 36.0 | No re-training |
Keypoints | 63.9 | 86.3 | 69.2 | N/A | 59.4 | 72.3 | No re-training |
Video# | Total Frames | Workers | Occlusion-Challenged Frames |
---|---|---|---|
Video 1 | 228 | 1 | No occlusion |
Video 2 | 259 | 3 | #139→#259 |
Video 3 | 211 | 9 | #14→#23, #195→#211 |
Video 4 | 197 | 5 | #38→#57, #168→#181 |
Video 5 | 355 | 5 | #48→#73 |
Video 6 | 132 | 4 | No occlusion |
Video 7 | 358 | 4 | #94→#313 |
Video 8 | 259 | 6 | #224→#243 |
Video 9 | 149 | 4 | #73→#149 |
In Total | 2148 | 41 | >500 |
Index | Metrics | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 | Video 7 | Video 8 | Video 9 | Combined
---|---|---|---|---|---|---|---|---|---|---|---
1 | IDF1↑ (%) | 100 | 97.04 | 96.966 | 96.824 | 97.468 | 99.431 | 97.73 | 98.422 | 97.007 | 97.609 |
2 | MT↑ | 1 | 3 | 9 | 5 | 5 | 4 | 4 | 6 | 4 | 41
3 | ML↓ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | FP↓ | 0 | 23 | 48 | 41 | 79 | 4 | 25 | 25 | 21 | 266 |
5 | FN↓ | 0 | 23 | 54 | 21 | 9 | 2 | 17 | 24 | 13 | 163 |
6 | Rcll↑ (%) | 100 | 97.04 | 96.793 | 97.826 | 99.472 | 99.62 | 98.154 | 98.454 | 97.695 | 98.173 |
7 | Prcn↑ (%) | 100 | 97.04 | 97.139 | 95.842 | 95.544 | 99.242 | 97.309 | 98.39 | 96.329 | 97.052 |
8 | DetA↑ (%) | 93.023 | 75.624 | 78.646 | 72.991 | 69.741 | 71.198 | 76.396 | 80.397 | 76.77 | 75.591 |
9 | DetRe↑ (%) | 94.298 | 81.474 | 84.229 | 79.906 | 78.957 | 77.547 | 82.336 | 84.926 | 83.408 | 82.252 |
10 | DetPr↑ (%) | 94.298 | 81.474 | 84.53 | 78.285 | 75.839 | 77.253 | 81.627 | 84.871 | 82.241 | 81.313 |
11 | Hz↑ (fps) | 9.05 | 6.01 | 5.69 | 5.96 | 7.11 | 6.69 | 6.62 | 5.33 | 6.82 | 6.58 |
IDF1↑ (%) | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 | Video 7 | Video 8 | Video 9 | Average |
---|---|---|---|---|---|---|---|---|---|---|
[35] | 55.5 | 50.8 | N/A | N/A | 43.9 | 46.9 | 15.7 | 25.8 | 61.0 | 42.8 |
[11] | 83.7 | 77.3 | N/A | N/A | 74.4 | 82.3 | 65.3 | 54.1 | 81.6 | 74.1 |
[36] | 99.8 | 99.2 | 93.4 | 98.1 | 99.4 | 99.2 | 95.3 | 99.6 | 94.8 | 97.6 |
Our | 100 | 97.04 | 96.966 | 96.824 | 97.468 | 99.431 | 97.73 | 98.422 | 97.007 | 97.609 |
Index | Metrics | DeepSORT | ByteTrack | Deep OC_SORT | BoTSORT | OC_SORT | Strong_SORT | TransTrack | UniTrack
---|---|---|---|---|---|---|---|---|---
1 | IDF1↑ (%) | 79.771 | 81.829 | 74.471 | 77.365 | 80.633 | 77.065 | 70.459 | 61.887 |
2 | MT↑ | 35 | 33 | 28 | 24 | 27 | 29 | 27 | 15 |
3 | ML↓ | 1 | 3 | 4 | 4 | 4 | 4 | 5 | 10 |
4 | FP↓ | 1761 | 1980 | 2167 | 1191 | 958 | 2211 | 3151 | 3124 |
5 | FN↓ | 902 | 1359 | 1669 | 2307 | 2065 | 1640 | 2253 | 3334 |
6 | Rcll↑ (%) | 89.889 | 84.766 | 81.291 | 74.14 | 76.852 | 81.616 | 74.745 | 62.628 |
7 | Prcn↑ (%) | 81.994 | 79.25 | 76.993 | 84.741 | 87.74 | 76.707 | 67.909 | 64.137 |
8 | DetA↑ (%) | 63.151 | 58.219 | 55.053 | 54.82 | 59.114 | 55.128 | 46.062 | 40.81 |
9 | DetRe↑ (%) | 77.871 | 71.347 | 69.081 | 63.111 | 66.059 | 69.433 | 62.474 | 49.839 |
10 | DetPr↑ (%) | 71.032 | 66.704 | 65.428 | 72.135 | 75.418 | 65.256 | 56.76 | 51.041 |
11 | Hz↑ | 1.77 | 9.09 | 8.33 | 7.14 | 8.47 | 7.69 | 5.13 | 4.30 |
Samples of output images (images omitted) with descriptions:
- ByteTrack loses worker ID = 0 in Video 5 #89 owing to the inter-class occlusion by the steel prefabrication lathe.
- ByteTrack wrongly predicts worker ID = 8/9 in Video 3 #22 at full size, although their full bodies have not been visible since #1.
- Deep OC_SORT cannot use motion information for absent detections; for example, worker ID = 4 in Video 9 loses his bounding box when occluded by the steel mesh.
- Vanilla DeepSORT fails to resolve the intra-class occlusion of worker ID = 2 in Video 3 #14 and #18.
- BoTSORT cannot filter out the FP detection of worker ID = 8 in Video 8.