ASFF-YOLOv5: Multielement Detection Method for Road Traffic in UAV Images Based on Multiscale Feature Fusion
Abstract
1. Introduction
- (1) The K-means++ [23] clustering method was used for data processing to obtain the optimal candidate box sizes of the objects, so that the detection anchor boxes better match the multielement road traffic dataset.
- (2) To address low detection accuracy, severe false and missed detections of road elements, and the difficulty of recognizing small, dense objects, the ASFF-YOLOv5 algorithm is proposed. In this algorithm, the SPPF [24] structure improves the classification accuracy and speed for multiple road traffic elements. Moreover, by integrating the RFB module [26] into the ASFF [25] structure, the receptive field is enlarged and feature information of detection objects at different scales is extracted more richly, which particularly strengthens the detection and recognition of small objects.
- (3) Comparative and ablation experiments show that the proposed ASFF-YOLOv5 algorithm is superior in detecting multiple road traffic elements, providing a new solution for updating basic traffic geographic information databases.
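The anchor-selection step in contribution (1) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: `kmeans_pp_anchors` is a hypothetical helper name, and the (w, h) samples are toy values rather than the paper's dataset labels.

```python
import random
random.seed(0)

def kmeans_pp_anchors(boxes, k, iters=50):
    """Cluster (w, h) box sizes with k-means++ seeding, as used to
    pick detection anchor boxes. `boxes` is a list of (w, h) tuples."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # k-means++ seeding: first center uniform, later centers sampled
    # with probability proportional to squared distance to the
    # nearest center already chosen.
    centers = [random.choice(boxes)]
    while len(centers) < k:
        weights = [min(d2(b, c) for c in centers) for b in boxes]
        centers.append(random.choices(boxes, weights=weights)[0])

    # standard Lloyd iterations: assign, then recompute means
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            idx = min(range(k), key=lambda i: d2(b, centers[i]))
            clusters[idx].append(b)
        centers = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return sorted(centers)

# toy (w, h) box sizes in pixels; real use would read dataset labels
sizes = [(10, 17), (15, 29), (16, 19), (19, 38), (23, 22),
         (25, 26), (33, 70), (52, 34), (125, 123)]
anchors = kmeans_pp_anchors(sizes, k=3)
print(anchors)
```

The resulting centers would replace YOLOv5's default COCO anchors before training.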
2. Datasets and Scale Statistics
3. Research Method
3.1. ASFF-YOLOv5
- (1) The Focus structure compresses the height and width of the input feature layer and expands the number of channels fourfold, yielding a 320 × 320 × 12 feature map.
- (2) A 320 × 320 × 64 feature map is then obtained through a series of operations such as convolution, normalization, and activation functions.
- (3) In the backbone feature extraction network, three effective feature layers are obtained by stacked residual extraction, and the SPPF structure is introduced after the last one. SPPF improves the classification accuracy and speed of the feature map by applying repeated maximum pooling with the same pooling kernel. At this stage, the three effective feature layers are 80 × 80 × 256, 40 × 40 × 512, and 20 × 20 × 1024.
- (4) The effective feature layers are passed to the PANet structure, where feature extraction is further enhanced through upsampling and downsampling. In this stage, the proposed ASFF + RFB module is integrated to extract multiscale road traffic element information: it enlarges the receptive field, fuses feature information across detection scales, and extracts richer features.
- (5) Three enhanced effective feature layers are obtained, and the prediction and regression results are produced by the classifier and regressor.
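Step (1), the Focus slicing operation, can be sketched with NumPy. This is an illustrative sketch on a single unbatched image (`focus_slice` is our name, not the library's); the real network applies the same slicing to batched tensors and follows it with a convolution.

```python
import numpy as np

def focus_slice(x):
    """Focus operation: sample every other pixel into four sub-images
    and stack them on the channel axis, halving H and W and
    quadrupling C (e.g. 640 x 640 x 3 -> 320 x 320 x 12)."""
    return np.concatenate(
        [x[::2, ::2], x[1::2, ::2], x[::2, 1::2], x[1::2, 1::2]], axis=-1
    )

img = np.zeros((640, 640, 3))
out = focus_slice(img)
print(out.shape)  # (320, 320, 12)
```

No information is lost: the slicing is a pure rearrangement of pixels into channels, which lets the first convolution see a 2 × 2 neighborhood at once.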
3.2. ASFF + RFB Module
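The core ASFF fusion rule can be sketched in NumPy: per-pixel softmax weights over three feature levels (already resized to a common shape), followed by a weighted sum. In the actual module the weight logits come from learned 1 × 1 convolutions; here they are random placeholders, and `asff_fuse` is an illustrative name.

```python
import numpy as np

def asff_fuse(feats, logits):
    """Adaptive spatial feature fusion at one level. feats: list of
    three H x W x C maps (already resized); logits: H x W x 3 raw
    weight scores, one per input level."""
    # numerically stable softmax over the level axis
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    # per-pixel convex combination of the three levels
    return sum(w[..., i:i + 1] * f for i, f in enumerate(feats))

H, W, C = 20, 20, 8
feats = [np.random.rand(H, W, C) for _ in range(3)]
logits = np.random.rand(H, W, 3)
fused = asff_fuse(feats, logits)
print(fused.shape)  # (20, 20, 8)
```

Because the weights sum to one at every pixel, each spatial location can adaptively favor whichever scale carries the most useful evidence, which is what helps small-object detection.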
3.3. Spatial Pyramid Pooling Fast
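The SPPF idea, chaining three max-pools of a single kernel size so the concatenated outputs match SPP's parallel 5/9/13 kernels at lower cost, can be sketched as follows. `maxpool_same` and `sppf` are illustrative helpers operating on one single-channel map; the real layer works on batched multi-channel tensors.

```python
import numpy as np

def maxpool_same(x, k=5):
    """Stride-1 max pooling with 'same' padding on an H x W map."""
    p = k // 2
    xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

def sppf(x, k=5):
    """SPPF: three chained max-pools with one kernel size. Two chained
    5 x 5 pools equal one 9 x 9 pool, and three equal one 13 x 13, so
    concatenating x, y1, y2, y3 reproduces SPP's multi-kernel output
    while reusing intermediate results."""
    y1 = maxpool_same(x, k)
    y2 = maxpool_same(y1, k)
    y3 = maxpool_same(y2, k)
    return np.stack([x, y1, y2, y3])  # channel concat in the real net

feat = np.random.rand(20, 20)
out = sppf(feat)
print(out.shape)  # (4, 20, 20)
```

The reuse of intermediate pooling results is exactly why SPPF is faster than SPP while producing the same receptive fields.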
3.4. EIoU Loss
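A minimal sketch of the EIoU loss for one box pair, following the published formulation: 1 − IoU, plus a normalized center-distance penalty (as in DIoU), plus the normalized width and height penalties that EIoU adds. `eiou_loss` is our illustrative name; boxes are (x1, y1, x2, y2).

```python
def eiou_loss(box_p, box_g):
    """EIoU loss between a predicted and a ground-truth box."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # smallest enclosing box of the pair
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # squared center distance over squared enclosing diagonal
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    dist = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # width/height gaps normalized by the enclosing box sides
    wterm = ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw * cw)
    hterm = ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch * ch)
    return 1 - iou + dist + wterm + hterm

print(eiou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0 for identical boxes
```

Splitting the aspect-ratio term of CIoU into separate width and height penalties gives each side its own gradient, which speeds up convergence for the small, thin objects common in UAV imagery.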
4. Experimental Results and Analysis
4.1. Experimental Environment
4.2. Evaluation Indicators
4.3. Comparison Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huang, L.; Qiu, M.L.; Xu, A.Z.; Sun, Y.; Zhu, J.J. UAV imagery for automatic multi-element recognition and detection of road traffic elements. Aerospace 2022, 9, 198.
- Dewi, C.; Chen, R.C.; Liu, Y.T.; Jiang, X.Y.; Hartomo, K.D. YOLO v4 for advanced traffic sign recognition with synthetic training data generated by various GAN. IEEE Access 2021, 9, 97228–97242.
- Haque, W.A.; Arefin, S.; Shihavuddin, A.S.M.; Hasan, M.A. DeepThin: A novel lightweight CNN architecture for traffic sign recognition without GPU requirements. Expert Syst. Appl. 2021, 168, 114481.
- Zhou, K.; Zhan, Y.F.; Fu, D.M. Learning region-based attention network for traffic sign recognition. Sensors 2021, 21, 686.
- Cui, Y.L.; Yu, Y.; Cai, Z.Y.; Wang, D.H. Optimizing road network density considering automobile traffic efficiency: Theoretical approach. J. Urban Plan. Dev. 2022, 148, 04021062.
- Zhou, G.D.A.; Chen, W.T.; Gui, Q.S.; Li, X.J.; Wang, L.Z. Split depth-wise separable graph-convolution network for road extraction in complex environments from high-resolution remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Liu, C.F.; Yin, H.; Sun, Y.X.; Wang, L.; Guo, X.D. A grade identification method of critical node in urban road network based on multi-attribute evaluation correction. Appl. Sci. 2022, 12, 813.
- Dai, J.G.; Ma, R.C.; Ai, H.B. Semi-automatic extraction of rural roads from high-resolution remote sensing images based on a multifeature combination. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5.
- Hu, C.H.; Fan, W.C.; Zeng, E.L.; Hang, Z.; Wang, F.; Qi, L.Y.; Bhuiyan, M.Z.A. Digital twin-assisted real-time traffic data prediction method for 5G-enabled internet of vehicles. IEEE Trans. Ind. Inform. 2021, 18, 2811–2819.
- Li, Z.S.; Xiong, G.; Tian, Y.L.; Lv, Y.S.; Chen, Y.Y.; Hui, P.; Su, X. A multi-stream feature fusion approach for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1456–1466.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014.
- Yu, R.N.; Li, H.G.; Jiang, Y.L.; Zhang, B.C.; Wang, Y.F. Tiny vehicle detection for mid-to-high altitude UAV images based on visual attention and spatial-temporal information. Sensors 2022, 22, 2354.
- Xiao, R.; Wang, Y.Z.; Tao, C. Fine-grained road scene understanding from aerial images based on semisupervised semantic segmentation networks. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Wang, H.F.; Wang, J.Z.; Bai, K.M.; Sun, Y. Centered multi-task generative adversarial network for small object detection. Sensors 2021, 21, 5194.
- Xu, D.Q.; Wu, Y.Q. FE-YOLO: A feature enhancement network for remote sensing target detection. Remote Sens. 2021, 13, 1311.
- Qing, Y.H.; Liu, W.Y.; Feng, L.Y.; Gao, W.J. Improved YOLO network for free-angle remote sensing target detection. Remote Sens. 2021, 13, 2171.
- Hu, J.M.; Zhi, X.Y.; Shi, T.J.; Zhang, W.; Cui, Y.; Zhao, S.G. PAG-YOLO: A portable attention-guided YOLO network for small ship detection. Remote Sens. 2021, 13, 3059.
- Kim, M.; Jeong, J.; Kim, S. ECAP-YOLO: Efficient channel attention pyramid YOLO for small object detection in aerial image. Remote Sens. 2021, 13, 4851.
- Liu, M.J.; Wang, X.H.; Zhou, A.J.; Fu, X.Y.; Ma, Y.W.; Piao, C.H. UAV-YOLO: Small object detection on unmanned aerial vehicle perspective. Sensors 2020, 20, 2238.
- Lee, S.; Kim, S.; Moon, S. Development of a car-free street mapping model using an integrated system with unmanned aerial vehicles, aerial mapping cameras, and a deep learning algorithm. J. Comput. Civil. Eng. 2022, 36, 04022003.
- Hong, Z.H.; Yang, F.; Pan, H.Y.; Zhou, R.Y.; Zhang, Y.; Han, Y.L.; Wang, J.; Yang, S.H.; Chen, P.; Tong, X.H.; et al. Highway crack segmentation from unmanned aerial vehicle images using deep learning. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
- Sultonov, F.; Park, J.H.; Yun, S.; Lim, D.W.; Kang, J.M. Mixer U-Net: An improved automatic road extraction from UAV imagery. Appl. Sci. 2022, 12, 1953.
- Yoder, J.; Priebe, C.E. Semi-supervised k-means++. J. Stat. Comput. Simul. 2017, 87, 2597–2608.
- Ultralytics. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2021).
- Liu, S.T.; Huang, D.; Wang, Y.H. Receptive field block net for accurate and fast object detection. In Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Liu, S.T.; Huang, D.; Wang, Y.H. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
- Pelleg, D.; Moore, A. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the Seventeenth International Conference on Machine Learning, San Francisco, CA, USA, 29 June–2 July 2000.
- Liu, S.; Qi, L.; Qin, H.F.; Shi, J.P.; Jia, J.Y. Path aggregation network for instance segmentation. In Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.M.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014.
- Zheng, Z.H.; Wang, P.; Liu, W.; Li, J.Z.; Ye, R.G.; Ren, D.W. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
- Zhang, Y.F.; Ren, W.Q.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T.N. Focal and efficient IoU loss for accurate bounding box regression. arXiv 2021, arXiv:2101.08158.
- Jiang, B.R.; Luo, R.X.; Mao, J.Y.; Xiao, T.T.; Jiang, Y.N. Acquisition of localization confidence for accurate object detection. In Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.M.; Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Tian, Y.; Gelernter, J.; Wang, X.; Li, J.Y.; Yu, Y.Z. Traffic sign detection using a multi-scale recurrent attention network. IEEE Trans. Intell. Transp. Syst. 2019, 20, 4466–4475.
- Chen, P.D.; Huang, L.; Xia, Y.; Yu, X.N.; Gao, X.X. Detection and recognition of road traffic signs in UAV images based on Mask R-CNN. Remote Sens. Land Resour. 2020, 32, 61–67.
- Lin, C.J.; Jhang, J.Y. Intelligent traffic-monitoring system based on YOLO and convolutional fuzzy neural networks. IEEE Access 2022, 10, 14120–14133.
- Wang, M.Y.; Liu, R.F.; Yang, J.B.; Lu, X.S.; Yu, J.Y.; Ren, H.W. Traffic sign three-dimensional reconstruction based on point clouds and panoramic images. Photogramm. Rec. 2022, 37, 87–110.
- Liang, T.J.; Bao, H.; Pan, W.G.; Pan, F. Traffic sign detection via improved sparse R-CNN for autonomous vehicles. J. Adv. Transp. 2022, 2022, 3825532.
- Mishra, J.; Goyal, S. An effective automatic traffic sign classification and recognition deep convolutional networks. Multimed. Tools Appl. 2022, 81, 18915–18934.
Serial No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
x | 10 | 15 | 16 | 19 | 23 | 25 | 33 | 52 | 125 |
y | 17 | 29 | 19 | 38 | 22 | 26 | 70 | 34 | 123 |
Evaluation Indicators | Calculation Formula | Definition |
---|---|---|
Precision | P = TP/(TP + FP) | TP is the number of correctly detected positive samples; FP represents the positive samples detected incorrectly, i.e., the number of other classes detected as road traffic element classes. |
Recall | R = TP/(TP + FN) | FN is the number of road traffic element samples missed by the detector. |
AP | AP = ∫₀¹ P(R) dR | The area under the precision–recall curve, where P is the precision and R in the range [0, 1] is the recall. |
mAP | mAP = (1/N) Σ APᵢ | The mean of the AP values over all N categories, i.e., the AP averaged per category. |
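The indicators can be sketched in plain Python. This is a minimal illustration with made-up counts, not the paper's evaluation code; `average_precision` uses all-point interpolation, one common way to compute the area under the precision–recall curve.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """All-point AP: area under the precision-recall curve after
    making precision monotonically non-increasing from the right."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])  # envelope of the PR curve
    # sum rectangle areas between successive recall values
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# toy counts and a toy three-point PR curve
prec, rec = precision_recall(tp=85, fp=15, fn=10)
ap = average_precision([0.2, 0.6, 0.9], [1.0, 0.8, 0.6])
print(round(prec, 3), round(rec, 3), round(ap, 3))
```

mAP is then just the mean of the per-category AP values.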
Network Model | Transport Elements | AP/% | Precision/% | Recall/% | mAP/% | Rise Points |
---|---|---|---|---|---|---|
Faster R-CNN | zebra crossings | 64.3 | 59.0 | 71.4 | 56.9 | 36.2 |
bus stations | 71.5 | 73.3 | 71.0 | |||
roadside parking spaces | 35.0 | 31.5 | 48.8 | |||
Retinanet | zebra crossings | 70.0 | 87.8 | 61.2 | 57.3 | 35.8 |
bus stations | 67.3 | 86.4 | 61.3 | |||
roadside parking spaces | 34.5 | 74.3 | 24.4 | |||
SSD | zebra crossings | 52.9 | 76.8 | 32.6 | 53.9 | 39.2 |
bus stations | 75.1 | 100.0 | 54.8 | |||
roadside parking spaces | 33.8 | 73.4 | 14.7 | |||
YOLOv3 | zebra crossings | 84.3 | 87.4 | 80.4 | 81.5 | 11.6 |
bus stations | 83.8 | 88.9 | 77.4 | |||
roadside parking spaces | 76.5 | 76.1 | 75.9 | |||
YOLOv4 | zebra crossings | 81.8 | 90.0 | 79.2 | 74.7 | 18.4 |
bus stations | 76.8 | 90.5 | 61.3 | |||
roadside parking spaces | 65.4 | 70.2 | 71.0 | |||
YOLOv5 | zebra crossings | 85.3 | 75.2 | 82.0 | 73.9 | 19.2 |
bus stations | 65.2 | 50.9 | 58.1 | |||
roadside parking spaces | 78.9 | 74.2 | 77.5 | |||
YOLOv4 + ECA | zebra crossings | 94.3 | 90.1 | 93.9 | 90.5 | 2.6 |
bus stations | 99.6 | 91.3 | 100.0 | |||
roadside parking spaces | 77.4 | 81.0 | 78.1 | |||
ASFF-YOLOv5 (YOLOv5 + ASFF + RFB) | zebra crossings | 94.0 | 85.9 | 94.4 | 93.1 | |
bus stations | 96.2 | 93.1 | 96.3 | |||
roadside parking spaces | 83.7 | 86.9 | 88.6 |
Network Model | Transport Elements | AP/% | Precision/% | Recall/% | mAP/% | Rise Points |
---|---|---|---|---|---|---|
YOLOv5 | zebra crossings | 85.3 | 75.2 | 82.0 | 73.9 | 18.3 |
bus stations | 65.2 | 50.9 | 58.1 | |||
roadside parking spaces | 78.9 | 74.2 | 77.5 | |||
YOLOv5 + K-means | zebra crossings | 93.4 | 84.0 | 92.5 | 89.2 | 3.0 |
bus stations | 95.3 | 74.0 | 83.4 | |||
roadside parking spaces | 84.7 | 84.9 | 88.7 | |||
YOLOv5 + K-means++ | zebra crossings | 94.6 | 84.8 | 93.8 | 92.2 | |
bus stations | 95.5 | 78.2 | 93.9 | |||
roadside parking spaces | 86.0 | 86.9 | 88.9 |
Network Model | Transport Elements | AP/% | Precision/% | Recall/% | mAP/% | Rise Points |
---|---|---|---|---|---|---|
YOLOv5 | zebra crossings | 85.3 | 75.2 | 82.0 | 73.9 | 19.2 |
bus stations | 65.2 | 50.9 | 58.1 | |||
roadside parking spaces | 78.9 | 74.2 | 77.5 | |||
YOLOv5 + SPPF | zebra crossings | 94.9 | 90.3 | 94.2 | 90.7 | 2.4 |
bus stations | 100.0 | 83.0 | 88.8 |
roadside parking spaces | 85.6 | 86.2 | 89.1 | |||
YOLOv5 + K-means++ | zebra crossings | 94.6 | 84.8 | 93.8 | 92.2 | 0.9 |
bus stations | 95.5 | 78.2 | 93.9 | |||
roadside parking spaces | 86.0 | 86.9 | 88.9 | |||
YOLOv5 + ASFF | zebra crossings | 91.3 | 92.1 | 94.1 | 92.8 | 0.3 |
bus stations | 100.0 | 91.6 | 95.1 | |||
roadside parking spaces | 77.1 | 91.4 | 89.0 | |||
ASFF-YOLOv5 (YOLOv5 + ASFF + RFB) | zebra crossings | 94.0 | 85.9 | 94.4 | 93.1 | |
bus stations | 96.2 | 93.1 | 96.3 | |||
roadside parking spaces | 83.7 | 86.9 | 88.6 |
Network Model | Transport Elements | Normal Road Scene | Small Target Scenes Scene 1 (Unobstructed) | Small Target Scenes Scene 2 (Obstructed) | |||
---|---|---|---|---|---|---|---|
Number | AP/% | Number | AP/% | Number | AP/% | ||
YOLOv5 | zebra crossings | 10 | 87.0 | - | - | 2 | 87.0 |
bus stations | 1 | 81.7 | - | - | - | - | |
roadside parking spaces | - | - | 9 | 72.9 | 42 | 81.6 | |
YOLOv5 + SPPF | zebra crossings | 10 | 85.3 | - | - | 2 | 99.5 |
bus stations | 1 | 87.8 | - | - | - | - | |
roadside parking spaces | - | - | 9 | 76.6 | 34 | 96.3 | |
YOLOv5 + K-means++ | zebra crossings | 10 | 87.1 | - | - | 2 | 100.0 |
bus stations | 1 | 84.8 | - | - | - | - | |
roadside parking spaces | - | - | 9 | 82.9 | 38 | 93.3 | |
YOLOv5 + ASFF | zebra crossings | 10 | 86.7 | - | - | 2 | 100.0 |
bus stations | 1 | 85.7 | - | - | - | - | |
roadside parking spaces | - | - | 9 | 89.7 | 40 | 93.3 | |
ASFF-YOLOv5 (YOLOv5 + ASFF + RFB) | zebra crossings | 10 | 88.4 | - | - | 2 | 99.5 |
bus stations | 1 | 89.1 | - | - | - | - | |
roadside parking spaces | - | - | 9 | 91.2 | 42 | 93.5 |
Network Model | Transport Elements | Number Detected | Number Missed | AP/% |
---|---|---|---|---|
YOLOv5 | zebra crossings | 18 | 0 | 87.2 |
bus stations | 4 | 1 | 75.5 | |
roadside parking spaces | 40 | 18 | 62.3 | |
YOLOv5 + SPPF | zebra crossings | 17 | 1 | 89.1 |
bus stations | 5 | 0 | 80.3 | |
roadside parking spaces | 29 | 29 | 68.7 | |
YOLOv5 + K-means++ | zebra crossings | 18 | 0 | 88.4 |
bus stations | 4 | 1 | 81.7 | |
roadside parking spaces | 37 | 21 | 73.3 | |
YOLOv5 + ASFF | zebra crossings | 18 | 0 | 88.6 |
bus stations | 4 | 1 | 84.0 | |
roadside parking spaces | 44 | 14 | 71.8 | |
ASFF-YOLOv5 (YOLOv5 + ASFF + RFB) | zebra crossings | 18 | 0 | 89.5 |
bus stations | 5 | 0 | 89.7 | |
roadside parking spaces | 50 | 8 | 80.3 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qiu, M.; Huang, L.; Tang, B.-H. ASFF-YOLOv5: Multielement Detection Method for Road Traffic in UAV Images Based on Multiscale Feature Fusion. Remote Sens. 2022, 14, 3498. https://doi.org/10.3390/rs14143498