Birds Detection in Natural Scenes Based on Improved Faster RCNN
Abstract
1. Introduction
- The K-means [14] clustering algorithm is used to cluster the ground-truth bounding boxes of the dataset, and the anchor settings are revised according to the clustering results. The improved anchors are closer to the real bounding boxes of the dataset;
- The Soft-NMS method [15] is used to address occlusion between birds, and multi-scale training is applied in the training stage to improve the generalization ability of the model.
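The anchor-clustering step above can be sketched as follows. This is a minimal illustration under the usual convention for anchor design (cluster box width/height pairs with a 1 − IoU distance, as in YOLOv2 [25]), not the authors' exact implementation; the function names and toy usage are our own:

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between boxes and cluster centroids, comparing width/height only
    (boxes are treated as if aligned at the same corner)."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
            + (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs with the 1 - IoU distance and
    return k anchor shapes, sorted by area."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign each box to the centroid it overlaps most (smallest 1 - IoU)
        assign = np.argmax(iou_wh(boxes, clusters), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else clusters[j] for j in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]

# toy usage: 100 random box shapes, 5 anchors
boxes = np.random.default_rng(1).uniform(10, 200, size=(100, 2))
anchors = kmeans_anchors(boxes, k=5)
```

The resulting anchor widths and heights replace the hand-chosen scales and aspect ratios of the default Faster RCNN anchors.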
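Soft-NMS [15] replaces hard suppression with score decay: instead of discarding every box whose IoU with a higher-scoring detection exceeds a threshold, it decays that box's score, so heavily occluded birds are not lost. A minimal Gaussian-decay sketch (the sigma and score threshold are illustrative defaults, not the authors' configuration):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (Bodla et al. [15]): decay each remaining score
    by exp(-IoU^2 / sigma) against the current highest-scoring box."""
    scores = scores.astype(float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        m = max(idxs, key=lambda i: scores[i])  # pick current best
        keep.append(m)
        idxs.remove(m)
        for i in idxs[:]:
            iou = box_iou(boxes[m], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)  # decay, never discard outright
            if scores[i] < score_thresh:
                idxs.remove(i)
    return keep, scores
```

With two overlapping detections, both survive: the lower-scoring one is merely down-weighted in proportion to its overlap, which is exactly what helps when two birds genuinely occlude each other.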
2. Related Work
3. Materials and Methods
3.1. Feature Extraction Network
3.2. Multi-Scale Fusion Network
3.3. K-Means Clustering Algorithm
3.4. Soft Non-Maximum Suppression
4. Experiment and Result Analysis
4.1. Experimental Environment and Experimental Data
4.2. Evaluation Indicators
4.3. Model Parameters
4.4. Result and Analysis
4.4.1. Ablation Studies
4.4.2. Comparison of Our Model and State-of-the-Art Detection Models
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Iqbal, Z.; Khan, M.A.; Sharif, M.; Shah, J.H.; ur Rehman, M.H.; Javed, K. An automated detection and classification of citrus plant diseases using image processing techniques: A review. Comput. Electron. Agric. 2018, 153, 12–32. [Google Scholar] [CrossRef]
- Saijo, Y.; Loo, E.P.I.; Yasuda, S. Pattern recognition receptors and signaling in plant-microbe interactions. Plant J. 2018, 93, 592–613. [Google Scholar] [CrossRef] [PubMed]
- Scharr, H.; Dee, H.; French, A.P.; Tsaftaris, S.A. Special issue on computer vision and image analysis in plant phenotyping. Mach. Vis. Appl. 2016, 27, 607–609. [Google Scholar] [CrossRef] [Green Version]
- Yue, X.; Li, H.; Shimizu, M.; Kawamura, S.; Meng, L. YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots. Machines 2022, 10, 294. [Google Scholar] [CrossRef]
- Wang, J.; Su, S.; Wang, W.; Chu, C.; Jiang, L.; Ji, Y. An Object Detection Model for Paint Surface Detection Based on Improved YOLOv3. Machines 2022, 10, 261. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Fei-Fei, L.; Deng, J.; Li, K. ImageNet: Constructing a large-scale image database. J. Vis. 2010, 9, 1037. [Google Scholar] [CrossRef]
- Chauhan, R.; Ghanshala, K.K.; Joshi, R. Convolutional neural network (CNN) for image detection and recognition. In Proceedings of the 2018 First International Conference on Secure Cyber Computing and Communication (ICSCCC), Jalandhar, India, 15–17 December 2018; pp. 278–282. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster RCNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Zhong, Y.; Wang, J.; Peng, J.; Zhang, L. Anchor box optimization for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2020; pp. 1286–1294. [Google Scholar]
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS–improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Ke, W.; Zhang, T.; Huang, Z.; Ye, Q.; Liu, J.; Huang, D. Multiple anchor learning for visual object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10206–10215. [Google Scholar]
- Dong, Z.; Li, G.; Liao, Y.; Wang, F.; Ren, P.; Qian, C. CentripetalNet: Pursuing high-quality keypoint pairs for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10519–10528. [Google Scholar]
- Li, H.; Wu, Z.; Zhu, C.; Xiong, C.; Socher, R.; Davis, L.S. Learning from noisy anchors for one-stage object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10588–10597. [Google Scholar]
- Ren, Z.; Yu, Z.; Yang, X.; Liu, M.Y.; Lee, Y.J.; Schwing, A.G.; Kautz, J. Instance-aware, context-focused, and memory-efficient weakly supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10598–10607. [Google Scholar]
- Li, Y.; Wang, T.; Kang, B.; Tang, S.; Wang, C.; Li, J.; Feng, J. Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10991–11000. [Google Scholar]
- Cao, Y.; Chen, K.; Loy, C.C.; Lin, D. Prime sample attention in object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11583–11591. [Google Scholar]
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving multi-scale feature learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12595–12604. [Google Scholar]
- Wang, A.; Sun, Y.; Kortylewski, A.; Yuille, A.L. Robust object detection under occlusion with context-aware compositionalnets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12645–12654. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- da Silva, J.R.; de Almeida, G.M.; Cuadros, M.A.d.S.; Campos, H.L.; Nunes, R.B.; Simão, J.; Muniz, P.R. Recognition of Human Face Regions under Adverse Conditions—Face Masks and Glasses—In Thermographic Sanitary Barriers through Learning Transfer from an Object Detector. Machines 2022, 10, 43. [Google Scholar] [CrossRef]
- Qin, Y.; He, S.; Zhao, Y.; Gong, Y. RoI pooling based fast multi-domain convolutional neural networks for visual tracking. In Proceedings of the International Conference on Artificial Intelligence and Industrial Engineering, Phuket, Thailand, 26–27 July 2016. [Google Scholar]
- Xavier, A.I.; Villavicencio, C.; Macrohon, J.J.; Jeng, J.H.; Hsieh, J.G. Object Detection via Gradient-Based Mask R-CNN Using Machine Learning Algorithms. Machines 2022, 10, 340. [Google Scholar] [CrossRef]
- Zitnick, C.L.; Dollár, P. Edge boxes: Locating object proposals from edges. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 391–405. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Nalli, G.; Amendola, D.; Perali, A.; Mostarda, L. Comparative Analysis of Clustering Algorithms and Moodle Plugin for Creation of Student Heterogeneous Groups in Online University Courses. Appl. Sci. 2021, 11, 5800. [Google Scholar] [CrossRef]
- Hosang, J.; Benenson, R.; Schiele, B. Learning non-maximum suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4507–4515. [Google Scholar]
Pretraining Model | Network Layers | Pretrained Weights Size
---|---|---
VGG19 | 26 | 549 MB
InceptionNet-V3 | 159 | 92 MB
ResNet-50 | 168 | 99 MB
Num | Anchors | KM | MSFIF | SNMS | MT | NET | mAP | FPS | P | R | F1
---|---|---|---|---|---|---|---|---|---|---|---
1 | 9 | × | × | × | × | VGG19 | 84.6 | 12.52 | 73.15% | 74.22% | 73.68%
2 | 15 | √ | × | × | × | VGG19 | 86.07 | 11.86 | 74.99% | 75.37% | 75.18%
3 | 15 | × | √ | × | × | VGG19 | 85.23 | 11.99 | 75.83% | 76.51% | 76.17%
4 | 15 | × | × | √ | × | VGG19 | 85.08 | 12.43 | 77.61% | 76.13% | 76.86%
5 | 15 | × | × | × | √ | VGG19 | 85.81 | 12.18 | 76.39% | 77.43% | 77.41%
6 | 15 | √ | √ | × | × | VGG19 | 86.52 | 12.63 | 77.22% | 75.48% | 76.34%
7 | 15 | √ | √ | √ | × | VGG19 | 87.59 | 13.91 | 76.84% | 78.91% | 77.86%
8 | 15 | √ | √ | √ | √ | VGG19 | 88.67 | 13.78 | 77.67% | 78.35% | 78.00%
9 | 15 | √ | √ | √ | √ | INet-V3 | 88.72 | 13.17 | 78.94% | 79.76% | 79.35%
10 | 15 | √ | √ | √ | √ | Res-50 | 89.04 | 13.63 | 80.03% | 79.68% | 79.85%

(KM = K-means anchor clustering, Section 3.3; MSFIF = multi-scale feature fusion, Section 3.2; SNMS = Soft-NMS, Section 3.4; MT = multi-scale training; NET = backbone network; P/R/F1 = precision/recall/F1-score.)
Algorithm | mAP (IoU = 0.5) | FPS | Precision | Recall | F1_Score
---|---|---|---|---|---
SSD300 | 71.23 | 45.64 | 70.86% | 72.09% | 71.47%
YOLOv4 | 86.91 | 31.22 | 79.55% | 81.13% | 80.33%
Faster RCNN | 84.60 | 12.52 | 73.15% | 74.22% | 73.68%
Proposed model | 89.04 | 13.63 | 80.03% | 79.68% | 79.85%
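The F1 scores reported in the tables above are the harmonic mean of precision and recall; assuming the standard definition, the baseline Faster RCNN row can be checked directly:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the F1_Score column)."""
    return 2 * precision * recall / (precision + recall)

# Baseline Faster RCNN row: P = 73.15%, R = 74.22%
print(round(f1_score(73.15, 74.22), 2))  # → 73.68, matching the table
```

The proposed-model row checks out the same way (80.03%, 79.68% → 79.85%).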
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiang, W.; Song, Z.; Zhang, G.; Wu, X. Birds Detection in Natural Scenes Based on Improved Faster RCNN. Appl. Sci. 2022, 12, 6094. https://doi.org/10.3390/app12126094