Article

YOLOv5s-Fog: An Improved Model Based on YOLOv5s for Object Detection in Foggy Weather Scenarios

Xianglin Meng, Yi Liu, Lili Fan and Jingjing Fan
1 School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
2 National Industrial Innovation Center of Intelligent Equipment, Changzhou 213300, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(11), 5321; https://doi.org/10.3390/s23115321
Submission received: 8 May 2023 / Revised: 29 May 2023 / Accepted: 31 May 2023 / Published: 3 June 2023
(This article belongs to the Special Issue Research Progress on Intelligent Electric Vehicles-2nd Edition)

Abstract
In foggy weather scenarios, the scattering and absorption of light by water droplets and particulate matter cause object features in images to become blurred or lost, presenting a significant challenge for target detection in autonomous driving vehicles. To address this issue, this study proposes a foggy weather detection method based on the YOLOv5s framework, named YOLOv5s-Fog. The model enhances the feature extraction and expression capabilities of YOLOv5s by introducing a novel target detection layer called SwinFocus. Additionally, the decoupled head is incorporated into the model, and the conventional non-maximum suppression method is replaced with Soft-NMS. The experimental results demonstrate that these improvements effectively enhance the detection performance for blurry objects and small targets in foggy weather conditions. Compared to the baseline model, YOLOv5s, YOLOv5s-Fog achieves a 5.4% increase in mAP on the RTTS dataset, reaching 73.4%. This method provides technical support for rapid and accurate target detection in adverse weather conditions, such as foggy weather, for autonomous driving vehicles.

1. Introduction

In the field of autonomous driving, object detection is a crucial technology [1], and its accuracy and robustness are of paramount importance for practical applications [2]. However, in foggy weather scenarios, challenges arise due to weakened light and issues such as blurred object edges, which lead to a decline in algorithm performance, consequently affecting the safety and reliability of autonomous vehicles [3]. Therefore, conducting research on target detection in foggy weather scenes holds great significance.
In recent years, researchers have made certain progress in addressing the problem of object detection in foggy weather conditions [4,5]. Traditional methods primarily rely on conventional computer vision techniques such as edge detection, filtering, and background modeling. While these methods can partially handle foggy images, their effectiveness in complex scenes and under challenging foggy conditions is limited. To address the issue of object detection in complex foggy scenes, scholars have started exploring the utilization of physical models to represent foggy images. He et al. [6] proposed a single-image dehazing method based on the dark channel prior, while Zhu et al. [7] presented a fast single-image dehazing approach based on color attenuation prior. These dehazing methods improve the visibility of foggy images and subsequently enhance the accuracy of object detection. However, physical model-based methods require the estimation of fog density, making it difficult to handle multiple fog densities in complex scenes.
With the continuous development of deep learning techniques, deep learning has gradually become a research hotspot in the field of object detection [8,9]. Compared to traditional methods, deep learning models can directly learn tasks from raw data and exhibit improved generalization through training on large-scale datasets [10]. Deep learning-based object detection algorithms can be categorized into two-stage detectors and one-stage detectors. Two-stage detectors first generate a set of candidate boxes and then perform classification and position regression for each candidate box. Faster R-CNN [11] is the most representative algorithm in this category, which employs an RPN [11] to generate candidate boxes and utilizes ROI Pooling [12] for classification and position regression of each candidate box. In addressing the problem of object detection in foggy weather conditions, Chen et al. [13] proposed a domain adaptive method that aligns features and adapts domains between source and target domains, thereby improving the detection performance in the target domain. However, region proposal-based methods require more computational resources and incur higher costs, making them less suitable for real-time applications with stringent timing requirements [14].
One-stage detectors directly perform classification and position regression on the input image without the need for generating candidate boxes. The most representative algorithms in this category are the YOLO series [15,16,17] and SSD [18]. YOLO divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell, while SSD predicts bounding boxes of different sizes on different feature layers. Compared to two-stage detectors, one-stage detectors have a significant advantage in terms of speed, making them particularly suitable for real-time applications. In previous studies, Qiu et al. [19] combined the coordinated attention (CA) [20] mechanism with the GhostNet [21] algorithm to improve the accuracy of the YOLOv5 algorithm, providing technical support for fast identification of multiple foxtail millet ear targets in complex field environments. Fan et al. [22] proposed a solution to reduce mispicking and missed picking of strawberry fruits by combining YOLOv5 with dark channel [6] enhancement. This study addressed the issue of low illumination during nighttime image capture and compared the detection results of five image enhancement algorithms, namely histogram equalization [23], Laplace transform [24], gamma transform [25], logarithmic transformation, and dark channel [6] enhancement processing, across different time periods. The final results demonstrated that YOLOv5 outperformed SSD [18], DSSD [26], and EfficientDet [27] in terms of recognition accuracy, with a correct rate exceeding 90%. Baidya et al. [28] added an additional detection head to YOLOv5 and incorporated ConvMixers [29] in the context of unmanned aerial vehicle detection scenarios. They trained and tested the proposed architecture on the VisDrone2021 dataset, achieving results comparable to state-of-the-art methods. Wen et al. [30] embedded the Coordinated Attention [20] (CA) module and the Squeeze-and-Excitation [31] (SE) module into the YOLOv5s network for underwater target detection. The modified YOLOv5s showed a 2.4% improvement in mean average precision (mAP) compared to the baseline model.
However, improving the accuracy of object detection in complex weather and lighting conditions remains a challenge. To tackle this challenging problem, Huang et al. [32] employed two subnetworks to jointly learn visibility enhancement and object detection, reducing the impact of image degradation by sharing feature extraction layers. Dong et al. [33] and Guo et al. [34], on the other hand, addressed the issue by employing image defogging and image enhancement techniques, respectively, to mitigate the influence of weather-specific information. Hnewa et al. [35] proposed a cross-domain object detection method that utilizes multi-scale features and domain adaptation techniques to enhance the detection performance in complex weather conditions. Liu et al. [36] designed a fully differentiable image processing module based on YOLOv3 [15] for object detection in foggy and low-light scenarios. Although this image-adaptive approach improves detection accuracy, it also introduces some undesirable noise.
In the aforementioned research on foggy weather object detection, although the detection accuracy has been improved, most of these methods are primarily focused on defogging and image enhancement [37]. This study aims to enable object detection algorithms to achieve clear detection in foggy weather scenes without any preprocessing of the original image. In recent years, the application of Transformer models [38,39,40] in computer vision has been increasing. These models leverage self-attention mechanisms to capture relationships within an image, thereby enhancing model performance. In this study, the Swin Transformer [40] component is incorporated into the YOLOv5s model to improve detection accuracy in adverse weather conditions.
The main contributions of this study are as follows:
  • On the basis of the YOLOv5s model, we introduce a multi-scale attention feature detection layer called SwinFocus, based on the Swin Transformer, to better capture the correlations among different regions in foggy images;
  • The traditional YOLO Head is replaced with a decoupled head, which decomposes the object detection task into different subtasks, reducing the model’s reliance on specific regions in the input image;
  • In the stage of non-maximum suppression (NMS), Soft-NMS is employed to better preserve the target information, thereby effectively reducing issues such as false positives and false negatives.
The remaining sections of this paper are organized as follows. In Section 2, we provide a brief overview of the original YOLOv5s model and elaborate on the innovations proposed in this study. Section 3 presents the dataset, experimental details, and results obtained in our experiments. Finally, in Section 4, we summarize our work and propose some future research directions.

2. YOLOv5s-Fog

2.1. Overview of YOLOv5

YOLOv5 [17] is an efficient and highly accurate real-time object detection algorithm that extends the YOLO series [15,16]. This algorithm employs a single neural network to perform bounding box and category predictions. In comparison to its previous versions, YOLOv5 incorporates several improvements, including a new backbone network based on the CSP architecture [41], dynamic anchor allocation methods, and data augmentation techniques such as Mixup [42]. These enhancements have enabled the algorithm to achieve outstanding performance on multiple benchmark datasets while maintaining real-time inference speeds on both CPU and GPU platforms. The YOLOv5 model consists of four different configurations: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. In general, YOLOv5s is well-suited for real-time object detection in scenarios with limited computational resources, while YOLOv5x is more suitable for applications that require high-precision detection. Considering the real-time detection requirements in foggy weather conditions, this study employs YOLOv5s as the experimental model. The operational flow of YOLOv5s-Fog proposed in this paper is illustrated in Figure 1.

2.2. The Architecture of YOLOv5s-Fog Network

In foggy weather conditions, object features in images often become blurry or even lost due to the presence of fog [43]. In this paper, we propose a novel approach to address the aforementioned issue by improving the YOLOv5s network architecture. Our proposed network, YOLOv5s-Fog, is illustrated in Figure 2. Firstly, we introduce a new feature detection layer called SwinFocus, which enhances the object detection capability by better capturing subtle features of objects in the image. Compared to traditional convolutional neural networks, SwinFocus achieves global interaction and aggregation of feature map information by decomposing the spatial and channel dimensions of the feature maps, enabling the network to better detect objects concealed in the fog. Secondly, to enhance the flexibility of the model during the detection stage, we employ a decoupled head, where the classification and regression heads are separately processed, making better use of the network’s expressive power. Finally, we utilize Soft-NMS in the post-processing stage to effectively handle the issue of overlapping objects in foggy images.

2.3. Construction of Object Detection Model for Foggy Scenes

2.3.1. The Swin Transformer Architecture

In challenging weather conditions such as foggy environments, traditional Convolutional Neural Networks (CNNs) face a range of limitations and challenges in object detection tasks [44]. Firstly, the presence of fog causes image blurring, reduced contrast, and color distortion, making it difficult for traditional convolutional operations to effectively extract clear object edges and fine details. Secondly, lighting variations and occlusions in foggy scenes make it challenging for traditional CNNs to accurately localize and detect objects. Swin Transformer [40] is a neural network based on the Transformer [38] architecture that has demonstrated outstanding performance in computer vision tasks such as image classification, object detection, and semantic segmentation. The architecture of Swin Transformer is illustrated in Figure 3.
Unlike traditional CNNs, Swin Transformer introduces the Patch Partition module, which divides the input image into blocks and flattens them in the channel dimension to better capture the subtle features of objects in the image. In Swin Transformer, the image is first input into the Patch Partition module for block-wise processing, where each 4 × 4 adjacent pixels form a patch that is then flattened in the channel dimension. Subsequently, four stages are constructed to generate feature maps of different sizes. Stage 1 utilizes a linear embedding layer, while the remaining three stages employ Patch Merging for downsampling. Finally, these stages are stacked in a repeated manner.
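As a concrete illustration of this patch-partition step, the PyTorch sketch below flattens each non-overlapping 4 × 4 pixel block along the channel dimension and applies the Stage 1 linear embedding, implemented here as a single strided convolution. The embedding dimension of 96 and the class name are illustrative assumptions rather than the exact configuration used in YOLOv5s-Fog.

```python
import torch
import torch.nn as nn

class PatchPartition(nn.Module):
    """Sketch of the Swin Transformer patch partition + linear embedding.

    Each 4 x 4 block of pixels becomes one token: a (B, 3, H, W) image is
    turned into (B, H/4 * W/4, embed_dim) tokens. A strided convolution
    performs the partition and the Stage 1 linear embedding in one step.
    """

    def __init__(self, patch_size: int = 4, in_chans: int = 3, embed_dim: int = 96):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                     # (B, embed_dim, H/4, W/4)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

# Example: a 640 x 640 input yields 160 x 160 = 25,600 tokens of dimension 96.
tokens = PatchPartition()(torch.randn(1, 3, 640, 640))
print(tokens.shape)  # torch.Size([1, 25600, 96])
```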
To enhance the feature representation in adverse weather conditions, we introduce an additional Swin Transformer-based feature detection layer, SwinFocus, to the YOLOv5 framework. The basic structure of SwinFocus is illustrated in Figure 4. SwinFocus plays a critical role in object detection under challenging weather scenarios, with its hierarchical feature representation mechanism being the core component. Through multiple stages of downsampling, it can extract features at different scales, capturing information from objects of various sizes. This ability enables SwinFocus to adapt better to size variations and diversity in targets. Furthermore, SwinFocus inherits the window attention mechanism, which transforms global attention into local attention, allowing it to focus more on subtle details and edge information in the image. In foggy conditions where images may be affected by blurring and reduced visual quality, the window attention mechanism can precisely localize objects and extract crucial features. The computational formulas for two consecutive SwinFocus layers are as follows:
$$\hat{z}^{l} = \text{W-MSA}\left(\text{LN}\left(z^{l-1}\right)\right) + z^{l-1}$$
$$z^{l} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l}\right)\right) + \hat{z}^{l}$$
$$\hat{z}^{l+1} = \text{SW-MSA}\left(\text{LN}\left(z^{l}\right)\right) + z^{l}$$
$$z^{l+1} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1}$$
Here, $z^{l}$ denotes the output of the $l$-th SwinFocus layer, $\hat{z}^{l}$ the output of its attention sub-layer, LN is layer normalization, MLP is a two-layer feed-forward network, and W-MSA and SW-MSA denote window-based and shifted-window multi-head self-attention, respectively.
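To make the residual structure of these four formulas explicit, the following PyTorch sketch implements one pair of consecutive blocks. For brevity, standard multi-head attention stands in for W-MSA and SW-MSA, so the window partitioning, cyclic shift, and relative position bias are omitted; the dimensions and module names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SwinFocusBlockPair(nn.Module):
    """Skeleton of two consecutive SwinFocus layers following the formulas above."""

    def __init__(self, dim: int = 96, num_heads: int = 3, mlp_ratio: int = 4):
        super().__init__()

        def mlp() -> nn.Sequential:
            return nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))

        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # Stand-ins for W-MSA and SW-MSA (windowing omitted in this sketch).
        self.wmsa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.swmsa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp1, self.mlp2 = mlp(), mlp()

    def forward(self, z: torch.Tensor) -> torch.Tensor:  # z: (B, N, dim)
        h = self.norm1(z)
        z = self.wmsa(h, h, h, need_weights=False)[0] + z    # z_hat^l
        z = self.mlp1(self.norm2(z)) + z                     # z^l
        h = self.norm3(z)
        z = self.swmsa(h, h, h, need_weights=False)[0] + z   # z_hat^(l+1)
        return self.mlp2(self.norm4(z)) + z                  # z^(l+1)
```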

2.3.2. Decoupled Head

Deep learning-based object detection methods typically adopt a shared feature detector to simultaneously predict the class and location information of objects [45]. This coupling approach is beneficial as it improves model efficiency and accuracy through shared feature representations. However, in foggy weather conditions, this tightly coupled approach may face limitations and challenges. Firstly, the image quality in foggy environments is severely affected, resulting in visual impairments that make it difficult for traditional shared feature detectors to accurately extract clear object edges and fine details. Consequently, this impacts the accuracy of object localization and detection. Secondly, due to light absorption and scattering effects in foggy weather, the visibility of objects is reduced, causing indistinct object edges and easy blending with the background.
The Decoupled Head separates feature extraction from spatial position information by employing two independent network heads [46]. This design effectively reduces the coupling between feature extraction and spatial position information [45], enabling the model to better handle complex lighting variations caused by light propagation attenuation and scattering in foggy weather conditions. Moreover, the separation of feature extraction and spatial localization tasks allows the feature extraction head to focus on extracting discriminative features, while the spatial information head can concentrate on processing positional information. The structure of the decoupled head, as illustrated in Figure 5, involves a 1 × 1 convolutional layer to reduce the channel dimension, followed by two parallel branches, each containing two 3 × 3 convolutional layers [46]. This approach not only reduces the complexity of the network architecture but also enhances the accuracy of the model.
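A minimal PyTorch sketch of such a decoupled head, in the style of YOLOX [46], is given below: a 1 × 1 convolution first compresses the channels, then two parallel stacks of two 3 × 3 convolutions produce the class scores and the box/objectness predictions separately. The branch width, activation, and output layout are assumptions for illustration; only the overall 1 × 1 plus dual 3 × 3 structure follows the description above.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    """Conv + BatchNorm + SiLU, the basic unit used in YOLOv5-style heads."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU(inplace=True))

class DecoupledHead(nn.Module):
    """Illustrative decoupled head: separate classification and regression branches."""

    def __init__(self, c_in: int, num_classes: int = 5, num_anchors: int = 3, width: int = 256):
        super().__init__()
        self.stem = conv_block(c_in, width, k=1)                        # 1x1 channel reduction
        self.cls_branch = nn.Sequential(conv_block(width, width), conv_block(width, width))
        self.reg_branch = nn.Sequential(conv_block(width, width), conv_block(width, width))
        self.cls_pred = nn.Conv2d(width, num_anchors * num_classes, 1)  # class scores
        self.reg_pred = nn.Conv2d(width, num_anchors * 4, 1)            # box offsets
        self.obj_pred = nn.Conv2d(width, num_anchors * 1, 1)            # objectness

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        cls_out = self.cls_pred(self.cls_branch(x))
        reg_feat = self.reg_branch(x)
        return cls_out, self.reg_pred(reg_feat), self.obj_pred(reg_feat)
```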

2.3.3. Soft-NMS

In comparison to normal weather conditions, foggy weather exhibits differences in light propagation, contrast, color, and visibility [1]. These issues result in more prominent overlapping of objects. Traditional non-maximum suppression (NMS) methods may excessively suppress overlapping bounding boxes when selecting the one with the highest confidence, leading to the erroneous exclusion of important objects [47]. The specific details are shown in Figure 6. Soft-NMS addresses this by introducing a confidence decay factor, which helps preserve the confidence information of overlapping objects to a certain extent and reduces the likelihood of suppressing important objects. Additionally, traditional NMS methods solely rely on the intersection over union (IoU) between bounding boxes for suppression, disregarding the confidence information of the objects [48]. This can cause low-confidence bounding boxes to have a high IoU, while high-confidence bounding boxes may be erroneously suppressed. Soft-NMS adjusts the suppression based on both the confidence and overlap of the bounding boxes, thereby better preserving high-confidence bounding boxes and improving the localization accuracy of objects in foggy weather conditions.
Compared to traditional non-maximum suppression (NMS), Soft-NMS introduces a softening function that gradually reduces the scores of other bounding boxes overlapping with the one having the highest confidence, instead of directly setting their scores to zero [47]. This principle can be represented by the following formulas: for a set of input bounding boxes $B = \{b_1, b_2, \ldots, b_n\}$, where each bounding box $b_i$ consists of four coordinates and a confidence score $s_i$, Soft-NMS measures the similarity between boxes by computing their Intersection over Union (IoU):
$$\mathrm{IoU}(b_i, b_j) = \frac{\lvert b_i \cap b_j \rvert}{\lvert b_i \cup b_j \rvert}$$
Then, the score of each detection box is adjusted based on this similarity. Specifically, for the currently processed detection box $b_i$, its final weight is given by:
$$\omega_i^{*} =
\begin{cases}
s_i, & s_i > \theta \\
e^{-\frac{\mathrm{IoU}(b_i, b_k)^{2}}{\sigma}} \cdot s_k, & \text{otherwise}
\end{cases}$$
In this equation, $\theta$ is a threshold: when $s_i$ is greater than $\theta$, the original score is retained; otherwise, a Gaussian function suppresses similar detection boxes, with $\sigma$ controlling the rate of weight decay. The final weight $\omega_i^{*}$ is then combined with the confidence score $s_i$ of the current detection box by linear interpolation:
$$\omega_i = (1 - \alpha)\, s_i + \alpha\, \omega_i^{*}$$
Here, $\alpha$ controls the ratio between the adjusted score and the original score. Finally, Soft-NMS replaces the score of each detection box $b_i$ with $\omega_i$, so that boxes with higher overlap receive lower weights in the output, avoiding the excessive suppression and exclusion of correct detections. By gradually reducing the scores of overlapping bounding boxes while tolerating a certain degree of overlap, Soft-NMS handles occlusion, blurring, and overlapping instances in complex environments more gracefully, making the selection of detection boxes more reasonable and stable and improving the overall performance of the detector.
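For reference, the NumPy sketch below implements the classic Gaussian variant of Soft-NMS [47]: instead of discarding overlapping boxes outright, their scores are decayed by a Gaussian of the IoU. The threshold and σ defaults are illustrative, the score blending with α described above would slot in where the decay is applied, and the exact parameter values used in YOLOv5s-Fog are not reproduced here.

```python
import numpy as np

def soft_nms(boxes: np.ndarray, scores: np.ndarray,
             sigma: float = 0.5, score_thresh: float = 0.001) -> np.ndarray:
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding boxes.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of kept boxes in order of selection.
    """
    scores = scores.astype(float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_thresh:        # everything left is negligible
            break
        keep.append(best)
        remaining.remove(best)
        bx1, by1, bx2, by2 = boxes[best]
        b_area = (bx2 - bx1) * (by2 - by1)
        for i in remaining:
            x1, y1 = max(bx1, boxes[i][0]), max(by1, boxes[i][1])
            x2, y2 = min(bx2, boxes[i][2]), min(by2, boxes[i][3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            iou = inter / (b_area + area_i - inter + 1e-9)
            # Gaussian decay in place of the hard IoU cut-off of classic NMS;
            # the alpha-blended weighting described above would be applied here.
            scores[i] *= np.exp(-(iou ** 2) / sigma)
    return np.array(keep, dtype=int)
```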

3. Experimental Setup and Results

3.1. Dataset

Few datasets are available for training and testing object detection algorithms under adverse weather conditions, which limits the performance of such algorithms, particularly CNN-based ones. Additionally, the traditional atmospheric scattering model [15] fails to accurately simulate real-world foggy scenes [36]. To ensure fairness, we selected a total of 8201 images as the training set (V_C_t), sourced from the VOC [49] and COCO [50] datasets. For testing, we used V_n_ts [36] and RTTS [51]: RTTS evaluates the method's detection capability in foggy weather, while V_n_ts assesses its performance on standard imagery. The datasets encompass five categories: person, car, bus, bicycle, and motorcycle. Further details on dataset usage are presented in Table 1.

3.2. Experimental Details

The experimental setup of YOLOv5s-Fog is shown in Table 2. During training, we employed several effective data augmentation techniques, including MixUp [42] and Mosaic [16]. We also adopted a cosine learning rate scheduling strategy, with an initial learning rate of $3 \times 10^{-4}$, a batch size of 16, and 30 training epochs.
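As a concrete illustration of this schedule, the snippet below configures cosine learning-rate decay from $3 \times 10^{-4}$ over 30 epochs in PyTorch. The optimizer choice and the minimum learning rate are assumptions, since only the scheduler type, initial rate, batch size, and epoch count are specified above.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 5)  # placeholder module standing in for YOLOv5s-Fog
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)         # initial LR from the paper
scheduler = CosineAnnealingLR(optimizer, T_max=30, eta_min=1e-6)  # eta_min is an assumption

for epoch in range(30):
    # ... one training epoch over the V_C_t loader (batch size 16) would run here ...
    scheduler.step()
    print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.6f}")
```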

3.3. Evaluation Metrics

This study evaluates the detection performance of the model using mean Average Precision (mAP). mAP is a metric commonly used to assess the performance of object detection algorithms. It represents the average area under the Precision-Recall curve, which provides a comprehensive evaluation of both the localization accuracy and recognition accuracy of the classifier. A higher mAP value indicates better detection performance of the model. The specific calculation method is as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \sum_{n} \left(R_n - R_{n-1}\right) P_n$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
In this context, $TP$ represents True Positives, $FP$ represents False Positives, $FN$ represents False Negatives, $R_n$ denotes the Recall at the $n$-th threshold, $P_n$ represents the maximum Precision at that Recall, and $N$ indicates the number of classes.
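The sketch below computes AP as the area under the interpolated Precision-Recall curve and averages it over classes, mirroring the formulas above. It is a generic reference implementation (VOC-style interpolation), not the authors' evaluation code, and the example per-class values are hypothetical.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n with interpolated (monotone) precision."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):   # enforce "maximum precision at that recall"
        p[i] = max(p[i], p[i + 1])
    steps = np.where(r[1:] != r[:-1])[0]  # points where recall changes
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))

def mean_average_precision(ap_per_class: dict) -> float:
    """mAP = mean of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))

# Hypothetical per-class AP values for the five categories used in this paper:
example_ap = {"person": 0.81, "car": 0.79, "bus": 0.60, "bicycle": 0.69, "motorcycle": 0.63}
print(mean_average_precision(example_ap))  # 0.704
```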

3.4. Experimental Results

To validate the effectiveness of YOLOv5s-Fog, we compared it with various existing methods for foggy scene object detection, including deep learning-based object detection networks [15,16,17], dehazing methods [33,52], domain adaptation [32,35], and image adaptive enhancement [36]. The specific results are shown in Table 3. Table 4 presents a comprehensive comparison between our proposed approach and state-of-the-art object detection models in terms of experimental results. Figure 7 illustrates the variations of key metrics, including bounding box loss, object loss, and class loss, as well as Precision, Recall, mAP, and mAP50-95 after each epoch during the training and validation process of YOLOv5s-Fog.

The Analysis of Experimental Results

From Table 3, it can be observed that YOLOv5s-Fog outperforms other methods both on conventional weather datasets and foggy weather datasets. Specifically, the combination of deep learning architecture with image-adaptive methods outperforms traditional image dehazing approaches. One notable example is IA-YOLO [36], which achieves performances of 72.65% and 36.73% on V_n_ts [49] and RTTS [51], respectively. This superiority can be attributed to the ability of image-adaptive algorithms to consider different regions within the image and make adaptive adjustments based on regional characteristics and requirements. In contrast, conventional dehazing [35] algorithms typically apply the same processing method to the entire image, without fully considering local variations within the image. IA-YOLO utilizes the YOLOv3 [15] network architecture. To further investigate the impact of network architecture on object detection results in foggy conditions, we conducted experiments using the original YOLOv5s [17]. YOLOv5s achieves significant improvements with performances of 87.56% on V_n_ts and 68% on RTTS compared to IA-YOLO. YOLOv5s incorporates a range of network architecture and techniques, including multi-scale fusion, anchor box design, and classifier optimization. Building upon YOLOv5s, YOLOv5s-Fog introduces additional feature detection layers [40] and a Decoupled Head [45,46] to enhance the network’s ability to explore challenging details in foggy scenes. Additionally, Soft-NMS [47] is employed in the post-processing stage to address occlusion issues in foggy conditions. Ultimately, our proposed method achieves mAP scores of 92.23% and 73.40% on V_n_ts and RTTS, respectively. Furthermore, YOLOv5s-Fog does not heavily focus on image dehazing, maintaining its original end-to-end detection approach and avoiding interference from artificially added noise during the detection phase. Figure 8 showcases partial detection results of the three models that performed well in RTTS. The first row presents IA-YOLO [36], which employs image adaptive techniques to remove specific weather information and restore the underlying content. Although this approach improves detection performance, it introduces undesired noise to the object detector. The second and third rows display the detection results of YOLOv5s and YOLOv5s-Fog, respectively, without image dehazing or image enhancement. It is evident from Figure 8 that YOLOv5s-Fog exhibits excellent detection capabilities in foggy weather conditions and low-light environments. Additionally, YOLOv5s-Fog can identify smaller objects in dense fog more effectively.
Table 4 clearly demonstrates that YOLOv5s-Fog outperforms YOLOv7 [54], both on the V_n_ts and RTTS datasets. The feature enhancement technique of YOLOv5s-Fog is specifically designed for foggy weather scenarios, and it achieves this with a smaller parameter count. In comparison to the YOLOv8 [56] series, YOLOv5s-Fog exhibits greater flexibility. While YOLOv8m and YOLOv8x may have superior performance, YOLOv5s-Fog offers significant advantages in terms of model complexity, as well as shorter training time.

3.5. Ablation Studies

In order to validate the effectiveness of each module, we conducted an ablation study on the RTTS dataset. To ensure the scientific rigor of this paper and comprehensively evaluate the proposed model, we employed three specific metrics: mAP, mAP50-95, and GFLOPs. The impact of each module on the detection results is listed in Table 5. Table 6 documents the detection performance of YOLOv5s-Fog on each object category in the RTTS dataset after incorporating different modules.

3.5.1. The Impact of the Additional Feature Detection Layer

Through experimental validation, we observed that SwinFocus significantly improves the model's mAP. This can be attributed to the cross-domain self-attention mechanism adopted during training, which enables the model to capture global features more effectively. Although the additional detection layer increases the model's parameter count and computational burden, this cost is justified for the intended application scenarios in adverse weather such as fog. Table 5 demonstrates the resulting performance improvements.

3.5.2. The Impact of the Decoupled Head

By incorporating the Decoupled Head, the total number of layers in the model increased by 12, and the GFLOPs rose by 1.2. The adoption of the Decoupled Head not only enhances mAP but also enables adaptability to diverse object detection tasks and datasets, showcasing excellent scalability.

3.5.3. The Impact of Soft-NMS

For object detection in foggy conditions, Soft-NMS primarily addresses the large number of densely overlapping instances. Figure 9 presents the detection results of YOLOv5s-Fog on the RTTS dataset. Compared to traditional NMS, Soft-NMS handles similar objects in complex environments more effectively, which constitutes a significant advantage.

4. Conclusions

In this paper, we propose YOLOv5s-Fog, a novel approach to address the challenges of object detection under foggy conditions. Unlike previous research, we do not rely on dehazing or adaptive enhancement techniques applied to the original images. Instead, we enhance the YOLOv5s model by introducing additional detection layers and integrating advanced modules. Our improved model demonstrates higher accuracy in foggy conditions. Experimental results show the potential of our proposed method in object detection tasks under adverse weather conditions. In the future, we plan to invest more efforts in constructing datasets for object detection in extreme weather conditions and develop more efficient network architectures to enhance the model’s accuracy in extreme weather detection.

Author Contributions

Conceptualization: L.F.; methodology: L.F.; investigation: L.F. and X.M.; software: X.M. and Y.L.; supervision: L.F. and Y.L.; visualization: X.M.; writing—original draft: X.M.; writing—review and editing: L.F., Y.L. and X.M.; funding acquisition: J.F.; validation: J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Academician Mao Ming Workstation and the National Industrial Innovation Center of Intelligent Equipment (XS-JSFW-KCZNJS-202303-001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to acknowledge the anonymous reviewers and editors whose thoughtful comments helped to improve this manuscript.

Conflicts of Interest

There are no conflicts of interest associated with the publication of this paper.

References

  1. Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11682–11692. [Google Scholar]
  2. Walambe, R.; Marathe, A.; Kotecha, K.; Ghinea, G. Lightweight object detection ensemble framework for autonomous vehicles in challenging weather conditions. Comput. Intell. Neurosci. 2021, 2021, 5278820. [Google Scholar] [CrossRef]
  3. Liu, Z.; He, Y.; Wang, C.; Song, R. Analysis of the influence of foggy weather environment on the detection effect of machine vision obstacles. Sensors 2020, 20, 349. [Google Scholar] [CrossRef] [Green Version]
  4. Hahner, M.; Sakaridis, C.; Dai, D.; Van Gool, L. Fog simulation on real LiDAR point clouds for 3D object detection in adverse weather. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 15283–15292. [Google Scholar]
  5. Krišto, M.; Ivasic-Kos, M.; Pobar, M. Thermal object detection in difficult weather conditions using YOLO. IEEE Access 2020, 8, 125459–125476. [Google Scholar] [CrossRef]
  6. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar]
  7. Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed] [Green Version]
  8. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  9. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 11–12 December 2015. [Google Scholar]
  12. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  13. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348. [Google Scholar]
  14. Yao, J.; Fan, X.; Li, B.; Qin, W. Adverse Weather Target Detection Algorithm Based on Adaptive Color Levels and Improved YOLOv5. Sensors 2022, 22, 8577. [Google Scholar] [CrossRef] [PubMed]
  15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  16. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  17. Jocher, G.; Stoken, A.; Borovec, J.; Chaurasia, A.; Changyu, L.; Hogan, A.; Hajek, J.; Diaconu, L.; Kwon, Y.; Defretin, Y.; et al. ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations. Zenodo 2021. [Google Scholar] [CrossRef]
  18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  19. Qiu, S.; Li, Y.; Zhao, H.; Li, X.; Yuan, X. Foxtail Millet Ear Detection Method Based on Attention Mechanism and Improved YOLOv5. Sensors 2022, 22, 8206. [Google Scholar] [CrossRef]
  20. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  21. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
  22. Fan, Y.; Zhang, S.; Feng, K.; Qian, K.; Wang, Y.; Qin, S. Strawberry maturity recognition algorithm combining dark channel enhancement and YOLOv5. Sensors 2022, 22, 419. [Google Scholar] [CrossRef]
  23. Hameed, Z.; Wang, C. Edge detection using histogram equalization and multi-filtering process. In Proceedings of the 2011 IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 15–18 May 2011; pp. 1077–1080. [Google Scholar]
  24. Häser, M.; Almlöf, J. Laplace transform techniques in Møller–Plesset perturbation theory. J. Chem. Phys. 1992, 96, 489–494. [Google Scholar] [CrossRef]
  25. Dai-xian, Z.; Zhe, S.; Jing, W. Face recognition method combined with gamma transform and Gabor transform. In Proceedings of the 2015 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Ningbo, China, 19–22 September 2015; pp. 1–4. [Google Scholar]
  26. Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. Dssd: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
  27. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar]
  28. Baidya, R.; Jeong, H. YOLOv5 with ConvMixer Prediction Heads for Precise Object Detection in Drone Imagery. Sensors 2022, 22, 8424. [Google Scholar] [CrossRef]
  29. Ng, D.; Chen, Y.; Tian, B.; Fu, Q.; Chng, E.S. Convmixer: Feature interactive convolution with curriculum learning for small footprint and noisy far-field keyword spotting. In Proceedings of the ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 3603–3607. [Google Scholar]
  30. Wen, G.; Li, S.; Liu, F.; Luo, X.; Er, M.J.; Mahmud, M.; Wu, T. YOLOv5s-CA: A Modified YOLOv5s Network with Coordinate Attention for Underwater Target Detection. Sensors 2023, 23, 3367. [Google Scholar] [CrossRef]
  31. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  32. Huang, S.C.; Le, T.H.; Jaw, D.W. DSNet: Joint semantic learning for object detection in inclement weather conditions. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2623–2633. [Google Scholar] [CrossRef]
  33. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2157–2167. [Google Scholar]
  34. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789. [Google Scholar]
  35. Hnewa, M.; Radha, H. Multiscale domain adaptive yolo for cross-domain object detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3323–3327. [Google Scholar]
  36. Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; pp. 1792–1800. [Google Scholar] [CrossRef]
  37. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
  39. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  40. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  41. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  42. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
  43. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  44. Ancuti, C.; Ancuti, C.O.; Timofte, R. Ntire 2018 challenge on image dehazing: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 891–901. [Google Scholar]
  45. Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
  46. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  47. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS–improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569. [Google Scholar]
  48. Assefa, A.A.; Tian, W.; Acheampong, K.N.; Aftab, M.U.; Ahmad, M. Small-scale and occluded pedestrian detection using multi mapping feature extraction function and Modified Soft-NMS. Comput. Intell. Neurosci. 2022, 2022, 9325803. [Google Scholar] [CrossRef] [PubMed]
  49. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
  50. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  51. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  52. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323. [Google Scholar]
  53. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  54. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  55. Aboah, A.; Wang, B.; Bagci, U.; Adu-Gyamfi, Y. Real-time multi-class helmet violation detection using few-shot data sampling technique and yolov8. arXiv 2023, arXiv:2304.08256. [Google Scholar]
  56. Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501. [Google Scholar]
Figure 1. Operational procedure of YOLOv5s-Fog. This framework incorporates an augmented predictive feature layer to bolster the network’s regional comprehension. Additionally, we employ a decoupled head to effectively address scenarios characterized by diminished contrast and indistinct boundaries. Lastly, the Soft-NMS technique is employed for the integration of bounding boxes.
Figure 2. The network architecture of YOLOv5s-Fog introduces the following enhancements compared to the original version: addition of a target detection layer called SwinFocus based on Swin Transformer; use of a decoupled detection head to accomplish the final stage of the detection task. (The numbers 1–28 correspond to the numbering of each layer in the network architecture.)
Figure 3. Swin Transformer Architecture.
Figure 4. Two consecutive SwinFocus layers are employed in our approach. The SwinFocus layer introduces the concept of Windows Multi-head Self Attention (W-MSA), which significantly reduces computational complexity compared to the traditional Multi-head Self Attention (MSA). However, W-MSA performs self-attention calculations within each individual window. To enable information propagation between different windows, we further introduce the Shifted Windows Multi-Head Self-Attention (SW-MSA) mechanism.
Figure 5. Decoupled Head Structure. The decoupled head is a multi-task learning approach that divides object detection into two steps: image classification and object localization within the image. (YOLOv5s-Fog incorporates four detection heads.)
Figure 6. The issues that can occur during the post-processing stage of NMS. In (a), there are two reliable pedestrian detections (green bounding box and red bounding box) with scores of 0.85 and 0.75, respectively. However, due to the significant overlap between the green and red bounding boxes, the green bounding box is assigned a lower score. The situation in (b) is similar to that in (a).
Figure 7. The visualization of various metrics during the training process, including bounding box loss, object loss, class loss, Precision, Recall, mAP, and mAP50-95.
Figure 8. Partial detection results of IA-YOLO, YOLOv5s, and YOLOv5s-Fog on RTTS are shown below. The first row corresponds to IA-YOLO, the second row corresponds to YOLOv5s, and the third row corresponds to YOLOv5s-Fog.
Figure 9. Visualization of the detection results of YOLOv5s-Fog on the RTTS dataset. The green, blue, and red boxes represent true positive (TP), false positive (FP), and false negative (FN) detections, respectively.
Table 1. The relevant datasets used for training and testing purposes include V_C_t from VOC and COCO, V_n_ts from VOC2007_test, and RTTS, which is currently the only real-world foggy scene object detection dataset with multi-class detection labels.
Dataset | Images | Person | Car | Bus | Bicycle | Motorcycle | Total
V_C_t | 8201 | 14,012 | 3471 | 850 | 1478 | 1277 | 21,088
V_n_ts | 2734 | 4528 | 337 | 1201 | 213 | 325 | 6604
RTTS | 4322 | 7950 | 18,413 | 1838 | 534 | 862 | 29,597
Table 2. Experimental Setup of YOLOv5s-Fog.
Configuration | Parameter
CPU | Intel Xeon(R) CPU E5-2678 v3
GPU | Nvidia Titan Xp ×2
PyTorch | 1.12
CUDA | 11.1
cuDNN | 8.5.0
Table 3. Comparison of the performance of each method on the conventional dataset (V_n_ts) and the foggy weather dataset (RTTS). The rightmost two columns present the mAP(%) on the two test datasets, including V_n_ts and RTTS.
Methods | V_n_ts | RTTS
YOLOv3 [15] | 64.13 | 28.82
YOLOv3-SPP [53] | 70.10 | 30.80
YOLOv4 [16] | 79.84 | 35.15
MSBDN [33] | / | 30.20
GridDehaze [52] | / | 32.41
DAYOLO [35] | 56.51 | 29.93
DSNet [32] | 53.29 | 28.91
IA-YOLO [36] | 72.65 | 36.73
YOLOv5s [17] | 87.56 | 68.00
YOLOv5s-Fog | 92.23 | 73.40
Table 4. The experimental results present a comprehensive comparison between YOLOv5s-Fog and the state-of-the-art object detection algorithms, namely YOLOv7 and the YOLOv8 series. The “#Param.” is used to denote the number of model parameters. GFLOPs is a metric that measures the computational complexity or efficiency of the model. It represents the number of floating-point operations executed by the model per second, indicating the computational requirements.
Method | V_n_ts | RTTS | #Param. | GFLOPs
YOLOv7 [54] | 92.83 | 72.16 | 36.9 M | 104.70
YOLOv8n [55] | 87.43 | 68.52 | 3.2 M | 8.70
YOLOv8s | 88.52 | 70.68 | 11.2 M | 28.60
YOLOv8m | 90.39 | 72.71 | 25.9 M | 78.90
YOLOv8l | 93.76 | 73.58 | 43.7 M | 165.20
YOLOv8x | 93.80 | 73.71 | 68.2 M | 257.80
YOLOv5s-Fog | 92.23 | 73.40 | 26.9 M | 59.00
Table 5. The ablation experiment on the RTTS Dataset. (The green arrows and numbers indicate the specific improvement values of mAP after the addition of the module.)
Methods | mAP (%) | mAP50-95 (%) | GFLOPs
YOLOv5s | 68.00 | 41.17 | 15.8
YOLOv5s + SwinFocus | 70.15 (↑2.15) | 43.40 (↑2.23) | 56.2
YOLOv5s + SwinFocus + Decoupled Head | 71.79 (↑1.64) | 44.38 (↑0.98) | 57.4
YOLOv5s + SwinFocus + Decoupled Head + Soft-NMS | 73.40 (↑1.61) | 45.58 (↑1.20) | 59.0
Table 6. The impact of incorporating the component on the Precision (P) and Recall (R) of the model was evaluated on the RTTS dataset. Among the variants, YOLOv5s-Fog_1 represents the combination of YOLOv5s and SwinFocus. YOLOv5s-Fog_2 includes YOLOv5s, SwinFocus, and a Decoupled Head. Lastly, YOLOv5s-Fog_3 combines YOLOv5s, SwinFocus, a Decoupled Head, and Soft-NMS.
Methods | P (All) | P (Person) | P (Car) | P (Bus) | P (Bicycle) | P (Motorcycle) | R (All) | R (Person) | R (Car) | R (Bus) | R (Bicycle) | R (Motorcycle)
YOLOv5s | 0.87 | 0.912 | 0.926 | 0.795 | 0.86 | 0.856 | 0.489 | 0.725 | 0.504 | 0.318 | 0.485 | 0.413
YOLOv5s-Fog_1 | 0.74 | 0.69 | 0.911 | 0.753 | 0.647 | 0.7 | 0.635 | 0.641 | 0.632 | 0.496 | 0.697 | 0.712
YOLOv5s-Fog_2 | 0.88 | 0.924 | 0.938 | 0.835 | 0.83 | 0.88 | 0.55 | 0.735 | 0.51 | 0.397 | 0.614 | 0.493
YOLOv5s-Fog_3 | 0.78 | 0.851 | 0.762 | 0.675 | 0.81 | 0.807 | 0.70 | 0.809 | 0.793 | 0.601 | 0.694 | 0.631
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
