Enhanced Non-Maximum Suppression for the Detection of Steel Surface Defects

Kang, Seong-Hwan; Palakonda, Vikas; Kim, Il-Min; Kang, Jae-Mo; Yun, Sangseok

doi:10.3390/math11183898


by Seong-Hwan Kang ¹, Vikas Palakonda ¹, Il-Min Kim ², Jae-Mo Kang ¹,* and Sangseok Yun ³,*

¹ Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Republic of Korea
² Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON K7L 3N6, Canada
³ Department of Information and Communications Engineering, Pukyong National University, Busan 48513, Republic of Korea
* Authors to whom correspondence should be addressed.

Mathematics 2023, 11(18), 3898; https://doi.org/10.3390/math11183898

Submission received: 21 July 2023 / Revised: 3 September 2023 / Accepted: 9 September 2023 / Published: 13 September 2023

(This article belongs to the Special Issue Object Detection: Algorithms, Computations and Practices)


Abstract:

Quality control in manufacturing relies heavily on the detection of steel surface defects. Recently, an increasing number of studies have applied object detection techniques to steel surface defect detection with promising results, since defect patterns can be treated as objects. In object detection, the non-maximum suppression (NMS) step, which eliminates redundant boxes that overlap the box with the highest detection score, is essential for good detection performance. In this work, we propose a novel NMS method to improve the detection of steel surface defects. The proposed NMS approach is composed of three novel techniques that enhance the detection performance: IoU regularization, threshold adjustment, and comparison rule modification. To evaluate the performance of the proposed NMS, we carry out extensive numerical experiments using the YOLOv7 and EfficientDet models on two steel surface defect datasets, NEU-DET and GC10-DET. The experimental results demonstrate that the proposed NMS outperforms the conventional NMS methods both quantitatively and qualitatively.

Keywords:

computer vision; deep learning; non-maximum suppression; object detection; steel surface defect

MSC:

68T45

1. Introduction

In the manufacturing industry, product quality depends heavily on the quality of the product surface [1], which is easily damaged during production; thus, the detection of surface defects is crucial [2]. Recently, defect inspection on steel surfaces has gained increased attention, as steel plates are a common material in numerous products. Metallurgical and mechanical flaws can result in a variety of defects on steel surfaces, including stains, cracks, pits, inclusions, and scratches [3]. These surface defects degrade the appearance and reduce the fatigue strength of steel plates, resulting in poor-quality steel products. Moreover, quality issues affect not only the reputation and finances of manufacturers but also the safety and reliability of products for end-users [4].

Traditional defect detection methods mainly relied on manual inspection, which requires workers to work long hours at high intensity. Thus, traditional methods suffer from drawbacks such as low effectiveness, unreliable detection, high costs, and potential safety risks. To address these issues, computer vision techniques have recently been widely adopted to replace the manual inspection of surface defects [3,4,5]. In particular, owing to recent advances in convolutional neural networks (CNNs) [6], CNNs have been actively adopted to detect various types of surface defects by regarding the surface defects as objects [7].

CNN-based defect (object) detection approaches generally fall into two categories: segmentation-based approaches [8,9,10] and detection-based approaches [11,12]. The segmentation-based approaches have the following drawbacks. First, supervised segmentation approaches [8,9] perform pixel-level classification, which requires pixel-level ground truth for training that is usually very expensive to acquire. Second, unsupervised segmentation algorithms often involve a very sensitive thresholding process [10]. In contrast, detection-based methods [11,12] require only bounding box (BBox) annotations of the defect areas for training and are free of such a sensitive segmentation threshold. Built on detectors such as Fast R-CNN [13], YOLO [14,15,16], or SSD [17], these detection-based approaches have been extensively studied in the field of defect (object) detection [4,5,18,19,20,21].

Specifically, in [5], an end-to-end steel surface defect identification system was proposed based on fusing multiple hierarchical features. The authors adopted a CNN [6] for its strong classification ability and proposed a multi-level feature fusion network that effectively combines multiple hierarchical (lower-level and higher-level) features into one feature. Additionally, a region proposal network (RPN) [22] was applied to the multi-level features to generate regions of interest (ROIs), and a detector consisting of a classifier and a bounding box regressor was applied to each ROI to produce the final results. In [18], a fast and accurate one-stage detector, called SSDNet, was proposed to detect steel surface defects, in which a feature retaining block (FRB) was introduced to systematically preserve the texture information of surface defects. Specifically, the FRB is applied after the pooling layer to compensate for downsampling-induced texture loss, and a skip-densely connected module (SDCM) is designed to carry the fine-grained information of small surface defects to the high-level layers through skip connections between early and later prediction layers [18].

In [19], a steel surface defect detection system based on a fused-attention network (FANet) was proposed, which applies an attention mechanism to a single balanced feature map rather than to multiple feature maps. In particular, an adaptively balanced feature fusion (ABFF) method and a fused-attention module (FAM) were proposed in [19] to handle defects at multiple scales and to locate and classify defects more accurately, respectively. Additionally, a modified YOLOv4 algorithm for defect identification on steel surfaces was proposed in [20], which adopts a convolutional block attention module (CBAM) [23] and a custom receptive field block (RFB) [24] to enhance classification and feature reconstruction.

In [4], a steel surface defect detector called DCC-CenterNet was developed, which uses keypoint estimation to locate center points and regresses the other defect properties. Specifically, a dilated feature enhancement model is proposed to enlarge the detector’s receptive field, and a novel center-weighting function is proposed to make keypoint estimation more accurate. Moreover, in [21], a lightweight feature fusion network, LFF-YOLO, was proposed for identifying steel surface defects with improved detection accuracy and speed. In LFF-YOLO, to accommodate defects at different scales, an adaptive receptive field feature extraction (ARFFE) method was designed to obtain features with adaptive receptive fields. In addition, a lightweight feature pyramid network (LFPN) was proposed, which introduces fewer parameters to improve the fusion efficiency of multiscale features [21].

Object detection techniques generally follow a three-step procedure: (i) generating BBoxes for specified object categories, (ii) refining or scoring the BBoxes with a regressor or classifier, and (iii) merging BBoxes belonging to the same object. The final step is commonly referred to as non-maximum suppression (NMS) [13,14,15]. NMS improves detection accuracy by eliminating redundant boxes; thus, it has been integrated as a post-processing step into many computer vision algorithms, including edge detection [25], feature point detection [26,27], face detection [28], and object detection [13,14,15].

The greedy NMS [29], in which the bounding box with the highest detection score is selected and the surrounding boxes are suppressed using a predefined threshold, demonstrated improved performance over the approach used in [28]. Since then, the greedy NMS and its extensions have been widely adopted as a post-processing step in object detection [13,14,15,30]. However, the existing NMS approaches leave many false positive BBoxes, which means that normal (flawless) products may be misclassified as defective and discarded. Consequently, misclassification (false positives) in the NMS procedure is likely to increase the production costs of the manufacturing process. Therefore, in this work, we develop an enhanced NMS approach that consists of three novel techniques: IoU regularization, threshold adjustment, and comparison rule modification. Specifically, in the IoU regularization process, we introduce a regularizing parameter that regularizes the sizes of the bounding boxes. The experimental results show that, with a fine-tuned regularization parameter, the proposed NMS not only suppresses more false positives in the detection of steel surface defects but also outperforms comparable NMS approaches both quantitatively and qualitatively.

In the following, we first briefly summarize the related works on NMS, and then we discuss the motivation and contributions of the proposed NMS in detail.

1.1. Related Works

The greedy NMS in [29] is a well-known NMS technique in which the BBox with the highest detection score is selected, and the BBoxes surrounding or surrounded by it are suppressed according to a predefined threshold. Specifically, the greedy NMS starts with a list of detection boxes

$B = \{b_{1}, b_{2}, \dots, b_{K}\}$

with corresponding confidence scores

$S = \{s_{1}, s_{2}, \dots, s_{K}\}$,

where $K$ is the number of initially detected BBoxes. The first step of the greedy NMS is to remove the box with the highest confidence score, $b_{m}$ with $m = \arg\max_{i} s_{i}$, from $B$ (and its score $s_{m}$ from $S$), and to include $b_{m}$ in the set of finally detected BBoxes, $D$. The second step is to calculate the intersection over union (IoU) between each box in $B$ and the box $b_{m}$, and to suppress the boxes in $B$ whose IoU exceeds a predefined threshold, hereafter called the IoU threshold. These two steps are repeated for the remaining boxes until $B$ becomes empty.
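The two-step loop above can be sketched in Python as follows. This is a minimal, single-class illustration, not the implementation used in [29] or in this paper. The (x1, y1, x2, y2) box format and the function names are our assumptions. Boxes are dropped when their IoU reaches the threshold, which matches Equation (1) in Section 2.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (zero for degenerate boxes)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], n_t: float = 0.5) -> List[int]:
    """Greedy NMS; returns the indices of the kept boxes (the set D)."""
    remaining = list(range(len(boxes)))  # indices of the working set B
    kept: List[int] = []                 # indices of the detected set D
    while remaining:
        # Step 1: move the most confident box b_m from B to D.
        m = max(remaining, key=lambda i: scores[i])
        remaining.remove(m)
        kept.append(m)
        # Step 2: drop every box whose IoU with b_m reaches the IoU threshold.
        remaining = [i for i in remaining if iou(boxes[m], boxes[i]) < n_t]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

Selecting the maximum from the remaining indices on each pass keeps the sketch close to the textual description. A production implementation would usually sort once by score instead.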

In [31], Soft NMS was proposed as a single-line modification to the greedy NMS approach. In Soft NMS, instead of immediately suppressing the detection boxes whose IoU exceeds the IoU threshold, their confidence scores are reduced using a Gaussian penalty function. A box is then suppressed only if its confidence score drops below a confidence score threshold. In [32], DIoU (distance IoU) NMS was proposed, which extends the DIoU loss to NMS. DIoU NMS uses a penalty term in the IoU calculation: the Euclidean distance between the center points of the two boxes, normalized by the diagonal length of the smallest box enclosing both. As a result, the farther apart the box centers and/or the shorter the enclosing diagonal, the smaller the penalized IoU. That is, DIoU NMS considers both the overlap area and the distance between the central points of two bounding boxes when suppressing redundancy, which makes it more robust in cases with occlusion. In [33], Exponential NMS was proposed, which adjusts the sizes of the bounding boxes using an exponential operation in the IoU calculation. This adjustment effectively resolves the suppression failure that occurs when the sizes of two BBoxes differ greatly. Meanwhile, a novel method for fusing predicted BBoxes, called weighted boxes fusion (WBF), was proposed in [30]. It was shown in [30] that ensembling BBoxes with their corresponding confidence scores, instead of suppressing them, can improve object detection performance. Finally, in [34], Confluence NMS was proposed, which clusters BBoxes using a proximity metric inspired by the normalized Manhattan distance instead of the confidence score.
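For comparison, the score-decay rule of Soft NMS and the distance penalty of DIoU NMS can be sketched as below. We assume two forms from the cited works:

- the Gaussian rescoring s_i ← s_i·exp(−IoU(M, b_i)²/σ) from [31];
- the DIoU penalty ρ²/c² from [32], where ρ is the distance between box centers and c is the diagonal of the smallest enclosing box.

The box format, the default σ = 0.5, and the function names are illustrative assumptions, not the settings in Table 2.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(boxes: Sequence[Box], scores: Sequence[float],
                      sigma: float = 0.5, score_thresh: float = 0.25) -> List[int]:
    """Decay the scores of overlapping boxes with a Gaussian instead of zeroing them."""
    s = list(scores)
    remaining, kept = list(range(len(boxes))), []
    while remaining:
        m = max(remaining, key=lambda i: s[i])
        remaining.remove(m)
        kept.append(m)
        for i in remaining:
            s[i] *= math.exp(-iou(boxes[m], boxes[i]) ** 2 / sigma)
        # A box is removed only once its decayed score falls below the threshold.
        remaining = [i for i in remaining if s[i] >= score_thresh]
    return kept


def diou(a: Box, b: Box) -> float:
    """IoU minus the squared center distance over the squared enclosing diagonal."""
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (rho2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes: Sequence[Box], scores: Sequence[float], n_t: float = 0.5) -> List[int]:
    """Greedy NMS with the IoU replaced by the DIoU."""
    remaining, kept = list(range(len(boxes))), []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        remaining.remove(m)
        kept.append(m)
        remaining = [i for i in remaining if diou(boxes[m], boxes[i]) < n_t]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(soft_nms_gaussian(boxes, scores))  # [0, 2, 1]: box 1 survives with a decayed score of about 0.32
print(diou_nms(boxes, scores))           # [0, 2]
```

With these inputs, a plain greedy NMS at N_t = 0.5 would suppress box 1 (IoU of about 0.68 with box 0). The Gaussian decay instead keeps box 1 with a reduced score, which illustrates why Soft NMS tends to eliminate fewer boxes.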

1.2. Motivations and Contributions

The conventional NMS approaches discussed above, however, have critical shortcomings. First, in the greedy NMS, only the boxes that overlap the most confident box by at least the IoU threshold have their confidence scores set to zero. That is, a box can be suppressed only through its overlap with the most confident box. This results in a suppression failure when some false positive boxes have IoUs below the IoU threshold. Furthermore, Soft NMS and DIoU NMS are likely to eliminate even fewer boxes, since Soft NMS only decays the scores of overlapping boxes and DIoU NMS subtracts a distance penalty from the IoU. As a result, many false positive BBoxes can remain, as observed in the qualitative analysis.

In addition, the conventional NMS schemes have been validated only on the MS-COCO dataset [35]. Although MS-COCO is one of the most frequently used datasets for object detection, it differs distinctly from steel surface defect datasets such as NEU-DET, as shown in Figure 1. Thus, the performance of NMS algorithms for steel surface defect detection should be evaluated and compared on practical steel surface defect datasets such as NEU-DET [36] or GC10-DET [37].

Motivated by these drawbacks, in this paper, we propose a novel NMS approach that effectively improves the performance of the object detection algorithm, especially for detecting steel surface defects. The main contributions of this paper are listed as follows:

We propose an enhanced NMS approach that consists of three novel techniques, namely IoU regularization, threshold adjustment, and comparison rule modification, to enhance the detection performance by strictly suppressing false positives.
We carry out extensive experiments on the popular steel surface defect datasets NEU-DET and GC10-DET to evaluate the performance of the proposed NMS method with the YOLOv7 and EfficientDet models and to compare it with the conventional NMS approaches. The experimental results demonstrate that the proposed method outperforms the comparable NMS approaches both quantitatively and qualitatively.

1.3. Organization

The remainder of the paper is organized as follows. Section 2 introduces the framework of the proposed NMS algorithm. Section 3 presents the experimental results and discussions, and finally, Section 4 concludes the paper with future research directions.

2. Proposed Scheme

In this section, we describe the proposed NMS approach. First, to highlight the difference between the traditional and the proposed NMS algorithms, we briefly review the procedure of the traditional NMS (the greedy NMS), which can be written as follows:

$s_{i} = \begin{cases} s_{i}, & \text{if } \mathrm{IoU}(M, b_{i}) < N_{t}, \\ 0, & \text{if } \mathrm{IoU}(M, b_{i}) \geq N_{t}, \end{cases}$

(1)

for

$\mathrm{IoU}(M, b_{i}) = \frac{M \cap b_{i}}{M \cup b_{i}},$

(2)

where $b_{i}$ and $s_{i}$ are the $i$th BBox and its corresponding confidence score, and $N_{t}$ is the IoU threshold. In addition, $M$ denotes the BBox with the highest confidence score in $B$, i.e., $M = b_{m}$ where $m = \arg\max_{i} s_{i}$. As shown in Equation (1), if the overlap ratio between $M$ and $b_{i}$ reaches the threshold $N_{t}$, the BBox $b_{i}$ is suppressed, i.e., its confidence score is set to zero.

Next, we present the proposed NMS algorithm. In this work, to enhance the performance of steel surface defect detection, we focus on eliminating false positives in the predicted BBoxes. To this end, we modify the traditional NMS to strictly suppress false positives. The proposed modification is three-fold: (1) IoU regularization, (2) threshold adjustment, and (3) comparison rule modification. The key components of the proposed NMS are described as follows.

2.1. IoU Regularization

Firstly, in the proposed NMS method, a novel regularized IoU is used instead of the original IoU. The regularized IoU between two BBoxes $M$ and $b_{i}$ is characterized by a regularizing parameter $\lambda$ as follows:

$\mathrm{rIoU}(M, b_{i}; \lambda) = \frac{\frac{b_{L} \cap b_{S}}{2}}{\lambda b_{L} + (1 - \lambda) b_{S} - \frac{b_{L} \cap b_{S}}{2}},$

(3)

for $0 < \lambda < \frac{A_{S}}{A_{L} + A_{S}}$,

where $b_{L}$ (resp., $b_{S}$) denotes the larger (resp., smaller) box of $M$ and $b_{i}$, and $A_{L}$ (resp., $A_{S}$) denotes the area of the larger (resp., smaller) box. As in Equation (2), the box symbols inside the fraction stand for areas.

The rationale is as follows. Suppose that the original IoU is adopted and the IoU threshold is 0.5. If $M$ and $b_{i}$ have the same size, the two boxes must overlap by at least two-thirds of their area to suppress the less confident box, $b_{i}$. However, if one box is twice as large as the other, the smaller box must lie completely inside the larger one for $b_{i}$ to be removed. That is, by regularizing the sizes of the two compared boxes, more false positives can be suppressed.
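The arithmetic behind this rationale can be checked directly from box areas. The short sketch below is illustrative only. It uses unit-area boxes for the equal-size case and a 2:1 area ratio for the unequal case, with λ = 0.1, which satisfies λ < A_S/(A_L + A_S) = 1/3 here. These numbers are our own choices, not the values in Table 2.

```python
def iou_from_areas(inter: float, area_a: float, area_b: float) -> float:
    """Standard IoU, Eq. (2), from an intersection area and the two box areas."""
    return inter / (area_a + area_b - inter)


def riou_from_areas(inter: float, area_a: float, area_b: float, lam: float) -> float:
    """Regularized IoU, Eq. (3): the intersection is halved and the areas are weighted by lam."""
    a_l, a_s = max(area_a, area_b), min(area_a, area_b)
    half = inter / 2.0
    return half / (lam * a_l + (1.0 - lam) * a_s - half)


# Equal sizes: an overlap of two-thirds of each box gives IoU = 0.5.
print(iou_from_areas(2 / 3, 1.0, 1.0))           # about 0.5

# 2:1 sizes: IoU reaches 0.5 only when the smaller box lies fully inside the larger.
print(iou_from_areas(1.0, 2.0, 1.0))             # 0.5
print(iou_from_areas(0.8, 2.0, 1.0))             # about 0.364, not suppressed at N_t = 0.5

# The regularized IoU of the same partially overlapping pair exceeds 0.5.
print(riou_from_areas(0.8, 2.0, 1.0, lam=0.1))   # about 0.571, suppressed at N_t = 0.5
```

The regularized IoU therefore flags the partially overlapping 2:1 pair as redundant at N_t = 0.5, whereas the standard IoU does not.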

As Equation (3) shows, the parameter $\lambda$ regularizes the sizes of $b_{L}$ and $b_{S}$. The parameter $\lambda$ is set to a value less than $\frac{A_{S}}{A_{L} + A_{S}}$ to keep the order in the size of the box. Further, as defined in Equation (3), the intersection area is halved to match the size; this choice was refined through extensive experiments. Halving the intersection makes it possible to suppress more false positives even when the sizes of the compared BBoxes differ greatly.
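Under the stated bound on λ, a short derivation shows why the regularized IoU never suppresses fewer boxes than the standard IoU. This derivation is our own reading of Equation (3), not a result stated in this paper. Write $I = b_L \cap b_S$ for the intersection area, so that $0 \le I \le A_S \le A_L$, and read the box symbols in Equation (3) as areas.

```latex
\mathrm{rIoU}(M, b_i; \lambda)
  = \frac{I/2}{\lambda A_L + (1-\lambda) A_S - I/2}
  = \frac{I}{2\lambda A_L + 2(1-\lambda) A_S - I}

% Equal sizes (A_L = A_S = A): the lambda terms cancel
2\lambda A + 2(1-\lambda) A - I = 2A - I
\quad\Longrightarrow\quad
\mathrm{rIoU}(M, b_i; \lambda) = \frac{I}{2A - I} = \mathrm{IoU}(M, b_i)

% Unequal sizes: use A_L - A_S \ge 0 and \lambda < A_S / (A_L + A_S)
2\lambda A_L + 2(1-\lambda) A_S = 2A_S + 2\lambda (A_L - A_S)
  \le \frac{4 A_L A_S}{A_L + A_S}
  = A_L + A_S - \frac{(A_L - A_S)^2}{A_L + A_S}
  \le A_L + A_S

% Hence the rIoU denominator is at most the IoU denominator A_L + A_S - I
% (and at least 2A_S - I \ge A_S > 0), so rIoU(M, b_i; \lambda) \ge IoU(M, b_i).
```

Consequently, at a fixed threshold $N_t$, every box that Equation (1) suppresses with the standard IoU is also suppressed with the regularized IoU. For equal-size boxes, the two measures coincide.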

2.2. Threshold Adjustment

Secondly, the proposed NMS method exploits the confidence scores of BBoxes to further suppress false positives by adjusting the IoU threshold. As shown in Equation (1), the greedy NMS adopts a constant IoU threshold regardless of the confidence scores of the compared boxes. In contrast, the proposed NMS method adjusts the IoU threshold according to the confidence scores of the compared boxes.

The proposed IoU threshold for the boxes $M$ and $b_{i}$ is given by

$N_{t}(M, b_{i}) = N_{t} \times [1 - (s_{m} - s_{b_{i}})],$

(4)

where $N_{t}$ is the base IoU threshold. From the definition of the index $m$, i.e., $m = \arg\max_{i} s_{i}$, it always holds that $s_{m} \geq s_{b_{i}}$. Thus, the adjusted IoU threshold is the base threshold scaled down by the factor $1 - (s_{m} - s_{b_{i}})$, so it decreases as the gap between the confidence scores grows. Consequently, boxes with low confidence scores can be suppressed even if their IoU with the most confident box is slightly less than the base IoU threshold.
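Equation (4) is simple enough to evaluate directly. The sketch below shows how a large score gap lowers the effective threshold. The function name and the scores are hypothetical and chosen only for illustration.

```python
def adjusted_threshold(n_t: float, s_m: float, s_i: float) -> float:
    """Score-adjusted IoU threshold of Eq. (4): N_t * [1 - (s_m - s_i)].

    Assumes confidence scores in [0, 1] and s_m >= s_i, so the result
    lies between 0 and the base threshold n_t.
    """
    if s_i > s_m:
        raise ValueError("s_m must be the score of the most confident box")
    return n_t * (1.0 - (s_m - s_i))


# A confident neighbour keeps nearly the full threshold ...
print(adjusted_threshold(0.5, 0.90, 0.85))  # about 0.475
# ... while a weak neighbour is suppressed at a much lower overlap.
print(adjusted_threshold(0.5, 0.90, 0.30))  # about 0.2
```

For example, take N_t = 0.5 and a box scored 0.30 whose IoU with a box scored 0.90 is 0.3. The greedy NMS keeps this box. Under Equation (4) it is suppressed, because 0.3 exceeds the adjusted threshold of about 0.2.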

2.3. Comparison Rule Modification

In the greedy NMS method, when the IoU between $M$ and $b_{i}$ exceeds the IoU threshold $N_{t}$, the box $b_{i}$ is removed from $B$. Once $b_{i}$ is suppressed, it can no longer be compared with the other BBoxes in $B$. However, an IoU between $M$ and $b_{i}$ above the IoU threshold indicates that $b_{i}$ is most likely also a correct prediction. Therefore, in the proposed NMS, instead of immediately discarding $b_{i}$, we temporarily retain the suppressed box $b_{i}$. We then use it to further suppress any other box whose IoU with $b_{i}$ exceeds the IoU threshold. The detailed comparison rule is summarized in Algorithm 1 below.

Algorithm 1: Modified comparison rule

First, initialize the temporary sets $T_{b}$ and $T_{s}$ as empty sets, and set the list of final detection boxes to $D = B$ (lines 7 and 8). Next, find the most confident box $M$ and remove it from $B$ (lines 10–12). Then, check whether the IoU between $M$ and each box $b_{i}$ in $B$ exceeds the IoU threshold. If so, instead of deleting $b_{i}$ from $B$ as in the original NMS, copy $b_{i}$ and $s_{i}$ into the temporary sets $T_{b}$ and $T_{s}$, respectively. Repeat lines 10–18 until $B$ becomes empty. Finally, remove the elements of $T_{b}$ (resp., $T_{s}$) from $D$ (resp., $S$). The resulting sets $D$ and $S$ are the set of final detection boxes and the set of the corresponding confidence scores, respectively (lines 20 and 21).

Note that by replacing $\mathrm{IoU}(M, b_{i})$ with $\mathrm{rIoU}(M, b_{i}; \lambda)$ in Equation (3) and $N_{t}$ with $N_{t}(M, b_{i})$ in Equation (4), Algorithm 1 readily extends to the entire procedure of the proposed NMS scheme.
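Putting the three components together, the following is a minimal single-class Python sketch of our reading of Algorithm 1 with these substitutions. It is not the authors' code. The following are all our assumptions:

- the (x1, y1, x2, y2) box format;
- the function and flag names;
- the use of `>=`, which matches Equation (1);
- passing one fixed λ, as in Table 2, without enforcing the pairwise bound of Equation (3).

In practice, the sketch would be run per class after applying the confidence score threshold.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return iw * ih


def overlap(a: Box, b: Box, lam: Optional[float]) -> float:
    """Standard IoU (Eq. (2)) if lam is None, else the regularized IoU (Eq. (3))."""
    inter = intersection(a, b)
    if lam is None:
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0
    # A fixed lam is used; the pairwise bound lam < A_S / (A_L + A_S) is not enforced.
    a_l, a_s = max(area(a), area(b)), min(area(a), area(b))
    denom = lam * a_l + (1.0 - lam) * a_s - inter / 2.0
    return (inter / 2.0) / denom if denom > 0 else 0.0


def proposed_nms(boxes: Sequence[Box], scores: Sequence[float], n_t: float = 0.5,
                 lam: Optional[float] = None, adjust_threshold: bool = True,
                 modified_rule: bool = True) -> List[int]:
    """Single-class sketch of Algorithm 1 with the optional substitutions.

    With lam=None, adjust_threshold=False and modified_rule=False this
    reduces to the greedy NMS.
    """
    pool = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # B
    detections = list(pool)  # D = B
    suppressed = set()       # T_b (T_s is implied by the same indices)
    while pool:
        m = pool.pop(0)      # most confident remaining box M
        next_pool = []
        for i in pool:
            thr = n_t * (1.0 - (scores[m] - scores[i])) if adjust_threshold else n_t
            if overlap(boxes[m], boxes[i], lam) >= thr:
                suppressed.add(i)
                if modified_rule:
                    next_pool.append(i)  # retain b_i so it can still suppress others
            else:
                next_pool.append(i)
        pool = next_pool
    return [i for i in detections if i not in suppressed]


boxes = [(0, 0, 10, 10), (3, 0, 13, 10), (6, 0, 16, 10)]
scores = [0.9, 0.8, 0.7]
print(proposed_nms(boxes, scores, adjust_threshold=False, modified_rule=False))  # [0, 2] (greedy)
print(proposed_nms(boxes, scores, adjust_threshold=False))                       # [0]
print(proposed_nms(boxes, scores, lam=0.1))                                      # [0]
```

In the example, the third box overlaps only the second box (IoU of about 0.54), so the greedy rule keeps it. Keeping the suppressed second box in the pool lets the modified comparison rule remove the third box as well.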

3. Experimental Results and Discussions

In this section, we examine the performance of the proposed NMS and compare it with the conventional NMS approaches, i.e., the greedy NMS [29], DIoU NMS [32], Soft NMS [31], Exponential NMS [33], WBF [30], and Confluence NMS [34]. To this end, we evaluate these NMS methods on steel surface defect datasets using the well-known object detection models YOLOv7 and EfficientDet. The details of the datasets and the hyperparameters of the models are described in the following subsections.

3.1. Dataset Description

In this paper, we adopt two popular steel surface defect datasets to cross-validate the performance of the proposed NMS. The first is the NEU-DET dataset [36] from Northeastern University, which is the most popular steel surface defect dataset. The NEU-DET dataset contains 1800 gray-scale images in six classes (i.e., 300 images per class, each with $200 \times 200$ resolution). Figure 2 illustrates examples of the six classes of the NEU-DET dataset: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. Of the 1800 images, 1260, 180, and 360 are used for training, validation, and testing, respectively.

The second dataset is the GC10-DET steel surface defect dataset [37] from Tianjin University. The GC10-DET dataset contains 2300 gray-scale images in ten classes. Unlike the NEU-DET dataset, the GC10-DET dataset is imbalanced, i.e., the defect classes have different numbers of images. Figure 3 illustrates examples of the ten classes of the GC10-DET dataset: punching, weld line, crescent gap, water spot, oil spot, silk spot, inclusion, rolled pit, crease, and waist folding. Of the 2300 images, 1610, 230, and 460 are used for training, validation, and testing, respectively.
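Both datasets are split 70%/10%/20%: 1260/180/360 images for NEU-DET and 1610/230/460 for GC10-DET. The paper does not say how images were assigned to each subset. The seeded random split below is therefore only an assumed, reproducible way to obtain the same counts, and the function name is hypothetical. A per-class (stratified) split would be an equally plausible assumption for the imbalanced GC10-DET.

```python
import random
from typing import List, Sequence, Tuple


def split_70_10_20(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into 70% / 10% / 20% subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = n * 7 // 10, n // 10  # integer arithmetic avoids rounding drift
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


for name, n in [("NEU-DET", 1800), ("GC10-DET", 2300)]:
    train, val, test = split_70_10_20([f"img_{i:04d}" for i in range(n)])
    print(name, len(train), len(val), len(test))
# NEU-DET 1260 180 360
# GC10-DET 1610 230 460
```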

3.2. Hyperparameters for Models and NMS Schemes

In this work, we conduct experiments on two well-known real-time object detection models, YOLOv7 [16] and EfficientDet [38], with several BoF (Bag of Freebies) and BoS (Bag of Specials) techniques. Specifically, the YOLOv7 model adopts the mosaic data augmentation technique, while the EfficientDet model uses the cosine annealing scheduler [39]. Furthermore, for EfficientDet, the D0 variant is employed in consideration of the dataset size. The model details, including the number of parameters, fine-tuned hyperparameters, augmentation methods, BoF, and BoS, are summarized in Table 1.

The IoU threshold and the confidence score threshold for all NMS schemes are set to 0.5 and 0.25, respectively. Table 2 summarizes the remaining parameters for each pair of object detection model and dataset: the Gaussian penalty parameter $\sigma$ in Soft NMS and Confluence NMS, the exponent parameter $n$ in Exponential NMS, the confluence threshold $C_{t}$ in Confluence NMS, and the regularizing parameter $\lambda$ in the proposed NMS.

3.3. Performance Evaluation Metric

In this paper, we use recall, precision, and F1-score as evaluation metrics to verify the performance of the proposed NMS approach. For a more thorough comparison, each of the three metrics is computed as both a micro-average and a macro-average. The micro-average recall, precision, and F1-score are defined in Equations (5)–(7).

Micro-Average Recall

$\mathrm{Micro}_{R} = \frac{\sum_{i = 1}^{N} TP_{i}}{\sum_{i = 1}^{N} TP_{i} + \sum_{i = 1}^{N} FN_{i}},$

(5)

Micro-Average Precision

$\mathrm{Micro}_{P} = \frac{\sum_{i = 1}^{N} TP_{i}}{\sum_{i = 1}^{N} TP_{i} + \sum_{i = 1}^{N} FP_{i}},$

(6)

Micro-Average F1-score

$\mathrm{Micro}_{F_{1}} = 2 \times \frac{\mathrm{Micro}_{R} \times \mathrm{Micro}_{P}}{\mathrm{Micro}_{R} + \mathrm{Micro}_{P}},$

(7)

where $N$ is the number of classes in the dataset (e.g., $N = 6$ for the NEU-DET dataset and $N = 10$ for the GC10-DET dataset). $TP_{i}$ (resp., $FP_{i}$) denotes the number of true positives (resp., false positives) in class $i$, i.e., boxes predicted as class $i$ whose IoU with the ground truth exceeds (resp., falls below) 0.5. In addition, $FN_{i}$ denotes the number of false negatives in class $i$, i.e., the objects of class $i$ that the model fails to detect.
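Equations (5)–(7) can be computed from per-class counts as sketched below. The counts are made up for illustration. Returning 0 when a denominator is zero is our assumption, since the paper does not specify that case.

```python
from typing import Sequence, Tuple


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 for a zero denominator (assumed convention)."""
    return num / den if den else 0.0


def micro_metrics(tp: Sequence[int], fp: Sequence[int], fn: Sequence[int]) -> Tuple[float, float, float]:
    """Micro-average recall, precision, and F1-score, Eqs. (5)-(7)."""
    total_tp, total_fp, total_fn = sum(tp), sum(fp), sum(fn)
    recall = safe_div(total_tp, total_tp + total_fn)            # Eq. (5)
    precision = safe_div(total_tp, total_tp + total_fp)         # Eq. (6)
    f1 = safe_div(2 * recall * precision, recall + precision)   # Eq. (7)
    return recall, precision, f1


# Two hypothetical classes: counts are pooled over classes before dividing.
print(micro_metrics(tp=[8, 5], fp=[2, 5], fn=[2, 0]))  # about (0.867, 0.650, 0.743)
```

Because the counts are pooled first, classes with many defects dominate the micro-averaged scores.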

Similarly, the macro-average recall, macro-average precision, and macro-average F1-score metrics can be mathematically formulated as in Equations (8)–(10) shown below.

Macro-Average Recall

$\mathrm{Macro}_{R} = \frac{1}{N} \sum_{i = 1}^{N} \frac{TP_{i}}{TP_{i} + FN_{i}},$

(8)

Macro-Average Precision

$\mathrm{Macro}_{P} = \frac{1}{N} \sum_{i = 1}^{N} \frac{TP_{i}}{TP_{i} + FP_{i}},$

(9)

Macro-Average F1-score

$\mathrm{Macro}_{F_{1}} = 2 \times \frac{\mathrm{Macro}_{R} \times \mathrm{Macro}_{P}}{\mathrm{Macro}_{R} + \mathrm{Macro}_{P}},$

(10)
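The macro-averaged versions average per-class ratios instead of pooling counts, so each class carries equal weight regardless of how many defects it contains. This matters for the imbalanced GC10-DET. The sketch below uses hypothetical counts for two classes, and returning 0 for a zero denominator is our assumption.

```python
from typing import Sequence, Tuple


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 for a zero denominator (assumed convention)."""
    return num / den if den else 0.0


def macro_metrics(tp: Sequence[int], fp: Sequence[int], fn: Sequence[int]) -> Tuple[float, float, float]:
    """Macro-average recall, precision, and F1-score, Eqs. (8)-(10)."""
    n = len(tp)
    recall = sum(safe_div(t, t + f) for t, f in zip(tp, fn)) / n     # Eq. (8)
    precision = sum(safe_div(t, t + f) for t, f in zip(tp, fp)) / n  # Eq. (9)
    f1 = safe_div(2 * recall * precision, recall + precision)        # Eq. (10)
    return recall, precision, f1


print(macro_metrics(tp=[8, 5], fp=[2, 5], fn=[2, 0]))  # about (0.900, 0.650, 0.755)
```

Equation (10) takes the harmonic mean of the macro-averaged recall and precision. This is not the same as averaging per-class F1-scores. For these counts, the per-class F1-scores are 0.8 and about 0.667, whose mean is about 0.733.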

3.4. Results and Discussions

3.4.1. Quantitative Result

In this section, we present the evaluation metrics obtained on the NEU-DET and GC10-DET datasets by incorporating the NMS approaches (the greedy NMS, DIoU NMS, Soft NMS, Exponential NMS, WBF, Confluence NMS, and the proposed NMS) into the YOLOv7 and EfficientDet models.

The micro- and macro-average F1-score metric results on the NEU-DET dataset using the YOLOv7 model are presented in Table 3. It is observed in Table 3 that the proposed NMS approach achieves performance improvements of 2.24%, 2.61%, 1.66%, 1.40%, 1.19%, and 4.57% when compared to the greedy NMS, DIoU NMS, Soft NMS, Exponential NMS, WBF, and Confluence NMS, respectively, in terms of the micro-average F1-score. In addition, regarding the macro-average F1-score, the proposed NMS approach achieves performance improvements of 2.60%, 2.96%, 1.36%, 1.61%, 1.58%, and 5.25% when compared, respectively, to the greedy NMS, DIoU NMS, Soft NMS, Exponential NMS, WBF, and Confluence NMS.

Table 3 also shows similar results for the EfficientDet model on the NEU-DET dataset and for the YOLOv7 model on the GC10-DET dataset. Specifically, with the EfficientDet model, the proposed NMS approach achieves performance improvements of up to approximately 13% and 12% in terms of the micro- and macro-average F1-scores, respectively, on the NEU-DET dataset. In addition, with the YOLOv7 model, the proposed NMS achieves performance improvements of up to approximately 6% and 5% in terms of the micro- and macro-average F1-scores, respectively, on the GC10-DET dataset. Thus, the proposed NMS consistently outperforms the existing NMS techniques in terms of both the micro- and macro-average F1-scores when integrated with either the YOLOv7 or the EfficientDet object detector.

3.4.2. Qualitative Result

In this section, we analyze the qualitative performance of the proposed NMS against the existing NMS approaches. The detection images obtained by the YOLOv7 and EfficientDet models with each NMS approach are shown in Figure 4, Figure 5, Figure 6 and Figure 7. Figure 4 presents the results for the Scratches class obtained by YOLOv7. As Figure 4 shows, the predictions of the proposed NMS are the most similar to the ground truth, whereas the predictions of the greedy NMS, DIoU NMS, Soft NMS, and Exponential NMS are inaccurate. The detection images of the Patches class obtained by YOLOv7, presented in Figure 5, likewise show that the results of the proposed NMS method are the most similar to the ground truth among all NMS approaches.

Similar observations can be made in Figure 6 and Figure 7, which present the detection images obtained by EfficientDet with various NMS approaches for the Pitted_surface and Scratches classes. As shown in Figure 6, the greedy NMS and DIoU NMS leave more false positives (four BBoxes) than the other NMS approaches. Further, the Soft NMS and Exponential NMS approaches also fail to match the ground truth. In contrast, the proposed NMS reproduces the ground truth by suppressing all false positives. The detection results for the Scratches class in Figure 7 also show that the greedy NMS, DIoU NMS, Soft NMS, and Exponential NMS fail to produce predictions similar to the ground truth. The proposed method, in contrast, suppresses false positives well and yields results similar to the ground truth.

The qualitative evaluation shows that, for both YOLOv7 and EfficientDet, the proposed method produces the results most similar to the ground truth by effectively suppressing false positives.

3.5. Ablation Study

The proposed NMS approach consists of three strategies: IoU regularization (IR), threshold adjustment (TA), and comparison rule modification (CRM). To evaluate the efficacy of each strategy, we conducted an ablation study in several setups using the YOLOv7 and EfficientDet models on the NEU-DET dataset. First, we applied each strategy of the proposed NMS alone and measured the detection performance of the model. Next, we applied combinations of two strategies and assessed the performance. The results in Table 4 show that each of the three proposed strategies improves the micro- and macro-average precision but degrades the micro- and macro-average recall. This is because the proposed strategies suppress BBoxes strictly, which may reduce the number of true positives. However, since the gain in precision is greater than the loss in recall, the micro- and macro-average F1-scores increase substantially for each strategy. Similar trends are observed for the combinations of two strategies and for all three strategies together. With YOLOv7, the best micro- and macro-average F1-scores are obtained by the full proposed NMS scheme. With EfficientDet, the best results are obtained by the proposed NMS without comparison rule modification (i.e., IR + TA).
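The ablation covers every non-empty combination of the three strategies: three single strategies, three pairs, and the full scheme. The enumeration below is a small illustrative helper. Its flag names are hypothetical and would need to be wired to an actual NMS implementation.

```python
from itertools import combinations
from typing import Dict, List

# IoU regularization, threshold adjustment, comparison rule modification
STRATEGIES = ("IR", "TA", "CRM")


def ablation_settings() -> List[Dict[str, bool]]:
    """All non-empty subsets of the strategies as on/off flag dictionaries."""
    settings = []
    for k in range(1, len(STRATEGIES) + 1):
        for combo in combinations(STRATEGIES, k):
            settings.append({name: name in combo for name in STRATEGIES})
    return settings


for flags in ablation_settings():
    print("+".join(name for name, on in flags.items() if on))
# Prints IR, TA, CRM, IR+TA, IR+CRM, TA+CRM, IR+TA+CRM, one per line
```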

4. Conclusions and Discussions

This paper proposed a novel NMS approach to enhance object detection for identifying steel surface defects. The proposed NMS approach introduces three modifications: IoU regularization, threshold adjustment, and comparison rule modification. The experiments were carried out by incorporating the NMS approaches into the YOLOv7 and EfficientDet models on the NEU-DET and GC10-DET datasets, two well-known steel surface defect detection datasets. The proposed NMS outperformed the existing NMS approaches in terms of the micro- and macro-average F1-scores with both the YOLOv7 and EfficientDet models. Furthermore, qualitatively, the proposed NMS produced detections closer to the ground truth than the other NMS approaches.

The main reason for the enhanced performance of the proposed NMS is that it suppresses more false positive BBoxes. However, this may also reduce the number of true positives. In future research, we will therefore improve the proposed NMS so that it suppresses more false positive BBoxes while preserving true positive BBoxes. We will also develop a theoretical analysis of the proposed NMS.

Author Contributions

Formal analysis, S.-H.K.; Writing—original draft, S.-H.K. and J.-M.K.; Writing—review & editing, V.P., I.-M.K. and S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported, in part, by the National Research Foundation of Korea (NRF) grant no. 2021R1G1A1094982 funded by the Korean government (MSIT); in part, by the National Research Foundation of Korea (NRF) grant no. 2022R1A4A1033830 funded by the Korean government (MSIT); and, in part, by the Pukyong National University Research Fund in 2021 (CD20210998).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Shu, Y.F.; Li, B.; Li, X.; Xiong, C.; Cao, S.; Wen, X.Y. Deep learning-based fast recognition of commutator surface defects. Measurement 2021, 178, 109324.
Xu, Y.; Li, D.; Xie, Q.; Wu, Q.; Wang, J. Automatic defect detection and segmentation of tunnel surface using modified Mask R-CNN. Measurement 2021, 178, 109316.
Luo, Q.; Fang, X.; Liu, L.; Yang, C.; Sun, Y. Automated visual defect detection for flat steel surface: A survey. IEEE Trans. Instrum. Meas. 2020, 69, 626–644.
Tian, R.; Jia, M. DCC-CenterNet: A rapid detection method for steel surface defects. Measurement 2022, 187, 110211.
He, Y.; Song, K.; Meng, Q.; Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2019, 69, 1493–1504.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Proc. Conf. Neural Inf. Process. Syst. 2012, 60, 84–90.
Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019.
Ren, R.; Hung, T.; Tan, K.C. A generic deep-learning-based approach for automated surface inspection. IEEE Trans. Cybern. 2018, 48, 929–940.
Racki, D.; Tomazevic, D.; Skocaj, D. A compact convolutional neural network for textured surface anomaly detection. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1331–1339.
Mei, S.; Yang, H.; Yin, Z. An unsupervised-learning-based approach for automated defect inspection on textured surfaces. IEEE Trans. Instrum. Meas. 2018, 67, 1266–1277.
Wang, T.; Chen, Y.; Qiao, M.; Snoussi, H. A fast and robust convolutional neural network-based defect detection model in product quality control. Int. J. Adv. Manuf. Technol. 2018, 94, 3465–3471.
Chen, J.; Liu, Z.; Wang, H.; Nunez, A.; Han, Z. Automatic defect detection of fasteners on the catenary support device using deep convolutional neural network. IEEE Trans. Instrum. Meas. 2018, 67, 257–269.
Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
Cui, L.; Jiang, X.; Xu, M.; Li, W.; Lv, P.; Zhou, B. SDDNet: A fast and accurate network for surface defect detection. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
Yeung, C.C.; Lam, K.M. Efficient fused-attention model for steel surface defect detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
Li, M.; Wang, H.; Wan, Z. Surface defect detection of steel strips based on improved YOLOv4. Comput. Electr. Eng. 2022, 102, 108208.
Qian, X.; Wang, X.; Yang, S.; Lei, J. LFF-YOLO: A YOLO Algorithm with Lightweight Feature Fusion Network for Multi-Scale Defect Detection. IEEE Access 2022, 10, 130339–130349.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
Rosenfeld, A.; Thurston, M. Edge and curve detection for visual scene analysis. IEEE Trans. Comput. 1971, 100, 562–569.
Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
Mikolajczyk, K.; Schmid, C. Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 2004, 60, 63–86.
Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. I-511–I-518.
Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
Solovyev, R.; Wang, W.; Gabruseva, T. Weighted boxes fusion: Ensembling boxes from different object detection models. Image Vis. Comput. 2021, 107, 104117.
Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS–improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
Kang, S.; Shin, Y.; Lee, S.; Park, J.; Kang, J. Exponential NMS: An Effective Bounding Box Overlap Suppression Technique for Improving False Positive Performance of Metal Appearance. J. Korea Inst. Intell. Syst. 2022, 32, 464–472.
Shepley, A.J.; Falzon, G.; Kwan, P.; Brankovic, L. Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11561–11574.
Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision-ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
Lv, X.; Duan, F.; Jiang, J.J.; Fu, X.; Gan, L. Deep Metallic Surface Defect Detection: The New Benchmark and Detection Network. Sensors 2020, 20, 1562.
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983.

Figure 1. Ground-truth image examples of MS-COCO and NEU-DET datasets.

Figure 2. Examples of NEU-DET images for each class.

Figure 3. Examples of GC10-DET images for each class.

Figure 4. Detection examples of Scratches class from NEU-DET using YOLOv7: (a) Greedy NMS, (b) DIoU NMS, (c) Soft NMS, (d) Exponential NMS, (e) WBF, (f) Confluence NMS, (g) Proposed Method, and (h) Ground Truth. The proposed result (g) is the most similar to the Ground Truth (h).

Figure 5. Detection examples of Patches class from NEU-DET using YOLOv7: (a) Greedy NMS, (b) DIoU NMS, (c) Soft NMS, (d) Exponential NMS, (e) WBF, (f) Confluence NMS, (g) Proposed Method, and (h) Ground Truth. The proposed result (g) is the most similar to the Ground Truth (h).

Figure 6. Detection examples of Pitted_Surface class from NEU-DET using EfficientDet: (a) Greedy NMS, (b) DIoU NMS, (c) Soft NMS, (d) Exponential NMS, (e) WBF, (f) Confluence NMS, (g) Proposed Method, and (h) Ground Truth. The proposed result (g) is the most similar to the Ground Truth (h).

Figure 7. Detection examples of Scratches class from NEU-DET using EfficientDet: (a) Greedy NMS, (b) DIoU NMS, (c) Soft NMS, (d) Exponential NMS, (e) WBF, (f) Confluence NMS, (g) Proposed Method, and (h) Ground Truth. The proposed result (g) is the most similar to the Ground Truth (h).

Table 1. The hyperparameters of each object detection model.

	YOLOv7	EfficientDet
Data augmentation	Random horizontal flip, random translation	Random horizontal flip
Number of parameters	36.6 M	3.9 M
Learning rate	0.005	0.0001
Batch size	8	8
Optimizer	SGD	Adam
Epochs	130	120
Warm-up epochs	5	5
BoF/BoS (bag of freebies/bag of specials)	Mosaic data augmentation	Cosine annealing scheduler
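Table 1 lists five warm-up epochs and a cosine annealing scheduler for EfficientDet (learning rate 0.0001, 120 epochs). The table does not give the warm-up shape or the final learning rate, so the Python sketch below assumes a linear warm-up followed by a single cosine decay to zero. The SGDR schedule of Loshchilov and Hutter in the reference list additionally uses warm restarts, which this sketch omits. The function name and its defaults are illustrative.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float = 1e-4, total_epochs: int = 120,
                warmup_epochs: int = 5, min_lr: float = 0.0) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then one cosine decay.

    Defaults follow the EfficientDet column of Table 1. Assumed, not stated in
    the table: the warm-up is linear, the cosine phase ends at min_lr, and
    there are no warm restarts.
    """
    if epoch < warmup_epochs:
        # Ramp linearly up to base_lr over the warm-up epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 4, 5, 62, 119):
        print(e, f"{lr_at_epoch(e):.2e}")
```

In a training loop, the value returned for each epoch would be written into the optimizer's learning rate before that epoch starts.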

Table 2. The hyperparameters of each NMS method.

Model (Dataset)	Soft NMS	Exponential NMS	Confluence NMS	Proposed NMS
YOLOv7 (NEU-DET)	$σ = 0.7$	$n = 1.1$	$σ = 0.4$ , $C_{t} = 0.5$	$λ = 0.4$
EfficientDet (NEU-DET)	$σ = 0.8$	$n = 1.3$	$σ = 0.2$ , $C_{t} = 0.5$	$λ = 0.4$
YOLOv7 (GC10-DET)	$σ = 0.7$	$n = 1.1$	$σ = 0.5$ , $C_{t} = 0.5$	$λ = 0.4$
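The σ listed for Soft NMS in Table 2 sets how strongly overlapping boxes are penalized. Assuming the Gaussian variant of Soft-NMS from Bodla et al. (in the reference list), each remaining box's score is multiplied by exp(-IoU²/σ) with respect to the box just selected, so a larger σ decays scores more gently. The Python sketch below implements that rule. The final cut-off `score_thr`, the example boxes, and all names are hypothetical. The σ listed for Confluence NMS in the same table is a different parameter and is not covered here.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gaussian_soft_nms(boxes: List[Box], scores: List[float], sigma: float = 0.7,
                      score_thr: float = 0.001) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS: decay, rather than delete, boxes that overlap the pick.

    Returns (box index, rescored confidence) pairs in selection order.
    score_thr is a hypothetical cut-off for discarding heavily decayed boxes.
    """
    remaining: Dict[int, float] = dict(enumerate(scores))
    selected: List[Tuple[int, float]] = []
    while remaining:
        m = max(remaining, key=remaining.__getitem__)  # current top-scoring box
        s_m = remaining.pop(m)
        if s_m < score_thr:  # every other remaining score is lower still
            break
        selected.append((m, s_m))
        for i in remaining:
            # Penalty exp(-IoU^2 / sigma): a larger sigma means gentler decay.
            remaining[i] *= math.exp(-iou(boxes[m], boxes[i]) ** 2 / sigma)
    return selected


# Hypothetical detections: two near-duplicates and one isolated box.
BOXES: List[Box] = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
SCORES = [0.9, 0.8, 0.5]

if __name__ == "__main__":
    for idx, score in gaussian_soft_nms(BOXES, SCORES, sigma=0.7):
        print(idx, round(score, 4))
```

Unlike greedy NMS, the near-duplicate box 1 is not removed. Its score drops from 0.8 to about 0.31, so it is ranked below the isolated box 2.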

Table 3. Results using YOLOv7 and EfficientDet with various NMS methods.

Models (Datasets)	Metrics	Greedy NMS	DIoU NMS	Soft NMS	Exponential NMS	WBF	Confluence NMS	Proposed
YOLOv7 (NEU-DET)	${Micro}_{F_{1}}$	0.6642	0.6618	0.6680	0.6697	0.6711	0.6494	0.6791
YOLOv7 (NEU-DET)	${Macro}_{F_{1}}$	0.6686	0.6663	0.6768	0.6751	0.6753	0.6518	0.6860
EfficientDet (NEU-DET)	${Micro}_{F_{1}}$	0.5947	0.5883	0.6488	0.6566	0.5955	0.5825	0.6604
EfficientDet (NEU-DET)	${Macro}_{F_{1}}$	0.6009	0.5952	0.6491	0.6541	0.6016	0.5908	0.6631
YOLOv7 (GC10-DET)	${Micro}_{F_{1}}$	0.6373	0.6368	0.6346	0.6386	0.6364	0.6032	0.6396
YOLOv7 (GC10-DET)	${Macro}_{F_{1}}$	0.6215	0.6214	0.6141	0.6226	0.6198	0.5938	0.6232

Bold in table means the highest score; in every row, the highest score is obtained by the proposed NMS.
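Tables 3 and 4 report both micro- and macro-average F1-scores. Under the usual definitions (assumed here; Section 3.3 states the paper's own), the micro average pools TP, FP, and FN over all classes before computing a single F1-score, whereas the macro average computes F1 per class and takes the unweighted mean. The macro-average F1-scores in Table 4 are not the harmonic mean of the macro-average precision and recall (for YOLOv7 with none of the strategies, 0.6048 and 0.7568 would give 0.6723 rather than 0.6686), which is consistent with per-class averaging. The Python sketch below uses hypothetical per-class counts and illustrative function names.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN) for one defect class


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1-score from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool TP, FP and FN over all classes, then compute a single F1-score."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return prf(tp, fp, fn)[2]


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted mean over classes."""
    return sum(prf(*c)[2] for c in per_class.values()) / len(per_class)


# Hypothetical counts: one frequent, well-detected class and one rare, noisy class.
COUNTS: Dict[str, Counts] = {"class_a": (90, 10, 10), "class_b": (5, 15, 5)}

if __name__ == "__main__":
    print(f"micro F1 = {micro_f1(COUNTS):.4f}, macro F1 = {macro_f1(COUNTS):.4f}")
```

With these counts, the frequent class dominates the micro average (about 0.826), while the rare, poorly detected class pulls the macro average down (about 0.617).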

Table 4. Results for the ablation study.

Method			YOLOv7						EfficientDet
IR	TA	SRM	${Micro}_{P}$	${Micro}_{R}$	${Micro}_{F_{1}}$	${Macro}_{P}$	${Macro}_{R}$	${Macro}_{F_{1}}$	${Micro}_{P}$	${Micro}_{R}$	${Micro}_{F_{1}}$	${Macro}_{P}$	${Macro}_{R}$	${Macro}_{F_{1}}$
-	-	-	0.5928	0.7552	0.6642	0.6048	0.7568	0.6686	0.4730	0.8009	0.5947	0.4921	0.7987	0.6009
✓	-	-	0.6031	0.7529	0.6697	0.6162	0.7539	0.6743	0.5233	0.7728	0.6241	0.5405	0.7715	0.6280
-	✓	-	0.6192	0.7447	0.6762	0.6385	0.7457	0.6837	0.5741	0.7482	0.6497	0.5910	0.7455	0.6536
-	-	✓	0.6060	0.7529	0.6715	0.6215	0.7545	0.6777	0.4750	0.8009	0.5963	0.4934	0.7987	0.6021
✓	✓	-	0.6257	0.7400	0.6781	0.6428	0.7399	0.6841	0.6024	0.7400	0.6642	0.6152	0.7374	0.6664
✓	-	✓	0.6085	0.7517	0.6726	0.6232	0.7526	0.6780	0.5166	0.7810	0.6219	0.5342	0.7799	0.6268
-	✓	✓	0.6203	0.7423	0.6759	0.6400	0.7425	0.6831	0.5806	0.7459	0.6529	0.5956	0.7431	0.6562
✓	✓	✓	0.6258	0.7423	0.6791	0.6445	0.7426	0.6860	0.5954	0.7412	0.6600	0.6106	0.7384	0.6631

Bold in table means the highest score.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
