Article

Lightweight Target Detection for Coal and Gangue Based on Improved Yolov5s

School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
*
Author to whom correspondence should be addressed.
Processes 2023, 11(4), 1268; https://doi.org/10.3390/pr11041268
Submission received: 1 April 2023 / Revised: 17 April 2023 / Accepted: 18 April 2023 / Published: 19 April 2023
(This article belongs to the Special Issue Process Analysis and Carbon Emission of Mineral Separation Processes)

Abstract

The detection of coal and gangue is an essential part of intelligent sorting. To address the low small-target detection accuracy, high model complexity, and sizeable computational memory consumption of current coal and gangue target detection algorithms, a lightweight coal and gangue detection algorithm based on You Only Look Once version 5s (Yolov5s) is proposed. Firstly, we build a new convolutional block based on the Funnel Rectified Linear Unit (FReLU) activation function and apply it to the original Yolov5s network so that the model adaptively captures local contextual information of the image. Secondly, the neck of the original network is redesigned with an additional small target detection head to achieve multi-scale feature fusion and improve the detection accuracy on small samples. Next, some of the standard convolution modules in the original network are replaced with Depthwise Convolution (DWC) and Ghost Shuffle Convolution (GSC) modules to build a lightweight feature extraction network while preserving detection accuracy. Finally, an efficient channel attention (ECA) module is embedded in the backbone of the lightweight network to facilitate accurate localization of the prediction region by improving the model's interaction with channel features. The importance of each component is demonstrated by ablation experiments and comparative visualization experiments. The experimental results show that the mean average precision (mAP) and the model size of our proposed model reach 0.985 and 4.9 M, respectively. The mAP is improved by 0.6%, and the number of parameters is reduced by 72.76% compared with the original Yolov5s network.
The improved algorithm has higher localization and recognition accuracy while significantly reducing the number of floating-point operations and parameters, reducing the dependence on hardware and providing a reference for deploying automated underground gangue sorting.

1. Introduction

The share of coal energy in China’s primary energy will continue to exceed 50% until 2025 [1]. However, mined gangue accounts for 15–20% of raw coal production [2]. It not only raises the transportation cost of the industry but also reduces the combustion efficiency of raw coal and aggravates ecological pollution [3,4]. Therefore, accurately sorting coal and gangue is essential for efficiently utilizing coal energy. The traditional methods of sorting coal and gangue are mainly manual gangue sorting, heavy-media gangue sorting [5], dynamic sieve jigging [6], dual-energy gamma-ray detection [7], X-ray detection [8], and laser detection [9]. However, these methods have disadvantages such as high cost, low precision, poor environmental friendliness, and harm to human health, which are not conducive to the long-term development of coal mining [10]. The development of computer technology and machine vision has effectively addressed these problems.
With the development of technologies for sorting robots to identify coal and gangue, automatic sorting of coal and gangue is becoming an increasingly popular area of research [11]. Machine vision-based methods for coal and gangue localization and recognition are divided into machine learning and deep learning approaches [12]. Machine learning mainly relies on manually designed image features, while deep learning enables the model to learn image features automatically. Li et al. used the grayscale skewness and texture contrast features of coal and gangue as the input vector of a least squares support vector machine (LS-SVM) to achieve coal and gangue recognition [13]. Dou et al. applied a Relief-based support vector machine (Relief-SVM) to the color, texture, and other features of coal and gangue pictures and constructed an optimal coal and gangue classifier [14]. Wang et al. constructed SVM classification models to identify coal and gangue by extracting their dielectric and geometric features [15]. However, machine learning still faces challenges such as occlusion, intraclass variation, and background clutter [16]. Deep learning-based target detection algorithms can better handle these challenges through effective feature representation and powerful model optimization techniques.
Two mainstream algorithm families have been developed for deep learning-based target detection. One is the two-stage target detection algorithm represented by Faster R-CNN [17]; the other is the single-stage target detection algorithm represented by the Yolo series [18]. Although the accuracy of two-stage detection algorithms is generally higher than that of single-stage ones, single-stage detection algorithms are faster and better suited to real-time use. This has been validated in several studies of coal and gangue detection based on the Yolo algorithms. Li et al. built a deformable convolutional Yolov3 network model based on Yolov3, utilizing deformable convolution, repeated k-means clustering, and data augmentation [19]. Pan et al. improved the Yolov3 network using a spatial pyramid pooling (SPP) network, a squeeze-and-excitation (SE) module, and dilated convolution to speed up recognition while meeting the model accuracy requirements [20]. Liu et al. optimized the image quality and model anchor boxes by adding deeper feature pyramids to the Yolov4 network [4]. Zhang et al. combined mosaic data enhancement, a cosine-annealing learning rate decay strategy, and label smoothing with the Yolov4 network to achieve an mAP of 0.975 [21]. Li et al. combined the Yolov4 target detection algorithm with a hybrid-domain attention mechanism to construct a coal and gangue detection model in which both coal and gangue achieved high recognition accuracy [22]. Yan et al. added a spatial and channel squeeze-and-excitation (scSE) module to the Yolov5 network structure, and the average accuracy of the model reached 0.983 [23]. Therefore, an end-to-end detection algorithm is chosen for coal and gangue detection in this paper. However, due to the limited storage space and processor performance of mobile and embedded devices, the substantial computational consumption of complex networks makes them difficult to deploy on such devices.
This paper proposes a lightweight target detection model based on improved Yolov5s for locating and identifying coal and gangue in images. The algorithm aims to reduce model complexity with guaranteed accuracy and to prepare for engineering deployment. Firstly, based on the Yolov5s network, image context information is extracted adaptively by the visual activation function FReLU. Next, we redesign the neck structure of the network and add a multiscale target detection layer to improve the detection capability for small samples. After this, the DWC and GSC modules replace some convolutional blocks in the original network to construct a lightweight feature extraction network. Finally, the ECA module is embedded in the backbone to further improve the feature extraction capability of the model. As a result, our proposed improved Yolov5s model can achieve better detection performance with limited computational resources. The effectiveness of each improvement is verified through a series of experiments, and the improved algorithm is compared with commonly used lightweight target detection algorithms.
The rest of the paper is organized as follows. Section 2 describes the experimental material used in the paper. The structure of the Yolov5s model is outlined, and an improved approach is proposed. In Section 3, we show the experimental results of the improved model on actual data and analyze the results of the ablation and comparison experiments. Finally, in Section 4, we summarize the conclusions obtained in this paper.

2. Materials and Methods

2.1. Acquisition and Labeling of Coal and Gangue Images

This study used coal and gangue as the research objects, and the collection site was a mining area in Huainan City, Anhui Province. A FLIR E50 thermal imaging camera was used as the image acquisition device. Images of coal and gangue were acquired under the same lighting and background conditions at a distance of 30–40 cm. The particle size of the coal and gangue ranged from 10 to 150 mm. A total of 600 visible photos of coal and gangue were collected, each with a resolution of 2048 × 1536 pixels. In order to improve the detection accuracy, image data were collected not only of single pieces of coal or gangue but also of mixtures of coal and gangue of different sizes. Representative images of coal and gangue obtained in the experiments are shown in Figure 1.
All images in the collected dataset were manually annotated with the LabelImg software, generating annotation text files containing category and coordinate information. The annotated dataset is divided into training and validation sets in a ratio of 4:1 while ensuring no duplicate information between the two groups. The collection information of coal and gangue is shown in Table 1.
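As an illustration of this preprocessing step, the sketch below parses one YOLO-format annotation row (class index plus normalized center/width/height, the format LabelImg exports for Yolo training) into pixel coordinates, and performs a reproducible 4:1 split. The function names and the seed are illustrative assumptions, not taken from the paper's code.

```python
import random

def parse_yolo_label(line, img_w, img_h):
    """Parse one YOLO-format annotation row ("class cx cy w h", with the
    box values normalized to [0, 1]) into a class index and pixel corners."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return int(cls), (x1, y1, x2, y2)

def split_dataset(names, ratio=0.8, seed=0):
    """Shuffle image names deterministically and split them 4:1 into
    training and validation sets with no overlap between the groups."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * ratio)
    return names[:cut], names[cut:]
```

For the 600 collected photos, `split_dataset` yields 480 training and 120 validation images.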

2.2. Principle of Yolov5s Network

The Yolov5s network is one of the most widely used detection networks in industrial applications such as production processes [24], autonomous driving [25], and monitoring and safety [26]; it achieves high detection accuracy in target detection as well as good inference speed. The detection principle of the Yolov5s network is shown in Figure 2, and the structure includes the input, backbone network, neck network, and output. The input side defines the input image size and the normalization method. The backbone network processes the input image to extract multi-scale feature maps. It mainly consists of the CBS module, the C3 (Concentrated comprehensive convolution) module, and the SPPF (Spatial Pyramid Pooling-Fast) module [27], which are responsible for feature extraction and feature segmentation, yielding depth features at different scales of the image. In addition, C3 modules can be divided into C3_1 and C3_2 depending on the bottleneck structure [28]. The neck network mainly consists of a feature pyramid network (FPN) [29] and a path aggregation network (PAN) [30], which fuse the features extracted by the backbone network across multiple scales after repeated upsampling and downsampling for use at the output. The output side produces the detection results, including each detection box's position, category, and confidence level.

2.3. Improvements of the Yolov5s

2.3.1. Activation Function

The activation function is one of the main components of deep learning; it enhances the model's expressiveness by introducing nonlinearity into the network. The SiLU activation function in the original network [31] activates each feature point in the input feature map in isolation, without considering the feature information around the current feature point, which leads to a fixed activation region and prevents the model from attending well to the overall information of the image. Therefore, in this paper, the FReLU activation function [32], which is better suited to vision tasks, is used to capture complex two-dimensional feature distributions and model spatial information at the pixel level, thus improving the accuracy of the model. The specific expressions are shown below.
FReLU(x_{c,i,j}) = max(x_{c,i,j}, T(x_{c,i,j}))
T(x_{c,i,j}) = x^ω_{c,i,j} · p^ω_c
where x_{c,i,j} is the input pixel of the nonlinear activation function, and T(·) represents the spatial condition, which is realized by a depthwise-separable convolution and a BN layer. x^ω_{c,i,j} represents the result of the convolution operation over the ω × ω window centered at x_{c,i,j}, and p^ω_c represents the shared weights of the convolution kernel of the same channel in the depthwise-separable convolution. In practice, a two-dimensional convolution is first performed on the neighborhood of the current feature point, and the convolution result is then compared with the current feature point for activation. Therefore, the model has a broader receptive field when performing nonlinear activation.
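The funnel condition can be sketched in a few lines of numpy for a single-channel feature map. This is a minimal illustration only: the real FReLU uses a learned per-channel 3 × 3 depthwise convolution followed by batch normalization, which is omitted here, and the `weights` argument stands in for the learned kernel.

```python
import numpy as np

def frelu(x, weights):
    """FReLU on a single-channel feature map x of shape (H, W): the
    spatial condition T(x) is a 3x3 convolution over each pixel's
    zero-padded neighborhood, and the output is max(x, T(x))."""
    H, W = x.shape
    padded = np.pad(x, 1)                       # zero padding keeps H x W
    t = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            t[i, j] = np.sum(padded[i:i + 3, j:j + 3] * weights)
    return np.maximum(x, t)
```

With an all-zero kernel the condition T(x) vanishes and FReLU reduces to plain ReLU, which illustrates how the funnel condition generalizes the one-dimensional max(x, 0).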

2.3.2. Small Target Detection Head

During coal and gangue transportation, smaller coal blocks are challenging to detect under the influence of large gangue. Therefore, the gangue detection model must be able to detect small targets. In this paper, a small target detection head is added to the Yolov5s network structure and the multilayer features of the network are fused. This operation improves the detection accuracy of coal and gangue by ensuring that the shallow features of the model are recovered while also enabling accurate detection at multiple scales. The steps are as follows: first, the feature map after the 17th layer is upsampled; then the feature maps of layers 21 and 22 are concatenated and fused to obtain a new detection layer at the 160 × 160 scale for identifying small targets.
Since the Yolov5 network is designed for large-scale vision challenges, and the self-made dataset in this paper differs significantly from the COCO dataset, the anchor sizes of the detection boxes are learned automatically as in Yolov5. By automatically learning the anchor sizes, 3 additional sets of anchors are added for the 160 × 160 feature map. The final 12 sets of anchor sizes were (5, 6), (8, 14), (15, 11), (10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119), (116, 90), (156, 198), and (373, 326).
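To illustrate how these learned anchors are used, the sketch below matches a ground-truth box against the anchor list with a shape-ratio criterion, mirroring Yolov5's matching rule (its `anchor_t` hyperparameter defaults to 4.0). The function name and the threshold handling are illustrative assumptions rather than code from the paper.

```python
def match_anchor(box_wh, anchors, thr=4.0):
    """Return the index of the best-fitting anchor for a (w, h) box and
    whether the worst per-dimension ratio stays below the threshold."""
    best_k, best_r = -1, float("inf")
    for k, (aw, ah) in enumerate(anchors):
        w, h = box_wh
        r = max(w / aw, aw / w, h / ah, ah / h)  # worst shape mismatch
        if r < best_r:
            best_k, best_r = k, r
    return best_k, best_r < thr

# the 12 anchor sizes learned in this paper, 3 per detection scale
ANCHORS = [(5, 6), (8, 14), (15, 11), (10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119), (116, 90), (156, 198), (373, 326)]
```

A tiny 6 × 7 box falls on the first anchor of the new 160 × 160 head, while a 150 × 200 box lands on a coarse-scale anchor, showing how the added anchors cover the small-target range.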

2.3.3. Lightweight Operation

Due to the limited computational capacity of available embedded devices, standard target detection models are challenging to deploy because of their complexity and large memory footprint. Therefore, a lightweight target detection algorithm is needed to reduce computational resource consumption while maintaining detection accuracy. For this purpose, the DWC [33] and GSC [34] modules are used in this paper to replace the ordinary convolutions in the model, achieving a lightweight improvement. The specific structures are shown in Figure 3.
The DWC module is essentially a group convolution in the particular case where the number of groups equals the number of input feature map channels. It reduces the model's training parameters and increases the diagonal correlation between adjacent layer filters, making the training less prone to overfitting. The GSC module generates more features from fewer parameters than ordinary convolution, thus reducing the computational cost of ordinary convolution layers. In addition, the channel shuffle in GSC promotes information interaction between the intrinsic feature maps and the ghost feature maps, thus encoding more information and improving the stability of the model.
The computational costs of the ordinary convolution, DWC, and GSC modules are shown in Equations (3)–(5), respectively.
Cost_Conv = h × w × n × d × d × c  (3)
Cost_DWC = h × w × n × d × d  (4)
Cost_GSC = h × w × (n/s) × d × d × c + (s − 1) × h × w × (n/s) × d × d  (5)
where h, w, and n are the height, width, and number of channels of the output feature map, respectively; d is the size of the convolution kernel; c is the number of channels of the input feature map; and n/s is the number of channels of the intrinsic feature maps, where s is the compression ratio between the output and intrinsic channels.
The above calculation shows that, for the same operation, the DWC and GSC modules reduce the computational effort by factors of approximately c and s, respectively, compared with ordinary convolution. The specific derivation is shown in the following equations.
Cost_Conv / Cost_DWC = (h × w × n × d × d × c) / (h × w × n × d × d) = c
Cost_Conv / Cost_GSC = (h × w × n × d × d × c) / (h × w × (n/s) × d × d × c + (s − 1) × h × w × (n/s) × d × d) = c / ((1/s) × c + (s − 1)/s) = (s × c) / (s + c − 1) ≈ s
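These cost formulas can be checked numerically. The sketch below transcribes Equations (3)–(5) symbol for symbol and verifies both reduction ratios for an example layer; the layer dimensions are arbitrary illustrative values.

```python
def cost_conv(h, w, n, d, c):
    """Equation (3): cost of an ordinary convolution."""
    return h * w * n * d * d * c

def cost_dwc(h, w, n, d):
    """Equation (4): cost of a depthwise convolution."""
    return h * w * n * d * d

def cost_gsc(h, w, n, d, c, s):
    """Equation (5): n/s intrinsic maps by ordinary convolution, plus
    (s - 1) cheap operations per intrinsic map for the ghost features."""
    return h * w * (n // s) * d * d * c + (s - 1) * h * w * (n // s) * d * d
```

For a 20 × 20 output with n = 64, d = 3, c = 32, and s = 2, DWC is exactly c = 32 times cheaper than ordinary convolution, and GSC is s × c / (s + c − 1) ≈ 1.94 ≈ s times cheaper.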
However, the structure of the original Yolov5s model is complex, and replacing convolutions with the DWC and GSC modules at different locations may yield extracted feature information of different quality, affecting the final recognition accuracy. On the one hand, the modules work differently and have different effects on the feature extraction ability of the model. On the other hand, the different structures of the network's parts lead to different degrees of lightweighting. Therefore, in this paper, based on the original Yolov5s network, the DWC and GSC modules are introduced at different positions of the backbone and neck networks, respectively. A total of 15 lightweight schemes are finally designed.

2.3.4. ECA Attention Mechanism

The coal and gangue occupy a much smaller area of the image than the background, so the image contains a large amount of useless information. To enhance the network's feature extraction ability and suppress the effect of useless information on model training, we introduce the ECA module into the backbone of the model [35]. On the one hand, it enhances the model's ability to extract important features while adding only a negligible number of parameters. On the other hand, it compensates for the lack of information interaction between the individual channels of the feature map in the lightweight network. The workflow of this module is shown in Figure 4. The input feature map is first spatially compressed by global average pooling. Subsequently, the compressed feature map undergoes channel feature learning through a 1 × 1 convolution. Finally, the channel attention weights are multiplied with the original input feature map channel by channel to output the channel-refined feature map.
In Figure 4, h and w represent the height and width of the input feature map, respectively; GAP represents global average pooling; k represents the coverage of cross-channel interaction; σ represents the sigmoid activation function; and ⊗ represents the element-wise product.
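The ECA workflow described above can be sketched in numpy for a (C, H, W) feature map. This is an illustrative simplification: the real module learns its 1-D cross-channel convolution weights and derives the kernel size k adaptively from the channel count, whereas here the kernel is a fixed averaging filter and k is passed in.

```python
import numpy as np

def eca(x, k=3):
    """ECA sketch for a feature map x of shape (C, H, W): global average
    pooling, a size-k convolution across channels (fixed averaging kernel
    as a stand-in for learned weights), a sigmoid gate, then rescaling."""
    C = x.shape[0]
    gap = x.mean(axis=(1, 2))               # squeeze spatial dims -> (C,)
    padded = np.pad(gap, k // 2)            # zero-pad the channel vector
    kernel = np.full(k, 1.0 / k)            # assumed fixed kernel
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(C)])
    gate = 1.0 / (1.0 + np.exp(-conv))      # sigmoid attention weights
    return x * gate[:, None, None]          # channel-wise product
```

The output keeps the input shape; each channel is scaled by a weight computed only from its local channel neighborhood, which is what keeps the parameter count negligible.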
In order to solve the problem that the original Yolov5s network cannot detect small-size coal and gangue well and the model occupies large memory, this paper combines the FReLU activation function, small target detection head, lightweight convolution module and ECA module to construct the whole improved Yolov5s framework, as shown in Figure 5.

3. Experiments and Analysis

3.1. Design of Experiments

3.1.1. Conditions of Experiments

The experiments were conducted on a Windows 11 64-bit operating system with an Intel(R) Core(TM) i7-12700F CPU @ 2.10 GHz and an NVIDIA GeForce RTX 3060 12 GB graphics card; the deep learning framework PyTorch 1.12.1 was used for training and validation of the model. The experiments rely on the independently collected coal and gangue dataset for training. In order to improve the robustness and generalization of the model while increasing the amount of training data, and thus reduce the risk of overfitting, mosaic enhancement is applied during model training [36]. The main idea is to randomly scale, crop, flip, and add noise to 8 images, and then stitch them into one image as training data. The learning rate is set using Warmup [37], with the stochastic gradient descent with momentum (SGDM) optimizer. The specific parameter settings are shown in Table 2.
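The warmup strategy mentioned above can be sketched as a step-dependent learning-rate function. The linear ramp shown here is an assumption about the warmup shape, and the constants are illustrative rather than the paper's Table 2 settings.

```python
def warmup_lr(step, warmup_steps, base_lr):
    """Linear warmup: ramp the learning rate from near zero up to
    base_lr over the first warmup_steps iterations, then hold it
    steady (a decay schedule would normally take over afterwards)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

Starting with a small learning rate and ramping up avoids unstable gradients in the earliest iterations, which is the usual motivation for warmup with SGDM.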
The downsampling factor of Yolov5s is generally 32, so the height and width of the internal feature maps must be divisible by 32. Accordingly, a 640 × 640 pixel input resolution is chosen for detection in the experiments. In addition, the adaptive anchor mechanism in Yolov5s can effectively reduce the gray fill generated when scaling the input 2048 × 1536 pixel images, thus reducing information redundancy.
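The adaptive scaling described here can be sketched as a small helper that computes the resized shape and the minimal gray padding; the stride-remainder padding rule is our reading of Yolov5's letterbox behavior, not code from the paper.

```python
def letterbox_shape(img_w, img_h, target=640, stride=32):
    """Scale an image to fit a target square, then pad each dimension
    only up to the next multiple of the stride (minimal gray fill)."""
    r = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * r), round(img_h * r)
    pad_w = (target - new_w) % stride      # leftover width to fill
    pad_h = (target - new_h) % stride      # leftover height to fill
    return (new_w, new_h), (pad_w / 2, pad_h / 2)
```

For the 2048 × 1536 images used here, the scaled size 640 × 480 is already a multiple of 32 in both dimensions, so no gray fill at all is needed, which is exactly the redundancy reduction the text describes.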

3.1.2. Experiment Process

The goal of this paper is to improve the accuracy of coal and gangue detection in small target detection tasks by improving the original Yolov5s network while reducing the complexity of the model. The task can therefore be regarded as a supervised learning problem [16].
Firstly, the coal and gangue in each image are annotated manually to obtain training label images. Subsequently, the image dataset and labels of coal and gangue are divided into training and validation sets in the ratio of 4:1. Next, the training set is input to the improved YOLOv5s network for training. During the training process, the stochastic gradient descent algorithm is used to optimize the network model, and the model weights with the highest mAP obtained on the validation set are saved as the experimental results. Finally, the trained optimal weights are loaded into the model, and the performance of the network model is tested using the images in the validation set. The specific experimental flow is shown in Figure 6.

3.2. Model Evaluation Indicators

In this paper, both performance and complexity metrics are used to evaluate the proposed model. The indicators of model performance include precision (P), recall (R), F1-Score, and mAP. Precision and recall provide a direct representation of the accuracy of the target prediction. F1-Score is the harmonic mean of precision and recall. The AP value for each category is the area under the P-R curve for that category, and the mAP is the average of the average precision (AP) values over all label categories. Therefore, these indicators represent the overall detection performance of the model. The specific formulas for each indicator are defined as follows.
P = TP / (TP + FP)
R = TP / (TP + FN)
F1-Score = (2 × P × R) / (P + R)
AP = ∫₀¹ P(R) dR
mAP = (Σ_{i=1}^{N_C} AP_i) / N_C
where TP represents the number of samples with correctly detected coal, FP represents the number of samples with incorrectly detected coal, and FN represents the number of samples with missed gangue; N_C is the number of detection categories.
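The performance metrics above translate directly into code. The sketch below computes precision, recall, and F1 from raw counts and averages per-class AP values into mAP; the numbers used in the usage note are arbitrary examples.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-Score from raw detection counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def mean_ap(ap_per_class):
    """mAP: the mean of the per-class average precision (AP) values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For instance, 90 true positives with 10 false positives and 10 false negatives give P = R = F1 = 0.9, and per-class APs of 0.988 and 0.976 average to an mAP of 0.982.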
The indicators for the complexity of the model include the number of parameters and the number of floating-point operations (FLOPs) [27]. The formulas are as follows.
Paras = [c × (d × d) × n] + n
FLOPs(Conv) = 2 × h × w × (c × d² + 1) × n
FLOPs(Pool) = (h/S) × (w/S) × n
where S is the stride.
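The complexity formulas above can likewise be evaluated programmatically; the helpers below mirror them symbol for symbol for a single convolution or pooling layer, with arbitrary example dimensions in the test.

```python
def conv_flops(h, w, c, d, n):
    """FLOPs of a standard convolution layer: multiply-adds over the
    c x d x d receptive field plus the bias, for n output channels."""
    return 2 * h * w * (c * d * d + 1) * n

def pool_flops(h, w, n, S):
    """FLOPs of a pooling layer with stride S."""
    return (h // S) * (w // S) * n
```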

3.3. Analysis of Model Training and Validation Results

According to the data set type, the prediction model’s loss function can be divided into training loss and validation loss. The curves are shown in Figure 7a. The figure shows that during the model training, the training loss and validation loss decrease rapidly when the number of iterations is between 0 and 50. When the number of iterations reaches 200 or more, the loss value of the prediction model starts to stabilize gradually. From the curves of precision, recall, F1-Score, and mAP in Figure 7b, it can be seen that the trained prediction model does not show any overfitting phenomenon. In addition, each accuracy index of the model ends up at 0.96 or above.
It can be seen that the improved Yolov5s network has high detection accuracy for coal and gangue. Figure 8 shows some of the detection results of the improved Yolov5s model on the validation set; predicted bounding boxes with low confidence have been filtered out. When single pieces of coal or gangue of different sizes appear in one image, or even when coal and gangue are mixed, the improved Yolov5s can still identify and locate them accurately, which further indicates the effectiveness of the proposed model for coal and gangue detection.
The coordinates and scores of the bounding boxes of coal and gangue in Figure 8 are shown in Table 3, where (x_lt, y_lt) is the upper-left coordinate of a bounding box and (x_rb, y_rb) is the lower-right coordinate. The labels are divided into coal and gangue. The positions and relative sizes of the coal and gangue can both be obtained from the bounding box coordinates. This shows that the improved Yolov5s can identify coal and gangue accurately and obtain their positions and relative sizes, which facilitates the sorting operation.
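Converting the corner coordinates reported in Table 3 into sizes and centers for the sorting stage is a one-line computation; the sketch below shows the conversion for an arbitrary, hypothetical box.

```python
def box_size(x_lt, y_lt, x_rb, y_rb):
    """Width/height and center of a bounding box given its upper-left
    and lower-right corner coordinates."""
    return (x_rb - x_lt, y_rb - y_lt), ((x_lt + x_rb) / 2, (y_lt + y_rb) / 2)
```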

3.4. Optimization of Lightweight Scheme and Attention Mechanism

The results of comparing the 15 lightweight schemes proposed in Section 2.3.3 are shown in Table 4, where “√” indicates that the corresponding strategy is used and “×” indicates that it is not. Schemes 1–3 are experiments replacing only the DWC module, and Schemes 4–6 are experiments replacing only the GSC module. In Schemes 7–9, the DWC module is introduced only into the backbone network; in Schemes 10–12, only into the neck network; and in Schemes 13–15, into the whole network. Comparing Schemes 1–3 and 4–6 shows that the DWC and GSC modules differ in the degree to which they lighten the network. Although the DWC module shows advantages in reducing model complexity and size, the accompanying price is some loss of detection accuracy. The comparison of Schemes 7–15 shows that pairing the DWC and GSC modules at different positions in the network can effectively balance the computational cost and detection accuracy of the model.
Throughout the experiments, the mAP of Schemes 5, 6, 7, 9, 11, and 14 remained above 0.98. However, Scheme 14 maintained a high mAP while reducing the model size to the lowest value. Therefore, considering both the detection accuracy and the complexity of the model, we finally chose Scheme 14 as the basis for the next step of improving the model. This scheme is as follows: in the backbone network, all standard convolutions except the first layer and the bottleneck blocks are replaced with depthwise convolutions; in the neck network, the ordinary convolutions used for downsampling are replaced with depthwise convolutions, and, except in the bottleneck blocks, the ordinary convolutions in the C3 structure are replaced with the GSC module.
From the above experimental results, we find that although the lightweight operation reduces the computational cost of the model, the performance of the model is also affected to some extent. Therefore, we compensate for the accuracy degradation caused by the lightweight operation by adding an attention mechanism to the model obtained in the previous step. However, different attention mechanisms affect the model differently. On this basis, we compared the ECA, Coordinate Attention (CA) [38], Convolutional Block Attention Module (CBAM) [39], and Global Attention Mechanism (GAM) [40] modules. Among them, the ECA module focuses attention on the channel dimension, while the remaining three modules combine channel and spatial attention. The comparison results are shown in Table 5. The ECA module improves the performance of the model in this paper the most. Its addition improves the network accuracy to a certain extent while adding no burden to the complexity of the model. In contrast, after the introduction of the CA, CBAM, and GAM modules, the complexity of the model increased and the accuracy decreased. These results indicate that the lightweight model is more sensitive to the information interaction between feature map channels.
To observe more intuitively the effect of adding different attention mechanisms to the model, we used Grad-CAM as a visualization tool [41]. It can be seen from Figure 9 that the ECA module helps the model locate the object of interest more precisely than the other modules while achieving the highest detection accuracy.

3.5. Ablation Experiments

In order to verify the improvement of the model's performance, the improvement points are compared and analyzed one by one in this paper. The ablation study is performed on the homemade coal and gangue dataset with the same validation set and image input size. The results of the ablation experiments are summarized in Table 6.
In Table 6, Model A is the original Yolov5s; Model B replaces the activation function in Yolov5s with FReLU; Model C adds a small target detection head based on model B to achieve a four-scale detection output; Model D is a lightweight operation based on C; and Model E introduces the ECA module based on D. The details are as follows.
FReLU: The original nonlinear activation function SiLU in the network is replaced with FReLU to make the model adaptive and acquire the local image context for coal and gangue. The APs of coal and gangue reach 0.988 and 0.976, respectively. mAP and F1-Score are as high as 0.982 and 0.971, respectively, which are 0.3% and 1.2% better than the original Yolov5s network. The results show that the detection rate of the correct target frame for most of the coal and gangue has been improved, proving the effectiveness of the FReLU activation function for network optimization.
Small target detection head: By adding the small target detection head to the original network, the F1-Score and mAP improved by 0.1% and 0.3%, respectively. It indicates that the added detection branch anchor frames are set to the size of small targets, which can significantly reduce missed and false detections due to oversized anchor frame settings. However, the model size increases due to the added feature layer.
Lightweight: The model is lightened by pairing DWC and GSC modules on the network of multilayer detection heads. The results show that although the mAP and F1-Score are reduced to 0.981 and 0.968, respectively, the model’s size is reduced by 67.97% compared to model C.
ECA module: by integrating the ECA attention mechanism into the backbone network of Model D, the F1-Score and mAP are improved by 0.8% and 0.4%, respectively, with no change in the size of the model. The results show that this module helps improve model performance while avoiding any increase in model complexity.
In addition, the performance of the above five models in terms of precision, recall, number of parameters, and FLOPs is shown in Figure 10. Compared with the original network, the improved Yolov5s improves the precision and recall of the model while significantly reducing its complexity.

3.6. Comparison Experiments with Related Methods

To further validate the performance of the proposed model, we compare it with Yolov3-tiny [42], Yolov4-tiny [36], Yolox-tiny [43], Yolov6-tiny [44], Yolov7-tiny [45] and some lightweight variants of Yolov5-based networks. For example, GhostNet [46], ShuffleNetv2 [47], and MobileNetv3 [48] are used instead of Yolov5 backbone networks. However, previous studies differed regarding computer hardware, parameter settings, and data sets used in training and validation, which prevented scientific comparisons. To avoid the impact of these differences, the target detection algorithm described above was retrained in this study. The results are shown in Table 7.
Compared with the other networks, our proposed model is the best in recall, F1-Score, number of parameters, and model size. Relative to the weakest competitor in each metric, recall and F1-Score improve by 2.9% and 3.3%, respectively, while the number of parameters and the model size decrease by 81.95% and 78.7%, respectively. In terms of accuracy, the mAPs of the lightweight Yolo-series networks are all similar. Although the precision of the proposed model is second only to Yolov7-tiny, it has a clear advantage in model size: its number of parameters, FLOPs, and size are 68.39%, 45.45%, and 60.16% lower, respectively, than those of Yolov7-tiny. Among the lightweight Yolov5 variants, Yolov5 (MobileNetv3) and Yolov5 (ShuffleNetv2) slightly outperform our model in FLOPs and mAP, respectively, but their remaining metrics are not outstanding. In summary, the overall performance of the proposed model is better than that of the compared models.
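The F1-Scores in Table 7 can be checked directly from the listed precision and recall, since F1 is their harmonic mean; a quick sketch using three rows of the table:

```python
# Sketch: recomputing F1-Score from the precision/recall values in Table 7
# to check internal consistency (F1 = harmonic mean of P and R).
def f1_score(p, r):
    return 2 * p * r / (p + r)

# (P, R) pairs taken from Table 7
models = {
    "Yolov5s": (0.95, 0.968),
    "Yolov7-tiny": (0.984, 0.965),
    "Improved Yolov5s": (0.97, 0.982),
}
for name, (p, r) in models.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")  # matches the table column
```

All three recomputed values agree with the F1 column of Table 7 to three decimal places.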
In addition, some detection results on coal and gangue mixtures are selected to visualize the differences in detection performance among the above algorithms, as shown in Figure 11. Besides the detection boxes produced by each algorithm, green elliptical boxes mark cases of missed and false detection. The figure shows that Yolov5s, the lightweight Yolo-series models, and the Yolov5 variants all exhibit some degree of false or missed detection for coal and gangue of different sizes, whereas the model proposed in this paper produces neither under the same detection conditions. Moreover, compared with the original Yolov5s, our model mostly yields higher confidence scores for small-scale coal and gangue. This demonstrates that the proposed model is effective for coal and gangue target detection.
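Missed and false detections of the kind highlighted in Figure 11 are conventionally judged by the IoU overlap between predictions and ground-truth boxes; the sketch below is an illustration of that computation (the 0.5 matching threshold is an assumed convention, not the paper's evaluation code):

```python
# Sketch: intersection-over-union between two (x1, y1, x2, y2) boxes.
# A ground-truth box with no prediction above the threshold is a miss;
# a prediction with no ground truth above the threshold is a false detection.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter) if inter else 0.0

pred, truth = (100, 100, 200, 200), (110, 110, 210, 210)  # illustrative boxes
matched = iou(pred, truth) >= 0.5  # assumed matching threshold
print(f"IoU = {iou(pred, truth):.3f}, matched = {matched}")
```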

4. Conclusions

In this study, we explore a lightweight detection model while improving the detection accuracy for small-size coal and gangue. The validity of the proposed model was confirmed by comparison with multiple detection models. The main conclusions are as follows.
  • By replacing the activation function of the original network and adding a small target detection head, the APs of coal and gangue reach 0.983 and 0.988, respectively, and the mAP and F1-Score reach 0.985 and 0.972, respectively, while the computational cost increases only slightly. This method improves the detection accuracy of small-size coal and gangue.
  • The DWC and GSC modules are introduced to reduce the complexity of the improved multiscale network. Compared with the original network, the number of parameters and FLOPs are reduced by 72.76% and 54.43%, respectively, while the detection accuracy is maintained at the original level.
  • The ECA module embedded in the network helps the model focus on the coal and gangue targets in the images during training and adaptively adjusts the channel weights of the feature maps, improving the feature extraction ability of the model. Compared with the original network, the mAP and F1-Score of the improved model increase by 0.6% and 1.7%, respectively.
  • The results show that the proposed model has the best overall performance among the compared lightweight models. It provides theoretical and technical references for detecting and identifying coal and gangue, as well as a reliable basis for deploying detection models on resource-limited embedded devices.
This study comprehensively compares the detection performance and computational complexity of the models. Experiments show that the improved model achieves higher detection accuracy at lower computational cost, reducing storage and computing power requirements and making it easy to deploy on resource-constrained equipment. In future research, we will focus on improving the detection speed of the model for real-time underground coal and gangue detection applications.

Author Contributions

Conceptualization, Z.C.; Data curation, L.F., Z.L. and J.L.; Formal analysis, J.L.; Funding acquisition, Z.C.; Methodology, L.F.; Project administration, Z.C.; Resources, Z.C.; Software, L.F.; Validation, Z.L.; Writing—original draft, L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Natural Science Research Project of Anhui Educational Committee (No. KJ2021A0427), Postgraduate Innovation Fund of Anhui University of Science and Technology (No. 2022CX2085).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xie, H.; Wu, L.; Zheng, D. Energy Consumption and Coal Demand Forecast in 2025 in China. J. China Coal Soc. 2019, 44, 1949–1960.
  2. Zhang, B.; Zhang, H.B. Coal Gangue Detection Method Based on Improved SSD Algorithm. In Proceedings of the 2021 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xi’an, China, 27–28 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 634–637.
  3. Cao, X.; Fei, J.; Wang, P. Research on coal and gangue sorting method based on multi-manipulator cooperation. Coal Sci. Technol. 2019, 47, 7–12.
  4. Liu, Q.; Li, J.; Li, Y.; Gao, M. Recognition Methods for Coal and Coal Gangue Based on Deep Learning. IEEE Access 2021, 9, 77599–77610.
  5. Chen, Z.; Huang, L.; Jiang, H.; Zhao, Y.; Liu, C.; Duan, C.; Zhang, B.; Yang, G.; Chai, J.; Ban, H.; et al. Application of screening using a flip-flow screen and shallow groove dense-medium separation in a steam coal preparation plant. Int. J. Coal Prep. Util. 2022, 42, 2438–2451.
  6. Guo, X.J. Research and application of coal gangue separation technology. Coal. Eng. 2019, 1, 74–76.
  7. Yazdi, M.; Esmaeilnia, S.A. Dual-energy gamma-ray technique for quantitative measurement of coal ash in the Shahroud mine, Iran. Int. J. Coal Geol. 2003, 55, 151–156.
  8. Dong, Z.-C.; Xia, J.-W.; Duan, X.-M.; Cao, J.-C. Based on Curing Age of Calcined Coal Gangue Fine Aggregate Mortar of X-ray Diffraction and Scanning Electron Microscopy Analysis. Guang Pu Xue Yu Guang Pu Fen Xi 2016, 36, 842–847.
  9. Wang, W.; Zhang, C. Separating coal and gangue using three-dimensional laser scanning. Int. J. Miner. Process. 2017, 169, 79–84.
  10. Zhou, J.; Guo, Y.; Wang, S.; Cheng, G. Research on intelligent optimization separation technology of coal and gangue base on LS-FSVM by using a binary artificial sheep algorithm. Fuel 2022, 319, 123837.
  11. McCoy, J.T.; Auret, L. Machine learning applications in minerals processing: A review. Miner. Eng. 2019, 132, 95–109.
  12. Guo, Y.; Zhang, Y.; Li, F.; Wang, S.; Cheng, G. Research of coal and gangue identification and positioning method at mobile device. Int. J. Coal Prep. Util. 2022, 43, 691–707.
  13. Li, M.; Duan, Y.; He, X.; Yang, M. Image positioning and identification method and system for coal and gangue sorting robot. Int. J. Coal Prep. Util. 2022, 42, 1759–1777.
  14. Dou, D.; Wu, W.; Yang, J.; Zhang, Y. Classification of coal and gangue under multiple surface conditions via machine vision and relief-SVM. Powder Technol. 2019, 356, 1024–1028.
  15. Wang, X.; Wang, S.; Guo, Y.; Hu, K.; Wang, W. Dielectric and geometric feature extraction and recognition method of coal and gangue based on VMD-SVM. Powder Technol. 2021, 392, 241–250.
  16. Gomez-Flores, A.; Ilyas, S.; Heyes, G.W.; Kim, H. A critical review of artificial intelligence in mineral concentration. Miner. Eng. 2022, 189, 107884.
  17. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  19. Li, D.; Wang, G.; Zhang, Y.; Wang, S. Coal gangue detection and recognition algorithm based on deformable convolution YOLOv3. IET Image Process. 2022, 16, 134–144.
  20. Pan, H.; Shi, Y.; Lei, X.; Wang, Z.; Xin, F. Fast identification model for coal and gangue based on the improved tiny YOLO v3. J. Real-Time Image Process. 2022, 19, 687–701.
  21. Zhang, Y.; Wang, J.; Yu, Z.; Zhao, S.; Bei, G. Research on intelligent detection of coal gangue based on deep learning. Measurement 2022, 198, 111415.
  22. Li, M.; He, X.; Yuan, Y.; Yang, M. Multiple factors influence coal and gangue image recognition method and experimental research based on deep learning. Int. J. Coal Prep. Util. 2022, 1–17.
  23. Yan, P.; Sun, Q.; Yin, N.; Hua, L.; Shang, S.; Zhang, C. Detection of coal and gangue based on improved YOLOv5.1 which embedded scSE module. Measurement 2022, 188, 110530.
  24. Wang, Z.; Jin, L.; Wang, S.; Xu, H. Apple stem/calyx real-time recognition using YOLO-v5 algorithm for fruit automatic loading system. Postharvest Biol. Technol. 2022, 185, 111808.
  25. Zhang, Y.; Guo, Z.; Wu, J.; Tian, Y.; Tang, H.; Guo, X. Real-Time Vehicle Detection Based on Improved YOLO v5. Sustainability 2022, 14, 12274.
  26. Li, Z.; Xie, W.; Zhang, L.; Lu, S.; Xie, L.; Su, H.; Du, W.; Hou, W. Toward Efficient Safety Helmet Detection Based on YoloV5 with Hierarchical Positive Sample Selection and Box Density Filtering. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
  27. Li, S.; Zhang, S.; Xue, J.; Sun, H. Lightweight target detection for the field flat jujube based on improved YOLOv5. Comput. Electron. Agric. 2022, 202, 107391.
  28. Park, H.; Yoo, Y.; Seo, G.; Han, D.; Yun, S.; Kwak, N. C3: Concentrated-comprehensive convolution and its application to semantic segmentation. arXiv 2018, arXiv:1812.04920.
  29. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–27 July 2017; pp. 2117–2125.
  30. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
  31. Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
  32. Ma, N.; Zhang, X.; Sun, J. Funnel activation for visual recognition. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XI; Springer International Publishing: Cham, Switzerland, 2020; pp. 351–368.
  33. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  34. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
  35. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
  36. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  38. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
  39. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  40. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
  41. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
  42. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  43. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
  44. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
  45. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
  46. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
  47. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
  48. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
Figure 1. Sample pictures of coal and gangue: (a) Coal; (b) Gangue; (c) Coal mixed with gangue.
Figure 2. Structure of Yolov5s network.
Figure 3. Structure of the DWC and GSC modules.
Figure 4. Structure of the ECA module.
Figure 5. Coal and gangue detection model based on improved Yolov5s. (a) Structure of lightweight backbone network; (b) Structure of lightweight neck network.
Figure 6. Workflow diagram of the proposed algorithm.
Figure 7. Training process of the improved Yolov5s: (a) training and validation loss; (b) validation curve of accuracy.
Figure 8. Detection results of improved Yolov5s.
Figure 9. Visualization of feature maps using different attention mechanisms.
Figure 10. Results of ablation experiments.
Figure 11. Comparison of detection results for coal and gangue.
Table 1. Coal and gangue data set information.

| Set | Coal (Target Boxes) | Gangue (Target Boxes) | Total (Target Boxes) | Number of Images |
|---|---|---|---|---|
| Training | 640 | 640 | 1280 | 480 |
| Validation | 160 | 160 | 320 | 120 |
| Total | 800 | 800 | 1600 | 600 |
Table 2. Hyperparameters of the detection model.

| Hyperparameter | Value |
|---|---|
| Epoch | 300 |
| Batch size | 8 |
| Initial learning rate | 0.01 |
| Final learning rate | 0.0001 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias learning rate | 0.1 |
| Mosaic | 1.0 |
Table 3. Coordinates and score of bounding box.

| Sample | Label | Score | (x_lt, y_lt) | (x_rb, y_rb) |
|---|---|---|---|---|
| (a) | Coal | 0.95 | (782, 727) | (1036, 687) |
| (b) | Gangue | 0.91 | (529, 437) | (865, 397) |
| (c) | Gangue | 0.90 | (314, 661) | (650, 621) |
| (c) | Coal | 0.91 | (504, 996) | (758, 956) |
| (c) | Gangue | 0.91 | (1302, 418) | (1638, 378) |
| (c) | Gangue | 0.95 | (681, 252) | (1017, 212) |
| (c) | Coal | 0.96 | (1162, 843) | (1416, 803) |
| (c) | Coal | 0.97 | (916, 603) | (1170, 563) |
Table 4. Comparison of different lightweight schemes (√ = module applied, × = not applied).

| Scheme | DWConv (Backbone) | DWConv (Neck) | GSConv (Backbone) | GSConv (Neck) | mAP | Paras (M) | GFLOPs | Size (M) |
|---|---|---|---|---|---|---|---|---|
| 1 | √ | × | × | × | 0.979 | 4.38 | 13.5 | 9.7 |
| 2 | × | √ | × | × | 0.974 | 4.30 | 11.3 | 9.4 |
| 3 | √ | √ | × | × | 0.926 | 1.41 | 5.5 | 3.8 |
| 4 | × | × | √ | × | 0.974 | 6.04 | 17.4 | 13.1 |
| 5 | × | × | × | √ | 0.982 | 5.28 | 13.9 | 11.4 |
| 6 | × | × | √ | √ | 0.984 | 4.04 | 11.9 | 9.2 |
| 7 | √ | × | √ | × | 0.98 | 4.91 | 15.2 | 10.8 |
| 8 | √ | × | × | √ | 0.978 | 2.39 | 8.1 | 5.9 |
| 9 | √ | × | √ | √ | 0.98 | 3.17 | 9.8 | 7.4 |
| 10 | × | √ | √ | × | 0.978 | 3.06 | 9.3 | 7.2 |
| 11 | × | √ | × | √ | 0.984 | 4.80 | 13.0 | 10.5 |
| 12 | × | √ | √ | √ | 0.978 | 3.23 | 10.8 | 7.6 |
| 13 | √ | √ | √ | × | 0.902 | 1.94 | 7.1 | 5.0 |
| 14 | √ | √ | × | √ | 0.981 | 1.91 | 7.2 | 4.9 |
| 15 | √ | √ | √ | √ | 0.974 | 2.44 | 8.9 | 6.0 |
Table 5. Comparison experiments of attention mechanisms.

| Model | P | R | F1-Score | mAP | Paras (M) | GFLOPs | Size (M) |
|---|---|---|---|---|---|---|---|
| Scheme_14 | 0.954 | 0.981 | 0.968 | 0.981 | 1.91 | 7.2 | 4.9 |
| +ECA | 0.97 | 0.982 | 0.976 | 0.985 | 1.91 | 7.2 | 4.9 |
| +CA | 0.957 | 0.963 | 0.960 | 0.971 | 1.94 | 7.2 | 4.9 |
| +CBAM | 0.96 | 0.963 | 0.962 | 0.977 | 1.94 | 7.3 | 4.9 |
| +GAM | 0.954 | 0.976 | 0.965 | 0.979 | 3.65 | 8.6 | 8.3 |
Table 6. Results of ablation experiments.

| Model | Improvement Strategy | AP (Coal) | AP (Gangue) | mAP | F1-Score | Size (M) |
|---|---|---|---|---|---|---|
| A | Yolov5s | 0.98 | 0.978 | 0.979 | 0.959 | 14.1 |
| B | Model A + FReLU | 0.988 | 0.976 | 0.982 | 0.971 | 14.4 |
| C | Model B + Small target detection head | 0.983 | 0.988 | 0.985 | 0.972 | 15.3 |
| D | Model C + Lightweight | 0.983 | 0.981 | 0.981 | 0.968 | 4.9 |
| E | Model D + ECA | 0.986 | 0.984 | 0.985 | 0.976 | 4.9 |
Table 7. Performance of different models.

| Model | P | R | F1-Score | mAP | Paras (M) | GFLOPs | Size (M) |
|---|---|---|---|---|---|---|---|
| Yolov5s | 0.95 | 0.968 | 0.959 | 0.979 | 7.02 | 15.8 | 14.1 |
| Yolov3-tiny | 0.968 | 0.978 | 0.973 | 0.986 | 8.67 | 12.9 | 17.0 |
| Yolov4-tiny | 0.933 | 0.953 | 0.943 | 0.972 | 5.59 | 16.2 | 23.0 |
| Yolox-tiny | 0.947 | 0.978 | 0.96 | 0.987 | 5.06 | 6.45 | 19.9 |
| Yolov6-tiny | 0.97 | 0.969 | 0.969 | 0.984 | 10.58 | 20.4 | 38.3 |
| Yolov7-tiny | 0.984 | 0.965 | 0.974 | 0.987 | 6.02 | 13.2 | 12.3 |
| Yolov5 (GhostNet) | 0.971 | 0.972 | 0.972 | 0.982 | 8.87 | 14.1 | 17.9 |
| Yolov5 (MobileNetv3) | 0.964 | 0.973 | 0.969 | 0.986 | 3.54 | 6.3 | 7.3 |
| Yolov5 (ShuffleNetv2) | 0.974 | 0.969 | 0.972 | 0.989 | 3.79 | 8.0 | 7.8 |
| Improved Yolov5s | 0.97 | 0.982 | 0.976 | 0.985 | 1.91 | 7.2 | 4.9 |

Share and Cite

Cao, Z.; Fang, L.; Li, Z.; Li, J. Lightweight Target Detection for Coal and Gangue Based on Improved Yolov5s. Processes 2023, 11, 1268. https://doi.org/10.3390/pr11041268
