1. Introduction
China is a major coal-producing country with an annual coal output in the billions of tonnes, but the gangue content of the mined raw coal can be as high as 10% to 30%. The low carbon content and high ash content of gangue degrade the quality of raw coal, thus reducing the use value and economic value of coal [1]. In addition, in the main coal flow transport system, large gangue may cause longitudinal tears, surface scratches, and other damage to the conveyor belt of the belt conveyor system; if these problems are not detected and dealt with promptly, the damage may expand further and even lead to major accidents [2]. Thus, the efficient sorting of gangue is the first important link in the deep processing of coal.
Manual gangue sorting is gradually being eliminated due to high labor intensity, poor working environment, and low efficiency [
3]. To change the above gangue sorting method and enhance the level of coal mine intelligence, many scholars have conducted in-depth research on an automatic gangue sorting system. Dou et al. [
4] used machine vision as a means of identifying gangue and then utilized a high-pressure air gun to reject it. An intelligent sorting system for gangue based on an embedded artificial intelligence development platform was proposed by Wang et al. [
5]. Image acquisition and processing were completed by the intelligent vision system. After aligning the camera coordinate system with the robot coordinate system, the robot arm was controlled to pursue and classify the gangue. Ding et al. [6] combined a near-infrared camera with a visible light camera to build a multi-level fusion recognition system, providing a new idea for robots to automatically sort coal and gangue. Sun et al. [
7] established a smart visual coal gangue separation framework on the basis of CoppeliaSim, modeled the raw coal and gangue blends, manipulator control system, material conveyor belt, and image recognition system, and restored the actual working conditions of gangue splitting. Feng et al. [
8] integrated machine vision with robotics to construct a gangue robotic sorting system, and optimized the robot controller to improve control precision.
In an automatic coal gangue sorting system, the actuators that sort coal and gangue can only be controlled correctly if the materials are precisely identified, so precise identification of coal and gangue is particularly important. With the advancement of computing power and the development of intelligent algorithms, research on coal gangue recognition technology shows a clear deepening trend. Currently, the mainstream recognition methods include image recognition based on visible light imaging and image recognition based on transmission imaging.
Image recognition technology based on visible light imaging uses a camera to image the flow of coal on the conveyor belt and adopts machine learning algorithms to classify coal and gangue based on the different textures, grey values, and other feature information presented on the digital image. Hu et al. [
9] proposed the idea of combining infrared images with convolutional neural networks (CNN) to distinguish coal and gangue. Li et al. [
10] proposed an image-based deep learning framework for coal and gangue stratification detection, which provided a solution to the problem of multi-scale extraction of image features and the difficulty of recognizing multiple randomly located targets. Su et al. [
11] proposed an improved LeNet5 model to identify coal and gangue, which could automatically extract coal gangue image features and classify them precisely. Based on the idea of transfer learning and VGG16, Pu et al. [
12] constructed a CNN model to distinguish between coal and gangue images, achieving a good recognition effect. In addition, the single-stage approach represented by the YOLO family of algorithms, which does not need a region proposal network in advance, can directly obtain the category and coordinate information of the target object [
13], and had a faster detection speed and powerful target detection capability, thus the algorithm has been favored by many researchers. Zhang et al. [
14,
15] proposed an approach combining segmentation networks with YOLOv5, which enabled the network to handle both target segmentation and target detection and achieved integrated multi-task detection. They applied this method to belt conveyor deviation faults and enhanced the network’s capability to detect straight lines by combining the outputs of the improved YOLOv5 network, thus solving the problem of fast feature extraction and deviation judgment of conveyor belt edges in complex backgrounds. Yan et al. [
16] improved the YOLOv5 model and put forward an approach to classify coal gangue on the basis of multispectral imaging techniques and object detection. Not only could it achieve the precise identification of the gangue, but it also obtained information about the relevant locations of the gangue, which could be applied to the target detection task of the gangue. Sun et al. [
17] studied the classification and location of coal and gangue, proposed a dynamic object detection approach in view of CG-YOLO, and used robots to grab irregularly shaped gangue to form a complete separation system. Luo et al. [
18] improved the backbone and neck of YOLOv5, implemented a lightweight deep target detection network, and ensured both speed and precision in detecting multiple types of materials on belt conveyors. Xu et al. [
19] studied the issues of remote sensing object detection and proposed a new remote sensing object detection model using YOLO-v3 to raise the precision and real-time performance of remote sensing object detection. Mao et al. [
20] added a lightweight non-parametric attention mechanism based on YOLOv7 and introduced depthwise separable convolution in the backbone network to achieve efficient recognition of foreign objects. It is worth mentioning that the precision of image-based recognition methods is easily affected by downhole lighting conditions, material surfaces, the working environment, and camera exposure time; in addition, the camera’s mounting angle, height, and focal length settings also affect recognition efficiency.
Image recognition technology based on transmission imaging relies on high-energy rays (e.g., gamma rays, X-rays) to image the raw coal flow on the conveyor belt in transmission and differentiates coal from gangue in the mixture on the basis that the two materials have different densities. Zhang et al. [
21] developed a coal gangue-identifying system according to natural gamma ray technology, which achieved the automation of coal and gangue identification. Zhou et al. [
22] combined dual-energy X-ray with Geant4 simulation image processing technology, and used the R-value method to effectively identify coal and gangue of various shapes and thicknesses. Abbasi et al. [
23] proposed a linear function fitting method as a basis for obtaining atomic numbers of uncertain materials, which provided the possibility of separating the atomic numbers of substances with unit resolution. Wang et al. [
24] combined the receptive field module with the U-Net model, providing a reference for addressing the issue of adhered or occluded gangue degrading the precision of intelligent recognition.
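As a concrete illustration of the transmission-imaging principle discussed above, the R-value method compares the attenuation measured at low and high X-ray energies; under the Beer-Lambert law the thickness term cancels, leaving a quantity that depends mainly on the material. The function and the numerical readings below are a minimal sketch with made-up intensities, not values from the cited works.

```python
import math

def r_value(i0_low, i_low, i0_high, i_high):
    """R-value from dual-energy transmission intensities.

    Beer-Lambert gives I = I0 * exp(-mu * t), so
    R = ln(I0_L / I_L) / ln(I0_H / I_H) = mu_L / mu_H:
    the material thickness t cancels, and R reflects the material itself.
    """
    return math.log(i0_low / i_low) / math.log(i0_high / i_high)

# Illustrative (made-up) intensity readings: gangue, being denser, attenuates
# low-energy X-rays disproportionately, so its R is larger than that of coal.
r_coal = r_value(1000, 400, 1000, 500)
r_gangue = r_value(1000, 150, 1000, 400)
```

In practice a threshold on R (or, as in this paper, a learned model over the dual-energy images) separates the two material classes.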
The above scholars have made great progress in research on automatic gangue sorting systems and gangue identification methods, but shortcomings remain. Most automatic gangue sorting systems are still at the experimental stage, have limited ability to sort gangue with large particle sizes and dense distribution, and cannot overcome the influence of the poor underground environment. In the field of coal gangue recognition, recognition methods based on industrial camera images can capture more gangue features; however, they struggle to achieve ideal recognition results for gangue that is heavily coated in coal slime. Although image recognition based on transmission imaging is not easily disturbed by the complex environment of coal mines, the algorithms represented by the R-value method mostly involve complex calculations, thresholds that are difficult to determine, and heavy reliance on manual experience, which greatly limits their application in real scenarios. Therefore, to meet the demand for efficient and intelligent sorting in coal mines and to maximize the recognition precision and sorting efficiency of coal gangue, this paper combines the precision and efficiency of YOLOv5 with the weak environmental sensitivity of transmission-based recognition and proposes a target detection algorithm for coal gangue based on the combination of dual-energy X-rays and an improved YOLOv5 model (DX-YOLOv5). A new intelligent gangue sorting system is also designed, which can sort gangue with high precision and efficiency without human intervention and can operate in harsh underground conditions without being influenced by temperature, dust, or hazardous gases.
The principal contributions of this paper are as follows. First, dataset construction: given that there is no publicly available dataset of dual-energy X-ray images of coal and gangue, this paper creates a customized dataset for training and testing by collecting coal mine data in the field and annotating it with LabelImg. Second, backbone network: to address the fact that the parameters and computational complexity of CSPDarknet consume a large amount of the computing resources of hardware devices, this paper uses EfficientNetV2, a lightweight network, as the backbone of YOLOv5-S to enhance computational efficiency and precision. Third, neck design: to further enhance network performance through lightweighting, this paper designs an LPAN in the neck to replace the original PAN module and seamlessly integrates the CBAM into the feature fusion module to maximize the learning effect. Fourth, detection head optimization: to promote the multi-scale target detection performance of the proposed algorithm, this paper adds the L2 output block to detect small targets at a resolution of 160 × 160. Fifth, modification of the loss function: to achieve faster convergence and better localization, this paper replaces the CIOU-loss in the previous network architecture with the EIOU-loss to improve regression precision. Sixth, the design of a new intelligent gangue sorting system, which can overcome the harsh environment of coal mines and achieve high-speed, high-efficiency sorting of coal gangue over a wide range of particle sizes.
The remainder of this article is organized as follows.
Section 2 presents the process flow of the coal gangue sorting system designed in this paper and the principle of dual-energy X-ray identification.
Section 3 describes the structure of YOLOv5 and the main points of creativity in this paper.
Section 4 presents the experiments for this paper’s methodology on the self-built dataset and compares its performance with that of other classical algorithms. Finally,
Section 5 gives the conclusion.
4. Experiments and Discussions
4.1. Experimental Configuration Framework
In this study, we used the YOLOv5-3.1 code base as our starting point, and the network was trained using the AdamW optimizer with an initial learning rate of 0.01. To enhance training, we used a cyclic learning rate factor of 0.2 and a weight decay of 0.0005. Throughout the experiment, we ran 200 epochs with a batch size of 8. During the warm-up stage, the momentum was set to 0.8 and then changed to 0.937. All input images were resized to 640 × 640 by the network preprocessing. For the sake of equitable comparison, we employed the default data augmentation method and did not use any pre-trained models.
The computer setup used in this experiment was a PC platform with an Intel Core i5-9300H CPU running at 2.4 GHz, an NVIDIA GeForce GTX 1660 Ti GPU, 16 GB of memory, and the Windows 11 operating system. The development framework was PyTorch, specifically Python version 3.7.11 with Torch version 1.7.0 and Torchvision version 0.8.1.
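The warm-up behavior described above (momentum ramping from 0.8 to 0.937) can be sketched as a simple linear schedule; the warm-up length below is an assumption, and the exact schedule in the YOLOv5-3.1 code base may differ.

```python
def warmup_momentum(epoch, warmup_epochs=3, m_start=0.8, m_final=0.937):
    """Linearly ramp momentum from m_start to m_final during warm-up,
    then hold it at m_final (momentum values taken from the text;
    warmup_epochs is an illustrative assumption)."""
    if epoch >= warmup_epochs:
        return m_final
    return m_start + (m_final - m_start) * epoch / warmup_epochs
```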
4.2. Performance Evaluation
The purpose of this section is to assess the performance of the enhanced YOLOv5-S model against its baseline YOLOv5-S using different performance metrics. We utilized precision (P), recall (R), and mean average precision (mAP) as indicators of the improvement. Specifically, P and R are defined as follows:

P = TP / (TP + FP)

R = TP / (TP + FN)

In these equations, TP denotes the number of true positive samples, FP denotes the number of false positive samples, and FN denotes the number of false negative samples.
In the target detection task, mAP is the mean of the average precision (AP) over all categories across all images and is used to measure the algorithm’s performance in predicting target locations and categories. The AP is the area under the P-R curve, plotted with recall R on the horizontal axis and precision P on the vertical axis.
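The metrics above can be made concrete with a short sketch: P and R follow directly from the TP/FP/FN counts, and AP is obtained by integrating the P-R curve (YOLOv5 itself additionally interpolates and smooths the curve; that step is omitted here).

```python
def precision(tp, fp):
    """P = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """R = TP / (TP + FN)."""
    return tp / (tp + fn)

def average_precision(recalls, precisions):
    """AP as the trapezoidal area under the P-R curve; `recalls` must be
    sorted in ascending order with matching `precisions`."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2.0
    return ap
```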
4.3. Ablation Experiment
To validate the effectiveness of the various improvements proposed in this study (the EfficientNetV2 backbone, the LPAN module, the CBAM module, the L2 detection head, and the EIOU_loss), comparative ablation experiments incorporating each module individually, as well as fusing all of them, were designed in this section using YOLOv5-S as the baseline. As indicated in
Table 2, these experiments used the mAP at two different IOU thresholds, the number of parameters (Params), and the number of floating-point operations (FLOPs) as performance evaluation metrics.
Analyzing the experimental results, we first consider the variant in which EfficientNetV2 replaced the conventional CSPDarknet as the backbone network. Compared with the original YOLOv5-S model, the mAP@.5(%) and mAP@.5:.95(%) indexes increased by 7.5% and 11.3%, respectively, the number of parameters was reduced by 24.4%, and the computational complexity was reduced by 3.09 G, which not only enhanced the target detection precision but also made the network lighter. This was due to the strong feature extraction capability of EfficientNetV2, which can mine deeper features to improve precision, and to the Fused-MBConv module, which fuses the expansion and depthwise convolutions into a single regular convolution, further reducing the model’s parameter count and computational complexity.
Further examination of the experimental data indicated that integrating the LPAN module enhanced the model’s performance on mAP@.5(%) and mAP@.5:.95(%) by 8.4% and 13.7%, respectively. It is worth noting that although the LPAN module kept the number of model parameters unchanged, it slightly increased the computational complexity because it introduces skip connections and a feature reuse mechanism into the network structure. This was a minor expense exchanged for a significant improvement in model performance.
In the YOLOv5 model, we introduced the CBAM module to process the feature maps and achieve adaptive feature refinement. The CBAM module computes the 1D channel and 2D spatial attention maps sequentially and multiplies them with the input feature maps, which enhances the representation of occluded targets on the maps and suppresses irrelevant features, thus improving detection precision. The results show that this module improved the mAP@.5 (%) and mAP@.5:.95 (%) metrics by 10% and 16%, respectively, while reducing the model’s parameter count by 13.5% and its computational complexity by 2.1 G.
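The sequential channel-then-spatial attention described above can be sketched in NumPy as follows. This is a simplified illustration: the channel branch uses a shared two-layer MLP over average- and max-pooled descriptors as in CBAM, while the learned 7 × 7 convolution of the spatial branch is replaced by a fixed averaging filter, since the real module’s weights are learned during training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared two-layer MLP (w1: C -> C/r, w2: C/r -> C)
    applied to the global average- and max-pooled descriptors."""
    avg = x.mean(axis=(1, 2))                        # (C,)
    mx = x.max(axis=(1, 2))                          # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # ReLU between layers
    return sigmoid(mlp(avg) + mlp(mx))               # (C,) in (0, 1)

def spatial_attention(x, k=7):
    """Channel-wise mean and max maps fused by a k x k filter. CBAM
    concatenates the two maps and applies a learned conv; here we average
    them and apply a fixed k x k mean filter purely for illustration."""
    stacked = (x.mean(axis=0) + x.max(axis=0)) / 2.0
    p = k // 2
    padded = np.pad(stacked, p, mode="edge")
    h, w = stacked.shape
    out = np.empty_like(stacked)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return sigmoid(out)                              # (H, W) in (0, 1)

def cbam(x, w1, w2):
    """Refine features: channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x)[None, :, :]
```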
To give the network better performance on small target detection tasks, we added the L2 detection head to YOLOv5 for related experiments. The results showed that introducing this module alone yielded improvements of 10.9% and 18.5% in the mAP@.5 (%) and mAP@.5:.95 (%) metrics, respectively, at the cost of a 5.5% increase in the model’s parameter count and an 8.49 G rise in computational complexity. Although the small target detection layer increased the depth of the network, raising the number of parameters and the computational load, it passed more shallow features to the deeper layers and improved the network’s ability to learn multi-level feature information of the target. As a result, the network could detect the smallest targets on a 160 × 160 resolution grid, recover a large number of small targets that were originally missed, and effectively reduce missed detections.
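The resolution figure quoted above follows from the head strides: for a 640 × 640 input, a head at stride s predicts on a (640/s) × (640/s) grid, so the added L2 head at stride 4 yields the 160 × 160 grid that captures the smallest targets. A minimal sketch (the stride values are the usual YOLOv5 ones plus the assumed stride-4 level):

```python
def head_grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Feature-map grid size at each detection head for a square input;
    stride 4 corresponds to the added L2 small-target head."""
    return {f"stride {s}": input_size // s for s in strides}
```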
Finally, the EIOU_loss directly minimized the difference between the width and height of the predicted box and the ground-truth box, resulting in faster convergence and greatly alleviating the imbalance between positive and negative samples in the CIOU. Thus, without altering the model’s parameters or computational complexity, it led to a 3.8% increase in mAP@.5 (%) and a 7.4% increase in mAP@.5:.95 (%), while also enhancing regression precision.
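A minimal, framework-free sketch of the EIOU loss for a single pair of axis-aligned boxes is given below; it follows the published EIOU formulation (1 - IoU plus center-distance, width, and height penalty terms) rather than the exact YOLOv5 implementation, and omits the focal weighting some variants add.

```python
def eiou_loss(box1, box2, eps=1e-9):
    """EIOU loss for axis-aligned boxes given as (x1, y1, x2, y2).
    Adds to 1 - IoU: the center distance over the enclosing-box diagonal,
    plus width and height differences over the enclosing box's sides."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # intersection and union
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / (union + eps)
    # smallest enclosing box
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw * cw + ch * ch + eps
    # squared distance between box centers
    d2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    # width/height difference terms: what EIOU adds over CIOU
    w_term = ((x2 - x1) - (X2 - X1)) ** 2 / (cw * cw + eps)
    h_term = ((y2 - y1) - (Y2 - Y1)) ** 2 / (ch * ch + eps)
    return 1.0 - iou + d2 / c2 + w_term + h_term
```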
By comparing the experimental results of this method and baseline ablation shown in
Table 2, it is evident that incorporating EfficientNetV2, LPAN, the CBAM module, the L2 detection head, and the EIOU_loss enhanced the precision of the network, with the greatest improvement coming from the small target detection layer. Compared with YOLOv5-S, mAP@.5 (%) and mAP@.5:.95 (%) increased by 19.2% and 32.4%, respectively, and the parameters and computational complexity were reduced by 3.74 M and 2.49 G. When fusing multiple feature extraction methods, adjustments were necessary in the LPAN feature fusion module to match the output of EfficientNetV2. Specifically, some features with 512 channels were converted to 160 channels, which substantially diminished the parameter count and computational complexity of the fully fused model. The experimental results verified the reliability of the enhanced algorithm, indicating that the modified model can identify coal and gangue efficiently and precisely.
4.4. Analysis of the Results
Figure 9 shows the results of training YOLOv5-S and the improved YOLOv5-S proposed in this paper on the self-built dataset. As can be seen from the figure, the precision decreases as recall increases. This occurs because P is the ratio of true positive predictions to all samples predicted as positive. When the recall increases, the model identifies more positive cases as positive, but it also misclassifies more negative cases as positive, leading to a decrease in precision. The comparison makes it evident that the enhanced network structure discussed in this article achieves significantly higher precision (vertical axis) and recall (horizontal axis) than YOLOv5-S. The figure also presents the AP results for each category of object in the dataset. The revised model surpassed the original model in the area enclosed by the P-R curve, both horizontally and vertically, indicating superior generalization performance and stronger feature extraction capability; consequently, mAP improved by 19.2 percent. At the same time, the detection precision for the two categories of gangue, and coal and gangue, was also enhanced. By referring to
Figure 9a, it is evident that the YOLOv5-S model exhibited superior identification capability for coal, with a precision as high as 0.926; the recognition performance for coal and gangue was moderate, with a precision of 0.716; and the recognition performance for gangue was the most inferior, with a precision of only 0.561. From
Figure 9b, it can be observed that the enhanced YOLOv5-S model presented the best recognition performance for coal and gangue, with a precision as high as 0.919; the recognition performance for coal was moderate, with a precision of 0.896; and the recognition performance for gangue was inferior, with a precision of 0.811.
To further validate the above indexes, the improved YOLOv5-S and YOLOv5-S were compared to detect the same batch of gangue materials in the actual production environment.
Figure 10a shows the detection of gangue mixture material using the YOLOv5-S network, and
Figure 10b shows the detection of the same batch of mixed material using the improved YOLOv5-S, with labeled target categories and confidence levels. The improved YOLOv5-S network could identify and locate coal, gangue, and coal and gangue with higher confidence than the YOLOv5-S network, and produced better identification results when targets were dense and many targets were small. The YOLOv5-S network yielded low confidence levels, and repeated recognition of the same target also occurred. Thus, the improved YOLOv5-S network had stronger target detection capability.
As seen in
Figure 11, a confidence level of 0.6 was taken as the reference point. Materials with a confidence level higher than 0.6 were identified as coal, while materials with a confidence level lower than 0.6 were classified as gangue. The DX-YOLOv5 recognition model analyzed the density, composition, and coordinates of the materials and transmitted this information to the host computer. According to the size of the gangue particles, the host computer determined the number of slide plates needed to track and separate the target gangue. It then sent a command to the intelligent slide plate controller based on the time it took to identify the material and the velocity of the conveyor belt. When the target gangue reached the end of the belt, the control system extended the intelligent slide plate to catch the gangue and then retracted it. The caught gangue moved along the slide plate toward the gangue belt, while the coal pieces fell directly onto the coal belt. This process effectively separated the coal from the gangue.
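The sorting logic just described can be sketched as follows; the function names, the belt geometry parameters, and the timing model are illustrative assumptions rather than the system’s actual control code.

```python
def classify(confidence, threshold=0.6):
    """Confidence above the threshold is treated as coal, below as gangue,
    following the 0.6 reference point used in the experiment."""
    return "coal" if confidence > threshold else "gangue"

def actuation_delay(distance_to_belt_end_m, belt_speed_m_s, processing_time_s):
    """Time to wait after identification before extending the slide plate:
    the gangue's travel time to the belt end minus the time already spent
    on recognition (parameter names are illustrative assumptions)."""
    delay = distance_to_belt_end_m / belt_speed_m_s - processing_time_s
    return max(0.0, delay)
```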
4.5. Comparison with Other SOTA Models
To validate the enhanced YOLOv5-S algorithm, this study assessed its performance by conducting experiments on a custom dataset. The proposed method was compared with NanoDet-Plus, YOLOv5 series, and YOLOv7 under identical experimental conditions. The performance measures utilized in this study comprised parameters, FLOPs, mAP, latency, and model size. The specific results are showcased in
Table 3.
In analyzing the results of
Table 3, our method performed at the same precision level as the YOLOv5 algorithms while reducing model size, parameter count, and computational complexity. This fully demonstrates the efficacy of the enhanced strategy in object detection and recognition tasks. Since we adopted measures such as a lighter backbone network, depthwise separable convolutions, limiting the maximum number of channels, and introducing a small target detection layer, our model exhibits a substantial decrease in size, parameter count, and computational complexity compared with YOLOv7, while precision is also significantly improved. Further analysis showed that YOLOv7’s mAP@.5 and mAP@.5:.95 scores did not reach the best level despite its having the largest parameter count and computational load, which may be attributed to overfitting caused by the excessive size of the model. Compared with NanoDet-Plus, the improved YOLOv5-S is comparable in parameter count. Although its computational complexity and detection speed are slightly inferior, our model achieves a much higher mAP than NanoDet-Plus, which shows its superiority in handling small target detection tasks.
5. Conclusions
In this paper, a new gangue sorting system based on dual-energy X-ray recognition and an improved YOLOv5 algorithm was proposed. The YOLOv5-S network model was improved by five methods: using EfficientNetV2 as the backbone network; designing the LPAN module in the neck; embedding the CBAM into the BottleneckCSP of the YOLOv5 feature fusion module; adding the L2 detection head in the head layer; and choosing EIOU_Loss to replace the CIOU_Loss of the original network. Together, these improvements raise recognition precision and decrease computational complexity. The validation experiments led to the following conclusions: compared with YOLOv5-S, the updated model showed a 19.2% increase in mAP@.5 and a 32.4% increase in mAP@.5:.95, along with a 51.5% reduction in parameters and a 14.7% reduction in computational complexity. In addition, the proposed method achieved better overall performance and lower resource consumption than other mainstream algorithms, making it more suitable for deployment on embedded devices. Specifically, the improved network architecture reduced model parameters, inference time, and computational cost by 88.3%, 47.5%, and 86.4%, respectively, compared with YOLOv7, with a model size of only 7.10 MB. It is anticipated that the approach suggested in this study will benefit a wider range of researchers in the identification and inspection of coal gangue on conveyor belts, and it holds significant practical value for the intelligent sorting of coal gangue.
In future work, we will focus on optimizing inference on hardware devices with low computational power, i.e., achieving high-quality inference under limited computational resources. Additionally, we will conduct in-depth research on image preprocessing techniques to minimize and avoid the impact of light artifacts.