Article

YOLO-CSM-Based Component Defect and Foreign Object Detection in Overhead Transmission Lines

1 School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang 471003, China
2 Longmen Laboratory, Luoyang 471003, China
3 Key Laboratory of Mechanical Design and Transmission System of Henan Province, Luoyang 471003, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(1), 123; https://doi.org/10.3390/electronics13010123
Submission received: 28 November 2023 / Revised: 25 December 2023 / Accepted: 26 December 2023 / Published: 28 December 2023

Abstract

Detecting component defects and tiny foreign objects attached to overhead transmission lines is critical to the national grid's safe operation and power distribution. This urgent task, however, faces challenges such as the complex working environment and the considerable workforce investment required, for which we propose a deep-learning-aided object detection approach, YOLO-CSM. Combining two attention mechanisms (Swin transformer and CBAM) with an extra detection layer, the proposed model can effectively capture global information and key visual features, improving its ability to identify tiny defects and distant objects in the visual field. To validate this model, this work consolidates a dataset composed of public images and our field-taken picture samples. The experiments verify YOLO-CSM as a suitable solution for small and distant object detection, outperforming several widely used algorithms with a 16.3% faster detection speed than YOLOv5 and a 3.3% better detection accuracy than YOLOv7. Finally, this work conducts an interpretability experiment to reveal the similarity between YOLO-CSM's attention patterns and those of humans, aiming to explain YOLO-CSM's advantages in detecting small objects and minor defects in the working environments of power transmission lines.

1. Introduction

Overhead power transmission lines are an essential infrastructure in the modern industrialized world because they facilitate electricity distribution across different regions. However, their safety is threatened by foreign objects and the severe working environment. Foreign objects of varying sizes, such as tower cranes, kite lines, and wildfire, can interfere with or damage the transmission line and pose significant safety hazards to the entire power system [1]; the severe working environment, on the other hand, can lead to the ageing and damage of line components, such as the breakage of anti-vibration hammers and insulators [2]. Therefore, detecting foreign objects and defects of transmission line components is essential for the power grid's safe operation. Besides manual inspection, the most popular inspection methods for transmission lines are drone inspection [3] and robot inspection [4]. Drone inspection features a relatively low initial investment and flexible deployment; however, it often achieves below-expectation results due to its need for manual operation and its vulnerability to adverse weather [5,6]. Robot inspection, which can move along and inspect transmission lines without these issues, offers more accurate inspection at low operating costs [7].
Traditional object detection methods, such as the histogram of oriented gradients (HOG), primarily rely on contours, textures, and colors [8] to locate objects in images. The performance of these methods is highly sensitive to illumination and background interference [9], resulting in limited accuracy, high computational demand, and restricted scalability [10].
With the development of computer vision technology, object detection algorithms based on deep learning (DL) are being applied to transmission line inspection. These techniques are usually region-based convolutional neural network (CNN) algorithms and their variants [11,12,13]. CNN-based algorithms can detect defects in transmission lines with higher accuracy than traditional algorithms, but at considerable computational cost [14,15,16]. Methods using directed bounding box regression [17] or shared CNN parts (SPTL-Net) [18] on top of R-CNN can approach real-time detection, but at the cost of accuracy (less than 90%). In addition, eliminating the models' proposal generation and feature resampling phases can achieve higher detection speeds [19], but with significantly lower accuracy on small objects. Moreover, single-stage parallelism and shared convolutional features enable faster detection than the R-CNN family [20,21,22], and these algorithms have been applied to transmission line detection [23,24] with better performance than two-stage models. Other works have achieved real-time detection of small objects and defects in complex environments by optimizing the YOLOv3 (You Only Look Once version 3) [25] and YOLOv4 (You Only Look Once version 4) [26,27,28] networks; the fastest speed was 16 frames per second. However, these improvements may still fall short as the requirements for transmission line inspection continue to increase. YOLOv5 (You Only Look Once version 5) [29,30,31] and YOLOv7 (You Only Look Once version 7) [32], which effectively increase detection accuracy while ensuring real-time performance, still lack excellent detection capability for long-range and small objects. The primary limitation of the above techniques is their inability to detect small and distant objects against complicated backgrounds.
The attention mechanism is a promising direction for addressing the shortcomings of CNN-based object detection models. Attention is a complex cognitive function of living beings that helps them focus on relevant objects more effectively, even when an object occupies a small percentage of the image [33,34]. Incorporating it into a deep learning network enhances the network's interpretability [35] and improves model performance, especially in the detection of small and distant objects. Integrating the attention mechanism into YOLOv5 [36] avoids significant overhead by embedding location information into channel attention, enabling the mobile network to cover a larger area and achieve a detection accuracy of 89.1%, 7.5% higher than YOLOv2, while attending to objects much as a human would. Wu et al. [37] optimized YOLOX [38] by expanding the model's receptive field and utilizing an attention module, increasing the mean average precision (mAP) by approximately 4.24% compared with the original algorithm. In addition, using the attention mechanism to suppress useless information in CenterNet can improve the network's mAP to 96.16% [39].
Therefore, this study integrates the attention mechanism with the YOLOv7 model, a state-of-the-art single-stage object detection model characterized by deeper feature extraction [40], to increase the accuracy of detecting distant foreign objects and minor component defects in power transmission lines. The novelties of this paper are as follows:
(1)
An improved network named YOLO-CSM, which introduces the CBAM hybrid attention module and the Swin transformer self-attention module into YOLOv7 and adds a small object detection layer to the prediction part, granting the model a better ability to identify small objects;
(2)
A dataset containing five types of foreign objects and two types of component defects, consolidated from public images and our field-taken samples;
(3)
A comparison of the proposed method with the currently most popular models, showing higher accuracy in detecting transmission line foreign objects and normal/defective line components; in addition, we perform an interpretability analysis of the model using the Grad-CAM method.
The organization of the rest of the paper is as follows: Section 2 presents an overview of the dataset construction process and briefly describes the classes contained. Section 3 describes two attention mechanisms and proposes the YOLO-CSM network. Section 4 presents the relevant comparison and ablation experiments with the interpretability analysis. Section 5 concludes this paper with suggestions for future work.

2. Dataset Construction

Drone-based detection needs to keep a safe distance from the pole tower under inspection (Figure 1a), while an inspection robot walking along power transmission lines can take pictures from an overhead perspective. Due to the differing capture distances and angles, the images of the same object captured by the two devices vary significantly: the object occupies a smaller portion of the drone-captured image and is less clear than in the robot-captured counterpart. Moreover, the drone-captured image usually includes more complicated background elements, as shown in the RGB color histograms on the lower left of Figure 1b,c. Therefore, it is necessary to construct a specialized dataset for robot-based power transmission line inspection.
The quality and variety of the to-be-constructed dataset are essential to the detection performance of the DL-based approach. This paper employed a two-step strategy, data construction followed by image enhancement, to construct the final large-scale dataset (Figure 2) of foreign objects and component defects. Foreign objects include large-scale excavators, kite lines, tower cranes, wildfire, and other abnormal objects. When collecting images, the inspection robot can move close to the target, but considering the subsequent detection of long-distance and small objects, we collected images of the same object at different distances. Such a dataset enables the trained model to detect objects at varying distances. The component images include insulators and dampers with and without defects, most of which come from a public dataset named the China Power Line Insulator Dataset (CPLID), which includes images of both normal and (synthetic) defective insulators [41].
After consolidating the original images, this paper enlarged the dataset through offline enhancement approaches (Figure 2), such as salt-and-pepper noise, Gaussian blur, and affine transforms. The salt-and-pepper noise simulates the degradation of image quality due to the substantial electromagnetic interference in the transmission lines' severe working environment. The Gaussian blur simulates the low-clarity images caused by wind and sand in the real world. The affine transformation increases image variety while keeping the characteristics of the original image. Finally, we manually eliminated the poor-quality images; the final dataset included 14,391 images, some of which simulated different lighting conditions and shooting angles, aiming to improve the model's generalization ability and robustness.
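As an illustration of these three enhancement operations, the following Python sketch uses OpenCV and NumPy with illustrative parameter values, not the exact settings used for our dataset; for detection data, the affine transform must also be applied to the bounding-box coordinates, which is omitted here:

```python
import cv2
import numpy as np

def salt_pepper(img: np.ndarray, amount: float = 0.01) -> np.ndarray:
    """Simulate electromagnetic-interference noise with salt-and-pepper pixels."""
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < amount / 2] = 0          # pepper
    out[mask > 1 - amount / 2] = 255    # salt
    return out

def gaussian_blur(img: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Simulate low-clarity images caused by wind and sand."""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def random_affine(img: np.ndarray) -> np.ndarray:
    """Apply a mild random rotation/translation that preserves object features."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-15, 15)
    tx, ty = np.random.uniform(-0.05, 0.05, 2) * (w, h)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    m[:, 2] += (tx, ty)
    return cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
```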

3. Modeling and Methodologies

3.1. The Attention Mechanism

The attention mechanism initially appeared in natural language processing and has since found widespread application in other areas, including computer vision, reinforcement learning, and machine translation. It dynamically adjusts an object detection model's attention to various parts of the input data according to their significance, allowing the model to focus on the parts or features most relevant to the ongoing task and improving detection performance. The attention mechanism assigns each input element a weight that reflects its contribution to the detection result, where a higher weight signifies greater significance [42]:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
where Q is the query matrix quantifying the model's attention to the inputs, K is the key matrix representing the importance of various input parts, V is the value matrix holding the output information, and d_k is the dimension (column count) of K. The formula above represents a typical dot-product attention mechanism that helps the object detection model dynamically update the weights by calculating the similarity between the query matrix Q and the key matrix K. This calculation enables the model to selectively focus on different input parts in various contexts and enhances its ability to capture correlations between inputs. Other formulations, such as scaled dot-product attention [43] and Bahdanau attention [44], employ similar strategies.
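As a minimal PyTorch sketch of this formula, the following function computes single-head attention over already-projected Q, K, and V matrices (shapes and the usage example are illustrative):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(QK^T / sqrt(d_k)) V for tensors of shape (batch, seq_len, d_k)."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # query-key similarity
    weights = F.softmax(scores, dim=-1)            # attention weights sum to 1 per query
    return weights @ v

# Example: 8 positions with 64-dimensional features.
q = k = v = torch.randn(1, 8, 64)
out = scaled_dot_product_attention(q, k, v)        # shape: (1, 8, 64)
```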
Therefore, based on the attention mechanism, this work proposes a new object detection model, YOLO-CSM, that integrates YOLOv7 with a Swin transformer self-attention module, a CBAM (convolutional block attention module) hybrid attention module, and a small object detection layer (Figure 3), as explained in the subsequent subsections.

3.2. Swin Transformer Module

The Swin transformer module [45] adopts a hierarchical windowing mechanism to focus on features at different scales. The windowing mechanism divides the feature map into varying-sized regions (windows) and employs several two-layer structures to process them (Figure 4). In a two-layer structure, each layer is equipped with an MLP module and two LayerNorm (LN) layers that normalize the inputs. The two layers vary in that the former layer employs a window-based multi-head self-attention (W-MSA) function, and the latter layer utilizes a shifted window-based multi-head self-attention (SW-MSA) function. Equations (2) and (3) show the computational procedure for the former layer,
$\hat{z}^{l} = \mathrm{W\text{-}MSA}(\mathrm{LN}(z^{l-1})) + z^{l-1}$
$z^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{z}^{l})) + \hat{z}^{l}$
where $z^{l}$ and $\hat{z}^{l}$ denote the feature outputs of the MLP and W-MSA, respectively. Taking the normalized $z^{l-1}$ generated by the preceding LN layer as input, the W-MSA module performs a window self-attention operation and adds the result to the original $z^{l-1}$ to produce the output $\hat{z}^{l}$. Next, an MLP module processes the normalized $\hat{z}^{l}$ and adds $\hat{z}^{l}$ to produce the current layer's final output $z^{l}$. W-MSA uses a fixed-size attention window that does not pan or slide across locations, i.e., the window's position in the image is fixed.
The latter layer, except for the SW-MSA employed, follows a calculation similar to the former:
$\hat{z}^{l+1} = \mathrm{SW\text{-}MSA}(\mathrm{LN}(z^{l})) + z^{l}$
$z^{l+1} = \mathrm{MLP}(\mathrm{LN}(\hat{z}^{l+1})) + \hat{z}^{l+1}$
The coupling of the two attention windows allows the model to handle dependencies over longer distances while reducing the computational effort, which is essential for capturing correlations at more distant locations and achieving a more comprehensive understanding of an image's content. Considering that such information fusion can extend the model's effective receptive field, we integrated the Swin transformer module into the YOLOv7 network before the network's up-sampling modules, aiming to combine global and local information and improve the model's semantic understanding for object detection [46,47].
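The two-layer structure of Equations (2)–(5) can be sketched in PyTorch as follows. This is a structural illustration rather than the reference Swin implementation: the relative position bias and the attention mask for shifted windows are omitted, and the feature map sides are assumed divisible by the window size.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head self-attention computed independently inside each window."""
    def __init__(self, dim, window, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, C)
        b, h, w, c = x.shape
        ws = self.window
        # Partition into non-overlapping ws x ws windows, flatten each to a sequence.
        xw = x.view(b, h // ws, ws, w // ws, ws, c).permute(0, 1, 3, 2, 4, 5)
        xw = xw.reshape(-1, ws * ws, c)
        out, _ = self.attn(xw, xw, xw)
        out = out.view(b, h // ws, w // ws, ws, ws, c).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(b, h, w, c)

class SwinBlockPair(nn.Module):
    """Equations (2)-(5): a W-MSA layer followed by an SW-MSA layer."""
    def __init__(self, dim, window=8):
        super().__init__()
        self.window = window
        self.ln1, self.ln2, self.ln3, self.ln4 = (nn.LayerNorm(dim) for _ in range(4))
        self.wmsa = WindowAttention(dim, window)
        self.swmsa = WindowAttention(dim, window)
        self.mlp1 = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z):                      # z: (B, H, W, C)
        z = self.wmsa(self.ln1(z)) + z         # Eq. (2)
        z = self.mlp1(self.ln2(z)) + z         # Eq. (3)
        shift = self.window // 2               # shifted windows couple neighboring windows
        zs = torch.roll(z, (-shift, -shift), dims=(1, 2))
        zs = self.swmsa(self.ln3(zs)) + zs     # Eq. (4), without the attention mask
        z = torch.roll(zs, (shift, shift), dims=(1, 2))
        z = self.mlp2(self.ln4(z)) + z         # Eq. (5)
        return z
```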

3.3. CBAM Attention Module

The convolutional block attention module (CBAM) [48] adaptively adjusts the importance of feature maps across channels and spatial locations and then multiplies the resulting weights with the original feature map to produce the final feature map (Figure 5). This calculation fully accounts for the correlation between channels and spatial positions, revealing the critical information in the feature maps.
As shown in Equation (6), the channel attention module pools the incoming feature map by its average and maximum values, passes both pooling results through a shared MLP, sums them, and generates the channel weights via a sigmoid activation function. The shared MLP and the summation help the model adaptively determine the contribution of each pooling method:
$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$
Equation (7) follows a similar calculation. The spatial attention module pools the input feature map to produce two 1 × H × W maps, concatenates them, fuses them into a single map via a 7 × 7 convolution, and produces the final spatial attention map via a sigmoid activation function:
$M_s(F) = \sigma(f^{7\times7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$
Finally, the weights generated by the channel attention module and the spatial attention module are applied to the original feature map. Through pixel-wise multiplication, CBAM emphasizes the more significant positions and channels while suppressing irrelevant information. Though CBAM is compatible with CNN architectures and fits at any location of the YOLO-CSM network [49], we integrated it at the junction between the network's backbone and neck to tune the channel weights and thus enhance or diminish specific regions within the feature maps. Moreover, the spatial attention module highlights the contrast between objects and the background [40].
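A compact PyTorch sketch of Equations (6) and (7) follows; the channel-reduction ratio of 16 is the default from the original CBAM paper, and everything else is a plain reading of the two equations:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (Eq. (6)) followed by spatial attention (Eq. (7))."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution over the concatenated [AvgPool; MaxPool] spatial maps.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # --- Channel attention, Eq. (6) ---
        avg = self.mlp(x.mean(dim=(2, 3)))               # MLP(AvgPool(F))
        mx = self.mlp(x.amax(dim=(2, 3)))                # MLP(MaxPool(F))
        mc = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * mc                                       # re-weight channels
        # --- Spatial attention, Eq. (7) ---
        avg_s = x.mean(dim=1, keepdim=True)              # 1 x H x W average map
        max_s = x.amax(dim=1, keepdim=True)              # 1 x H x W max map
        ms = torch.sigmoid(self.conv(torch.cat([avg_s, max_s], dim=1)))
        return x * ms                                    # highlight informative regions
```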

3.4. Additional Detection Layer

For deep learning, feature maps obtained from different convolutional layers carry different amounts of information about the objects. For instance, a feature map produced by an early-stage convolution possesses richer location information but less semantics about the objects. In contrast, feature maps generated at deeper convolutional layers have a lower resolution but richer semantic information, thus lacking the objects' location details. As a result, a shallow network is more appropriate for detecting simple objects, while a deep network performs well in identifying complex objects.
Small objects occupy a minor portion of the image and are characterized by simple features. Detecting small objects usually depends on shallow features with a small receptive field, for which early YOLO-based research integrated additional layers into the backbone networks [50,51,52,53]. For the same reason, the YOLOv7 network has deeper layers than its predecessors. Regarding this, we extracted the early convolutional results and spliced them with the up-sampled results to obtain a new 160 × 160 output layer, as sketched below. The new layer's output frames correspond to a 4 × 4 receptive field on the input map; this smaller receptive field makes YOLO-CSM more sensitive to small objects.
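The following sketch illustrates this splicing step; the layer names, channel widths, and class count are illustrative placeholders rather than YOLOv7's actual configuration, and the shapes assume a 640 × 640 input:

```python
import torch
import torch.nn as nn

# An early backbone map at stride 4 keeps fine location detail; the neck's
# stride-8 map is up-sampled and concatenated with it to form a 160 x 160 head.
shallow = torch.randn(1, 128, 160, 160)    # early convolutional features (stride 4)
neck_p3 = torch.randn(1, 256, 80, 80)      # finest existing neck features (stride 8)

upsample = nn.Upsample(scale_factor=2, mode="nearest")
fuse = nn.Conv2d(128 + 256, 128, kernel_size=1)      # channel fusion after concatenation
head = nn.Conv2d(128, 3 * (5 + 7), kernel_size=1)    # 3 anchors x (box + obj + 7 classes)

p2 = fuse(torch.cat([shallow, upsample(neck_p3)], dim=1))
out = head(p2)   # (1, 36, 160, 160): each cell corresponds to a 4 x 4 input patch
```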

3.5. Loss Function

The YOLO-CSM network utilizes the YOLOv7 network's total loss function, which consists of three parts: the location loss $loss_{box}$, the confidence loss $loss_{obj}$, and the classification loss $loss_{cls}$, as shown in Equations (8)–(11):
$loss = loss_{box} + loss_{obj} + loss_{cls}$
$loss_{box} = a_{box} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{kij}^{obj} L_{CIoU}$
$loss_{obj} = a_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{kij}^{obj} L_{BCE}$
$loss_{cls} = a_{cls} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{kij}^{obj} L_{BCE}$
In the above equations, k indexes the output feature map, $S^2$ is the number of grid cells, B is the number of anchors per cell, and $I_{kij}^{obj}$ indicates whether the j-th anchor of cell i in feature map k is responsible for an object. $loss_{obj}$ and $loss_{cls}$ are calculated using the BCE-with-logits loss function (denoted $L_{BCE}$ above), and $L_{CIoU}$ is calculated as
$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$
$\alpha = \frac{v}{(1 - IoU) + v}$
$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$
where b and $b^{gt}$ denote the center points of the prediction boxes and the ground-truth boxes, respectively, and $\rho(\cdot)$ is the Euclidean distance between them; c denotes the diagonal length of the smallest closed region that can contain both the prediction frame and the ground-truth frame; $\alpha$ denotes the balance parameter; and v measures whether the aspect ratios of the two boxes are consistent.
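For reference, a self-contained PyTorch sketch of the CIoU terms above, for boxes given in (x1, y1, x2, y2) form (the eps guards are our additions for numerical stability):

```python
import math
import torch

def ciou_loss(pred, target, eps: float = 1e-7):
    """CIoU loss for axis-aligned boxes given as (..., 4) tensors of (x1, y1, x2, y2)."""
    # Intersection area.
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)

    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between box centers; c^2: squared diagonal of the
    # smallest enclosing box containing both prediction and ground truth.
    rho2 = ((pred[..., 0] + pred[..., 2]) - (target[..., 0] + target[..., 2])) ** 2 / 4 \
         + ((pred[..., 1] + pred[..., 3]) - (target[..., 1] + target[..., 3])) ** 2 / 4
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # v measures aspect-ratio consistency; alpha is the balance parameter.
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```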

4. Experiment and Analysis

4.1. Hardware Configuration and Parameter Setting

For the experiments, we used Python 3.9 as the programming language and PyCharm as the integrated development environment. The hardware parameters used in the experiments are shown in Table 1.

4.2. Algorithm Evaluation Metrics

This paper used several typical evaluation metrics for object detection tasks: precision (Equation (14)), recall (Equation (15)), F1_score (Equation (16)), and average precision and mean average precision (Equation (17)). Also, considering the limited computation capacity of detection devices in the outdoor transmission line environment, we added the model's parameter count to the evaluation metrics; it was calculated over the convolutional layers, fully connected layers, and other parameterized layers in the network structure.
$Precision = \frac{TP}{TP + FP}$
$Recall = \frac{TP}{TP + FN}$
$F1\_score = \frac{2 \times Precision \times Recall}{Precision + Recall}$
$AP = \int_{0}^{1} Precision(Recall)\, \mathrm{d}(Recall)$
$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$
where TP is the number of positive samples predicted as positive, FP is the number of negative samples predicted as positive, FN is the number of positive samples predicted as negative, and AP is the average precision of a single class.
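For illustration, the counting-based metrics and the AP integral can be sketched as follows (a simplification: a full mAP evaluation also involves IoU matching of predictions to ground truth and confidence-ranked thresholds, which are omitted here):

```python
import numpy as np

def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from confusion counts at a fixed IoU threshold."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Area under the precision-recall curve (trapezoidal rule), i.e. the AP integral."""
    order = np.argsort(recalls)
    r, p = recalls[order], precisions[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2))

# mAP (0.5) is then the mean of the per-class AP values at IoU threshold 0.5:
# map50 = sum(per_class_ap) / num_classes
```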
To verify the validity of the above model, we trained YOLO-CSM and compared it with other models, mainly Faster-RCNN, YOLOv5, YOLOv7, and YOLOv8 (You Only Look Once version 8). Faster-RCNN is a two-stage detection model, unlike the YOLO series; YOLOv5 is one of the most popular and practical models of recent years; and YOLOv8 is the latest version of the YOLO series. We analyzed the loss value of YOLO-CSM during training and used the trained model to detect images after training was completed. In addition, we compared accuracy, parameter count, F1_score, and other metrics.

4.3. Training Speed and Final Loss

Figure 6 compares the models' losses during 300 rounds of training. According to the comparison, the YOLO-CSM approach, with its additional object detection layer, extracted more features and produced a higher loss than the other models at the initial stage of training. Thanks to its integrated attention mechanisms, YOLO-CSM then sharpened its focus on small features and fitted faster than the other models, and its loss became the smallest after 30 rounds. The losses of all candidate models stabilized at around 0.02 as the training ended, with YOLO-CSM achieving the smallest loss, 19.2% smaller than YOLOv7's.

4.4. Model Comparisons

Figure 7 and Figure 8 show some of the detection results of the different models. YOLO-CSM can identify foreign objects and defects in transmission lines, especially small and distant objects. Compared with the other models, our model produced no false recognitions or missed detections, and its recognition confidence was higher when detecting similar objects. Of the three objects correctly recognized in Figure 7, YOLO-CSM had the highest detection confidence for two. In Figure 8, of the six objects, YOLOv7, YOLOv8, and YOLO-CSM recognized five, five, and six, respectively, giving YOLO-CSM the highest detection rate among all five algorithms. Regarding detection distance, the inspection robot is tens of meters from the foreign object in Figure 8, versus about 4 m in Figure 7, yet YOLO-CSM still achieved high detection accuracy with no missed detections.
This section presents a performance comparison between YOLO-CSM and the other models (Table 2). Our model performed better in precision, recall, F1_score, and mAP. In terms of mAP (0.5), YOLO-CSM (98.9%) improved by 3.3% relative to the original YOLOv7 (95.7%) and by 13.3% relative to Faster-RCNN (87.3%), although the number of parameters increased. YOLO-CSM detected 66 frames per second on our servers, higher than the latest YOLOv8 and the most popular YOLOv5, but still lower than YOLOv7.

4.5. Ablation Experiments

We posit that the integration of two additional attention modules with the convolutional neural network can effectively enhance both the model’s detection accuracy and speed. Therefore, this section aims to validate two primary points:
The evaluation of whether CBAM and Swin transformer surpass other attention mechanisms when integrated with convolutional neural networks. Numerous attentional mechanisms have been proposed to enhance the efficiency of deep learning networks, including the SE channel method (SE) [54], Sim spatial method (Sim) [55], convolutional block attention module (CBAM), and efficient channel attention (ECA) [56]. In this paper, we conducted experiments to discern the effects of combining CNNs with various attention mechanisms.
The delineation of the individual contributions of CBAM and Swin transformer within the model. Through ablation experiments, we analyzed the enhancements yielded by the CBAM module when coupled with optional components, such as the Swin transformer and supplementary detection layers. We introduced various modifications to YOLOv7 and conducted tests using different configurations, with each experimental setup corresponding to a comprehensive collection of evaluation metrics.

4.5.1. Combination of Attention Modules and CNNs

Figure 9 compares the detection accuracies after adding different attention modules to YOLOv7. After integrating SE or Sim with YOLOv7, the detection accuracy improved for some objects, such as smoke and insulators. However, for objects at a greater distance, such as the tower crane, both SE and Sim degraded the accuracy considerably, which may be attributed to the distance and the lack of distinctive feature expression. The average accuracy of ECA was higher than that of YOLOv7, but it performed inconsistently on objects of different resolutions, i.e., its accuracy was high when detecting high-resolution insulator defects and pylons and low when detecting low-resolution pylons. This contrast can be attributed to the ECA mechanism's greater emphasis on the relevant channels. CBAM's accuracy also decreased when recognizing tower cranes, but the decrease was smaller than ECA's. Therefore, we integrated the CBAM hybrid attention mechanism into YOLOv7 to make the model focus on both the spatial and channel aspects of the image.

4.5.2. YOLOv7 with Different Modules

The results of the ablation experiments are shown in Table 3. YOLOv7_Four_layers added the extra feature scale and increased detection accuracy by 1.8%, suggesting that it detected small objects more effectively and reduced the feature losses of each network layer. YOLOv7_Swin integrated the Swin transformer self-attention module into YOLOv7 and increased the number of parameters by 18.9% compared with the original network; however, its precision improved only slightly despite higher recall and F1_score, which we attribute to the Swin transformer's relatively slow convergence (over 300 training rounds). YOLOv7_CBAM, combining the CBAM hybrid attention module with YOLOv7, showed the highest precision of the single-module variants; the CBAM module effectively improved the feature extraction capacity for the studied objects, yielding a 1.3% increase in mAP. YOLOv7_CBAM_Swin and YOLOv7-CSM incorporated both modules into the network with different degrees of improvement, YOLOv7-CSM having an additional detection layer. YOLOv7-CSM had fewer parameters than YOLOv7_CBAM_Swin and achieved better detection results. Overall, YOLOv7-CSM had the best performance on all evaluation metrics, with 6.7% fewer parameters than YOLOv7_Swin; with the addition of CBAM, which optimizes the training network, detection accuracy improved relative to the other strategies, despite a 10.9% increase in parameters compared with YOLOv7.

4.6. Interpretability Analysis

To further understand the advantages of YOLO-CSM in detecting small objects and tiny defects, and to explore how image perception works when the attention modules are combined with a CNN, we performed an interpretability analysis of YOLO-CSM. The network was back-propagated using Grad-CAM [57] and weighted at the last output layer to derive the network's attention to individual pixels in the image.
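A minimal sketch of this procedure with PyTorch hooks is shown below; taking the maximum output score as the backward target is a simplification of how a YOLO detection head is actually handled, and the names are illustrative:

```python
import torch

def grad_cam(model, image, target_layer):
    """Grad-CAM: weight the target layer's activations by the spatial average
    of their gradients w.r.t. a scalar detection score, then ReLU and normalize."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    score = model(image).max()   # simplified scalar target (a real head needs a box/class score)
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = torch.relu((weights * feats["a"]).sum(dim=1))  # weighted sum over channels
    return cam / (cam.max() + 1e-7)                      # normalized heat map, (B, H, W)
```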
As shown in Figure 10, YOLO-CSM focused more accurately on the tower crane in the right part of the image while also attending to the transmission lines. In contrast to YOLOv7, which focused on the bottom transmission line, YOLO-CSM paid limited attention to the bottom of the image. This difference suggests that YOLO-CSM demonstrates an 'associative ability' to relate trained objects that bear similarities, such as 'kite string' and 'transmission line', a kind of recognition similar to human observation of objects. On the other hand, YOLOv7 attended to the lower transmission line where our model generated little attention, suggesting that YOLO-CSM may pay less attention to objects it cannot clearly identify.
In Figure 11, YOLO-CSM paid more attention to the overall insulator string and its minor defects than to other parts, whereas YOLOv7 paid only limited attention to part of the insulator string. Additionally, YOLO-CSM focused on two distant insulator strings that YOLOv7 ignored.
In summary, combining the two attention mechanisms with the CNN helped the model distinguish the foreground from the background and focus more on trained objects, such as cranes and kite lines, affording YOLO-CSM high accuracy even on small objects. The small object detection layer generated sufficient attention for small, distant objects and contributed the rich features extracted for minor defects, such as broken or missing insulators, allowing YOLO-CSM to detect them reliably.

5. Conclusions

This paper constructed a dataset of transmission line defects and foreign objects, containing common component defects as well as foreign objects that pose a great threat to transmission lines. For the detection model, the CBAM and Swin transformer attention mechanisms were added to YOLOv7, which effectively made the model focus on the object under test itself rather than on the background or other parts, and an additional detection layer was added to improve the accuracy of small object detection. Through experimental validation, the improved model achieved an F1_score of 0.977 and a mAP (0.5) of 0.989, higher than the other object detection models, including YOLOv7 and YOLOv8. However, due to the added attention modules, the detection frame rate decreased by 7.04% and the number of model parameters increased by 10.9% compared with the currently fastest model, YOLOv7. We also designed several sets of ablation experiments to resolve the roles of the attention modules and the small object detection layer in the network. In the interpretability analysis, we found that YOLO-CSM focuses better on the focal part of the image, i.e., the object being detected, further validating its advantages.
In future research, we need to collect further images of other metal fittings encountered during inspection, such as spacer bars, bird guards, and protective sleeves, to improve the generalization capability of the model. Among threatening foreign objects on transmission lines, wildfires are the most likely to cause significant damage; thus, more attention should also be paid to wildfire detection. In addition, lightweighting the model is the next major research direction, as it can address the limited computing power of embedded systems that currently prevents sufficiently fast real-time detection. Despite the significant improvement in detection capability, more testing in real-world scenarios is still needed: edge devices usually use lower-power processors and sensors, which may reduce detection accuracy and cannot guarantee real-time detection. Therefore, applying the proposed method in real-world scenarios requires a further phase of research, and our future plan involves studying the impact of external factors and hardware devices on the detection performance of deep learning models.

Author Contributions

Conceptualization, C.L. and L.M.; methodology, L.M.; validation, C.L., X.S. and N.G.; formal analysis, F.Y.; investigation, X.Y.; data curation, Y.H.; writing—original draft preparation, L.M.; writing—review and editing, C.L. and X.S.; project administration, X.W. and X.S.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2020YFB2009602); Major Science and Technology Projects of Longmen Laboratory (No. 231100220500); and 2022 Henan Provincial Science and Technology Tackling Program Projects (No. 222102220079).

Data Availability Statement

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yu, N. Study on Resource Management of Electric Oriented Fiber Cable. In Proceedings of the 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 10–12 December 2018; pp. 169–172.
  2. Zhang, K.; Huang, W. Defect detection of anti-vibration hammer based on improved faster R-CNN. In Proceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, China, 25–27 September 2020; pp. 889–893.
  3. Li, Z.; Zhang, Y.; Wu, H.; Suzuki, S.; Namiki, A.; Wang, W. Design and Application of a UAV Autonomous Inspection System for High-Voltage Power Transmission Lines. Remote Sens. 2023, 15, 865.
  4. Alhassan, A.B.; Zhang, X.; Shen, H.; Xu, H. Power transmission line inspection robots: A review, trends and challenges for future research. Int. J. Electr. Power Energy Syst. 2020, 118, 105862.
  5. Jenssen, R.; Roverso, D. Automatic autonomous vision-based power line inspection: A review of current status and the potential role of deep learning. Int. J. Electr. Power Energy Syst. 2018, 99, 107–120.
  6. Li, X.; Li, Z.; Wang, H.; Li, W. Unmanned aerial vehicle for transmission line inspection: Status, standardization, and perspectives. Front. Energy Res. 2021, 9, 713634.
  7. Gonçalves, R.S.; Souza, F.C.; Homma, R.Z.; Sudbrack, D.E.T.; Trautmann, P.V.; Clasen, B.C. Robots for Inspection and Maintenance of Power Transmission Lines. In Robot Design: From Theory to Service Applications; Springer International Publishing: Cham, Switzerland, 2022; pp. 119–142.
  8. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
  9. Mao, T.; Ren, L.; Yuan, F.; Li, C.; Zhang, L.; Zhang, M.; Chen, Y. Defect recognition method based on HOG and SVM for drone inspection images of power transmission line. In Proceedings of the 2019 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS), Shenzhen, China, 9–11 May 2019; pp. 254–257.
  10. Yan, T.; Yang, G.; Yu, J. Feature fusion based insulator detection for aerial inspection. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 10972–10977.
  11. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  12. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2015.
  14. Wang, S.; Liu, Y.; Qing, Y.; Wang, C.; Lan, T.; Yao, R. Detection of insulator defects with improved ResNeSt and region proposal network. IEEE Access 2020, 8, 184841–184850.
  15. Zheng, X.; Jia, R.; Gong, L.; Zhang, G.; Dang, J. Component identification and defect detection in transmission lines based on deep learning. J. Intell. Fuzzy Syst. 2021, 40, 3147–3158.
  16. Deng, F.; Zeng, Z.; Mao, W.; Wei, B.; Li, Z. A Novel Transmission Line Defect Detection Method Based on Adaptive Federated Learning. IEEE Trans. Instrum. Meas. 2023, 72, 1–12.
  17. Zhu, J.; Guo, Y.; Yue, F.; Yuan, H.; Yang, A.; Wang, X.; Rong, M. A deep learning method to detect foreign objects for inspecting power transmission lines. IEEE Access 2020, 8, 94065–94075.
  18. Zhang, W.; Liu, X.; Yuan, J.; Xu, L.; Sun, H.; Zhou, J.; Liu, X. RCNN-based foreign object detection for securing power transmission lines (RCNN4SPTL). Procedia Comput. Sci. 2019, 147, 331–337.
  19. Wang, B.; Wu, R.; Zheng, Z.; Zhang, W.; Guo, J. Study on the method of transmission line foreign body detection based on deep learning. In Proceedings of the 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 26–28 November 2017; pp. 1–5.
  20. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
  21. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  22. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  23. Wang, W.; Wang, Z.; Liu, B.; Yang, Y.; Sun, X. Typical defect detection technology of transmission line based on deep learning. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 1185–1189.
  24. Souza, B.J.; Stefenon, S.F.; Singh, G.; Freire, R.Z. Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV. Int. J. Electr. Power Energy Syst. 2023, 148, 108982.
  25. Xia, P.; Yin, J.; He, J.; Gu, L.; Yang, K. Neural detection of foreign objects for transmission lines in power systems. J. Phys. Conf. Ser. 2019, 1267, 012043.
  26. Liu, Z.; Wu, G.; He, W.; Fan, F.; Ye, X. Key target and defect detection of high-voltage power transmission lines with deep learning. Int. J. Electr. Power Energy Syst. 2022, 142, 108277.
  27. Shan, H.; Song, Y.; Wang, H.; Chen, Y. Research on Efficient Detection Method of Foreign Objects on Transmission Lines Based on Improved YOLOv4 Network. J. Phys. Conf. Ser. 2022, 2404, 012040.
  28. Song, Y.; Zhou, Z.; Li, Q.; Chen, Y.; Xiang, P.; Yu, Q.; Zhang, L.; Lu, Y. Intrusion detection of foreign objects in high-voltage lines based on YOLOv4. In Proceedings of the 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi'an, China, 9–11 April 2021; pp. 1295–1300.
  29. Wang, Q.; Si, G.; Qu, K.; Gong, J.; Cui, L. Transmission line foreign body fault detection using multi-feature fusion based on modified YOLOv5. J. Phys. Conf. Ser. 2022, 2320, 012028.
  30. Feng, Z.; Guo, L.; Huang, D.; Li, R. Electrical insulator defects detection method based on YOLOv5. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS), Suzhou, China, 14–16 May 2021; pp. 979–984.
  31. Huang, Y.; Jiang, L.; Han, T.; Xu, S.; Liu, Y.; Fu, J. High-Accuracy Insulator Defect Detection for Overhead Transmission Lines Based on Improved YOLOv5. Appl. Sci. 2022, 12, 12682.
  32. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
  33. Rensink, R.A. The dynamic representation of scenes. Vis. Cogn. 2000, 7, 17–42.
  34. Corbetta, M.; Shulman, G.L. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 2002, 3, 201–215.
  35. Li, L.H.; Yatskar, M.; Yin, D.; Hsieh, C.J.; Chang, K.W. What does BERT with vision look at? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5265–5275.
  36. Bao, W.; Du, X.; Wang, N.; Yuan, M.; Yang, X. A Defect Detection Method Based on BC-YOLO for Transmission Line Components in UAV Remote Sensing Images. Remote Sens. 2022, 14, 5176.
  37. Wu, M.; Guo, L.; Chen, R.; Du, W.; Wang, J.; Liu, M.; Kong, X.; Tang, J. Improved YOLOX Foreign Object Detection Algorithm for Transmission Lines. Wirel. Commun. Mob. Comput. 2022, 2022, 5835693.
  38. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
  39. Wu, C.; Ma, X.; Kong, X.; Zhu, H. Research on insulator defect detection algorithm of transmission line based on CenterNet. PLoS ONE 2021, 16, e0255135.
  40. Jiang, K.; Xie, T.; Yan, R.; Wen, X.; Li, D.; Jiang, H.; Jiang, N.; Feng, L.; Duan, X.; Wang, J. An attention mechanism-improved YOLOv7 object detection algorithm for hemp duck count estimation. Agriculture 2022, 12, 1659.
  41. Tao, X.; Zhang, D.; Wang, Z.; Liu, X.; Zhang, H.; Xu, D. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1486–1498.
  42. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017.
  43. Du, Y.; Pei, B.; Zhao, X.; Ji, J. Deep scaled dot-product attention based domain adaptation model for biomedical question answering. Methods 2020, 173, 69–74.
  44. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
  45. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
  46. Li, R.; Shen, Y. YOLOSR-IST: A deep learning method for small target detection in infrared remote sensing images based on super-resolution and YOLO. Signal Process. 2023, 208, 108962.
  47. Dai, Y.; Liu, W.; Wang, H.; Xie, W.; Long, K. YOLO-Former: Marrying YOLO and Transformer for Foreign Object Detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
  48. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  49. Zhao, L.; Zhu, M. MS-YOLOv7: YOLOv7 Based on Multi-Scale for Object Detection on UAV Aerial Photography. Drones 2023, 7, 188.
  50. Yue, X.; Wang, Q.; He, L.; Li, Y.; Tang, D. Research on tiny target detection technology of fabric defects based on improved YOLO. Appl. Sci. 2022, 12, 6823.
  51. Xu, Y.; Zhang, K.; Wang, L. Metal surface defect detection using modified YOLO. Algorithms 2021, 14, 257.
  52. Liu, Y.; He, G.; Wang, Z.; Li, W.; Huang, H. NRT-YOLO: Improved YOLOv5 based on nested residual transformer for tiny remote sensing object detection. Sensors 2022, 22, 4953.
  53. Gong, H.; Mu, T.; Li, Q.; Dai, H.; Li, C.; He, Z.; Wang, W.; Han, F.; Tuniyazi, A.; Li, H.; et al. Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images. Remote Sens. 2022, 14, 2861.
  54. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  55. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 11863–11874.
  56. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
  57. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
Figure 1. Demonstration of two image capturing approaches: (a) various inspection positions and (b,c) different background complexities.
Figure 2. Dataset construction process.
Figure 3. YOLO-CSM network structure.
Figure 4. Swin transformer block.
Figure 5. CBAM.
Figure 6. Comparison of the training loss of the candidate models.
Figure 7. Effect of fixture and defect detection: (a) original image, (b) Faster-RCNN, (c) YOLOv5, (d) YOLOv7, (e) YOLOv8, and (f) YOLO-CSM.
Figure 8. Foreign body detection effect: (a) original image, (b) Faster-RCNN, (c) YOLOv5, (d) YOLOv7, (e) YOLOv8, and (f) YOLO-CSM.
Figure 9. Comparison of the different attention mechanisms.
Figure 10. Analysis of the heat map for foreign body detection: (a) original figure, (b) YOLOv7, and (c) YOLO-CSM.
Figure 11. Fixture and defect detection heat map analysis: (a) original figure, (b) YOLOv7, and (c) YOLO-CSM.
Table 1. Experimental hardware.

| Platform | Configuration |
| --- | --- |
| CPU model | Intel Xeon Silver 4210 |
| GPU model | GeForce RTX 2080 |
| Memory | 32 GB |
| Operating system | Ubuntu 18.04 |
| GPU accelerator | CUDA 10.2 |
Table 2. Comparison of the different models.

| Model | Precision | Recall | F1_Score | mAP (0.5) | mAP (0.5:0.95) | Parameter Numbers | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Faster-RCNN | 0.842 | 0.810 | 0.826 | 0.873 | 0.537 | —— | 19 |
| YOLOv5s | 0.897 | 0.879 | 0.888 | 0.926 | 0.750 | 7,089,004 | 55 |
| YOLOv7 | 0.949 | 0.928 | 0.939 | 0.957 | 0.772 | 36,512,236 | 71 |
| YOLOv8 | 0.980 | 0.956 | 0.968 | 0.959 | 0.841 | 3,007,013 | 64 |
| YOLO-CSM | 0.987 | 0.968 | 0.977 | 0.989 | 0.818 | 40,515,194 | 66 |
Table 3. Evaluation results of integrating each module with YOLOv7.

| Model | Precision | Recall | F1_Score | mAP (0.5) | Parameter Numbers |
| --- | --- | --- | --- | --- | --- |
| YOLOv7 | 0.949 | 0.928 | 0.939 | 0.957 | 36,512,236 |
| YOLOv7_Four_layers | 0.971 | 0.915 | 0.942 | 0.975 | 37,057,952 |
| YOLOv7_Swin | 0.960 | 0.970 | 0.965 | 0.959 | 43,423,126 |
| YOLOv7_CBAM | 0.981 | 0.951 | 0.966 | 0.970 | 37,787,154 |
| YOLOv7_CBAM_Swin | 0.970 | 0.946 | 0.958 | 0.980 | 44,698,044 |
| YOLOv7-CSM | 0.987 | 0.968 | 0.977 | 0.989 | 40,515,194 |