Article

Bird Detection on Power Transmission Lines Based on Improved YOLOv7

1 College of Computer and Information Technology, Three Gorges University, Yichang 443002, China
2 Three Gorges Polytechnic, Yichang 443000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(21), 11940; https://doi.org/10.3390/app132111940
Submission received: 16 September 2023 / Revised: 26 October 2023 / Accepted: 27 October 2023 / Published: 31 October 2023

Abstract

The safety of transmission lines is essential for ensuring the secure and dependable operation of the power grid. However, the harm caused by birds to transmission lines poses a direct threat to their safe operation. The main challenges in detecting birds on transmission lines are that the targets are small, densely packed, and susceptible to environmental interference. We introduce ODConv, a dynamic convolutional kernel well suited to detecting small and densely packed targets, into the backbone of YOLOv7 to capture richer contextual information and improve performance. Substituting Alpha_GIoU for CIoU in the original YOLOv7 network model refines the loss function, reduces its parameters, and bolsters the network's robustness. The results confirm that the proposed YOLOv7 with ODConv reaches mAP@0.5, mAP@0.5:0.95, and precision of up to 78.42%, 46.14%, and 73.56%, respectively. Compared with the base model, the enhanced model demonstrates a 2.58% rise in mAP@0.5, a 0.72% improvement in mAP@0.5:0.95, and a 2.34% increase in precision.

1. Introduction

1.1. Motivation

Power supply is crucial to our daily lives [1]. However, traditional power transmission lines pose environmental challenges and risks [2], particularly in their interaction with birds [3,4,5]. The contact between birds and power transmission lines is an important environmental issue as it involves a balance between energy stability and environmental protection [6]. Birds can not only be injured or killed due to contact with power lines but such contact can also lead to line faults and impact energy supply. According to relevant statistical analysis, bird-related activities account for the third highest number of power transmission line trips among all fault trips in the national power grid, following only lightning and external damages [7,8]. A special survey conducted by IEEE in 1990 revealed that one-quarter of power transmission line operational failures in the United States can be attributed to bird-related issues, and 86% of substation equipment failures are related to bird nesting activities [9]. Therefore, to achieve sustainable energy development, an efficient and reliable method is needed to detect and monitor the presence of birds on power transmission lines.

1.2. Related Work

In recent times, there has been growing interest among researchers in machine vision-based object detection technologies. Currently, object identification and localization algorithms can be broadly divided into two main categories. The first category, exemplified by Faster-RCNN, is known as the two-stage algorithm [10]. The second category, represented by SSD and the YOLO series, is referred to as the one-stage detection algorithm [11,12]. The fundamental idea of the two-stage methodology is to first identify candidate regions in the input image and then process these regions through network-based classification and regression. These algorithms offer the advantage of high accuracy, but their drawback is slower detection speed due to increased computational complexity. In contrast, one-stage detection algorithms dispense with candidate region extraction and instead directly perform regression predictions on each feature map of the input image. However, for the following reasons, they still struggle to detect small objects. Taking YOLOv7 as an example, when extracting features using CSPDarknet-53 as the backbone network [13], the final feature map has a relatively small size (approximately 1/32 of the input size), resulting in low resolution and a large receptive field for each pixel in the feature map. This coarseness leads to poor localization ability for small objects. For instance, a small object may span 32 × 32 pixels in the input image but be represented by only a single pixel in the feature map, posing significant challenges for localization. As a result, common optimization methods for small object detection involve using high-resolution images or feature maps; however, both of these approaches come with expensive computational requirements. Furthermore, limitations of the image acquisition equipment make it difficult to obtain high-definition images of birds in close proximity to power transmission lines.
In a study presented in reference [14], a bird detection algorithm based on YOLOv3 for power transmission lines was introduced. To enhance the algorithm’s capability to detect small objects, a feature fusion detection layer was established by upsampling the 52 × 52 scale feature map by a factor of two and combining it with the output of the second residual block. Additionally, by refining the detection confidence scores of the bounding boxes with associated scale factors, optimizations were made to the non-maximum suppression (NMS) algorithm, resulting in an overall enhancement of the model’s capability to detect partially concealed birds. However, the algorithm’s performance in identifying small and densely clustered targets still remained suboptimal.
A bird detection method in natural scenes was proposed in [15]. Image data were acquired through the transmission line monitoring device. Residual modules were employed to extract deep-level features from the images. A multi-scale object detection strategy was utilized to ensure effective bird detection. However, its efficacy diminished significantly in the presence of substantial background interference.

2. YOLOv7 Model

The YOLOv7 model, developed in 2022 by Chien-Yao Wang, Alexey Bochkovskiy, and their team [16], implements a range of strategies to enhance its performance, including E-ELAN (Efficient Layer Aggregation Network) [17], model scaling through the serial connection of concatenation-based models [18], and model reparameterization. These techniques are applied to strike a favorable balance between detection efficiency and accuracy [19]. The YOLOv7 network is structured around four essential modules: input, backbone, head, and prediction, as visualized in Figure 1, which offers a glimpse into the network's overall structure.
The input module standardizes the input image to a uniform pixel size, ensuring compatibility with the input size requirements of the backbone network. Within the backbone module, various convolutional layers are utilized, encompassing CBSConv, E-ELAN, and MPConv layers. CBSConv layers combine convolutional layers, batch normalization (BN) layers, and LeakyReLU activation functions to capture features across various image scales [20]. E-ELAN convolutional layers preserve the original ELAN design, enabling diverse feature learning by guiding computation blocks within various feature groups. This augmentation enhances the network’s learning capacity without disrupting the original gradient pathways. MPConv convolutional layers augment CBSConv layers by adding Maxpool layers to create upper and lower branches. The upper branch reduces image size by half via Maxpool and reduces image channels by half through CBSConv layers. Meanwhile, the lower branch reduces image channels by half via the first CBSConv layer, decreases image size by half using the second CBSConv layer, and ultimately employs the Cat operation to combine features extracted from the upper and lower branches. This procedure augments the network’s ability to extract features.
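To make the CBSConv and MPConv descriptions above concrete, the following is a minimal PyTorch sketch; it is not the authors' implementation, and the module names (CBS, MPConv), kernel sizes, and LeakyReLU slope are illustrative assumptions based on the description in this section.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + LeakyReLU block, as described for CBSConv layers."""
    def __init__(self, in_ch, out_ch, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MPConv(nn.Module):
    """Two-branch downsampling: MaxPool + 1x1 CBS vs. 1x1 CBS + stride-2 3x3 CBS, then concat."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        # Upper branch: halve spatial size with MaxPool, halve channels with a 1x1 CBS
        self.upper = nn.Sequential(nn.MaxPool2d(2, 2), CBS(channels, half, k=1))
        # Lower branch: halve channels with a 1x1 CBS, then halve spatial size with a stride-2 3x3 CBS
        self.lower = nn.Sequential(CBS(channels, half, k=1), CBS(half, half, k=3, s=2))

    def forward(self, x):
        return torch.cat([self.upper(x), self.lower(x)], dim=1)

# Example: a 64-channel 80x80 feature map becomes a 64-channel 40x40 map
x = torch.randn(1, 64, 80, 80)
print(MPConv(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```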
The head module incorporates the Path Aggregation Feature Pyramid Network (PAFPN) structure [21], which introduces bottom-up pathways, enabling the transfer of low-level information to higher levels and effectively consolidating features across various levels. In the prediction module, the image channel count for the three different feature scales generated by PAFPN is adapted using the REP (RepVGGBlock) structure. Subsequently, it utilizes 1 × 1 convolutions to make predictions related to confidence, class, and anchor boxes.
The generated three network feature layers are repeatedly fused and extracted. Finally, feature maps of sizes 20 × 20, 40 × 40, and 80 × 80 are generated, which are utilized for the detection of objects of different sizes in the image, including large, medium, and small items, respectively. With these generated feature maps, YOLOv7 follows the category prediction approach established in the YOLO series. Firstly, the feature maps are divided into S × S   grids, and each grid is responsible for detecting objects whose center points fall within that grid. Three bounding box predictions are calculated for each grid, and each prediction consists of five parameters: the coordinates of the box center (x, y), the box’s width and height (w, h), and the confidence score of the prediction. The confidence score represents the probability of whether an object exists inside the predicted box. The structure of YOLOv7 is shown in Figure 1.
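As a concrete illustration of the prediction layout described above, the sketch below splits a raw head output into box center, box size, objectness confidence, and class parts. The batch size, the single "bird" class, and the 80 × 80 scale are illustrative assumptions, not the exact YOLOv7 head code.

```python
import torch

# Assumed illustrative shapes: batch of 1, 3 anchors per cell, 1 class ("bird"),
# on the 80x80 scale used for small objects.
num_anchors, num_classes, S = 3, 1, 80
pred = torch.randn(1, num_anchors * (5 + num_classes), S, S)  # raw head output

# Reshape to (batch, anchors, S, S, 5 + classes) and split into the parts
# described above: box center (x, y), box size (w, h), confidence, class scores.
pred = pred.view(1, num_anchors, 5 + num_classes, S, S).permute(0, 1, 3, 4, 2)
xy, wh, conf, cls = pred.split((2, 2, 1, num_classes), dim=-1)

print(xy.shape, wh.shape, conf.shape, cls.shape)
# torch.Size([1, 3, 80, 80, 2]) ... torch.Size([1, 3, 80, 80, 1]) torch.Size([1, 3, 80, 80, 1])
```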

3. Enhanced YOLOv7 Model

3.1. Our Work

To tackle the aforementioned concerns, this paper presents a range of enhancements to YOLOv7.
  • We introduce the ODConv in the backbone of YOLOv7. This method retains a substantial portion of spatial information, thereby enhancing the network’s sensitivity to small-scale targets without inflating the model’s parameter size. The ODConv module also helps mitigate the impact of noise.
  • Additionally, to address the issue of unstable loss function convergence during small object detection, this paper adopts the Alpha_GIoU function, thereby improving the network’s robustness concerning target sizes.
  • Finally, a series of data augmentation techniques, such as mixup and multi-angle rotation, is applied to the dataset for model training to mitigate the impact of insufficient data.

3.2. A New Dynamic Convolutional Kernel: ODConv

Although the YOLO series has made substantial advancements in speed and accuracy, it does have certain limitations, particularly in detecting small objects:
Inaccurate Localization of Small Objects: YOLO algorithms use a grid of multiple scales for object detection. However, for small-sized objects, the receptive field of these grid cells is often too large, making it difficult to accurately localize the bounding boxes of small objects.
Limited Feature Representation Capability: Small objects typically have fewer pixels, resulting in a less robust feature representation. YOLO algorithms employ pooling layers and convolutions with larger strides, which may result in the loss of finer object details, impacting the capability to detect small objects.
Class Imbalance in Training Data: Since larger objects are easier to detect, datasets often contain more samples of larger objects. This causes the algorithm to focus more on large objects during training, resulting in fewer training samples for small objects and impacting the detection performance.
Traditional convolutional neural networks use static convolutional kernels, whereas dynamic convolution methods apply attention-weighted linear combinations of multiple kernels to the input data.
Dynamic convolutional kernels can autonomously adjust their receptive field size based on the content and context of the input image. This facilitates a more favorable equilibrium among features of varying scales, ultimately resulting in enhanced accuracy and resilience in feature extraction. It also allows slight variations in their shapes at each location, making them better suited to handle non-rigid deformations or distortions present in the image. This strengthens the model’s ability to remain robust in the face of variations in the positioning of target objects within the image. Unlike traditional convolutional kernels that use the same weights across the entire input, dynamic convolutional kernels generate different weights at each position dynamically. This leads to a notable reduction in the quantity of parameters requiring learning, which, in turn, leads to decreased computational complexity and memory utilization. Dynamic convolutional kernels can adjust their parameters to capture features at diverse scales, offering benefits for a range of computer vision tasks, including object detection and segmentation. Also, dynamic convolutional kernels can flexibly adapt the kernel’s channel count, thereby enhancing the model’s versatility and expressive capability.
This approach significantly improves the accuracy of lightweight convolutional neural networks while maintaining high-speed inference. However, existing dynamic convolutional methods such as CondConv and DyConv focus only on the dynamicity of kernel numbers [22,23], neglecting the dynamicity of spatial dimensions, input channels, and output channels. The structural diagram of DyConv is shown in Figure 2. From a mathematical perspective, dynamic convolution operations can be formally expressed as Equation (1).
$$y = (\alpha_{w1} W_1 + \cdots + \alpha_{wn} W_n) * x \tag{1}$$
where "+" denotes the summation of the attention-weighted kernels and "*" denotes the convolution operation.
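For illustration, the following PyTorch sketch implements a kernel-wise dynamic convolution in the spirit of Equation (1) (the CondConv/DyConv style discussed below); the attention branch, the number of candidate kernels, and the per-sample loop are simplifying assumptions rather than the official implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Kernel-wise dynamic convolution in the spirit of Equation (1):
    y = (a_1 W_1 + ... + a_n W_n) * x, with attentions a_i predicted from the input."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4):
        super().__init__()
        self.n, self.k = n_kernels, k
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.01)
        self.attn = nn.Sequential(            # squeeze -> linear -> softmax over kernels
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, n_kernels))

    def forward(self, x):
        a = F.softmax(self.attn(x), dim=1)    # (B, n) kernel attentions
        outs = []
        for b in range(x.size(0)):            # per-sample kernel mixing
            w = (a[b].view(self.n, 1, 1, 1, 1) * self.weight).sum(0)
            outs.append(F.conv2d(x[b:b + 1], w, padding=self.k // 2))
        return torch.cat(outs, dim=0)

x = torch.randn(2, 16, 32, 32)
print(DynamicConv2d(16, 32)(x).shape)  # torch.Size([2, 32, 32, 32])
```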
In light of this, Chao Li et al. proposed ODConv [24], a full-dimensional dynamic convolution method based on SE attention. ODConv can be described according to Equation (2).
$$y = (\alpha_{w1} \alpha_{f1} \alpha_{c1} \alpha_{s1} W_1 + \cdots + \alpha_{wn} \alpha_{fn} \alpha_{cn} \alpha_{sn} W_n) * x \tag{2}$$
Among them, $\alpha_{wi}$ represents the attention scalar of convolutional kernel $W_i$; $\alpha_{si} \in \mathbb{R}^{k \times k}$, $\alpha_{ci} \in \mathbb{R}^{c_{in}}$, and $\alpha_{fi} \in \mathbb{R}^{c_{out}}$ represent the three newly introduced attentions along the spatial dimension, the input channel dimension, and the output channel dimension, respectively. These four attentions are computed using a multi-head attention module denoted as $\pi_i(x)$ [25].
As illustrated in Figure 3, within ODConv, for the convolutional kernel $W_i$, $\alpha_{si}$ allocates distinct attention values to the convolutional parameters at the various positions of the $k \times k$ spatial domain; $\alpha_{ci}$ assigns varying attention values to the input channels of each convolutional filter; $\alpha_{fi}$ allocates diverse values to the output convolutional filters; and $\alpha_{wi}$ assigns a value to the convolutional kernel as a whole.
In principle, these four types of attention are complementary. By progressively multiplying different attention values along the dimensions of position, channel, filter, and kernel of the convolution W i , the convolution operation becomes sensitive to variations in different dimensions of the input, providing better performance in capturing rich contextual information. Hence, ODConv notably augments the feature extraction potential of convolutional operations.
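A minimal PyTorch sketch of how the four attentions of Equation (2) could modulate a bank of candidate kernels is given below; the attention head structure, the reduction ratio, and the per-sample looping are illustrative simplifications and not the official ODConv code [24].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConvSketch(nn.Module):
    """Illustrative ODConv-style forward pass (Equation (2)): four attentions
    (spatial, input-channel, filter, kernel) modulate n candidate kernels."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4, reduction=4):
        super().__init__()
        self.n, self.k = n_kernels, k
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.01)
        hidden = max(in_ch // reduction, 4)
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(in_ch, hidden), nn.ReLU(inplace=True))
        # Four heads producing the four attention factors of Equation (2)
        self.a_s = nn.Linear(hidden, k * k)        # spatial positions
        self.a_c = nn.Linear(hidden, in_ch)        # input channels
        self.a_f = nn.Linear(hidden, out_ch)       # output filters
        self.a_w = nn.Linear(hidden, n_kernels)    # whole kernels

    def forward(self, x):
        b = x.size(0)
        h = self.fc(x)
        a_s = torch.sigmoid(self.a_s(h)).view(b, 1, 1, 1, self.k, self.k)
        a_c = torch.sigmoid(self.a_c(h)).view(b, 1, 1, -1, 1, 1)
        a_f = torch.sigmoid(self.a_f(h)).view(b, 1, -1, 1, 1, 1)
        a_w = F.softmax(self.a_w(h), dim=1).view(b, -1, 1, 1, 1, 1)
        # Combine: per-sample aggregated kernel, then a standard convolution
        w = (a_w * a_f * a_c * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        outs = [F.conv2d(x[i:i + 1], w[i], padding=self.k // 2) for i in range(b)]
        return torch.cat(outs, dim=0)

x = torch.randn(2, 16, 40, 40)
print(ODConvSketch(16, 32)(x).shape)  # torch.Size([2, 32, 40, 40])
```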

3.3. Alpha_GIoU Loss Function

Alpha_GIoU is a loss function employed in object detection tasks to assess the alignment between a predicted bounding box and a target bounding box, accounting for both positional disparities and size variations [26]. This loss function is an enhanced iteration of the Intersection over Union (IoU) loss function and introduces a balancing parameter, alpha, to fine-tune the trade-off between IoU and Generalized IoU (GIoU). Utilizing the Alpha_GIoU loss function in the training of object detection models enhances the localization and classification accuracy of object bounding boxes.
The mathematical expression of the Alpha_GIoU loss function is:
$$Alpha\_GIoU = IoU - \frac{C - U}{C}$$
where:
- $IoU$ quantifies the relationship between the predicted bounding box and the target bounding box, typically as the ratio of the intersection area of the two boxes to their union area;
- $C$ represents the area of the smallest convex polygon encompassing both bounding boxes;
- $U$ denotes the area encompassed by the union of the two bounding boxes;
- $Alpha\_GIoU$ signifies the value of the Alpha_GIoU loss function.
A notable feature of the Alpha_GIoU loss function is its consideration of both positional and size disparities between the predicted and target bounding boxes. It penalizes size deviations through the term associated with $C$. When alpha is set to 0, the Alpha_GIoU loss function simplifies to the traditional IoU loss function. In contrast, when alpha is set to 1, the Alpha_GIoU loss function transforms into the GIoU loss function, which strikes a balance between positional and size deviations. Thus, by adjusting the alpha value, one can achieve an appropriate trade-off between position and size, leading to improved object detection performance.
Since the Alpha_GIoU loss function with $\alpha = 1$ reduces to the original GIoU loss function, this investigation selects Alpha_GIoU to replace the CIoU loss function of the baseline model, with the power parameter $\alpha$ set to 3.
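For illustration, the following is a minimal PyTorch sketch of an Alpha-GIoU-style bounding box loss in the power form proposed by the Alpha-IoU family [26]; the box format (x1, y1, x2, y2) and the epsilon constant are assumptions, and this is not the exact loss code used in our training.

```python
import torch

def alpha_giou_loss(pred, target, alpha=3.0, eps=1e-7):
    """Sketch of an Alpha-GIoU bounding-box loss (boxes as x1, y1, x2, y2).
    With alpha = 1 this reduces to the ordinary GIoU loss; the paper uses alpha = 3."""
    # Intersection area
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)

    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # Smallest enclosing box C
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1) + eps

    giou_penalty = (c_area - union) / c_area
    # Power (alpha) form: 1 - IoU^alpha + ((C - U) / C)^alpha
    return 1.0 - iou.pow(alpha) + giou_penalty.pow(alpha)

pred = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
target = torch.tensor([[12.0, 15.0, 48.0, 55.0]])
print(alpha_giou_loss(pred, target, alpha=3.0))
```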

4. Experimentation and Analysis

4.1. Experimental Environment and Hyperparameters

All experiments in this paper were carried out within a consistent experimental environment, and all the algorithms were trained using PyTorch. The training process consisted of 300 iterations, with weights saved every 50 iterations. The standard input size for training images was set to 640 × 640 pixels. A batch size of 8 was used to ensure the stability of the model. During the initial 3 epochs, a warm-up learning strategy was implemented, commencing with an initial learning rate of 0.01. Following that, a cosine annealing algorithm was applied to regulate the learning rate decay, utilizing a hyperparameter lrf set to 0.2 and a minimum learning rate of 0.002.
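The following is a small sketch of the learning-rate schedule described above (linear warm-up for the first 3 epochs, then cosine annealing from 0.01 toward 0.01 × 0.2 = 0.002); the exact warm-up form used by YOLOv7 differs in detail, so this is an approximation under those assumptions.

```python
import math

def lr_at_epoch(epoch, total_epochs=300, warmup_epochs=3, lr0=0.01, lrf=0.2):
    """Warm-up followed by cosine annealing, mirroring the settings reported above.
    The linear warm-up form is an assumption; YOLOv7's built-in warm-up is more involved."""
    if epoch < warmup_epochs:
        return lr0 * (epoch + 1) / warmup_epochs           # linear warm-up (assumed form)
    progress = (epoch - warmup_epochs) / max(total_epochs - warmup_epochs, 1)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))      # decays from 1 to 0
    return lr0 * (lrf + (1 - lrf) * cosine)                # lr0 -> lr0 * lrf = 0.002

for e in (0, 2, 3, 150, 299):
    print(e, round(lr_at_epoch(e), 5))
```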
The image experiments were conducted on a system running Windows 11 Professional Edition. The experimental platform consisted of an Intel(R) Core(TM) i5-10400F CPU and 16 GB of RAM. The algorithm was trained and evaluated on an NVIDIA GeForce RTX 3060 GPU with 12 GB of VRAM. The software used was PyCharm 2022.2.3, with Python 3.10 and PyTorch 2.0.1.

4.2. Experimental Dataset

The COCO (Common Objects in Context) dataset is a widely recognized benchmark in computer vision, serving as a comprehensive collection of images for object detection, segmentation, and image captioning tasks. Comprising over 200,000 images with diverse scenes, COCO features a rich array of object categories in complex real-world settings, making it an essential resource for developing and evaluating advanced algorithms in understanding image content, particularly in the context of intricate scenes and objects of varying scales. The annotations encompass accurate bounding boxes, segmentation masks, and descriptive captions, thereby equipping researchers with top-quality data to propel advancements and innovation in the realm of computer vision.
In this article, the experiments were conducted using a self-created dataset, which was compiled by selecting images with bird-related labels from the COCO dataset and combining them with pictures captured using cameras and other mobile devices near high-voltage power lines.
The augmentation techniques employed on the training dataset, which includes 3317 images, are crucial for enhancing the model’s capacity to generalize and reducing the potential for overfitting. The augmentation methods encompass a spectrum of transformations, each geared towards imbuing the dataset with greater diversity. These transformations include:
Mixup: Mixup is a data augmentation technique that expands a dataset by creating new samples as linear combinations of existing samples. Its core principle is to linearly combine two different images at a certain ratio to generate a new sample, with the label of the new sample obtained through the same linear combination.
Multi-Angle Rotation: As explained earlier, multi-angle rotation involves applying rotations to images at various angles (e.g., 0°, 90°, 180°, and 270°) to augment the training dataset. This helps the model become more robust against different object orientations.
Image Brightness Adjustment: This technique involves changing the brightness of an image by scaling the pixel values. By adjusting the brightness, the model learns to recognize objects in different lighting conditions, making it more adaptable to varying environments.
The augmented training set ultimately comprises 6393 images, encompassing both 3317 original images and an additional 3076 images that have undergone enhancement. A visual representation detailing the distribution of this augmented training set is depicted in Figure 4.
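The sketch below illustrates, under simplified assumptions, how the three augmentations described above can be applied with NumPy; the fixed mixup ratio, the label handling, and the image sizes are illustrative and do not reproduce the exact augmentation pipeline used to build the training set.

```python
import numpy as np

def mixup(img_a, img_b, labels_a, labels_b, ratio=0.5):
    """Blend two images and concatenate their box labels (mixup, as described above).
    `ratio` is a fixed mixing coefficient for simplicity; implementations often
    sample it from a Beta distribution."""
    mixed = (ratio * img_a + (1 - ratio) * img_b).astype(img_a.dtype)
    return mixed, np.concatenate([labels_a, labels_b], axis=0)

def rotate_90(img, k=1):
    """Multi-angle rotation restricted to 0/90/180/270 degrees (k quarter turns)."""
    return np.rot90(img, k)

def adjust_brightness(img, factor=1.3):
    """Scale pixel values to simulate different lighting conditions."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

img_a = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
img_b = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
boxes_a = np.array([[0, 0.50, 0.50, 0.10, 0.10]])  # class, cx, cy, w, h (YOLO format, assumed)
boxes_b = np.array([[0, 0.30, 0.70, 0.05, 0.08]])

mixed, boxes = mixup(img_a, img_b, boxes_a, boxes_b)
print(mixed.shape, boxes.shape, rotate_90(img_a).shape, adjust_brightness(img_a).shape)
```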

4.3. Evaluation Metrics

To quantitatively assess the outcomes of the experiments in this paper, the performance evaluation metrics encompass mAP@0.5, mAP@0.5:0.95, and precision (P).
- mAP@0.5 signifies the average detection accuracy at an Intersection over Union (IoU) [13] threshold of 0.5 for all object categories. It reflects the algorithm's accuracy in detecting various object categories.
- mAP@0.5:0.95 represents the mean average precision averaged over 10 IoU thresholds, from 0.5 to 0.95 in steps of 0.05. This metric provides a comprehensive assessment of detection accuracy across various IoU thresholds.
- Precision (P) represents the ratio of true positive detections to the total number of detections, indicating the algorithm's precision in correctly identifying bird objects.
These evaluation metrics collectively provide a thorough assessment of our model’s performance in bird detection on power transmission lines.
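As a simple illustration of these metrics, the sketch below computes the IoU between two boxes and the precision of a set of detections at a single IoU threshold; greedy matching of detections to unused ground-truth boxes is an assumption, and full mAP additionally averages precision over recall levels and thresholds.

```python
def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-7)

def precision(detections, ground_truths, iou_thr=0.5):
    """Precision = TP / (TP + FP): a detection counts as a true positive when it
    matches a not-yet-used ground-truth box with IoU above the threshold."""
    used, tp = set(), 0
    for det in detections:  # ideally processed in descending confidence order
        match = next((i for i, gt in enumerate(ground_truths)
                      if i not in used and box_iou(det, gt) >= iou_thr), None)
        if match is not None:
            used.add(match)
            tp += 1
    return tp / max(len(detections), 1)

dets = [(10, 10, 50, 50), (100, 100, 140, 140)]
gts = [(12, 12, 52, 52)]
print(precision(dets, gts))  # 0.5: one true positive, one false positive
```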

4.4. Introduce the ODConv in the Backbone of YOLOv7

We introduced the ODConv module into the YOLOv7 network architecture, as illustrated in Figure 5. The aim of this module is to enhance the feature extraction capabilities of the base network, ultimately resulting in enhanced performance. However, upon integration into the backbone network, the ODConv module inadvertently disrupted certain pre-existing weights, resulting in prediction errors within the network. In light of this issue, we strategically positioned the ODConv within the feature enhancement segment of the network’s extraction process, aiming to enhance feature extraction without compromising the integrity of the original network extraction.

4.5. Evaluate the Impact of ODConv on Model Accuracy

To assess the impact of the ODConv, we conducted ablation experiments by substituting SE and ECA modules for the ODConv modules. The SE module primarily encompasses squeeze and excitation operations.
The squeeze and excitation (SE) mechanism is a dynamic channel attention technique crafted to enhance the feature representation within convolutional neural networks (CNNs). It tackles CNN limitations by adaptively recalibrating the importance of different channels within a feature map. The SE mechanism comprises two pivotal operations: squeeze and excitation. In the squeeze operation, global spatial information is obtained by conducting global average pooling on the feature map, which results in a channel descriptor. Subsequently, the excitation operation employs this descriptor to generate channel-wise weights through a set of learnable transformations. These weights effectively modulate the original feature map, accentuating pertinent channels while suppressing less informative ones. This mechanism significantly bolsters the discriminative capacity of features and elevates model performance across a spectrum of tasks.
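A minimal PyTorch sketch of the SE block described above follows; the reduction ratio of 16 is a common default and an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: global average pooling (squeeze),
    a small bottleneck MLP (excitation), and channel-wise re-weighting of the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: (B, C) channel descriptor
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * w                     # recalibrate channels

x = torch.randn(1, 64, 40, 40)
print(SEBlock(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```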
The efficient channel attention (ECA) module is another attention mechanism tailored to improving feature learning in CNNs while being computationally efficient. ECA operates by exploiting the interdependencies among channels in a feature map. Instead of using global pooling, ECA employs a 1D convolutional operation along the channel dimension, capturing channel-wise dependencies. The resulting attention map is obtained by applying a softmax function to the convolutional output, and it is then used to re-weight the feature map. ECA amplifies information dissemination among channels, empowering the network to prioritize vital features while discarding extraneous ones, all without imposing substantial computational overhead. This makes ECA particularly suitable for resource-constrained scenarios where efficiency is crucial, while still providing performance gains by effectively integrating contextual information within feature representations.
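Similarly, the following is a minimal ECA-style sketch; the 1D kernel size k = 3 is an assumption (ECA normally derives it adaptively from the channel count), and common implementations gate the attention map with a sigmoid.

```python
import torch
import torch.nn as nn

class ECABlock(nn.Module):
    """Efficient Channel Attention sketch: a 1D convolution over the pooled
    channel descriptor captures local cross-channel interaction without
    a fully connected bottleneck."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3)).view(b, 1, c)            # (B, 1, C) channel descriptor
        y = torch.sigmoid(self.conv(y)).view(b, c, 1, 1)  # channel attention weights
        return x * y

x = torch.randn(1, 64, 40, 40)
print(ECABlock()(x).shape)  # torch.Size([1, 64, 40, 40])
```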

4.5.1. Comparing Results from Experiments Introducing Attention Mechanism

To validate the enhanced algorithm's efficacy, this research integrated the ODConv module and two attention mechanisms (SE and ECA) separately into the YOLOv7 object detection framework for experimental evaluation, using mAP@0.5, mAP@0.5:0.95, and precision as assessment metrics. The experimental results are shown in Table 1.
As indicated in Table 1, in comparison to the base YOLOv7 algorithm, the SE-YOLOv7 variant exhibited a 2.99% decrease in precision, a 2.48% decrease in mAP@0.5, and a 1.86% decrease in mAP@0.5:0.95. The ECA-YOLOv7 approach showed a 1.26% decrease in precision, a 0.42% decrease in mAP@0.5, and a 1.12% decrease in mAP@0.5:0.95.
The results in Table 1 show that both the SE-YOLOv7 and ECA-YOLOv7 variants performed worse than the original YOLOv7 model. In contrast, compared with the base YOLOv7 algorithm, the ODConv-YOLOv7 approach showed a clear improvement, with a 1.13% increase in precision, a 1.31% gain in mAP@0.5, and a 0.34% increase in mAP@0.5:0.95.

4.5.2. Contrast Experiment Results of IoU

In a consistent experimental setup using the same neural network model, the convergence of the YOLOv7 loss function was examined. Figure 6 shows how the two loss functions evolve over training iterations; the curves represent the average bounding box loss computed with CIoU and Alpha_GIoU, respectively.
From Figure 6, it is evident that, as the number of iterations increases, both Alpha_GIoU and CIoU eventually converge. However, Alpha_GIoU consistently yields lower loss values and greater stability than CIoU. Consequently, using Alpha_GIoU as the bounding box loss function for this dataset is more beneficial for the network model's performance.

5. Ablation Experiment

To assess the influence of Alpha_GIoU on model training results, we retrained the model using an identical experimental configuration as the prior iteration. Furthermore, we carried out ablation experiments in tandem with ODConv. The training process is visually depicted in Figure 7.
From Table 2, it can be observed that without ODConv and Alpha_GIoU, the mAP@0.5 for bird detection on power transmission lines is 75.84%. With the introduction of ODConv and Alpha_GIoU, precision improved by 2.34% and mAP@0.5 improved by 2.58%. This indicates that ODConv and Alpha_GIoU play a crucial role in enhancing the performance of bird detection on power transmission lines.

5.1. Experimental Comparison between YOLOv7 and the Improved Network Model

To provide a thorough evaluation of the introduced improvements, we conducted a comparative analysis between the YOLOv7 network model and the algorithm presented in this paper. The purpose of this comparison was to underscore the advancements achieved through the proposed modifications.
The detection outcomes for three distinct image scenarios encountered in real-world settings, specifically images with small objects, densely packed objects, and extremely small objects, are depicted in Figure 8, Figure 9 and Figure 10 for both the baseline YOLOv7 network model and the enhanced model.
In Figure 8, for small object images, the original network model detected 49 out of 55 objects, missing 6 objects, while the improved model detected 54 objects, with a loss of only 1 object.
In Figure 9, for densely packed small object images, the baseline model detected only 10 objects, missing 47. In contrast, the improved model detected 56 objects, missing only 1.
In Figure 10, for extremely small object images, the original network model missed all objects, while the improved model detected 4 objects, with 1 false positive and 3 missed detections.
These findings underscore the enhanced detection capabilities of the algorithm presented in this paper, when contrasted with the baseline YOLOv7 model, for scenarios involving small objects, densely packed objects, and extremely small objects. Our model showcases improved accuracy and a decrease in missed detections, underscoring its efficacy in addressing these demanding situations.

5.2. Comparative Analysis of the Improved YOLOv7 Network Model with Other Network Models

To validate the efficacy of the enhanced YOLOv7 network model, we conducted experiments, comparing it with other network models under consistent configuration settings and initial training parameters. The results are summarized in Table 3.
The results reveal that the improved YOLOv7 network model surpasses classical network models in terms of mAP and precision when operating with images of the same input size. This suggests that the enhanced model is better suited for bird detection on power transmission lines, particularly in scenarios involving small objects.
While we introduced two bird detection algorithms for transmission lines in Section 1, it is important to note that due to the unavailability of their models as open source and their utilization of proprietary datasets, direct comparisons with the improved model proposed in this paper are not feasible.

6. Conclusions

This paper primarily addresses the limitations of existing bird detection models on power transmission lines, specifically their poor performance in detecting small and densely packed targets. To address these obstacles, this research incorporates the ODConv module into the backbone of YOLOv7, amplifying the capability to detect small objects effectively.
Building upon this foundation, the IoU loss function has been optimized by incorporating the Alpha_GIoU loss function to enhance the network’s localization capability. This, in turn, improves the network’s detection accuracy and reduces instances of false positives and false negatives in the detection of small objects.
Additionally, data augmentation is used to enrich the dataset, improving detection accuracy and reducing false positives and missed detections for small targets, while also alleviating the issue of insufficient data collection.
In real-world application scenarios, the resolution of bird images on power transmission lines can vary significantly due to changing imaging conditions. Convolutional neural network models frequently exhibit reduced recognition accuracy when dealing with low-resolution targets. Nevertheless, the experimental findings offer strong evidence that the proposed enhanced model surpasses both the original YOLOv7 model and conventional algorithms in the detection of small and closely positioned targets. This underscores the superior detection capabilities of our model in practical, real-world scenarios.
By addressing the limitations of existing models and introducing novel improvements, this study provides valuable insights for enhancing bird detection on power transmission lines, particularly in scenarios involving small and densely packed targets.
Our model has addressed the issue of detecting small targets, but the detection performance remains suboptimal in situations with extremely dense concentrations. Our future work will be focused on addressing this problem.

Author Contributions

T.J.: formal analysis, funding acquisition, investigation, project administration, and resources. J.Z.: conceptualization, data curation, methodology, software, supervision. M.W.: validation and visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Billings, K.; Morey, T. Switchmode Power Supply Handbook, 3rd ed.; McGraw-Hill Education: New York, NY, USA, 2011. [Google Scholar]
  2. Lie, J.; Yu, J.; E, M.; Wang, L.; Yang, Z.J.; Wang, H. Research on application and effect of anti-bird facilities on high voltage transmission lines. J. Northeast. Norm. Univ. Nat. Sci. Ed. 2013, 45, 118–121. (In Chinese) [Google Scholar]
  3. Wu, B.; Wu, X.; Liu, S.; Yan, Z.; Chang, B.; Jia, Z.; Jiang, J.; Ouyang, X. Simulation study on bird streamer flashover of 330 kV transmission line. High Volt. Appar. 2018, 54, 120–127. (In Chinese) [Google Scholar]
  4. Tang, Z.; Yuan, X.; Liao, Z.; Ye, B.; Wang, R.; Wu, Z. Statistics analysis and protection of bird fault of Guangdong Shaoguan power grid. Insul. Surge Arresters 2018, 20–24. (In Chinese) [Google Scholar]
  5. Ou, S.; Wang, Y.; Yang, W. Failure analysis and its precaution measure on bird damage to inner Mongolia power grid. Inn. Mong. Electr. Power Technol. 2007, 25, 1–3. (In Chinese) [Google Scholar]
  6. Ma, Z. Importance of habitat protection for bird protection. Bull. Biol. 2017, 52, 6–8. (In Chinese) [Google Scholar]
  7. Huang, X.; Cao, W. Review of the disaster mechanism of transmission lines. J. Xi’an Polytech. Univ. 2017, 31, 589–605. (In Chinese) [Google Scholar]
  8. Lu, M.; Tan, F.; He, Z.; Liu, Z.; Song, S.; Fan, W.; Wang, W. Research on the mechanism of guano flashover in 110 kV transmission lines. Insul. Surge Arresters 2015, 1–7. [Google Scholar] [CrossRef]
  9. IEEE Std 1651-2010; IEEE Guide for Reducing Bird-Related Outages. IEEE: New York, NY, USA, 2011; pp. 1–32.
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  11. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer International Publishing: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  12. Chen, Y.; Sun, L.; Zhang, Y.; Fu, Q.; Lu, Y.; Li, Y.; Sun, J. Research on Transmission Line Bird Detection Technology based on YOLO v3. Comput. Eng. 2020, 46, 294–300. [Google Scholar]
  13. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  14. Zou, C.; Liang, Y. Bird Detection on Transmission Line Based on YOLO V3 Algorithm. Comput. Appl. Softw. 2021, 38, 164–167. (In Chinese) [Google Scholar]
  15. Chen, C.; Xiong, Y.; Yan, B.P. Wild birds target detection and tracking technology in video system. E-Sci. Technol. Appl. 2014, 5, 53–58. [Google Scholar]
  16. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2023, arXiv:2207.02696v1. [Google Scholar]
  17. Li, S.; Wang, Z.; Liu, Z.; Tan, C.; Lin, H.; Wu, D.; Chen, Z.; Zheng, J.; Li, S.Z. Efficient Multi-order Gated Aggregation Network. arXiv 2022, arXiv:2211.03295. [Google Scholar] [CrossRef]
  18. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE Press: New York, NY, USA, 2021; pp. 13024–13033. [Google Scholar]
  19. Ding, X.H.; Zhang, X.Y.; Ma, N.N.; Han, J.G.; Ding, G.G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE Press: New York, NY, USA, 2021; pp. 13728–13737. [Google Scholar]
  20. Jiang, T.T.; Cheng, J.Y. Target recognition based on CNN with LeakyReLU and PReLU activation functions. In Proceedings of the 2019 IEEE Conference on Sensing, Diagnostics, Prognostics, and Control, Beijing, China, 15–17 August 2019; IEEE Press: New York, NY, USA, 2019; pp. 718–722. [Google Scholar]
  21. Ge, Z.; Liu, S.T.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021[EB/OL]. Available online: https://arxiv.org/abs/2107.08430 (accessed on 23 August 2022).
  22. Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. Condconv: Conditionally parameterized convolutions for efficient inference. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
  23. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  24. Li, C.; Zhou, A.; Yao, A. Omni-Dimensional Dynamic Convolution. arXiv 2022, arXiv:2209.07947. [Google Scholar]
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  26. He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.-S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. arXiv 2021, arXiv:2110.13675. [Google Scholar] [CrossRef]
Figure 1. The structure of YOLOv7 (adapted from Ref. [16]).
Figure 2. This picture describes the structure of DyConv, which allows the dynamical generation and application of appropriately shaped and sized convolutional kernels based on the varying feature requirements of the input data.
Figure 3. While CondConv and DyConv compute a single attention scalar α w i for the convolutional kernel W i , ODConv introduces a novel multi-dimensional attention mechanism. This mechanism simultaneously calculates four different types of attention— α s i , α c i , α f i , and α w i —for the W i kernel across all four dimensions of the kernel space. Detailed formulations and implementations of these mechanisms are provided in the Method Section.
Figure 4. Distribution of the augmented training set.
Figure 5. Incorporating ODConv into the YOLOv7 network structure.
Figure 6. Loss function iteration comparison.
Figure 7. The curve of mAP.
Figure 8. Small object images. (a) YOLOv7; (b) the algorithm in this paper.
Figure 9. Densely packed object images. (a) YOLOv7; (b) the algorithm in this paper.
Figure 10. Extremely small object image. (a) YOLOv7; (b) the algorithm in this paper.
Table 1. Comparative experiments.

Method          mAP@0.5 (%)   mAP@0.5:0.95 (%)   Precision (%)
YOLOv7          75.84         45.42              71.22
SE-YOLOv7       73.36         43.56              68.23
ECA-YOLOv7      75.42         44.30              69.96
ODConv-YOLOv7   77.15         45.76              72.35
Table 2. Ablation experiment.

ODConv   Alpha_GIoU   mAP@0.5 (%)   mAP@0.5:0.95 (%)   Precision (%)
×        ×            75.84         45.42              71.22
√        ×            77.15         45.76              72.35
×        √            76.84         45.56              72.22
√        √            78.42         46.14              73.56
Table 3. Model performance comparison.

Comparative Models            mAP@0.5 (%)   mAP@0.5:0.95 (%)   Precision (%)
YOLOv7                        75.84         45.42              71.22
YOLOv8                        75.13         43.27              71.87
YOLOv5                        74.30         43.41              70.69
YOLOX                         77.50         44.56              72.35
The algorithm in this paper   78.42         46.14              73.56

