Article

Optimization Algorithm for Surface Defect Detection of Aircraft Engine Components Based on YOLOv5

Fundamentals Department, Air Force Engineering University, Xi’an 710051, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11344; https://doi.org/10.3390/app132011344
Submission received: 20 September 2023 / Revised: 10 October 2023 / Accepted: 13 October 2023 / Published: 16 October 2023

Abstract

The aircraft engine is a core component of an airplane, and its critical components work in harsh environments, making them susceptible to a variety of surface defects. To achieve efficient and accurate defect detection, this paper establishes a dataset of surface defects on aircraft engine components and proposes an optimized object detection algorithm based on YOLOv5 according to the features of these defects. By adding the bi-level routing attention mechanism from the BiFormer model, the detection accuracy is improved; by replacing the C3 module with C3-Faster based on the FasterNet network, robustness is enhanced, accuracy is maintained, and a lightweight model is achieved. The NWD detection metric is introduced, and the normalized Gaussian Wasserstein distance is used to enhance the detection accuracy of small targets. The lightweight upsampling operator CARAFE is added to expand the model’s receptive field, reorganize local information features, and enhance content-awareness performance. The experimental results show that, compared with the original YOLOv5 model, the improved YOLOv5 model’s overall average precision on the aircraft engine component surface defect dataset is improved by 10.6%, the parameter quantity is reduced by 11.7%, and the weight volume is reduced by 11.3%. The detection performance is higher than mainstream object detection algorithms such as SSD, RetinaNet, FCOS, YOLOv3, YOLOv4, and YOLOv7. Moreover, the detection performance on the public dataset (NEU-DET) is also improved, providing a new method for the rapid defect detection of aircraft engines and having high application value in various practical detection scenarios.

1. Introduction

As the core component of an aircraft, the stability of the aircraft engine is a prerequisite for safe operation, and its performance is also a key factor determining the technical performance and economic benefits of the aircraft.
Currently, most aircraft use turbofan engines to generate thrust, which consist of three main components: the compressor, the combustion chamber, and the turbine. These critical components work in harsh environments of high temperature, high pressure, and high load for long periods of time, enduring significant mechanical and thermal stresses. Due to the extreme working conditions and continuous operation they undergo, these components are prone to defects and damage such as cracks, scratches, pits, and corrosion [1].
These damages not only affect the performance of aircraft engines but also pose significant safety hazards to flight, potentially leading to serious accidents. In recent years, the frequency of flight accidents caused by issues such as blade strikes and combustion chamber cracks has been increasing. Therefore, the timely detection and repair of defective or damaged components are crucial to ensure the reliability and stability of aircraft in various working environments, and they are essential for flight safety.
To ensure the reliability and performance of aircraft engines, aircraft manufacturers and airlines employ strict inspection procedures. Conventional inspection methods mainly involve visual inspections using borescopes, wherein engineers use borescopes to detect and assess defects and damages in internal engine components, and then take appropriate repair measures. This method has achieved good performance in the detection of aircraft engine components. However, the internal structure of aircraft engines is complex, and the effectiveness of the detection work mainly relies on the experience of operators. Moreover, long hours and high-intensity work can lead to visual fatigue and the potential for overlooking small defects [2]. Therefore, there is an urgent need for new technologies to replace traditional detection methods.
With the continuous advancement of technology, new inspection techniques are emerging. In recent years, with the rapid development of computer technology, computer vision has been widely applied in various fields, and object detection is an important part of it. Compared to traditional manual visual inspection, computer vision-based techniques significantly reduce labor costs and offer advantages such as stability, high efficiency, and accuracy [3]. To improve the efficiency of aircraft engine component inspection and reduce inspection costs, object detection techniques can be applied to the surface defect detection of aircraft engine components, achieving automated and efficient inspection.
Currently, object detection techniques can be categorized into two main types, one-stage algorithms and two-stage algorithms, which differ in their detection processes. One-stage object detection algorithms include the YOLO [4] (You Only Look Once) series, SSD [5] (Single Shot MultiBox Detector), EfficientDet [6], and others. These algorithms treat object detection as a regression problem and directly predict the position and category of bounding boxes through convolutional neural networks; their detection process usually requires only one forward pass. Two-stage object detection algorithms include the R-CNN [7] (Region-based Convolutional Neural Network) series, Fast R-CNN [8], Faster R-CNN [9], Mask R-CNN [10], and others. These algorithms first extract region proposals and then classify and regress locations for each candidate box to obtain the final detection results. Tulbure et al. [11] conducted a comprehensive analysis of modern object detection models applicable to industrial defect detection and analyzed which detection models suit different main constraints, providing important references for industrial defect detection. Kou et al. [12] added an anchor-free feature selection mechanism to the YOLOv3 algorithm to shorten model computation time and enhanced the representation capability of a specially designed dense convolution block, achieving surface defect detection for steel strips. Li et al. [13] improved the YOLOv5 algorithm by utilizing channel attention and spatial attention mechanisms to strengthen feature fusion in the neural network; they improved the original network’s multi-scale feature fusion structure based on the PANet structure and achieved the efficient and accurate detection of remote sensing images. Zhang et al. [14] proposed an improved YOLOv5 algorithm, which added a micro-scale detection layer to the original algorithm and incorporated the CBAM (Convolutional Block Attention Module) attention mechanism to limit the feature information loss of small target defects, significantly improving the efficiency and accuracy of wind turbine blade defect detection. Hui et al. [15] proposed an improved detection model based on YOLOv4-tiny for detecting blade crack defects in aircraft engines; they introduced an attention mechanism in the backbone network to enhance background discrimination and improved multi-scale feature fusion by implementing bicubic interpolation in the upsampling module, which greatly improved the detection performance. Xiang et al. [16] proposed an improved YOLOv5 model for crack detection by introducing the Focal-GIOU loss function and replacing CIOU with GIOU to adapt to more irregular targets, thereby enhancing the effectiveness of crack detection.
Significant progress has been made in terms of detection accuracy and speed in the extensive research on object detection. However, the internal space of aircraft engines is narrow and complex, often densely populated with various potential areas of defects. Moreover, there is a wide variety of defect types, which still require further targeted improvements. Jang et al. [17] used Faster R-CNN to detect damage on aircraft engine blades, achieving high levels of accuracy in detecting dents and punctures. Du et al. [18] addressed the issues of complex structures and low detection accuracy in traditional algorithms for aircraft engine sensor fault detection by proposing an Inception-CNN model. They applied this model to aircraft engine sensor fault detection, achieving a detection accuracy of 95.41% on a sensor fault dataset. Chen et al. [19] added attention mechanisms to YOLOv4, enhancing the algorithm’s detection accuracy for welding damages on aircraft engines. Li et al. [20] achieved further improvements in engine component surface defect detection accuracy and speed by recalculating the parameters of preset anchors using the k-means clustering algorithm, adding the ECA-Net (Efficient Channel Attention) attention mechanism in the network, and replacing the Neck structure in the YOLOv5 algorithm.
Building on the above, this article establishes a dedicated dataset for the detection of defects in aircraft engine components, which maximally reproduces the defect features of aircraft engine components in actual usage scenarios. Based on these defect features, an improved YOLOv5 object detection algorithm is proposed with the aim of improving detection accuracy and reducing model size. The improved YOLOv5 algorithm adds a dual-route attention mechanism to enhance detection accuracy, optimizes the C3 module to make the model lightweight, introduces the NWD index to improve small object detection performance, and adds the CARAFE upsampling operator to enhance content-awareness. These improvements enable the effective detection of surface defects in aircraft engine components.

2. Materials and Methods

2.1. Dataset

To ensure the applicability of the algorithm for detecting defects in aircraft engine components, the dataset used in this study was collected offline in workshops and factories. The collection method involved placing an industrial camera vertically 20 cm above the test specimen. An LED ring light was positioned between the industrial camera and the test specimen to provide illumination. The position of the camera was adjusted to ensure that the field of view covered the central channel of the ring light completely. To conform to the image input requirements of the YOLOv5 algorithm, the output resolution of the industrial camera was adjusted to 640 × 640, and the test specimens were photographed.
A total of 1200 original images were obtained through sampling. The LabelImg (Tzuta Lin, National Taiwan University, Taiwan, China) software was used for annotation to generate the txt files required by YOLOv5, which contain the categories and locations of the defects in the images. These images were then filtered and classified into four defect categories: pits, cracks, scratches, and roughness. Counted by category, there were 397 images in the pit category, 199 images in the crack category, 283 images in the scratch category, and 321 images in the roughness category. Each image contained varying numbers of defects, ranging from 1 to 20. This is illustrated in Figure 1, where the defects have been labeled and some of the markings enlarged for easy viewing.
It can be seen that the defect features in this dataset cover a wide range, including small target defects, commonly found industrial defects, and difficult-to-detect defects. The dataset includes a significant proportion of small target defects, such as the pit category, which accounts for one-third of the total data. These are small defects with a diameter of less than 1 mm, occupying only a minimal number of pixels in the detection images. Crack defects often occur due to processing errors or component fatigue and have a relatively large proportion in practical situations. Roughness defects vary in size and shape and have a high degree of overlap with the background, making them difficult to distinguish.
The images of each defect category in the dataset were divided into training, validation, and testing sets in an 8:1:1 ratio. This resulted in 960 training images and 120 images each for validation and testing.
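For reference, the YOLOv5 .txt annotation format mentioned in this section stores one object per line as class_id x_center y_center width height, with all coordinates normalized to the image size. The following minimal Python sketch parses such a file back into pixel-space boxes; the class index order shown is an assumed mapping for illustration only:

```python
# Minimal sketch of reading a YOLOv5-style .txt annotation file.
# Each line holds: class_id x_center y_center width height, all
# coordinates normalized to [0, 1] relative to the image size.
from pathlib import Path

CLASS_NAMES = ["pit", "crack", "scratch", "roughness"]  # assumed index order

def load_labels(txt_path: str, img_w: int = 640, img_h: int = 640):
    """Return a list of (class_name, x1, y1, x2, y2) boxes in pixels."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        x1 = (xc - w / 2) * img_w   # left edge from center and width
        y1 = (yc - h / 2) * img_h
        x2 = (xc + w / 2) * img_w
        y2 = (yc + h / 2) * img_h
        boxes.append((CLASS_NAMES[int(cls)], x1, y1, x2, y2))
    return boxes
```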

2.2. YOLOv5 Algorithm

YOLO is a widely used one-stage object detection algorithm that is applied to the tasks of object classification and localization in images or videos. Its main feature is fast inference, allowing for efficient object detection in real-time or high-speed scenarios.
To continuously improve the performance of the algorithm, the authors of YOLO have released multiple versions of the model. The YOLOv5 used in this paper is an improvement based on YOLOv4. During the model training phase, YOLOv5 incorporates various new techniques and functionalities to enhance its performance. It introduces mosaic data augmentation, which combines multiple images to increase the diversity of training data. It utilizes adaptive image scaling and anchor box calculations to adapt to objects of different scales. The Spatial Pyramid Pooling [21] (SPP) layer is introduced to enhance the detection receptive field. YOLOv5 also incorporates the Focus and Cross-Stage Partial (CSP) structures to improve and optimize the base network. Furthermore, the FPN [22] (Feature Pyramid Network) and PAN [23] (Path Aggregation Network) structures are added to further enhance the model’s performance. Improved methods such as DIoU-NMS and GIOU Loss are used for prediction filtering and training, resulting in significant improvements in both accuracy and speed for YOLOv5.
Currently, YOLOv5 has been updated to version 7.0 and provides five different models: n/s/m/l/x, catering to different usage scenarios and needs. These models gradually increase in size and computational complexity, corresponding to improved accuracy. Considering the target application scenario in this paper, we have selected the smaller and faster ‘s’ version model, which ensures detection accuracy while maximizing processing speed to handle various complex scenes.

3. Improvement Methods and Improved Models

3.1. Introducing BiFormer’s Bi-Level Routing Attention Mechanism

The attention mechanism focuses on important information, adjusts weights, and improves model performance. It is commonly used in object detection tasks to enhance detection accuracy and robustness by selecting and focusing on regions or features related to the target. The general attention mechanism typically refers to self-attention, a key technique in the Transformer [24] model. It assigns weights by calculating the interdependencies between different positions of the input sequence, thereby adjusting the attention given to each position. In the original Transformer network, this structure requires computing the correlations between every pair of elements in the input sequence, so the computational complexity grows quadratically with the sequence length, resulting in high computational resource consumption for models incorporating such techniques.
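To make this cost explicit, a minimal self-attention sketch is shown below; the n × n score matrix is the quadratic term. The plain weight tensors are an illustrative simplification:

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Plain scaled dot-product self-attention over n tokens of width d.
    The (n, n) score matrix is what makes cost grow quadratically in n."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                     # (n, d) each
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # (n, n)
    return torch.softmax(scores, dim=-1) @ v                # (n, d)
```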
Based on this, Zhu et al. [25] proposed a visual Transformer model called BiFormer, which introduces a Bi-level routing attention mechanism into the visual Transformer. It enhances feature representation by enabling information interaction between the global attention and local attention levels. The global attention level is used to capture the overall structure and global semantic information of the image, while the local attention level is used to capture the details and local features of the image. The introduction of this Bi-level routing attention mechanism enables a better handling of global and local relationships in images, effectively capturing both the structural and detailed features of the image. It significantly improves the performance of image classification tasks. The structure is shown in Figure 2.
In Figure 2, Q is the query, used to compute the weighted relevance to the keys. K represents the keys, which provide the information used for matching. V is the value associated with the query results and key information. C is a scalar factor used to adjust the allocation of attention and control its focus. A is the adjacency matrix representing the semantic relevance between two regions. O is the output of the attention mechanism.
During the training process, the BiFormer model first extracts features from the input image using a convolutional neural network, resulting in a feature map of size H × W × C. This feature map is then split into r sub-feature maps of size H × W × C/r along the channel dimension C.
In the global attention part, the Biformer model maps each sub-feature map to a vector of dimension d through two fully connected layers. It calculates the similarity between this vector and all other vectors, obtaining a weight vector. The sub-feature maps are then weighted and summed according to this weight vector, resulting in an output vector that represents the global relationship.
In the local attention part, the Biformer model applies two convolutional layers to each sub-feature map to obtain a local feature map. It then calculates the similarity between this feature map and all other feature maps, obtaining another weight vector. The feature maps are weighted and summed according to this weight vector, resulting in an output vector that represents the local features.
In summary, through the global routing attention mechanism, each sub-feature map is transformed into a semantic vector representing the global relationship, and through the local routing attention mechanism, into a semantic vector representing the local features. Finally, the BiFormer model concatenates the output vectors from global attention and local attention. Through fully connected layers, these vectors are mapped to the class and position information of the target object, and the probability output is obtained using the softmax function.
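The much-simplified PyTorch sketch below illustrates the routing idea: coarse region-to-region affinities select the top-k regions each region should attend to, and fine-grained attention then runs only over the gathered tokens. Projections and multi-head details are omitted, so this is a conceptual sketch rather than the BiFormer implementation:

```python
import torch
import torch.nn.functional as F

def bi_level_routing_attention(x, n_regions: int, topk: int):
    """Simplified sketch of bi-level routing attention.
    x: (H*W, C) tokens from a feature map; n_regions must divide H*W.
    Q = K = V = x (identity projections) to keep the sketch short."""
    n, c = x.shape
    tokens_per_region = n // n_regions
    regions = x.view(n_regions, tokens_per_region, c)

    # Coarse routing: region descriptors are token means; each region
    # keeps only its top-k most related regions.
    desc = regions.mean(dim=1)                        # (R, C)
    affinity = desc @ desc.t()                        # (R, R)
    idx = affinity.topk(topk, dim=-1).indices         # (R, k)

    # Fine attention: attend only over tokens of the routed regions.
    gathered = regions[idx].reshape(n_regions, topk * tokens_per_region, c)
    attn = F.softmax(regions @ gathered.transpose(1, 2) / c ** 0.5, dim=-1)
    return (attn @ gathered).reshape(n, c)
```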
After a series of studies, the Biformer model has been proven suitable for detecting small objects [26,27]. Thanks to the introduction of the Bi-level routing attention mechanism, it can simultaneously focus on the local features and global relationships of the detection target. This enables better discrimination between defective targets and the background. In conclusion, there is a high compatibility between the dataset used in this paper and the Biformer model, which is expected to achieve significant improvement in accuracy.

3.2. C3-Faster: Based on the Lightweight FasterNet Model

In order to improve the speed of object detection algorithms, Chen et al. [28] proposed a new convolution operator called PConv (Partial Convolution). The basic idea of this method is to apply a regular convolution to only a subset of the input channels while leaving the remaining channels untouched. This approach reduces redundant computation and memory access, allowing for a more efficient extraction of spatial features. Based on this idea, they introduced the FasterNet network, which significantly improves detection speed without compromising accuracy.
Building on this, this paper further proposes the C3-Faster module to replace the C3 module used for feature extraction in YOLOv5. The C3-Faster module utilizes the constructor function of the C3 module and introduces the PConv layer and Faster_Block module.
Both the Partial Convolution layer and the Faster_Block module adopt this concept: convolution is applied to only part of the input channels, skipping redundant computation on the rest. This partial convolution operation enhances the model’s robustness, generalization ability, and non-linear expression, leading to improved detection accuracy. The Faster_Block processes the input feature map using partial convolutions, reducing computational complexity and model parameters to improve algorithm efficiency. This design allows for a more effective utilization of computational resources and accelerates the inference speed of the model. The structure of the Faster_Block is shown in Figure 3. In the diagram, “conv” is an abbreviation for convolution.
In the figure above, * stands for the convolution operation; h and w represent the height and width of the feature map; c_p is the number of channels convolved in PConv; k is the size of the convolution kernel; Identity denotes the identity mapping, used to keep the input and output sizes consistent.
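A minimal PConv sketch following the FasterNet formulation is given below; the channel split c_p/c is a free design choice (FasterNet commonly uses a quarter of the channels):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Sketch of FasterNet-style partial convolution: a regular conv is
    applied to the first c_p channels only, while the remaining channels
    pass through untouched, cutting FLOPs and memory access roughly in
    proportion to c_p / c."""
    def __init__(self, c: int, c_p: int, k: int = 3):
        super().__init__()
        self.c_p = c_p
        self.conv = nn.Conv2d(c_p, c_p, k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        convolved = self.conv(x[:, : self.c_p])          # convolve a slice
        return torch.cat([convolved, x[:, self.c_p:]], dim=1)  # pass the rest
```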
In the forward propagation process of the Faster_Block module, the connection paths can be disabled by modifying parameters to enable the DropPath (Dropout with Paths) functionality during training. DropPath is a regularization technique for deep neural networks, which can be seen as a structured form of dropout operation. It helps improve the model’s generalization ability, allowing it to better adapt to new, unseen data. DropPath applies dropout operation at each level of the network architecture by randomly “dropping” or disabling certain connection paths with a certain probability. Unlike traditional dropout, DropPath operates on the network’s connection paths rather than individual neurons. During training, DropPath randomly sets certain connection paths to zero, simulating the random disconnection of connections during inference. This encourages the network to learn more robust feature representations in the presence of partially disconnected connections, thus increasing the network’s generalization ability.
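A compact sketch of DropPath as it is typically implemented (one Bernoulli draw per sample applied to the whole residual branch, rescaled to keep the expected output unchanged):

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Sketch of DropPath (stochastic depth): during training, the whole
    residual branch is zeroed per sample with probability p; kept outputs
    are rescaled by 1/(1-p) so the expectation is unchanged."""
    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        # One Bernoulli draw per sample, broadcast over remaining dims.
        mask = x.new_empty((x.shape[0],) + (1,) * (x.dim() - 1)).bernoulli_(keep)
        return x * mask / keep
```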
In summary, this paper introduces the concept of partial convolution and further proposes the C3-Faster module based on the FasterNet network to replace the original C3 module in order to improve detection speed, reduce algorithm size, and strive to maintain detection accuracy as much as possible.

3.3. Adding NWD to the Loss Section

As mentioned earlier, small object defects account for one-third of the total dataset. However, in traditional object detection tasks, detecting small objects often comes with difficulties. Small objects, due to their small size and sparse information, are easily affected by factors such as noise and background interference, resulting in inaccurate detection results. To address this issue, Wang et al. [29] proposed a new metric called normalized Wasserstein distance (NWD) to measure the similarity between small object bounding boxes. Based on this, NWD is embedded into the loss function of label assignment, non-maximum suppression, and anchor-based detectors to replace the original IoU metric, leading to a new small object detector.
This module extracts small objects from the image. Since these objects are generally not strictly rectangular, the module calculates based on the bounding boxes. Within each bounding box, the pixel weights decrease progressively from the center toward the edges. This models each target bounding box as a two-dimensional Gaussian distribution, with the mean and variance determined by the position and size of the target. For each extracted target, the Wasserstein distance based on optimal transport theory is used to calculate the distribution distance between it and other targets. The distance matrix is then normalized to eliminate the influence of scale factors, making the distances between different targets comparable and better reflecting the similarity of small objects. This yields the new normalized metric, NWD. Finally, based on the normalized distance matrix, object detection is performed, as shown in Figure 4.
Compared to the original algorithm, incorporating the NWD detector and using Gaussian Wasserstein distance as a similarity measure allows for more accurate differentiation between small objects and the background. By normalizing the distance matrix, the influence of size factors can be effectively eliminated, enabling comparability between different targets and more accurate classification of small objects. This enhances the accuracy and robustness of detection. Modeling small objects based on Gaussian distributions, using the object bounding boxes, provides better adaptability to objects of various scales.
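For illustration, the NWD similarity between two axis-aligned boxes reduces to a closed form, sketched below following Wang et al. [29]. The normalizing constant c is dataset dependent; 12.8, the value reported for the AI-TOD dataset, is used only as a placeholder here:

```python
import torch

def nwd(box1: torch.Tensor, box2: torch.Tensor, c: float = 12.8) -> torch.Tensor:
    """Normalized Gaussian Wasserstein distance between boxes given as
    (cx, cy, w, h). Each box is modeled as a 2D Gaussian
    N((cx, cy), diag(w^2/4, h^2/4)); the squared 2nd Wasserstein distance
    between two such Gaussians has the closed form below, and the
    exponential maps it to a (0, 1] similarity score."""
    cx1, cy1, w1, h1 = box1.unbind(-1)
    cx2, cy2, w2, h2 = box2.unbind(-1)
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
          + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    return torch.exp(-torch.sqrt(w2_sq) / c)
```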

3.4. Addition of Lightweight Upsampling Operator CARAFE

In object recognition algorithms, upsampling is a commonly used operation that expands low-resolution object feature maps to higher resolutions. In convolutional neural networks, target features in higher-level feature maps are more abstract and have lower resolutions compared to lower-level feature maps. On the other hand, object features in lower-level feature maps are finer and have higher resolutions. By performing upsampling on the lower-level feature maps, their size can be expanded to match the size of the higher-level feature maps. This allows for the combination of fine-grained features from the lower-level feature maps with abstract features from the higher-level feature maps, enabling the transfer of detailed information from the lower-level feature maps to the higher-level feature maps. This integration of bottom-up and top-down features provides a more comprehensive representation of the target objects, thereby improving the accuracy and robustness of the object recognition algorithm in complex scenes.
To achieve efficient feature map upsampling in deep neural networks, a Content-aware ReAssembly of Features (CARAFE) module was proposed by Wang et al. [30]. This module is characterized by its versatility, lightweight design, and high efficiency. Compared to traditional upsampling modules, the CARAFE module utilizes local information to reassemble features instead of relying on simple interpolation or convolution operations. By introducing specialized resampling methods, it can expand the receptive field while preserving detailed features. Additionally, CARAFE does not depend on sub-pixel neighborhood operations but integrates information in a larger receptive field, enabling better content awareness. The CARAFE module can also dynamically generate adaptive kernels based on specific content, thereby improving the quality and accuracy of the feature map and demonstrating superior performance when handling different types of features.
During the runtime of the CARAFE module, a predefined region centered on each position is defined, and feature reassembly is performed as a weighted combination, with the weights generated in a content-aware manner. Multiple sets of upsampling weights are generated at each position, and these weights are then rearranged to form a complete spatial block for feature upsampling. Specifically, given a feature map $X$ of size $C \times H \times W$ and an integer upsampling rate $\sigma$, a new feature map $X'$ of size $C \times \sigma H \times \sigma W$ is generated as shown in Equations (1) and (2):

$W_{l'} = \psi\big(N(X_l, k_{encoder})\big)$  (1)

$X'_{l'} = \phi\big(N(X_l, k_{up}), W_{l'}\big)$  (2)

For any target position $l' = (i', j')$ in the output $X'$, a corresponding source position $l = (i, j)$ can be found in $X$. Here, $N(X_l, k)$ is the $k \times k$ subregion of the feature map $X$ centered at $l$, and $k_{up}$ is the size of the reassembly kernel, which defines the spatial range of each kernel. For performance and efficiency, the kernel-prediction convolution layer usually uses a kernel size of $k_{encoder} = k_{up} - 2$. In Equation (1), $\psi$ is the kernel prediction module that predicts a kernel $W_{l'}$ for each $l'$ based on the subregion around $X_l$. Its output is a reassembly kernel tensor of size $C_{up} \times H \times W$, where $C_{up}$ is the number of channels of the reassembly kernel, which specifies its dimensionality and is defined as shown in Equation (3):

$C_{up} = \sigma^2 k_{up}^2$  (3)

In Equation (2), $\phi$ is the content-aware reassembly module, which combines the subregion $N(X_l, k_{up})$ with the kernel $W_{l'}$ to form the reassembled feature, as shown in Equation (4), where $r = \lfloor k_{up}/2 \rfloor$:

$\phi\big(N(X_l, k_{up}), W_{l'}\big) = \sum_{n=-r}^{r} \sum_{m=-r}^{r} W_{l'}(n, m) \cdot X_{(i+n,\ j+m)}$  (4)
In conclusion, the CARAFE module is an effective content-aware upsampling method that can be integrated into the YOLOv5 network architecture to improve performance in various computer vision tasks.
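A condensed PyTorch sketch of the two CARAFE steps described above (kernel prediction, then content-aware reassembly) is given below. The compressed channel width c_mid is a tunable choice, and the unfold-based gather is one of several equivalent ways to realize Equation (4):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Simplified sketch of CARAFE upsampling: a kernel prediction branch
    generates a content-aware k_up x k_up reassembly kernel per target
    position, and each upsampled pixel is a weighted sum over the
    corresponding source neighborhood."""
    def __init__(self, c: int, scale: int = 2, k_up: int = 5,
                 k_encoder: int = 3, c_mid: int = 64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(c, c_mid, 1)                 # channel compressor
        self.encoder = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,
                                 k_encoder, padding=k_encoder // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # Predict one softmax-normalized kernel per upsampled position.
        kernels = F.pixel_shuffle(self.encoder(self.compress(x)), s)
        kernels = F.softmax(kernels.view(b, k * k, s * h, s * w), dim=1)
        # Gather each k x k source neighborhood, repeated for the s x s
        # target positions it serves, then apply the weighted sum.
        patches = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h, w)
        patches = patches.repeat_interleave(s, dim=3).repeat_interleave(s, dim=4)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)
```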

3.5. Improved Network Structure

Based on the previous text, the network backbone is enhanced by adding the Bi-level routing attention mechanism from the Biformer model. The NWD algorithm is introduced into the loss part. The C3 module in the original YOLOv5 algorithm is replaced with the C3-Faster module proposed based on the FasterNet network. Finally, the lightweight upsampling operator CARAFE is added to the network. The improved YOLOv5 model network structure is shown in Figure 5. In the diagram, “BN” refers to batch normalization. Batch normalization is a commonly used deep learning technique that normalizes the input data.

4. Experimental

The experimental environment in this article is based on the Linux operating system with 90 GB of RAM. PyTorch 1.11.0 is used as the deep learning framework, with Python 3.8 and CUDA 11.3. The CPU is an Intel(R) Xeon Platinum 8352V @ 2.10 GHz, and the GPU is an NVIDIA GeForce RTX 4090 with 24 GB of memory.

4.1. Experimental Evaluation Metrics

This paper compares the improved network with the original network in terms of model parameters and detection parameters.
Model parameters refer to the number of trainable parameters in the network architecture, which depends on the structure and layers of the network. It includes the sizes of convolutional kernels, channel numbers, and the number of neurons in fully connected layers. Generally, it is an indicator of model complexity and computational resources required. FLOPs (floating-point operations) represent the number of floating-point operations performed during inference. In deep learning models, both parameters and input data are floating-point numbers. The computation process of the model mainly involves operations such as matrix multiplication, convolution, and activation functions, all of which require a large number of floating-point operations. This paper uses GFLOPs to evaluate the computational complexity of the model. Weight volume refers to the size of the weight file generated after training the model. It represents the total amount of weight parameters used in the model. Weight parameters are learned through optimization algorithms during the training process and are used to represent the connection weights of the neural network in the model. These weight parameters store the knowledge and feature representation capability of the model.
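In practice, the parameter count can be read directly from the model, and GFLOPs can be estimated with a profiler. The sketch below uses the third-party thop package, a common choice for this; it is an assumption for illustration, not a statement about the tooling used in this paper:

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def estimate_gflops(model: torch.nn.Module, img_size: int = 640) -> float:
    """GFLOPs for one forward pass on a 3-channel square input."""
    from thop import profile  # pip install thop
    dummy = torch.zeros(1, 3, img_size, img_size)
    macs, _ = profile(model, inputs=(dummy,), verbose=False)
    return 2 * macs / 1e9  # 1 multiply-accumulate = 2 floating-point ops
```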
The detection parameters of the model include precision, recall, detection speed (FPS), and AP@50 [31]. Precision measures the proportion of samples predicted as positive by the model that are truly positive. Recall measures the proportion of true positive samples that the model successfully predicts as positive. Detection speed is the number of images the model can process per second during inference. AP (average precision) is a commonly used evaluation metric for object detection; it averages precision over different confidence thresholds and uses a specific IoU threshold to determine the matching of positive and negative examples. AP50 is the mean of the per-class AP values at an IoU threshold of 0.5, where each class’s AP integrates precision over different recall levels. This metric considers the combined performance of precision and recall and is currently a mainstream evaluation metric for object detection. The relevant calculation formulas are shown below:
$P = \dfrac{TP}{TP + FP}$

$R = \dfrac{TP}{TP + FN}$

$AP = \sum_{i=1}^{n} P(i)\,\Delta R(i) = \int_{0}^{1} P(R)\,dR$

$mAP = \dfrac{1}{N}\sum_{i=1}^{N} AP_i$

In the formulas, $P$ and $R$ refer to precision and recall, respectively, and $N$ represents the number of target classes. $TP$ is the number of detected boxes with IoU > 0.5 (each ground truth is counted at most once); $FP$ is the number of detected boxes with IoU ≤ 0.5 (or the number of redundant detections of the same ground truth); $FN$ is the number of ground truths that were not detected. $mAP$ refers to the mean average precision.
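A minimal NumPy sketch of the AP computation implied by these formulas, for a single class at IoU 0.5, is shown below; matching detections to ground truths is assumed to have been done beforehand:

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """Single-class AP at a fixed IoU threshold: sort detections by
    confidence, accumulate TP/FP counts, and integrate precision over
    recall. `is_tp` flags whether each detection matched a previously
    unmatched ground truth with IoU > 0.5."""
    order = np.argsort(-np.asarray(scores))
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / max(n_gt, 1)
    precision = tp / (tp + fp)
    # All-point numerical integration of the precision-recall curve.
    return float(np.trapz(precision, recall))

# mAP is then the mean of the per-class APs, e.g.:
# map50 = np.mean([average_precision(*per_class[c]) for c in classes])
```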

4.2. Training Parameter Settings

YOLOv5 adopts the K-means clustering algorithm to automatically generate anchor boxes based on the statistical analysis of different-sized targets in the training set. This allows the network to effectively detect objects of various sizes and ratios.
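A simple version of this anchor-generation step can be sketched with standard k-means on the training boxes’ width-height pairs; YOLOv5’s built-in autoanchor additionally refines the centroids with a genetic algorithm and an IoU-based fitness, which is omitted here:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_anchors(wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster the (width, height) pairs of all training boxes and use
    the centroids, sorted by area, as anchor sizes."""
    wh = np.asarray(wh, dtype=float)
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]  # small to large
```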
In this study, the training parameter settings are as follows: To ensure fast convergence to a good initial state, the initial learning rate is set to 0.01, and stochastic gradient descent (SGD) is utilized for optimization. To better adjust the learning rate, a 5-epoch warm-up gradually raises the learning rate at the start of training, avoiding instability in the early epochs. A learning rate momentum of 0.98 is used to maintain stable convergence during the training process. To control model complexity and prevent overfitting, a weight decay coefficient of 5 × 10−5 is set. In terms of computational resources, the number of dataloader workers is set to 32, enabling parallelized data loading and improving training efficiency. The original YOLOv5 model uses a batch size of 128. In subsequent training processes, the batch size is adjusted based on model size and available GPU memory to ensure optimal performance and effectiveness.
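Collected in one place, these settings correspond to the following hyperparameter keys as they are named in YOLOv5’s hyp YAML files; this is a sketch of the configuration, not the exact file used in this work:

```python
# Hedged sketch: the training settings above, expressed with the key
# names used in YOLOv5's hyperparameter YAML files (hyp.*.yaml).
hyp = {
    "lr0": 0.01,           # initial SGD learning rate
    "momentum": 0.98,      # SGD momentum
    "weight_decay": 5e-5,  # L2 regularization coefficient
    "warmup_epochs": 5,    # linear learning-rate warm-up length
}
# Training is then launched with the repository's train.py at image size
# 640, batch size 128 (reduced when GPU memory requires), 600 epochs,
# and 32 dataloader workers.
```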
During the initial training phase, the original YOLOv5 model’s “s” version is used, each improvement proposal is tested in turn, and the number of training epochs is set to 600. Through ablation experiments, the parameters and performance of each model are compared and analyzed.

4.3. Training Loss

This study presents the training loss of the YOLOv5-s model with four improvements: Biformer, C3-Faster, NWD, and CARAFE. Figure 6 shows the training loss (blue line) and validation loss (orange line) of the model. The three different loss curves objectively illustrate the training process of the detection algorithm in this study.
The Box loss in the figure is used to calculate the bounding box localization loss. It measures the model’s ability to predict object positions by calculating the difference between predicted and actual bounding boxes. The Obj loss is the confidence loss, computed using Focal loss, which evaluates the model’s confidence error in predicting objects and backgrounds. The Cls loss is the classification loss, obtained by computing the cross-entropy loss between the predicted class results and actual labels. It measures the model’s ability to classify each category.
It can be observed that the loss values become stable by 600 training epochs, indicating that the model has converged. The validation curves initially show signs of overfitting but stabilize after about 100 epochs.

5. Results

5.1. Model Performance Comparison

To determine the impact of each optimization structure on network performance, this study conducted ablation experiments by adding different improvements to the YOLOv5-s model. The results, including the model’s parameters and test performance after the improvements, are shown in Table 1.
By analyzing the experimental results in the above table, it can be observed that adding the Biformer attention mechanism to the network increases the model size and consumes more computational resources. Compared to the original YOLOv5 algorithm, the number of training parameters increases by 3.78%, floating-point operations increase by 64%, and weight volume increases by 3.7%. Thanks to these improvements, Biformer is able to better handle global and local relationships in images, as well as capture both the overall structural features and fine-grained details. This enables better discrimination between defect targets and backgrounds, resulting in a 7.2% improvement in AP50 and significant optimization in detection accuracy.
On the other hand, replacing the C3 module with C3-Faster, using a novel convolutional approach similar to PConv, results in a notable decrease in training parameters and computational complexity. Specifically, the number of training parameters decreases by 17.5%, floating-point operations decrease by 20%, and weight volume decreases by 17.05%. However, the AP50 only decreases by 0.7%. This lightweight algorithm model effectively ensures detection accuracy while achieving significant model simplification.
The introduction of NWD and CARAFE does not affect the network’s parameter count, floating-point operations, or model weight volume; both improvements show up purely as gains in detection accuracy. NWD benefits from using the Gaussian Wasserstein distance as a similarity metric, which allows for more accurate discrimination between small objects and backgrounds, improving AP50 by 3.8% over the original YOLOv5 model. CARAFE utilizes local information to reassemble features while preserving details and expanding the receptive field, resulting in a 3.1% improvement in AP50.

5.2. Comparison of Defect Detection Performance for Each Category

To study the detection performance of the proposed improvement methods on different categories of defects in the YOLOv5 algorithm, experimental data were extracted, as shown in Figure 7.
From the analysis of the above figure, it can be observed that the best detection performance for all four defect categories in this dataset is achieved by the improved model. The major issue with the original model is its poor detection performance for the “Crack” category of defects. After the improvements, except for the lightweight model introduced by C3-Faster, the other three improvements show significant enhancements in detecting this type of defect, reaching an acceptable level. The best detection performance for the “Pit”, “Crack”, and “Scratch” defect categories is achieved by the green model that incorporates all the improvements. Compared to the original model, the detection accuracy for the “Pit” category improved by 9%, the detection accuracy for the “Crack” category improved by 22.9%, and the detection accuracy for the “Scratch” category improved by 5.2%. The best detection performance for the “Roughness” category of defects is achieved by the orange model that includes Biformer, resulting in a 7.2% improvement compared to the original model.
The improved model significantly enhances the overall detection performance on this dataset, greatly improving the detection performance of defect categories that the original model struggled with, reaching an acceptable level. It has high practical value in real-world detection scenarios.

5.3. Detection Performance Comparison

To visually understand the improvement in detection performance of the model on this dataset and demonstrate the effectiveness of the improvements, the original model and the model incorporating all the improvements were tested on a test set of 120 images. Partial detection results are shown in Figure 8.
From the above comparison, it can be observed that the original model tends to miss small targets in the “Pit” category. It also exhibits issues such as incomplete detection and incorrect localization for targets in the “Crack” and “Roughness” categories when they are blended with the background or in complex scenes. The improved model, by introducing Biformer, NWD, and CARAFE mechanisms, significantly enhances the recognition of small targets, effectively avoiding the issue of missed detections. By expanding the receptive field, it improves the accuracy of localization and recognition for larger targets. Moreover, it enhances the recognition performance for difficult-to-detect targets in complex scenes and shows higher sensitivity toward continuous and dense defects. The coverage and detection accuracy of various defect targets are significantly improved.

5.4. Comparison of Detection Performance on Public Datasets

To further verify the improved algorithm proposed in this paper, we conducted experiments on the steel surface defect dataset (NEU-DET) created by the team led by Song [32]. The dataset contains six defect categories: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. There are 300 images for each type of defect, and the defect annotation information for each category is saved in an XML file. In total, the dataset contains 1800 grayscale images and 4189 bounding boxes of detected defects.
The 1800 images in the dataset were divided into training, validation, and test sets in an 8:1:1 ratio, resulting in 1440 training images and 180 images each for validation and testing. The defect images for each category in the dataset are shown in Figure 9, with the location of one defect target in each image marked with a red box for easy viewing.
The experiments were conducted with exactly identical training parameters for each improvement scheme, and the detection performance of each model on the NEU-DET dataset was compared and analyzed through ablation experiments. The results are shown in Table 2.
Through analyzing the experimental results in the table above, it can be concluded that the four improvement points proposed in this paper have clear advantages in different fields, optimizing the feature representation ability of the detection model. Based on the different improvement points, the following conclusions can be drawn:
Adding the bi-level routing attention mechanism can improve the detection accuracy of the network, but it also increases the overall computational complexity and weight volume. This means that more computing resources and storage space are needed to obtain more accurate detection results.
By adding the NWD detection metric and the CARAFE upsampling operator, the network can improve detection accuracy without increasing computational cost or weight volume. This means that, by introducing these improvement points, the performance of target detection can be improved under the same hardware conditions.
The C3-Faster module performs well in the NEU-DET dataset, enabling the model to achieve significant detection improvement while remaining lightweight. This means that, in scenarios where speed, accuracy, and model size are all required, the C3-Faster module is an effective choice.
When these four improvement points are used together, they can complement each other and achieve the optimal improvement in detection effect. However, this integrated optimization will also be accompanied by an increase in the model’s computational complexity and weight volume.
In summary, the optimized model proposed in this paper can also improve the detection effect on datasets in other fields, and each improvement point has obvious characteristics. According to the requirements of speed, accuracy, and model size in different fields, we can flexibly choose the appropriate improvement methods to achieve the optimal detection effect for datasets with different target features and different needs.

5.5. Performance Comparison of Mainstream Detection Algorithms

To further evaluate the detection performance of the proposed improved model in the current stage, experiments were conducted to compare and analyze it with other object detection algorithms under the same experimental conditions. We selected classic one-stage detection algorithms such as SSD (Single Shot MultiBox Detector), RetinaNet, and FCOS (Fully Convolutional One-Stage Object Detection), as well as representative algorithms of the same category like YOLOv3, YOLOv4, and YOLOv7.
By conducting experiments on the same dataset and evaluation metrics, we were able to objectively assess the effectiveness and superiority of the proposed improved model. For each algorithm, we recorded its detection accuracy, detection speed, and model size, as shown in Table 3.
Through the comparison of experimental results, we found that the proposed improved model has significant advantages in terms of detection performance compared to traditional algorithms and algorithms of the same category. It demonstrates improvements in overall detection accuracy, detection speed, and model size. It surpasses the new-generation YOLOv7 model in all aspects. Moreover, the targeted improvements in the model show better performance in terms of target localization accuracy, small target recognition, and difficult-to-detect target recognition. This makes it capable of meeting the requirements for high-accuracy, high-speed, and lightweight aviation engine component defect detection applications.

6. Conclusions and Future Work

To achieve efficient and accurate detection of surface defects on aviation engine components, this paper proposes an improved YOLOv5 optimization algorithm. Targeting the defect features of the aviation engine component defect dataset, a Biformer module is added. By using a Bi-level routing attention mechanism, it simultaneously focuses on the local features and global relationships of the detection targets, effectively improving the overall detection accuracy. Based on the FasterNet network, the C3-Faster module is proposed to replace the original C3 module, enhancing robustness, ensuring accuracy, and achieving model light-weight design. The NWD detection metric is introduced, using normalized Gaussian Wasserstein distance to enhance the detection accuracy of small targets. Additionally, a versatile and lightweight CARAFE upsampling operator is added to recombine local information features, expand the receptive field, and improve content-awareness performance. Experimental results demonstrate that the improved YOLOv5 model achieves significant improvements in detection accuracy compared to the original YOLOv5 model. It elevates the recognition of targets with poor performance in the original algorithm to a usable level, while also significantly reducing the model’s parameter quantity and weight volume. Furthermore, it exhibits noticeable advantages in accuracy, speed, and volume compared to mainstream algorithms. This approach can effectively enhance work efficiency and reduce labor demands in the field of aviation engine component detection, providing a high application value.
In future work, the content of our dataset will continue to increase. The present study only includes four types of defects, but in reality, there may be many more types of defects. We will further supplement the defect types in the dataset. To enhance the algorithm’s generalization ability in practical applications, we will consider adding defect image data captured under different lighting conditions, exposures, and other complex environments. Currently, the algorithm model only supports real-time detection when connected to a computer. In the future, efforts will be made to deploy it on portable devices such as bore scopes to increase detection flexibility and efficiency. Once the dataset is fully developed, we plan to explore additional algorithm types to continuously improve detection accuracy and speed, maximizing its applicability in safeguarding and maintenance work for aircraft engines.

Author Contributions

Conceptualization, C.W.; methodology, Y.Q.; software, Y.Q.; validation, Y.Q.; formal analysis, Y.X.; investigation, Y.X. and J.Y.; resources, Y.X. and X.C.; data curation, Y.K.; writing—original draft preparation, Y.Q. and Y.K.; writing—review and editing, C.W.; visualization, Y.Q. and J.Y.; supervision, C.W.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Basic Research Program of Shaanxi, Program number 2023-JC-QN-0696.

Data Availability Statement

The data that support the findings of this research are openly available at http://faculty.neu.edu.cn/songkechen/zh_CN/zhym/263269/list/index.htm (accessed on 15 September 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

List of abbreviations:
YOLO: You Only Look Once
SSD: Single Shot MultiBox Detector
R-CNN: Regions with CNN features
CBAM: Convolutional block attention module
FPN: Feature pyramid network
PAN: Path aggregation network
PConv: Partial convolution
NWD: Normalized Wasserstein distance
IoU: Intersection over union
CARAFE: Content-aware reassembly of features
P: Precision
R: Recall
AP: Average precision
mAP: Mean average precision

References

  1. Shang, H.; Sun, C.; Liu, J.; Chen, X.; Yan, R. Deep learning-based borescope image processing for aero-engine blade in-situ damage detection. Aerosp. Sci. Technol. 2022, 123, 107473. [Google Scholar] [CrossRef]
  2. Li, D.; Li, Y.; Xie, Q.; Wu, Y.; Yu, Z.; Wang, J. Tiny defect detection in high-resolution aero-engine blade images via a coarse-to-fine framework. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
  3. Kong, X.; Li, J. Image registration-based bolt loosening detection of steel joints. Sensors 2018, 18, 1000. [Google Scholar] [CrossRef] [PubMed]
  4. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  5. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; Part I; pp. 21–37. [Google Scholar]
  6. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
  7. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  8. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
  10. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  11. Tulbure, A.A.; Tulbure, A.A.; Dulf, E.H. A review on modern defect detection models using DCNNs–Deep convolutional neural networks. J. Adv. Res. 2022, 35, 33–48. [Google Scholar] [CrossRef]
  12. Kou, X.; Liu, S.; Cheng, K.; Qian, Y. Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 2021, 182, 109454. [Google Scholar] [CrossRef]
  13. Li, Z.; Yuan, J.; Li, G.; Wang, H.; Li, X.; Li, D.; Wang, X. RSI-YOLO: Object Detection Method for Remote Sensing Images Based on Improved YOLO. Sensors 2023, 23, 6414. [Google Scholar] [CrossRef]
  14. Zhang, R.; Wen, C. SOD-YOLO: A Small Target Defect Detection Algorithm for Wind Turbine Blades Based on Improved YOLOv5. Adv. Theory Simul. 2022, 5, 2100631. [Google Scholar] [CrossRef]
  15. Hui, T.; Xu, Y.; Jarhinbek, R. Detail texture detection based on Yolov4-tiny combined with attention mechanism and bicubic interpolation. IET Image Process. 2021, 15, 2736–2748. [Google Scholar] [CrossRef]
  16. Xiang, X.; Hu, H.; Ding, Y.; Zheng, Y.; Wu, S. GC-YOLOv5s: A Lightweight Detector for UAV Road Crack Detection. Appl. Sci. 2023, 13, 11030. [Google Scholar] [CrossRef]
  17. Jang, J.; An, H.; Lee, J.H.; Shin, S. Construction of faster R-CNN deep learning model for surface damage detection of blade systems. J. Korea Inst. Struct. Maint. Insp. 2019, 23, 80–86. [Google Scholar]
  18. Du, X.; Chen, J.; Zhang, H.; Wang, J. Fault detection of aero-engine sensor based on inception-CNN. Aerospace 2022, 9, 236. [Google Scholar] [CrossRef]
  19. Chen, Z.H.; Juang, J.C. Attention-based YOLOv4 algorithm in non-destructive radiographic testing for civic aviation maintenance. Preprints 2021. [Google Scholar] [CrossRef]
  20. Li, X.; Wang, C.; Ju, H.; Li, Z. Surface defect detection model for aero-engine components based on improved YOLOv5. Appl. Sci. 2022, 12, 7235. [Google Scholar] [CrossRef]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  22. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  23. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
  25. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 10323–10333. [Google Scholar]
  26. Yang, Z.; Feng, H.; Ruan, Y.; Weng, X. Tea Tree Pest Detection Algorithm Based on Improved Yolov7-Tiny. Agriculture 2023, 13, 1031. [Google Scholar] [CrossRef]
  27. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef]
  28. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031. [Google Scholar]
  29. Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389. [Google Scholar]
  30. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
  31. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  32. Bao, Y.; Song, K.; Liu, J.; Wang, Y.; Yan, Y.; Yu, H.; Li, X. Triplet-graph reasoning network for few-shot metal generic surface defect segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
Figure 1. Pictures of the four types of defects in the dataset.
Figure 2. The structure of the Bi-level routing attention mechanism.
Figure 3. The structure of C3-Faster.
Figure 4. The structure of NWD.
Figure 5. Improved YOLOv5 model network structure.
Figure 6. Improved model loss curves.
Figure 7. Comparison of defect detection results by category.
Figure 8. Comparison of detection effect of improved models.
Figure 9. Partial targets of the NEU-DET dataset.
Table 1. Various types of improved model performance data.

| Biformer | C3-Faster | NWD | CARAFE | Parameters | GFLOPs | P/% | R/% | AP50/% | Weight/KB | FPS |
|---|---|---|---|---|---|---|---|---|---|---|
|   |   |   |   | 7,030,417 | 16   | 81.1 | 42.2 | 52.3 | 14,126 | 131.58 |
| √ |   |   |   | 7,296,145 | 26.3 | 66.9 | 61.5 | 59.5 | 14,649 | 97.09  |
|   | √ |   |   | 5,799,889 | 12.8 | 51.6 | 60.7 | 51.6 | 11,717 | 117.65 |
|   |   | √ |   | 7,030,417 | 16   | 68.4 | 58.4 | 56.1 | 14,126 | 128.21 |
|   |   |   | √ | 7,030,417 | 16   | 67.3 | 50.6 | 55.4 | 14,126 | 129.87 |
| √ | √ | √ | √ | 6,206,049 | 23.5 | 53.3 | 64.5 | 62.9 | 12,525 | 101.01 |

√ indicates the application of this improvement.
Table 2. Various types of improved model performance data (NEU-DET).

| Biformer | C3-Faster | NWD | CARAFE | Parameters | GFLOPs | P/% | R/% | AP50/% | Weight/KB | FPS |
|---|---|---|---|---|---|---|---|---|---|---|
|   |   |   |   | 7,030,417 | 16   | 81.1 | 42.2 | 52.3 | 14,126 | 131.58 |
| √ |   |   |   | 7,296,145 | 26.3 | 66.9 | 61.5 | 59.5 | 14,649 | 97.09  |
|   | √ |   |   | 5,799,889 | 12.8 | 51.6 | 60.7 | 51.6 | 11,717 | 117.65 |
|   |   | √ |   | 7,030,417 | 16   | 68.4 | 58.4 | 56.1 | 14,126 | 128.21 |
|   |   |   | √ | 7,030,417 | 16   | 67.3 | 50.6 | 55.4 | 14,126 | 129.87 |
| √ | √ | √ | √ | 6,206,049 | 23.5 | 53.3 | 64.5 | 62.9 | 12,525 | 101.01 |

√ indicates the application of this improvement.
Table 3. Comparison of detection performance of mainstream algorithms in this dataset.

| Algorithm Name | AP50/% | FPS | Weight/MB |
|---|---|---|---|
| SSD | 24.9 | 45.3 | 98.0 |
| RetinaNet | 46.6 | 22.7 | 34.9 |
| FCOS | 48.5 | 17.9 | 85.2 |
| YOLOv3 | 49.7 | 38.4 | 27.5 |
| YOLOv4 | 50.0 | 43.1 | 35.4 |
| YOLOv5 | 52.3 | 131.6 | 13.8 |
| YOLOv7 | 54.3 | 81.6 | 70.2 |
| YOLOv5-Ours | 62.9 | 101.02 | 12.2 |