Article

Improved YOLOv3-Based Bridge Surface Defect Detection by Combining High- and Low-Resolution Feature Images

1 School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
2 School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.
Buildings 2022, 12(8), 1225; https://doi.org/10.3390/buildings12081225
Submission received: 18 July 2022 / Revised: 5 August 2022 / Accepted: 11 August 2022 / Published: 12 August 2022
(This article belongs to the Special Issue Structural Health Monitoring of Buildings, Bridges and Dams)

Abstract
Automatic detection of bridge surface defects is of wide concern because it saves human resources and improves work efficiency. Object detection algorithms, especially the You Only Look Once (YOLO) series of networks, offer strong potential for real-time detection because of their fast detection speed, and they provide an efficient, automatic route to bridge surface defect detection. Hence, this paper employs an improved YOLOv3 network to detect bridge surface defects (cracks and exposed rebar), compares the performance of the YOLOv2, YOLOv3 and faster region-based convolutional neural network (Faster RCNN) on this task, and then applies two approaches (transfer learning and data augmentation) to improve the YOLOv3. The results confirm that, by combining high- and low-resolution feature images, the YOLOv3 improves on the detection performance of the YOLOv2 (which uses single-resolution feature images); the average precision (AP) of the improved YOLOv3 (0.9–0.91) is 6–10% higher than that of the YOLOv2 (0.83–0.86). The anti-noise abilities of the YOLOv2 and YOLOv3 are then studied by introducing white Gaussian noise, and the YOLOv3 proves more robust. The YOLO detectors also perform better in detection speed: the detection speed of the improved YOLOv3 (23.8 frames per second (FPS)) is 103 times that of the Faster RCNN (0.23 FPS), with comparable mAP values (improved YOLOv3 = 0.91; Faster RCNN = 0.9). Considering both detection precision and speed, the proposed improved YOLOv3 is a suitable detector for fast, real-time bridge defect detection.

1. Introduction

The detection of surface defects in concrete structures is an important part of structural health monitoring (SHM) [1,2,3]. Bridge surface defects mainly include surface cracking, exposed rebar and corrosion; these defects affect the durability of the concrete [4]. Therefore, it is necessary to detect the defects of concrete structures to prevent further damage. Traditionally, visual inspection is the most commonly used method for detecting bridge surface defects: the inspector collects images or videos with on-site optical instruments, processes the collected data and finally draws an inspection conclusion [5,6]. However, this approach faces practical challenges, such as being personnel-intensive and time-consuming, and its detection conclusions are subjective and unreliable [7]. To overcome these shortcomings, it is essential to develop more objective and automatic methods that avoid human intervention. A more efficient and accurate approach for bridge defect detection is highly desired, and comprehensive methods (e.g., computer vision technology) should be investigated thoroughly [8].
Computer vision technology provides a new route to defect detection and improves work efficiency [9]. A large number of image processing methods can be used for defect detection, classification and evaluation [5], especially machine learning (ML) [10], which uses extracted image features to complete specific tasks, e.g., defect classification and damage level regression. ML-based methods have been used to detect various types of defects in concrete structures, including cracks [11], spalling [12] and corrosion [13], and they have significantly improved efficiency and robustness compared with traditional image processing technology. As one class of ML methods, the artificial neural network (ANN) [14] provides a further alternative. The traditional back-propagation neural network (BPNN) has achieved promising results in defect detection [15], though it has inherent shortcomings such as slow convergence and long training times [16]. Therefore, more advanced algorithms are needed that extract features automatically and compute quickly. Recently, technology based on deep learning and object detection has received extensive attention in crack detection and has significantly advanced crack detection technology.
(1) State-of-the-art deep learning studies.
Deep convolutional neural network (DCNN) algorithms can automatically extract features from raw data and gradually obtain high-level features through multiple processing layers [17,18], without image preprocessing or manual feature extraction. Furthermore, a DCNN uses partial connections and pooling of neurons; thus, it requires less computation time and effectively prevents over-fitting, which makes the DCNN an effective SHM tool. Owing to its excellent learning ability, various data forms (vibration signals [19,20,21] and defect images [7,22,23]) have been investigated for DCNN-based SHM [7,24]. Especially in image processing [25], the DCNN improves both accuracy and efficiency. In crack detection, the DCNN has been employed to classify cracks and non-cracks [26]. To locate defects, a basic method is to divide the image into small windows of uniform size and apply the DCNN to each window. Based on this idea, Cha et al. proposed a DCNN to classify whether each small image contains a crack [7], and Liang et al. transferred a well-known DCNN [27] to classify cracks and spalling [28]. However, the challenge of this sliding-window method is choosing an appropriate window size when defects vary in size. Moreover, its computational cost is very high, as the DCNN classifier must be applied to many small windows (images). To increase the speed of detecting structural defects, more advanced object detection technology needs to be explored further.
(2) Two- and one-stage detection algorithms’ applications in crack detection.
Region-based classification, or object detection, provides an epoch-making method for SHM [29]. This method draws a bounding box around each region of interest (ROI), such as a crack or spall. Commonly used methods can be classified into two-stage and one-stage algorithms. Two-stage algorithms include the region-based CNN (RCNN), Fast RCNN and Faster RCNN. Yeum et al. [30] used the RCNN in post-event building reconnaissance with an accuracy of nearly 60%. Xu et al. [31] employed the Fast RCNN to detect different concrete defects using bounding boxes. Liang [32] adopted a Faster RCNN to automatically detect the components of a bridge structure using bounding boxes. Hou et al. [33] used an improved Mask RCNN to detect ground-penetrating radar (GPR) signatures. One-stage algorithms include You Only Look Once (YOLO) and the single-shot multibox detector (SSD). Zhang et al. [5] employed the YOLOv3 to detect multiple types of concrete bridge damage, and Hiroya et al. [34] used the SSD to detect road defects. Compared with two-stage algorithms, one-stage algorithms are much faster in computation; nevertheless, their precision is compromised to some extent [35]. Therefore, it is important to improve their precision while maintaining an encouraging detection speed.
(3) Contribution of this study.
The latest YOLO model (the YOLOv3 [36]) can achieve comparable precision without sacrificing computational efficiency. A recent study showed that the YOLOv3 performs well in the detection of bridge surface defects [5], and another applied it to pavement crack detection [37]; both achieved precision comparable to that of the Faster RCNN. However, the influence of the updated YOLO version on the detection precision of bridge surface defects is still unclear. Recently, the YOLOv3 has been further developed, and some improvement strategies can significantly enhance its detection performance. Therefore, in this study, the YOLOv2 and YOLOv3 were employed to detect two kinds of defects (cracks and exposed rebar) on the bridge surface, and their detection precision and speed were compared. The YOLOv3 was then improved by introducing transfer learning (retraining a well-known CNN model) to enhance the performance of the feature extractor; in addition, data augmentation was introduced into the training process to improve the robustness of the YOLOv3, and the improvement was evaluated. Finally, the results were compared with those of the Faster RCNN.

2. Methods

As a one-stage detection network, the YOLO detector uses a DCNN model (Figure 1) to extract feature images from a certain feature extraction layer. These feature images are then fed into the YOLO detection layer (subnet), which returns the object classifications and anchor boxes.
Usually, the architecture of a feature extractor is a CNN. A variety of CNNs, such as AlexNet and VGG networks, have been designed to classify specific objects (up to 1000 categories). These well-known networks have strong feature extraction abilities, which can be employed to complete the feature extraction tasks in this paper.
In this paper, we employ a lightweight model (SqueezeNet) as the feature extractor of the YOLO. The SqueezeNet has 18 layers (Figure 2), which contain a series of convolutional, pooling and ReLU layers [19]. Its classification accuracy is as high as that of AlexNet (the winner of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [38] in 2012) and its model size is smaller than 0.5 MB (which means less computing parameters). The SqueezeNet has been trained on abundant images (from the ImageNet database [25]) and can classify images into 1000 objects.
In this study, the YOLO detectors (YOLOv2 and YOLOv3) were built in MATLAB (MathWorks Inc., Natick, MA, USA). Training was performed on a computer with an NVIDIA GeForce GTX 2060 GPU, an Intel Core i7-4790 @ 3.60 GHz CPU and Windows 10. The YOLOv2 detects defects using either high-resolution or low-resolution feature images (acquired by a feature extraction network, e.g., SqueezeNet); only a single resolution can be selected for one network model. The YOLOv3 detects defects by combining high-resolution and low-resolution feature images (one network model can use two or more resolutions). To further improve the detection performance of the YOLO, an improved YOLOv3 is proposed and its performance is compared.
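As an illustration of this setup, the following MATLAB sketch loads the pretrained SqueezeNet and inspects the spatial size of its intermediate feature images. The layer names ('fire5-concat', 'fire9-concat') and the example file name are assumptions for illustration, following the standard SqueezeNet release rather than settings stated above.

```matlab
% Minimal sketch (Deep Learning Toolbox): check which SqueezeNet layers provide
% the high- (28 x 28) and low-resolution (14 x 14) feature images used by the
% YOLO detection layers. Layer names are assumptions based on the standard
% SqueezeNet layout, not values given in the paper.
net = squeezenet;                            % pretrained 18-layer SqueezeNet
inputSize = net.Layers(1).InputSize;         % [227 227 3]

I = imresize(imread('crack_example.jpg'), inputSize(1:2));  % hypothetical defect image

featHR = activations(net, I, 'fire5-concat');   % expected ~28 x 28 x C feature images
featLR = activations(net, I, 'fire9-concat');   % expected ~14 x 14 x C feature images
fprintf('High-res: %s  Low-res: %s\n', mat2str(size(featHR)), mat2str(size(featLR)));
```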

3. YOLOv2

The YOLOv2 uses SqueezeNet as the feature extractor and selects the feature images of a certain layer as the input of the subnet (Subnet A or Subnet B in Figure 3). For the YOLOv2-HR (YOLOv2 with high-resolution feature images), feature images of 28 × 28 pixels are used as the input of the detection layer (Subnet A). The YOLOv2 network then returns the classification and location of the object.
The subnet of the YOLOv2 uses anchor boxes to detect the location and classification of the defined objects. Each anchor box extracts a subimage from the original image, and each subimage yields a large number of feature images after the convolution layers. These feature images are assigned a category by the classification layer and a location by the regression layer (the detection process of each anchor box is independent). Anchor boxes improve the performance of an object detector [30], and their sizes were selected here with MATLAB's automatic estimation function ('estimateAnchorBoxes', https://ww2.mathworks.cn/help/vision/ref/estimateanchorboxes.html, accessed on 10 August 2021). During detection (Figure 4), the feature images obtained by the feature extractor are used as the input of the subnet, the anchor boxes are labeled onto the image, and the mapping between the anchor boxes and the defined image labels is learned by a neural network (the detection layer in Figure 4) composed of two convolution layers. The location error is gradually reduced by minimizing the loss function. As shown in Figure 4, the coordinates of the predefined anchor box are (x1, y1), and those of the refined anchor box are (x2, y2). During training, the precise location of the anchor box is obtained by minimizing the squared error loss (SEL, Equation (1)). Finally, tiled anchor boxes representing the background are removed and the anchor boxes containing objects are retained.
SEL = (x_2 − x_1)^2 + (y_2 − y_1)^2        (1)
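A minimal MATLAB sketch of this YOLOv2 construction (anchor box estimation, detection subnet on a chosen SqueezeNet feature layer and training) is given below. The training table 'defectTable' (built in Section 6), the number of anchor boxes, the feature layer name and the training options are illustrative assumptions rather than the exact settings used in this study.

```matlab
% Minimal sketch (Computer Vision Toolbox): build and train YOLOv2-HR on top of
% SqueezeNet. 'defectTable' is assumed to be the labeled training table
% (image file, CR boxes, ER boxes) described in Section 6.
blds   = boxLabelDatastore(defectTable(:, 2:end));
aboxes = estimateAnchorBoxes(blds, 4);        % cluster 4 anchor boxes from the labels

inputSize = [227 227 3];
classes   = {'CR', 'ER'};
lgraph = yolov2Layers(inputSize, numel(classes), aboxes, squeezenet, 'fire5-concat');

opts = trainingOptions('sgdm', 'InitialLearnRate', 1e-3, 'MaxEpochs', 50, ...
    'MiniBatchSize', 8, 'Shuffle', 'every-epoch');
detectorHR = trainYOLOv2ObjectDetector(defectTable, lgraph, opts);   % YOLOv2-HR
```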

4. YOLOv3

The YOLOv3 is also a one-stage CNN built on a feature extraction network, but unlike the YOLOv2 [39], it obtains feature images of different resolutions through its head subnet (Figure 5). In this way, multiscale (multiresolution) features can be extracted from the input image, removing the limitation that a single resolution places on the detection results. In this study, the YOLOv3 constructed a two-level library of feature images with 28 × 28 and 14 × 14 pixels. Each resolution was used to detect targets of a different scale: the low-resolution feature images were used to detect large objects (obvious objects in highly visible images), while the high-resolution feature images were used to detect small objects (hard-to-observe objects in blurred images). After combining the high- and low-resolution feature images, the YOLOv3 predicted the locations and classifications of the objects.
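To make the two-scale design concrete, the sketch below clusters anchor boxes from the training labels and splits them between the two detection heads. The number of anchors, the split and the SqueezeNet source-layer names are illustrative assumptions rather than settings reported in this study.

```matlab
% Minimal sketch: group anchor boxes for the two YOLOv3 heads. 'blds' is the
% boxLabelDatastore built from the training labels in Section 6.
[aboxes, meanIoU] = estimateAnchorBoxes(blds, 6);
[~, order] = sort(prod(aboxes, 2), 'descend');
aboxes = aboxes(order, :);

% Larger anchors serve the 14 x 14 (low-resolution) head for large objects,
% smaller anchors serve the 28 x 28 (high-resolution) head for small objects.
anchorsPerHead = {aboxes(1:3, :); aboxes(4:6, :)};

% With the Computer Vision Toolbox (R2021a or later), a two-head YOLO v3 detector
% can then be assembled on SqueezeNet (source-layer names are assumptions) and
% trained with a custom training loop.
detector = yolov3ObjectDetector(squeezenet, {'CR', 'ER'}, anchorsPerHead, ...
    'DetectionNetworkSource', {'fire9-concat', 'fire5-concat'});
```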

5. Improved YOLOv3

The YOLO series of networks consists of a feature extractor (CNN) and a detection subnet. The quality of feature extraction will determine the detection performance of the subnet as the input of the subnet is the feature image obtained by the feature extractor. Therefore, this paper introduces a higher-performance feature extractor through transfer learning (retraining SqueezeNet with defect images) and applies the feature extractor to the YOLOv3. On the other hand, data augmentation is a way to improve the performance of the YOLOv3. Therefore, this paper improves the performance of the YOLOv3 by the following methods:
Transfer learning is the reuse of a pretrained model on a new problem: what has been learned in one task is generalized to another. Here, the weights that a network (SqueezeNet) has learned on 'task A' are transferred to a new 'task B'. The classes originally detected by SqueezeNet (e.g., drake, brain coral, slug, beacon) are not similar to the classes of the new task (bridge surface defects, e.g., cracks), which limits the performance of feature extraction in bridge defect detection. Therefore, this paper uses defect images (from bridges) for transfer learning (retraining the SqueezeNet network) to obtain a defect classifier (Figure 6) and then uses this classifier as the feature extractor of the YOLOv3. In total, 1000 images (500 crack and 500 exposed rebar images) were used as the transfer learning dataset, of which 90% were used for training and 10% for validating (evaluating) the effectiveness of the classifier.
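A minimal MATLAB sketch of this transfer learning step is given below. The dataset folder name, the training options and the new layer names are assumptions, while the replaced layers ('conv10' and 'ClassificationLayer_predictions') follow the standard SqueezeNet release.

```matlab
% Minimal sketch (Deep Learning Toolbox): retrain SqueezeNet on the 1000 defect
% images (90%/10% split) to obtain the defect classifier used as the YOLOv3-TL
% feature extractor. Assumed folder layout: 'defect_dataset/CR' and 'defect_dataset/ER'.
imds = imageDatastore('defect_dataset', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.9, 'randomized');

numClass = numel(categories(imdsTrain.Labels));       % 2 classes: CR, ER
lgraph   = layerGraph(squeezenet);

% SqueezeNet classifies via a final 1 x 1 convolution ('conv10'); replace it and
% the output layer so the network predicts the two defect classes.
newConv = convolution2dLayer(1, numClass, 'Name', 'conv10_defect', ...
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
lgraph = replaceLayer(lgraph, 'conv10', newConv);
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'defect_output'));

opts = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, 'MaxEpochs', 20, ...
    'MiniBatchSize', 16, 'ValidationData', imdsVal, 'Shuffle', 'every-epoch');
defectNet = trainNetwork(imdsTrain, lgraph, opts);    % feature extractor for YOLOv3-TL
```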
Data augmentation was used to improve the training effect by randomly transforming the raw data during training. The following augmentations were applied to the training samples: (1) color jitter (Figure 7b); (2) random horizontal flip (Figure 7c); and (3) random scaling by 10 percent (Figure 7d). The augmentation was applied during training, so each epoch saw randomly transformed samples, which improved the robustness of the network.
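The following MATLAB sketch shows one way to implement the three augmentations as an on-the-fly transform of the training datastore; the jitter ranges and the overlap threshold are assumptions, not the exact settings used in this study.

```matlab
% Minimal sketch (Image Processing / Computer Vision Toolboxes): apply the three
% augmentations to one training sample {image, boxes, labels}.
function data = augmentDefectSample(data)
    I     = data{1};
    boxes = data{2};

    % (1) Color jitter.
    I = jitterColorHSV(I, 'Brightness', 0.2, 'Contrast', 0.2, 'Saturation', 0.2);

    % (2) Random horizontal flip and (3) random scaling by up to 10 percent,
    % applied consistently to the image and its bounding boxes.
    tform = randomAffine2d('XReflection', true, 'Scale', [0.9 1.1]);
    rout  = affineOutputView(size(I), tform, 'BoundsStyle', 'CenterOutput');
    data{1} = imwarp(I, tform, 'OutputView', rout);
    [data{2}, kept] = bboxwarp(boxes, tform, rout, 'OverlapThreshold', 0.25);
    data{3} = data{3}(kept);          % keep labels of the surviving boxes
end

% Usage: augmentedDs = transform(trainingDs, @augmentDefectSample);
% so that every epoch sees newly randomized variants of the samples.
```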
The experimental design is as follows:
  • The transfer learning network (SqueezeNet, retrained by bridge defect images) was used as the feature extractor of the YOLOv3, namely, improved YOLOv3-TL.
  • Data augmentation technology was used in the training process of the YOLOv3, namely, improved YOLOv3-DA.
  • The above two technologies were combined to improve the YOLOv3, namely, improved YOLOv3-TL-DA.

6. Experimental Setup and Performance Evaluation

The image dataset for the YOLO detectors consisted of 1660 RGB images of concrete bridges in China (along the Erenhot-Guangzhou and Guangzhou-Kunming Expressways; Figure 8a shows an example of a bridge), containing crack images (CR) and exposed rebar images (ER). Among them, 90% of the images were used for training and 10% for testing (evaluating) the performance of the YOLO, and all images were uniformly trimmed to 227 × 227 pixels. The 'Image Labeler' toolbox in MATLAB was used to label the defect objects (the classifications and anchor boxes of the defects). Figure 8b shows two examples of the labeled images.
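A minimal MATLAB sketch of turning the Image Labeler output into the 90%/10% training and testing split is shown below; the MAT-file name and the variable names are assumptions for illustration.

```matlab
% Minimal sketch (Computer Vision Toolbox): convert the Image Labeler ground
% truth into a training table and split it 90%/10%.
load('bridgeDefectLabels.mat', 'gTruth');            % groundTruth object from the Image Labeler
defectTable = objectDetectorTrainingData(gTruth);    % columns: image file, CR boxes, ER boxes

rng(0);                                              % reproducible split
n        = height(defectTable);
shuffled = randperm(n);
nTrain   = round(0.9 * n);
trainTbl = defectTable(shuffled(1:nTrain), :);       % 90% for training
testTbl  = defectTable(shuffled(nTrain+1:end), :);   % 10% for testing
```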
In this study, the detection of bridge surface defects was designed as the following three stages:
Stage 1: The performance of high- and low-resolution feature images on the YOLOv2 was compared. The feature images of the two layers (shown in Figure 3) were used as the inputs of the subnets (detection layers) of the YOLOv2-HR and YOLOv2-LR, respectively. After the network training was completed, the performance of the two networks was evaluated, respectively.
Stage 2: The effect of the YOLOv3 was evaluated by combining the feature images of high and low resolution (two layers (Figure 5)) as the input of the subnet (the detection layer) and compared with those of single resolution (Stage 1).
Stage 3: Transfer learning was used to improve the performance of the feature extractor which was further improved by introducing data augmentation technology in the training process. The testing data were input into the detector and the evaluation results were compared with those of Stage 1 and Stage 2.
In the real world, the images will be blurred by environmental factors (such as lighting conditions or weather), resulting in poor image quality. Therefore, this study used white Gaussian noise (WGN) to blur the testing samples (by using the ‘wgn’ function of MATLAB) and compared the detection effect under the influence of different-intensity noise. The intensity was set at 1~50 dB. Figure 9 is an example blurred by WGN; the image becomes more blurred as the intensity of the noise increases.
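A minimal sketch of this blurring step with MATLAB's 'wgn' function (Communications Toolbox) is shown below; how the noise is cast, replicated across the colour channels and clipped is an assumption about the implementation.

```matlab
% Minimal sketch: blur a test image with white Gaussian noise of a given power.
I       = imread('crack_example.jpg');     % hypothetical CR test image
powerdB = 30;                              % noise intensity, varied from 1 to 50 dB

noise  = wgn(size(I, 1), size(I, 2), powerdB);          % white Gaussian noise field (dBW)
Inoisy = double(I) + repmat(noise, [1 1 size(I, 3)]);   % add the same field to each channel
Inoisy = uint8(min(max(Inoisy, 0), 255));               % clip back to the valid pixel range
```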
Detection speed and precision are important indexes for evaluating the performance of the model. The detection speed is measured in frames per second (FPS), i.e., the number of images processed per second. Detection precision includes precision, recall and average precision (AP), which are explained below:
Precision represents the classification effect of the classifier, which is the ratio of true positive instances to the total positive instances (Equation (2)). Recall is the ratio of true positive instances to the sum of true positives and false negatives in the detector (Equation (3)).
P = TP / (TP + FP)        (2)
R = TP / (TP + FN)        (3)
where TP (true positive instance): positive instances that are predicted to be positive instances; FN (false negative instance): positive instances that are predicted to be negative instances; and FP (false positive instance): negative instances that are predicted to be positive instances.
The AP is used to reflect the detection precision, which is a combined outcome of precision and recall metrics.
AP = Σ_{k=1}^{N} P(k) ΔR(k)        (4)
where k = 1, 2, …, N, N is the number of samples, P(k) is the precision at the k-th sample and ΔR(k) is the difference in recall between the k-th and (k − 1)-th samples. Furthermore, the mAP is the average over all classes:
mAP = (1/M) Σ_{i=1}^{M} AP_i        (5)
where i = 1, 2, …, M and M is the number of classes.
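In MATLAB, these metrics can be obtained directly from the detector output, as in the sketch below; 'detector' and 'testTbl' refer to the trained detector and test table assumed in the earlier sketches.

```matlab
% Minimal sketch (Computer Vision Toolbox): compute AP, precision and recall
% (Equations (2)-(4)) on the test set and average them into the mAP (Equation (5)).
testImds = imageDatastore(testTbl{:, 1});
results  = detect(detector, testImds, 'MiniBatchSize', 8);
[ap, recall, precision] = evaluateDetectionPrecision(results, testTbl(:, 2:end));

mAP = mean(ap);                                    % average over the CR and ER classes
fprintf('AP (CR, ER) = %.2f, %.2f | mAP = %.2f\n', ap(1), ap(2), mAP);
```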

7. Results and Discussion

7.1. Detection Results of the YOLOv2 Using High- and Low-Resolution Feature Images

To obtain the testing results of the YOLOv2 for bridge surface defects, the YOLO training data of Section 6 were input into the YOLOv2 model, and the testing data were used to evaluate the detection effect. Figure 10a,b show the AP of each class (CR and ER) on the testing data, and Figure 11 shows some of the detection results. (The anchor boxes are located in the corresponding areas of the image; when the shape of a defect is complex, the anchor boxes overlap to some extent, which confirms that the detection result of each anchor box in the subnet is independent.) Figure 10a,b are the detection results of the YOLOv2 using high-resolution feature images (YOLOv2-HR) and low-resolution feature images (YOLOv2-LR) as the inputs of the detection layers, respectively. The AP values increased with the number of epochs and finally reached their optimal values. The mAP values of the YOLOv2-HR and YOLOv2-LR were 0.83 and 0.86, respectively. For the YOLOv2-HR, the AP values of CR and ER were 0.78 and 0.87, respectively; for the YOLOv2-LR, they were 0.85 and 0.87. Therefore, using low-resolution feature images helps to improve the mAP. Interestingly, crack (CR) detection in particular improved greatly when low-resolution feature images were used.
Figure 12a (YOLOv2-HR) and Figure 12b (YOLOv2-LR) show the variation in the AP values of the YOLOv2 under the influence of 1–50 dB noise. The AP values gradually decreased with increasing noise intensity and finally approached 0. Interestingly, the CR curve descended faster than the ER curve, so it can be inferred that CR detection is more susceptible to noise. The comparison of the YOLOv2-HR and YOLOv2-LR results (Figure 12c) indicates that the YOLOv2-HR had the higher anti-noise ability, as the mAP of the YOLOv2-LR decreased faster than that of the YOLOv2-HR.

7.2. Detection Results of the YOLOv3

Firstly, all the YOLO training data were input into the original YOLOv3, and the testing data were input into the detector after training was finished. The detection results are shown in Table 1. Compared with the YOLOv2, the AP of the YOLOv3 was improved; the AP of the original YOLOv3 was 2–6% higher than that of the YOLOv2. This demonstrates that the detection performance of the YOLO can be improved by combining low-resolution and high-resolution feature images.
To obtain a high-performance feature extractor, firstly, the training data (for transfer learning) described in Section 6 were used to train the transfer learning model (SqueezeNet). The testing data were used to evaluate the classification effect of the feature extractor network. Figure 13 shows the accuracy and loss in the process of network training. The validation results were encouraging (with 100% accuracy). In order to further improve the YOLOv3, the transfer learning model trained with the bridge surface defect database was used as the feature extractor of the YOLOv3-TL.
Then, all the YOLO training data were applied to the improved YOLOv3 models (improved YOLOv3-TL, improved YOLOv3-DA and improved YOLOv3-TL-DA). The testing results are shown in Table 2. The mAP values of the improved YOLOv3-TL, improved YOLOv3-DA and improved YOLOv3-TL-DA were 0.9, 0.9 and 0.91, respectively. Compared with using a single approach, combining the two approaches brought only a small further improvement (the mAP increased by 1%). Nevertheless, compared with the original YOLOv3 the mAP improved by 3%, and compared with the YOLOv2 it improved by 6–10%.
The testing data with noise were input into the YOLOv3 models (the original YOLOv3, improved YOLOv3-TL, improved YOLOv3-DA and improved YOLOv3-TL-DA), and the testing results are shown in Figure 14. Figure 14a,b show the AP values of CR and ER under different noise intensities, and Figure 14c shows the mAP of the YOLOv3. The results illustrate that the original YOLOv3 has the same anti-noise ability as the improved YOLOv3, and that the YOLOv3 has a higher anti-noise ability than the YOLOv2: in Figure 15, the mAP of the YOLOv3 starts to drop at a higher noise intensity than that of the YOLOv2.

8. Comparisons of Detection Precision and Speed

For detection precision, Table 3 compares the YOLOv2 and YOLOv3. The results show that:
(1) The original YOLOv3 and the YOLOv2 have similar AP values, but after improvement the AP of the YOLOv3 is greatly increased.
(2) Through the two strategies of transfer learning and data augmentation, the YOLOv3 achieves a relatively ideal AP value.
For detection speed, a faster detector opens the possibility of real-time applications. Table 4 shows the detection speeds of the YOLOv2 and YOLOv3. The results show that:
(1) The YOLOv2 is faster than the YOLOv3, which illustrates that combining high- and low-resolution feature images increases the detection time.
(2) Moreover, the detection speed of the YOLOv2-HR is higher than that of the YOLOv2-LR. Therefore, the choice of feature extraction layer affects the detection speed, and selecting a deeper feature extraction layer increases the detection time of the YOLO.
Interestingly, the original YOLOv3 has the same detection speed as the improved YOLOv3, which shows that the improvements proposed in this paper do not increase the computing time of the network. For a fair comparison, a comparative experiment was performed with the Faster RCNN; the two improvement approaches (transfer learning and data augmentation) were also applied to the Faster RCNN. The results showed that the mAP value of the Faster RCNN was 0.9 and its speed was 0.23 FPS. Therefore, the detection speed of the YOLOv2 is 145 times that of the Faster RCNN, and that of the YOLOv3 is 103 times that of the Faster RCNN, which is consistent with the results reported in the literature [40].
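The FPS figures in Table 4 correspond to the average single-image inference time; a minimal sketch of such a measurement is given below, with the warm-up run and timing loop being assumptions about the procedure rather than the exact script used.

```matlab
% Minimal sketch: estimate the detection speed (FPS) of a trained detector.
files = testTbl{:, 1};
detect(detector, imread(files{1}));        % warm-up run (e.g., GPU initialization)

t = tic;
for k = 1:numel(files)
    detect(detector, imread(files{k}));
end
fps = numel(files) / toc(t);
fprintf('Average detection speed: %.1f FPS\n', fps);
```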

9. Conclusions

In this study, we investigated the detection performance (speed and precision) of the YOLOv2 and YOLOv3 and showed that the precision of the YOLO can be improved by combining high-resolution and low-resolution feature images. The improved YOLOv3 proposed in this paper achieved excellent detection precision without any loss of detection speed, and its anti-noise ability was also improved. Compared with the Faster RCNN, the improved YOLOv3 is more accurate and faster, which makes real-time detection of bridge defects possible. Based on the above results, this paper draws the following conclusions:
  • By using both high- and low-resolution feature images, the YOLOv3 (AP value exceeding 0.9) has better detection performance than the YOLOv2 (AP value about 0.83–0.86, using single resolution).
  • The precision ranking from high to low is the improved YOLOv3 (AP = 0.91), original YOLOv3 (AP = 0.88) and YOLOv2 (AP = 0.83–0.86).
  • Under the influence of noise, the YOLOv3 has better anti-noise ability than the YOLOv2.
  • Compared with the original YOLOv3, the improved YOLOv3 is more accurate without the compromise of the detection speed (FPS = 23.8).
  • The improved YOLOv3 (FPS = 23.8) is 103 times faster than the Faster RCNN (FPS = 0.23) and the precision (improved YOLOv3 = 0.91; Faster RCNN = 0.9) is comparable.
In consideration of the detection precision and speed, the proposed improved YOLOv3 has important application significance for fast and real-time bridge defect detection.
There are still some limitations to this work. This paper only detects defects and does not quantify them further; the impact of these defects on the bridge structure also needs to be evaluated. Therefore, a future research direction could aim at automatic condition evaluation based on the detected defects, so as to accurately obtain the real state of the bridge and support decision making for maintenance personnel. Furthermore, because exposed rebar in new and old concrete looks different and its influence on structural damage also differs, distinguishing exposed rebar in new and old concrete is another important research direction of great significance for the evaluation of bridge conditions.

Author Contributions

Conceptualization, S.T. and Z.L.; methodology, Z.L. and S.T.; software, X.L. and S.T.; validation, X.L. and S.T.; formal analysis, Z.L. and S.T.; investigation, Z.L. and X.L.; resources, S.T.; data curation, Z.L.; writing—original draft preparation, S.T.; writing—review and editing, Z.L.; visualization, X.L.; supervision, Z.L.; project administration, X.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, Q.; Shi, W.; Chen, J.; Lin, W. Deep convolution neural network-based transfer learning method for civil infrastructure crack detection. Autom. Constr. 2020, 116, 103199. [Google Scholar] [CrossRef]
  2. Khan, N.; Saleem, M.R.; Lee, D.; Park, M.-W.; Park, C. Utilizing safety rule correlation for mobile scaffolds monitoring leveraging deep convolution neural networks. Comput. Ind. 2021, 129, 103448. [Google Scholar] [CrossRef]
  3. Park, C. A Deep Learning-based detection of Fall Portents for Lone Construction Worker. In Proceedings of the 38th International Symposium on Automation and Robotics in Construction (ISARC), Dubai, United Arab Emirates, 2–4 November 2021; pp. 419–426. [Google Scholar]
  4. Jahangir, H.; Khatibinia, M.; Kavousi, M. Application of Contourlet Transform in Damage Localization and Severity Assessment of Prestressed Concrete Slabs. J. Soft Comput. Civ. Eng. 2021, 5, 39–67. [Google Scholar] [CrossRef]
  5. Zhang, C.; Chang, C.; Jamshidi, M. Concrete bridge surface damage detection using a single-stage detector. Comput. Civ. Infrastruct. Eng. 2019, 35, 389–409. [Google Scholar] [CrossRef]
  6. Shen, J.; Xiong, X.; Li, Y.; He, W.; Li, P.; Zheng, X. Detecting safety helmet wearing on construction sites with bounding-box regression and deep transfer learning. Comput. Civ. Infrastruct. Eng. 2020, 36, 180–196. [Google Scholar] [CrossRef]
  7. Cha, Y.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar] [CrossRef]
  8. Gao, W.; Kong, Q.; Lu, W.; Lu, X. High spatial resolution imaging for damage detection in concrete based on multiple wavelet decomposition. Constr. Build. Mater. 2021, 319, 126057. [Google Scholar] [CrossRef]
  9. Shi, B.; Cao, M.; Wang, Z.; Ostachowicz, W. A directional continuous wavelet transform of mode shape for line-type damage detection in plate-type structures. Mech. Syst. Signal Process. 2021, 167, 108510. [Google Scholar] [CrossRef]
  10. Wang, Y.; Zhang, J.Y.; Liu, J.X.; Zhang, Y.; Chen, Z.P.; Li, C.G.; He, K.; Yan, R.B. Research on Crack Detection Algorithm of the Concrete Bridge Based on Image Processing. Procedia Comput. Sci. 2019, 154, 610–616. [Google Scholar] [CrossRef]
  11. Nishikawa, T.; Yoshida, J.; Sugiyama, T.; Fujino, Y. Concrete Crack Detection by Multiple Sequential Image Filtering. Comput. Civ. Infrastruct. Eng. 2011, 27, 29–47. [Google Scholar] [CrossRef]
  12. Dawood, T.; Zhu, Z.; Zayed, T. Machine vision-based model for spalling detection and quantification in subway networks. Autom. Constr. 2017, 81, 149–160. [Google Scholar] [CrossRef]
  13. O’Byrne, M.; Schoefs, F.; Ghosh, B.; Pakrashi, V. Texture Analysis Based Damage Detection of Ageing Infrastructural Elements. Comput. Civ. Infrastruct. Eng. 2013, 28, 162–177. [Google Scholar] [CrossRef]
  14. Jain, A.; Mao, J.; Mohiuddin, K. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [Google Scholar] [CrossRef]
  15. Xu, H.; Humar, J. Damage Detection in a Girder Bridge by Artificial Neural Network Technique. Comput. Civ. Infrastruct. Eng. 2006, 21, 450–464. [Google Scholar] [CrossRef]
  16. Teng, S.; Chen, G.; Wang, S.; Zhang, J.; Sun, X. Digital image correlation-based structural state detection through deep learning. Front. Struct. Civ. Eng. 2022, 16, 45–56. [Google Scholar] [CrossRef]
  17. Hao, X.; Zhang, G.; Ma, S. Deep Learning. Int. J. Semant. Comput. 2016, 10, 417–439. [Google Scholar] [CrossRef]
  18. Li, Y.; Wei, H.; Han, Z.; Huang, J.; Wang, W. Deep Learning-Based Safety Helmet Detection in Engineering Management Based on Convolutional Neural Networks. Adv. Civ. Eng. 2020, 2020, 9703560. [Google Scholar] [CrossRef]
  19. Teng, S.; Chen, G.; Gong, P.; Liu, G.; Cui, F. Structural damage detection using convolutional neural networks combining strain energy and dynamic response. Meccanica 2019, 55, 945–959. [Google Scholar] [CrossRef]
  20. Teng, S.; Chen, G.; Liu, G.; Lv, J.; Cui, F. Modal Strain Energy-Based Structural Damage Detection Using Convolutional Neural Networks. Appl. Sci. 2019, 9, 3376. [Google Scholar] [CrossRef]
  21. Lin, Y.-Z.; Nie, Z.-H.; Ma, H.-W. Structural Damage Detection with Automatic Feature-Extraction through Deep Learning. Comput. Civ. Infrastruct. Eng. 2017, 32, 1025–1046. [Google Scholar] [CrossRef]
  22. Gao, Y.; Mosalam, K.M. Deep Transfer Learning for Image-Based Structural Damage Recognition. Comput. Civ. Infrastruct. Eng. 2018, 33, 748–768. [Google Scholar] [CrossRef]
  23. Kumar, S.S.; Abraham, D.M.; Jahanshahi, M.R.; Iseley, T.; Starr, J. Automated defect classification in sewer closed circuit television inspections using deep convolutional neural networks. Autom. Constr. 2018, 91, 273–283. [Google Scholar] [CrossRef]
  24. Rafiei, M.H.; Adeli, H. A novel machine learning-based algorithm to detect damage in high-rise building structures. Struct. Des. Tall Spéc. Build. 2017, 26, e1400. [Google Scholar] [CrossRef]
  25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  26. Zhang, A.; Wang, K.C.P.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 805–819. [Google Scholar] [CrossRef]
  27. Kaur, T.; Gandhi, T.K. Automated Brain Image Classification Based on VGG-16 and Transfer Learning. In Proceedings of the 2019 International Conference on Information Technology (ICIT), Bhubaneswar, India, 19–21 December 2019. [Google Scholar]
  28. Liang, Y.; Bing, L.; Wei, L.; Liu, Z.; Xiao, J. Deep Concrete Inspection Using Unmanned Aerial Vehicle Towards CSSC Database. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada, 24–28 September 2017; pp. 24–28. [Google Scholar]
  29. Zhang, C.; Chang, C.-C.; Jamshidi, M. Simultaneous pixel-level concrete defect detection and grouping using a fully convolutional model. Struct. Health Monit. 2021, 20, 2199–2215. [Google Scholar] [CrossRef]
  30. Yeum, C.M.; Dyke, S.; Ramirez, J. Visual data classification in post-event building reconnaissance. Eng. Struct. 2018, 155, 16–24. [Google Scholar] [CrossRef]
  31. Xu, Y.; Wei, S.; Bao, Y.; Li, H. Automatic seismic damage identification of reinforced concrete columns from images by a region-based deep convolutional neural network. Struct. Control Health Monit. 2019, 26, e2313. [Google Scholar] [CrossRef]
  32. Liang, X. Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Comput. Civ. Infrastruct. Eng. 2018, 34, 415–430. [Google Scholar] [CrossRef]
  33. Hou, F.; Lei, W.; Li, S.; Xi, J.; Xu, M.; Luo, J. Improved Mask R-CNN with distance guided intersection over union for GPR signature detection and segmentation. Autom. Constr. 2020, 121, 103414. [Google Scholar] [CrossRef]
  34. Hiroya, M.; Yoshihide, S.; Toshikazu, S.; Takehiro, K.; Hiroshi, O. Road Damage Detection and Classification Using Deep Neural Networks with Smartphone Images. Comput. Aided Civ. Infrastruct. Eng. 2018, 33, 1127–1141. [Google Scholar]
  35. Kumar, S.S.; Abraham, D.M. A Deep Learning Based Automated Structural Defect Detection System for Sewer Pipelines. Comput. Civ. Eng. 2019, 34, 226–233. [Google Scholar] [CrossRef]
  36. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  37. Liu, J.; Yang, X.; Lau, S.; Wang, X.; Luo, S.; Lee, V.C.; Ding, L. Automated pavement crack detection and segmentation based on two-step convolutional neural network. Comput. Civ. Infrastruct. Eng. 2020, 35, 1291–1305. [Google Scholar] [CrossRef]
  38. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  39. Teng, S.; Liu, Z.; Chen, G.; Cheng, L. Concrete Crack Detection Based on Well-Known Feature Extractor Model and the YOLO_v2 Network. Appl. Sci. 2021, 11, 813. [Google Scholar] [CrossRef]
  40. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
Figure 1. The YOLO architecture.
Figure 2. The SqueezeNet architecture.
Figure 3. The YOLOv2 architecture.
Figure 4. The detection process of the subnet.
Figure 5. The YOLOv3 architecture.
Figure 6. Transfer learning for the feature extractor of the YOLOv3.
Figure 7. Data augmentation. (a) Original image; (b) color jitter augmentation; (c) random horizontal flip; (d) random scaling by 10 percent.
Figure 8. Xinqing viaduct and some labeled images using the ‘Image Labeler’ toolbox. (a) Bridge appearance; (b) image annotation.
Figure 9. The CR image blurred by WGN.
Figure 10. Detection results of the YOLOv2 with high- and low-resolution feature images: (a) using high-resolution feature images; (b) using low-resolution feature images.
Figure 11. Partial detection results of CR and ER.
Figure 12. The detection results of the YOLOv2 under the influence of noise: (a) using high-resolution feature images; (b) using low-resolution feature images; (c) the mAP of the YOLOv2-HR and YOLOv2-LR.
Figure 13. The training process of transfer learning.
Figure 14. Detection results of the YOLOv3 under the influence of noise. (a) CR; (b) ER; (c) the mAP of CR and ER.
Figure 15. Comparison of the YOLOv2 and YOLOv3 under the influence of WGN.
Table 1. Detection results of original YOLOv3 and YOLOv2.

Defects    YOLOv3    YOLOv2-HR    YOLOv2-LR
CR         0.84      0.78         0.85
ER         0.92      0.87         0.87
Average    0.88      0.83         0.86
Table 2. Detection results of the improved YOLOv3.

Defects    Improved YOLOv3-TL    Improved YOLOv3-DA    Improved YOLOv3-TL-DA
CR         0.87                  0.85                  0.88
ER         0.92                  0.95                  0.94
Average    0.9                   0.9                   0.91
Table 3. Detection precision (AP) of the YOLOv2 and YOLOv3.

Defects    YOLOv2-HR    YOLOv2-LR    Original YOLOv3    Improved YOLOv3-TL    Improved YOLOv3-DA    Improved YOLOv3-TL-DA
CR         0.78         0.85         0.84               0.87                  0.85                  0.88
ER         0.87         0.87         0.92               0.92                  0.95                  0.94
Average    0.83         0.86         0.88               0.9                   0.9                   0.91
Table 4. Detection speed (FPS) of the YOLOv2 and YOLOv3.

Models    YOLOv2-HR    YOLOv2-LR    Original YOLOv3    Improved YOLOv3-TL    Improved YOLOv3-DA    Improved YOLOv3-TL-DA
FPS       33.3         26.3         23.8               23.8                  23.8                  23.8
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
