Article

Interpretability Analysis of Convolutional Neural Networks for Crack Detection

1 School of Civil Engineering and Architecture, Wuhan Polytechnic University, Wuhan 430023, China
2 School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, China
3 China Railway 17th Bureau Group (Guangzhou) Co., Ltd., Guangzhou 510799, China
4 School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Buildings 2023, 13(12), 3095; https://doi.org/10.3390/buildings13123095
Submission received: 20 November 2023 / Revised: 10 December 2023 / Accepted: 11 December 2023 / Published: 13 December 2023
(This article belongs to the Special Issue Soft Computing for Structural Health Monitoring)

Abstract

Crack detection is an important task in bridge health monitoring, and detection methods have gradually shifted in recent years from traditional manual inspection to intelligent approaches based on convolutional neural networks (CNNs). Because the training and operation of CNNs are opaque, applying a network without evaluating the features it has learned for identifying cracks can introduce safety risks. In this study, to evaluate the recognition basis of different crack detection networks, several crack detection CNNs are trained under the same training conditions. A dataset of crack images is then used to interpret and analyze the trained networks and to reveal the features they have learned for identifying cracks. In addition, a crack identification performance criterion based on interpretability analysis is proposed. Finally, a training framework is introduced to address the issues revealed by the interpretability analysis.

1. Introduction

Crack detection methods based on CNNs are efficient and intelligent, and they offer unique advantages. A CNN extracts features from an image through successive convolution operations and then relies on neurons analyzing the resulting feature vectors to determine whether the image contains cracks. This approach not only achieves high efficiency and accuracy, but also allows images to be collected by unmanned aerial vehicles and other equipment from locations that are difficult for inspectors to reach, saving manpower and avoiding safety risks for inspectors [1,2,3].
In recent years, with the development of CNNs [4,5,6,7,8,9], many researchers have studied crack detection based on CNNs [10,11,12,13,14,15]. Dorafshan et al. [16] evaluated the performance of six edge detectors and one CNN-based method; their results showed that, on the same dataset, the accuracy and recall of the CNN method were much higher than those of the edge detectors. Silva et al. [10] developed an image classification algorithm for identifying concrete cracks based on VGG16, which largely eliminates the influence of surface brightness, roughness, and humidity on the detection results. Gopalakrishnan et al. [17] applied a pre-trained deep CNN model for transfer learning on crack images and combined it with UAV technology to detect crack damage in images of civil infrastructure. Much of this research has focused on optimizing network structures, training methods, and recognition performance, but few studies have evaluated the basis of the network's decisions to check whether the features it has learned are actually the crack features we hope it learns.
The process from image input to recognition output is opaque; in other words, we cannot directly understand the basis for the network's judgment. CNNs may go astray during training and learn incorrect features, which can lead to serious consequences in practice. It is therefore crucial to evaluate the basis on which a trained crack recognition network identifies cracks before applying it in practical engineering.
Geetha and Sim [18] proposed an efficient, pixel-level classification approach based on explainable AI. They performed image binarization as a preliminary step and employed a Fourier-based 1D CNN model to identify crack candidate regions. Subsequently, a combination of CNN explainability and the t-distributed Stochastic Neighbor Embedding (t-SNE) technique was applied to extract cracks without the need for pixel-level labeling. Cardellicchio et al. [19] investigated the interpretability of eight different deep CNNs in recognizing defects in RC bridge heritage. Using Class Activation Maps (CAMs) on a database of images containing seven types of real defects, they applied Grad-CAM and Grad-CAM++ to observe the activation zones and proposed two indexes as new metrics for inspecting bridges quickly and reliably. Kavitha et al. [20] applied transfer learning for crack detection in 40,000 images of concrete surfaces. To enhance the explainability of their models, they used the Local Interpretable Model-Agnostic Explanations (LIME) technique, which highlights the superpixels in the image that increase the likelihood of belonging to the crack class. According to their results, the InceptionV3 transfer model provides higher accuracy and precision.
This article aims to analyze the recognition basis of convolutional neural networks for identifying various types of cracks, in order to ensure that the recognition results provided by the network are accurate and reliable.

2. Interpretability of CNNs

2.1. Convolutional Neural Network Interpretation Algorithm

The neural network model approximates the relationships between variables in the dataset through nonlinear, non-monotonic, non-polynomial functions, which makes its internal operating principles highly opaque. Neural networks often obtain correct results during training for incorrect reasons, producing models that perform well in training but poorly in practice [21,22,23,24]. If the network is treated only as a black box, without evaluating the basis of its judgments, hidden dangers remain [25,26,27]: owing to its strong learning ability, the network may not learn the crack information in the image at all, but may instead mistake the material of the background concrete, shadows in the background, and similar factors for the crack information we want it to learn.
Therefore, we must evaluate the basis on which the network judges cracks and determine whether it has learned a correct criterion. Directly evaluating this basis also allows us to analyze, to a certain extent, the degree to which the network is affected by background factors such as light and water stains.

2.1.1. Grad-CAM

The Grad-CAM interpretation algorithm [28] is based on gradients and backpropagation. Starting from the output of the network, it backpropagates layer by layer to the input space, generating a feature image of the same size as the input that describes which regions of the input the network attends to and how strongly.
Convolutional neural networks extract image features through stacked convolutional layers, and as the layers deepen, the extracted features become more abstract. The feature maps output by a convolutional layer retain the positions of features in the image. The Grad-CAM algorithm considers both the hierarchical and the positional information of features and focuses on how the feature map output by the last convolutional layer affects the prediction. The procedure is shown in Figure 1: the upper half of the figure shows a CNN producing a prediction, while the lower half shows the Grad-CAM computation. From the network's class scores, the algorithm selects the class $y^c$ with the highest score and computes its partial derivative, element by element, with respect to the output of the last convolutional layer, yielding the gradient tensor of the feature map. Global average pooling over each channel of this gradient tensor gives a one-dimensional weight tensor whose length equals the number of channels in the feature map; its elements are the weights $\alpha_k^c$ of the corresponding channels. Linearly combining the channels of the feature map with these weights gives a single-channel image, which is passed through a ReLU, scaled to the size of the input image, and rendered as a three-channel heat map. The highlighted parts of the heat map are the main basis on which the network classifies the input image.
$$\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$

$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$$

where $Z$ is the number of elements in the feature map, $A^k$ is the $k$-th channel of the feature map output by the last convolutional layer, and $\alpha_k^c$ is the weight of channel $k$ for class $c$.
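To make the computation concrete, the following is a minimal Grad-CAM sketch in TensorFlow/Keras. It assumes a Keras classifier such as VGG16; the layer name "block5_conv3" (VGG16's last convolutional layer) and the function name are illustrative, not the authors' implementation.

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_name="block5_conv3"):
    """Grad-CAM heat map for the top-scoring class.

    `model` is a Keras classifier and `image` a preprocessed array of
    shape (1, H, W, 3); the layer name is an assumption (VGG16)."""
    # Map the input to (last conv feature map A, class scores y).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)
        class_idx = int(tf.argmax(preds[0]))   # class c with highest score
        y_c = preds[:, class_idx]              # score y^c
    # Gradient of y^c w.r.t. the feature map, shape (1, h, w, K).
    grads = tape.gradient(y_c, conv_out)
    # Global average pooling of the gradients -> channel weights alpha_k^c.
    alpha = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum over channels followed by ReLU (second equation above).
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * alpha, axis=-1))
    # Normalize to [0, 1]; upsampling to the input size is done afterwards.
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```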
This method offers fine-grained explanations and strong universality, but it may suffer from vanishing gradients during backpropagation, leading to interpretation failure. Gradients can vanish for various reasons; one common factor in deep architectures is the repeated multiplication of gradients during backpropagation, so that when layer weights are small the gradients can decay to values close to zero.

2.1.2. Score-CAM

Score-CAM [29] shares the same overall approach as Grad-CAM: it linearly weights and sums the channels of the feature map output by the convolutional layer to obtain the network's region of interest. The main difference lies in how the linear weights are obtained. To address Grad-CAM's vanishing-gradient problem, Score-CAM dispenses with gradients and instead uses the model's global confidence score for each feature map as the linear weight.
Score-CAM consists of two stages. The first stage extracts the feature maps output by the convolutional layer. The second stage upsamples each channel's feature map to the size of the input image and uses it as a mask over the original image; each masked image is fed back into the network, and the resulting score for the target class serves as the weight of that channel. Finally, the weight vector is multiplied with the corresponding channels of the feature map and summed, giving a single-channel map that is enlarged to the original image size to obtain a heat map reflecting the network's region of interest.
This method avoids the problems caused by gradients and focuses more tightly on the targets, effectively reducing the influence of background noise.
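As an illustrative sketch, under the same assumptions as the Grad-CAM example (a Keras classifier and an assumed last-convolutional-layer name), the two stages can be written as follows. The published Score-CAM also applies a softmax normalization to the channel scores, which is omitted here for brevity.

```python
import numpy as np
import tensorflow as tf

def score_cam(model, image, last_conv_name="block5_conv3"):
    """Score-CAM heat map: channel weights come from the class scores
    of masked inputs rather than from gradients (illustrative sketch)."""
    h, w = image.shape[1], image.shape[2]
    # Stage 1: extract the feature maps of the last convolutional layer.
    feat_model = tf.keras.Model(model.inputs,
                                model.get_layer(last_conv_name).output)
    feats = feat_model.predict(image, verbose=0)[0]          # (h', w', K)
    class_idx = int(np.argmax(model.predict(image, verbose=0)[0]))
    # Stage 2: upsample each channel, mask the input, and use the
    # class score of the masked image as that channel's weight.
    weights = []
    for k in range(feats.shape[-1]):
        mask = tf.image.resize(feats[..., k:k + 1], (h, w)).numpy()
        mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
        masked = image * mask[np.newaxis, ...]
        weights.append(model.predict(masked, verbose=0)[0, class_idx])
    # Linear combination of channels, then ReLU and normalization.
    cam = np.maximum(feats @ np.array(weights), 0)
    return cam / (cam.max() + 1e-8)
```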

3. Results

In this section, five networks were examined for their basis of crack detection: 1. the classic VGG16; 2. VGG16 with its flattening layer replaced by global average pooling (marked as VGG16_gap); 3. VGG16 with one convolution block removed (marked as VGG4); 4. VGG4 with its flattening layer replaced by global average pooling (marked as VGG4_gap); 5. VGG4_gap with a further convolution block removed (marked as VGG3).
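For illustration, the sketch below shows how such variants can be assembled in Keras. The classification-head sizes and the ImageNet initialization are assumptions for the example, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_vgg16_variant(use_gap=True, input_shape=(224, 224, 3)):
    """VGG16 backbone with either a Flatten or a GlobalAveragePooling2D
    head for binary crack / non-crack classification (illustrative)."""
    base = tf.keras.applications.VGG16(include_top=False,
                                       weights="imagenet",
                                       input_shape=input_shape)
    x = base.output
    if use_gap:
        # Global average pooling collapses each feature map to a single
        # value, greatly reducing the parameters of the head.
        x = layers.GlobalAveragePooling2D()(x)
    else:
        x = layers.Flatten()(x)
        x = layers.Dense(256, activation="relu")(x)  # assumed head size
    out = layers.Dense(2, activation="softmax")(x)   # crack / non-crack
    return tf.keras.Model(base.input, out)
```

The shallower variants (VGG4, VGG3) can be obtained analogously by truncating the backbone after fewer convolution blocks.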
The accuracy and recall of each network were calculated on a test set containing 488 crack images and 199 non-crack images; the results are shown in Table 1.

3.1. Interpretability Analysis of Convolutional Neural Networks

This section uses both Grad-CAM and Score-CAM to test and analyze the crack discrimination criteria of the five crack recognition networks trained above.
A set of 50 crack images was input into the trained networks for interpretability analysis. The aim is to elucidate the basis on which each network identifies cracks, thereby preventing erroneous features learned during training from posing risks in subsequent practical engineering applications. The Grad-CAM results for some of the images are shown in Figure 2. To facilitate comparison of crack recognition performance across networks, the results for the same crack image are used for display and analysis.
(1)
VGG16
The crack recognition network presents the crack discrimination criteria under two interpretation algorithms, as shown in Figure 3.
From the figure, it can be seen that, under both interpretation algorithms, the VGG16 crack recognition network accurately identifies the presence of cracks based on the crack information in the image.
(2)
VGG16_gap
The crack recognition network presents the crack discrimination criteria under two interpretation algorithms, as shown in Figure 4.
From the figure, it can be seen that, under both interpretation algorithms, the network in which the flattening layer of VGG16 is replaced by global average pooling can also accurately identify cracks through the crack information in the image. At the same time, compared with the original VGG16, the Grad-CAM maps show a somewhat reduced region of interest, which nevertheless still covers the cracks in the image. The reason is that global average pooling causes a certain loss of features, but this loss does not prevent the network from correctly learning and recognizing crack features. This also indicates that replacing the fully connected layers with a global average pooling layer to reduce the computational burden of the network is reasonable and effective.
(3)
VGG4
The crack recognition network presents the crack discrimination criteria under two interpretation algorithms, as shown in Figure 5.
For VGG4, gradient vanishing occurred when the same crack images used for the previous two structures were analyzed with Grad-CAM. Although the network still correctly identifies the presence of cracks, the Grad-CAM heat map cannot show the region of interest on which its judgment is based. The Score-CAM analysis shows that the network's judgment is indeed based on the crack information in the image. In addition, the network's main focus is no longer the center of the image but the entire crack, possibly because part of the VGG16 crack recognition weights were used for initialization.
(4)
VGG4_gap
The crack recognition network presents the crack discrimination criteria under two interpretation algorithms, as shown in Figure 6.
From the figure, it can be seen that both interpretation algorithms reflect that the structure can accurately identify cracks in the image, but the regions of interest reflected by Score-CAM are relatively concentrated.
(5)
VGG3
The crack recognition network presents the crack discrimination criteria under two interpretation algorithms, as shown in Figure 7.
This result shows that the network learned only non-crack features, not crack features. Although this structure performed well on both the training and validation sets, it evidently did not learn the crack characteristics we hoped a trained crack recognition network would acquire. If it were applied directly as a black box to crack identification without interpretability analysis, it would create significant safety hazards; this strongly indicates that interpretability analysis of convolutional neural networks is important and necessary in a high-safety task such as crack detection.
Regarding the problems uncovered by the interpretation algorithms during training, the background of the crack images is relatively uniform and consists mostly of concrete. This article therefore takes weakening the diversity of background information as the starting point for solving such training problems.

3.2. Optimized Training Methods

Based on the analysis in Section 3.1, VGG3 does not capture the crack features in the image well but focuses instead on the background. Therefore, to weaken the influence of the background on the network weights and strengthen the crack features in the image, we used the following method. We selected 30 crack images of 1024 × 1024 pixels with similar backgrounds and divided each into 16 smaller images. After classifying and filtering the resulting images, a dataset of 236 cracked images and 204 non-cracked images was formed. Some of the images are shown in Figure 8.
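A minimal sketch of the tiling step, assuming square images that divide evenly into a 4 × 4 grid (a 1024 × 1024 image yields 16 tiles of 256 × 256):

```python
import numpy as np

def split_into_tiles(image, grid=4):
    """Split a square (H, W, C) image into grid x grid equal tiles,
    e.g., a 1024 x 1024 image into sixteen 256 x 256 tiles."""
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(grid) for c in range(grid)]
```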
Because this dataset is small, a model trained solely on it would have low generalization ability and would be unsuitable for practical engineering. Therefore, after pre-training on the small dataset, the network model undergoes transfer learning on the crack dataset using a freeze–unfreeze training scheme to enhance its generalization ability. In this section, VGG3 is first pre-trained on the small crack dataset and then transfer-trained on the crack dataset of Section 1.1 to improve its generalization. The transfer learning comprises 60 epochs in total, of which the first 35 are frozen training and the last 25 are unfrozen training. The training data are shown in Figure 9.
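The freeze/unfreeze schedule can be sketched as follows. The epoch split (35 frozen, 25 unfrozen) follows the text; the optimizer, learning rates, and loss are illustrative assumptions.

```python
import tensorflow as tf

def freeze_then_unfreeze(model, train_ds, frozen_epochs=35, unfrozen_epochs=25):
    """Two-stage transfer learning: first train with the convolutional
    backbone frozen, then unfreeze everything and fine-tune at a lower
    learning rate (sketch; hyperparameters are assumptions)."""
    # Stage 1: freeze the convolutional layers, train only the head.
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Conv2D):
            layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=frozen_epochs)

    # Stage 2: unfreeze all layers and fine-tune more gently.
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=unfrozen_epochs)
    return model
```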
After comparison, it was found that the model of the 10th epoch had the best overall ability to capture cracks and identify them accurately.
The region of interest obtained by the improved training of VGG3, analyzed on the same crack image, is shown in Figure 10. With the improved training method, the network can roughly capture the edge features of cracks and identify cracks in the image from the extracted features. Although some attention weight is still paid to the background, the detection accuracy of this structure on crack images is 98.8%, with a missed detection rate of only 1.2%. This is in line with the practical engineering goal of detecting as many cracks as possible and minimizing missed inspections.
After the improved training method of learning crack features from the small dataset and enhancing generalization with the large dataset, the two interpretation algorithms were used to retest the networks that previously exhibited problems. The results show that the network now accurately learns crack features and identifies cracks in the image on that basis, indicating that the training method is effective.

3.3. Evaluation Indicator for Crack Recognition Networks Based on Grad-CAM

Based on the preceding Grad-CAM analysis, and considering the uneven quality of crack images captured in practical engineering and the importance of evaluating the basis on which crack recognition networks identify cracks, this paper proposes an evaluation index for crack recognition networks based on the Grad-CAM interpretability algorithm. The index is the minimum ratio of crack information to total image information at which the network can still correctly identify a crack image from the crack information itself; in other words, it measures how small a crack in the image the network can recognize.
For the purpose of comparison, this article has designed the following calculation example to calculate this indicator:
The crack regions are progressively thinned using erosion algorithms, and the heavily eroded areas are filled with random pixel values sampled from the background. As shown in Figure 11, the name of each image is the proportion of crack information relative to all information in the image.
This dataset is input into the network, the image with the smallest crack proportion that the network can still identify is selected, and Grad-CAM analysis is performed on that image to check whether the network correctly locates the cracks. If the network recognizes the crack image based on the crack information in it, the proportion of crack information in that image is taken as the network's minimum recognition proportion. The lower this value, the finer the cracks the network can recognize, and the better its crack recognition performance.
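The construction of the indicator images can be sketched with OpenCV as below. The kernel size, the background-sampling strategy, and the availability of a binary crack mask are illustrative assumptions.

```python
import cv2
import numpy as np

def thin_crack_image(image, crack_mask, iterations):
    """Erode the crack mask and replace the eroded-away crack pixels
    with random background pixel values; return the edited image and
    the remaining proportion of crack information (sketch).

    `crack_mask` is a binary uint8 mask (255 = crack) aligned with the
    (H, W, 3) `image`; the 3 x 3 kernel is an assumption."""
    kernel = np.ones((3, 3), np.uint8)
    thinned = cv2.erode(crack_mask, kernel, iterations=iterations)
    removed = (crack_mask > 0) & (thinned == 0)    # pixels eroded away
    bg_pixels = image[crack_mask == 0]             # background value pool
    out = image.copy()
    idx = np.random.randint(0, len(bg_pixels), size=int(removed.sum()))
    out[removed] = bg_pixels[idx]
    proportion = float((thinned > 0).mean())       # crack share of the image
    return out, proportion
```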
(1)
VGG16
The network identified all 23 images in the indicator calculation dataset; the lowest proportion of crack information among them was 0.32% of the total image information. The Grad-CAM analysis in Figure 12 shows that the network accurately identifies the crack information remaining after image processing.
(2)
VGG16_gap
The network identified all 23 images in the indicator calculation dataset; the lowest proportion of crack information among them was 0.32% of the total image information. The Grad-CAM analysis in Figure 13 shows that the network accurately identifies the crack information remaining after image processing.
(3)
VGG4
The network identified all 23 images in the indicator calculation dataset; the lowest proportion of crack information among them was 0.32% of the total image information. The Grad-CAM analysis in Figure 14 shows that the network accurately identifies the crack information remaining after image processing.
(4)
VGG4_gap
The network identified all 23 images in the indicator calculation dataset; the lowest proportion of crack information among them was 0.32% of the total image information. The Grad-CAM analysis in Figure 15 shows that the network accurately identifies the crack information remaining after image processing.
(5)
VGG3
The network failed to identify all 23 images in the indicator calculation dataset; the lowest proportion of crack information it could recognize was 0.42% of the total image information. The Grad-CAM analysis is shown in Figure 16. For the image whose crack information accounts for 0.32% of the total, the network focuses mainly on the background of the image and, on that basis, judges that the image contains no cracks. Once the crack information is increased appropriately, the network can locate the small cracks in the image and correctly determine that cracks are present. VGG3's ability to identify small cracks is therefore slightly inferior to that of the aforementioned networks.

4. Conclusions

This article addresses the problem of convolutional neural networks learning irrelevant features in crack recognition. Two different CNN interpretation algorithms are used to evaluate the basis on which networks recognize cracks; several problems in crack recognition networks are identified and a solution is proposed. The main conclusions are as follows:
  • Some crack recognition networks learn background features as crack features. Such networks cannot actually identify cracks, and applying them directly in engineering may cause missed identifications, creating safety risks. It is therefore necessary to evaluate the basis on which a crack recognition network identifies cracks.
  • This article proposes an optimized training method to address the problem of networks learning erroneous features. The method first trains the network's ability to recognize crack features on a small dataset with a uniform background, and then uses a large dataset to increase the network's generalization ability. This method successfully solved this type of problem.
  • This article proposes an index, based on a convolutional network interpretation algorithm, that evaluates the crack recognition performance of crack detection networks according to the amount of crack information contained in the image, and designs a calculation example based on this index. The results show that the crack recognition networks trained in this article can recognize small cracks that account for as little as 0.32% of the total image information. However, VGG3's ability to recognize small cracks is inferior to that of the other networks, possibly because it lacks a convolutional block compared with the others and suffers partial feature loss from global average pooling.

Author Contributions

Conceptualization, S.H.; methodology, S.H.; software, Y.H. (Yongjin He); validation, C.X. and X.J.; data curation, Y.H. (Yongjin He), Y.H. (Yule Huang) and C.H.; writing—original draft preparation, Y.H. (Yongjin He); writing—review and editing, J.W., A.D.E., S.H. and Q.C.; supervision, S.H.; funding acquisition, J.W. and A.D.E. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Hubei Provincial Department of Education Program (No. Q20221606), the Department of Housing and Urban-Rural Development of Hubei Province (Urban and rural construction and development-202001), the Scientific research project of Wuhan Polytechnic University Grant 2021Y047, the Fundamental Research Funds for the Central Universities (2023ZYGXZR089), the Science and Technology Planning Project of Guangdong Province (Foreign Experts Program of the Department of Science and Technology of Guangdong Province, China), and the Special Construction Fund of the Faculty of Engineering (No. 46201503).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the second author upon reasonable request.

Conflicts of Interest

Authors Chengyu Xu and Xiaoping Jia were employed by the company China Railway 17th Bureau Group (Guangzhou) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tao, X.; Hou, W.; Xu, D. A survey of surface defect detection methods based on deep learning. Acta Autom. Sin. 2021, 47, 1017–1034.
  2. He, S.; Wang, A.; Zhu, Z.; Zhao, Y. Research Progress on Intelligent Detection Technologies of Highway Bridges. China J. Highw. Transp. 2021, 34, 12–24.
  3. Wang, L.; Wang, Q.; Zhu, Z.; Zhao, Y. Current Status and Prospects of Research on Bridge Health Monitoring Technology. China J. Highw. Transp. 2021, 34, 26–45.
  4. Lin, M.; Chen, Q.; Yan, S. Network in Network. arXiv 2013, arXiv:1312.4400.
  5. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  6. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  7. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
  8. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
  9. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
  10. Silva, W.; Lucena, D. Concrete Cracks Detection Based on Deep Learning Image Classification. Proceedings 2018, 2, 489.
  11. Xu, H.; Su, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Automatic Bridge Crack Detection Using a Convolutional Neural Network. Appl. Sci. 2019, 9, 2867.
  12. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous Structural Visual Inspection Using Region-Based Deep Learning for Detecting Multiple Damage Types. Comput. Aided Civ. Infrastruct. Eng. 2018, 33, 731–747.
  13. Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation. Neurocomputing 2019, 338, 139–153.
  14. Fan, Z.; Li, C.; Chen, Y.; Wei, J.; Loprencipe, G.; Chen, X.; Di Mascio, P. Automatic crack detection on road pavements using encoder-decoder architecture. Materials 2020, 13, 2960–2974.
  15. König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Optimized deep encoder-decoder methods for crack segmentation. Digit. Signal Process. 2021, 108, 102907.
  16. Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045.
  17. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep Convolutional Neural Networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr. Build. Mater. 2017, 157, 322–330.
  18. Geetha, G.K.; Sim, S.H. Fast identification of concrete cracks using 1D deep learning and explainable artificial intelligence-based analysis. Autom. Constr. 2022, 143, 104572.
  19. Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Renò, V.; Uva, G. Physical interpretation of machine learning-based recognition of defects for the risk management of existing bridge heritage. Eng. Fail. Anal. 2023, 149, 107237.
  20. Kavitha, S.; Baskaran, K.; Dhanapriya, B. Explainable AI for Detecting Fissures on Concrete Surfaces Using Transfer Learning. In Proceedings of the 2023 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 26–28 April 2023; pp. 376–384.
  21. Luigs, H.G.T.; Mahlein, A.K.; Kersting, K. Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nat. Mach. Intell. 2020, 2, 476–486.
  22. Zech, J.R.; Badgeley, M.A.; Liu, M.; Costa, A.B.; Titano, J.J.; Oermann, E.K. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 2018, 15, e1002683.
  23. Badgeley, M.A.; Zech, J.R.; Oakden-Rayner, L.; Glicksberg, B.S.; Liu, M.; Gale, W.; McConnell, M.V.; Percha, B.; Snyder, T.M.; Dudley, J.T. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit. Med. 2019, 2, 31.
  24. Hamamoto, R.; Suvarna, K.; Yamada, M.; Kobayashi, K.; Shinkai, N.; Miyake, M.; Takahashi, M.; Jinnai, S.; Shimoyama, R.; Sakai, A.; et al. Application of Artificial Intelligence Technology in Oncology: Towards the Establishment of Precision Medicine. Cancers 2020, 12, 3532.
  25. Dou, H.; Zhang, L.; Han, F.; Shen, F.; Zhao, J. Survey on Convolutional Neural Network Interpretability. J. Softw. Available online: http://www.jos.org.cn/1000-9825/6758.htm (accessed on 28 December 2020).
  26. Piano, S.L. Ethical principles in machine learning and artificial intelligence: Cases from the field and possible ways forward. Humanit. Soc. Sci. Commun. 2020, 7, 1–7.
  27. Brundage, M.; Avin, S.; Wang, J.; Belfield, H.; Krueger, G.; Hadfield, G.; Khlaaf, H.; Yang, J.; Toner, H.; Fong, R.; et al. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. arXiv 2020, arXiv:2004.07213.
  28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. arXiv 2016, arXiv:1610.02391.
  29. Wang, H.; Wang, Z.; Du, M.; Yang, F.; Zhang, Z.; Ding, S.; Mardziel, P.; Hu, X. Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks. arXiv 2019, arXiv:1910.01279.
Figure 1. Grad-CAM algorithm.
Figure 2. Grad-CAM analysis results of partial crack images.
Figure 3. Basis for crack identification of VGG16. (a) Grad-CAM. (b) Score-CAM.
Figure 4. Basis for crack identification of VGG16_gap. (a) Grad-CAM. (b) Score-CAM.
Figure 5. Basis for crack identification of VGG4. (a) Grad-CAM. (b) Score-CAM.
Figure 6. Basis for crack identification of VGG4_gap. (a) Grad-CAM. (b) Score-CAM.
Figure 7. Basis for crack identification of VGG3. (a) Grad-CAM. (b) Score-CAM.
Figure 8. Partial images from small training datasets. (a) Partial crack image. (b) Partial background image.
Figure 9. Transfer learning data.
Figure 10. Basis for crack identification of VGG3 after optimization. (a) Grad-CAM. (b) Score-CAM.
Figure 11. Indicator calculation dataset.
Figure 12. Judgment basis for VGG16.
Figure 13. Judgment basis for VGG16_gap.
Figure 14. Judgment basis for VGG4.
Figure 15. Judgment basis for VGG4_gap.
Figure 16. Judgment basis for VGG3. (a) 0.32% judgment basis. (b) 0.42% judgment basis.
Table 1. Model indicators.

Network      Accuracy   Recall
VGG16        98.7%      98.2%
VGG16_gap    99.7%      100%
VGG4         100%       100%
VGG4_gap     99.7%      100%
VGG3         100%       100%
