Due to their constant use, road pavement surfaces are subject to continuous degradation, with cracks being the first sign of pavement surface deterioration [
1]. Therefore, crack detection is crucial for monitoring and maintaining pavement surfaces and for ensuring driver safety. However, when traditional human inspection procedures are used, crack detection can be extremely tedious, time-consuming, and subjective [
2]. Implementing automatic pavement condition monitoring systems can overcome these limitations, allowing a more precise, faster, and safer analysis than traditional methods, minimizing experts’ effort and human subjectivity.
The automatic detection and classification of cracks in road pavements are challenging due to the different characteristics of pavement materials and the wide variability of crack shapes and their inhomogeneous textures, among other factors. Moreover, depending on the sensors used to gather road pavement imagery, the resulting images may be affected by different types of noise, shadows, debris, or oil and water spots. These challenges make developing a robust and reliable automatic crack detection and classification system a difficult task.
1.2. Related Work
Crack detection in road pavements has been a hot topic of research over the years due to its importance, high level of complexity, and challenging characteristics. It has seen a remarkable evolution and a huge variety of proposals based on different methodologies. What most differentiates the available methodologies using a camera sensor is the strategy used to extract features from images that allow for identifying pavement surface distresses [
4].
In the scientific literature, two main categories of methods can be identified: (i) those based on traditional image processing; and (ii) those using machine learning techniques. Traditional image processing techniques can be further divided into three subcategories: (i) threshold segmentation; (ii) edge detection; and (iii) region growing [
5].
Crack detection using threshold segmentation assumes that pixels belonging to a crack are darker than their surroundings [
6]. These methods start with a pixel-based analysis to determine whether each pixel belongs to a crack or to the background, and selecting a suitable threshold value is key to good performance. For instance, Oliveira and Correia [
7] identified dark pixels belonging to potential cracks using a dynamic thresholding technique comprising two steps. The first threshold labelled image pixels as either “image background” or “potentially belonging to cracks”. The resulting segmented image was then divided into non-overlapping blocks of a given dimension, and each block was classified as “crack” or “non-crack” using a second threshold. This second threshold was determined from the set of entropy values obtained by analyzing the pixel intensities of each block (one entropy value per non-overlapping block in the image).
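The two-step idea can be illustrated with a minimal sketch in pure Python. It is not the implementation from the cited work: the grayscale image is assumed to be a nested list of intensities, the names `first_threshold`, `block_entropy`, and `classify_blocks` and the parameters `t1` and `size` are hypothetical, the entropy is computed on the binary labels produced by the first step, and the second threshold is assumed to be the mean block entropy.

```python
import math

def first_threshold(image, t1):
    """Step 1: label pixels darker than t1 as potential crack pixels (1), else 0."""
    return [[1 if px < t1 else 0 for px in row] for row in image]

def block_entropy(labels, top, left, size):
    """Shannon entropy (in bits) of the binary labels inside one block."""
    values = [labels[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    p = sum(values) / len(values)
    if p in (0.0, 1.0):
        return 0.0  # a uniform block carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def classify_blocks(image, t1, size):
    """Step 2: classify non-overlapping blocks using an entropy-derived threshold."""
    labels = first_threshold(image, t1)
    rows, cols = len(image), len(image[0])
    entropies = {}
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            entropies[(top, left)] = block_entropy(labels, top, left, size)
    # Assumed rule (not from the cited work): threshold at the mean block entropy.
    t2 = sum(entropies.values()) / len(entropies)
    return {pos: ("crack" if h > t2 else "non-crack")
            for pos, h in entropies.items()}
```

In this sketch, blocks that contain only background have zero entropy, so any block mixing dark and bright pixels stands out against the mean.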
Edge detection methods are based on identifying the edges of image regions presenting pavement surface distress, enabling the outlining of their contours [
6]. Common edge detection operators include the Sobel, Roberts, Prewitt, and Canny operators. However, a single operator can hardly achieve the desired results. Hence, many researchers have combined edge detection operators with other techniques to improve crack detection performance [
5]. Ayenu-Prah and Attoh-Okine [
8] proposed a road crack detection method combining bi-dimensional empirical mode decomposition (BEMD) with Sobel edge detection. BEMD extends the original empirical mode decomposition [
9] and is used to remove noise from the input signal efficiently.
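As an illustration of the edge detection step only, the following sketch applies the standard 3 × 3 Sobel kernels to a grayscale image stored as a nested list. The BEMD denoising stage is not reproduced, and the function names `sobel_magnitude` and `edge_map` and the `threshold` parameter are hypothetical choices made for this example.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    px = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * px
                    gy += SOBEL_Y[i][j] * px
            out[r][c] = math.hypot(gx, gy)
    return out

def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds `threshold`."""
    return [[1 if g > threshold else 0 for g in row]
            for row in sobel_magnitude(image)]
```

For a thin dark crack on a bright background, the response is strong on both flanks of the crack and weak at its centre, which is one reason a single operator is usually combined with other processing.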
Region-growing strategies aim to gather similar pixels together to form regions. Then, the characteristics of pavement surface defect regions can be estimated. Li et al. [
10] proposed an automatic crack detection method, the F* Seed growth Algorithm (FoSA), to improve the detection of discontinuous and blurred cracks. The FoSA is an extension of the F* method [
11], exploiting a seed-growing strategy (as illustrated in
Figure 1) to eliminate the requirement that the start and end points be known in advance. The global search area is reduced to a local one to improve search efficiency. However, the results of FoSA may be impaired by the lighting conditions of the images analyzed.
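The general region-growing principle can be sketched as follows. This is a generic seed-based growth over 4-connected neighbours, not the FoSA or F* algorithm; the function name `grow_region`, the `tolerance` parameter, and the similarity rule (intensity difference to the seed) are assumptions made for illustration.

```python
from collections import deque

def grow_region(image, seed, tolerance):
    """Collect 4-connected pixels whose intensity is within `tolerance` of the seed."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Because similarity is measured on raw intensities, a change in illumination across the image shifts which pixels are accepted, which is consistent with the sensitivity to lighting conditions noted above.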
More recently, with the enormous development of machine learning techniques, deep learning has come to dominate proposals for detecting and classifying cracks in pavement surface images taken during road surveys, since it allows a computer to automatically learn new characteristics and classification rules from a sufficiently large set of representative training examples. A supervised learning approach requires a training set of images labelled with information about the presence and location of cracks. Producing these labels typically demands substantial prior effort from human experts and can introduce the subjectivity of human analysis.
Methods based on deep learning have achieved great success in many related areas, such as image classification, object detection, and image segmentation. Therefore, deep learning methods, mainly using convolutional neural networks (CNN), have also been proposed for automatic crack detection and/or crack-type classification.
Crack detection and crack-type classification can be seen as image classification problems, and two different approaches can be followed: (i) assigning a label to the whole input image; or (ii) dividing the image into blocks and classifying each block as belonging to a particular class. The labels to be considered can be binary (“crack” or “non-crack”) or else indicate the type of crack that was detected (longitudinal, transverse, and crocodile skin, among others).
For example, Lei et al. [
12] proposed a deep learning-based algorithm that divides pavement surface images of size 3264 × 2248 pixels into smaller blocks of size 99 × 99 pixels (keeping the 3 RGB channels). They used a CNN to perform a binary classification of each image block according to the probability that it contains cracks. The CNN used by the authors was ConvNet, and its architecture is illustrated in
Figure 2.
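To give a sense of the number of blocks involved, the sketch below tiles a 3264 × 2248 image with 99 × 99 blocks. It assumes non-overlapping blocks and discards partial blocks at the right and bottom edges; the cited work may sample blocks differently, and the function name `block_origins` is hypothetical.

```python
def block_origins(width, height, block):
    """Top-left (x, y) corners of non-overlapping blocks; partial edge blocks are dropped."""
    return [(x, y)
            for y in range(0, height - block + 1, block)
            for x in range(0, width - block + 1, block)]

origins = block_origins(3264, 2248, 99)
print(len(origins))  # 32 columns x 22 rows = 704 blocks
```

Under these assumptions, each image yields 704 blocks to be classified, with 96 pixel columns and 70 pixel rows left uncovered at the edges.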
Aslan et al. [
13] implemented a CNN that classifies several types of surface distress present in images of road pavements. The proposed CNN architecture (
Figure 3) consists of two convolutional units with 32 filters each, applied before a 16-neuron fully connected layer and a softmax classification layer with four output neurons. The developed model was trained on a balanced dataset that includes four types of pavement surface defects: (i) longitudinal crack; (ii) transverse crack; (iii) alligator crack; and (iv) pothole.
Anand et al. [
14] proposed an autonomous real-time system with an associated camera to detect cracks and potholes in images of road pavement surfaces. The method, denoted Crack-pot, uses traditional image processing techniques, such as edge detection, to locate and generate the bounding boxes of the detected objects, combined with a CNN employed as a classification model.
Jenkins et al. [
15] proposed a fully convolutional neural network [
16] to classify every pixel in pavement surface images. The network proposed by the authors has an encoder–decoder architecture based on the U-Net model [
17]. In this type of architecture, the layers belonging to the encoder reduce the size of the input image through down-sampling operations and learn a high density of lower-level feature maps. The layers belonging to the decoder then map the encoded features back to their original resolution, using pooling layers to perform up-sampling operations efficiently. The network ends with a classification layer to assign an individual label to each pixel. During the evaluation of this method, the authors concluded that the training data were scarce and contained a low diversity of cracks, which meant that the performance of the proposed classification system could still improve, given the network’s capabilities.
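The change in resolution through an encoder–decoder can be illustrated with a toy example. The sketch below uses 2 × 2 max pooling for down-sampling and nearest-neighbour repetition for up-sampling; both are generic stand-ins chosen for this illustration, not the layers used in the cited network, and the function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """Halve each dimension by keeping the maximum of every 2x2 window."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def upsample_2x(fmap):
    """Double each dimension by repeating every value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

A 4 × 4 map becomes 2 × 2 after pooling and returns to 4 × 4 after up-sampling, but fine detail is lost on the way, which is why U-Net-style architectures also pass encoder features directly to the decoder.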
Zhang et al. [
18] proposed a fully convolutional network for per-pixel semantic segmentation/detection. The authors stated that their system, CrackGAN, aims to solve two issues found in published work on crack detection in images of pavement surfaces. The first, known as “All black”, occurs in FCN-based pixel-level crack detection when partially accurate ground truths are used, and causes the network to treat all pixels as belonging to the image background. The second is the data imbalance affecting the network’s training step. Their system processed images from three publicly available image datasets, but most of these images were captured by very sophisticated imaging systems, namely laser imaging systems. Although the proposed approach achieved state-of-the-art performance, neither the pavement surface distress level nor the distress types were determined.
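The link between data imbalance and the “All black” outcome can be seen in a small worked example. The sketch assumes a hypothetical image in which 2% of the pixels belong to cracks (this figure is illustrative and not taken from the cited work) and evaluates a prediction that labels every pixel as background.

```python
def pixel_metrics(truth, pred):
    """Return (accuracy, recall) for flat lists of 0/1 labels, where 1 = crack."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    correct = sum(1 for t, p in zip(truth, pred) if t == p)
    positives = sum(truth)
    accuracy = correct / len(truth)
    recall = tp / positives if positives else 0.0
    return accuracy, recall

truth = [1] * 20 + [0] * 980   # assumed: 2% crack pixels
all_black = [0] * 1000         # every pixel predicted as background

accuracy, recall = pixel_metrics(truth, all_black)
print(accuracy, recall)  # 0.98 0.0
```

The all-background prediction reaches 98% pixel accuracy while detecting no crack pixel at all, so a network trained with a plain per-pixel objective has little incentive to leave this solution.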
Liu et al. [
19] proposed a feature fusion encoder–decoder network (FFEDN) with two novel modules to improve crack detection accuracy by enhancing the representation of tiny cracks and reducing the influence of background interference. The FFEDN system processed images from three publicly available image datasets, mainly captured by cell phones, and the published results showed that it outperformed all eight methods used for comparison. Nevertheless, all the images presented in the published work are free of pavement surface artifacts, such as white lane markings (continuous or interrupted), oil spots, and shadows cast by objects near the road edges, so the system’s performance on images with such artifacts cannot be assessed.
A recently published research paper by Ma et al. [
20] addressed problems regarding data acquisition (image and video) and defect counting on road pavement surfaces, aiming to increase the efficiency and accuracy of traditional detection systems. The authors developed a generative adversarial network (PCGAN) to generate realistic crack images and thus mitigate the scarcity of training data, an important issue when using deep neural networks. Regarding crack detection, an improved regression-based object detection and median flow model (YOLO-MF) was developed to count the cracks in an image or video captured by an imaging sensor carried by an unmanned aerial vehicle. Although the published performance metrics are promising, the images shown in the paper exhibit very prominent cracks that appear to be easily detectable. Moreover, these images did not show artifacts that are often visible on road pavement surfaces, such as oil spots or white lane markings. Additionally, the authors concluded that their method had difficulty dealing with small and complex pavement distresses and detecting extensive alligator cracks.
Han et al. [
21] developed CrackW-Net, a convolutional neural network built on a skip-level round-trip sampling block structure and aimed at segmenting cracks in pavement surface images at the pixel level. The authors tested their approach on only two image datasets, namely Crack500 [
22] and a self-built dataset. The processed images were free from well-known artifacts (oil spots and white lane markings, among others), and most of them exhibited simple and fairly noticeable cracks. The training costs of CrackW-Net, in terms of the number of neural network parameters and the training time, were significantly higher than those of FCN [
23], U-Net [
24], and ResU-Net [
25]. The authors concluded that extensive validations are still needed before the use of CrackW-Net becomes feasible.
As highlighted, numerous studies have been devoted to deep learning in crack detection and classification due to the excellent feature representation capabilities of convolutional neural networks. The approach presented in this paper follows this trend. It proposes a segmentation deep neural network to identify crack regions and a classification deep neural network to classify the detected cracks into types, in order to handle more complex images of road pavement surfaces, which frequently present visible artifacts and more complex crack shapes and textures.