Introduction
The February 6, 2023, earthquakes in southern Turkey, with magnitudes of 7.8 and 7.5, caused extensive damage to infrastructure, including residential buildings, bridges, transportation systems, and critical lifelines. Significant ground deformations, such as landslides, fault ruptures, and ground subsidence, further exacerbated the destruction [1], [2]. Cities including Kahramanmaras, Hatay, and Gaziantep, with a combined population of approximately 15 million, were severely impacted, resulting in nearly 50,000 deaths and the collapse of 19,284 buildings [3], [4]. Previous studies focused on the seismic performance of structures, including Ozkula et al. [5], who conducted an observational analysis of structural and geotechnical damage, and Işık et al. [6], who explored the deterioration of masonry structures in Adiyaman. Kahya et al. [7] also examined earthquake-related damage to the Hatay Governorship Building, attributing structural damage to the seismic events. However, these studies concentrated primarily on buildings and historical structures, leaving a gap in addressing the deformations affecting highways and asphalt surfaces. To fill this gap, the present study specifically analyzes highway deformations caused by the earthquakes in Kahramanmaras (Mw = 7.7 and 7.6) through field investigations, emphasizing the importance of segmenting road and asphalt cracks for the effective service and maintenance of highways following such disasters.
Cracks represent a prevalent issue in civil infrastructure, as highlighted in existing literature [8], [9]. The impact of structural cracks on a civil structure can vary in severity, potentially compromising its serviceability, leading to structural damage, and posing a threat to both structural integrity and human safety.
A crack is a split in asphalt concrete caused by complete fracture (splitting) or breaking into two or more pieces. Cracks can occur on various surfaces, such as buildings, bridges, roads, sidewalks, and railroad tracks. We can classify cracks into two categories: active and inactive. In active cracks, a change in direction, width, or depth occurs over a measured period, while in inactive cracks, these characteristics remain unchanged. If left uncorrected, both active and inactive cracks provide a gateway for moisture penetration, which can lead to further damage. There are several types of active cracks, but they are mainly divided into three groups: transverse cracks, longitudinal cracks, and crocodile (or web) cracks [10]. Each of these crack types is repaired in different ways. For this reason, it is crucial to identify the cracks and determine the type of each crack.
Transverse cracks [11] are perpendicular to the centerline of the pavement and are typically caused by thermal changes. Other causes include asphalt binder hardening or reflection cracks caused by underlying cracks below the asphalt surface.
Longitudinal cracks are caused mainly by fatigue and by weak joints. Fatigue cracks result from repeated loading by heavy, frequently overloaded vehicles. The joints are usually the least dense areas of the pavement, so a crack may appear when a joint is located in an area of high stress [12], [13].
Finally, crocodile cracks result from a combination of fatigue and instability of the asphalt base [12].
For transverse and longitudinal cracks [13], [14], small cracks should be repaired to prevent moisture infiltration, while larger cracks should be repaired by removing and replacing the cracked pavement layer. For crocodile (alligator) cracks, a small cracked area is repaired by removing and excavating the cracked pavement and replacing the asphalt in that area, whereas a large cracked area requires a new pavement layer over the entire surface. Inactive (non-moving) cracks are very fine and can heal spontaneously over time.
The various types of cracks, according to their structure, include micro-cracks, fine cracks, closed cracks, mixed cracks, line-like cracks, small cracks, medium cracks, large cracks, and complex cracks [15].
Small cracks are very fine and have three subtypes: fine, small, and line-like cracks. These types of cracks are commonly found in RC bridges, underwater dams, plastics, automobiles, and airplanes.
Moderate cracks are not considered very serious but still require remedial measures. These cracks usually occur in underwater dams and concrete roads, and include types such as moderate, impermeable, and severe cracks.
Severe cracks are very large and dangerous, requiring immediate corrective measures. These cracks are commonly found in subway tunnels, concrete roads, bridges, sidewalks, underwater dams, and civil structures. Severe cracks include large, simple, and complex cracks.
Since the occurrence of a crack reduces the value of civil infrastructure, it is important to assess the severity of the crack. Crack detection and classification techniques play a major role in determining crack severity [11].
The detection of cracks in images has led to the development of various techniques, which can be categorized under four main headings: manual detection, image processing, feature-based machine learning, and deep learning-based algorithms [16], [17], [18]. Manual detection, although supervised by experts in the field, is time-consuming and prone to inaccuracies due to human error.
Image-based inspections for crack detection can be performed in three general ways: raw image inspection, enhanced image inspection, or autonomous image processing. Raw image inspection involves the inspector viewing the images taken during the inspection. Enhanced image inspection refers to the use of image processing algorithms that make defects in the inspection images easier to identify. Autonomous image processing involves the use of algorithms that detect cracks in the images without an inspector, usually by means of machine learning or other artificial intelligence techniques [19], [20].
Given the rapid advances in computer algorithms and high-performance computing devices in recent years, the application of deep convolutional neural networks (DCNNs), a subset of deep learning methods, to crack detection has been investigated thoroughly [21], [22], [23], [24], [25], [26]. Numerous recent applications of DCNNs to crack classification have been documented in the literature [27], [28], [29], [30], [31], [32], [33]. Despite reporting varying degrees of success, these methodologies detect cracks at the level of image patches, which limits the resolution of the extracted crack information. The latest advances in DCNNs predominantly focus on pixel-wise crack classification through semantic segmentation, which divides an image into distinct regions by assigning each pixel a categorical label. Yang et al. [34] introduced an encoder-decoder network utilizing VGG19 [35] to generate pixel-level crack maps for concrete pavements and walls. Zou et al. [36] presented DeepCrack, an encoder-decoder network based on SegNet [37] that incorporates multi-scale cross-entropy losses for crack detection. Additionally, Bang et al. [38] introduced an encoder-decoder network based on ResNet [39], explicitly tailored for the segmentation of roadway cracks in images that may contain non-road objects. Dung and Anh [40] proposed an encoder-decoder network utilizing a VGG16-based encoder to perform crack segmentation on the surface of concrete structures. Meanwhile, Zhang et al. [41], Zhang et al. [42], and Fei et al. [43] introduced CrackNet and its derivatives, which are DCNNs specifically trained and evaluated on laser-scanned range images for roadway crack segmentation.
This paper presents a field observation study to assess road cracks after destructive earthquakes. The main objective of this study is to develop deep learning-based segmentation methods that rapidly detect cracks, facilitating prompt maintenance in the chaotic situations that follow earthquakes. SegNet, Attention SegNet, U-Net, FCN (8s), and DeepLab models were employed for the automatic segmentation task, and their segmentation performances were analyzed in detail. Additionally, a new dataset of pixel-labeled images was created for the automatic segmentation of post-earthquake cracks.
Among the various image processing techniques used for crack detection, the segmentation method is discussed in this study.
Material and Method
A. Material
In this study, the deformations of asphalt pavements after the Kahramanmaras Pazarcik and Elbistan earthquakes (Mw = 7.7 and 7.6) were obtained through a field investigation and used as a dataset. The data were collected from the provinces of Kahramanmaras, Hatay, Malatya, Adiyaman, and Gaziantep, where the earthquakes were strongly felt. The images were captured in JPEG format with a professional Canon camera with a resolution of 24 megapixels. The total number of images is 528, with 111 images from Adiyaman, 38 images from Gaziantep, 191 images from Hatay, 101 images from Kahramanmaras, and 87 images from Malatya. Representative asphalt deformation images from each province are shown in Fig. 1. The cracks in Fig. 1 ranged from 10–30 cm in width and 1–5 meters in length. The dataset is randomly divided into 80% for training and 20% for validation.
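The random 80%/20% division described above can be sketched as follows. This is only an illustrative sketch: the per-province counts come from the text, while the file names, the seed, and the helper `split_dataset` are hypothetical and are not taken from the study's own implementation.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Per-province image counts reported in the text.
province_counts = {
    "Adiyaman": 111,
    "Gaziantep": 38,
    "Hatay": 191,
    "Kahramanmaras": 101,
    "Malatya": 87,
}

# Hypothetical file names, used only to have something to split.
image_names = [
    f"{province}_{i:03d}.jpg"
    for province, count in province_counts.items()
    for i in range(count)
]

train_set, val_set = split_dataset(image_names)
print(len(image_names), len(train_set), len(val_set))  # 528 422 106
```

With 528 images, an 80%/20% division gives 422 training and 106 validation images under this rounding rule.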
B. Method
The methodological sequence of the study is presented as follows:
Transferring data to digital (computer) media
Reading the data with the software program and creating the data type for each image
Creating pixel-labeled images for each image
Obtaining the database by combining data types, dividing the database into training and testing sets, and preparing it for the application environment
Creating deep learning-based segmentation models
Testing the performance of the models with training-test data
During the training process, the cross-entropy loss function (H) is used to optimize the learnable parameters of the deep learning models. This function is calculated as shown in Equation 1, where P^{X}_{(i)} is the target (ground-truth) probability of class i and P_{(i)} is the probability predicted by the model.\begin{equation*} H\left(P^{X}, P\right) = -\sum\nolimits_{i} P^{X}_{(i)} \log P_{(i)} \tag{1}\end{equation*}
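As a minimal sketch of how Equation 1 behaves, the following Python snippet evaluates the cross-entropy for a single pixel and averages it over a small binary mask. The two-class setup (background, crack), the clipping constant `eps`, and the function names are assumptions made for illustration; they do not reproduce the MATLAB loss layer used in the study.

```python
import math

def cross_entropy(true_probs, pred_probs, eps=1e-12):
    """Equation 1: H = -sum_i P_true(i) * log(P_pred(i)).

    `eps` clips predictions away from zero to avoid log(0) (an assumption).
    """
    return -sum(
        p_true * math.log(max(p_pred, eps))
        for p_true, p_pred in zip(true_probs, pred_probs)
    )

def mean_pixel_cross_entropy(true_mask, pred_crack_prob):
    """Average cross-entropy over all pixels of a binary mask (1 = crack)."""
    losses = []
    for true_row, prob_row in zip(true_mask, pred_crack_prob):
        for label, p_crack in zip(true_row, prob_row):
            true_probs = [1.0 - label, float(label)]  # one-hot [background, crack]
            pred_probs = [1.0 - p_crack, p_crack]
            losses.append(cross_entropy(true_probs, pred_probs))
    return sum(losses) / len(losses)

# Toy 2x2 example: ground-truth mask and predicted crack probabilities.
true_mask = [[0, 1], [0, 0]]
good_prediction = [[0.1, 0.8], [0.2, 0.05]]
poor_prediction = [[0.6, 0.3], [0.7, 0.5]]

print(mean_pixel_cross_entropy(true_mask, good_prediction))
print(mean_pixel_cross_entropy(true_mask, poor_prediction))
```

The second value is larger than the first, since the loss grows as the predicted probabilities move away from the labels.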
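Segmentation performance is evaluated using accuracy, sensitivity, and precision, defined in Equations 2 to 4, where TP, TN, FP, and FN denote the numbers of true-positive, true-negative, false-positive, and false-negative pixels, respectively.\begin{align*} \mathrm{Accuracy} &= \mathrm{(TP+TN)/(TP+TN+FP+FN)} \tag{2}\\ \mathrm{Sensitivity} &= \mathrm{TP/(TP+FN)} \tag{3}\\ \mathrm{Precision} &= \mathrm{TP/(TP+FP)} \tag{4}\end{align*}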
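Equations 2 to 4 can be illustrated with a short sketch that counts true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) pixels from a ground-truth mask and a predicted mask. The convention that 1 marks a crack pixel, the zero-division fallback, and the function names are assumptions for illustration only.

```python
def confusion_counts(true_mask, pred_mask):
    """Count TP, TN, FP, FN pixels for binary masks (1 = crack, 0 = background)."""
    tp = tn = fp = fn = 0
    for true_row, pred_row in zip(true_mask, pred_mask):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 0:
                tn += 1
            elif t == 0 and p == 1:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn

def segmentation_metrics(true_mask, pred_mask):
    """Accuracy (Eq. 2), sensitivity (Eq. 3), and precision (Eq. 4)."""
    tp, tn, fp, fn = confusion_counts(true_mask, pred_mask)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Return 0.0 when a denominator is empty (an assumption, not from the paper).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Toy example: TP = 2, TN = 4, FP = 1, FN = 1.
true_mask = [[0, 1, 1, 0], [0, 1, 0, 0]]
pred_mask = [[0, 1, 0, 0], [1, 1, 0, 0]]

print(segmentation_metrics(true_mask, pred_mask))  # accuracy 0.75, others 2/3
```

In this toy case accuracy is higher than sensitivity and precision because background pixels dominate, which is typical for thin cracks.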
In this study, five deep learning-based segmentation models were used: SegNet, Attention SegNet, U-Net, DeepLab, and FCN (8s). These models contain 31, 103, 58, 206, and 51 layers, respectively. All models except Attention SegNet were loaded from MATLAB libraries. The attention strategy in the Attention SegNet structure was implemented through custom code.
The advantages and some disadvantages of these models are outlined below.
SegNet: Produces clean results on simpler objects but tends to generate softer and less defined boundaries on complex objects.
Attention SegNet: Focuses more on key regions, providing sharper boundaries and better results in cluttered scenes.
U-Net: Excellent at preserving boundaries and object detail, with strong performance in fine-grained tasks like medical image segmentation.
DeepLab: Provides highly accurate results, especially in multi-scale and complex scenes, with well-defined boundaries.
FCN 8s: Offers decent segmentation results but may blur small object boundaries and intricate structures.
FCN is a deep architecture that provides pixel-level information, referred to as semantic segmentation. Unlike classification architectures, FCN does not include fully connected layers. It takes inputs of arbitrary size and produces outputs of the corresponding size, allowing the creation of a segmentation map. The output of FCN is the same size as the input, and each pixel is assigned to a class [46]. FCN uses an encoder-decoder architecture for image segmentation. High-level image features are first extracted by an encoder using convolutional layers and then reconstructed by a decoder using deconvolutional (upsampling) layers. There are three networks of fully connected layers in
SegNet is an encoder-decoder, pixel-based classifier model applied in road segmentation applications. The encoder typically consists of one or more convolutional layers, followed by batch normalization, a ReLU activation function, and max pooling [47]. At each encoder stage, the positions of the max-pooling maxima (the pooling indices) are stored. The decoder uses these memorized indices to upsample the feature maps and rebuild a map at the input resolution, from which the segmentation map is produced. The output of the decoder is constrained to the interval (0, 1) by a sigmoid activation at the final stage. Finally, a pixel-wise classification layer is applied to predict the class of each pixel [47].
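The role of the memorized max-pooling indices can be sketched with a toy example. The snippet below performs 2x2 max pooling on a small feature map, stores the position of each maximum, and then uses those positions to place the pooled values back into a map of the original size. It is a simplified single-channel illustration with hypothetical function names, not the SegNet implementation used in the study.

```python
def max_pool_with_indices(feature_map, size=2):
    """Non-overlapping max pooling that also returns the (row, col) of each maximum."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled, indices = [], []
    for r in range(0, rows - size + 1, size):
        pooled_row, index_row = [], []
        for c in range(0, cols - size + 1, size):
            window = [
                (feature_map[i][j], (i, j))
                for i in range(r, r + size)
                for j in range(c, c + size)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def max_unpool(pooled, indices, out_rows, out_cols):
    """Place each pooled value at its stored position; all other entries stay zero."""
    output = [[0] * out_cols for _ in range(out_rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (i, j) in zip(pooled_row, index_row):
            output[i][j] = value
    return output

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]

pooled, indices = max_pool_with_indices(feature_map)
restored = max_unpool(pooled, indices, 4, 4)
print(pooled)    # [[4, 5], [6, 7]]
print(restored)  # maxima returned to their original positions
```

Because the maxima return to their original positions rather than being spread by interpolation, thin structures keep their location in the upsampled map; in a full decoder the sparse map is then densified by convolutional layers.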
DeepLabv3 further enhances Atrous Spatial Pyramid Pooling (ASPP) with image-level features, analyzing convolutional features at different scales by applying atrous convolutions at different rates [48], [49]. The ASPP module consists of depthwise separable convolutions at different rates (6, 12, 18), global average pooling, and
During training, the filter weights in the convolutional layers are updated using the loss information so that the filters better match the relevant image features. This optimization adjusts the parameters to minimize the loss function, that is, to reduce the difference between the output values of the neural network and the observed values. In this study, the loss function was minimized with stochastic gradient descent (SGD) and its momentum variant, which are commonly used in semantic segmentation problems with deep learning models. Equation 5 gives the plain SGD update, and Equations 6 and 7 give the update with momentum, where W_{t} denotes the weights at iteration t, L the loss, \alpha the learning rate, m_{t} the momentum term, and \beta the momentum coefficient.\begin{align*} W_{t+1} &= W_{t} - \alpha \frac{\partial L}{\partial W_{t}} \tag{5}\\ W_{t+1} &= W_{t} - \alpha m_{t} \tag{6}\\ m_{t} &= \beta m_{t-1} + (1-\beta) \frac{\partial L}{\partial W_{t}} \tag{7}\end{align*}
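The update rules in Equations 5 to 7 can be sketched on a one-parameter toy problem. The quadratic loss L(w) = (w - 3)^2, the momentum coefficient 0.9, the starting point, and the number of steps are assumptions chosen purely for illustration; only the form of the updates follows the equations.

```python
def sgd_step(w, grad, lr):
    """Equation 5: plain stochastic gradient descent update."""
    return w - lr * grad

def sgdm_step(w, m_prev, grad, lr, beta):
    """Equations 6 and 7: update the momentum term, then the weight."""
    m = beta * m_prev + (1.0 - beta) * grad  # Eq. 7
    return w - lr * m, m                     # Eq. 6

def grad_fn(w):
    """Gradient of the toy loss L(w) = (w - 3)^2 (hypothetical example)."""
    return 2.0 * (w - 3.0)

w_sgd = 0.0
w_sgdm, m = 0.0, 0.0
for _ in range(300):
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd), lr=0.1)
    w_sgdm, m = sgdm_step(w_sgdm, m, grad_fn(w_sgdm), lr=0.1, beta=0.9)

print(w_sgd, w_sgdm)  # both approach the minimizer w = 3
```

The momentum term is an exponentially weighted average of past gradients, which smooths the noisy mini-batch gradients encountered when training on real image batches.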
Results and Discussions
The results in this study were obtained with MATLAB running on a 12th-generation Intel Core i9 processor, 64 GB of memory, and a 12 GB graphics card (RTX 3080 Ti). All coding was performed in this environment. For the training parameters of the deep learning models, the learning rate was set to 0.1, the maximum number of epochs to 20, and the mini-batch size to 32. The SGDM method (stochastic gradient descent with momentum) was chosen to optimize the learnable parameters.
First, training was performed for the Attention SegNet model. As shown in Fig. 3, the accuracy and loss values stabilized after approximately 200 iterations. According to the accuracy change graph (Fig. 3(a)), the training accuracy reached 83.4% after 430 iterations. Fig. 3(b) shows the loss change graph, where the training loss was 0.27 after 430 iterations.
Fig. 3. Training results of the Attention SegNet model: (a) training accuracy across iterations, (b) loss across iterations.
After completing the training process of the Attention SegNet model, testing was performed using the final training parameters. In this study, the testing process was applied to 190 asphalt crack images. Fig. 4 shows the segmentation accuracy, precision, and sensitivity calculated for these 190 images.
In Fig. 4, the performance of the Attention SegNet model on asphalt crack segmentation is summarized using several key metrics. The highest accuracy achieved by the model was 0.93, indicating that it correctly classified 93% of the pixels in the best-performing image. Conversely, the lowest accuracy recorded was 0.71, suggesting that the model’s performance varied, with a drop to 71% accuracy in the least-performing case. Across all 190 test images, the model achieved an average accuracy of 76.86%, meaning it correctly classified about 77% of the pixels on average.
In terms of precision, which reflects the model’s ability to avoid false positives, the highest score was 1.0, meaning that in certain images every pixel the model labeled as a crack was indeed a crack pixel. The lowest precision was 0.62, indicating that the model produced a higher number of false positives in some images. The average precision across all 190 images was 75.07%, showing that, on average, 75% of the pixels the model identified as cracks were indeed cracks.
Finally, regarding sensitivity (recall), which measures the model’s ability to capture actual crack pixels without missing any, the highest value was 0.92 and the lowest was 0.72. The average sensitivity for the 190 images was 74.91%, indicating that, on average, the model identified around 75% of the actual crack pixels in the images and missed approximately 25%.
This analysis demonstrates solid overall performance while highlighting some variability in the model’s ability to segment cracks consistently across different images. Fig. 5 shows sample results from the test process with the Attention SegNet architecture, including the original images and the ground-truth (correctly labeled) segmentation images.
Fig. 5. Segmentation results using Attention SegNet: (a) original crack image, (b) correctly labeled segmentation image, (c) model-generated segmentation.
The training process for SegNet, U-Net, FCN (8s), and DeepLab models was performed, and the results are shown in Fig. 6. For all these architectures, the training accuracy exceeded 99%. The training loss values were as follows: 0.0136 for U-Net, 0.0235 for FCN (8s), and 0.0113 for DeepLab. However, when evaluating the test performance metrics—accuracy, precision, and sensitivity—SegNet outperformed the other architectures by 10%, 5%, and 8%, respectively.
Fig. 6. Accuracy and loss graphs of popular deep segmentation models (SegNet, U-Net, FCN (8s), and DeepLab) during the training process.
To provide a comparative analysis between the SegNet model and the Attention SegNet model, the performance of the SegNet model was evaluated after completing the training process. The testing phase utilized the final training parameters, and the results were presented in graphical form for detailed examination. In this study, 190 images of asphalt cracks were used to test the model’s segmentation capabilities.
As shown in Fig. 7, the performance metrics of the SegNet model—accuracy, precision, and sensitivity—are summarized for these 190 test images. The highest accuracy score achieved by the model was 1.0, indicating perfect classification of all pixels for certain images, while the lowest accuracy recorded was 0.51, demonstrating a significant drop in performance for some challenging images. Despite this range, the average accuracy across all test images was 85.36%, suggesting that, on average, the model correctly classified about 85% of the pixels.
In terms of precision, which measures how well the model avoids false positives, the highest precision score was also 1.0, signifying perfect precision for some images, while the lowest precision score was 0.54, indicating that the model sometimes produced a relatively high number of false positives. The average precision score across all 190 images was 92.61%, showing that, on average, around 93% of the pixels identified as cracks were correctly classified, with a relatively low occurrence of false positives.
Finally, for sensitivity (recall), which reflects the model’s ability to detect actual crack pixels without missing them, the highest recall score was 0.95, and the lowest recall score was 0.61, indicating variability in the model’s ability to capture all true crack pixels across different images. The average recall for the 190 images was 84.91%, meaning the model successfully identified about 85% of the actual crack pixels in the images but missed around 15%.
These results demonstrate that the SegNet model performed well in asphalt crack segmentation, with high precision and accuracy scores, although some variability was noted across the images. This provides a meaningful comparison to the Attention SegNet model, particularly in terms of the trade-offs between precision and recall.
The SegNet model demonstrated superior performance in asphalt crack segmentation due to several key factors. Its efficient encoder-decoder architecture allows for detailed feature extraction during the encoder phase while accurately reconstructing pixel-wise classifications in the decoder phase. This structure is particularly effective for capturing the fine details of complex road cracks. Unlike many other models, SegNet utilizes indices from pooling operations instead of traditional bilinear upsampling, which enhances spatial accuracy and improves the detection of small structures such as cracks. Additionally, SegNet’s lightweight architecture enables faster training and inference times, making it suitable for large datasets without sacrificing detail. The model excels in pixel-level precision, which is crucial for accurately identifying minor deformations and cracks on asphalt surfaces. Overall, SegNet’s combination of high accuracy, efficient processing, and robust detail recognition makes it a particularly effective choice for asphalt crack segmentation tasks.
Fig. 8 shows sample results from the test process using the SegNet architecture, together with the original images and the ground-truth (correctly labeled) segmentation images.
Fig. 8. Example segmentation results for SegNet architecture test data: (a) crack image, (b) correctly labeled segmentation image, (c) SegNet segmentation image.
Data augmentation techniques, including rotation, mirroring, cropping, and scaling, were applied to the training data of the segmentation models. In total, augmentation increased the size of the training dataset by a factor of four. The test dataset was not modified. The validation results for SegNet, DeepLab, FCN (8s), Attention SegNet, and U-Net are provided in Table 1.
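For segmentation, each geometric augmentation has to be applied identically to the image and to its pixel-labeled mask so that the labels stay aligned with the crack pixels. The sketch below illustrates this with mirroring and rotation on small nested lists, producing four image-mask pairs from one. The specific set of transforms is a hypothetical choice for illustration (cropping and scaling are omitted), and it is not claimed to match the combination used in the study.

```python
def mirror(grid):
    """Flip a 2D grid horizontally (left to right)."""
    return [row[::-1] for row in grid]

def rotate90(grid):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def identity(grid):
    """Return an unchanged copy of the grid."""
    return [list(row) for row in grid]

def rotate180(grid):
    """Rotate a 2D grid by 180 degrees."""
    return rotate90(rotate90(grid))

def augment_pair(image, mask):
    """Apply each transform to both image and mask, returning four aligned pairs."""
    transforms = [identity, mirror, rotate90, rotate180]
    return [(t(image), t(mask)) for t in transforms]

# Toy 2x2 image whose brightest pixel (value 4) is the only crack pixel.
image = [[1, 2], [3, 4]]
mask = [[0, 0], [0, 1]]

pairs = augment_pair(image, mask)
for aug_image, aug_mask in pairs:
    print(aug_image, aug_mask)
```

In every augmented pair, the mask entry equal to 1 sits at the same position as the image value 4, which is the alignment property that matters for pixel-wise training.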
Five deep learning-based segmentation models—SegNet, Attention SegNet, U-Net, DeepLab, and FCN (8s)—were evaluated in this study. Their performances were compared based on accuracy, precision, and sensitivity metrics.
SegNet: This model achieved the best overall performance, with an average accuracy of 86.72%, precision of 92.99%, and sensitivity of 78.45%. Its encoder-decoder structure enables detailed feature extraction and robust pixel-wise classifications. However, it sometimes struggled with the segmentation of very complex or overlapping cracks.
Attention SegNet: Incorporating attention mechanisms allowed this model to focus on key regions, resulting in sharper boundaries. While its test performance (accuracy 80.21%) was slightly lower than SegNet’s, it excelled in segmenting cluttered and noisy scenes.
U-Net: Known for its boundary preservation, U-Net performed well in identifying smaller details but suffered from overfitting due to the limited size of the dataset, resulting in a lower average accuracy (78.64%).
DeepLab: With its multi-scale feature extraction using Atrous Spatial Pyramid Pooling, DeepLab provided accurate segmentation for complex scenes but required higher computational resources. Its test accuracy (76.89%) was lower compared to SegNet and Attention SegNet.
FCN (8s): While this model achieved a balance between simplicity and performance, its tendency to blur small object boundaries limited its utility for intricate crack segmentation tasks.
Conclusion
In this study, 528 images of cracks in asphalt pavements, obtained after the Kahramanmaras Pazarcik and Elbistan earthquakes (Mw = 7.7 and 7.6), were used. The goal was to detect and segment the cracks using deep learning-based models built on the Attention SegNet, SegNet, U-Net, FCN (8s), and DeepLab architectures. The training accuracies of all models were above 99%, except for the Attention SegNet architecture (83.4%). However, the test performances (accuracy, precision, and sensitivity) of the U-Net, FCN (8s), and DeepLab architectures for 190 test images were below 10%. This indicates an overfitting problem for these architectures, likely due to the limited amount of training data or their inability to select discriminative features effectively. The Attention SegNet architecture achieved the second-best test performance, with accuracy, precision, and sensitivity ranging from 70% to 80%. The SegNet architecture achieved the best test performance, with accuracy, precision, and sensitivity ranging from 80% to 90%. Despite having fewer layers, SegNet outperformed the other architectures and was therefore selected as the baseline (reference) model for future artificial intelligence-based studies using this dataset.
ACKNOWLEDGMENT
This study was supported by the Firat University Scientific Research Project Management Unit (FUBAP) under project numbers ADEP.24.20 and MF.24.96. The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: The authors are aware of the article’s content and approve its submission. The article has not been published previously and is not under consideration for publication elsewhere. No conflict of interest exists in the article, and if accepted, the article will not be published elsewhere in the same form, in any language, without the written consent of the Publisher.