Robust Concurrent Detection of Salt Domes and Faults in Seismic Surveys Using an Improved UNet Architecture

Interpretation of seismic structural traps for accurate hydrocarbon reservoir characterization is a challenging task. Seismic interpreters learn to accurately delineate subsurface structures only after a lengthy, time-consuming process of training and expertise acquisition. In this paper, we propose a novel semantic segmentation model for the concurrent identification of salt domes and faults in a realistic scenario, using an improved encoder-decoder deep neural network that achieves high detection accuracy for both event types. We also introduce transfer learning to alleviate the persistent scarcity of labeled seismic data and to develop a robust model whose performance is not affected by event similarities among the various discontinuities in seismic data. In addition, we use residual blocks in our deep neural network to make it even more robust. To demonstrate the effectiveness of our model, extensive experiments were conducted through validation and testing on real-world seismic data from the publicly available Netherlands offshore F3 block, the LANDMASS dataset, and the TGS dataset. Both qualitative and quantitative evaluations confirm the superior performance achieved by our deep-learning-based workflow under the challenging scenario of multiple-event detection in subsurface surveys.


I. INTRODUCTION
Interpretation of seismic records is a crucial task for understanding and analyzing geological information about subsurface structures. Seismic interpretation is a workflow traditionally undertaken as collaborative work involving domain experts (i.e., geologists, geophysicists, geoscientists, etc.) and is normally done interactively on robust interpretation workstations. These workstations are sets of high-powered computers and software tools meant to assist interpreters with storing, rendering, and analyzing seismic images. The main goal of seismic interpretation is to accurately identify geological structures from seismic surveys. Such structures include salt domes, faults, unconformities, horizons, facies, and gas chimneys, to name a few. Seismic hazard analysis, natural resource exploration, hydrocarbon reservoir characterization, and the understanding of depositional environments are some of the broad range of applications of seismic interpretation [1]. (The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.)
Even though the process of seismic interpretation is computer-aided, it still requires many hours of manual work, including visualizing, editing, picking, and labeling different seismic features with distinct marks (or colors) on a slice-by-slice basis. The difficulty is compounded by successive, iterative manual corrections and modifications needed to guarantee acceptable seismic velocity models that are compliant with geological and geophysical knowledge. Because manual interpretation is time-consuming, labor-intensive, and subject to prediction biases, considerable effort has been put into developing automated seismic interpretation tools.
A variety of approaches have been developed for automating seismic interpretation over the past two decades. These approaches can generally be classified from three perspectives, namely the seismic data modality, the seismic event entity, and the feature extraction methodology. From the first perspective, approaches deal either with 2D seismic sections (or slices) w.r.t. a specific acquisition direction (i.e., in-line, cross-line, or time-line), or with 3D seismic volumes resulting from a combination (or stack) of 2D seismic sections. From the second perspective, a large majority of approaches address the detection of a single seismic event at a time, with some using diverse texture and quality metrics for seismic multi-event identification [2]-[6]. From the third perspective, approaches can be categorized into handcrafted-feature-based and DL-feature-based. In this work, we propose a new approach for seismic interpretation using a deconvolutional neural network (DCNN). We focus on the challenging concurrent detection of salt bodies and faults from 2D seismic sections using real-world data from the Netherlands offshore F3 block in the North Sea.
Currently, seismic interpreters are, more than ever, faced with increasingly large seismic data volumes while continually dealing with tight deadlines. Thus, global and integrated solutions are needed to automate the seismic interpretation process: data-driven solutions capable of exploiting the full potential of challenging seismic big data and of speeding up workflows while guaranteeing high interpretation accuracy [7].
Recent years have witnessed the rapid development of deep neural networks (DNN), resulting in significant performance improvement, and great success in numerous computer vision and pattern recognition tasks [8], including image classification, segmentation, and enhancement [9], [10], object detection [11], and so on. The overwhelming efficiency of these techniques triggered the interest of researchers in developing robust seismic interpretation by leveraging the power of DNNs to solve problems associated with seismic surveys understanding, modeling, and interpretation.
This work is another attempt in this direction, from a new perspective. The novelty of the proposed approach resides in solving the problem of multiple seismic event detection using a hybrid DL architecture. Here, we focus on analyzing complex seismic structures involving salt domes and faults, both known to be reliable hydrocarbon indicators. These are very challenging seismic structures, due to the weak and chaotic reflection patterns of salt deposits, the varying geometry and distribution of faults, and the complicated wavefield behavior involved in these structures. The primary objective of this work is to concurrently identify faults and salt domes using an improved deep convolutional encoder-decoder architecture, capable of performing a pixel-based prediction on seismic sections to determine whether a pixel is a fault, salt, or neither. The main challenges encountered in this study are the following: 1) How to deal with the complexity of remote sensing data types such as seismic profile records, which fundamentally differ from natural images and where the signal-to-noise ratio (SNR) is quite low. 2) What is the intuition behind building a task-oriented DL architecture for semantic segmentation of multiple classes of seismic events? 3) How to overcome the problem of an insufficient quantity of labeled samples despite the availability of massive seismic data and large-scale seismic volumes.
The main contributions of this paper can be summarized as follows:
• The accurate detection of multiple seismic structures in a concurrent scenario using an improved DCNN model for semantic segmentation.
• The leverage of transfer learning, using pre-trained models on natural images, onto the context of seismic image analysis and interpretation.
The remainder of the paper is organized as follows. Related works for seismic interpretation of salt and fault structures are presented in Section II. In Section III, we introduce the workflow of the proposed deep encoder-decoder network for relevant seismic features segmentation, where different UNet variants are employed to address concurrent detection of salt domes and faults in real-world seismic data. In Section IV, comprehensive experiments are presented and detection results are obtained to assess the performance of the proposed approach. Both qualitative and quantitative evaluations are reported with a comparison to a recently developed approach dealing with multi-event detection. Lastly, the conclusion is reported in Section V.

II. RELATED WORK
In this section, we present an overview of previous works on seismic interpretation. We propose a two-fold categorization; first, we review handcrafted-based methods; then we give an overview of DL-based methods.

A. HANDCRAFTED FEATURE-BASED METHODS
In the literature on salt dome and fault interpretation, the majority of approaches were applied to 2D seismic sections using well-established feature extraction methods. Basic edge-detection techniques are primitive but still in use for simple seismic interpretation. These techniques can be classified into two families: coherency-based and differencing-based algorithms. Coherency metrics check the similarity/dissimilarity between adjacent seismic traces and can be calculated using different explicit formulations such as cross-correlation, semblance, variance, eigen-structure analysis, and the gradient structural tensor (GST) [12].
In contrast, differencing-based algorithms detect discontinuities through differencing amplitude attributes of adjacent seismic traces using operator-based edge detection techniques, such as Sobel, Robert, Prewitt, and Canny [3]. Wu and Hale [4] were among the first to work on the problem of multiple geologic structures detection and interpretation. Fault, unconformity and horizon surfaces were extracted automatically from a single 3D seismic volume. Seismic attributes estimated from differencing seismic amplitudes, and seismic normal vector fields are then used to compute fault and unconformity likelihoods, respectively.
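To make the differencing family concrete, the sketch below computes a Sobel gradient-magnitude attribute over a 2D amplitude section, where a high response flags a lateral discontinuity. This is an illustrative NumPy toy, not the exact operators used in the cited works:

```python
import numpy as np

def sobel_edge_attribute(section):
    """Differencing-based discontinuity attribute: Sobel gradient
    magnitude over a 2D seismic amplitude section (rows = time/depth,
    columns = traces). Illustrative sketch only."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(section, 1, mode="edge")  # replicate borders
    rows, cols = section.shape
    gx = np.zeros((rows, cols))
    gy = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            win = pad[i:i + 3, j:j + 3]       # 3x3 window around (i, j)
            gx[i, j] = (win * kx).sum()       # horizontal amplitude change
            gy[i, j] = (win * ky).sum()       # vertical amplitude change
    return np.hypot(gx, gy)                   # gradient magnitude
```

On real sections such a simple attribute also responds to noise and steeply dipping reflectors, which is precisely the limitation that motivates the learned features discussed later.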
Unfaulting and flattening processes are conducted for straightforward extraction of horizons. Most of the processing was achieved by solving partial differential equations.
Along with edge detection techniques, we can distinguish other engineered features, such as geometric, texture, and graph-based features. Geometric features are obtained by quantifying geometric variations of seismic reflectors using reflector curvature or flexure. Textural features are rather difficult and challenging to extract, compared to other types of handcrafted features. They are based on statistical analysis, where the spatial distribution of intensity levels in a pixel's vicinity is estimated. The relationships between neighboring pixels are evaluated w.r.t. the corresponding gray levels and spatial arrangement, so as to decide whether or not they form one and the same region of interest. Different texture measures have been proposed: directionality, smoothness, and edge content in [13]; gray-level co-occurrence matrix (GLCM) contrast and homogeneity [14]; gradient of texture (GoT) [15]; codebook-based learning [16]; and seismic saliency [17]. Despite being widely used on remotely sensed data, texture-based techniques suffer from the problem of finding line-like edges when dealing with spatial resolution statistics.
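As an illustration of the GLCM-based measures in [14], the following sketch builds a co-occurrence matrix for a single pixel offset and derives contrast and homogeneity from it (a simplified toy; it assumes amplitudes have already been quantized to a few gray levels):

```python
import numpy as np

def glcm_measures(image, levels=4, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    plus the contrast and homogeneity texture measures derived from it.
    `image` must hold integer gray levels in [0, levels)."""
    m = np.zeros((levels, levels), dtype=np.float64)
    h, w = image.shape
    for i in range(h - dy):
        for j in range(w - dx):
            m[image[i, j], image[i + dy, j + dx]] += 1  # count pair
    m /= m.sum()                                        # joint probabilities
    idx = np.arange(levels)
    diff = idx[:, None] - idx[None, :]                  # level differences
    contrast = (m * diff ** 2).sum()                    # large for sharp edges
    homogeneity = (m / (1.0 + np.abs(diff))).sum()      # large for smooth regions
    return m, contrast, homogeneity
```

A perfectly uniform window yields zero contrast and maximal homogeneity, while windows crossing a salt boundary or fault score high contrast, which is what makes these measures usable as seismic attributes.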
As for graph-based features, they were first applied in [18] to generate mesh representations of seismic images. Inspired by this work, the authors of [19] applied the normalized cuts image segmentation (NCIS) technique and validated the successful use of graph-based representations for seismic interpretation.
Even though designed with expert knowledge, engineered feature extraction methods are unable to fully describe seismic objects of interest or to extract, to the fullest extent, value from complex and noise-contaminated real-world data. Besides, most of these approaches address single-event interpretation tasks, while in a real-world scenario, seismic data are most likely to contain multiple events and varied seismic features.
Unfortunately, the aforementioned methods remain either trapped in the experimental phase and impractical for industrial deployment, or serve as a suite of computer-aided tools to assist seismic interpreters. Moreover, the proliferation of large three-dimensional (3D) seismic surveying technologies with large-scale coverage, relative to basin size, has allowed for capturing a massive amount of high-resolution seismic data. This has revealed the weaknesses of handcrafted-feature-based methods: usually not robust and computationally intensive, these methods struggle to achieve high accuracy when dealing with such large-scale data.

B. DL-BASED METHODS
The DL paradigm brings data-driven technologies to the next level by providing powerful tools that are capable of automatically extracting extremely detailed features from an enormous amount of data. The effectiveness of DL-based methods has been shown in different applications of seismic data analysis. In an interesting investigation by Di et al. [20], a proof-of-principle study is conducted focusing on the factors contributing to the superiority of CNN-based methods over traditional techniques in detecting important seismic structures. In the study, two key strengths are highlighted: the ability to generate a rich suite of feature maps, and the patch-based encoding of seismic reflection patterns to map seismic signals into targeted seismic structures.
In what follows, we give an overview of the main DL approaches related to seismic interpretation while dividing them into two families: CNN-based [21]-[25] and DCNN-based [26]-[30]. Under the first type of approach, the interpretation task is a classification-oriented problem, whereas under the second type it is a segmentation-oriented one. The equivalence between image segmentation and pixel-level classification legitimates the adopted problem-solving direction.

1) CNN-BASED APPROACHES
The work of Waldeland et al. [21] appears to be one of the early attempts at applying CNNs to learn features from seismic data for salt body delineation. A simple CNN architecture is proposed, built using 5 convolutional layers, one fully-connected layer, and a two-node softmax for salt or non-salt pixel classification. Using only one manually labeled inline slice, a set of small cubes, centered around the corresponding pixels, is selected to train the proposed model. Only a qualitative assessment of the salt detection is reported, using an illustration of pixel-wise classification results for a few selected sections of the Netherlands offshore F3 block seismic volume.
Xiong et al. [24] applied a CNN for fault mapping within a 3D seismic volume. A fault probability cube is generated using a CNN model composed of only 2 convolutional layers, 2 fully-connected layers, and a two-node softmax classifier (i.e., fault prediction). Only three seismic slices forming orthogonal cross-sections are used to feed the 3-channel input layer. The fault or non-fault prediction is associated with the cube's central point. Real data from 8 annotated seismic cubes are used to generate the training dataset, with one cube held out for validation. To test the model, both synthetic and real seismic data are used. The fault probability cube generated by the proposed model highlights seismic faults and shows discontinuities more clearly than the traditional coherence cube method [31]. Wu et al. [32] used a CNN-based pixel-wise classification method not only to predict fault probability, but also to simultaneously estimate fault orientations (i.e., dips). To train and validate the proposed CNN model, the authors also developed a well-established workflow to automatically generate synthetic 2D seismic data with corresponding labels. The proposed model outperforms conventional methods when tested on real seismic data. Inspired by the latter work, Zheng et al. [23] used two CNN models for simultaneously predicting fault presence and orientation (i.e., dip and azimuth attributes). They demonstrated that CNN models trained on synthetic data can be used efficiently for fault prediction on field data.

2) DCNN-BASED APPROACHES
Shi et al. [26] considered salt body detection as a semantic image segmentation problem. Inspired by both SegNet [33] and UNet [34], they developed a DL encoder-decoder architecture for end-to-end salt body detection. Zeng et al. [27] applied the state-of-the-art UNet model, along with the residual learning framework ResNet, for salt body identification. Alaudah et al. [28] proposed a deconvolutional network for various seismic interpretation tasks, including salt domes and faults. Di et al. [30] proposed a real-time seismic interpretation approach using a DNN model; the method is capable of accurately identifying several seismic features simultaneously. Karchevskiy et al. [29] reached 27th place in the Kaggle competition for salt identification using a UNet variant whose encoder was fine-tuned from the pre-trained ResNeXt50 model. More recently, Li et al. [35] used the UNet for seismic fault detection and highlighted the efficiency of such a model in achieving good performance without any issues regarding insufficient training data. To perform 3D fault segmentation, Wu et al. [36] proposed FaultSeg3D, a simplified version of UNet, where a set of 15 convolutional layers is used instead of the original 23 of UNet, along with a reduced number of feature channels per layer. Although trained on synthetic data, the FaultSeg3D model showed high efficiency in recognizing faults in several seismic data volumes acquired at different surveys. A thorough comparison against several conventional methods is reported, in terms of both qualitative illustrations and quantitative measurements, to demonstrate the superiority of FaultSeg3D in achieving state-of-the-art results.
A very limited number of works have addressed the multiple seismic structure detection problem. To handle this particularly challenging task, the proposed approaches either tackle the issue from an image processing perspective using complex engineered features [4], [5], or employ a simple deconvolutional network architecture with poor performance appraisal when it comes to detection results [28].
Since UNet was introduced by Ronneberger et al. [34] for medical image segmentation, it has become the go-to architecture for segmentation tasks due to its simplicity and its success in tackling diverse segmentation problems, which led us to select it over other DL-based semantic segmentation approaches for our problem. In addition, the simple design of the UNet allows for more customization flexibility, which we need to develop our own workflow; most importantly, the UNet is suitable for small training datasets. In this work, we employ the UNet model for concurrent detection of salt domes and faults in real seismic data. Since the UNet is an encoder-decoder network, we also exploit transfer learning using two different encoders (VGG19 [37] and ResNet34 [38]) that have been trained on a substantial database of natural images. The use of pre-trained encoders improved the detection accuracy of our DL network and led to excellent accuracy despite the small number of labeled seismic images at our disposal. Moreover, we show the benefits of transfer learning by comparing pre-trained networks with their counterparts trained from scratch.

III. METHODOLOGY
Building a successful DL workflow requires the availability of an adequate amount of labeled data so that the network can learn the features relevant to the problem at hand. The availability of diverse DL frameworks and libraries facilitates the use of off-the-shelf CNN architectures as well as the selection of best practices in the field. However, the bottleneck for a high-performing model is, most of the time, the lack of labeled data. For seismic applications, there are several publicly available datasets, such as the Netherlands F3 dataset [39] and the SEAM Phase I dataset [40]. Nonetheless, the seismic interpretation field is facing a shortage of labeled data, since labeling requires experts' effort, time, and knowledge. Researchers have mainly dealt with this problem in 4 different ways: (1) acquiring a few manually labeled sections, (2) labeling data using conventional image processing techniques, (3) synthesizing data whose labels can be derived automatically, or (4) using a small set of labeled data to train a weakly supervised learning approach for predicting labels of a larger pool of data.
In this experiment, we choose to use a small set of manually labeled sections from the F3 block for the problem of concurrent detection of salt domes and faults in seismic data. Figure 1 shows three seismic images from the training data (top row) and their corresponding event labeling (bottom row). From left to right the seismic samples illustrate salt dome, fault, and multiple faults seismic events, respectively.

A. PROPOSED ARCHITECTURE
We propose to explore two UNet variants in order to find out which architecture achieves higher detection accuracy. The two networks have similar decoders, each containing convolution layers, up-sampling layers, and concatenation layers. The concatenation layers concatenate the output of the up-sampling layers with the feature maps passed from the encoder, along the feature map axis. Each convolution layer is followed by a batch normalization (BN) layer and a rectified linear unit (ReLU) activation layer. The difference between the two networks resides in the encoder. The first DL network employs a VGG19 network as encoder, where successive convolution layers and max-pooling layers are used. The number of filters is doubled after each max-pooling layer, and this process is repeated five times. On the other hand, the second network's encoder is built using ResNet34, where identity mapping is used to improve the backward flow of the error gradient during the training of very deep networks and to avoid the vanishing gradient problem [41]. In the ResNet network, a residual block (Figure 2) is used, where the input to the block is added to the output after two convolution layers and passed to the next stage. However, when the input and the output have a different number of feature maps, the input is passed through a 1 × 1 convolution layer to match the number of feature maps to that of the output. The ResNet34 is therefore built by stacking residual blocks. Figure 2 shows Residual Blocks 1 and 2, where the first block is used when there is a mismatch between the number of feature maps of the input and output, and the second block is used when there is no such mismatch [38]. Figures 3 and 4 show the proposed architectures, namely UNet-VGG19 and UNet-ResNet34, respectively, which we built for simultaneous salt dome and fault segmentation.
The last layer in both networks is a convolution layer with three filters followed by a softmax layer, where the three channels in the output represent the background, fault, and salt classes, respectively. The fusion of the UNet with either VGG19 or ResNet34 is achieved by substituting the UNet encoder with the VGG19 or ResNet34 network, respectively. In both CNNs, the fully-connected layers, which are normally used for classification tasks and are not relevant to semantic segmentation, are removed.
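The two residual block types of Figure 2 can be sketched in PyTorch as follows. This is our reconstruction from the description above and [38], not the authors' implementation; the class name and layer ordering are assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 conv+BN stages with a skip
    connection. When channel counts (or stride) differ between input
    and output, the skip path uses a 1x1 projection (Block 1);
    otherwise it is a plain identity mapping (Block 2)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if in_ch != out_ch or stride != 1:
            # Block 1: mismatch -> project the identity with a 1x1 conv
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            # Block 2: shapes match -> pure identity mapping
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))
```

Here `ResidualBlock(64, 64)` plays the role of Block 2 (identity skip), while `ResidualBlock(64, 128, stride=2)` plays the role of Block 1 with the 1 × 1 projection.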

B. NETWORK TRAINING
The UNet-VGG19 and UNet-ResNet34 networks were trained on seismic data from the Netherlands F3 block using two methods. The first training method involves random initialization of the weights of the neural networks, commonly referred to as training from scratch. The second training method uses the weights of pre-trained encoders as initialization parameters; a fine-tuning process is then applied to adjust the weights for our specific task. The pre-trained encoders we used are VGG19 and ResNet34, which were trained on natural images from the ImageNet dataset [42]. With off-the-shelf pre-trained deep neural network architectures, we can reuse the attributes learned from a huge amount of data, such as the ImageNet dataset, and avoid the need for a large amount of labeled data in our seismic interpretation task.
Ideally, we would want to train deep neural networks with thousands of labeled images. However, we were limited to a small number of labeled seismic images in which both salt domes and faults coexist. The seismic dataset we used contains 61 labeled salt dome images and 43 labeled fault images. Out of the 61 salt images, 49 are used for training and 12 for validation. Similarly, out of the 43 fault images, 35 are used for training and 8 for validation. Hence, 80% of the salt dome and fault images are used for training and the remaining 20% are held out for validation. The images are rectangular and have different sizes, but neural networks require a fixed input size. Following common practice in the field, we chose to set the size of the images to a power of 2 and to a square shape; otherwise, we would have had to change the implementation of the networks, which would make it difficult to use transfer learning. Thus, the input to the network is chosen to be 128 × 128, obtained by randomly cropping the input image. Also, the input image pixel values are normalized and the labels are one-hot encoded.
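The input preparation described above — random 128 × 128 cropping, pixel normalization, and one-hot label encoding — can be sketched as follows (the function name and the min-max normalization choice are our assumptions):

```python
import numpy as np

def prepare_sample(image, labels, size=128, rng=None):
    """Random-crop a seismic section and its label map to size x size,
    normalize amplitudes, and one-hot encode the 3 classes
    (0=background, 1=fault, 2=salt). Illustrative sketch."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    crop = image[top:top + size, left:left + size].astype(np.float32)
    lab = labels[top:top + size, left:left + size]
    # min-max normalize amplitudes to [0, 1] (one possible choice)
    crop = (crop - crop.min()) / (crop.max() - crop.min() + 1e-8)
    one_hot = np.eye(3, dtype=np.float32)[lab]  # shape (size, size, 3)
    return crop, one_hot
```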
While carrying out experiments, we noticed that faults are mainly thin, line-like structures, occupying only one or two pixels in width. The fault class therefore makes the data highly imbalanced, and the network would be biased toward predicting all samples (pixels) as non-fault. We remedy this class imbalance by manually thickening the faults in the ground truth and by using the balanced cross-entropy loss function. The balanced cross-entropy loss [43] is given by

L = -(1/N) Σ_{i=1}^{N} [ β y_i log(p_i) + (1 − β)(1 − y_i) log(1 − p_i) ],

where y_i is the ground truth label of pixel i, p_i is the prediction probability, N is the number of samples in the image, β = (1/N) Σ_{i=1}^{N} (1 − y_i) is the ratio of non-fault samples, and (1 − β) is the ratio of fault samples.
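A NumPy sketch of this loss for the binary fault/non-fault case (the function name is ours; during training the same weighting is applied to the network's predicted probabilities):

```python
import numpy as np

def balanced_bce(y, p, eps=1e-7):
    """Balanced binary cross-entropy. beta, the fraction of non-fault
    pixels, weights the (rare) fault term, so missing a fault pixel
    costs far more than missing one of the abundant non-fault pixels."""
    y = y.astype(np.float64).ravel()
    p = np.clip(p.astype(np.float64).ravel(), eps, 1 - eps)
    beta = (1.0 - y).mean()  # ratio of non-fault samples
    loss = -(beta * y * np.log(p)
             + (1 - beta) * (1 - y) * np.log(1 - p))
    return loss.mean()
```

As a sanity check, with 99% non-fault pixels (β = 0.99), mispredicting the single fault pixel is weighted 99 times more heavily than mispredicting one background pixel.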

IV. EXPERIMENTAL RESULTS
First, we pre-process each input image from the North Sea F3 block by normalizing the image and then randomly cropping it to a 128 × 128 image size. After that, we train 4 DCNN architectures on these samples. Overall, we have 4 training scenarios: 2 UNet networks trained from scratch, where the first has a VGG19 encoder and the second has a ResNet34 encoder, and 2 similar networks whose encoders are, this time, pre-trained on ImageNet. Each network is trained for 100 epochs with a learning rate initialized at 10^−4 and decayed by a factor of 0.5 whenever the validation accuracy does not improve for 5 successive epochs.
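The stated learning-rate policy can be expressed as a small plain-Python sketch (an illustration of the schedule only, not our training code; in practice a framework reduce-on-plateau scheduler implements the same logic):

```python
def decayed_lr(val_acc_history, base_lr=1e-4, factor=0.5, patience=5):
    """Return the learning rate after replaying a validation-accuracy
    history under the schedule: start at base_lr and multiply by
    `factor` whenever accuracy fails to improve for `patience`
    successive epochs."""
    lr = base_lr
    best = float("-inf")
    stall = 0
    for acc in val_acc_history:
        if acc > best:
            best = acc       # improvement: reset the stall counter
            stall = 0
        else:
            stall += 1
            if stall >= patience:
                lr *= factor  # plateau reached: decay and restart count
                stall = 0
    return lr
```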
Unsurprisingly, the pre-trained networks show noticeably better accuracy than the ones trained from scratch. Table 1 summarizes the accuracy rates, comparing the performance of the pre-trained networks against the non-pre-trained ones. For further assessment of the models' performance, we use the most common evaluation metrics utilized to quantify the accuracy of classification/segmentation models, namely Precision, Recall, F1-score, and IoU. Table 2 summarizes the evaluation metric values obtained using the deployed deep networks. Their performance is also compared with the results obtained from the basic UNet model, serving here as a baseline. According to these metrics, the four networks reach high accuracy in detecting the background and salt samples. Nonetheless, it is worth noting that the applied deep models still struggle, to a certain extent, with detecting the fault event. The pre-trained networks, however, show a large improvement in fault detection. This performance improvement is clearly gained from incorporating pre-trained CNN models into our proposed deep model, which enhanced our framework with better generalization ability. The incorporation of pre-trained CNN models on the encoder side of the proposed architecture yields a deep learning model capable of retrieving low-level (i.e., primitive) features, such as curves, line segments, and edges, learned from a substantial database of natural images. The baseline UNet model also shows good performance and achieves the highest precision rate in detecting faults, but it misses most of the fault structures, as revealed by the low rate of the corresponding recall measure.
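For reference, the per-class metrics reported in Table 2 can be computed from predicted and ground-truth label maps as follows (a straightforward sketch; `cls` is one of the background/fault/salt labels):

```python
import numpy as np

def class_metrics(pred, gt, cls):
    """Per-class Precision, Recall, F1-score, and IoU computed from
    pixel label maps, using true/false positive and false negative
    counts for the given class."""
    p = (pred == cls)
    g = (gt == cls)
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou
```

Note how the baseline's behavior shows up in these formulas: a model that predicts very few fault pixels can keep false positives (and hence precision) high while false negatives drag recall down.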
Furthermore, a qualitative evaluation of the proposed framework is provided through visualizations of seismic event class predictions for all proposed networks. Figures 5 and 6 show typical salt dome and fault images, respectively, with class prediction results, the manual ground truth, and the superposition of both (labeled as overlaid). We can clearly observe that the 4 networks achieve accurate detection of salt dome events across all sample images. As for faults, the two networks trained from scratch either missed the fault events or detected them partially. Similarly, the baseline UNet shows accurate detection of salt domes, but fails to detect faults, as depicted in Figure 7. On the other hand, the two pre-trained networks benefited from being trained on ImageNet, even though it contains natural images, and both were able to detect most or all fault samples accurately, as shown with the 2 sample fault images in Figure 6. The improved performance of the pre-trained networks suggests that attributes learned from natural images can be transferred to the seismic domain and used to obtain high detection accuracy with small labeled datasets.
Note that our semantic segmentation model is fed with 128 × 128 sub-images generated from the original seismic section using random cropping. This results in small geometric (position-wise) variations in the images used for comparison in Figure 6. However, these small variations due to random cropping do not invalidate the comparison between the networks.
In Figure 8, we display some failure cases of seismic event detection. The figure shows zoomed regions where the pre-trained ResNet34 failed to accurately detect faults or salt boundaries. We can see that, for the salt sample, the network over-thickened the salt boundary. On the other hand, for the fault samples, the network struggled to delineate the upper and lower bounds of the fault endpoints, and failed to label these extreme parts as faults. We speculate that a possible cause of such failures is the random cropping strategy we adopted for preparing the input data to our deep framework. The lack of large amounts of labeled data with more variance in geometry and subsurface conditions may be another reason for such missed detections. With more labeled sample images, and as we showed in Tables 1 and 2, the proposed architecture is able to learn effectively from the data and extract the most useful and relevant features to accurately delineate fault and salt boundaries.

A. TESTS ON THE LANDMASS DATASET
The UNet-VGG19 and UNet-ResNet34 networks are very large, with 29 million parameters for the VGG19 version and 24.5 million parameters for the ResNet34 version. Conversely, the training dataset is relatively small (only 84 images). Therefore, we decided to carry out further testing to ensure that the models are not overfitting and are learning attributes that are useful for seismic data. Moreover, since the networks trained from scratch did not perform well on faults, in what follows, we focus on the test results obtained using the pre-trained networks. We selected the LANDMASS dataset [44], which contains different types of seismic events including salt domes and faults, to perform our tests. To evaluate the networks' prediction performance, we overlay the prediction on the original seismic image for visual inspection. In Figure 9, we show salt dome detection results for three images, with the prediction of the UNet-VGG19 network on the left and that of the UNet-ResNet34 on the right; Figure 10 does the same for three fault images. In the case of the salt examples, both pre-trained networks were able to detect salt domes and delineate salt boundaries accurately, but, in some cases, the UNet-VGG19 cannot distinguish between salt boundaries and faults. Also, the fault detection is accurate, but the UNet-VGG19 network is more sensitive to discontinuities, in the sense that it labels as faults some discontinuities that are not faults. Both seismic and natural images contain primitive features (i.e., line segments, edges, corners, etc.) that can be learned as low-level features by DL architectures. VGG and ResNet were trained to detect low-level features from substantial natural image databases. Similar low-level features exist in seismic images; hence, transfer learning with these networks exploits what has been learned from natural images to help with the detection of seismic events.

B. COMPARISON WITH PREVIOUS WORKS
It is worth noting that very few related works provide quantitative evaluation of their seismic interpretation models. Most approaches do not apply common evaluation metrics, other than accuracy rates, for performance assessment; experimental outcomes are usually limited to qualitative evaluation through a few illustrations of the interpretation results on test data. Moreover, some approaches generated synthetic data to train their models and reported high performance through very few illustrations of interpretation results on field data. This is mainly due to the lack of labeled data. Indeed, since metric formulas involve ratios of prediction and ground-truth samples, a sufficiently large set of labeled data is needed, which is not easily accessible when it comes to seismic data. Table 3 summarizes the comparison with several other seismic interpretation approaches. As outlined in the table, the comparison takes into consideration the approach category (handcrafted or DL-based), the seismic event being interpreted, the dataset type used for training (synthetic or real-world), and the evaluation metrics along with their corresponding maximum values. As mentioned before, most proposed approaches tackle one particular seismic event using different state-of-the-art DL models. Only two approaches consider more than one seismic event: the handcrafted-based work in [45] and the work in [46]. The study in [45] reported standard metrics such as IoU and AUC, along with additional ones such as pixel accuracy (PA), mean intersection over union (MIU), and frequency-weighted intersection over union (FWIU). In the case of [46], only a qualitative evaluation is carried out, through visualization of segmentation results on field test data.
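For reference, the segmentation metrics reported in [45] (pixel accuracy, mean IoU, and frequency-weighted IoU) can all be derived from a single confusion matrix. The following is a standard formulation of these metrics, not code from the cited work:

```python
import numpy as np

def segmentation_metrics(pred, gt, n_classes):
    """Pixel accuracy (PA), mean IoU (MIU), and frequency-weighted
    IoU (FWIU) from flattened integer class-label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1                            # rows: GT, cols: prediction
    tp = np.diag(cm).astype(float)
    gt_count = cm.sum(axis=1).astype(float)      # pixels per GT class
    union = gt_count + cm.sum(axis=0) - tp       # |GT u Pred| per class
    pa = tp.sum() / cm.sum()
    iou = np.where(union > 0, tp / np.maximum(union, 1.0), 0.0)
    miu = iou[gt_count > 0].mean()               # average over present classes
    fwiu = (gt_count / cm.sum() * iou).sum()     # weight IoU by class frequency
    return pa, miu, fwiu

# Tiny worked example with 3 classes and one misclassified pixel.
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 0, 2, 2])
pa, miu, fwiu = segmentation_metrics(pred, gt, n_classes=3)
```

FWIU reduces to MIU when all classes occupy the same number of pixels, as in the example above; on real seismic sections, where background dominates, the two can differ substantially.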
For salt domes, the authors in [26] trained a SegNet network on salt dome segmentation using 8 crossline sections and obtained 98.77% accuracy on the training dataset, with no quantitative evaluation on the test dataset. The authors in [27] proposed a U-Net+ResNet DL model for salt dome segmentation; they used 2 inline sections for training and 1 section for testing, with no quantitative evaluation of the network's performance. The approach in [47] for salt dome segmentation trained a basic U-Net on 10 crossline slices and, again, provided no quantitative evaluation. As the authors who used SegNet pointed out, SegNet is prone to checkerboard artifacts, which makes it unsuitable for learning small features such as the thin line-segments of faults. Visual comparison of our salt dome segmentation against the three aforementioned approaches shows that we achieve better segmentation results, with salt boundaries accurately traced over the whole image.
For fault detection, the authors in [49] generated synthetic data to train CNNs, where each image contains straight-line faults. They reported results for one DL model that gave the best results on synthetic data, and another that gave the best visual results on real-world data. The results for the second CNN on synthetic data are: Accuracy = 0.94, Sensitivity = 0.69, Specificity = 0.99, F1-score = 0.80, and AUC = 0.96. Even though our pre-trained networks have a lower F1-score (0.7712 with the VGG19 network and 0.6817 with the ResNet34 network), our results were obtained on challenging field data. Moreover, using a patch-classification CNN introduces redundancy, since the network classifies only one pixel in each run, whereas U-Net classifies all pixels at once in a single run. The authors in [35] used a small set of real seismic data to train a U-Net model for fault detection; their best result achieves IoU = 0.500, after a post-processing stage. In contrast, our pre-trained networks reach higher IoU: 0.6588 with VGG19 and 0.5419 with the ResNet34 network.
Lastly, we compare our proposed method with the multiresolution approach developed in [45]. Four multiresolution techniques based on texture attributes were used to label seismic structures from the Netherlands F3 block, namely the Gaussian pyramid, the Discrete Wavelet Transform, Gabor filters, and the Curvelet Transform. In [45], the Curvelet Transform provided the best results, with a detection accuracy of 0.7955, IoU = 0.2656 for faults, and IoU = 0.5261 for salt domes, using only four inline images from the F3 block for validation. Our results are significantly better, with IoU = 0.6588 compared to 0.2656 for faults, and IoU = 0.9776 compared to 0.7953 for salt domes, using the U-Net-VGG19 pre-trained network. We should note, however, that the authors considered four seismic event classes (chaotic, fault, salt, and other), whereas we considered only three (fault, salt, and background). Figure 11 shows the predictions of our proposed DL models on the same 4 inlines from the Netherlands F3 block that were used in [45] (only the bottom section is displayed). Since our network accepts inputs of size 128 × 128, we divided each image into patches of size 128 × 128 and passed them to the network one by one. The network was able to detect the salt dome with high accuracy and also detect most of the fault structures. Our fine-tuned neural networks were able to learn features specific to faults and others specific to salt boundaries with hardly any confusion.
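The patch-wise inference just described can be sketched as below. The stitching scheme (non-overlapping tiles with zero-padding at the borders) is our assumption, since the paper does not detail how partial border patches are handled; `predict_fn` stands in for the trained network.

```python
import numpy as np

def predict_by_patches(image, predict_fn, patch=128):
    """Tile an arbitrary-size section into non-overlapping patch x patch
    windows, run the classifier on each, and stitch the label map back.
    Borders are zero-padded up to a multiple of the patch size."""
    h, w = image.shape
    H = -(-h // patch) * patch    # round up to a multiple of patch
    W = -(-w // patch) * patch
    padded = np.zeros((H, W), dtype=image.dtype)
    padded[:h, :w] = image
    out = np.zeros((H, W), dtype=np.int64)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            out[i:i+patch, j:j+patch] = predict_fn(padded[i:i+patch, j:j+patch])
    return out[:h, :w]             # crop padding away

# Stand-in "network": threshold amplitudes to mimic a per-pixel classifier.
dummy_net = lambda tile: (tile > 0).astype(np.int64)
section = np.random.randn(300, 450)
labels = predict_by_patches(section, dummy_net)
```

Because the tiles do not overlap, each pixel is classified exactly once; an overlapping scheme with averaging would reduce seam artifacts at patch borders at the cost of extra forward passes.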
Finally, we note that our proposed framework is well suited for interfacing with user-friendly GUIs to assist interpreters in visualizing different types of events, either simultaneously or separately. The interpretation results can also be translated into saliency maps or likelihood maps (i.e. probability maps), with a meaningful colormap encoding the various seismic events for enhanced visualization. The work discussed here fits well with the efforts made in industry to optimize oil and gas exploration processes.
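The likelihood maps mentioned above correspond to the per-pixel class probabilities that a softmax head already produces. A minimal sketch follows; the three-class layout matches this paper, while the channel ordering is an assumption:

```python
import numpy as np

def likelihood_maps(logits):
    """Per-pixel class probabilities from raw network outputs.
    logits: (n_classes, H, W) array -> probabilities of the same shape."""
    z = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Example: assumed channel order (background, salt, fault).
logits = np.random.randn(3, 128, 128)
probs = likelihood_maps(logits)
```

Each channel of `probs` can then be rendered with its own colormap, giving the interpreter a per-event confidence view rather than a hard label map.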
Over recent years, we have witnessed major partnerships between companies from the oil and gas industry and advanced IT companies joining efforts to develop intelligent systems for enhancing productivity, such as TOTAL (France) with Google Cloud, or ExxonMobil with MIT. Such partnerships turned to powerful AI/ML (Artificial Intelligence / Machine Learning) tools to make the work of seismic volume interpreters more efficient. Among the IT companies focusing on dedicated tools and systems for 2D and 3D interpretation tasks, we mention Eliis International, PaleoScan, and GVERSE. These companies have developed advanced software packages for seismic interpretation, with some of their algorithms using diverse Deep Learning networks.
We should, however, be cautious when deploying such machine learning models, as they can be sensitive to the data distribution. For example, if the distribution (i.e. histogram) of the training data (e.g. the F3 block) differs from that of another dataset (e.g. the TGS data), as shown in Figure 12, performance can be significantly degraded. Figure 13 shows the predictions of UNet-ResNet34 on 3 samples from the TGS dataset along with their corresponding ground truth. The first 2 rows show accurate detection compared to the ground truth, although the fault channel is activated over a very small portion at the salt boundary. The last row shows an example of a partial failure, where the network hardly detects the salt boundary at the bottom left of the image, and a fault channel is activated instead (in green).
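The distribution mismatch illustrated in Figure 12 can be quantified directly from the amplitude histograms. The sketch below uses the Jensen-Shannon divergence; the bin count and the choice of divergence are our own, not from the paper, and the two datasets are simulated stand-ins.

```python
import numpy as np

def histogram_divergence(a, b, bins=64):
    """Jensen-Shannon divergence between the amplitude histograms of two
    datasets; 0 means identical distributions, log(2) is the maximum."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum() + 1e-12        # normalize; epsilon avoids log(0)
    q = q / q.sum() + 1e-12
    m = 0.5 * (p + q)
    kl = lambda x, y: np.sum(x * np.log(x / y))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
f3_like = rng.normal(0.0, 1.0, 10_000)    # stand-in for F3 amplitudes
tgs_like = rng.normal(0.5, 2.0, 10_000)   # stand-in for TGS amplitudes
d_same = histogram_divergence(f3_like, f3_like)
d_diff = histogram_divergence(f3_like, tgs_like)
```

A large divergence between a training survey and a target survey is a warning that histogram matching or fine-tuning on the target data may be needed before trusting the predictions.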

V. CONCLUSION
Concurrent detection of various events from seismic surveys, while extremely important, is very challenging. In this paper, we introduced a novel semantic segmentation workflow for the simultaneous detection of salt domes and faults, using an improved UNet deep network. To further enhance the UNet performance, we exploited transfer learning from two different encoders, namely VGG19 and ResNet34. The encoders are first trained on natural images (ImageNet) before the fused UNet is applied to seismic surveys. We showed that the transfer learning paradigm alleviates the everlasting scarcity problem of labeled seismic data and is particularly useful for fault identification, given the limited availability of training data. The knowledge learned from natural images (edges, corners, intensities, etc.) proved very useful for identifying subsurface structures solely from the seismic amplitude attributes.
Using transfer learning, high delineation accuracy is obtained with reduced execution time and a small amount of labeled training data. Moreover, with transfer learning we developed a robust model that is not affected by the similarity between different types of discontinuities in noisy seismic data, an ability further improved by exploiting the skip-connection strengths of the ResNet model. Comprehensive experiments were conducted through validation and testing on real-world seismic data from the publicly available Netherlands offshore F3 block, LANDMASS, and TGS datasets. Both qualitative and quantitative evaluations confirmed the superior performance achieved by our developed DL workflow under the challenging scenario of multiple event detection in subsurface surveys.

MOHAMED DERICHE (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the National Polytechnic School, Algeria, in 1986, and the Ph.D. degree in signal processing from the University of Minnesota, in 1994. He worked with the Queensland University of Technology, Australia, before joining King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, where he leads the Signal Processing Group. He has published more than 300 articles in multimedia signal and image processing. He delivered numerous invited talks and chaired several conferences, including GlobalSIP-MPSP, IEEE Gulf (GCC), Image Processing Tools and Applications, and TENCON (a Region 10 conference). He has supervised more than 40 M.Sc. and Ph.D. students. His current research interests include different aspects of multimedia signal and image processing, seismic applications, biomedical signal processing, and diverse applications of machine learning.

AHMED MAALEJ received the Ph.D. degree in computer science from the University of Lille I, France. He is currently an Assistant Professor with the University of Kairouan, Tunisia. He is also with the Laboratory of Advanced Technology and Intelligent Systems (LATIS), National Engineering School of Sousse (ENISo), Tunisia. His research interests include pattern recognition, shape analysis, 3D image processing, and deep learning.