Seismic Fault Identification Based on Multi-Scale Dense Convolution and Improved Long Short-Term Memory Network

Aiming at the local, global and temporal morphological characteristics of faults in seismic profiles, this paper proposes the MCD-ABiLSTM method for fault identification. The method uses multiple channels, convolution kernels of different sizes and convolutions of different depths to extract multi-scale seismic profile features, and makes full use of the extracted features to enhance the model's sensitivity to small faults. By combining Bi-directional Long Short-Term Memory (BiLSTM) with multi-scale dense convolution (MCD), the spatial and temporal characteristics of seismic signals are extracted successively, so that the seismic attribute features can be better represented in space and time. To address the extreme imbalance between label data and background data, an improved weighted cross entropy is adopted as the loss function of MCD-ABiLSTM. Then, the label data is expanded to establish a fault label data set suitable for deep learning, which further alleviates the imbalance between fault label data and background data. The comparison results show that, compared with FCN, U-Net, U-Net++ and Deeplab V3, the method proposed in this paper improves the precision by 6.27%, 3.65%, 2.94% and 4%, respectively.


I. INTRODUCTION
Deep learning, as a pivotal core technology within machine learning, finds extensive applications in interdisciplinary domains such as natural language processing [1], computer vision [2], and other cross-disciplinary fields [3], [4]. In seismic data fault identification, deep learning methods optimize the network by extracting the low-dimensional and high-dimensional features of faults hidden in the formation. These characteristics include low continuity and high discontinuity of the coherent axes of seismic reflections [5].

Furthermore, deep learning inherently necessitates copious data for model training and demonstrates rapid and efficient data processing capabilities [6], [7]. Seismic exploration, in particular, aligns well with these prerequisites, and the exploration and development of oil and gas fields urgently require the robust feature extraction capabilities inherent in deep learning. For these reasons, numerous researchers have introduced a variety of deep learning theories and model methods into the realm of fault recognition. (The associate editor coordinating the review of this manuscript and approving it for publication was Bing Li.)
Representative models include Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), etc. [8], [9]. Aiming at the problem that manual identification of faults is time-consuming and tedious, Zhu et al. adopted the Efficient Spatial Pyramid Network (ESPNet) to obtain the context information of seismic data, which improved the continuity of the results and yielded a more accurate fault identification effect [10]. In 2022, Wei et al. adopted transfer learning to alleviate the imbalance between fault and background data; on this basis, a convolutional neural network was used to conduct fault identification experiments on synthetic and real data. The experiments show that the new method performs well in detecting faults in seismic data [11]. In 2022, Orchere et al. adopted a Deep Convolutional Neural Network (DCNN) to detect faults in seismic profiles, addressing the problems of scarce fault-labeled data and poor seismic imaging; applied to the Groningen oilfield, the DCNN was markedly more effective than conventional methods [12]. Zou et al. used a convolutional neural network to realize seismic fault identification with 8 attributes of seismic data as input. Comparative experiments found that the faults predicted by the convolutional neural network agree well with manual interpretation, with model accuracy above 85%, which proves the feasibility of this method and provides a new way to shorten the fault interpretation period and improve the interpretation effect [13].

B. A SEMANTIC SEGMENTATION-BASED FAULT IDENTIFICATION METHOD
This type of method is mainly based on the auto-encoder framework: the encoder extracts the low-dimensional features of the fault, and the decoder restores the extracted features to the original dimension [14], [15]. A representative method is the U-shaped network (U-Net) [16], [17]. In 2022, Lin et al. used U-Net to obtain excellent fault identification results for the problems of artificial fault picking and poor seismic-attribute detection in complex seismic profiles [18]. In 2019, Li et al. used an encoding-decoding structure combined with a convolutional neural network to predict the pixels of the seismic profile image one by one, determining whether each pixel is a fault; this method achieves good identification results on real data [19]. U-Net can not only be used directly for automatic fault identification, but can also serve as a network framework with superior performance [20], [21]. The specific idea is to design extraction modules according to the actual fault characteristics and build the corresponding model with U-Net as the framework to improve the fault identification effect. In 2022, Gao et al. introduced a Multi-scale Attention Convolutional Neural Network (MACN) into the encoding-decoding structure of U-Net, which merged and refined the different spatial features of faults. The network was tested with synthetic and field data, and the results showed that it was better than conventional convolutional neural network methods at identifying faults in complex seismic profiles [22].
Deep learning methods and models have realized end-to-end seismic profile fault identification, showing the research significance and application prospects of deep learning in fault identification, and have achieved fruitful results [23], [24]. However, the following problems remain. First, most networks use convolution operations to extract features. The convolution operation can effectively extract the local spatial features of the seismic profile, but it neglects global and time-series features. For example, in 2019, when Wu et al. used an FCN for fault identification, the precision on the test set was as high as 95%, but the practical application effect was not ideal [25]. Second, noise has a great impact on the effect of fault identification, especially for deep learning algorithms: the sharpness of the reflection events in the seismic profile affects the extraction of fault features and reduces the overall identification effect [26], [27]. Finally, most researchers use synthetic fault sample data. Although in most cases artificial creation of fault sample data is more efficient than collecting real data, when faced with complex seismic profiles, models trained on synthetic data are unstable [28], [29].
In response to the above problems, we use the label inflation method to construct a fault label dataset suitable for deep learning based on the results of professional identification. Aiming at the spatiotemporal features of seismic profiles, the MCD-ABiLSTM fault feature extraction network is constructed. MCD-ABiLSTM fully extracts the local, global and time-series features of seismic profiles, and solves the problems of low identification precision and insensitivity to small and hidden faults caused by the incomplete feature extraction of conventional deep learning networks.

II. FAULT IDENTIFICATION PROCESS BASED ON DEEP LEARNING AND PRINCIPLE OF MCD-ABILSTM
A. OVERVIEW OF DEEP LEARNING FAULT IDENTIFICATION PROCESS
In seismic exploration, the collected raw seismic data are processed and displayed as seismic amplitude images, which are stored digitally in the computer. Faults in the seismic profile are identified pixel by pixel. As shown in Figure 1, when a fault is marked on a seismic profile, the fault pixels are represented by 1 and the non-fault area by 0. Deep learning algorithms distinguish faults from non-faults by these different values.
Fault mask labels and seismic profiles constitute the training dataset for deep learning fault identification. According to the characteristics of seismic profiles and the specific needs of fault identification, researchers design and train corresponding models. In the training process, specific operation modules extract corresponding features, such as convolution operations for local spatial features and recurrent neural network modules for time-series features. In addition, researchers design a loss function according to the characteristics of the data and the actual needs, and iterate repeatedly until the model is optimized. During testing, the seismic amplitude images are input into the trained model. The model performs a probabilistic calculation on each pixel of the seismic amplitude image and outputs the probability that each pixel is a fault or non-fault. Finally, a threshold is set to determine whether each pixel is a fault, realizing automatic identification of seismic faults, as shown in Figure 2.
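As a small illustration of this final thresholding step (the function name and the 0.5 threshold here are our own choices, not prescribed by the paper):

```python
import numpy as np

def faults_from_probabilities(prob_map, threshold=0.5):
    """Binarize a per-pixel fault probability map: 1 = fault, 0 = non-fault."""
    return (prob_map >= threshold).astype(np.uint8)

# Toy 3x3 "model output": each entry is the probability that the pixel is a fault.
prob = np.array([[0.1, 0.8, 0.2],
                 [0.7, 0.9, 0.1],
                 [0.2, 0.6, 0.3]])
mask = faults_from_probabilities(prob)
# pixels with probability >= 0.5 are marked as faults
```

Lowering the threshold trades precision for recall; the paper does not state which threshold it uses.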
The above process shows that the effect of deep-learning-based fault identification mainly depends on the quality of the seismic data sets, their ability to characterize faults, and the model's ability to extract fault features. Next, we introduce the basic structure and principle of the multi-scale dense connection and of the bidirectional long short-term memory network based on the encoder-decoder structure, which are the main components used in this paper. The multi-scale dense connection is based on the convolutional neural network, while the bidirectional long short-term memory network is based on the recurrent neural network and is combined with an attention mechanism to strengthen the extraction of the global features of faults.

B. STRUCTURE OF MCD-ABILSTM
1) MULTISCALE CONVOLUTION
In CNNs, different convolution kernels have different receptive fields, and the extracted data features are also different [30], [31]. A large-scale convolution kernel is suitable for extracting the global information of the data, and a small-scale convolution kernel is suitable for extracting its local information. Convolution kernels of different scales are therefore used to extract features of the seismic profile at different scales, so that the extracted features are more comprehensive. The multiscale convolutional (MC) structure is shown in Figure 3.
The seismic profile data X is used as the input of the model, $X \in R^{N \times M}$, where M is the number of seismic traces in the profile and N is the number of sampling points of each trace. The multi-scale convolutional structure contains multiple branches, using 1 × 1, 3 × 3 and 5 × 5 convolutions, respectively. Usually the 5 × 5 convolution is replaced by two 3 × 3 convolutions to reduce the number of operations. The j-th layer convolution output of the i-th channel is:

$X_j^i = f\left(W_{3\times3}^i \otimes X_{j-1}^i \oplus b_{3\times3}^i\right)$

where $X_j^i$ is the j-th layer convolution feature vector output of the i-th channel; $W_{3\times3}^i$ is the weight coefficient matrix of the convolution operation, whose superscript indicates the channel and whose subscript indicates the size of the convolution kernel; $b_{3\times3}^i$ is the bias matrix of the j-th convolutional layer, with the superscript indicating the channel and the subscript the size of the bias matrix; ⊗ is the convolution operation; and f is the activation function.
After each convolution operation, the ReLU activation and batch normalization are used to increase nonlinearity, prevent overfitting, and reduce the amount of computation. Then, feature splicing is performed on the results of the three branch convolutions, and a 1 × 1 convolution is used to compress the number of channels and reduce the amount of network computation. Finally, the input features after a 1 × 1 convolution are fused with the concatenated, compressed features to restore part of the original information [32]. The calculation process is as follows:

$\hat{X} = \mathrm{Concat}\left(X^1, X^2, X^3\right)$
$\tilde{X} = \mathrm{Conv}(\hat{X}) \oplus \mathrm{Conv}(X)$

where Concat represents the fusion (concatenation) of features, $\mathrm{Conv}(\hat{X})$ is the result of the 1 × 1 convolution operation after feature fusion, ⊕ represents the element-wise addition of feature data, and $\tilde{X}$ is the output of the multi-scale convolution.
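As a concrete illustration, the branch-and-fuse computation of the MC block can be sketched in PyTorch. This is a minimal sketch under stated assumptions: the class name, channel counts and layer ordering are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    # Three parallel branches (1x1, 3x3, and a 5x5 receptive field built from
    # two stacked 3x3 convs), concatenation, 1x1 channel compression, and
    # additive fusion with a 1x1-convolved copy of the input.
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                nn.ReLU(), nn.BatchNorm2d(branch_ch))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 3, padding=1),
                                nn.ReLU(), nn.BatchNorm2d(branch_ch))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 3, padding=1),
                                nn.ReLU(), nn.BatchNorm2d(branch_ch),
                                nn.Conv2d(branch_ch, branch_ch, 3, padding=1),
                                nn.ReLU(), nn.BatchNorm2d(branch_ch))
        self.compress = nn.Conv2d(3 * branch_ch, in_ch, 1)  # channel compression
        self.shortcut = nn.Conv2d(in_ch, in_ch, 1)          # Conv on the input

    def forward(self, x):
        cat = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)  # Concat
        return self.compress(cat) + self.shortcut(x)                  # fusion

y = MultiScaleConv(in_ch=1, branch_ch=8)(torch.randn(1, 1, 32, 32))
# the spatial size and input channel count are preserved
```

Because the output keeps the input's channel count and resolution, such blocks can be stacked or densely connected without extra adaptation layers.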

2) DENSE CONNECTIONS FOR IMPROVED FEATURE UTILIZATION
To improve the effect of a convolutional neural network, the most common methods are to increase the depth of the network and expand its width [33]. As the network deepens, however, the problem of vanishing gradients becomes more apparent. To solve this problem, many scholars have proposed improvement schemes that change the training objectives or improve the network structure, such as the Residual Neural Network (ResNet) [34] based on residual learning and the Highway Network [35]. Although these methods differ in structure, their main structure still follows the feedforward mode of short-circuit connections between front and back layers of the convolutional neural network. The feedforward mode can ensure maximum information transfer between convolutional layers. Dense connections push this idea further: each densely connected layer concatenates the outputs of all previous layers, and its output features are passed directly to each subsequent layer, preserving the feedforward characteristics of the network. This connection method allows each convolutional layer to utilize the gradient of the loss function and the initial input information, which reduces the risk of gradient vanishing while deepening the network. In addition, dense connections enhance feature transfer, making more efficient use of the information extracted by each convolutional layer and of the initial input.
Therefore, in order to improve the utilization of seismic data features, a dense connection method is adopted between multi-scale convolutions.The densely connected structure is shown in Figure 4.
In the dense connection, let the input of the i-th layer be $X_{in}^i$ and the output be $X_{out}^i$. The specific calculation is as follows:

$X_{out}^i = H_i\left(\left[X_{out}^0, X_{out}^1, \ldots, X_{out}^{i-1}\right]\right)$

where $[\,\cdot\,]$ denotes the splicing (concatenation) of all the features in front of the i-th layer, $X_{out}^0$ denotes the initial input, and $H_i$ represents a non-linear mapping combining batch normalization and ReLU operations.
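A minimal PyTorch sketch of this dense-connection rule follows; the class name, growth rate and layer count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer applies H_i (batch normalization + ReLU + convolution) to the
    # concatenation of the initial input and all previous layer outputs.
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth, 3, padding=1)))
            ch += growth  # later layers also see this layer's output

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # concat all previous features
        return torch.cat(feats, dim=1)

out = DenseBlock(in_ch=4, growth=8, n_layers=3)(torch.randn(1, 4, 16, 16))
# output channels: 4 input channels + 3 layers * 8 growth = 28
```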

3) BILSTM FAULT IDENTIFICATION BASED ON RECURRENT NEURAL NETWORK
Through the MCD module, the size of the final output feature map is controlled at 32 × 32 and used as the input of the BiLSTM. If a recurrent neural network were used to process the seismic profile directly, the data would generally be multi-dimensional and the amount of calculation very large. Here, the ReNet approach is used for processing [37], as shown in Figure 5.
The feature map output by MCD is divided into several sub-images. Each sub-image is expanded by a long short-term memory unit, and a horizontal bidirectional scan is performed: each sub-image, as a single time step, is input into the long short-term memory module in turn to extract lateral temporal features. Finally, the horizontal and vertical features are stitched together. In general, the spatial features extracted from the seismic profile and fault data by the multi-scale densely connected module are used as the input of the encoding-decoding structure. Multiple BiLSTMs scan the feature maps horizontally and vertically to extract different time-series features. After the extracted features are fused, the feature map is upsampled back to the same resolution as the input. Finally, the softmax function is used to classify the data at each position of the seismic profile to obtain the probability that each position is a fault.
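The horizontal and vertical BiLSTM sweeps can be sketched as follows. This is a simplified ReNet-style illustration: the hidden size, and the treatment of each row/column as one sequence, are our assumptions rather than the paper's exact sub-image scheme.

```python
import torch
import torch.nn as nn

class BiLSTMScan(nn.Module):
    # One BiLSTM sweeps each row of the feature map (horizontal scan) and a
    # second sweeps each column (vertical scan); the two results are stitched.
    def __init__(self, channels, hidden):
        super().__init__()
        self.h_lstm = nn.LSTM(channels, hidden, bidirectional=True, batch_first=True)
        self.v_lstm = nn.LSTM(channels, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)   # each row is a sequence
        hr, _ = self.h_lstm(rows)
        hr = hr.reshape(b, h, w, -1)                        # (B, H, W, 2*hidden)
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)   # each column is a sequence
        vc, _ = self.v_lstm(cols)
        vc = vc.reshape(b, w, h, -1).permute(0, 2, 1, 3)    # back to (B, H, W, 2*hidden)
        return torch.cat([hr, vc], dim=-1)                  # stitch horizontal + vertical

feat = torch.randn(2, 16, 32, 32)                # e.g. the 32x32 MCD output
out = BiLSTMScan(channels=16, hidden=8)(feat)
# per pixel: 2*hidden horizontal features plus 2*hidden vertical features
```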

4) STRUCTURE OF MCD-ABILSTM
Through the above analysis, the designed deep learning network must have two characteristics. First, it must autonomously extract fault features from training data, complete the training of the fault identification model, and realize end-to-end input and output. Second, feature extraction is the foundation and the key: it occupies the most resources in the structure and operation of the model. Since faults have local, global and temporal features, the designed network must be able to extract these features and make full use of them. Therefore, a series of improvement and optimization measures must be taken in specific modules according to the seismic profile data and fault characteristics. In this regard, drawing on the advantages of CNNs and RNNs, this paper constructs a deep learning network for fault feature extraction. The network is mainly composed of two parts, the MCD module and the ABiLSTM module; its input is the seismic profile data and its output is the vector of seismic fault features extracted by the model, as shown in Figure 6.
The main structure of the MCD module is the dense connection. ABiLSTM as a whole adopts the classical encoding-decoding image segmentation structure. In addition to data reconstruction and denoising, the encoding-decoding structure is often used as a framework for image segmentation algorithms; for example, FCN, U-Net and other networks use it. FCN is a semantic segmentation framework that replaces the fully connected layers of a classification network with convolutional layers. Finally, in order to ensure the effect and robustness of the network, FCN adopts a skip-level structure between convolutional layers of different depths, such as FCN-8s. The specific structure is shown in Figure 7.
Although FCN successfully applied convolutional neural network image classification to image segmentation through a series of measures, it still has many shortcomings. FCN does not consider the relationship between data features, so it lacks spatial consistency, resulting in rough segmentation results, and it tends to ignore the details of the data. In this regard, Olaf Ronneberger et al. proposed the U-Net network in 2015 [39]. U-Net was originally used to solve the problem of biological image segmentation; its overall structure is divided into a downsampling part and an upsampling part, as shown in Figure 8.

C. CHOICE OF LOSS FUNCTION
In image segmentation tasks, cross-entropy is generally used as the loss function to train the network. The cross-entropy loss function is effective when the training sample set is balanced, as it can effectively reduce the overall classification error. The overall cross-entropy for binary classification is calculated as follows:

$L = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$

where L is the loss value, $y^{(i)}$ is the true label of the i-th sample, and $\hat{y}^{(i)}$ is the predicted value. Fault identification of seismic profiles is a binary classification problem: the fault is marked as 1 and the non-fault background as 0. In the mask data, the amounts of fault and non-fault data are completely unbalanced, and the amount of fault data is much smaller. If the conventional cross-entropy is used as the loss function of the model in this paper, the loss value during training is very low and the model converges quickly, but the fault identification effect is not ideal and many small faults cannot be identified. The main reason is the large proportion of non-faults: the background classification is very accurate during training, and the fault term contributes almost nothing to the direction of model convergence, so the model ends up making arbitrary predictions. Therefore, weighted cross-entropy is used as the loss function of the model in this paper, and different weights are set for faults and non-faults to amplify the loss value of faults and achieve a balance between the two classes. The weighted cross-entropy can be expressed as:

$L_w = -\frac{1}{m}\sum_{i=1}^{m}\left[\alpha\left(1-\hat{y}^{(i)}\right)^{\gamma} y^{(i)}\log \hat{y}^{(i)} + (1-\alpha)\left(\hat{y}^{(i)}\right)^{\gamma}\left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$

where α is a class weight determined by the proportion of fault data in the entire data set, which can be set by simple statistical calculation; γ is the attention parameter; and $(1-\hat{y})^{\gamma}$ is the modulation coefficient, which reduces the weight of easily classified background pixels so that the model focuses on the faults that are difficult to classify.
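A minimal NumPy sketch of such a focal-style weighted cross entropy follows; the exact weighting in the paper may differ, and the values of α, γ and the function name here are illustrative.

```python
import numpy as np

def weighted_cross_entropy(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    # alpha: class weight tied to the fault proportion; (1 - y_pred)**gamma is
    # the modulation coefficient that down-weights easily classified pixels.
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    fault_term = alpha * (1 - y_pred) ** gamma * y_true * np.log(y_pred)
    bg_term = (1 - alpha) * y_pred ** gamma * (1 - y_true) * np.log(1 - y_pred)
    return -np.mean(fault_term + bg_term)

y_true = np.array([1.0, 0.0, 0.0, 0.0])   # one fault pixel among background
y_pred = np.array([0.3, 0.1, 0.05, 0.2])  # model probabilities
loss = weighted_cross_entropy(y_true, y_pred)
```

With this form, confidently classified background pixels contribute almost nothing to the loss, so the rare fault pixels dominate the gradient.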

III. DATA SET CREATION
A. CREATION OF INITIAL LABEL DATA
On the basis of conventional-method processing, this paper takes the seismic data in the central part of the study area, combined with the results of professional fault interpretation, as the data set for this deep learning research. The fault interpretation is sparse, on a 16 × 16 grid, with a total of 500 profiles. Each interpreted profile contains a different number of seismic traces, and the number of sampling points of each trace also differs; the sampling interval is 2 ms. To reduce the amount of calculation and unify the input size, we use the largest rectangle to intercept the amplitude data of the seismic profile, which is convenient for subsequent data processing and feature extraction. To further reduce the computational complexity, we normalized the extracted seismic data and displayed it as a single-channel grayscale image.
First, due to the cumbersome process and large computational load of the conventional method, the staff made only a sparse interpretation (16 × 16) considering labor and time costs, so each seismic profile contains only part of the fault label data. Second, this paper adopts the BiLSTM module, which extracts the time-series and global features of faults, so mirroring and rotation operations are not applied to the cropped data. Accordingly, the dataset is augmented only by cropping the seismic profiles. During the cropping process, we set the crop window size to 401 × 401 and set both the horizontal and vertical step sizes to half of the window length. The specific process is shown in Figure 9.
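The sliding-window cropping can be sketched as follows; the 401 × 401 window and half-window stride follow the text, while the profile dimensions in the example are illustrative.

```python
import numpy as np

def crop_profile(profile, win=401, step=None):
    # Slide a win x win window with a stride of half the window length
    # and collect every fully contained sub-section.
    step = step or win // 2
    h, w = profile.shape
    crops = []
    for top in range(0, h - win + 1, step):
        for left in range(0, w - win + 1, step):
            crops.append(profile[top:top + win, left:left + win])
    return crops

profile = np.zeros((802, 1203))   # illustrative profile size
subs = crop_profile(profile)
# the number of crops depends on the profile size; each crop is 401 x 401
```

Because adjacent windows overlap by half their width, most fault pixels appear in more than one training sample, which also acts as mild augmentation.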
After image cropping, each seismic profile is divided into several subsections, and the entire dataset is expanded from 500 to 6000 samples. The input to the deep learning model becomes a single-channel grayscale image. The mask data matrix of the fault has the same size as the seismic data, with faults marked as 1 and non-faults as 0, as shown in Figure 10. In Figure 11, the fault mask data (a) is traversed using the template (b). As template (b) moves, whenever a fault position (a 1 in the data) coincides with the center point of template (b), the area occupied by (b) is marked as the expansion of the fault under the action of the template. Since both the mask and the template are 0-1 data, after finding the fault positions it is only necessary to perform a convolution of (a) with (b). Figure 11(c) shows the result of this dilation (convolution). The actual operation process and effect are shown in Figure 12.
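Because the mask and template are both 0-1 data, the dilation reduces to OR-ing shifted copies of the mask, which is equivalent to the convolution-then-binarize expansion described above. A minimal NumPy sketch (the 3 × 3 template size is illustrative):

```python
import numpy as np

def inflate(mask, k=3):
    # Dilate a 0-1 fault mask with a k x k all-ones template by OR-ing
    # shifted copies: every fault pixel expands to the template footprint.
    r = k // 2
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

mask = np.zeros((7, 7), dtype=np.uint8)
mask[1:6, 3] = 1                  # a thin vertical fault trace
inflated = inflate(mask)          # the trace now spans three columns
```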

IV. SEISMIC PROFILE FAULT IDENTIFICATION
A. EXPERIMENTAL ENVIRONMENT AND EVALUATION INDICATORS
1) EXPERIMENTAL ENVIRONMENT
During model training and testing, the model not only requires a large number of matrix operations but also needs the computer to remain stable during iterative convergence. Therefore, a high-performance, stable GPU is required; for this, Dell servers are used. In terms of software, the experimental environment runs under CentOS 7, using Python 3.6.6. The model implementation is based on Keras and PyTorch, and the results are displayed using the OpenCV library in Python. The hardware and software configuration of the server is shown in Table 1.

2) EVALUATION INDICATORS
In order to quantitatively evaluate the fault identification method proposed in this paper, precision (Prec), recall (Rec) and the F1 value are used to analyze the results. Precision indicates the proportion of correctly detected fault pixels among all detected fault pixels. Recall represents the ratio of correctly detected fault pixels to all fault pixels that should be detected. F1 is a comprehensive evaluation index of precision and recall. The three indicators are defined as follows:

$\mathrm{Prec} = \frac{TP}{TP + FP}, \quad \mathrm{Rec} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times \mathrm{Prec} \times \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$

where TP is the number of pixels for which a fault is correctly detected, FP is the number of non-fault pixels predicted as faults, and FN is the number of actual fault pixels not detected as faults.
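These three pixel-level indicators can be computed directly from the binary masks; a short sketch (the function name is our own):

```python
import numpy as np

def pixel_metrics(pred, truth):
    # Precision, recall, and F1 over binary fault masks (1 = fault pixel).
    tp = int(np.sum((pred == 1) & (truth == 1)))
    fp = int(np.sum((pred == 1) & (truth == 0)))
    fn = int(np.sum((pred == 0) & (truth == 1)))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

truth = np.array([1, 1, 1, 0, 0, 0])
pred  = np.array([1, 1, 0, 1, 0, 0])
prec, rec, f1 = pixel_metrics(pred, truth)
# here TP = 2, FP = 1, FN = 1, so all three indicators equal 2/3
```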

B. MODEL TESTING AND ANALYSIS
1) MODEL STABILITY TEST
The validity and stability of MCD-ABiLSTM are tested before the comparative experiments on fault identification methods. We randomly selected 500 sections and cropped them into 6000 samples, of which 5800 are used as the training set and 200 as the test set. Every 50 iterations, one seismic section in the test set is selected for testing in order to check the validity of the model. Before the test, the parameters of the model are set; the main parameter settings are shown in Table 2.
During the training process, the test results are shown in Figure 13. Figure 13(c) shows the identification test after 50 iterations following initialization of the network parameters; since the model has not undergone extensive training, the results are not informative. Figure 13(d) shows the fault test after 100 iterations. The approximate distribution and shape of the faults can be seen, but the fault lines are thick, the fault edges are blurred, and the degree of fineness is not high; in addition, a small amount of white noise (misidentified fault areas) appears in the result. Figure 13(e) shows the fault identification results after 200 iterations. The effect is improved compared with 50 and 100 iterations: the identified fault lines become thinner and the white noise disappears, although the fineness of the fault edges still needs improvement. Figure 13(f) shows the test results after 500 iterations; the identification is very close to the mask data image. These experiments show that during training, the fault identification effect increases steadily with the number of iterations, indicating that the model constructed in this paper has good stability and validity.

2) TESTS WITH DIFFERENT SAMPLE SIZES
The experiments in the previous stage tested the validity and stability of the model; the experiments in this stage used different sample sizes to test the identification effect of the model. In this way, the performance of the model with small samples is evaluated, and the impact of small-sample data sets on the performance of the model is verified. To this end, we randomly selected 5% (290 subsections), 10% (580 subsections), 20% (1160 subsections), 40% (2320 subsections) and 80% (4640 subsections) of the original dataset subsections as the training set. The number of iterations in each experiment was set to 500. Precision, recall and the F1 value are used as the evaluation indicators. The test results are shown in Table 3 and Figure 14. It can be seen from the evaluation indicators that as the number of samples increases, the precision, recall and F1 value all increase, and the improvement is significant. When the sample size is 5%, the precision is 59.57%; when the sample size is 80%, the precision is 92.19%, an overall improvement of 32.62 percentage points; see Figure 15 for details.
From the fault identification maps, when the number of samples is 5% of the original, the identified fault areas are thick, rough and discontinuous, and the wrongly identified white-noise areas are large, as shown in Figure 15(b). When the number of samples is increased to 40%, the general shape and direction of the identified faults are similar to the mask data image, and the misidentified areas are much reduced, as shown in Figure 15(e). With 80% of the samples as the training set, the model constructed in this paper achieves good results, as shown in Figure 15(f). The experiments at this stage fully demonstrate that the model can fully extract the characteristics of seismic sections and faults and is capable of handling smaller sample data sets.

3) PERFORMANCE COMPARISON OF DIFFERENT METHODS
In order to further test the performance of MCD-ABiLSTM, the experiments in this stage use the current mainstream segmentation networks FCN, U-Net, U-Net++ and Deeplab V3 for comparison. The experiments likewise used the 6000-sample data set obtained by cropping the 500 seismic sections in the study area, with 5800 seismic subsections as the training set and 200 as the test set, tested in the same environment. To ensure good experimental results, the number of iterations was set to 500. The test results are statistically analyzed to obtain the precision, recall and F1 value of the five network methods, as shown in Table 4 and Figure 16. From the evaluation indicators, the overall fault identification effect of the model constructed in this paper is better than that of the current mainstream segmentation networks: the precision, recall and F1 value reach 93.13%, 95.41% and 94.21%, respectively.
In addition to the statistical analysis of the evaluation indicators, the fault identification effects of the five models are compared, as shown in Figure 17. As can be seen from Figure 17(b), although the overall distribution and trend of the faults identified by FCN are similar to the mask data image, the identified faults are discontinuous. In addition, there are a large number of white noise points in the FCN results, indicating that the simple convolution operation of FCN cannot extract the overall characteristics of faults and is far from meeting the refinement requirements of fault interpretation. Figure 17(c) shows the fault identification effect of U-Net: the identified faults are thinner than those of FCN, and the white noise is much reduced. Figure 17(d) and Figure 17(e) show the results of U-Net++ and Deeplab V3, respectively. The two methods are comparable, and both are significantly better than FCN and U-Net: the identified faults have no obvious white noise points, and the local refinement and continuity are relatively high. However, although U-Net++ and Deeplab V3 are effective at identifying faults with obviously discontinuous reflection events, they perform poorly on small faults whose reflection-axis dislocations are not significant. In the result of the method proposed in this paper in Figure 17, the identified faults are fine and preserve the continuity of the fault space, and there are fewer misidentified white noise points; the effect is better than U-Net++ and Deeplab V3.

C. EXPERIMENTAL SUMMARY
The experiments in this paper verify the stability and effectiveness of the constructed MCD-ABiLSTM through comparisons of multiple methods, and demonstrate the potential of this method in fault identification. First, the stability of the model is verified by step-by-step iteration and analysis of the successive results. Second, the fault identification ability of the method under different data volumes is verified: the model was trained and tested using 5% to 80% of the samples. The test results show that with 80% of the samples, the precision, recall and F1 value of MCD-ABiLSTM reach 92.19%, 94.29% and 89.94%, respectively, and the resulting images determine the distribution and trend of faults well. Finally, in the comparative experiments, the method proposed in this paper is superior to the current mainstream deep learning fault identification methods, both in the statistics of the evaluation indicators and in the fault identification results. This fully shows that combining the multi-scale spatial features of seismic profile and fault data with global and time-series features offers great advantages in fault identification.
Although the method proposed in this paper has achieved certain results, some flaws remain. First, in the process of fault identification, the model inevitably misidentifies the data surrounding a fault as part of the fault, producing coarser results than manual fault division. In later experiments, automatic post-processing should be applied to bring the displayed results closer to the actual situation. Second, the method proposed in this paper is optimized only at the model level. If the data were also optimized, the fault identification effect could potentially be improved further.
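The automatic post-processing mentioned above is not specified in detail. One simple possibility, sketched here with SciPy's connected-component labeling, is to drop isolated predicted components below a minimum size, which would remove the small white noise points discussed in the comparison. The function name and the `min_size` threshold are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.ndimage import label

def clean_fault_mask(mask, min_size=5):
    """Remove isolated noise specks from a binary fault mask.

    Keeps only connected components with at least min_size pixels.
    """
    labeled, n = label(mask.astype(bool))    # label 4-connected components
    cleaned = np.zeros(mask.shape, dtype=bool)
    for i in range(1, n + 1):
        comp = labeled == i
        if comp.sum() >= min_size:           # keep only sizeable components
            cleaned |= comp
    return cleaned

# A 3-pixel fault segment survives; a lone noise pixel is removed.
mask = np.zeros((5, 5), dtype=int)
mask[1, 1:4] = 1
mask[4, 4] = 1
cleaned = clean_fault_mask(mask, min_size=3)
```

More elaborate choices (morphological opening, skeletonization to thin coarse faults) would follow the same pattern.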

V. CONCLUSION
This article proposes a fault recognition method based on MCD-ABiLSTM. First, in deep learning fault identification, the fault labels and the background labels are extremely unbalanced. In response to this problem, we use a fault label inflation method to effectively alleviate the imbalance; in addition, weighted cross-entropy is adopted as the model's loss function, which alleviates it further. Second, the MCD-ABiLSTM method is proposed for fault detection in seismic profiles, addressing local, global, and temporal morphological features. The method uses multi-channel convolution with varying kernel sizes and depths to extract multi-scale seismic profile features and enhances the model's sensitivity to small targets. Convolutional neural networks primarily focus on spatial features, with less emphasis on the temporal dimension, yet seismic profiles exhibit both spatial and temporal characteristics. To address this, Bi-directional Long Short-Term Memory, based on recurrent neural networks, is combined with the convolutional stage to sequentially extract spatial and temporal features from seismic data, improving the representation of seismic attributes in both space and time and enhancing fault detection accuracy. Finally, a comparison is made with the Fully Convolutional Network (FCN), U-Net, U-Net++, and Deeplab V3. The comparative results demonstrate that our method achieves better results in terms of efficiency and accuracy.
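The weighted cross-entropy idea can be illustrated with a minimal NumPy sketch of the binary case: fault pixels receive a larger weight than background pixels, so the rare fault class contributes more to the loss. The weights below are placeholders, not the values used by MCD-ABiLSTM.

```python
import numpy as np

def weighted_bce(pred, target, w_fault=10.0, w_bg=1.0, eps=1e-7):
    """Weighted binary cross-entropy for imbalanced fault masks.

    pred:   predicted fault probabilities in (0, 1)
    target: 0/1 ground-truth mask (1 = fault)
    The fault term is scaled by w_fault, the background term by w_bg.
    """
    pred = np.clip(pred, eps, 1.0 - eps)     # avoid log(0)
    loss = -(w_fault * target * np.log(pred)
             + w_bg * (1.0 - target) * np.log(1.0 - pred))
    return loss.mean()

pred = np.array([0.9, 0.1])
target = np.array([1.0, 0.0])
```

Raising `w_fault` increases the penalty for missed fault pixels, which is exactly what counteracts the label/background imbalance described above; in a framework such as PyTorch the same effect is obtained via the `weight` argument of the standard cross-entropy loss.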

FIGURE 1. Fault label data. (a) is the seismic grayscale profile with faults (red in the figure marks the faults). (b) is the label mask data corresponding to the faults in Figure 1(a). The 0-1 data in (b) make up the mask labels for seismic faults.

FIGURE 3. The structure of multi-scale convolution.

FIGURE 4. The structure of densely connected networks.

FIGURE 5. The network structure of the recurrent neural network for image segmentation (modified according to ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks). The 32 × 32 feature map is divided into a 16 × 16 grid of 2 × 2 sub-images, and a recurrent neural network scans them to extract a 16 × 16 feature map. This feature map is then divided into an 8 × 8 grid of 2 × 2 sub-images for longitudinal scanning, and an 8 × 8 feature map is extracted.
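The patch-splitting step described in this caption can be sketched with a reshape/transpose in NumPy; the helper name is hypothetical and the RNN scan itself is omitted.

```python
import numpy as np

def to_patches(fmap, p=2):
    """Split an (H, W) feature map into an (H//p, W//p, p*p) grid of
    flattened p x p sub-images, as done before each ReNet-style scan."""
    h, w = fmap.shape
    return (fmap.reshape(h // p, p, w // p, p)   # expose the p x p tiles
                .transpose(0, 2, 1, 3)           # group tiles by grid cell
                .reshape(h // p, w // p, p * p)) # flatten each tile

# A 4 x 4 map becomes a 2 x 2 grid of flattened 2 x 2 patches.
fmap = np.arange(16).reshape(4, 4)
patches = to_patches(fmap)
```

Applied to a 32 × 32 map this yields the 16 × 16 grid of 2 × 2 sub-images of the caption; the RNN then consumes one flattened patch per step.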

FIGURE 8. The structure of U-Net. Similar to other segmentation networks, U-Net uses convolutional and pooling layers in place of the fully connected layers of conventional convolutional neural networks. Downsampling extracts features from the tomographic data, upsampling is used for feature concatenation, and a 1 × 1 convolution in the last layer solves the classification problem for each element of the array. In addition, the U-Net network includes a contracting path for contextual information and a symmetric expanding path for precise localization. The combined use of the two paths ensures that the network can be trained end-to-end and that the input and output arrays are of the same size.

FIGURE 10. Seismic fault label mask map. (a) is the seismic subsection, (b) is the manual interpretation result, and (c) is the label mask map.

FIGURE 11. Dilation process of fault labels. (a) is the mask data of the fault, (b) is the template data, and (c) is the expanded mask data.
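The dilation (inflation) of fault labels with a template, as depicted in this figure, can be sketched with SciPy's binary dilation. The 3 × 3 template is an illustrative assumption, not necessarily the one used in the paper.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Template (structuring element): every fault pixel is expanded to a
# 3 x 3 neighborhood, thickening thin hand-picked fault labels and
# easing the fault/background class imbalance.
template = np.ones((3, 3), dtype=bool)

def dilate_labels(mask, structure=template):
    """Inflate a 0/1 fault label mask by morphological dilation."""
    return binary_dilation(mask.astype(bool), structure=structure)

# A single fault pixel grows into a 3 x 3 block.
mask = np.zeros((5, 5), dtype=int)
mask[2, 2] = 1
inflated = dilate_labels(mask)
```

Each original fault pixel thus contributes several positive pixels to the training labels, which is the mechanism the label inflation method relies on.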

FIGURE 12. Expansion process and results of seismic fault labels.

FIGURE 13. Fault identification effect of different iterations. (a) is the human interpretation result, (b) is the inflated label mask data, (c) is the result of 50 iterations, (d) is the result of 100 iterations, (e) is the result of 200 iterations, and (f) is the result of 500 iterations.

FIGURE 14. Seismic fault identification effect with different sample sizes.

FIGURE 15. Fault identification results for different sample sizes. (a) is the result of manual interpretation, (b) is the identification result with a 5% sample size, (c) with a 10% sample size, (d) with a 20% sample size, (e) with a 40% sample size, and (f) with an 80% sample size.

FIGURE 16. Fault identification test results for different methods.

FIGURE 17. Test results of different methods.

TABLE 3. Seismic fault identification test results with different sample sizes.

TABLE 4. Fault identification test results for different methods.