End-to-End Deep Learning Architecture for Separating Maternal and Fetal ECGs Using W-net

Fetal cardiac monitoring and assessment during pregnancy play a critical role in the early detection of the potential risk of fetal cardiac problems, thus allowing for timely preventive measures and healthy births. It is necessary to continuously monitor the fetal heart for this purpose. Methods of fetal cardiac monitoring by extracting maternal and fetal electrocardiograms (ECGs) from maternal abdominal ECGs have been extensively investigated. However, the extraction of a clear fetal ECG is a major challenge because fetal signals are typically dominated by maternal ECG signals and noise. Most existing methods for fetal ECG extraction involve several steps, such as extracting and removing the maternal ECG and then extracting the fetal ECG. To address the complexity of this process, we propose a novel method for effectively decomposing a single-channel maternal abdominal ECG into a maternal ECG and fetal ECG without using multiple steps by employing an end-to-end deep learning network architecture using W-net. Model training is performed using a simulation dataset. Then, a fetal ECG is extracted from a real maternal abdominal ECG. The performance of the proposed architecture is compared with that of other state-of-the-art deep learning models on the basis of the detection of QRS complexes. The proposed model shows higher precision and recall values and F1 scores. This demonstrates that the proposed model can effectively extract a fetal ECG from a single-channel maternal abdominal ECG. The model is expected to contribute to commercial applications for long-term maternal and fetal monitoring.


I. INTRODUCTION
One of the major objectives of prenatal care is to conduct fetal monitoring to detect abnormalities, reduce mortality, and ensure healthy births. Fetal cardiac monitoring and assessment during pregnancy are used for the early detection of the potential risk of fetal cardiac problems, thus allowing for timely measures to prevent unexpected events that may compromise the health of the fetus. Ultrasound-based methods and electrocardiograms (ECGs) are used for fetal cardiac monitoring [1,2]. A fetal ECG (fECG) measures the electrical activity of the fetal heart, which is one of the most important biosignals for the detailed evaluation of cardiac structure and function [3]. An fECG is measured using an invasive method, in which electrodes are directly attached to the fetal scalp, or a noninvasive method, where the fECG is extracted from the maternal abdominal ECG (maECG) signals measured by a noninvasive sensor attached to the mother's abdomen. The invasive method is the most reliable, but it has the disadvantages of invasive measurements and risk to fetal health. In addition, it can only be applied in cases of ruptured membranes during labor. Conversely, the noninvasive method has the advantage that it can record the fetal heart rate using uterine contractions, thereby providing safe and continuous monitoring. However, it is difficult to accurately measure the changes in the fetal heart rate owing to maternal ECG (mECG) signals and noise. Thus, a method is required to obtain clear fECGs [4][5][6].
The measurement of an fECG from an maECG is a promising method for fetal monitoring. Therefore, studies have actively investigated the extraction of an fECG from an maECG. The most commonly used methods include adaptive filtering [7], blind source separation [8,9], and template subtraction [10,11]. In general, algorithms for extracting an fECG involve multiple steps, such as preprocessing, extraction, the removal of the mECG from the maECG, and the extraction of the fECG from residual maECG signals. As the extracted fECG signals generally have a low signal-to-noise ratio (SNR), several studies have proposed methods for increasing the SNR by improving the fECG signal quality.
In recent years, methods based on deep learning architectures, such as recurrent neural networks, autoencoders, and convolutional neural networks (CNNs), have been extensively used to remove noise from signals or images [12][13][14][15]. Additionally, these methods have been widely used for removing noise from adult ECGs, detecting arrhythmia, and extracting fECGs [16][17][18][19][20][21][22][23]. Ting et al. [18] obtained maECG signals from multiple electrodes attached to the mother's abdomen. These signals were converted to a spectrogram by applying the short-time Fourier transform. Then, the spectrogram was fed to a 2D CNN for detecting the fetal heart rate. Their method was implemented on a field-programmable gate array platform. However, this method requires at least four channels for acquiring maECG signals. Thus, several electrodes must be attached to the mother's abdomen, making the measurement process complex. Fotiadou et al. [19] effectively denoised a single-channel fECG using an encoder-decoder CNN framework. However, as this method is applied after the mECG signals have been completely removed, it cannot directly extract an fECG from the maECG alone. A recent study used a residual U-net architecture to remove mECG signals and enhance fECG signals [20]. However, this method does not accurately remove mECG signals when the amplitudes of the fECG and mECG signals are similar. This issue can be resolved with AECG-DecompNet, which uses two U-net architectures in series, one to extract an mECG and remove it from the maECG, and the other to extract the fECG [22]. However, this method requires two networks. Another study proposed a model for simultaneously extracting mECG and fECG signals using a generative adversarial network, and this model effectively detected the location of fetal QRS complexes [23]. However, to increase the accuracy of fetal QRS complex localization, this model requires part of the test data to be used for training; moreover, its structure is complex and it takes a long time to train.
Thus, to address the shortcomings of these existing studies, we propose an algorithm for the simultaneous extraction of fECG and mECG signals from a single-channel maECG using an end-to-end deep learning neural network architecture. In the proposed architecture, W-net is employed to remove the mECG features from each layer. After extraction, the mECG signal is reconstructed from one end of the decoder layers and the fECG signal is reconstructed from the other end. In addition, we use only simulation data as training data. We confirm on real data that the proposed model separates the fECG and mECG well from a single-channel maECG.

II. METHOD
An maECG is a complex mixture of an mECG and fECG. Although it is highly challenging to extract an fECG from an maECG using a single channel, it is convenient for the mother and fetus and can be used to effectively monitor the fetal status. The proposed model is used to extract an mECG and fECG via an end-to-end process without multiple steps using a single-channel maECG as the input. The proposed model has a W-shaped architecture. In the encoder layer attached to the middle of the architecture, mECG features are removed at every encoder step to extract opposite-facing fECG features. Thus, only fECG features are effectively extracted. Finally, fECG and mECG signals are reconstructed using the decoder layers at each end. In the following subsections, we describe the preprocessing and the proposed neural network architecture.

A. PREPROCESSING
The maECG signals were resampled at 250 Hz. A bandpass filter with a passband from 3 Hz to 90 Hz was applied [23]. Next, the maECG signals were divided into windows of 1024 samples, and Z-score normalization [15] was applied to the maECG signal.
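The preprocessing steps above can be sketched as follows. This is a minimal illustration and not the released code: the function name `preprocess_maecg`, the polyphase resampler, the fourth-order zero-phase Butterworth filter, the non-overlapping windows, and the per-window normalization statistics are all assumptions, since the text specifies only the sampling rate, the passband, the window length, and the use of Z-score normalization.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly


def preprocess_maecg(signal, fs_in, fs_out=250, band=(3.0, 90.0), window=1024):
    """Resample, band-pass filter, window, and z-score a single-channel maECG."""
    # Resample to 250 Hz (polyphase resampling is an assumed choice).
    g = np.gcd(int(fs_in), int(fs_out))
    x = resample_poly(signal, int(fs_out) // g, int(fs_in) // g)

    # Zero-phase Butterworth band-pass, 3 Hz to 90 Hz (order 4 is assumed).
    sos = butter(4, band, btype="bandpass", fs=fs_out, output="sos")
    x = sosfiltfilt(sos, x)

    # Split into non-overlapping windows of 1024 samples; drop the remainder.
    n_windows = len(x) // window
    x = x[: n_windows * window].reshape(n_windows, window)

    # Z-score normalization, computed per window (an assumption).
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + 1e-8)


# Example: 60 s of synthetic data at 1 kHz (the PCDB/ADFECGDB sampling rate).
rng = np.random.default_rng(0)
windows = preprocess_maecg(rng.standard_normal(60_000), fs_in=1000)
print(windows.shape)
```

At 250 Hz, a window of 1024 samples covers roughly 4.1 s of signal.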

B. W-NET ARCHITECTURE
The U-net architecture was modified for the simultaneous decomposition of an maECG into an fECG and mECG. The proposed neural network architecture is illustrated in Fig. 1.

1) TWO U-NET STRUCTURES
mECG and fECG features were extracted at multiple resolutions, and two U-net structures were combined for reconstructing the signals. The channel sizes for each layer were 16, 32, 64, 128, and 256; therefore, the features were extracted in the encoding section at multiple scales. In the network for extracting the fECG, the channel size was set as 512 in the bottleneck block area (indicated by the black box in Fig. 1). Hence, the structure of this network was different from that of the network for extracting the mECG. In addition, as the features of the QRS signals of the mECG and fECG were different, the kernel sizes of each encoder were set as 35 and 4 in the networks for extracting the mECG and fECG, respectively. This is because high-resolution signal features can be obtained using a small kernel size and more global signal features can be extracted using a large kernel size [14,24]. As fECG signals contain more high-frequency components compared to mECG signals [25], the kernel size for the fECG extraction network is smaller than that for the mECG extraction network to effectively separate the fECG. Zero padding was used to maintain the signal amplitude, and the stride was set as 1. All convolution and deconvolution layers were subjected to batch normalization. The leaky rectified linear unit activation function was used, and the value of alpha was set as 0.01.
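To make the layer settings above concrete, the following NumPy sketch applies one convolution stage with zero padding, a stride of 1, normalization, and a leaky ReLU with alpha = 0.01, once with a kernel size of 35 (mECG branch) and once with a kernel size of 4 (fECG branch). The function names, the random weights, the convolution-normalization-activation order, and the simplified per-channel normalization (no learned scale or shift, single example) are assumptions made for illustration; downsampling between levels is not described in the text and is omitted.

```python
import numpy as np


def conv1d_same(x, weights):
    """1D convolution (cross-correlation) with zero padding and stride 1.

    x: (in_channels, length); weights: (out_channels, in_channels, kernel).
    The output length equals the input length.
    """
    k = weights.shape[2]
    left = (k - 1) // 2
    right = k - 1 - left  # asymmetric padding for even kernels such as 4
    xp = np.pad(x, ((0, 0), (left, right)))
    # windows has shape (in_channels, length, kernel)
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    return np.einsum("clk,ock->ol", windows, weights)


def normalize(x, eps=1e-5):
    """Simplified stand-in for batch normalization: per-channel standardization."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def encoder_block(x, out_channels, kernel_size, rng):
    """Convolution -> normalization -> leaky ReLU (order is an assumption)."""
    weights = 0.1 * rng.standard_normal((out_channels, x.shape[0], kernel_size))
    return leaky_relu(normalize(conv1d_same(x, weights)))


rng = np.random.default_rng(0)
maecg = rng.standard_normal((1, 1024))        # one preprocessed window
m_feat = encoder_block(maecg, 16, 35, rng)    # large kernel: global features
f_feat = encoder_block(maecg, 16, 4, rng)     # small kernel: fine detail
print(m_feat.shape, f_feat.shape)
```

In this sketch, zero padding keeps both feature maps at 1024 samples, so features from the two branches can be combined element by element at each level.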

2) SUBTRACTION
The mECG and fECG features were extracted from the two decoder steps. To effectively extract the fECG features from the maECG and effectively remove the mECG signals, the mECG features were subtracted at each encoder step to extract only the fECG features. tanh was used as the activation function during the subtraction.
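One possible reading of this subtraction step is sketched below: the mECG feature map is subtracted element-wise from a feature map of the same shape, and tanh is applied to the difference. The function name, the operand order, and the toy feature maps are assumptions; the text states only that mECG features are subtracted at each encoder step and that tanh is the activation.

```python
import numpy as np


def subtract_maternal(features, maternal_features):
    """Element-wise subtraction of maternal features followed by tanh."""
    return np.tanh(features - maternal_features)


# Toy feature maps with shape (channels, length).
rng = np.random.default_rng(0)
fetal = 0.2 * rng.standard_normal((16, 1024))
maternal = rng.standard_normal((16, 1024))
mixed = fetal + maternal                      # idealized additive mixture

residual = subtract_maternal(mixed, maternal)
print(np.allclose(residual, np.tanh(fetal)))
```

Because tanh is bounded in (-1, 1), the residual features stay within a fixed range regardless of how large the subtracted maternal features are.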

C. TRAINING PARAMETERS
The mean absolute error (MAE) was used as the loss function to optimize the proposed network. We selected this function because its loss value is smaller than that of the mean square error cost function. Furthermore, in a recent study, the MAE was used for the maximal elimination of outliers to reconstruct signals and optimize the model [22]. We utilized the Adam optimizer. The training parameters were a learning rate of 0.001, a decay of 0.25, a batch size of 64, and 60 epochs. Model training was performed using an Intel Core i7-10750H CPU with 16 GB RAM and a GeForce RTX 3060 GPU. The model code is available at https://github.com/lightjin619/fecg.git.
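The difference between the two loss functions can be illustrated with a small numerical example. The sketch below is only an illustration with made-up values: it compares the MAE and the mean square error on a reconstruction that is perfect except for a single large error, showing that the squared term lets one outlier dominate the loss.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))


def mse(y_true, y_pred):
    """Mean square error."""
    return np.mean((y_true - y_pred) ** 2)


# 100-sample target reconstructed perfectly except for one outlier of size 10.
y_true = np.zeros(100)
y_pred = np.zeros(100)
y_pred[50] = 10.0

print(mae(y_true, y_pred))  # 0.1
print(mse(y_true, y_pred))  # 1.0
```

Here the single outlier contributes ten times more to the mean square error than to the MAE, which is consistent with the robustness argument given above.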

1) SIMULATION DATA
Simulation signals were used for model training. The FECGSYN dataset [3] by PhysioNet provides maECG data for various situations, which reflect the data obtained in actual practice. It comprises the data of 10 subjects, with signals obtained from 34 channels (32 maECGs and two mECGs). These data are suitable for model training because they are divided into mECG signals, fECG signals, and noise. There are five different noise levels (0, 3, 6, 9, and 12 dB) and five types of signals that imitate real-world situations. This dataset was independently simulated five times. Each simulation was performed using a 5-min signal at a sampling rate of 250 Hz. The various contexts of the simulation data are as follows.

2) FETAL ECG DATABASE
We compared the performance of the proposed model with that of state-of-the-art deep-learning-based fECG extraction algorithms using set A of the 2013 PhysioNet/Computing in Cardiology Challenge database (PCDB) [26]. This dataset contains the R-peak points of fECG signals and four maECGs sampled at a rate of 1 kHz. It comprises the data of 75 subjects. For comparative analysis with AECG-DecompNet and Res-Unet, which are recently developed algorithms, only the datasets used by Zhong et al. [21] and Rasti-Memandi et al. [22] were employed. The selected datasets are presented in Table I. In addition, a comparative evaluation was performed using real data from the abdominal and direct fECG database (ADFECGDB) [27], which consists of direct fECG data. It contains four-lead maECG signals and one set of direct fECG signals obtained via invasive measurement through the fetal scalp. The signals have a measurement length of 5 min at a sampling rate of 1 kHz.

3) EVALUATION METHOD
The results of ECG R-peak detection were used for evaluation. To detect the R peaks, we used an adapted version of the Pan and Tompkins algorithm, which is commonly used for ECG R-peak detection [11,21]. The precision, recall, and F1 score were calculated to assess the model performance. The equations for these parameters are as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where true positive (TP) indicates the number of correctly detected R peaks, false negative (FN) is the number of failed detections, and false positive (FP) is the number of detected R peaks whose location differs from the actual R peak by more than 50 ms [11].

III. RESULTS
Fig. 2 shows the simulation results. The mECG and fECG were effectively extracted using the proposed model. Table II shows the median and interquartile range of the F1 score across all channels. The performance of the proposed model was similar to that of classic algorithms such as the extended Kalman filter (EKF) [11] and template subtraction based on principal component analysis (TSPCA) [11] when there was almost no noise (cases 0, 1, and 2). Even for a high level of noise owing to uterine contractions (cases 3 and 4), the proposed model showed high performance, demonstrating that the model was trained well.
Fig. 3 shows the mECG and fECG extracted from the PCDB. The proposed method was effective in separating the fECG and mECG. Fig. 4 shows the mECG and fECG extracted from the ADFECGDB. The comparison of the ECG signals that were directly measured from the fetal scalp with the fECG extracted from the maECG confirmed that the location of the QRS peaks was accurately extracted.
Table III compares the performance of the proposed model with that of state-of-the-art deep learning models on the PCDB and ADFECGDB. The mean recall, precision, and F1 score calculated from the fECG extracted from the maECG are shown for both datasets. For the PCDB, the F1 score of W-net was 1.49% higher than that of AECG-DecompNet. The precision values for both networks were similar. However, the recall value for the proposed model was 2.22% higher than that of AECG-DecompNet. A comparative evaluation was also performed with an asymmetric CycleGAN model. In the case of the ADFECGDB, the performance of the proposed model was better than that of RCED-net and Res-Unet, which are existing architectures composed of a single-frame network. However, the F1 value of the proposed model was approximately 0.59% lower than that of the CycleGAN model.
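The R-peak evaluation criterion described in the evaluation method can be sketched as follows. The function names and the greedy nearest-neighbor, one-to-one matching are assumptions made for illustration; the text specifies only the 50 ms tolerance and the definitions of TP, FP, and FN. The peak positions in the example are made-up sample indices.

```python
import numpy as np


def match_r_peaks(reference, detected, fs, tolerance_ms=50.0):
    """Count TP, FP, and FN by one-to-one matching within a time tolerance.

    reference, detected: R-peak positions in samples; fs: sampling rate in Hz.
    """
    tol = tolerance_ms * fs / 1000.0
    reference = np.sort(np.asarray(reference, dtype=float))
    detected = np.sort(np.asarray(detected, dtype=float))
    used = np.zeros(len(detected), dtype=bool)
    tp = 0
    for r in reference:
        if len(detected) == 0:
            break
        # Distance to every detection; already matched ones are excluded.
        dist = np.abs(detected - r)
        dist[used] = np.inf
        j = int(np.argmin(dist))
        if dist[j] <= tol:
            used[j] = True
            tp += 1
    fp = len(detected) - tp   # detections with no reference peak within 50 ms
    fn = len(reference) - tp  # reference peaks that were missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# At 250 Hz, 50 ms corresponds to 12.5 samples.
reference = [100, 300, 500, 700]
detected = [102, 310, 520, 900]
tp, fp, fn = match_r_peaks(reference, detected, fs=250)
print(tp, fp, fn)                       # 2 2 2
print(precision_recall_f1(tp, fp, fn))  # (0.5, 0.5, 0.5)
```

In this example, the detections at 102 and 310 fall within 12.5 samples of a reference peak and count as true positives, whereas 520 and 900 do not, which gives two false positives and two false negatives.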

IV. DISCUSSION
We proposed a method for effectively extracting mECG and fECG signals from single-channel maECG signals using a W-net architecture without multiple steps. The main purpose of this study was to successfully separate fECG signals from an maECG using an end-to-end deep learning network architecture and detect the location of the QRS complexes of fECG signals with high accuracy. Most existing methods use a two-step process for extracting fECG signals, in which the mECG signals are extracted first and then the fECG signals are extracted from the residual signals.
In the proposed network architecture, mECG and fECG signals were separated in a single step. As shown in Table III, although we used a single-frame network for real data, the performance of the proposed architecture was better than that of AECG-DecompNet. Moreover, the performance of the proposed model was better than that of RCED-Net [20] and Res-Unet [21]. In particular, as shown in Fig. 5, the proposed model effectively overcame a limitation of these two models, namely their inability to remove mECG signals when the QRS locations of the mECG and fECG signals overlap [20]. This was because, similar to the architecture for the effective removal of backgrounds in image processing [28], maternal features were subtracted in the encoder section, thereby removing mECG signals from the output used to extract fECG signals. In the results obtained using the ADFECGDB, as shown in Table III, the performance of the proposed model was worse than that of the asymmetric CycleGAN owing to the difference in the model training process. In the training process of the asymmetric CycleGAN, a few of the datasets used for evaluation were also used as training datasets [23]. In contrast, in the proposed model, the dataset used for evaluation was not used for training. Nevertheless, the performance of the proposed model was close to that of the asymmetric CycleGAN, indicating that the proposed model was superior in terms of the training process. However, as shown in Figs. 3 and 4, the shapes of the fECG signals were not completely extracted and the signals contained noise. Thus, the proposed method cannot extract the shapes of the P and T waves. This problem may be resolved by training the proposed model using real datasets and not just simulation datasets.

V. CONCLUSION
We proposed an end-to-end deep learning model to effectively extract mECG and fECG signals from single-channel maECG signals. mECG and fECG signals could be effectively separated and extracted from simulation and real datasets. In addition, the results of R-peak detection in fECG signals showed high F1 scores and precision values. Therefore, the proposed deep learning framework is an effective tool for the long-term simultaneous cardiac monitoring of the mother and fetus. The use of this model in software that implements a human-computer interface is expected to contribute to commercial applications. In subsequent research, we plan to investigate a method for minimizing the distortion in the shapes of the fECG signals, which was identified as a limitation of this study.