A Multi-Task Group Bi-LSTM Networks Application on Electrocardiogram Classification

Background: Cardiovascular diseases (CVD) are the leading cause of death globally. Electrocardiogram (ECG) analysis can provide thoroughly assessment for different CVDs efficiently. We propose a multi-task group bidirectional long short-term memory (MTGBi-LSTM) framework to intelligent recognize multiple CVDs based on multi-lead ECG signals. Methods: This model employs a Group Bi-LSTM (GBi-LSTM) and Residual Group Convolutional Neural Network (Res-GCNN) to learn the dual feature representation of ECG space and time series. GBi-LSTM is divided into Global Bi-LSTM and Intra-Group Bi-LSTM, which can learn the features of each ECG lead and the relationship between leads. Then, through attention mechanism, the different lead information of ECG is integrated to make the model to possess the powerful feature discriminability. Through multi-task learning, the model can fully mine the association information between diseases and obtain more accurate diagnostic results. In addition, we propose a dynamic weighted loss function to better quantify the loss to overcome the imbalance between classes. Results: Based on more than 170,000 clinical 12-lead ECG analysis, the MTGBi-LSTM method achieved accuracy, precision, recall and F1 of 88.86%, 90.67%, 94.19% and 92.39%, respectively. The experimental results show that the proposed MTGBi-LSTM method can reliably realize ECG analysis and provide an effective tool for computer-aided diagnosis of CVD.


I. INTRODUCTION
Cardiovascular disease is the leading cause of death around the world [1]. The total number of deaths caused by CVD increased from 12.3 million (25.8%) in 1990 to 17.9 million (32.1%) in 2015. Deaths caused by CVD are more prevalent at certain ages and are increasing in most developing countries [2], [3]. It is estimated that up to 90% of CVD may be preventable [4], [5]. Most CVDs can be diagnosed by ECG data. ECG is time-series physiological data. The electrical signal is generated by the sinoatrial node, which passes through the atrioventricular node and Purkinje fibers, and finally reaches the ventricle [6].
The automatic analysis of ECG is of great significance to the early detection and treatment of CVD. With the development of AI, more and more deep learning [7] methods are applied to address medical data [8]- [10], such as feedforward neural network [11]- [14], and the Recurrent Neural Networks (RNN) [15] ( Long Short-Term Memory (LSTM) [16], [17], Gate Recurrent Unit (GRU) [18]). However, most of previous studies used the MIT-BIH-AR database [19]- [21], which only has 48 patients. And they only used one-lead ECG data for model training. Since the same person's ECG data fragments are highly correlated [14], [22], this leads to high accuracy. Once other databases are used or are actually used in clinical practice, the performance of their models will decline dramatically and are not robust.
The multi-lead ECG can provide a full evaluation of the cardiac electrical activity (includes arrhythmias, myocardial VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ infarction, conduction disturbances) [23]. Each lead reflects the electrical signal changes in different parts of the heart. The diagnosis of CVD often needs to be combined with the change of electrical signals in different parts of the heart. This means that each lead signal and correlation among the leads have the same importance for intelligent analysis of ECG.
The accuracy of these methods is not high, and these studies are simply predicting a few types of CVD categories [23], [24]. These methods use the depth of the model in exchange for the performance improvement of the model, without exploring the essential characteristics of the ECG. Previous works based on multi-lead ECG analysis did not pay attention to the correlation among different lead information.
The main contributions of this work can be concluded as follows 1. The number of leads for existing ECG data is not uniform, so we design a Multi-Task Group Bi-LSTM (MTGBi-LSTM), which uses any multi-lead ECG (2-lead or more) learning to simulate the assistant diagnostic process of cardiologists. In the training phase, the model can change the number of its groups according to the number of ECG leads input. This is very promising and influential for the clinical environment. 2. We propose a Group Bi-LSTM module, a combination of Global Bi-LSTM and Intra-Group Bi-LSTM, which can learn each lead individually and fuse the global information of different leads, especially it can sparse autoencoding ECG data. Its effectively enhances the expressiveness of signal representation with respect to ECG waveform features. 3. Last but not least, this is the first time that a multi-task framework has been applied to the CVD-assisted diagnosis. In addition, an attention mechanism for the temporal characteristics of ECG is proposed, which provides discriminative feature representation for multi-task module. And then a dynamic weighted loss function is proposed to solve the unbalanced proportion of positive and abnormal diseases.

II. METHODS
For multi-lead ECGs, we propose a Multi-Task Group Bi-LSTM model framework for deep mining of ECG timing features implicit in each lead and between individual leads. On this basis, attention mechanism is added to enable the model to retain key information with emphasis on the premise of guaranteeing comprehensiveness. In addition, through multi-task learning, the model can fully mine the association information between diseases and obtain more accurate diagnostic results. The model architecture is shown in Fig. 1 and Table 1.
The different lead data of the ECG is divided into different groups for Res-GCNN, and the number of leads of the ECG is the number of groups that need to be grouped. Such as, the ECG data is input to the top convolution layer in the size of 1 × 1900 × 12. 12 is the number of leads of ECG, this resulted in the number of Res-GCNN being 12.
The Res-GCNN module of each group is composed of contraction path and expansion path, as show in Fig. 2. And more structural details are in Table 1. In the contraction path, a deep residual network with three residual blocks [25] is designed to encode the single lead data of ECG. The first two residual module has three continuous convolutions layers with kernel size of 1 × 1, 1 × 3, and 1 × 1, and each convolution layer is followed by a rectified linear unit ReLU [26] and Batch Normalization [27] to improve sparsely. In the extended path, the last layer of the last residual block uses three filters of different sizes, which are 1 × 3, 1 × 3 dilated convolution (1x3DC, dilation rate is set to 2), 1 × 3 DC (dilation rate: 3), and an avg pool to convolute and pool the input data. The design of Res-GCNN module reduces the network parameters and the thickness of the feature map, and makes the feature map keep more high-order features at different levels, enriching the expressive ability of the network [28]. Finally, all sub-layer outputs within each group are cascaded and transmitted to the Group Bi-LSTM section.

B. GROUP Bi-LSTM MODULE
Group Bi-LSTM is divided into two parts, the global Bi-LSTM and the intra-group Bi-LSTM. The Group Bi-LSTM Module can dynamically change the number of its groups according to the number of ECG leads input, and combine  the global Bi-LSTM and intra-group Bi-LSTM autoencoder signal representation to provide high-resolution feature information for the final CVD-assisted diagnosis and recognition.

1) THE GLOBAL Bi-LSTM
The global Bi-LSTM updates the cell states by combining the long-term state information of the up and down time, after receiving the feature information of all the leads n i=1 X it at the current time, and synchronously sends the hidden state h gt of the output to the each intra-group Bi-LSTM. There is only one global Bi-LSTM in the Group Bi-LSTM Module.
In Fig. 3, the global Bi-LSTM removes or adds information to the cell states through three ''gates'' structure (forget gate, input gate and output gate) [29]. Gate is a way to allow information to pass selectively, including a sigmoid neural network layer and a pointwise multiplication operation. The feature sequence X = (X 11 , . . . , X it ) i ∈ (1,2, . . . , n) after Res-GCNN is used as the input of global Bi-LSTM.
The function of forget gate is to selectively discard the feature information of cell state. After inputting the spatial features of the ECG, we hope to pay more attention to the characteristic information of P wave, QRS wave and T wave. In the whole ECG, some relatively flat features may not affect the classification results, so we need to selectively discard them to extract the essential feature information of ECG.
where X it is the input value in LSTM, that is, the latent discriminative structural feature maps is extracted by the i-th Res-GCNN module. n is the number of leads of ECG. C t−1 , h gt−1 represent the cell state at the previous moment and VOLUME 8,2020 the output of the previous cell, respectively. W f , b f are the parameters to be learned by LSTM. σ represents the Sigmoid activation function. The function of the input gate is to control whether the feature information input at the current time is integrated into the memory unit. When it receives a certain feature of the ECG, it may be important for the prediction of the entire ECG or not.
where W i , W c , b i , b c are the parameters to be learned by LSTM. C t represents the state of the cell at the current moment. ϕ represents the Tanh activation function. The rest of the symbols have the same meaning as Eq. (1). The purpose of the output gate O t is to generate a hidden layer unit h gt from the memory unit C t . However, not all information in C t is related to the hidden layer unit h gt . C t may contain a lot of information that is useless to h gt . Therefore, the role of O t is to determine which parts of C t are useful for h gt and which parts are useless.
where W o , b o are the parameters to be learned by LSTM. h gt represents the output of the cell at the current moment.

2) THE INTRA-GROUP Bi-LSTM
The intra-group Bi-LSTM only accepts the feature map information of a single lead X it . After introducing the inter-lead feature information h gt from the global Bi-LSTM through the global access structure, updates the state of the cell by combining the ECG information at the time before and after. The input of ECG data has n-lead, and the Group Bi-LSTM Module has n Intra-Group Bi-LSTM. In Fig. 4, the intra-group Bi-LSTM removes or adds information to the cell state through three ''gate'' structures. The forward LSTM in the intra-group Bi-LSTM first accepts the hidden state of the forward LSTM in the global Bi-LSTM at the current time, and then combines with the current input sequence feature data and the hidden state of the previous moment to remove and update the cell state at the current time. Another reverse LSTM is similar.
The intra-group Bi-LSTM network computes a mapping from an input sequence X = (X i1 , . . . , X it ) i ∈ (1,2, . . . , n) to an output sequence h = (h 1 , . . . , h t ) by calculating the network unit activations using the following equations iteratively from t =1 to t: where X it represents the input value in the intra-group Bi-LSTM, in other words, ts the latent discriminative structural feature maps is extracted by the i-th Res-GCNN module.
It must be noted that it is distinguished from n i=1 X it , the input of Group Bi-LSTM module, in Eq. (1). h gt represents the hidden state of the forward LSTM in the Global Bi-LSTM at the current time, that is, the output state of the cell in the global Bi-LSTM in Eq. (1). σ , ϕ represents the Sigmoid and the Tanh activation function, respectively. represents the element level product operation.
are the parameters to be learned by LSTM.

C. THE ATTENTION MECHANISM
It must be noted that we employs the attention mechanism [30] on every Bi-LSTM (including the Global Bi-LSTM, Intra-Group Bi-LSTM). The Attention mechanism of this model includes two parts of calculation process. One part is about the process of calculating the probability distribution of attention, the other part is the process of calculating the final feature based on attention distribution (Fig. 5).
F represents the sum of the final hidden layer state values for each independent direction in Bi-LSTM, which is called the final state of Bi-LSTM.a denotes the attention probability distribution of the hidden layer unit state to the final state at all times, and component a t denotes the attention probability of Bi-LSTM state h t to the final state at t times. h t is obtained by adding the states of the each independent directions at that time.F nn denotes the ECG feature vector weighted by attention in Bi-LSTM. 1) The attention probability of the output data for the final state at the moment t. Enter the hidden state of the Bi-LSTM network layer at each moment into the attention module to generate a weight vector. The weight matrix is got by the following formula: The formula uses the softmax function as the calculation method of attention probability distribution. N denotes the number of input sequence elements. U is the weight matrix. F represents the sum of the final hidden layer state values for each independent direction in Bi-LSTM.h t represents the sum of two-way hidden layer state values at the moment t.
2) F nn , the final feature based on attention distribution, is expressed as: N denotes the number of input sequence elements. a t represents the attention probability of the output data for the final state at the moment t. h t represents the sum of hidden layer states in two independent directions at the moment t.

D. DYNAMIC WEIGHTED LOSS
Considering that in the field of ECG research, since most subjects have normal heartbeats and a small proportion of lesions, the ECG dataset is generally unbalanced. As shown in the distribution of ECGs in Table 3, N class accounted for 61.10% of the complete data set, SA, AA accounted for 18.14% and 7.72%, respectively, while the remaining 12 categories accounted for only 13.04% of the total. Therefore, there is an urgent need for a robust and stable classification method that can cope with different data sets.
In order to solve the problem of this unbalanced data set, most researchers use data augmentation techniques to expand the size of data sets [31]. However, this may take more training time, and for the examples that are difficult to classify (hard examples), the loss corresponding to the training will be very small, resulting in a small gradient change in the inverse calculation, and the convergence of the parameters is limited. Compared with other examples, we need to make this sample produce relatively large loss values.
Our method uses raw input data without adding any extra data. To increase the loss of the hard example, we adjusted the weights of the hard example and easy example. We recommend using the new dynamic weighted loss function defined below. First, we define the set of M heartbeat labels in the i-th batch as B i , and the j-th label in B i as Y i,j .
where Y i,j represents different categories in different tasks, Y i,j ∈ {N, AR, MI, VH, AH} in task one. Y i,j ∈ {N, SA, AA, JA, VA, HB, AMI, IMI, LMI, LVH, RVH, BVH, LAH, RAH, BAH} in task two. Then, the loss weight of the k-th class in the i-th batch, c i,k is calculated as follows: Among them, M is the batch size and ξ is set to 0.01 to prevent the loss weight equal to 0 when B i contains only one class. Finally, the weighted cross-entropy loss function of the first batch is calculated, as shown in formula (18).
whereỸ i,j are the predictive probability of the j-th training instance in the first batch, W is the weight matrix of all layers, and λ is the L2 regularization parameter, which is set to 0.01 in our experiment.

E. MULTI-TASK LEARNING
The single-task model focuses on a single task, ignoring other information that may help optimize metrics among diseases. The module trains two tasks related to ECG prediction in parallel, and the feature representation of one task can also be used by other tasks. It promotes the two ECG tasks to learn together. By sharing the network structure before the multi-task module [32], two tasks are trained in parallel using the shared representation. Each task keeps the relevant output on its own path. This resulted the parameters of ECG classification model are reduced, the phenomenon of over-fitting is avoided to some extent, and the generalization ability of the model is improved.
Combining the feature vectors learned from the group Bi-LSTM module. The final fully connected layer respectively generate the category distribution in each time step of two tasks, and at the same time get the loss function of the two VOLUME 8, 2020 tasks. Combining these two loss values according to different weights, the Adam optimizer algorithm is responsible for minimizing the joint loss value. The final classification loss is as follows: where λ is a parameter that interpolates between the task1 and task2 losses, L task1 , L task1 represent the loss functions of task 1 and task 2, respectively.

F. DATASET
The Chinese Cardiovascular Disease Database (CCDD), the largest ECG database all over the world, is developed by the Chinese Academy of Sciences [33]. More than 18,000 recordings in this database are collected from real world with high quality, each ECG record corresponds to one patient, which has been labeled by professional doctors. The records in the database are all 12-lead ECGs, each with a duration of about 10s, digitized at a rate of 500 sampling points per second. More details of screening patient records were contributed in supplementary information 3. Table 2 shows the data distribution of the ECG data sets after screening. Each ECG has two different label values, which are used as the label values of two tasks. Task1 was labeled as arrhythmia, myocardial infarction, ventricular hypertrophy, atrial hypertrophy and normal rhythm. Each ECG tag was labeled as normal rhythm and 14 subtypes of diseases corresponding to the four diseases in Task 1, including sinus arrhythmia, heart block, anterior myocardial infarction, left ventricular hypertrophy, etc. These 15 subtypes were labeled as Task2 label. The specific ECG distribution is shown in Table 3 and Table 4.

III. RESULTS AND DISCUSSION
In this section, we will present the results of the above experiments to verify the effectiveness of MTGBi-LSTM in multi-lead ECG classification tasks and its advantages over existing CNN-based or RNN-based methods.

A. PERFORMANCE OF MTGBi-LSTM
The model training process was evaluated by using the validation set. The classification accuracy of ECG validation set has been basically smooth and stable after 60000 iterations (Fig. 6). The model has stronger information extraction and fitting ability and its performance approximates the optimal. The results in Table 5 show that the accuracy, precision, recall, and F1 of MTGBi-LSTM reached the best scores of 0.8886, 0.9067, 0.9419, and 0.9239, respectively. Fig. 7 shows a confusion matrix for the classification results of the MTGBi-LSTM model on the clinical ECG test set. The normal and 14 abnormal rhythms in ECG data set are classified and the best recognition results are obtained. This comparative experiment proves the validity of the MTGBi-LSTM  framework proposed by us, which can deeply mine the essential features of ECG and accurately predict it.
The results obtained from the MTGBi-LSTM model are analyzed in detail. As shown in Table 5, the recognition rate of 14 abnormal diseases in the test set reached 85.92%. Among them, the recognition rate of arrhythmia was 87.95%, and the rate of being misclassified as normal category was 4.44%. The model can screen arrhythmias effectively.
The classification and recognition rate of ventricular hypertrophy was 75.77%. The diagnostic rate of being misclassified as normal category was 9.6%. The experimental results show that the model can detect the status of organic lesions in time and warn the subjects to do further examination to find out the potential causes.
ECG data of myocardial infarction and atrial hypertrophy are too little and the data distribution is extremely unbalanced. For both types of identification, the total recall reached 85.92%, only 7 abnormal records are considered normal by the MTGBi-LSTM model. It urges the subjects to do more detailed examinations in order to determine whether they have cardiovascular disease and has early warning effect.
In many cases, the location of the disease is ambiguous and the abnormal rhythm is very similar. Because of the lack of information about subjects, limited signal duration, cardiologists cannot draw reasonable conclusions from ECG. Similar factors also exist in the analysis of ECG which is partially misclassified.

B. ABLATION STUDY OF FRAMEWORK COMPONENT
In order to intuitively show the contribution of the different module to the classification performance, we compared various Bi-LSTM variant models. Baseline Bi-LSTM is a combination of Res-GCNN and common Bi-LSTM. The Group Bi-LSTM is based on the Base Bi-LSTM structure and replaces the common Bi-LSTM with the Group Bi-LSTM module. AGBi-LSTM adds attention mechanism based on Group Bi-LSTM. A&LG-LSTM adds dynamic weighted loss based on AGBi-LSTM, and MTGBi-LSTM adds multi-task learning based on A&LGBi-LSTM. In Table 5, the F1 score that are evaluated under multiple comparative case studies with same testing ECG sample data.

1) GROUP Bi-LSTM IMPROVES PREDICTION
The results show that the performance of Group Bi-LSTM is better than that of Baseline Bi-LSTM, in specific, precision, recall and F1 score of Group Bi-LSTM group increased to 0.8720, 0.9224 and 0.8965, respectively. The performance of Group Bi-LSTM is more excels and is a powerful model of feature learning.
The design of Group Bi-LSTM enables the model to deeply exploit the features of single lead and the relationship between each lead of ECG. This means that there is also information flow in the ECG between different leads at the same time. its effectively enhances the quality of feature expression between different ECG leads and the expressiveness of signal representation with respect to ECG waveform features, thus leading to accurate and reliable estimations for CVD. This really is something worth celebrating, a milestone.

2) ATTENTION MECHANISM IS USEFUL
The performance of AGBi-LSTM is better than that of Group Bi-LSTM. This shows that the attention mechanism and dynamic weighted loss increases the classification performance. By multiplying the weight vectors in attention mechanism, the influence weights for local temporal features are generated, which can highlight the main waveform features of ECG, and merge the ECG temporal features of each iteration as the main waveform features. Attention probability distribution is used to control the influence of elements in input sequence on elements in output sequence. While retaining VOLUME 8, 2020 more valuable information, reducing the impact of irrelevant or weak correlation information on output data.
The MTGBi-LSTM model combines attention mechanism to selectively learn the final feature map, breaking the limitation of the traditional neural network structure relying on the internal fixed length vector in encoding and decoding, and enhancing the robustness of the model.

3) DYNAMIC WEIGHTED LOSS HAS SIGNIFICANT EFFECT
The results in Table 5 show that the classification performance of the MTGBi-LSTM model has been significantly improved. The F1 values of JA, VA, and LVH have increased by about 4 points. AMI, IMI, RVH, LAH, and, BAH have increased by about 10 points. In particular, the F1 value of LAH increased from 0.4 to 0.5238.
At the same time, we also noticed that the F1 values of LMI, BVH, and RAH have dropped, and there are only about 10 data in the test data set. Manually checking for inconsistencies found that the MTGBi-LSTM classification error overall seemed very reasonable. In some cases, because ECG data records are turbulent, it is not possible to draw reasonable conclusions from the ECG data.
We find that the results are improved when using dynamic weighted loss. It is proved that the dynamic weighted loss function solves the problem of the imbalance of normal and abnormal diseases in the field of intelligent assisted diagnosis.
Multi-Task learning mechanism is important. For the symbol λ of the multitasking learning mechanism, we evaluate the four values of λ to verify the classification accuracy on the test set. As λ decreases, the prediction accuracy increases gradually, and all models trained using multitasking learning mechanisms are superior to those that are not trained (Table 6). Because it provides the best prediction accuracy on the test set, in our complete framework, choose λ = 0.2 as the training model.
The local minimum of different tasks may be located at the low point of different local concave planes. The task1 increases the inter-CVD variations by drawing feature maps extracted from different diseases apart, while the task2 reduces the intra-CVD variations by pulling feature maps extracted from the same disease together. Through different forces between two tasks, the hidden layer can jump out of the local minimum, reduce the risk of model over-fitting, and make the classification accuracy of the model higher.

C. EFFECTIVENESS FOR MULTI-LEAD ECG
We evaluate the ECG data of 3-lead, 8-lead, and 12-lead to prove that the MTGBi-LSTM model can intelligently analyze the ECG data of any number of leads. It also demonstrates that the 12-lead ECG can provide a higher quality assessment analysis.
We performed a comparative experiment on the same ECG data set, as shown in Table 3 and Table 4. The labels and the number of records are the same, except that the number of leads used is different. The 3-lead ECG data is selected from I, II, III and the 8-lead ECG data is selected from II, III, V1, V2, V3, V4, V5, V6.
As seen in Table 6, when L num = 3, L num = 8 and L num = 12 (λ = 0.2) are compared, the precision, Recall, and F1 of the model increase with the number of ECG leads.
The 3-lead based MTGBi-LSTM model predicts both normal and arrhythmia categories, and the predictions for MI, VH, and AH are significantly worse. The ECG information collected by the 3-lead is relatively simple, and the information on the changes of the ST segment and the P wave is not obvious.
The 8-lead based MTGBi-LSTM model clearly achieved better performance than the 3-lead. When L num = 8, the selected 8-lead are the basic leads of the ECG data, and the remaining four leads of the twelve leads are linearly derived from the eight basic leads. The ECG information collected by the 8 lead is rich, and the MTGBi-LSTM model can extract more highly discriminative ECG essential features.
This shows that our MTGBi-LSTM model can evaluate any multi-lead ECG (2-lead or more) and the 12-lead ECG data based MTGBi-LSTM model achieves the best performance.

D. PERFORMANCE COMPARISON
CNN can stimulate low-dimensional local features implied in ECG waveforms into high-dimensional space, and the subsampling of a merge operation commonly used in CNN can inevitably extract peak points in a given merge window. In previous work [14], [22], [35] (Table VII), the researchers used deep feedforward neural network to classify ECG and achieved high performance.
As a comparison, results from RNN including those in [21], [23], [41] are listed in Table 7. The advantage of RNN modeling is that the circular network can store a certain length of context information, RNN can handle ECGs of any length of time. When the neural activation unit completes some non-linear mapping, the RNN network can continuously update the network state, so RNN can learn long-term dependencies.
Furthermore, although both the CNN-based method and the RNN-based method obtain good results, those study has several important limitations. The CNN can only learn completely fixed time-varying weights, and it can only accept fixed length ECG input. CNN cannot understand the complex time characteristics in ECG data, ignoring the correlation between ECG data before and after.
The RNN-based algorithm simply superimposes the number of layers of LSTM or GRU, or combines with tricks in deep learning to improve model performance. These algorithms use the depth of the model in exchange for the performance improvement of the model, without exploring the essential characteristics of the ECG. Previous works based on multi-lead ECG analysis did not pay attention to the correlation among different lead information. The accuracy of the previous works is far from the requirement of technical reliability in the actual clinical environment.
Different lead electrocardiogram is a projection of the vector cardiogram in different directions. The 12-lead ECG can provide a full evaluation of the cardiac electrical activity, and the detection rate of standard 12-lead clinical ECG is significantly higher than single-lead ECG. Single-lead ECG of different diseases has a similar fluctuation trend. For example, in the single-lead ECG signal with normal sinus beats as the main component, the similarity of adjacent beats is high, when arrhythmia or noise disturbance occurs, the similarity of adjacent beats generally decreases. At this time, it is difficult to determine whether it is arrhythmia or low signal-to-noise ratio. In [14], [21], [34]- [41], the input dataset is limited to single-lead ECG records, which provides limited signal compared to a standard 12-lead ECG, and it remains to be determined if those algorithm performance would be similar in 12-lead ECG.
The most commonly used data set currently used to design and evaluate ECG algorithms is the MIT-BIH arrhythmia database [19], which consists of 48 and a half hour 2-lead ECG data records. The beat cycle of the database may belong to the same patient, and different beat data may have the same trend. So the model evaluated using the MIT-BIH-AR database is not robust, and the data sample reported in the MIT-BIH-AR database are insufficient and leads to the overfitting problem.
As the amount of digitized ECG data increases, the performance of deep neural network-based machine learning models can approach and possibly exceed human diagnostic capabilities. After evaluating the largest known CCDD database in the world, MTGBi-LSTM model obtains the most advanced performance on more than 17,000 12-lead ECG records, which proves the effectiveness of our algorithm.
Here, we focused on a central challenge in ECG diagnostics, and proposed the intelligent assistant diagnosis model of CVD based on Res-GCNN and GBI-LSTM hybrid structure. It can accept any multi-lead ECG. We propose a group of Bi-LSTM structures that can Computer-Assisted diagnose ECGs with any number of leads. At the same time, attention mechanism is added to each Bi-LSTM structure to break through the limitation that traditional Bi-LSTM structure relies on internal fixed length vectors when encoding and decoding. The most important thing is that MTGBi-LSTM model explores the characteristic expression of different ECG leads for the first time, which is of great significance for the auxiliary diagnosis of multi-lead ECG.

IV. CONCLUSION
In this paper, we propose a Multi-Task Group Bi-LSTM framework to classify various diseases of ECG in order to realize intelligent assistant diagnosis of CVD. This model employs a group Bi-LSTM and Res-GCNN to improve the learning ability of time and space. Furthermore, the model can deeply exploit the features of single-lead and the relationship between each lead of ECG. Attention mechanism and multi-task learning technology are used to enhance the recognition ability of the model, and ultimately to achieve a variety of disease classification. According to the results shown, compared with other classifiers, the F1 of the experimental model in this paper is up to 0.9239 and the results confirm the high accuracy and generalization ability of the proposed model. Experimental results confirm the MTGBi-LSTM is efficient and powerful, and could match state-of-the-art algorithms.
The intelligent assisted diagnosis of electrocardiogram improves the diagnostic efficiency of cardiologists, and we hope that this technology can be combined with online equipment to serve as an early screening tool for cardiovascular disease in areas with limited medical resources.