Siamese Convolutional Neural Network for Heartbeat Classification Using Limited 12-Lead ECG Datasets

The Electrocardiogram (ECG) is a low-cost exam commonly used to diagnose abnormalities in the cardiac cycle. Over the years, the scientific community has investigated the automatic classification of ECG signals driven by advanced Machine Learning (ML) techniques. Despite recent scientific advances, annotating large and diverse datasets to support the training of ML techniques is still very time-consuming and error-prone. Consequently, ML techniques whose training does not require extensive, well-annotated datasets are becoming increasingly prominent, as they make it possible to identify and classify abnormalities in the cardiac cycle (e.g., rare cardiac disturbances) using the limited data available in ECG datasets. However, classifying heartbeats from digital tracings of 12-lead ECG signals in imbalanced datasets is challenging due to the large number of existing heart diseases. This study investigates the few-shot learning paradigm based on Siamese Convolutional Neural Networks (SCNN), popular in image classification problems, to classify 12-lead ECG heartbeats using a few training samples with supervised information. The proposed SCNN model achieved an accuracy of up to 95% on a public dataset with the hold-out validation method, evaluated for different combinations of similarity and loss functions. Besides, using 7-fold cross-validation, the model achieved a mean area under the ROC curve of 89%. We also compared the class-by-class classification results with those of similar methods available in the literature, obtaining the same or better results based on performance metrics such as accuracy, precision, recall, and specificity.


I. INTRODUCTION
The healthcare systems of low- and middle-income countries have many deficiencies due to low investment and the poor distribution of doctors among regions. For instance, according to the latest medical demographic survey [1], Brazil, a continental-size middle-income country, has 2.18 doctors per 1,000 inhabitants across the national territory. However, the northeastern region has only 1.42 doctors per 1,000 inhabitants. Among developed countries, a 2019 study conducted by the Association of American Medical Colleges reported a ratio of 353 people per physician in the United States; however, only 2.4% of these physicians are cardiology specialists [2]. Thus, low investment and poor distribution of doctors may negatively impact the diagnosis and treatment of diseases (e.g., cardiovascular diseases) in both developing and developed countries. Cardiovascular diseases are the most common cause of death in the world [3]. In Brazil, for instance, they are the leading cause of disability retirement and hospitalization expenses. However, only 4.1% of medical specialists in Brazil are cardiologists, and this scarcity compromises the analysis of simple tests such as the Electrocardiogram (ECG) [1]. Higher investment and better distribution of doctors among regions could therefore result in more regular visits to a cardiologist, improving diagnosis and treatment. Indeed, ECG tracings make it possible to diagnose cardiovascular diseases early, enabling interventions that prevent complications such as stroke and heart attack.
The associate editor coordinating the review of this manuscript and approving it for publication was Tony Thomas.
The resting ECG is a simple, non-invasive, and inexpensive test that records the heart's electrical activity over a short period (approximately 10 seconds). The standard recording uses 12 leads, obtained by combining electrodes placed on the limbs and on the front of the chest. Differences in the shape and frequency of the ECG waves allow the identification of different cardiovascular diseases, such as cardiac arrhythmias or heart muscle problems.
To speed up the triage process in medical centers that provide remote ECG reports, researchers have been developing computational algorithms that automatically classify ECG signals as normal or abnormal with respect to cardiac electrical activity. Deep learning is an example of a Machine Learning (ML) approach widely adopted to classify ECG signals automatically.
Class imbalance is considered an open problem in deep learning with ECG signals [4]: academia and industry recognize it as an obstacle to developing effective deep learning models with a large number of parameters, because it makes the training phase harder. ML practitioners can mitigate this problem using data augmentation techniques such as the Synthetic Minority Oversampling Technique (SMOTE) [5]. Recently, a new approach called few-shot learning [6] has become popular, standing out in image processing problems. This approach tries to circumvent the need for large and diverse datasets by using prior knowledge to improve the models' convergence to an acceptable solution. A solution can use prior knowledge in three main ways: augmenting the training dataset, restricting the solution search space, and adapting the solution of a similar task to fit the new problem.
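SMOTE is not part of the proposed approach, but the idea it relies on is simple enough to sketch. As a minimal illustration, assuming tabular feature vectors for a single minority class, each synthetic sample below is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors. The function name `smote_sample` and its defaults are hypothetical; the imbalanced-learn package provides a maintained `SMOTE` class for practical use.

```python
import numpy as np


def smote_sample(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic rows for a minority class by SMOTE-style interpolation.

    minority has shape (n_samples, n_features). Each synthetic row lies on the
    segment between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    # Pairwise distances inside the minority class; a sample is not its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for row in range(n_new):
        i = rng.integers(n)            # random minority sample
        j = rng.choice(neighbours[i])  # one of its k nearest neighbours
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic[row] = minority[i] + gap * (minority[j] - minority[i])
    return synthetic


rng = np.random.default_rng(1)
minority = rng.normal(size=(8, 3))  # toy minority class: 8 samples, 3 features
new_rows = smote_sample(minority, n_new=20, k=3)
```

Because every synthetic row is a convex combination of two real minority samples, it stays inside the region already occupied by that class rather than duplicating existing rows.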
Given the challenge of addressing imbalanced multi-class problems [7], this study investigates few-shot learning based on a Siamese Convolutional Neural Network (SCNN) model [8], [9] to automatically classify heartbeats from digital tracings of 12-lead ECG signals in imbalanced datasets. We used an open ECG dataset, tested different model configurations, and applied a simple decision process to classify the signals.
The article is structured as follows. Section II discusses related work. Section III presents preliminaries on the ECG dataset, few-shot learning, SCNNs, and similarity and loss functions. Section IV describes the proposed SCNN model and the experimental results. Section V discusses the results reported in Section IV. Section VI presents threats to validity, and Section VII concludes the article.

II. RELATED WORK
Deep learning is a relevant research topic for many biomedical applications. For instance, Tng et al. [10] analyzed the performance of recurrent neural network-based models for predicting histone lysine crotonylation, and Le and Nguyen [11] proposed a deep learning model to identify flavin mononucleotide-interacting residues.
Several papers in the scientific literature also explore deep learning techniques to classify ECG signals from digital tracings. For example, Acharya et al. [12] trained two eleven-layer Convolutional Neural Networks (CNN) to classify ECG signals as normal or indicative of coronary artery disease. One network was trained with 95,300 2-second segments (15,300 normal and 80,000 abnormal), and the other with 38,120 5-second segments (6,120 normal and 32,000 abnormal). All signals were obtained from lead II, using records of 40 healthy subjects from the Fantasia [13] database and seven records of patients with coronary artery disease from the St. Petersburg Institute of Cardiology Technics 12-lead arrhythmia database [14]. Although the two networks had the same structure, the authors trained them with segments of different lengths. The accuracy obtained was 95% with two-second segments and 95.1% with five-second segments.
Using a 10-layer CNN, Baloglu et al. [15] detected ten different classes of myocardial infarction from 12-lead signals in the PTB Diagnostic ECG [14] database, using 148 signals with myocardial infarction and 42 healthy signals. The signals went through a wavelet transform-based preprocessing step for noise and baseline wander removal and then through an R-wave detector to extract a stretch of the ECG signal corresponding to a single heartbeat. In the proposed approach, the authors trained the neural network on each lead separately, obtaining an average accuracy of 99.60%.
In another related work, Yildirim et al. [16] used a different approach to detect 17 classes of cardiac arrhythmias. For training, the authors used 1,000 10-second segments sampled from signals of 45 individuals from the MIT-BIH Arrhythmia database [14]. This work assumes that each 10-second segment contains only one type of arrhythmia and uses longer traces to capture changes in signal characteristics over time. The CNN classifier developed in this work obtained an average accuracy of 91.33%.
Ribeiro et al. [17] used a residual neural network model with 12-lead ECG signals to identify six types of cardiac disorders: first-degree atrioventricular block, right bundle branch block, left bundle branch block, sinus bradycardia, atrial fibrillation, and sinus tachycardia. The authors used a private database obtained through the Telehealth Network of Minas Gerais (RTMG), containing more than 2.3 million 10-second segments of ECG signals, and applied natural language processing techniques to extract the signal classes from medical reports. They compared the trained network's diagnoses with those given by three pairs: two cardiology residents, two emergency department residents, and two medical students. The network obtained better results than all the pairs, with an F1 score of 80% and a specificity above 99%.
Few-shot learning has also recently found its way into ECG signal classification. For example, Liu et al. [21] developed a few-shot learning method to detect arrhythmia in ECG signals by pre-training a model on an auxiliary dataset and using a meta-transfer learning scheme to improve the learning of unseen classes. Yang et al. [22] used a Siamese Neural Network (SNN) based on ODENet to classify 10-second ECG segments into five classes. In a similar approach, Li et al. [23] proposed an SCNN to classify single-lead ECG heartbeats into four classes under a limited-dataset constraint.
Despite the high accuracy of the implemented models, most related works rely on public datasets for classifier training, such as the MIT-BIH Arrhythmia [18] and INCART databases, available in the PhysioNet [14] repository. Public datasets usually contain long signals from few patients, implying a high dependency between observations. This dependency is usually not considered in accuracy calculations, which makes such measures tend to be too optimistic [19]. In addition, few datasets provide resting ECG signals in 12 leads, making it difficult to detect diseases whose diagnosis depends on evaluating signals in multiple leads, such as ventricular fibrillation and myocardial infarction [20]. Another problem is that more severe diseases tend to occur less frequently and are therefore underrepresented in datasets with few patient ECG records.

A. ECG DATASET
We collected the ECG signals from the open-source St. Petersburg INCART 12-lead Arrhythmia dataset [14], available on PhysioNet. The dataset consists of 75 ECG recordings extracted from 32 Holter records. Each recording is 30 minutes long and contains the 12 standard leads sampled at 257 Hz. The heartbeat annotations were produced automatically by an algorithm and later corrected manually.

B. FEW-SHOT LEARNING
Few-shot learning is a machine learning paradigm that allows supervised learning algorithms to learn from a limited number of examples. This paradigm is particularly suitable when [6]: 1) the model needs to learn rare cases; 2) the cost of collecting and annotating a robust dataset is too high; and 3) the machine must learn like a human being. Researchers usually categorize few-shot learning algorithms according to where prior knowledge of the problem is applied: data, model, or algorithm. Approaches based on prior knowledge of the data seek to improve a model's dataset so that the model generalizes satisfactorily. Thus, it may be necessary to convert an existing dataset into a new type of information to decrease the training complexity of another model [31], [32]. It may also be necessary to label unlabeled or weakly labeled samples to increase the training data [33], [34] or to artificially generate data similar to the original dataset [35], [36].
In the model context, few-shot learning algorithms seek to limit the solution search space, as this facilitates convergence to a satisfactory function. Designers can combine models that solve specific parts of a problem, with parameter sharing, to solve a more generic problem (Multitask Learning) [37], [38]. They can also simplify the search space by looking for a function that maps the samples to a feature space in which a similarity function can easily differentiate the dataset classes (Embedded Learning or Metric Learning) [39], [40]. Other techniques use generative models and likelihood functions (Generative Modeling) [41].
Designers also use few-shot learning methods to guide how model parameters are learned. Some approaches include:
• adapting a set of parameters θ₀ from a model that performs one type of task to parameters θ for a similar task [42], [43];
• refining training parameters according to their performance [44], [45]; and
• learning an optimization function that adjusts model parameters during training [46], [47].

C. SIAMESE CONVOLUTIONAL NEURAL NETWORK
A Siamese neural network, originally developed to verify handwritten signatures in images [48], is composed of twin networks with the same weights and architecture, and it builds on concepts related to few-shot learning. Each twin network receives a different input. The goal is to learn an embedding function that maps those inputs into a d-dimensional space where the value of a similarity function f is low for inputs of the same class and high for inputs of different classes [6]. Traditionally, designers train neural networks on a fixed number of classes, so adding or removing classes requires retraining the network to accommodate those changes. A Siamese neural network addresses this problem because it learns to compare two inputs and check whether they are similar; adding a class becomes as simple as adding another reference example to compare with the samples [49]. As an embedded learning algorithm, the network maps inputs to a feature space where it is easier to discriminate between classes. Because the twin networks share the same parameters, they are unlikely to map similar data to very different locations in the feature space. Hence, for a coherent mapping function, the similarity function should have low values for samples of the same class and high values for samples of different classes.
VOLUME 11, 2023
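The key structural property is that both branches use one set of parameters. The following minimal sketch shows this with a toy linear-plus-tanh embedding standing in for the convolutional branch of the SCNN; the class name `TwinEmbedding`, the layer type, and the use of an L1 distance as the comparison are assumptions for illustration only.

```python
import numpy as np


class TwinEmbedding:
    """Toy Siamese pair: both branches call the same embed(), i.e. they share one weight matrix."""

    def __init__(self, in_dim: int, emb_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Single parameter set shared by the two twin branches.
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, emb_dim))

    def embed(self, x: np.ndarray) -> np.ndarray:
        """Map an input vector into the emb_dim-dimensional feature space."""
        return np.tanh(x @ self.W)

    def compare(self, x1: np.ndarray, x2: np.ndarray) -> float:
        """Run both inputs through the shared branch and return the L1 distance
        between their embeddings (low when the embedding places them close together)."""
        return float(np.sum(np.abs(self.embed(x1) - self.embed(x2))))


net = TwinEmbedding(in_dim=16, emb_dim=4)
a, b = np.ones(16), np.linspace(-1.0, 1.0, 16)
```

Because the weights are shared, identical inputs always land on the same point (distance 0) and the comparison is symmetric, which is what makes the learned space usable for comparing any pair of samples.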
Combining convolutional layers with Siamese neural networks is even more attractive, since convolution is an operation that can filter the input and highlight patterns contained in data segments. In this way, a convolutional layer can be trained to represent essential features of its input in its output [49].
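To illustrate how a convolution highlights a pattern, the sketch below computes a single-channel "valid" 1-D convolution the way deep learning layers do (a cross-correlation: no kernel flip, stride 1, no bias). The signal and the hand-made filter are hypothetical; in a trained network, the filter weights would be learned rather than chosen.

```python
import numpy as np


def conv1d_valid(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 1-D convolution as computed by deep learning layers
    (technically a cross-correlation: the kernel is not flipped; stride 1, no bias)."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(len(signal) - k + 1)])


# A flat trace containing one short spike-like pattern starting at index 8.
signal = np.zeros(20)
signal[8:11] = [1.0, 3.0, 1.0]
# A filter shaped like that pattern responds most strongly where the pattern occurs.
kernel = np.array([1.0, 3.0, 1.0])
response = conv1d_valid(signal, kernel)
```

The response peaks at index 8 (value 11), exactly where the pattern starts, while flat regions produce zero; a convolutional layer stacks many such filters to build its feature maps.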

D. SIMILARITY AND LOSS FUNCTIONS
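Concerning the similarity functions, we used the L1 distance, L2 distance, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE):
$$L_1(A,B) = \sum_{i=1}^{N} |A_i - B_i| \tag{1}$$
$$L_2(A,B) = \sqrt{\sum_{i=1}^{N} (A_i - B_i)^2} \tag{2}$$
$$\mathrm{MSE}(A,B) = \frac{1}{N} \sum_{i=1}^{N} (A_i - B_i)^2 \tag{3}$$
$$\mathrm{RMSE}(A,B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (A_i - B_i)^2} \tag{4}$$
where A and B are the vectors and N is their size.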
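A direct NumPy translation of the four similarity functions is shown below as a sketch. It assumes that each function reduces two embedding vectors to a single scalar. Koch et al. [49], for example, use a weighted element-wise L1 followed by a learned layer instead, so these scalar versions are an assumption about the exact form used here.

```python
import numpy as np


def l1_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sum(np.abs(a - b)))           # sum of absolute differences


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.sum((a - b) ** 2)))   # Euclidean distance


def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a - b) ** 2))           # mean of squared differences


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))  # square root of the MSE


a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 7.0])
# differences (0, 2, -4): L1 = 6, L2 = sqrt(20), MSE = 20/3, RMSE = sqrt(20/3)
```

For vectors of a fixed size N, RMSE equals L2 divided by √N and MSE equals L2²/N, so these three functions rank any set of pairs identically. Any difference between them must therefore come from how they shape training (for example, the scale of the gradients), not from the ordering they induce on a fixed embedding.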
Regarding the loss functions, we tested Binary Cross-Entropy (Equation 5) and Contrastive Loss (Equation 6) [50]. The two loss functions have different objectives: Binary Cross-Entropy targets classification problems, whereas Contrastive Loss targets metric-based problems. Because the similarity problem can be reduced to a binary classification problem with ''Same'' and ''Different'' classes, Binary Cross-Entropy is commonly used in this type of model, even though specialized loss functions exist. In Equations 5 and 6, y is the expected output, d is the obtained output, margin is a parameter set to the constant 1 (given that d ranges between 0 and 1), and max is the maximum function. Previous studies applied these functions, so we experimented with them to analyze the models' performance. For instance, Koch et al. [49] used the weighted L1 distance as the similarity function and a variation of binary cross-entropy (Equation 5) as the loss function. Besides, Nandy et al. [51] highlighted L1 and L2 as examples of relevant similarity functions, and Chopra et al. [52] used the Euclidean distance (L2) as the similarity function and Contrastive Loss (Equation 6) as the loss function.
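The sketch below writes both losses in common textbook forms, using y, d, and margin = 1 as described in the text. Two conventions are assumptions rather than details taken from the paper: for Binary Cross-Entropy, d is read as the predicted probability that the pair is ''Same''; for Contrastive Loss, y = 1 marks a same-class pair and d is treated as a distance-like output (constant factors such as 1/2 are omitted).

```python
import numpy as np

EPS = 1e-7  # keeps log() away from 0 and 1


def binary_cross_entropy(y: np.ndarray, d: np.ndarray) -> float:
    """Mean BCE; y holds the expected labels (0 or 1), d the model outputs in (0, 1)."""
    d = np.clip(d, EPS, 1.0 - EPS)
    return float(np.mean(-(y * np.log(d) + (1.0 - y) * np.log(1.0 - d))))


def contrastive_loss(y: np.ndarray, d: np.ndarray, margin: float = 1.0) -> float:
    """Mean contrastive loss, ASSUMING y = 1 marks a same-class pair and d is a
    distance-like output: same-class pairs are pulled toward d = 0, while
    different-class pairs are only penalised while d < margin."""
    return float(np.mean(y * d ** 2 + (1.0 - y) * np.maximum(0.0, margin - d) ** 2))


y = np.array([1.0, 0.0])  # one same-class pair, one different-class pair
d = np.array([0.2, 0.3])  # hypothetical model outputs for the two pairs
```

For these toy values, the contrastive loss is (0.2² + (1 − 0.3)²) / 2 = 0.265. A different-class pair whose output already reaches the margin contributes nothing, which is why this loss focuses training on pairs that are still confusable.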

A. PROPOSED SCNN ARCHITECTURE
Up to eight layers compose the proposed SCNN, as illustrated in Fig. 1. We defined the number of layers and their arrangement through empirical experimentation. We tested four similarity functions and two loss functions, resulting in eight combinations: the similarity functions were the L1 distance, L2 distance, MSE, and RMSE, and the loss functions were Binary Cross-Entropy (Equation 5) and Contrastive Loss (Equation 6) [50].
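The experimental grid can be enumerated explicitly. As a small sketch, the string labels below are hypothetical names for the functions listed in the text, and the ten runs per combination follow the experimental results section; the snippet only builds the list of configurations and does not train anything.

```python
from itertools import product

SIMILARITY_FUNCTIONS = ["L1", "L2", "MSE", "RMSE"]
LOSS_FUNCTIONS = ["binary_cross_entropy", "contrastive_loss"]
RUNS_PER_COMBINATION = 10  # ten models per combination were trained in the experiments

# Every (similarity, loss) pairing: 4 x 2 = 8 model configurations.
configurations = list(product(SIMILARITY_FUNCTIONS, LOSS_FUNCTIONS))

# One training run per configuration and seed: 8 x 10 = 80 runs in total.
runs = [
    {"similarity": sim, "loss": loss, "seed": seed}
    for sim, loss in configurations
    for seed in range(RUNS_PER_COMBINATION)
]
```

Keeping the seed in each run record makes it possible to reproduce a specific pair assembly and weight initialization when comparing combinations.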
We also experimented with many network architectures to improve classification performance, varying the number of layers, the number of convolution filters, and the filter sizes. Despite being the most popular activation function, the rectified linear unit (ReLU) may suffer from the dying ReLU problem: under certain conditions, a neuron can enter a state of perpetual inactivation in which it outputs zero for any input and produces no gradient, making it essentially ''dead,'' as it no longer contributes to the neural network. To mitigate this problem, we used a variation of the ReLU known as Leaky ReLU (Equation 7):
$$f(X) = \begin{cases} X, & X > 0 \\ \alpha X, & X \le 0 \end{cases} \tag{7}$$
where X is an arbitrary input and α is a small positive constant. This small positive slope for negative inputs keeps a non-zero gradient when the neuron is inactive, making recovery from a dying state possible.
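The difference between ReLU and Leaky ReLU is easiest to see in their gradients. In this sketch, the slope value `ALPHA = 0.01` is a hypothetical choice, since the text does not report the value used (Keras provides an equivalent `LeakyReLU` layer).

```python
import numpy as np

ALPHA = 0.01  # hypothetical negative slope; the value used in the paper is not reported


def relu_grad(x: np.ndarray) -> np.ndarray:
    return np.where(x > 0, 1.0, 0.0)    # zero gradient for every non-positive input


def leaky_relu(x: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    return np.where(x > 0, 1.0, alpha)  # small but non-zero gradient when inactive


x = np.array([-2.0, -0.5, 0.0, 3.0])
```

For the three non-positive inputs, ReLU passes back no gradient at all, whereas Leaky ReLU still passes back α, so the weights feeding an inactive neuron can continue to change.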
Besides, we used max pooling to return the maximum value of each region. Table 1 summarizes the detailed parameters of each layer of the proposed SCNN architecture to enable the reproduction of the model presented in this study. We did not apply a specific hyperparameter-tuning method; instead, we defined the parameters through extensive experimentation to achieve relevant results. Readers can access our source code in our public repository.¹
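Max pooling keeps only the largest value in each window of the feature map. The sketch below assumes non-overlapping windows of size 2 with "valid" padding, which matches the defaults of Keras's `MaxPooling1D` (stride equal to the pool size); the pool sizes actually used are those listed in Table 1.

```python
import numpy as np


def max_pool1d(x: np.ndarray, pool_size: int = 2) -> np.ndarray:
    """Non-overlapping 1-D max pooling (stride = pool_size). Trailing samples that
    do not fill a whole window are dropped, as with 'valid' padding."""
    n_windows = len(x) // pool_size
    return x[: n_windows * pool_size].reshape(n_windows, pool_size).max(axis=1)


features = np.array([1.0, 5.0, 2.0, 2.0, 7.0, 0.0, 9.0])
pooled = max_pool1d(features)  # windows (1, 5), (2, 2), (7, 0); the lone 9 is dropped
```

Each pooling stage roughly halves the length of the feature map (for a pool size of 2), which reduces computation in later layers and makes the features less sensitive to small shifts of a wave within the heartbeat window.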

B. PRE-PROCESSING
We filtered each ECG signal using a Discrete Wavelet Transform (DWT) approach with Daubechies 4 as the mother wavelet. This approach applies a DWT to the signal and discards the resulting wavelet components that represent low- and high-frequency noise. As an additional step to reduce powerline noise, we employed a second-order Butterworth bandstop filter at 50 Hz. Fig. 2 presents original and filtered ECG signals, illustrating the results of our filtering approach. We collected heartbeat samples from the filtered signals by extracting segments around the pre-annotated R waves. Each extracted segment contains 65 samples before and 103 samples after the R wave, totaling 169 points per heartbeat. For datasets with other sampling frequencies, these values can be recalculated in proportion to the sampling frequency (a simple rule of three). We performed this procedure for all 12 standard leads and concatenated the heartbeats collected at each R-wave annotation into a single signal with 2,028 samples (12 × 169). Fig. 3 presents an example of an ECG signal with the 12 leads concatenated using our approach.
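The segmentation step can be sketched as follows. The 169-point window is the R-wave sample plus 65 samples before and 103 after, and 12 × 169 = 2,028. This sketch assumes the filtered record is a NumPy array of shape (n_samples, 12) and that segments are concatenated lead after lead. The helper names and the synthetic record are hypothetical, and the wavelet and Butterworth filtering steps are not reproduced.

```python
from typing import Optional, Tuple

import numpy as np

FS_INCART = 257          # Hz, sampling rate of the INCART records
BEFORE, AFTER = 65, 103  # samples kept before/after the R wave at 257 Hz


def window_for(fs: float) -> Tuple[int, int]:
    """Rescale the 65/103-sample window to another sampling rate (rule of three)."""
    return round(BEFORE * fs / FS_INCART), round(AFTER * fs / FS_INCART)


def extract_beat(ecg: np.ndarray, r_peak: int, fs: float = FS_INCART) -> Optional[np.ndarray]:
    """Cut one heartbeat around sample r_peak from every lead of ecg
    (shape: n_samples x n_leads) and concatenate the segments lead after lead."""
    before, after = window_for(fs)
    start, stop = r_peak - before, r_peak + after + 1  # +1 keeps the R-wave sample itself
    if start < 0 or stop > ecg.shape[0]:
        return None  # beat too close to the start or end of the record
    # Transpose to (n_leads, beat_len) so the flattening runs lead by lead.
    return ecg[start:stop, :].T.reshape(-1)


record = np.arange(1000 * 12, dtype=float).reshape(1000, 12)  # synthetic 12-lead stand-in
beat = extract_beat(record, r_peak=500)
```

With this layout, positions 0 to 168 of the output hold the first lead, positions 169 to 337 the second, and so on; at 514 Hz, for example, the rule of three doubles the window to 130 and 206 samples.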
In this study, we selected seven kinds of heartbeats (i.e., seven classes) from the nine heartbeat types available in the INCART dataset [14], removing two types because their annotations were unclassified or unspecified. Table 2 presents the resulting split of the heartbeats, which evidences the highly imbalanced nature of the dataset and motivates the use of a few-shot learning approach in the form of an SCNN architecture.

C. EXPERIMENTAL SETUP
In the first evaluation step, we split the dataset in a stratified manner with a 75-15-10 ratio, so that each split has nearly the same proportion of samples of each class. For each epoch (50 in total), the data were split into 75%, 15%, and 10% for training, validation, and testing, respectively. During training, each instance in the split produced two pairs of signals: one formed by the instance and a randomly selected example from the same class, and another formed by the instance and a randomly selected example from a different class. In this way, the model input is equally distributed between positive (same-class) and negative (different-class) pairs.
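The stratified split and the balanced pair construction can be sketched with the standard library alone. Several details are assumptions, not taken from the text: per-class rounding of the split sizes, a target of 1 for same-class pairs, avoiding pairing a sample with itself when its class allows it, and picking a different class uniformly before picking a sample from it. The toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.75, 0.15, 0.10), seed=0):
    """Split sample indices into train/validation/test lists, class by class,
    so each split keeps roughly the class proportions of the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(ratios[0] * len(idxs))
        n_val = round(ratios[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # the remainder (about 10%) goes to test
    return train, val, test


def make_pairs(indices, labels, seed=0):
    """Give every sample one positive (same-class, target 1) and one negative
    (different-class, target 0) partner, so both pair types are equally frequent."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    pairs, targets = [], []
    for i in indices:
        own = labels[i]
        members = by_class[own]
        partner = rng.choice(members)
        while partner == i and len(members) > 1:  # avoid pairing a sample with itself
            partner = rng.choice(members)
        pairs.append((i, partner))
        targets.append(1)
        # Assumption: pick a different class uniformly, then a sample from it.
        other = rng.choice([c for c in by_class if c != own])
        pairs.append((i, rng.choice(by_class[other])))
        targets.append(0)
    return pairs, targets


labels = ["N"] * 20 + ["V"] * 10 + ["S"] * 5  # toy, imbalanced labels
train, val, test = stratified_split(labels)
pairs, targets = make_pairs(train, labels)
```

Drawing the negative class uniformly, rather than in proportion to class size, exposes the network to minority classes more often as negatives; drawing in proportion to class size would be an equally valid reading of the text.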
We implemented the models in the Python programming language with the Keras framework, running on an Nvidia RTX 2060 GPU, an Intel(R) Core(TM) i7-10875H CPU, and 32 GB of RAM. In the training stage, we used the Adam optimizer with a learning rate of 0.001 and a batch size of 128 samples, running for 50 epochs. We also obtained these values through empirical experimentation. As described in the previous section, we used binary cross-entropy and contrastive loss as the loss functions.
We manually selected one sample of each class to form a reference set. The selection of these reference samples is essential to the quality of the predictions, as each one must contain the most significant characteristics of its class. To classify a target sample, we paired it with each reference sample in the reference set (each associated with its class), fed the pairs into the model, and assigned the class of the pair with the highest output similarity.
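The decision rule reduces to an argmax over one score per reference sample. In this sketch, `toy_similarity` is a hypothetical stand-in for the trained SCNN output (higher means more similar), and the two-dimensional vectors replace real heartbeats for readability.

```python
import numpy as np


def classify_with_references(target, references, similarity):
    """Pair the target with the reference sample of every class and return the
    class whose pair obtains the highest similarity score, plus all scores."""
    scores = {label: similarity(target, ref) for label, ref in references.items()}
    return max(scores, key=scores.get), scores


def toy_similarity(x, y):
    """Hypothetical stand-in for a trained SCNN: higher means more similar."""
    return -float(np.sum(np.abs(x - y)))


references = {"N": np.array([0.0, 0.0]), "V": np.array([5.0, 5.0])}
target = np.array([4.0, 4.0])
label, scores = classify_with_references(target, references, toy_similarity)  # "V"

# Supporting a new class only needs a new reference sample, not retraining.
references["S"] = np.array([4.0, 4.1])
new_label, _ = classify_with_references(target, references, toy_similarity)  # "S"
```

The second call shows why the quality of the reference set matters: the prediction for the same target changes as soon as a closer reference sample is added, without touching the model.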
In the second evaluation step, we used the k-fold cross-validation technique to improve confidence in the performance of the proposed approach. K-fold cross-validation provides more reliable evidence of a model's performance when using limited-size datasets.
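A minimal k-fold sketch with k = 7 is shown below. The text does not state whether the folds were stratified by class; stratification is assumed here because the dataset is highly imbalanced. scikit-learn's `StratifiedKFold` offers the same behavior in a maintained form. The toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=7, seed=0):
    """Assign sample indices to k folds, dealing each (shuffled) class out
    round-robin so every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for position, idx in enumerate(idxs):
            folds[position % k].append(idx)
    return folds


def cross_validation_splits(labels, k=7, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_kfold(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


labels = ["N"] * 49 + ["S"] * 21  # toy labels: 7 N and 3 S per fold
splits = list(cross_validation_splits(labels))
```

Every sample is tested exactly once across the seven folds, so the per-fold metrics (such as the per-fold AUCs) together cover the whole dataset.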

D. EXPERIMENTAL RESULTS
We trained ten models for each combination of loss and similarity functions to minimize the effect of random outcomes on the experimental results. We then used error-bar plots to show the accuracy and loss of the models on the validation dataset. In these plots, the lines represent the average accuracy of the ten generated models, and the error bars represent their standard deviation. Fig. 4 illustrates that the models that used Contrastive Loss as their loss function have a lower standard deviation during training. In particular, this value is the lowest when the Contrastive Loss function is paired with the MSE or RMSE similarity functions, which suggests that those models converged to a similar accuracy value with relatively high frequency. In contrast, the models that used Binary Cross-Entropy showed a high accuracy variation, which was exceptionally high when paired with the MSE or RMSE similarity functions, the opposite of what happened with Contrastive Loss. Fig. 5 shows that the loss of the models with Contrastive Loss tends to fluctuate less than that of the models with Binary Cross-Entropy. The loss of all models stagnated after approximately 45 epochs, indicating that adding more training epochs could still bring improvements but is unlikely to do so. However, fine-tuning the training parameters may still lead to better solutions. The ten executions per combination reduce the random factor of assembling the pairs and initializing the network weights. Table 3 and Table 4 present the average metrics over the ten executions for the models with Contrastive Loss and Binary Cross-Entropy, respectively. In this scenario, the MSE and RMSE models coupled with Contrastive Loss achieved better overall quality metrics than the other combinations, with accuracies of 95.6% and 95.9% and precisions of 96.1% and 94.9%, respectively.
The results obtained from the Binary Cross Entropy models were very close to each other, with the model using the L1 distance as a similarity function obtaining slightly better results.
To improve confidence in the performance of our approach, we applied k-fold cross-validation with k = 7, in which the proposed SCNN model achieved a mean accuracy of 86%. Fig. 7 presents the Receiver Operating Characteristic (ROC) curve for each of the seven folds. The ROC plot shows that the model has a high discriminatory capacity. The proposed model achieved a mean Area Under the Curve (AUC) of 89%, given the AUCs of fold 1 (75%), fold 2 (97%), fold 3 (86%), fold 4 (98%), fold 5 (98%), fold 6 (83%), and fold 7 (86%).
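As a sketch of the two quantities involved, the code below computes a binary ROC AUC through its rank interpretation (the probability that a random positive outscores a random negative) and averages the seven per-fold AUCs reported in the text. How the multi-class AUC was obtained per fold is not stated; applying the binary function one class versus the rest is one common option and is an assumption here. scikit-learn's `roc_auc_score` is a maintained alternative.

```python
import numpy as np


def roc_auc(y_true, scores) -> float:
    """Binary ROC AUC computed as the probability that a randomly chosen positive
    gets a higher score than a randomly chosen negative (ties count one half).
    This equals the area under the ROC curve."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Mean AUC over the seven folds, using the per-fold values reported in the text.
fold_aucs = [0.75, 0.97, 0.86, 0.98, 0.98, 0.83, 0.86]
mean_auc = float(np.mean(fold_aucs))  # 6.23 / 7 = 0.89
```

The spread of the per-fold values (from 75% to 98%) is as informative as their mean, since it shows how much the result depends on which heartbeats fall into the test fold.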
In addition to the ROC plot, we analyzed the Precision-Recall (PR) curve to understand how well the proposed SCNN model handles the minority classes. Fig. 6 presents the PR curve for each of the seven folds. The PR curves also show that the model can properly classify instances from the minority classes.
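A PR curve is traced by sweeping a decision threshold over the model scores. Unlike the ROC curve, its precision axis directly reflects how many false positives accompany the few true positives of a minority class. The sketch below computes the (precision, recall) points at every distinct score for toy binary data; scikit-learn's `precision_recall_curve` provides the same computation.

```python
import numpy as np


def precision_recall_points(y_true, scores):
    """(precision, recall) at each distinct score used as a threshold, from the
    strictest to the most permissive; a sample is positive when score >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    points = []
    for threshold in np.unique(scores)[::-1]:
        predicted = scores >= threshold
        tp = np.sum(predicted & y_true)
        fp = np.sum(predicted & ~y_true)
        fn = np.sum(~predicted & y_true)
        points.append((float(tp / (tp + fp)), float(tp / (tp + fn))))
    return points


curve = precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For these toy scores, the points are (1.0, 0.5), (0.5, 0.5), (0.667, 1.0), and (0.5, 1.0). Precision is not monotone along the curve, which is why PR plots often look jagged for small classes.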
We also conducted a per-class analysis for the most accurate model of each loss function: a model using Binary Cross-Entropy combined with the L1 similarity function, and a model using Contrastive Loss with the MSE similarity function. For the remainder of this section, we refer to these models as the ''Binary Cross Entropy Model'' and the ''Contrastive Loss Model.'' Fig. 8 shows a heatmap of the results obtained for each class using the Binary Cross Entropy Model. The cell values represent the proportions of predicted classes relative to the number of elements of the actual class (each row sums to 1). According to the heatmap, this model achieved outstanding results in the classes with a large number of samples (''N,'' ''R,'' and ''V'') and, surprisingly, in the low-sample ''j'' class. However, its performance in the other classes with small sample counts was subpar. For example, the model often mislabeled the ''F'' class as either ''V'' or ''N'' and the ''S'' class as a normal heartbeat.
The precision of the ''S'' class classification is particularly intriguing: the false positives far exceeded the number of samples of that class, making its precision plummet (Fig. 5). In contrast, the ''A'' class classification achieved a low number of false positives while having a not-so-high recall.
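The effect described for the ''S'' class follows directly from how the per-class metrics are computed from a confusion matrix. The sketch below derives precision, recall, specificity, and the row-normalized matrix used in the heatmaps. The three-class counts are made up for illustration and are NOT results from this study.

```python
import numpy as np


def per_class_metrics(cm):
    """cm[i, j] counts beats of actual class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but actually another class
    fn = cm.sum(axis=1) - tp  # actually the class, but predicted as another class
    tn = cm.sum() - tp - fp - fn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Row-normalised matrix: each row sums to 1, as in the heatmaps.
        "row_normalized": cm / cm.sum(axis=1, keepdims=True),
    }


# Made-up counts for three classes (N, S, V); NOT results from the paper.
cm = [[90, 10, 0],
      [1, 4, 0],
      [0, 0, 20]]
metrics = per_class_metrics(cm)
# For S: 4 hits out of 5 beats (recall 0.8), but 10 N beats were also labelled S,
# so precision drops to 4 / 14.
```

A handful of errors from a large class is enough to swamp a small class's true positives. This is why a row-normalized heatmap, which only shows recall-like proportions along its diagonal, can look acceptable for a minority class whose precision is poor.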
For the Contrastive Loss Model, the results for normal heartbeat (''N'') classification are slightly worse than those of the previous model but are better for every other class (Fig. 9). The gains from using Contrastive Loss as the loss function are concentrated in the classes with fewer samples, whose classification is, in general, considerably better. The classification of the ''j'' class achieves a recall of 100% but has many more false positives, especially due to misclassified ''F'' beats. Still, the recall of the ''F'' class rose sharply compared with the Binary Cross Entropy Model, reaching 68.04% versus the 39.73% shown previously. Similarly, the metrics for the ''S'' class are better, with much higher recall and precision due to fewer false positives and more true positives.

V. DISCUSSION
An SCNN model for heartbeat classification from ECG signal tracings is relevant to support clinical practice. This neural network learns to embed samples in a feature space under a similarity function instead of classifying them directly. Thus, it can handle unknown classes and classes with few samples better than traditional neural networks. Our model is relevant for clinical application scenarios because large databases for defining large training sets are often lacking. For instance, if little data is available for a rare cardiovascular disease, our SCNN model can still support clinical decision-making in medical diagnosis. In this study, we tested eight Siamese neural network models, built with the same layer configuration but with different loss and similarity functions. The models that used Contrastive Loss as the loss function achieved better overall results than those using Binary Cross-Entropy. As a specialized loss function, Contrastive Loss seemed to improve the classification of classes containing few samples, such as the ''F,'' ''j,'' and ''S'' classes, whose recall went from 39.73%, 90.22%, and 12.50% to 68.04%, 100%, and 62.50%, respectively. Besides, we tested the proposed model using k-fold cross-validation, showing that the model can perform well when classifying unseen data.
Compared with similar models in the literature, this work presented strong results, especially when classifying heartbeats of the ''F'' class, reaching 65.35% precision and 68.04% recall, values that far exceed those found in other works. The classification of the classes with a high number of samples was in line with other works, with precision and recall well above 95%. The classification of the ''A'' class, while worse than that achieved with other methods, was still solid.
We compared our SCNN approach only with literature models based on the same database (i.e., INCART) and the same performance metrics (i.e., precision and recall). Table 5 compares this work with some of those models. Class A represents atrial premature beats, F represents fusions of ventricular and normal beats, j represents nodal (junctional) escape beats, N represents normal beats, R represents right bundle branch block beats, S represents supraventricular premature or ectopic beats, and V represents premature ventricular contractions.
Concerning the classification of the highly sampled classes (''N,'' ''R,'' and ''V''), the proposed models achieved values comparable to those of other authors. A pleasant surprise was the classification of heartbeats in the ''F'' class: the Contrastive Loss Model achieved 64.35% precision and 68.04% recall, compared with 23.58% precision and 11.07% recall for the best model listed. While not good, the classification of ''S'' class heartbeats using the Contrastive Loss Model is on par with the values found in other works. During our research, we did not identify related studies that classify heartbeats of the ''j'' class.
One issue identified with the use of SCNNs combined with the proposed decision process is that the results are sensitive to the quality of the reference set. A reference set composed of mislabeled, highly noisy, or ill-conditioned signals negatively impacts the quality metrics of the proposed models when trained on noisy datasets, because the model may identify similarities between the noise in the reference and the noise in the target sample.
Nevertheless, further investigation of this network architecture for ECG signal classification is encouraged, as it achieved these results with a relatively simple architecture. A denser network architecture or a well-known signal processing model, combined with more thorough hyperparameter tuning, may improve the results significantly. A more robust preprocessing step and automatic reference signal selection could also reduce the influence of noisy signals on the network results.
We also recommend investigating changes in the input format. Using a vertical stack of the 12 ECG leads instead of a horizontal concatenation would allow the use of 2D convolutions, making it possible to capture interactions between ECG leads during the convolution process, which is not possible otherwise. It is also possible to employ a more complex decision process by combining the output of a trained SCNN with other machine learning algorithms. Finally, because this type of neural network only learns the embedding, its training process can easily yield a feature extractor module that other kinds of neural networks can reuse.
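Switching from the horizontal concatenation to a vertical stack is a reshape, provided the 2,028 samples are ordered lead by lead as in the preprocessing step. The sketch below assumes that ordering; the function name and the synthetic heartbeat are hypothetical.

```python
import numpy as np

N_LEADS, BEAT_LEN = 12, 169  # 12 leads x 169 samples = 2,028 samples per heartbeat


def to_vertical_stack(concatenated: np.ndarray) -> np.ndarray:
    """Reshape a lead-by-lead concatenated heartbeat (2,028 samples) into a
    12 x 169 array with one row per lead."""
    if concatenated.shape != (N_LEADS * BEAT_LEN,):
        raise ValueError("expected a 1-D heartbeat of 2,028 samples")
    return concatenated.reshape(N_LEADS, BEAT_LEN)


beat = np.arange(N_LEADS * BEAT_LEN, dtype=float)  # synthetic concatenated heartbeat
stacked = to_vertical_stack(beat)
# Column t now holds sample t of all 12 leads, so a 2-D kernel taller than one row
# mixes several leads at the same time index; a 2-D convolution layer would
# additionally expect a trailing channel axis, e.g. stacked[..., None].
```

The reshape is lossless (flattening the stack recovers the original vector), so both input formats could be compared on exactly the same extracted heartbeats.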

VI. THREATS TO VALIDITY
This study has threats to validity related to the use of publicly available databases and to the achieved classification accuracy. Publicly available databases may not cover some specific disease stages; for instance, the onset of myocardial infarction is difficult to identify and is scarce in existing databases.
In addition, Siamese neural networks are designed to allow the quick addition and removal of classes, since this process only consists of inserting or removing examples, respectively. As a consequence, the developed model does not achieve a classification accuracy as high as that of conventional models (training and validation cycles with predefined classes). Still, Siamese neural networks add the flexibility to insert new classes without retraining the model. This flexibility is relevant to reducing the impact of scarce data in databases (e.g., data related to specific disease stages). Thus, as new data become available, the existing Siamese neural network offers an easy path toward scaling the proposed diagnostic solutions.

VII. CONCLUSION
This article discusses the use of SCNNs with 12-lead digital ECG signals. The results are promising, with most models achieving accuracy values over 90% with the hold-out method and a mean accuracy of 86% with k-fold cross-validation. The models with the best performance used Contrastive Loss as the loss function, which we expected, as it is a loss function specially designed for problems such as this one. The positive results of the models that used the MSE and RMSE functions are unexpected, as those functions are not widely used in the literature because of their higher computational cost. An advantage of this model is that, with a specially designed decision process, new classes can be integrated into the dataset without retraining the model while still producing accurate predictions. We also recommend a class-by-class analysis and a more robust decision process, together with fine-tuning of the models' parameters.
The publicly available databases (e.g., PhysioNet) include only a small amount of data on rare cardiovascular diseases, limiting the use of conventional machine learning approaches. Therefore, our proposed SCNN model is especially relevant for clinical decision support in diagnosing rare cardiovascular diseases for which little training data is available.
He is also a Researcher with the Postgraduate Program in Informatics (PPGI), Computing Institute, UFAL. His current research interests include control systems, digital signal processing with an emphasis on biomedical signals, ventricular assist devices, and modeling and simulation of biological systems with a focus on the cardiovascular systems.