ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation

ECGs have shown unique patterns to distinguish between different subjects and present important advantages compared to other biometric traits. However, the lack of public data and standard experimental protocols makes the evaluation and comparison of novel ECG methods difficult. In this study, we perform an extensive analysis and comparison of different scenarios in ECG biometric recognition. We consider verification and identification tasks, single- and multi-session settings, and single- and multi-lead ECGs recorded with traditional and user-friendly devices. We also present ECGXtractor, a robust Deep Learning technology trained with an in-house large-scale database, and evaluate it with a detailed experimental protocol and public databases. With the popular PTB database, we achieve Equal Error Rates of 0.14% and 2.06% in single- and multi-session verification. The results achieved prove the soundness of ECGXtractor across multiple scenarios and databases. We release the source code, experimental protocol details, and pre-trained models on GitHub to advance the field.


I. INTRODUCTION
Biometric data are widely used in recognition systems due to their ability to uniquely identify subjects through their biological or behavioural traits [1], which are intrinsic to human beings, unlike traditional recognition tools such as passwords and tokens. Among the most popular biometric traits, we encounter facial features [2], fingerprints [3], iris [4], gait [5], handwriting [6], and speech [7]. However, these traditional biometric traits are vulnerable to Presentation Attacks (PAs) [8]-[11] and digital manipulations [12].
The electrocardiogram (ECG) is a graph that reproduces the electrical activity of the heart, obtained by placing electrodes over suitable parts of the body. Even if its deployment in real applications is not as popular as that of more established biometric traits, ECG presents interesting advantages for biometric recognition. First of all, ECGs provide higher security, as the signal is measured inside the body and is therefore difficult to simulate or copy [13]. ECGs allow liveness detection, as they can be captured from living subjects only, and provide useful additional information related to psychological states and clinical status [14]. The possibility to capture ECGs from fingers [15] or through wearable devices [16] simplifies their acquisition and increases the acceptability of ECG signals for commercial and public applications [17].
However, biometric systems based on ECG have not yet reached the same level of technological maturity and acceptance as other biometric traits, mainly because of the lack of public databases [18]. In addition, it is not easy to evaluate the improvements of novel proposed approaches, as different databases and experimental protocols are usually considered. Also, multiple ECG signals collected over time from the same subject present a certain variability between them, e.g., due to mental, emotional, or physical changes, or permanently due to changes in lifestyle or individual characteristics [19]. Extensive studies recognize that heart rate variability may be induced by heart diseases, mental and physical variations, and other factors including drugs, medications, and diet [20]. Hence, experimental protocols that consider, for each subject, ECG signals recorded during different time sessions are necessary to assess the effectiveness and robustness of ECG-based recognition systems. However, a large number of previous studies perform experiments based on single-session analysis [17], [21], [22], which are not very informative of the system performance in realistic scenarios. In this study we address the issue of ECG variability over time, investigated through multi-session experiments where we specify the distance in time between ECGs belonging to the same subjects and recorded in different sessions.
In this study we consider two different biometric recognition scenarios based on ECG signals: i) verification and ii) identification. We develop a comprehensive technology, ECGXtractor, based on Deep Learning (DL) systems and trained with a large-scale in-house database, to investigate both scenarios. In particular, we make use of an Autoencoder to extract discriminative features from ECG signals. The Autoencoder is successfully applied to a variety of ECG signals with different properties. It is important to note that our Autoencoder is not specifically trained to extract features suitable for biometric recognition tasks. Hence, the extracted features may also be used in additional applications, e.g., the prediction of health conditions.
The main contributions of this study are:
• In-depth analysis of state-of-the-art DL approaches for ECG biometric recognition, highlighting the key aspects of the scenarios considered in current real applications.
• [...] or "off-the-person" technologies, i.e., sensors that require conductive paste or gel when attached to body surfaces, or do not require any special preparation of the subject with objects or surfaces [18]. We present to the research community a clear and structured experimental protocol and benchmark, which makes it easy to reproduce our experiments and overcomes some drawbacks existing in other studies.
• We release the source code, experimental protocol details, and the trained weights of the ECGXtractor approach on GitHub to advance the field.
To the best of our knowledge, this study simultaneously addresses for the first time a variety of key aspects, which include: i) the specific recognition task, ii) the variability of ECG signals over time, iii) the number of leads available, and iv) the modality of recording.
The remainder of the paper is organised as follows. Sec. II summarizes previous studies carried out in the field of ECG biometric recognition. Sec. III explains the details of our proposed approach, ECGXtractor, which sets the foundations for our experiments. Sec. IV describes the main details of the databases considered in the study. Sec. V and Sec. VI respectively refer to the tasks of verification and identification, including both experimental protocols and results. Finally, Sec. VII draws the conclusions of this study and points out some lines for future work.

II. RELATED WORKS
In the literature, many studies have been conducted on ECG biometric recognition, considering different experimental settings depending on the task (i.e., verification or identification), the number of ECG recordings per user and leads available, the type of recorder (i.e., "on-the-person" or "off-the-person"), and other additional aspects related to data pre-processing and feature extraction. Because of that, standard experimental protocols are missing and difficulties arise when comparing novel approaches with the state of the art.
In this sense, Ingale et al. evaluated with multiple databases the effectiveness of various techniques applied in different phases of ECG biometric recognition, i.e., feature extraction, signal filtering, segmentation, and matching. This study was motivated by the fact that most of the methods proposed in the literature fail to report standard metrics [18].
With regard to the properties of ECG signals, a variety of lead configurations and modalities of recording have been considered in the literature. According to [19], a single lead contains sufficient information to support biometric recognition. However, some studies adopted multiple leads in an effort to improve performance [23], [24]. Furthermore, an increasing number of portable devices, such as fingertip sensors or wearable devices with dry electrodes, allow "off-the-person" ECG signals to be recorded in a non-intrusive way. "Off-the-person" recording introduces more noise and variability compared to traditional "on-the-person" recording [18], but it is considered more reasonable and aligned with industrial requirements when using ECG signals for biometric recognition [25], [26].
In this section, we analyze state-of-the-art ECG biometric systems based on DL technologies. In general, DL systems extract features from ECG signals in a convenient and reliable manner, and provide better results compared to traditional handcrafted systems, e.g., based on Support Vector Machines (SVMs) [27]-[29]. In Table I we summarize the studies discussed in this section, reporting their main characteristics and performances. We observe that PTB [30] and ECG-ID [31] are the most popular databases in the literature and, for the former, several studies only focus on its subset of healthy subjects. In addition, multi-session acquisition analysis is not always addressed in the literature: some studies fail to report any information about it, or consider subjects provided with single ECG signals.
The first DL system that we discuss is the Cascaded Convolutional Neural Network (CNN), proposed in [21]. This system is composed of two CNNs: i) F-CNN, for feature extraction, and ii) M-CNN, for feature comparison. Single heartbeats extracted from single-lead ECG signals were given as input to the F-CNN. Match/non-match binary outcomes were provided by M-CNN for the comparison of features extracted from paired ECG signals. For each heartbeat in evaluation, the subject whose template provided the highest matching scores was the predicted identity. The Cascaded CNN was evaluated on five databases and achieved up to 100% accuracy when trained and tested with the FANTASIA database [35], which contains single-session ECG signals from 40 healthy subjects. This architecture is more scalable than other identification systems, as it does not require re-training the model when new subjects are considered.
In [33], a Residual CNN was separately trained twice for identification with two different sets of 90 and 48 subjects. Single-lead databases and single-session (or multi-session but recorded during the same day) ECG signals were employed. Accuracy of 100% was achieved when considering multiple heartbeats for each subject. The limitations of studies that only consider single-session databases were pointed out in [17]. In that study, single segments were extracted from single-lead ECG signals and provided as input to different CNNs. The accuracy decreased from single- to multi-session analysis: accuracies of 100% and 99.33% were achievable with two different databases in single-session, while 97.28% was the highest accuracy achievable in multi-session. Finally, an ensemble of state-of-the-art pre-trained deep neural networks for identification was proposed in [34]. Segments of three consecutive heartbeats were extracted from ECG signals and provided as input to the system. By taking advantage of both transfer learning and ensemble learning, this system achieved an accuracy of 99.66%, also considering multi-session recordings.
Regarding verification, we have already described the work of Ingale et al. [18]. In that study, some databases provided single ECG signals for their subjects and multi-session acquisition was not taken into account. In [22], a system composed of a CNN extracting features from multiple single-lead QRS complexes combined together was proposed. Equal Error Rates (EERs) of 1.05% and 2.26% were achieved when considering QRS complexes extracted respectively within 300 s and 150 min from a 24-hour recording. Finally, we consider the parallel multi-scale one-dimensional residual network proposed in [32], trained with a special loss function to extract features that improve the generalization ability and achieve more stable results across databases. In that study, the input data of the neural network were heartbeat vectors, composed of two single-lead heartbeats randomly selected from each subject. The authors achieved a 0.59% EER on the subset of healthy subjects contained in the PTB database. For most of these subjects, only a single ECG recording was provided and, hence, multi-session experiments could not be performed.
As emerged in this section, there are no common experimental protocols to develop and compare state-of-the-art technologies for ECG biometric recognition. The various DL systems proposed in the literature require as input heartbeats segmented and/or combined according to different procedures. Some of these procedures may be particularly suitable for specific databases and may not generalize well to others. Also, DL systems are evaluated on different scenarios, each with its own experimental settings, which makes any comparison difficult. Furthermore, to the best of our knowledge, the most realistic scenario of multi-session biometric verification is not sufficiently investigated in the literature. Existing studies on biometric verification do not specify whether the processed ECG samples are recorded in the same or in different sessions [18], [32], or consider databases providing only a single ECG signal for each subject [22].
To overcome all these issues, we provide in this study an in-depth analysis and benchmark of different experimental settings for ECG biometric recognition. We propose ECGXtractor, a DL method able to successfully perform biometric recognition in multiple scenarios, and with multiple databases. We make our system publicly available, so that it will be easy for the research community to reproduce our experiments. Moreover, a significant focus of our study is dedicated to the relevant scenario of multi-session biometric verification.

III. PROPOSED METHOD
Fig. 1 and Fig. 2 describe the pre-processing and feature extraction of ECGXtractor, our proposed approach for ECG biometric recognition. The source code and pre-trained models are available on GitHub. Multiple ECG databases have been considered in this study to investigate how different properties of ECG signals may affect performance in ECG biometric recognition. For this reason, a set of preliminary operations is required to mitigate the discrepancies existing between ECG time signals recorded with different sensors. In particular, these preliminary operations allow us to train our DL system with a large-scale in-house database, and exploit the generated knowledge with multiple smaller databases.

A. Pre-processing
ECG signals from different databases are recorded with frequencies of 1 kHz or 500 Hz. We downsample all of them to 500 Hz. We also apply Finite Impulse Response filters to our ECG signals, to maintain frequencies between 0.7 and 90 Hz and remove those frequencies around 50 Hz, which are noisy due to the power supply. Subsequently, for each ECG signal we identify its r-peaks through the reliable method ecg_peaks, implemented in the neurokit2 toolbox [36], and discard its first and last r-peaks. Then, we build single segments, i.e., single heartbeats centered around each r-peak. According to a previous work [21], and considering the average heartbeat length of 0.8 s, we fix the length of our single segments to 0.32 s before and 0.48 s after the r-peak. In case of multi-lead ECG signals, we identify r-peaks on the signal recorded with Lead I and build single segments containing the multi-lead recording of single heartbeats. In our proposed method, the following approaches are studied:
• template segments (templates throughout the paper), i.e., ECG segments with the shape of single heartbeats, obtained from the processing of all the single segments contained in an ECG signal.
• summary segments, i.e., ECG segments with the shape of single heartbeats, obtained from the processing of ten consecutive single segments contained in an ECG signal.
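As a minimal sketch of the segmentation step described above, the fixed window of 0.32 s before and 0.48 s after each r-peak can be cut as follows. It assumes that the signal has already been downsampled to 500 Hz and filtered, and that r-peak indices are already available (e.g., from the neurokit2 r-peak detector). The names `extract_single_segments`, `FS`, `PRE`, and `POST`, and the choice to skip peaks whose window would fall outside the recording, are illustrative assumptions rather than details of the released code.

```python
import numpy as np

FS = 500                      # sampling rate after downsampling (Hz)
PRE = int(round(0.32 * FS))   # 160 samples before the r-peak
POST = int(round(0.48 * FS))  # 240 samples after the r-peak


def extract_single_segments(ecg, r_peaks):
    """Cut one 400-sample heartbeat around each r-peak.

    ecg     : array of shape (n_samples,) or (n_samples, n_leads)
    r_peaks : r-peak sample indices, detected on Lead I
    Returns an array of shape (n_segments, 400, n_leads).
    """
    ecg = np.asarray(ecg, dtype=float)
    if ecg.ndim == 1:
        ecg = ecg[:, None]
    # Discard the first and last r-peaks, as described in the text.
    inner = list(r_peaks)[1:-1]
    segments = [
        ecg[p - PRE:p + POST]
        for p in inner
        # Assumption: skip peaks whose window would leave the recording.
        if p - PRE >= 0 and p + POST <= ecg.shape[0]
    ]
    if not segments:
        return np.empty((0, PRE + POST, ecg.shape[1]))
    return np.stack(segments)
```

Because the same r-peak indices are applied to every column of `ecg`, a 12-lead recording yields segments in which all leads are aligned on the Lead I peaks.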
For template generation, we adopt a procedure similar to the one presented in [21], here described for a generic ECG signal:
1) At first, all the segments representing single heartbeats are identified by means of the segmentation described above. These single segments contain 400 time samples for each lead.
2) The element-wise average of the identified single segments is computed, separately for each ECG lead acquired.
3) The five single segments presenting the smallest Euclidean distance from the element-wise average segment are identified.
4) The five identified single segments are element-wise averaged, separately for each lead, to obtain the final template.
This procedure aims to minimize the effect of noise contained in ECG signals. The same steps are applied to generate summary segments, with the only difference that blocks of ten consecutive single segments are considered in step 2) instead of entire ECG signals. As a consequence, multiple summary segments and only one template can be obtained from single ECG signals.
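The template and summary-segment procedure can be sketched as below. This is an illustration under stated assumptions, not the released implementation: the function names are hypothetical, a single Euclidean distance is computed per segment over all samples and leads (the text does not say whether selection is done per lead), and an incomplete final block of fewer than ten segments is dropped.

```python
import numpy as np


def build_template(segments, n_closest=5):
    """Average the n_closest segments nearest to the mean heartbeat.

    segments : array of shape (n_segments, n_samples, n_leads)
    Returns an array of shape (n_samples, n_leads).
    """
    segments = np.asarray(segments, dtype=float)
    mean_beat = segments.mean(axis=0)  # step 2: element-wise average per lead
    # Assumption: one distance per segment, over all samples and leads.
    flat_diff = (segments - mean_beat).reshape(len(segments), -1)
    dists = np.linalg.norm(flat_diff, axis=1)
    closest = np.argsort(dists)[:n_closest]   # step 3: five closest segments
    return segments[closest].mean(axis=0)     # step 4: final average


def build_summary_segments(segments, block=10, n_closest=5):
    """Apply the same procedure to blocks of consecutive single segments."""
    segments = np.asarray(segments, dtype=float)
    n_blocks = len(segments) // block  # assumption: drop an incomplete block
    return [
        build_template(segments[i * block:(i + 1) * block], n_closest)
        for i in range(n_blocks)
    ]
```

Selecting the segments closest to the mean before averaging keeps a single corrupted heartbeat from leaking into the template, which a plain average over all segments would not do.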
Finally, the amplitude of the ECG segments, regardless of whether they are templates, single segments, or summary segments, is normalized to a fixed value of 2 mV, multiplying every time sample by the ratio between 2 mV and the current amplitude of the segment. The operation, performed separately for each lead, aims to eliminate the amplitude variability existing between the different databases. In Fig. 1 we provide a summary of the described operations.
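A possible reading of the amplitude normalization is sketched below. The text does not define "amplitude"; the sketch assumes it is the peak-to-peak range of each lead, and the guard for flat leads is an illustrative addition. The name `normalize_amplitude` is hypothetical.

```python
import numpy as np

TARGET_MV = 2.0  # fixed target amplitude stated in the text


def normalize_amplitude(segment, target=TARGET_MV):
    """Rescale each lead so that its amplitude equals `target` (mV).

    segment : array of shape (n_samples, n_leads), in mV.
    Assumption: "amplitude" means the peak-to-peak range of the lead.
    """
    segment = np.asarray(segment, dtype=float)
    amplitude = segment.max(axis=0) - segment.min(axis=0)  # one value per lead
    # Flat leads are left unchanged to avoid dividing by zero.
    safe = np.where(amplitude > 0, amplitude, 1.0)
    scale = np.where(amplitude > 0, target / safe, 1.0)
    return segment * scale
```

Each time sample is multiplied by the ratio between the target and the current amplitude, so the shape of the heartbeat is preserved and only its scale changes.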

B. Feature Extraction
To extract features from ECG segments (i.e., template, single segment, and summary segment), we consider a DL-based Autoencoder, i.e., a neural network composed of two parts: i) an encoder, that reduces the size of the input data and learns its encoded representation, and ii) a decoder, that attempts to reconstruct the input data from the encoded representation. In the context of ECGs, Autoencoders can be applied to many tasks, such as noise reduction and heartbeat type classification [37] or lower dimensional representation and biometric recognition [38]. In this study, we make use of the Autoencoder for the latter task.
Our Autoencoder (Fig. 3) consists of a modified version of the Variational Autoencoder proposed in [39] for feature extraction and synthetic heartbeat generation. We only maintain the convolutional layers from the original architecture, applying small changes to them and discarding the fully connected and variational components. We observed that the features extracted with the original architecture were not intended for biometric recognition. Our encoder is composed of four groups of convolutional, batch normalization, ReLU activation, and max pooling layers. After the last max pooling layer, a further convolutional layer with two output channels is added, to reduce the size of the latent features that will be extracted. Our decoder presents an almost symmetric architecture, to reconstruct the ECG segment provided as input to the encoder. It is important to point out that features are extracted independently from each lead, i.e., numerical values from different leads are never combined to generate features. Hence, the same pre-trained Autoencoder is applied for feature extraction for both 1-Lead and 12-Lead ECG segments, which is a considerable advantage for real applications.
We exploit the Autoencoder to extract features from each segment of interest, i.e., templates, single segments, and summary segments. According to the architecture of our Autoencoder, two channels of 25 temporal features each are extracted from each lead of the segment provided as input. Experimental trials showed the advantages of considering two channels instead of one. Hence, the size of the features generated from each segment is 25 × l × 2, where l is the number of leads considered. In our experiments, we consider single-lead segments (l = 1) and 12-lead segments (l = 12). We observe that, by building templates or summary segments and extracting features from them, we achieve a considerable data minimization compared to the size of original ECG signals, without losing data usefulness. The extracted features are considered in both recognition tasks investigated in the study: verification and identification. In Fig. 2, the key aspects of both ECG biometric recognition tasks are presented.
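The feature size of 25 × l × 2 can be checked with simple arithmetic. The sketch below assumes that the convolutions preserve the temporal length and that each of the four max pooling layers halves it; the pooling factor is not stated in the text, but a factor of 2 is consistent with 400 input samples being reduced to 25 temporal features. The function name `latent_shape` is hypothetical.

```python
def latent_shape(n_samples=400, n_leads=1, n_pool_layers=4, pool_size=2, channels=2):
    """Return (temporal features, leads, channels) of the encoder output.

    Assumptions: convolutions preserve length ("same" padding) and every
    max pooling layer downsamples the time axis by `pool_size`.
    """
    length = n_samples
    for _ in range(n_pool_layers):
        length //= pool_size
    return (length, n_leads, channels)


for leads in (1, 12):
    shape = latent_shape(n_leads=leads)
    n_values = shape[0] * shape[1] * shape[2]
    print(f"{leads:>2}-lead segment: features {shape}, "
          f"{n_values} values vs {400 * leads} input samples")
```

Under these assumptions each 400-sample lead is reduced to 50 values (25 temporal features in 2 channels), an eightfold reduction per segment, on top of the reduction obtained by replacing a whole recording with one template.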

C. ECG Biometric Verification
In ECG biometric verification, features extracted from different leads are combined together during their processing. For this reason, two Siamese Convolutional Neural Networks (Siamese CNNs) are trained for the task of verification, one for single-lead ECG signals (1L-Siamese CNN) and one for 12-lead ECG signals (12L-Siamese CNN). In case of single-lead ECGs, we consider signals recorded with Lead I, as this is the typical lead available in ECG databases and the most basic lead measured by smartwatches and other wearable devices [40]. The two Siamese CNNs require as input pairs of feature vectors extracted from two different ECG segments, and predict if such segments belong to the same subject or not. 1L- and 12L-Siamese CNNs are depicted in Fig. 4. For each pair of feature vectors, the Siamese component generates two vectors of 1024 features. The Euclidean distance is calculated for each of the 1024 features, and the resulting vector is processed to provide final match/not-match decisions.
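The per-feature distance between the two Siamese outputs can be illustrated as follows. This is only a sketch of that single step: the layers that turn the distance vector into a match/not-match decision are not specified in this section and are therefore omitted. The function name and the random embeddings in the usage lines are hypothetical.

```python
import numpy as np


def elementwise_distance(emb_a, emb_b):
    """Per-feature Euclidean distance between two embeddings.

    For scalars the Euclidean distance reduces to the absolute difference,
    so the output keeps one value per feature (1024 in the text).
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    if emb_a.shape != emb_b.shape:
        raise ValueError("both embeddings must have the same shape")
    return np.sqrt((emb_a - emb_b) ** 2)


# Usage with random stand-ins for the two 1024-feature Siamese outputs.
rng = np.random.default_rng(0)
enrol, probe = rng.normal(size=1024), rng.normal(size=1024)
distance_vector = elementwise_distance(enrol, probe)
```

Keeping one distance per feature, rather than collapsing to a single scalar, lets the subsequent layers weight the 1024 features differently when producing the decision.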
The two Siamese CNNs are trained and evaluated with genuine and impostor pairs of features extracted from different ECG segments. For each subject considered during training, we build their template with the first available ECG and extract single segments from their remaining ECGs. By matching templates with single segments of same and different subjects, we generate genuine and impostor pairs for training. We consider single segments less reliable than templates and summary segments to represent the intra-user variability. For this reason we employ single segments only during training, to provide our system with the knowledge derived from potentially more challenging pairs, and not during evaluation, where summary segments and templates are considered respectively in single- and multi-session scenarios.
We evaluate 1L- and 12L-Siamese CNNs in both single- and multi-session acquisition scenarios. In the single-session scenario we are constrained to only one ECG signal for each subject. Hence, we divide the considered ECG signal in blocks of ten consecutive single segments, and generate a summary segment from each block. By randomly matching summary segments of same and different subjects, we create genuine and impostor pairs for evaluation. We consider multi-session verification the most important scenario of this study, as it represents the most realistic situation for commercial and widespread biometric recognition technologies. In multi-session verification the system performance may be negatively affected by intra-user variability, occurring when biometric traits of the subject change over time [41]. To investigate this aspect, we generate enrolment templates with the first ECG signal of each subject, and probe templates with the other available ECGs acquired in other time sessions, to create realistic genuine/impostor pairs for evaluation.
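The generation of genuine and impostor pairs from summary segments can be sketched as below. The pairing policy is an assumption made for illustration: every pair of segments from the same subject is used as a genuine comparison, and each enrolment segment is matched against segments of randomly chosen other subjects. The exact sampling used in the experiments is given in the protocol tables, not here, and the names `make_pairs` and `impostors_per_genuine` are hypothetical.

```python
import random
from itertools import combinations


def make_pairs(segments_by_subject, impostors_per_genuine=1, seed=0):
    """Build genuine and impostor comparison pairs.

    segments_by_subject : dict mapping subject id -> list of segment ids
    Returns (genuine, impostor), two lists of (segment_a, segment_b) tuples.
    Requires at least two subjects.
    """
    rng = random.Random(seed)
    subjects = sorted(segments_by_subject)
    genuine, impostor = [], []
    for subj in subjects:
        own = segments_by_subject[subj]
        others = [s for s in subjects if s != subj]
        for a, b in combinations(own, 2):
            genuine.append((a, b))
            # Match the same segment against randomly chosen other subjects.
            for _ in range(impostors_per_genuine):
                other = rng.choice(others)
                impostor.append((a, rng.choice(segments_by_subject[other])))
    return genuine, impostor
```

Fixing the seed makes one run reproducible, while changing it across runs gives the repeated evaluations that random impostor selection calls for.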

D. ECG Biometric Identification
To assess the validity of the features extracted with our Autoencoder and compare our DL system with other studies in the literature, we also design experiments in the scenarios of single- and multi-session identification. We consider the same trained single Siamese component of our 1L- or 12L-Siamese CNN for the verification task as described in Sec. III-C, according to the number of leads of input data, and include at the end a new fully connected layer with output dimension that varies according to the number of subjects considered in each experiment. This is the only layer to train for the identification task. For all the identification experiments we consider summary segments generated from blocks of ten consecutive single segments, so that we have enough samples to train and evaluate our system.

IV. DATABASES
Four different databases have been considered in this study. They contain single- and multi-lead ECG signals, recorded with "on-the-person" or "off-the-person" modality. With these four databases, we evaluate the impact of different ECG properties on biometric recognition. We use an in-house database [14], [42], and three public databases widely used in the literature, to compare the performances achieved in our experiments with other studies. The main characteristics of the databases are reported in Table II.

1) In-house Database: The first database is an in-house collection of 295,649 12-lead ECG signals recorded with the Philips 12-lead machine (https://philips.to/36CPabZ) from a cohort of 122,622 subjects at La Princesa University Hospital (Madrid, Spain), with approval by the Clinical Ethics Committee. We exclude from the study all the ECGs not recorded during "sinus-rhythm". Then, we divide the 138,706 remaining ECGs in two groups: i) those belonging to subjects with only a single ECG recorded (single-session acquisition, 55,967 ECGs from 55,967 subjects, age 54.88 ± 20.15, 52.92% women), and ii) those belonging to subjects with two or more ECGs [...]

TABLE II: Summary of the main characteristics of the considered databases. On-TP = "on-the-person", Off-TP = "off-the-person", SR = Sinus Rhythm.

2) PTB: This database is collected from Physikalisch-Technische Bundesanstalt (Germany) with a non-commercial prototype recorder [30]. The database contains between one and five ECG signals from 290 subjects (209 men, mean age 57.2). In our experiments, we consider signals related to both 12 standard leads and single Lead I. We identify two non-disjoint sets of subjects: i) those with multiple recordings (113 subjects), and ii) those considered healthy (52 subjects). In the literature, both sets have been considered for experiments. PTB is one of the biggest public 12-lead ECG databases suitable for multi-session acquisition analysis.
3) ECG-ID: This single-lead (Lead I) database is collected with limb clamp electrodes that imitate the scenario of user interaction with practical identification systems [31]. ECG signals are recorded from 90 volunteers (44 men, aged from 13 to 75). The number of ECGs for each subject varies from 2, collected during one day, to 20, collected over 6 months, except for a subject with a single ECG signal that we exclude from the analysis. In multi-session analysis, we consider the first two ECGs for each subject to compare our results with other studies.
4) CYBHi: This is an example of an "off-the-person" database, where ECG signals are recorded with dry Ag/AgCl electrodes [43]. ECG signals present virtually the same morphology as the Lead I derivation of a standard 12-lead medical ECG. The database contains two datasets: i) short-term, with multiple ECG signals recorded in a five-minute period from 65 healthy participants (49 men, age 31.1 ± 9.46) subject to different external stimuli, i.e., low and high arousal videos, and ii) long-term, with two ECG signals recorded three months apart from 63 healthy participants (14 men, age 20.68 ± 2.83). We consider ECG signals from the long-term dataset to address the most challenging scenario. CYBHi presents numerous challenges compared to other databases, due to the lower signal-to-noise ratio of its ECG signals.

V. EXPERIMENTAL WORK: VERIFICATION TASK

A. Experimental Protocol
We consider our large-scale in-house database, divided in two groups as specified in Sec. IV. Data from the first group (single-session acquisition) are used to train the Autoencoder. Data from the second group (multi-session acquisition) are used to perform experiments in ECG biometric recognition, as they provide multiple ECG signals for each subject. In this study we perform cross-dataset evaluation, as we train ECGXtractor with the in-house database and evaluate it with several additional databases.
At first, we train our Autoencoder with the single segments extracted from the ECG signals contained in the first group, composed of 55,967 ECGs that we divide into training and validation sets, with 80:20 ratio. Up to three single segments of each ECG signal have been considered, for a total of 133,575 single segments in the training set and 33,357 single segments in the validation set. We use mean squared error as loss function, with Adam optimizer and initial learning rate of 0.001. At each epoch we evaluate the loss function on the validation set. We halve the learning rate if the validation loss does not decrease for two consecutive epochs, and we stop the training if it does not decrease for six consecutive epochs.
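The learning-rate and stopping rule can be replayed on a recorded validation-loss history, as in the sketch below. Two points are assumptions, since the text leaves them open: "does not decrease" is measured against the best loss seen so far, and the halving counter restarts after each halving. The function name `run_schedule` is hypothetical and the sketch is independent of any DL framework.

```python
def run_schedule(val_losses, lr=1e-3, halve_patience=2, stop_patience=6):
    """Replay a validation-loss history and return (stop_epoch, final_lr).

    Assumptions: "does not decrease" is measured against the best loss so
    far, and the halving counter restarts after each halving.
    """
    best = float("inf")
    since_best = 0      # epochs without improvement (early stopping)
    since_halving = 0   # epochs without improvement since the last halving
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_halving = loss, 0, 0
            continue
        since_best += 1
        since_halving += 1
        if since_best >= stop_patience:
            return epoch, lr
        if since_halving >= halve_patience:
            lr *= 0.5
            since_halving = 0
    return len(val_losses), lr
```

With these settings a run that stops improving has its learning rate halved twice before the six-epoch stopping criterion is reached.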
Then, we focus on the training and evaluation of our 1L- and 12L-Siamese CNNs, for which we consider the same settings specified for the training of the Autoencoder, with cross-entropy as loss function. We employ the ECG signals contained in the second group, composed of 26,007 subjects that we divide into training, validation, and test sets, according to 70:10:20 ratio. The same test subjects are considered for both single- and multi-session acquisition scenarios.
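A subject-disjoint 70:10:20 split can be sketched as follows. Splitting by subject rather than by ECG keeps all recordings of a test subject out of training. The shuffling, seed, and rounding are illustrative assumptions, so the resulting set sizes need not match those of the experiments exactly; the function name `split_subjects` is hypothetical.

```python
import random


def split_subjects(subject_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle subjects and split them into disjoint train/val/test lists."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Usage with integer stand-ins for the 26,007 multi-session subjects.
train, val, test = split_subjects(range(26_007))
print(len(train), len(val), len(test))
```

Reusing the same `test` list for both single- and multi-session evaluation mirrors the statement that the same test subjects are considered in both scenarios.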
Given that subjects are randomly selected to generate impostor pairs, we evaluate our two Siamese CNNs ten times for each scenario and database considered. The results reported in this study are the averages of the values obtained in the ten executions of each specific scenario and database. Details of the experimental protocol considered for the training of the Siamese verification system, and the final single- and multi-session evaluation are provided in Table III and discussed in the following. We note that in single-session evaluation, enrolment and probe segments are obtained from the second session of each subject, for a better comparison with the multi-session scenario. Also, in multi-session evaluation we consider only one genuine comparison for each subject, as many subjects only have two ECG signals, and both enrolment and probe templates are obtained from whole signals. We provide next more details regarding the training of the Siamese verification system and the final single- and multi-session evaluation:
• Training: In total, we consider 54,351 genuine and 271,755 impostor pairs to train our two Siamese CNNs (i.e., 1-Lead and 12-Lead scenarios). We use the same pairs for both Siamese CNNs, with the only difference consisting in the number of leads.
To sum up, we underline that in the single-session scenario only experiments with summary segments can be performed, as only a single ECG signal is available for each subject. In the multi-session scenario, instead, multiple ECG signals are available for each subject, and we observed that templates provide better performance than summary segments.
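The EER reported over repeated runs can be computed as in the following sketch. It assumes similarity scores (higher means more likely the same subject), sweeps every observed score as a threshold, and reports the mean of the false acceptance and false rejection rates where they are closest. The score convention and the function names are assumptions for illustration, and the quadratic threshold sweep is meant for small examples only.

```python
from statistics import mean


def equal_error_rate(genuine_scores, impostor_scores):
    """EER for similarity scores (higher means more likely the same subject).

    Sweeps every observed score as a threshold and returns the error rate at
    the point where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def averaged_eer(runs):
    """Average the EER over runs, each a (genuine, impostor) pair of score lists."""
    return mean(equal_error_rate(g, i) for g, i in runs)
```

Averaging over the ten executions reduces the dependence of the reported value on which subjects happened to be drawn as impostors in a single run.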

B. Experiment 1: Single-Session Verification
In Table IV we report the performance achieved in terms of EER, along with the number of genuine and impostor pairs tested. It is important to highlight that only the in-house database has been considered for training ECGXtractor. Therefore, in Table IV we can also analyze the generalization ability of ECGXtractor to other databases and sensors. We observe better EERs when we consider 12-lead instead of 1-lead ECG signals, i.e., 1.28% vs 3.27% EER for the in-house database and 0.14% vs 0.42% EER for PTB. Also, the two Siamese CNNs are able to provide perfect 0% EER for the ECG signals belonging to healthy subjects of PTB, and 1.52% EER with the ECG-ID database, better than the results of [32] and [18] presented in Table I. We observe that our two Siamese CNNs, trained with the in-house database and evaluated with other databases, provide a remarkable generalization ability in cross-dataset evaluation.
Our 1L-Siamese CNN shows performance degradation when evaluated with comparison pairs obtained from the CYBHi database (6.98% EER). The availability of only a single lead, the high variability between heartbeats recorded during the same session, and the low signal-to-noise ratio, characteristic of this database, may be the causes of the performance degradation. We observe that [18] achieves values ranging from 2.3% to 9% EER for CYBHi. However, that study considers a larger set of subjects, some of them provided with ECG signals recorded under different conditions, and the protocol adopted to generate comparison pairs is not clearly specified.
In general, worse values of EER for CYBHi database are common in the literature, compared to EERs achievable with other databases.For instance, high EERs are observable in [44] (up to 26.38%) and [45] (15.37%).

C. Experiment 2: Multi-Session Verification
In Table V we report the performance achieved in terms of EER, along with the number of genuine and impostor pairs tested, and the average distance in days between the ECG signals considered for each subject. Again, as described in the single-session experiment, only the in-house database has been considered for training ECGXtractor. Therefore, in Table V we can also analyze the generalization ability of ECGXtractor to other databases and sensors.
Analysing the results of this scenario, we observe that they are generally worse than those achieved in single-session verification, for both single- and multi-lead ECG signals.
In particular, the largest performance degradation affects the single-lead scenario for PTB database, with a worsening of EER from 0.42% to 5.12%.
The exception to this trend is represented by ECG-ID, which provides a very low EER of 0.15% for multi-session verification. We highlight that the ECG signals considered for each subject are recorded during the same day. Moreover, the possibility to average the entire amount of heartbeats to [...] subject of PTB is correctly identified, also when considering the subset of 113 subjects. This result is consistent with those achieved in the literature [17], [32] and reported in Table I.

C. Experiment 2: Multi-Session Identification
In Table VII we also report the accuracy achieved for multi- (and mixed-) session identification. As in the case of verification, the performance decreases between single- and multi-session, with accuracy that goes from 100% to 96.46% for PTB. Higher accuracies in the multi-session scenario are achieved in recent studies specifically designed for identification: 99.66% for PTB [34], and 100% for ECG-ID [33]. We recall that identification is not the main focus of this study, and that these results are achieved with an architecture of our ECGXtractor trained for verification with the in-house database, while other databases are used for evaluation. No fine-tuning is considered here, and only minor adaptations of our system have been made, which shows the potential of ECGXtractor to extract discriminative ECG-based features for different tasks and scenarios.

VII. CONCLUSION
In this article we have investigated ECG biometric recognition through the proposal of ECGXtractor, a novel DL method, exhaustively evaluated across multiple scenarios and precise experimental settings. We release in GitHub the trained weights of our DL system, along with the material used to carry out our experiments. This study aims to overcome the major drawback of ECG biometric recognition, i.e., the lack of standard experimental protocols, by making available a general public benchmark evaluation so that everyone can replicate it and compare their results with ours. The proposed ECGXtractor method is robust in cross-dataset evaluation, with the following best EERs achieved in multi-session verification: 1.97% for the in-house database, 2.06% for PTB, 0.15% for ECG-ID, and 5.44% for CYBHi. Moreover, ECGXtractor performs perfectly when the subset of healthy subjects from the PTB database is considered, in both single-session verification and identification.
In addition, the evaluation conducted with the "off-the-person" CYBHi database may present interesting implications. ECG signals like those of CYBHi are the easiest to record in widespread applications, and studies involving "off-the-person" databases may favour the diffusion of ECG biometric recognition technologies. The major drawbacks of these signals are the high level of noise and imprecision, which generally lower recognition performance compared to traditional ECGs. In this sense, a strategy analysed in this study and requiring further investigation is the fine-tuning of ECGXtractor with specific ECG signals presenting the characteristics of the signals of interest. For CYBHi, we were able to decrease the EER in multi-session verification from 7.97% to 5.44%.
Another line of future work is the protection of sensitive information that may be contained in the different types of ECG segments considered in this study [14]. As we observed, the Autoencoder of ECGXtractor was not trained specifically for recognition tasks. Hence, the features extracted from ECG segments are generic, and we expect that they can be successfully exploited in other applications, revealing sensitive information such as age, sex, or medical pathologies [47]. The risk assessment related to this aspect and possible countermeasures shall be further investigated.

Fig. 1: Graphical representation of the pre-processing operations of ECGXtractor performed on ECG signals to obtain template segments, single segments, and summary segments for ECG biometric recognition with normalized amplitude. Template segments are generated from all the single segments identified in the original ECG signal. Summary segments are generated from blocks of ten consecutive single segments (color image).
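The relation between the three segment types in Fig. 1 can be sketched as follows. This is only an illustration under stated assumptions, not the released pre-processing code: we assume that segments are combined by sample-wise averaging, that summary blocks are non-overlapping with incomplete trailing blocks discarded, and that amplitude normalization is a min-max scaling to [0, 1]. The function names are hypothetical.

```python
def mean_segment(segments):
    """Sample-wise mean of equally long heartbeat segments (assumed aggregation)."""
    n = len(segments)
    return [sum(samples) / n for samples in zip(*segments)]


def normalize(segment):
    """Min-max scaling to [0, 1] (assumed form of amplitude normalization)."""
    lo, hi = min(segment), max(segment)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in segment]


def template_segment(single_segments):
    """One template per ECG signal, built from all its single segments."""
    return normalize(mean_segment(single_segments))


def summary_segments(single_segments, block_size=10):
    """One summary segment per block of `block_size` consecutive single segments.

    Assumption: blocks do not overlap and an incomplete last block is dropped.
    """
    summaries = []
    last_start = len(single_segments) - block_size
    for start in range(0, last_start + 1, block_size):
        block = single_segments[start:start + block_size]
        summaries.append(normalize(mean_segment(block)))
    return summaries


if __name__ == "__main__":
    # 25 toy "heartbeats" of three samples each, purely illustrative.
    beats = [[0.0, float(i), 2.0 * i] for i in range(1, 26)]
    print(len(summary_segments(beats)), "summary segments")
    print("template:", template_segment(beats))
```

Under these assumptions a signal with fewer than ten detected heartbeats yields a template but no summary segment.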

Fig. 2: Graphical representation of the feature extraction considered in ECGXtractor, performed on the different types of segments to carry out ECG biometric verification and identification. Features are extracted from the latent feature representation of an Autoencoder and, in case of verification, the creation of pairs of features (i.e., genuine-genuine and genuine-impostor pairs) is required before any further operation (color image).

• Proposal of ECGXtractor, a new DL-based system suitable for different tasks based on ECG time signals, composed of a general feature extractor that can be used for different recognition tasks, i.e., identification and verification. The feature extractor consists of an Autoencoder trained with a large set of 166,932 heartbeats collected in an in-house database and extracted from the ECG signals belonging to 55,967 subjects.

TABLE I: Comparison of different deep learning approaches for ECG biometric recognition. Some studies do not specify if their experiments are conducted according to single- or multi-session acquisition analysis. In such cases, we expect that multi-session is not considered. It is also possible that for some subjects only a single ECG signal is available in the database. Acc = accuracy, CWT = Continuous Wavelet Transform, EER = Equal Error Rate.
Our Autoencoder requires as input a time signal that represents an ECG segment, i.e., template,

• Single-session Evaluation: For each subject, three genuine pairs and fifteen impostor pairs are generated by randomly matching summary segments of the same and different subjects. We verify that each generated pair contains different summary segments, and that the same pair is not considered multiple times. To evaluate our Siamese CNNs with the in-house database, we consider the test subjects not previously used for training and validation. With the in-house and PTB databases, we can evaluate both 1L- and 12L-Siamese CNNs. To obtain comparable results, the same genuine and impostor pairs

TABLE III: Experimental protocol considered for the training of the Siamese verification system, and the final single- and multi-session evaluation.
• Multi-session Evaluation: For each subject we select two distinct ECG signals and generate an enrolment template from the first session, and a probe template from the second one. Then, we create genuine pairs by matching the two templates of each subject, and impostor pairs by matching the enrolment template of each subject with five probe templates of different subjects. The same subjects considered for the evaluation of single-session verification are used to evaluate our Siamese CNNs in multi-session. Also, the same comparison pairs are used to evaluate 1L- and 12L-Siamese CNNs with the in-house and PTB databases.
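The multi-session pairing rule described above (one genuine pair per subject, plus five impostor pairs against probe templates of other subjects) can be sketched as follows. This is a hedged illustration rather than the released protocol code: the function name `make_multisession_pairs`, the representation of each pair as `(enrolment_subject, probe_subject)`, and the use of a seeded random generator are assumptions introduced for the example.

```python
import random


def make_multisession_pairs(subject_ids, n_impostors=5, seed=0):
    """Build (enrolment_subject, probe_subject) comparison pairs.

    Enrolment templates are assumed to come from the first session and
    probe templates from the second session of each subject.
    """
    rng = random.Random(seed)
    genuine, impostor = [], []
    for sid in subject_ids:
        # Genuine pair: both templates belong to the same subject.
        genuine.append((sid, sid))
        # Impostor pairs: this enrolment template against the probe
        # templates of n_impostors distinct, randomly chosen other subjects.
        others = [other for other in subject_ids if other != sid]
        for other in rng.sample(others, n_impostors):
            impostor.append((sid, other))
    return genuine, impostor


if __name__ == "__main__":
    subjects = [f"subject_{i:03d}" for i in range(20)]  # hypothetical identifiers
    genuine_pairs, impostor_pairs = make_multisession_pairs(subjects)
    print(len(genuine_pairs), "genuine pairs,", len(impostor_pairs), "impostor pairs")
```

Because the impostor subjects are drawn at random, calling the function with ten different seeds and averaging the resulting EERs would mirror the ten executions reported in the text. Reusing the same seed for the 1-lead and 12-lead models keeps the comparison pairs identical across both.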

TABLE IV: Description of the different evaluation sets and performances achieved in the scenario of single-session ECG biometric verification. EER = Equal Error Rate.

TABLE V: Description of the different evaluation sets and performances achieved in the scenario of multi-session ECG biometric verification. EER = Equal Error Rate, FT = Fine Tuning.

TABLE VII: Description of the different evaluation sets and performance achieved in the scenario of single-, multi-, and mixed-session ECG biometric identification.