Multi-feature Sparse Representations Learning via Collective Matrix Factorization for ECG Biometric Recognition

Electrocardiogram (ECG) signal is a promising biometric trait, and many methods have been proposed for ECG biometric recognition. However, it is challenging to design a robust and precise method that improves the recognition performance on ECG signals with noise and signal variation. We present a multi-feature sparse representations learning model via collective matrix factorization for ECG biometric recognition, MSRCMF for short. First, we extract one-dimensional local binary pattern (1D-LBP), shape, and wavelet features of ECG signals and then obtain their sparse representations. Second, to extract discriminative information and preserve the intra- and inter-subject similarities, we leverage collective matrix factorization on the multiple sparse representations and the label information to obtain the latent semantic space. Finally, we recognize the ECG signals in the learned semantic space. Extensive experiments on four ECG databases show that MSRCMF achieves competitive performance compared to state-of-the-art methods.


I. INTRODUCTION
ELECTROCARDIOGRAM (ECG) signal is a novel biometric trait for identifying people, and ECG biometric recognition has received wide attention [1]. Compared to other biometric traits, such as fingerprints, faces, iris, gaits, and veins, ECG signals have several distinctive advantages [2], [3]. 1) Liveness detection. Only living persons produce ECG signals, which proves intrinsic liveness. 2) High security. Due to the continuity and irregularity of ECG signals, they are more difficult to counterfeit than fingerprints, faces, iris, and so on. 3) Hybrid information. ECG signals provide not only identity information but also information about disease and heart status. 4) Small data size. ECG signals are one-dimensional data, whereas most other biometric traits are two-dimensional images. In addition, good-quality ECG signals can be captured from a finger, which makes ECG a suitable and acceptable biometric modality for users and is crucial for improving security and privacy [4].
ECG signal is a promising biometric trait, and many methods have been proposed for ECG biometric recognition [5], [6]. However, it remains difficult to perform this technique robustly and precisely, for the following reasons. 1) Noise. ECG signal-acquisition devices produce noise, such as power line interference, baseline wandering, and electrode motion artifacts [7]. In addition, ECG signals are acquired on the skin's surface by sensors, so contact noise is inevitable. 2) Signal variation. ECG signals are easily influenced by external factors, and the waveforms of ECG signals change over time [8]. Owing to noise and signal variation, designing a robust and precise method is a challenging problem.
Sparse representation learning (SRL) can efficiently remove noise, and many methods based on sparse representation have been proposed for ECG biometric recognition [9]-[12]. SRL represents a test sample by a linear combination of dictionary elements. However, noise and sample variation can destroy the linear structure of the dictionary elements, and the performance of SRL degrades when data are corrupted. Therefore, a challenging problem is how to design a robust method that improves the performance of SRL under noise and sample variation.
Recently, many multi-feature learning methods have been proposed for biometric recognition [13], [14], [15], [16]. Different features from the same subject can provide complementary information, so multi-feature learning can improve recognition performance. However, different features have different meanings, and simply concatenating them does not necessarily improve recognition performance. Therefore, a key problem in multi-feature fusion is how to learn a unified representation from multiple features.
Collective matrix factorization (CMF) can transform the original space into the latent semantic space and remove redundant information, and many CMF-based methods have been proposed for biometric recognition [17]-[19]. However, the existing CMF-based methods learn semantic representations across domains, without considering multiple features. Hence, a challenging problem is to design a CMF-based learning method for multiple features.
To tackle the above issues, we design a multi-feature sparse representations learning model via collective matrix factorization for ECG biometric recognition, which can leverage the collective matrix factorization on multiple sparse representations to obtain the latent semantic space. A simple illustration of our framework is shown in Fig. 1. The proposed learning method includes the training and testing phases. In the training phase, we leverage the CMF-based factorization technique to fuse multiple sparse representations and find a shared semantic space. In the testing phase, we can calculate the similarity between the probe and enrolled features in the shared semantic space.
The contributions of MSRCMF are summarized as follows: 1. We propose a multi-feature sparse representations learning model via collective matrix factorization for ECG biometric recognition, which obtains the latent semantic space from multiple sparse representations and is intrinsically different from current CMF-based frameworks.
2. To further enhance the discrimination of the learned features, we integrate label information to preserve the intra- and inter-subject similarities.
3. We use an iterative optimization strategy to solve the optimization problem in MSRCMF, and its time complexity is linear in the database size.

II. RELATED WORK

A. ECG BIOMETRIC RECOGNITION
There are fiducial-based, non-fiducial-based, and hybrid-based approaches for ECG biometric recognition [3].
Fiducial-based approaches use features of the time domain, amplitude waveform, and intervals [20]. Venketsh et al. [21] used nine features of ECG signals in the spatial domain for ECG biometric recognition. The challenge in fiducial-based approaches is that the amplitudes and intervals of fiducial points are sensitive to noise. Therefore, these approaches are generally applicable to ECG signals in a relatively noise-free environment. Non-fiducial-based approaches usually extract the entire waveform morphology of ECG features. Hejazi et al. [22] extracted autocorrelation coefficients for ECG biometric recognition. Tantawi et al. [23] extracted the discrete wavelet coefficients of R-R intervals for ECG identification. Louis et al. [6] extracted multiresolution local binary patterns of ECG signals for identification. Hybrid-based approaches fuse fiducial- and non-fiducial-based features of ECG signals. Wang et al. [17] fused multi-scale differential features and one-dimensional local binary patterns to generate the base feature for identification. Lim et al. [24] hybridized discrete wavelet transforms and heart-rate intervals for ECG biometric recognition. Bassiouni et al. [25] presented a comparative analysis of the authentication performance of ECG biometric systems.
Recently, deep learning for ECG biometric recognition has become a hot research area. AlDuwaile et al. [26] designed a small convolutional neural network that achieves better generalization by entropy enhancement of a short segment of a heartbeat signal for ECG biometric recognition. Labate et al. [27] presented Deep-ECG, a convolutional neural network biometric approach for ECG signals. Srivastva et al. [28] proposed an ensemble of state-of-the-art pre-trained deep neural networks for ECG biometric recognition. Da et al. [55] used two separate convolutional neural network techniques to extract useful representations for ECG biometrics. Hammad et al. [29] developed a secure multimodal ECG biometric system that used a convolutional neural network (CNN) and a Q-Gaussian multi support vector machine (QG-MSVM) with different-level fusion. Eduardo et al. [30] proposed an ECG-based biometric recognition approach using a deep autoencoder for feature learning.
Deep learning shows good performance when large-scale training data are available for tuning massive numbers of hyperparameters. However, ECG biometric databases are relatively small, making it difficult to train a deep learning model.

B. SPARSE REPRESENTATION LEARNING
Assume that $X^t = [X_1^t, X_2^t, \ldots, X_C^t] \in \mathbb{R}^{d_t \times n}$ represents the training samples, where $C$ is the total number of classes, $X_i^t = [x_{i1}^t, x_{i2}^t, \ldots, x_{in_i}^t] \in \mathbb{R}^{d_t \times n_i}$, and $n = \sum_{i=1}^{C} n_i$. A test sample $y$ from the $i$th class can be reconstructed by a sparse linear combination of the training samples $X^t$ as

$$y = X^t w_p, \qquad (1)$$

where $w_p = [0, \ldots, 0, w_{i1}, w_{i2}, \ldots, w_{in_i}, 0, \ldots, 0]^\top \in \mathbb{R}^n$ is a sparse coefficient vector whose only nonzero elements are associated with the $i$th class. It should be noted that the advantages of representing the test sample as a linear combination of training samples have been explored in [31]-[33].
According to sparse representation learning [33], we can obtain the sparse representation coefficient of sample $y$ by solving the following optimization problem:

$$\min_{w_p} \|y - X^t w_p\|_2^2 + \lambda_1 \|w_p\|_1, \qquad (2)$$

where $\lambda_1$ is a regularization coefficient, $w_p \in \mathbb{R}^n$ is the sparse representation coefficient vector, and $\|\cdot\|_1$ is the $L_1$ norm.
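The $L_1$-regularized problem above can be solved with a simple proximal-gradient (ISTA) loop. The sketch below is illustrative rather than the solver used in the paper; the dictionary `X`, regularization weight `lam`, step size, and iteration count are all assumptions:

```python
import numpy as np

def ista_sparse_code(X, y, lam=0.1, n_iter=500, step=None):
    """Solve min_w 0.5*||y - X w||_2^2 + lam*||w||_1 by iterative
    soft-thresholding (ISTA).

    X : (d, n) dictionary whose columns are training samples.
    y : (d,) test sample to be sparsely coded.
    """
    if step is None:
        # Step size 1/L, where L is the Lipschitz constant of the gradient.
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)            # gradient of the smooth part
        z = w - step * grad                 # plain gradient step
        # Soft-thresholding is the proximal operator of the L1 penalty.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w
```

With a small penalty, the recovered coefficient vector concentrates its weight on the dictionary column that generated the test sample, which is the behavior SRL-based recognition relies on.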

III. PROPOSED METHODOLOGY

A. NOTATIONS
Assume that $\{X^t\}_{t=1}^{D}$ is the training set of multiple sparse representation matrices, where $D$ is the number of features, each sparse representation matrix $X^t = [x_1^t, x_2^t, \ldots, x_n^t] \in \mathbb{R}^{d_t \times n}$ contains $n$ samples, and $d_t$ is the dimension of the $t$th feature. Let $L \in \mathbb{R}^{p \times n}$ denote the ground-truth label matrix, where $p$ is the number of subjects, $L_{ki} = 1$ if the $i$th base feature belongs to subject $k$, and $L_{ki} = 0$ otherwise. $\{Y^t\}_{t=1}^{D}$ represents the testing sets with $D$ different sparse representations, and each sparse representation matrix $Y^t = [y_1^t, y_2^t, \ldots, y_q^t] \in \mathbb{R}^{d_t \times q}$ consists of $q$ samples. A test sample $y_i = \{y_i^1, y_i^2, \ldots, y_i^D\}$ contains $D$ features, and our objective is to identify the class that sample $y_i$ belongs to.

B. PREPROCESSING
First, to remove the noise of ECG signals, we use the wavelet threshold denoising method to obtain the useful signals. Second, the R peaks of the ECG signals are located by the Pan-Tompkins algorithm [34], and we divide each ECG signal into heartbeat segments by taking a fixed length before and after each R peak. Then, we normalize each heartbeat to the range between 0 and 1. Finally, all heartbeats from the same subject are superposed, and ECG templates are obtained by taking the mean heartbeat.
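The segmentation, normalization, and templating steps above can be sketched as follows. The R-peak locations are assumed to be given already (the paper locates them with the Pan-Tompkins algorithm), and the pre/post window split (100 samples before and 188 after the R peak, roughly 800 ms at 360 Hz) is an illustrative assumption:

```python
import numpy as np

def segment_heartbeats(signal, r_peaks, pre=100, post=188):
    """Cut a fixed-length window around each R peak and min-max
    normalize each heartbeat to [0, 1]."""
    beats = []
    for r in r_peaks:
        if r - pre < 0 or r + post > len(signal):
            continue  # skip beats too close to the record boundary
        beat = np.asarray(signal[r - pre : r + post], dtype=float)
        lo, hi = beat.min(), beat.max()
        beats.append((beat - lo) / (hi - lo + 1e-12))  # scale to [0, 1]
    return np.stack(beats)

def mean_template(beats):
    """Superpose all heartbeats of one subject and take the mean."""
    return beats.mean(axis=0)
```

Each row of the returned array is one normalized heartbeat; averaging the rows yields the subject's ECG template.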

C. MULTIPLE FEATURE EXTRACTION
We extract the shape, wavelet, and 1D-LBP features from the ECG segments. We extract 30 shape features from each heartbeat, which include 15 temporal distance features, six amplitude distances, and 10 morphological descriptors [35], [36]. The discrete wavelet coefficients of the heartbeat are obtained as wavelet features, and we choose the Daubechies wavelet Db3 with five levels of decomposition to obtain the heartbeat feature values [37]. The 1D-LBP method [38] can effectively extract binary codes from ECG signals by comparing each sampling point with its neighbors. We then obtain the multiple sparse representation matrices from the extracted features by sparse representation learning.
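As a rough illustration of the 1D-LBP idea, the sketch below compares each sample with four neighbors on each side and histograms the resulting 8-bit codes. The window size and bit ordering here are assumptions for demonstration, not necessarily the exact variant of [38]:

```python
import numpy as np

def lbp_1d(signal, half_window=4):
    """1D-LBP sketch: threshold each sample's neighbors against the
    sample itself, pack the comparison bits into a code, and return
    the normalized histogram of codes as the feature vector."""
    s = np.asarray(signal, dtype=float)
    n_bits = 2 * half_window
    codes = []
    for i in range(half_window, len(s) - half_window):
        neighbors = np.concatenate(
            [s[i - half_window:i], s[i + 1:i + 1 + half_window]])
        bits = (neighbors >= s[i]).astype(int)       # 1 if neighbor >= center
        codes.append(int("".join(map(str, bits)), 2))  # pack bits into a code
    hist, _ = np.histogram(codes, bins=2 ** n_bits, range=(0, 2 ** n_bits))
    return hist / max(hist.sum(), 1)
```

On a strictly increasing signal every left neighbor is below the center and every right neighbor is above it, so all codes collapse to the single pattern 00001111 (decimal 15).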

D. FORMULATION
CMF techniques can remove redundant information and obtain the latent semantic representation. Specifically, a matrix $X^t$ can be represented by two rank-$r$ matrices $U^t$ and $V^t$ via CMF as

$$\min_{U^t, V^t} \|X^t - U^t V^t\|_F^2, \qquad (3)$$

where $X^t \in \mathbb{R}^{d_t \times n}$, $U^t \in \mathbb{R}^{d_t \times r}$, $V^t \in \mathbb{R}^{r \times n}$, $r$ is the dimension of the learned latent semantic space with $r \ll d_t$, and each column vector $v_i$ of $V^t$ is the latent factor vector of a sample in the latent semantic space.
Intuitively, different sparse representations of a sample all describe the same subject, so they should have the same semantics. Based on this, different sparse representations can share the same latent semantic space learned by CMF. In light of this, we have the following cost function for each sparse representation:

$$\min_{U^t, V} \|X^t - U^t V\|_F^2 + \lambda \|U^t\|_F^2 + \gamma \|V\|_F^2, \qquad (4)$$

where $\lambda$ and $\gamma$ are balance parameters, and $\|\cdot\|_F$ denotes the Frobenius norm. In the original space, different sparse representations of one sample have the same semantic label, so the representations of one sample should also share the same label in the common latent semantic space. With this in mind, we define

$$\min_{G, V} \alpha \|L - G V\|_F^2 + \lambda \|G\|_F^2, \qquad (5)$$

where $\alpha$ is a balance parameter, $L \in \mathbb{R}^{p \times n}$ is the ground-truth label matrix, $G \in \mathbb{R}^{p \times r}$ is the label projection matrix, and $p$ is the number of classes. Combining Eq. (4) and Eq. (5), we have the following objective function:

$$\min_{\{U^t\}, V, G} \sum_{t=1}^{D} \left( \|X^t - U^t V\|_F^2 + \lambda \|U^t\|_F^2 \right) + \alpha \|L - G V\|_F^2 + \lambda \|G\|_F^2 + \gamma \|V\|_F^2. \qquad (6)$$

By solving Eq. (6), we can obtain the learned semantic matrix $V$ of the training samples. Furthermore, having learned the semantic representation on the training set, we need a specific function to map testing samples from the original space to the semantic space. Thus, a projection matrix is needed and can be formulated as follows:

$$\min_{W^t} \|V - W^t X^t\|_F^2 + \gamma \|W^t\|_F^2, \qquad (7)$$

where $W^t$ is the projection matrix and $\gamma$ is a balance parameter. Combining Eqs. (6) and (7), we have the following overall objective function:

$$\min_{\{U^t\}, \{W^t\}, V, G} \sum_{t=1}^{D} \left( \|X^t - U^t V\|_F^2 + \beta \|V - W^t X^t\|_F^2 + \lambda \|U^t\|_F^2 + \gamma \|W^t\|_F^2 \right) + \alpha \|L - G V\|_F^2 + \lambda \|G\|_F^2 + \gamma \|V\|_F^2, \qquad (8)$$

where $D$ is the number of sparse representations and $\beta$ is a balance parameter.

E. OPTIMIZATION
The objective function in Eq. (8) is non-convex in the multiple variables $U^t$, $W^t$, $V$, and $G$ jointly. To tackle the optimization problem, we solve it by updating one of these matrices while fixing the others, repeating until convergence or the maximum number of iterations is reached, as in the following steps.
Step 1: Fixing others and solving $U^t$. With the other variables fixed, we obtain $U^t$ by solving the following minimization problem:

$$\min_{U^t} \|X^t - U^t V\|_F^2 + \lambda \|U^t\|_F^2. \qquad (9)$$

To solve Eq. (9), we set the derivative w.r.t. $U^t$ to zero and obtain the closed-form solution

$$U^t = X^t V^\top (V V^\top + \lambda I)^{-1}, \qquad (10)$$

where $I \in \mathbb{R}^{r \times r}$ is the identity matrix.
Step 2: Fixing others and solving $G$. With the other variables fixed, we obtain $G$ by solving the following minimization problem:

$$\min_{G} \alpha \|L - G V\|_F^2 + \lambda \|G\|_F^2. \qquad (11)$$

Similar to the optimization for $U^t$, we obtain the analytical solution of $G$ as follows:

$$G = \alpha L V^\top (\alpha V V^\top + \lambda I)^{-1}, \qquad (12)$$

where $I \in \mathbb{R}^{r \times r}$ is the identity matrix.
Step 3: Fixing others and solving $V$. By fixing the other variables and setting the derivative of Eq. (8) w.r.t. $V$ to zero, we obtain

$$V = \left( \sum_{t=1}^{D} U^{t\top} U^t + \alpha G^\top G + (D\beta + \gamma) I \right)^{-1} \left( \sum_{t=1}^{D} \left( U^{t\top} X^t + \beta W^t X^t \right) + \alpha G^\top L \right). \qquad (13)$$

Step 4: Fixing others and solving $W^t$. By fixing the other variables, we obtain $W^t$ by solving the following minimization problem:

$$\min_{W^t} \beta \|V - W^t X^t\|_F^2 + \gamma \|W^t\|_F^2. \qquad (14)$$

By setting the derivative of Eq. (14) to zero, we obtain the analytical solution of $W^t$ as follows:

$$W^t = \beta V X^{t\top} \left( \beta X^t X^{t\top} + \gamma I \right)^{-1}. \qquad (15)$$

We alternately update all variables until convergence, and Algorithm 1 provides the overall procedure of the above optimization scheme.

Algorithm 1 Optimization Algorithm of MSRCMF
Input: Training data matrices $X^t$ of the $t$th feature ($t = 1, 2, \ldots, D$), parameters $\alpha$, $\beta$, $\lambda$, $\gamma$, label information matrix $L$, and the total iteration number $c$.
Output: Latent basis factors $U^t$, latent semantic representation $V$, label projection matrix $G$, projection matrices $W^t$.
1: Randomly initialize $U^t$, $W^t$, $V$, and $G$;
2: for $k = 1$ to $c$ do
3:    Fix others and update $U^t$ using Eq. (10);
4:    Fix others and update $G$ using Eq. (12);
5:    Fix others and update $V$ using Eq. (13);
6:    Fix others and update $W^t$ using Eq. (15);
7: end for
8: return $U^t$, $W^t$, $V$, and $G$.
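A minimal NumPy sketch of Algorithm 1 is given below. The closed-form updates follow one plausible placement of the balance parameters α, β, λ, γ in the overall objective, so treat the exact parameter roles as assumptions; the defaults reuse the values reported in the parameter study (λ = 2.46, α = 0.25, β = 0.12, γ = 0.45):

```python
import numpy as np

def msrcmf_train(Xs, L, r, alpha=0.25, beta=0.12, lam=2.46, gamma=0.45,
                 n_iter=60, seed=0):
    """Alternating closed-form updates for an MSRCMF-style objective.

    Xs : list of D sparse-representation matrices, each of shape (d_t, n).
    L  : (p, n) ground-truth label matrix.
    r  : dimension of the latent semantic space.
    """
    rng = np.random.default_rng(seed)
    D, n, p = len(Xs), Xs[0].shape[1], L.shape[0]
    Us = [rng.standard_normal((X.shape[0], r)) for X in Xs]
    Ws = [rng.standard_normal((r, X.shape[0])) for X in Xs]
    V = rng.standard_normal((r, n))
    G = rng.standard_normal((p, r))
    I_r = np.eye(r)
    for _ in range(n_iter):
        VVt = V @ V.T
        # Step 1: U_t = X_t V' (V V' + lam I)^-1   (ridge regression per feature)
        Us = [X @ V.T @ np.linalg.inv(VVt + lam * I_r) for X in Xs]
        # Step 2: G = alpha L V' (alpha V V' + lam I)^-1
        G = alpha * L @ V.T @ np.linalg.inv(alpha * VVt + lam * I_r)
        # Step 3: V from the normal equations of the joint objective
        A = sum(U.T @ U for U in Us) + alpha * (G.T @ G) \
            + (D * beta + gamma) * I_r
        B = sum(U.T @ X + beta * (W @ X) for U, W, X in zip(Us, Ws, Xs)) \
            + alpha * (G.T @ L)
        V = np.linalg.solve(A, B)
        # Step 4: W_t = beta V X_t' (beta X_t X_t' + gamma I)^-1
        Ws = [beta * V @ X.T
              @ np.linalg.inv(beta * (X @ X.T) + gamma * np.eye(X.shape[0]))
              for X in Xs]
    return Us, Ws, V, G
```

Each step exactly minimizes the objective over one block of variables, so the objective value is non-increasing across iterations, which is what makes the alternating scheme converge in practice.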
After $c$ iterations, the overall computational complexity of the training stage is of order $O(n(r^2 + Ddr + d^2 + Cr)c)$, where $C$ is the number of classes, $r$ is the dimension of the latent semantic space, $n$ is the size of the training set, $d = \sum_{t=1}^{D} d_t$ with $d_t$ the dimension of the $t$th feature, and $D$ is the number of features. Note that the time complexity is linear in the database size.

F. MATCHING
We divide the testing set into the enroll and probe sets, which are used as matching templates and query samples, respectively.
First, as illustrated in Fig. 1, the testing set is preprocessed and segmented to obtain the ECG heartbeats. Second, we extract the 1D-LBP, shape, and wavelet features of the heartbeats to form the enroll and probe sets $X_{enroll} = \{X^t_{enroll}\}_{t=1}^{D}$ and $X_{probe} = \{X^t_{probe}\}_{t=1}^{D}$, respectively. Third, we extract the high-level semantic representations of the enroll and probe sets, denoted as $W^t X^t_{enroll}$ and $W^t X^t_{probe}$, respectively. Finally, we use the Euclidean distance to calculate the similarity between the probe and enrolled samples as

$$d_{ij} = \sum_{t=1}^{D} \left\| W^t x^t_{enroll,i} - W^t x^t_{probe,j} \right\|_2, \qquad (16)$$

where $x^t_{enroll,i}$ is the $i$th enrolled sample, $x^t_{probe,j}$ is the $j$th probe sample, and $d_{ij}$ is the distance between the $i$th enrolled and the $j$th probe sample. We recognize the probe sample by comparing $d_{ij}$ to a set threshold.
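The matching step can be sketched as follows; summing the per-feature Euclidean distances into a single score is our assumption about the fusion rule, and a probe is identified with the enrolled sample that minimizes the score:

```python
import numpy as np

def match(Ws, enroll_feats, probe_feats):
    """Project each feature set into the semantic space with its W_t and
    score every probe-enroll pair by the summed Euclidean distance.

    Ws           : list of D projection matrices, each (r, d_t).
    enroll_feats : list of D matrices, each (d_t, n_enroll).
    probe_feats  : list of D matrices, each (d_t, n_probe).
    Returns a (n_enroll, n_probe) distance matrix.
    """
    E = [W @ X for W, X in zip(Ws, enroll_feats)]   # semantic enroll features
    P = [W @ Y for W, Y in zip(Ws, probe_feats)]    # semantic probe features
    n_e, n_p = E[0].shape[1], P[0].shape[1]
    dist = np.zeros((n_e, n_p))
    for t in range(len(Ws)):
        for i in range(n_e):
            for j in range(n_p):
                dist[i, j] += np.linalg.norm(E[t][:, i] - P[t][:, j])
    return dist  # identify probe j as argmin_i dist[i, j]
```

When a probe is identical to one of the enrolled samples, its column of the distance matrix is minimized at that sample, so the identification is exact.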
IV. EXPERIMENTS

A. DATABASES AND EXPERIMENTAL SETTINGS
MITDB is widely used in ECG biometrics and heart disease research, and includes 48 two-lead recordings of 47 subjects with a 360 Hz sampling frequency and 11 bits per sample resolution. In our experiments, we choose one recording from each of the 47 subjects.
PTBDB contains 549 recordings from 290 subjects, including healthy subjects and heart disease patients. Each subject in PTBDB has 1 to 5 recordings from the standard 12 leads and three Frank leads, with durations ranging between 38.4 and 104.2 seconds. In our experiments, 248 subjects with one recording are chosen, and each recording is longer than 100 seconds.
The ECG-ID database contains 310 recordings of 90 volunteers, 44 males and 46 females, aged from 13 to 75. Each recording in ECG-ID includes ECG lead I, recorded for 20 seconds and digitized at 500 Hz with 12-bit resolution over a nominal ±10 mV range [41]. The number of records per volunteer varies from 2 to 20, and the acquisition time spans from one day to over 6 months. Volunteer 74 has only one recording and is discarded, so our experiments use the remaining 89 subjects with two recordings each.
UofTDB was specifically created for biometrics and was captured from the thumbs of both hands at 200 Hz with 12-bit resolution. It is currently the largest off-the-person public database. ECG data in UofTDB cover six sessions, named S1 to S6, captured from 1020 subjects aged between 18 and 52. All sessions were acquired under five postures: sitting, standing, exercising, supine, and tripod. The first session includes 1012 subjects, from which 100 subjects were selected to participate in follow-up sessions over a period of six months. In our experiments, we consider 46 subjects in the sitting posture who participated in all five sessions (S1, S2, S3, S4, and S6).
In the MITDB, ECG-ID, and PTBDB databases, each subject selects one session, and we randomly take 60% of the sessions per subject as the training set, 30% as the enroll set, and 10% as the probe set. We choose two recordings of each subject in ECG-ID, called T1 and T2, respectively. To test our method on multisession data in ECG-ID, we consider one recording of each subject as the training set, and the other recording is used as the testing set. In UofTDB, we consider S1 as the training set and S2, S3, S4, and S6 as the testing sets. In all databases, the heartbeats are segmented with a fixed time length of 800 milliseconds. Experiments are performed on a PC with a 3.60-GHz Core i5-5200U CPU and 8 GB RAM.

B. PERFORMANCE METRICS
To evaluate the performance of the proposed approach, we consider both the identification mode and the authentication mode. In the identification mode, we use

$$\text{heartbeat recognition rate} = \frac{N_{\mathrm{correct\_beat}}}{N_{\mathrm{probe\_beat}}}, \qquad (17)$$

where $N_{\mathrm{probe\_beat}}$ is the total number of probe heartbeats and $N_{\mathrm{correct\_beat}}$ is the number of correctly identified probe heartbeats. Like most work in the literature [3], [5], [43], we also use the subject recognition rate as an identification criterion, which is computed by voting over several continuous heartbeats. For this criterion, each segment includes several continuous heartbeats, and we define

$$\text{subject recognition rate} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{sample}}}, \qquad (18)$$

where $N_{\mathrm{sample}}$ is the total number of probe segments and $N_{\mathrm{correct}}$ is the number of correctly identified probe segments. In all experiments, we use three heartbeats as a segment; these are acquired in about 2-4 seconds, which is acceptable for practical applications.
In the authentication mode, we perform a similarity measure between each segment and all other segments from the same class and from different classes. The equal error rate (EER) is the error rate at which the false acceptance rate (FAR) equals the false rejection rate (FRR):

$$FAR = \frac{N_{FA}}{N_{IRA}}, \qquad (19)$$

$$FRR = \frac{N_{FR}}{N_{GRA}}, \qquad (20)$$

where $N_{FA}$ is the number of false acceptances, $N_{IRA}$ is the number of impostor attempts, $N_{FR}$ is the number of false rejections, and $N_{GRA}$ is the number of genuine user attempts.
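The FAR/FRR definitions above, and an EER obtained by sweeping the decision threshold, can be sketched as follows (assuming smaller distances mean better matches, so a sample is accepted when its distance is at or below the threshold):

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR = N_FA / N_IRA and FRR = N_FR / N_GRA at a given distance
    threshold; accept when distance <= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.mean(impostor <= threshold)  # impostors wrongly accepted
    frr = np.mean(genuine > threshold)    # genuine users wrongly rejected
    return far, frr

def eer(genuine, impostor):
    """Sweep the threshold over all observed distances and return the
    point where FAR and FRR are closest (the equal error rate)."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2.0
```

With perfectly separated genuine and impostor distance distributions, some threshold yields zero FAR and zero FRR simultaneously, so the EER is zero.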

C. EFFECTS OF PARAMETER SETTINGS
First, we vary λ, α, β, and γ in the range [0, 10] with a step size of 0.01, and obtain the best parameter values at the same time. When λ = 2.46, α = 0.25, β = 0.12, and γ = 0.45, we achieve the best performance. Second, we evaluate the influence of the number of iterations c on performance, and the result is shown in Fig. 2. From Fig. 2, we can see that the heartbeat recognition performance stabilizes when c reaches 60, which indicates that our approach converges.
Finally, we evaluate the influence of the dimension r of the semantic space on performance, and the result is shown in Fig. 3. As we can see from Fig. 3, the ranges of r that achieve the best heartbeat recognition rate differ across the four databases.

D. PERFORMANCE OF PROPOSED METHOD
First, we evaluate the performance of MSRCMF and baseline methods for ECG biometrics. We adopt the shape, wavelet, and 1D-LBP features and single-feature MSRCMF as baseline methods. Table 1 lists the heartbeat recognition rates of the different baseline methods. As shown in Table 1, the performance of MSRCMF is superior to all baseline methods on the four databases, which demonstrates that the CMF-based multi-feature sparse representation with label information is highly discriminative. For example, the heartbeat recognition rates of MSRCMF are 95.67%, 86.48%, 87.64%, and 87.23% on MITDB, PTBDB, ECG-ID, and UofTDB, respectively.
Second, we compare the performance of MSRCMF with multi-feature biometrics methods, such as kerSMBR [44], RMRLRJS-C [45], JSSLCP [3], and RCNMF [46]. Table 2 lists the heartbeat recognition performance of the multi-feature methods. From Table 2, we can see that MSRCMF performs better than the other multi-feature methods, in part because MSRCMF uses multi-feature sparse representations and semantic embedding with label information, which yields more discriminative features in the semantic space.
Furthermore, we compare the heartbeat recognition performance on multisession records. Table 3 lists the heartbeat recognition performance of our method and other methods in different scenarios.
From Table 3, compared to the baseline methods, our method achieves the best results on the ECG-ID database in different scenarios. Compared with deep methods, the performance of our method is comparable with MobileNet in [47] and is worse than Small CNN in [26]. The multisession data have more signal variation, and our method is superior to the baseline methods. The multi-feature sparse representations of the multisession data are complementary, and collective matrix factorization can remove redundant information, so our method can enhance the discrimination of the learned representations for multisession data. In addition, our method integrates label information to preserve the intra- and inter-subject similarities, which improves performance on multisession data. Finally, we validate the robustness of MSRCMF. Gaussian noise with different variances σ² is added to the four databases, and we compare the robustness of MSRCMF, RCNMF, shape, wavelet, and 1D-LBP. The heartbeat recognition rates under different noise levels are shown in Fig. 4. It can be seen that MSRCMF performs more stably under Gaussian noise than the other methods.

E. COMPARISON WITH STATE-OF-THE-ART METHODS
To further evaluate the performance of our method, we compare the proposed method with state-of-the-art methods on the four databases, including deep neural network-based ECG biometric methods such as those in [26], [51], and [56]. The performance comparison is shown in Table 4. From Table 4, compared to the non-deep methods on all four databases, MSRCMF has the best recognition performance. On MITDB, our method achieves the best recognition rate even compared to the deep learning methods. On ECG-ID, PTBDB, and UofTDB, MSRCMF is slightly worse than the methods in [26], [51], and [56]. Therefore, we can see that MSRCMF performs competitively with state-of-the-art methods on all four databases.
It is worth noting that MSRCMF and our previous work [57] are different. First, our previous work [57] extracted multiple LBP histograms as feature descriptors from segmented ECG signals, whereas MSRCMF obtains sparse representations as feature descriptors. Second, our previous work [57] only constrained the semantic matrix V, whereas MSRCMF uses the Frobenius norm to constrain all dynamic matrices, including the latent basis factor U, the semantic representation V, the label projection matrix G, and the projection matrix W. Third, all matrix variables of the objective function are nonnegative during the computing procedure in our previous work, whereas the matrices in MSRCMF are not required to be nonnegative. Therefore, MSRCMF and our previous work differ in feature extraction, objective function, and computing procedure.

F. TIME COST ANALYSIS
In this section, we give a time cost analysis of the proposed method. The time costs are shown in Table 5. Due to the unavailability of the source code of the other methods, we only show the average time of our proposed method for training, preprocessing, feature extraction, and matching of each heartbeat. From Table 5, we can see that the training, feature extraction, and matching of our method are fast, and the time cost is feasible for practical applications.

V. CONCLUSIONS AND FUTURE WORK
We have proposed a multi-feature sparse representations learning model via collective matrix factorization for ECG biometric recognition, which can extract discriminative information and preserve the intra- and inter-subject similarities in the latent semantic space. MSRCMF obtains the latent semantic space from multiple sparse representations, and is intrinsically different from current CMF-based frameworks. Extensive experiments on four ECG databases show that MSRCMF achieves competitive performance compared to state-of-the-art methods. In future work, we will explore a new ECG recognition approach that incorporates a novel loss function into deep learning to further improve performance.