Can a Smartband be Used for Continuous Implicit Authentication in Real Life

The use of cloud services that process privacy-sensitive information, such as digital banking, pervasive healthcare, and smart home applications, requires an implicit continuous authentication solution that makes these systems less vulnerable to spoofing attacks. Physiological signals can be used for continuous authentication due to their uniqueness. Ubiquitous wrist-worn wearable devices are equipped with photoplethysmogram (PPG) sensors, which enable us to extract heart rate variability (HRV) features. In this study, we show that these devices can be used for continuous physiological authentication to enhance the security of cloud services, edge services, and IoT devices. A system suited to the smartband framework faces new challenges, such as relatively low signal quality and placement-related artifacts, which are not encountered in full lead electrocardiogram systems. After artifact removal, the cleaned physiological signals are fed to machine learning algorithms. To train our machine learning models, we collected physiological data using off-the-shelf smartbands and smartwatches during a real-life event. By applying a minimum quality threshold, we achieved a 3.96% Equal Error Rate (EER). The performance evaluation shows that HRV is a strong candidate for continuous, unobtrusive, implicit physiological authentication.


I. INTRODUCTION
Implicit continuous authentication is required for cloud-oriented services that grant access to privacy-sensitive information domains such as mobile banking and pervasive healthcare [1], [2]. Smartphones, computers, smartwatches, and Internet-of-Things (IoT) devices are becoming increasingly dependent on these services. The number of IoT devices is expected to exceed 75 billion by 2025 [3]. However, these services remain vulnerable to attacks after users authenticate. For example, a user may leave a smartphone logged in to privacy-sensitive services, and attackers can then steal information. A straightforward countermeasure is to ask the user for a password frequently; however, this is unpleasant for service users. Continuous authentication should therefore be implicit. Face-based systems can be tricked by presentation attacks, such as printing the victim's face on paper, and storing users' face pictures also creates a privacy concern [4]. Furthermore, fingerprints, another prominent traditional biometric modality along with faces, can be easily manipulated [5] and can fail liveness detection tests. On the other hand, biosignals are difficult to tamper with, and they inherently provide liveness detection [6]. Heart activity is unique to individuals, and biosignal authentication research has started investigating this signal [6]. One of its most essential properties is Heart Rate Variability (HRV). Although past research mostly focused on the connection between HRV and different types of health disorders [7], the validity of using HRV for biometric recognition is supported by the fact that physiological and geometrical differences of the heart across individuals give their HRV features a certain uniqueness [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Emanuele Lattanzi.
High-end wearable systems are expensive and offer little comfort to users, which limits their widespread application. Recently, smartbands and smartwatches have been widely adopted by consumers. These devices are equipped with a rich set of sensors, such as an accelerometer, a heart rate monitor, and a skin conductance sensor. These advances create an opportunity to build a continuous implicit authentication system. However, unlike full lead ECG systems, these devices are prone to activity-related errors [9], so modality-specific artifact detection and removal mechanisms should be developed for accurate measurements. A solution suitable for IoT-connected devices should be context-independent, because every service may involve different types of user behavior. Therefore, systems that only work in certain scenarios, such as while typing or walking, are very limited in terms of application area. Physiological parameters such as hidden heart-related biometrics are more suitable for this purpose due to their uniqueness and activity independence [10], [11]. We propose an unobtrusive, low-cost, activity-independent continuous authentication system with smartbands. We implemented our solution on the Empatica E4 smartband [12] and the Samsung Gear S and Gear S2 [13], which are equipped with a photoplethysmography (PPG) sensor.
Consider a scenario in which John has to make money transactions through a mobile banking application on his smartphone. He logs in to the system via two-factor authentication. After he successfully makes the transaction, he forgets to log out. Then, an attacker withdraws all the cash from John's account by using his session. If John had a continuous biometric authentication system, he would not have this problem. Such a system can also be used in services such as city bike and electric scooter rentals, which have become popular in recent years. The literature on continuous authentication with physiological signals gathered from smartbands is limited, and the effectiveness of HRV features derived from the PPG sensors of smartwatches and smartbands for continuous authentication is still unknown [4]. We show that HRV can be a promising candidate for biometric authentication. Our proposed system enhances the state of the art as follows:
• One of the previous studies employed the mean heart rate per minute, which requires longer recordings of approximately five minutes [14]. The proposed system uses more sophisticated HRV features derived from inter-beat intervals, which are extracted from the raw PPG sensor data.
• Most previous works with smartbands focused only on continuous authentication while using a laptop, a smartphone, or an ATM, or relied on lifestyle-related metrics. Our solution is not activity dependent. For example, a user may use this system during any activity, such as live streaming on social media, cycling, giving a presentation, or working in an office.
• Our solution is comfortable, unobtrusive, seamless and works without interrupting the ordinary pattern of any activity.
• Real-life data contains artifacts. We evaluate the effect of data quality on system performance.
In Section II, we provide background information on the smartwatch framework and heart rate variability. In Section III, we present the related work on continuous authentication and compare it with our work in terms of novelty. In Section IV, we describe the proposed system for continuous authentication with smartwatches using heart rate variability. In Section V, we explain the experiment conducted for the proposed system. In Section VI, we provide the results of our system. In Section VII, we present the conclusion and future work of our study.

II. BACKGROUND
A. SMARTBAND FRAMEWORK
Smartbands (sometimes called wristbands or smartwatches) are comfortable devices that can be attached to the wrist or arm. The devices used in this study are shown in Figure 1. The battery life of smartbands has recently increased to several days; for example, the battery life of the Empatica E4 is two days when all sensors are used [12]. This extended battery life enables long-term monitoring of the physiological signals of individuals. Most wristbands are equipped with a PPG sensor, which optically detects blood volume changes in the microvascular bed of tissue [15]. From the PPG signal, the time between beats (RR intervals) can be computed, and most modern smartbands provide RR intervals through their APIs. For example, Samsung smartwatches use the Tizen framework, whose Human Activity Monitor API provides the RR intervals. In many smartbands, the sampling frequency of the PPG sensor varies from 20 Hz to 100 Hz (64 Hz for the E4 and 100 Hz for the Samsung Gear series). Cubic spline interpolation is used to detect the beats more accurately, and most devices correct the heartbeats by using an accelerometer to detect the movements of the subject. This functionality is available in the Tizen and Empatica APIs [12]. These devices are also equipped with Bluetooth and, more recently, NFC chips, which enable them to connect to smartphones and to edge and cloud services. These short-range network interfaces can be used to check whether the user is in close proximity to the computer they are using. Although all of the devices are consumer-grade products, the Empatica E4 is mainly developed for researchers and provides research-oriented APIs. The Empatica E4 is also more expensive than the Samsung models.
VOLUME 8, 2020

B. HEART RATE VARIABILITY
The variability of RR intervals is called HRV [16]. It is a very important feature for recognizing certain psychological, physiological, and personal properties of an individual [16]. Kubios [17] is a popular tool in the literature for extracting HRV features. Time-domain, frequency-domain, and non-linear features can characterize the variability of the heart [17]. Time-domain and frequency-domain HRV features are computationally efficient: time-domain features can be computed in O(n) time, and frequency-domain features in O(n log n) time thanks to the Fast Fourier Transform (FFT). There exist smartphone applications that calculate HRV in real time [18], [19].
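To make the O(n) claim concrete, the sketch below computes a few widely used time-domain HRV features (mean RR, SDNN, RMSSD, pNN50, and mean heart rate) from one window of RR intervals given in milliseconds. It is an illustrative Python sketch, not Kubios or the toolbox used in this study; the feature set is an assumption and not necessarily the exact list in Table 2, and the function name `time_domain_hrv` is hypothetical.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """A few standard time-domain HRV features for one window of RR intervals (ms).

    Every feature is one linear pass over the intervals or over their
    successive differences, so the total cost is O(n) in the number of beats.
    """
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)  # successive RR differences
    return {
        "meanRR": rr.mean(),                           # ms
        "SDNN": rr.std(ddof=1),                        # sample standard deviation, ms
        "RMSSD": np.sqrt(np.mean(diff ** 2)),          # ms
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50.0), # % of differences above 50 ms
        "meanHR": 60000.0 / rr.mean(),                 # beats per minute
    }
```

For example, `time_domain_hrv([800, 850, 800, 850])` gives an RMSSD of 50 ms and a pNN50 of 0%, because no successive difference strictly exceeds 50 ms.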

III. RELATED WORK
Behavioral and physiological biometrics captured by the sensors of wearable devices have become popular in individual recognition and authentication models. Some models focus on biometrics such as face [27], voice [28], fingerprints [29], [30], electroencephalography (EEG) [31]-[33], ECG [30], [34], [35], and phonocardiography (PCG) [36]. Continuous authentication is a fast-growing field; however, the literature on systems that do not depend on a certain type of task is limited. Most previous work using physiological signals has been done with laboratory-grade equipment, and some of these sensors are not available in unobtrusive devices such as smartphones, smartwatches, and smartbands. On the other hand, fingerprint and face-based authentication systems can be easily deceived, and voice-based authentication raises privacy issues because it requires continuous recording of the environment.
As a method for recognizing individuals, Elkader et al. [20] presented a sensor-based motion biometric model suitable for 20 sedentary and non-sedentary activities (Vacuuming, Sweeping, Walking Downstairs, Walking Upstairs, Dusting, Iron Cloth, Folding Cloth, Washing Hands, Brushing Teeth, Washing Dishes, Washing Vegetables, Dicing, Peeling Vegetables, Grating, Stirring, Watching TV, Using PC, Talking on Phone, Texting on Phone, Writing with Pen). They used different combinations of three sensors (accelerometer, gyroscope, and magnetometer) at six body positions (dominant wrist, dominant upper arm, non-dominant wrist, chest, thigh, and ankle). They concluded that features extracted from the combination of six sensors achieved the best overall classification accuracy (98.3%). These activity data were gathered in a laboratory environment, and the signals were segmented manually. Bao et al. [37] examined heart rate variability features gathered from a body area sensor network-based PPG device by applying the Hamming distance. They collected data from 12 subjects in a stationary position.
In another approach to implicit identification and authentication based on activity information, Zeng et al. [21] proposed WearAI, a biometric model that uses accelerometer and gyroscope sensors at five body locations: left wrist (Shimmer 6DoF IMU), right ankle (Shimmer 6DoF IMU), center right hip/torso (Samsung Galaxy S4 i9500), left thigh/front pocket (Samsung Galaxy Nexus i9250), and right upper arm (Samsung Galaxy Nexus i9250). They achieved 97% accuracy with a false-positive rate below 1%. However, in both methods, placing many sensors on the body can be disturbing for the user in daily life.
Acar et al. [38] used smartwatches with keystroke dynamics for continuous authentication. Musale et al. [25] proposed a continuous authentication system based on the Motorola 360 Sport using accelerometer and gyroscope features. Vhaduri and Poellabauer [22] proposed a continuous user authentication scheme that uses 44 features extracted from various biometrics (calorie burn, metabolic equivalent of task (MET), heart rate, and step count) collected with a Fitbit Charge HR device; with a Quadratic SVM classifier, they achieved an average accuracy of 87.37% in a one-to-many approach and 93% in a one-to-one approach. In their revised scheme [14], they adopted more features (65) with different feature selection approaches and obtained accuracies of 93% (sedentary) and 90% (non-sedentary) with equal error rates of 5%. However, the Fitbit framework provides only one sample per minute, and access to the raw data is not possible. A system for continuous authentication with physiological signals should be low-cost and unobtrusive, and, for the sake of universality, it should not depend on a certain activity. We compare the proposed system with the related work in Table 1 in terms of device and device position, features, unobtrusiveness, environment, and dependency on the activity type. Our system outperforms the other studies when feature engineering complexity, activity independence, and unobtrusiveness are taken into consideration.

IV. PROPOSED SYSTEM
In this section, we explain our continuous authentication scheme. Figure 2 shows the data collection application and the preprocessing (artifact detection), feature extraction, feature selection, and classification units of our system. The overall multi-factor authentication diagram, in which a user initiates his/her session with a password or fingerprint, is shown in Figure 3.

A. DATA COLLECTION APPLICATION
We developed a data collection application with Tizen Wearable API 2.2 [39] for the Samsung Gear S and S2. The firmware version is R750XXU1BNJ7 for the Samsung Gear S and R732XXU2AOJ3 for the Samsung Gear S2; some firmware versions may not support RR intervals. The application collects inter-beat intervals and 3D accelerometer data and stores them as downloadable comma-separated values (CSV) files. The Empatica E4 has a cloud-based data collection application from which the physiological signals can be downloaded as CSV files. RR intervals gathered from two different participants are shown in Figure 6. The sampling frequency of the PPG sensor is 100 Hz in the Samsung Gear S and Samsung Gear S2 [39] and 64 Hz in the Empatica E4 [12].

B. PREPROCESSING AND ARTIFACT REMOVAL
We implemented our preprocessing module in MATLAB [40]. First, we loaded the CSV files provided by the smartbands. The signal is segmented into non-overlapping time windows of 120 seconds. According to the HRV guidelines, 2 minutes is the minimum window length for calculating short-term HRV features [16]. Since response time is important for a continuous authentication system, we selected the minimum possible duration; therefore, the minimum duration of physiological data required for authentication is 2 minutes. Artifacts in the RR intervals are detected by checking the difference between consecutive points. Following previous work [41], we labeled points that deviate by more than 20% from the local average of three consecutive beats as artifacts and the other points as validated RR intervals. The points labeled as artifacts are deleted. After the removal, we implemented two different techniques. The first is to interpolate the missing data points using a cubic spline interpolation algorithm, which is commonly used [17]. The second is to apply minimum consecutive time and sample constraints that the remaining data must satisfy to be regarded as meaningful. For example, if the minimum sample constraint is set to 5, we do not count three consecutive samples followed by a missing data point, because the sequence is too short to be evaluated. In this study, we applied the former technique because it achieved better results [9]. The diagram of the preprocessing technique is shown in Figure 4.
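The preprocessing steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the "local average of three consecutive beats" is interpreted as the mean of the three most recently accepted RR intervals, and deleted beats are refilled by a cubic spline over the beat index using SciPy's `CubicSpline`. All function names are hypothetical. The returned data quality is the share of non-interpolated beats, which matches the definition used in Section VI.

```python
import numpy as np
from scipy.interpolate import CubicSpline

WINDOW_S = 120.0     # non-overlapping window length from the text (2 minutes)
ARTIFACT_TOL = 0.20  # 20% deviation threshold from the text
LOCAL_BEATS = 3      # beats in the local average (interpretation is an assumption)

def segment_windows(rr_ms, window_s=WINDOW_S):
    """Split an RR series (ms) into complete, non-overlapping windows.

    A beat is assigned to the window containing its end time (cumulative sum
    of the intervals); the trailing partial window is dropped.
    """
    rr = np.asarray(rr_ms, dtype=float)
    if rr.size == 0:
        return []
    beat_t = np.cumsum(rr) / 1000.0
    win_idx = np.floor(beat_t / window_s).astype(int)
    n_full = int(beat_t[-1] // window_s)
    return [rr[win_idx == k] for k in range(n_full)]

def flag_artifacts(rr_ms, tol=ARTIFACT_TOL, k=LOCAL_BEATS):
    """True where an interval deviates by more than tol from the mean of the
    k most recently accepted intervals; the first k beats are accepted as-is."""
    rr = np.asarray(rr_ms, dtype=float)
    bad = np.zeros(rr.size, dtype=bool)
    accepted = []
    for i, x in enumerate(rr):
        if len(accepted) >= k:
            local_mean = np.mean(accepted[-k:])
            if abs(x - local_mean) > tol * local_mean:
                bad[i] = True
                continue  # flagged beats never enter the local average
        accepted.append(x)
    return bad

def clean_window(rr_ms):
    """Delete flagged beats, refill them with a cubic spline over the beat
    index, and return (cleaned RR, data quality)."""
    rr = np.asarray(rr_ms, dtype=float)
    bad = flag_artifacts(rr)
    idx = np.arange(rr.size)
    cleaned = rr.copy()
    if bad.any():
        spline = CubicSpline(idx[~bad], rr[~bad])
        cleaned[bad] = spline(idx[bad])
    return cleaned, 1.0 - bad.mean()
```

One consequence of this design: because each beat is compared with recently accepted beats, a genuine, abrupt heart rate change of more than 20% is also flagged and smoothed over, so the threshold trades robustness to artifacts against sensitivity to real transitions.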

C. FEATURE EXTRACTION
We extracted time-domain and frequency-domain heart rate variability features from the segmented time windows using Marcus Vollmer's toolbox [42], which is implemented in MATLAB. We selected features commonly used in previous works related to heart rate variability [9], [41], [43]. Because RR intervals are unevenly sampled, we resampled them at 4 Hz using cubic spline interpolation before computing the frequency-domain features. We then applied the FFT (O(n log n)) to the interpolated windows. The FFT efficiently computes the discrete Fourier transform (DFT), which decomposes a sequence of values into components of different frequencies. The computed features are shown in Table 2; a total of 11 features are extracted for each window. For further details on HRV features, we refer readers to [16]. These features are commonly used in HRV-based applications in many domains [6], [19], [44]-[47].
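A minimal Python sketch of the frequency-domain path described above: resample the uneven RR series at 4 Hz with a cubic spline, remove the mean, apply the FFT, and integrate a periodogram over frequency bands. The LF (0.04 to 0.15 Hz) and HF (0.15 to 0.40 Hz) bands are the standard short-term HRV bands and are assumed here for illustration; the exact frequency-domain features in Table 2 and the spectral estimator of the toolbox may differ. The function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

FS = 4.0  # resampling frequency (Hz) from the text
# Standard short-term HRV bands, assumed here for illustration.
BANDS = {"LF": (0.04, 0.15), "HF": (0.15, 0.40)}

def frequency_domain_hrv(rr_ms, fs=FS):
    rr = np.asarray(rr_ms, dtype=float)
    beat_t = np.cumsum(rr) / 1000.0                    # uneven beat times (s)
    t_even = np.arange(beat_t[0], beat_t[-1], 1.0 / fs)
    x = CubicSpline(beat_t, rr)(t_even)                # evenly sampled tachogram (ms)
    x -= x.mean()                                      # remove the DC component
    spectrum = np.fft.rfft(x)                          # O(n log n)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = 2.0 * np.abs(spectrum) ** 2 / (fs * x.size)  # one-sided periodogram, ms^2/Hz
    df = freqs[1] - freqs[0]
    feats = {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
             for name, (lo, hi) in BANDS.items()}      # band power, ms^2
    feats["LF/HF"] = feats["LF"] / feats["HF"]
    return feats
```

A tachogram whose RR intervals oscillate at about 0.25 Hz (a typical breathing rate) should concentrate its power in the HF band, while a 0.1 Hz oscillation should land in LF.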

D. FEATURE SELECTION
We applied correlation-based feature selection (CBFS) as implemented in the Weka toolkit [48]. The importance of the features is shown in Figure 7. We conducted experiments with 1 to 11 features and report the best results obtained from feature selection; the best results were achieved with all 11 features.
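Correlation-based feature selection scores a subset of k features by its merit, k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so subsets whose features predict the class but are not redundant with each other score highest. The Python sketch below is a simplified stand-in, not Weka's implementation: it uses absolute Pearson correlations and a greedy forward search (both assumptions), and it returns a ranking so that the top-m subsets for m = 1 to 11 can each be evaluated, mirroring the experiments described above. Function names are hypothetical.

```python
import numpy as np

def _abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

def cfs_merit(X, y, subset):
    """Merit of a feature subset using mean absolute Pearson correlations."""
    k = len(subset)
    r_cf = np.mean([_abs_corr(X[:, j], y) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([_abs_corr(X[:, a], X[:, b])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs_ranking(X, y):
    """Forward selection: repeatedly add the feature that maximises the merit.

    Returns (feature index, merit of the subset after adding it) pairs in order.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, ranking = [], []
    remaining = list(range(X.shape[1]))
    while remaining:
        merits = {j: cfs_merit(X, y, selected + [j]) for j in remaining}
        best = max(merits, key=merits.get)
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, merits[best]))
    return ranking
```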

E. HANDLING CLASS IMBALANCE
Some windows were deleted due to improper placement of the devices or heavy movement, so the number of windows differs between participants, which creates a class imbalance. We applied majority-class subsampling to equalize the number of windows for each participant; this is the most commonly used method in the literature [49].
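In this study the classes are participants, so majority-class subsampling amounts to randomly reducing every participant to the window count of the participant with the fewest windows. The Python sketch below shows this with the standard library only; the function name is hypothetical and the fixed seed is an assumption added for reproducibility.

```python
import random
from collections import defaultdict

def subsample_majority(windows, labels, seed=0):
    """Randomly drop windows so every label keeps as many as the smallest class."""
    groups = defaultdict(list)
    for w, lab in zip(windows, labels):
        groups[lab].append(w)
    n_keep = min(len(ws) for ws in groups.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    out_windows, out_labels = [], []
    for lab in sorted(groups):
        for w in rng.sample(groups[lab], n_keep):
            out_windows.append(w)
            out_labels.append(lab)
    return out_windows, out_labels
```

For example, with five windows for participant "p1" and three for "p2", the result keeps three windows of each.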

F. MACHINE LEARNING
We used the following machine learning classifiers:
• k-Nearest Neighbour (kNN) is a lazy, instance-based classifier: it computes the distances from an instance to the training instances in the feature space and assigns class membership by plurality voting among the k nearest neighbours [50].
• Random Forest (RF) constructs a multitude of decision trees at training time and outputs the mode of the classes predicted by the individual trees [51].
• Multi-Layer Perceptron (MLP) is a neural network classifier that learns to classify instances using backpropagation [52].
• Logistic regression is a classification algorithm used to assign observations to a discrete set of classes [52].
• Naive Bayes is a classifier based on Bayes' theorem. It predicts membership probabilities for each class, such as the probability that a given record or data point belongs to a particular class. The class with the highest probability is considered the most likely class [53].
We used the implementations of these classifiers available in the Weka machine learning software [48].
We fine-tuned the parameters of the different classifiers. The best-performing parameters are as follows: k = 3 for kNN, 100 trees for the random forest, and one hidden layer with 5 hidden units for the MLP, as shown in Figure 8. We created a binary authentication model for each user, in which the selected user's label is set to 1 and all other users' labels are set to 0.
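The per-user binary setup and the kNN configuration (k = 3) can be sketched with NumPy as below. This is an illustrative stand-in for Weka's kNN with hypothetical function names. It uses plain Euclidean distance, so in practice the 11 features should be standardized first; otherwise features with large numeric ranges dominate the distance.

```python
import numpy as np

def one_vs_all_labels(user_ids, target_user):
    """Binary labels for one user's model: 1 for the target user's windows, 0 otherwise."""
    return (np.asarray(user_ids) == target_user).astype(int)

def knn_genuine_score(X_train, y_train, X_test, k=3):
    """Share of the k nearest training windows (Euclidean distance) labelled 1.

    Plurality voting accepts when the score exceeds 0.5; the raw score can
    also be thresholded to trace FAR/FRR curves.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)
```

With k = 3 the score takes only the values 0, 1/3, 2/3, and 1, so an accept decision by plurality vote requires at least two of the three neighbours to belong to the target user.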
We evaluated our system and fine-tuned the parameters using 10-fold stratified cross-validation, in which each fold preserves the class label distribution of the whole dataset: in each iteration, 90% of the dataset is used for training and the remaining 10% for testing, and the test fold is rotated. To evaluate the effect of a more challenging train/test split, we further divided the dataset into 70% training and 30% test sets.
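Stratified folds can be formed by shuffling the indices of each class and dealing them round-robin across the folds, so every fold keeps roughly the overall class proportions; each fold then serves once as the 10% test set. This is an illustrative Python sketch with hypothetical function names, not Weka's implementation.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Return n_folds lists of indices whose class proportions match the whole set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % n_folds].append(i)
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return folds

def train_test_splits(labels, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for f, fold in enumerate(folds) if f != k for i in fold)
        yield train, test
```

For a user model with 20 genuine and 80 impostor windows, every fold receives 2 genuine and 8 impostor windows.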

G. EVALUATION METRICS
To present the results of our authentication system, we use performance metrics common in the literature [8], [14], [20], [21], [36]. Authentication systems make two types of error, measured by the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). Both depend on the decision threshold applied to the classifier output, which lies between 0 and 1 for ML classifiers. If an attempt is accepted when its score is at least the threshold, a higher threshold yields a lower FAR but a higher FRR, and a lower threshold has the opposite effect. The operating point at which the two error rates are equal is important for such a system and is called the Equal Error Rate (EER). The definitions are as follows:
• False Acceptance Rate (FAR): the ratio of falsely accepted impostor attempts to the total number of impostor attempts.
• False Rejection Rate (FRR): the ratio of denied legitimate attempts to the total number of legitimate attempts.
• Equal Error Rate (EER): the common value of FAR and FRR at the threshold where they are equal [54].
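These definitions translate into a threshold sweep. The Python sketch below (hypothetical function names) accepts an attempt when the genuine-class score is at least the threshold, computes FAR and FRR as defined above, and approximates the EER by averaging FAR and FRR at the candidate threshold where they are closest. Interpolating between neighbouring thresholds is a common refinement that is not shown.

```python
import numpy as np

def far_frr(scores, labels, threshold):
    """FAR and FRR when an attempt is accepted iff its score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    accepted = scores >= threshold
    far = accepted[labels == 0].mean()     # impostor attempts accepted
    frr = (~accepted[labels == 1]).mean()  # legitimate attempts rejected
    return float(far), float(frr)

def equal_error_rate(scores, labels):
    """Approximate EER: try every observed score as a threshold (plus one that
    rejects everything) and average FAR and FRR where they are closest."""
    candidates = np.append(np.unique(np.asarray(scores, dtype=float)), np.inf)
    best_gap, best_eer, best_t = np.inf, None, None
    for t in candidates:
        far, frr = far_frr(scores, labels, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, float(t)
    return best_eer, best_t
```

Applied to the scores of one user's binary model, this yields that user's EER; averaging over users gives a system-level figure.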

V. DATA COLLECTION
In this section, we describe the real-life data collection and the ethics procedure. We collected physiological data from 28 people in controlled real-life settings during a summer school for teachers. All participants are healthy teachers with no known medical conditions. Before the data collection, the subjects received and filled in a consent form. The participants were 16 males and 12 females aged between 25 and 35. The data collection procedure is shown in Figure 9. The total duration of the data collection is 110 minutes, consisting of a baseline (20 minutes), a lecture (40 minutes), free time (10 minutes), an examination (20 minutes), and a recovery session (20 minutes). We did not use the free-time session because it might bias the results. These different sessions were designed to create a daily-life sequence: a system should take different emotional states into consideration, because HRV can be affected by valence and arousal. During the free time, participants were allowed to take a break from the lecture. We applied our implementation of the Trier Social Stress Test (TSST) [55], which is frequently used to induce stress. We selected questions from mathematics Olympiads, which are very hard for the general population. We told the subjects that this was a test measuring their intelligence and that an average person scores at least 75%. The subjects participated in every session and did not know the objective of the study. The physiological data were gathered with different brands of commercial smartwatches (8 Empatica E4, 3 Samsung Gear S, and 17 Samsung Gear S2). The total size of the dataset is 1.17 GB. Each window is 2 minutes long, and a 110-minute session was recorded for each participant. The number of RR intervals varies across participants, with an average of 7840.
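As a quick consistency check on the protocol above, the short Python computation below derives the number of non-overlapping 2-minute windows available per participant once the free-time session is excluded. It assumes each session is segmented independently into whole windows; the result is an upper bound, since windows affected by improper placement or heavy movement are removed later.

```python
SESSION_MINUTES = {"baseline": 20, "lecture": 40, "free-time": 10,
                   "examination": 20, "recovery": 20}
WINDOW_MINUTES = 2
N_PARTICIPANTS = 28

total_minutes = sum(SESSION_MINUTES.values())
used = {name: m for name, m in SESSION_MINUTES.items() if name != "free-time"}
# Assumes each session is segmented on its own into whole 2-minute windows.
windows_per_participant = sum(m // WINDOW_MINUTES for m in used.values())
max_windows = windows_per_participant * N_PARTICIPANTS

print(total_minutes, windows_per_participant, max_windows)  # 110 50 1400
```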
Ethics: The procedure of the methodology used in this study was approved by the Institutional Review Board for Research with Human Subjects of Boğaziçi University (approval number 2018/16). Prior to data acquisition, each participant received a consent form explaining the experimental procedure and its benefits and implications for both society and the subject. The procedure was also explained orally to the subject. The data collection procedure and all interventions in this research fully comply with the 1964 Declaration of Helsinki [56]. The data are stored anonymously.

VI. EXPERIMENTAL RESULTS AND DISCUSSION
We present the results in two subsections. In the first, we present and evaluate the authentication results of the different devices and the performance of the whole system. In the second, we improve the performance of the system by applying a signal quality filter.

A. EFFECT OF DEVICE TYPE ON THE BIOMETRIC AUTHENTICATION PERFORMANCE
EER results for all 28 subjects are given in Table 3. These results are calculated with one-vs-all tests for all subjects. Average EER results for four different classifiers are presented in Table 4, to which we also added the device type and average data quality columns. Data quality denotes the percentage of the data that is not interpolated after artifact removal; for example, if the average data quality is 70%, the remaining 30% of the data is interpolated. Data quality, along with the device type, strongly affects the EER results (see Figure 10). We achieved the best performance with the Gear S, with 98.48% and a 3.96% EER, which might be partly due to its small sample size (three participants). The choice of classifier also has an important effect on the EER results; for example, the Empatica E4 achieves a 19.43% EER with kNN and 6.77% with the RF classifier. We selected RF as the best classifier in terms of EER. The design of the watch strap (see Figure 1), the PPG sensor quality, and the built-in processing algorithms of the devices might explain the differences in EER results.

B. EFFECT OF DATA QUALITY CONSTRAINT FILTER
In daily life, seamless wrist-worn devices can record noisy signals, which degrade the quality of the derived features. It is not possible to collect high-quality data throughout the day for various reasons, such as high activity levels and improper use of smartwatches. After observing that data quality has a major effect on authentication performance, we applied a data quality constraint to our data. Suppose that the data quality of a device is 50%. This means that the other half of the data is synthesized by cubic spline interpolation. Therefore, we expect such data to be hard to discriminate from other participants' data, because it has lost most of the unique characteristics of the PPG data. In Figure 10, we evaluate the effect of a quality threshold on the EER. Comparing device types, the Samsung Gear S gives the smallest error rate (3.796%) at low quality thresholds, compared with the Empatica E4 (6.77%) and the Samsung Gear S2 (13.66%). As the quality threshold increases, the error rates of the Samsung Gear S and the Empatica E4 decrease to 2.67% and 4.4%, respectively, at the 95% threshold, whereas the Samsung Gear S2 does not show the same improvement and eventually reaches an 18.557% EER. None of the Samsung Gear S2 windows has a quality higher than 95%. The performance evaluation shows that the proposed system can authenticate effectively with a small and consistent error rate, which makes it reliable.
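The quality constraint can be sketched as follows, assuming each window carries a boolean mask marking which RR samples were interpolated; the function names are hypothetical. Windows below the threshold are dropped before classification, which is why stricter thresholds leave fewer windows, and none at all for the Gear S2 above 95%.

```python
import numpy as np

def window_quality(interpolated):
    """Data quality of a window: share of RR samples that were NOT interpolated."""
    return 1.0 - float(np.mean(np.asarray(interpolated, dtype=bool)))

def apply_quality_threshold(windows, interp_masks, threshold):
    """Keep only windows whose data quality reaches the threshold (e.g. 0.80 or 0.95)."""
    return [w for w, m in zip(windows, interp_masks) if window_quality(m) >= threshold]

def windows_surviving(interp_masks, thresholds):
    """How many windows remain at each candidate threshold."""
    q = np.array([window_quality(m) for m in interp_masks])
    return {t: int(np.sum(q >= t)) for t in thresholds}
```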

VII. CONCLUSION
We proposed a scalable, unobtrusive, and seamless continuous authentication system with commercial-grade smartwatches and smartbands, and we showed that HRV gathered from commercial-grade smartwatches is a strong candidate for implicit continuous authentication. We collected physiological data from 28 participants and reported the EER for each participant in a real-life scenario. We proposed state-of-the-art preprocessing for real-life signals containing artifacts due to the physical construction of the smartwatches. We achieved promising results with our system (4.4% EER with the Empatica E4) and showed the effect of different smartwatches. The selection of the classifier is very important for the proposed system; we applied feature-based signal processing along with machine learning classifiers (kNN, RF, MLP, Logistic Regression, and Naive Bayes). Even though the Gear S2 is a newer model than the Gear S, due to its leather strap, the signals coming from its heart rate monitoring unit contained a higher amount of artifacts, which degraded the overall quality of the RR intervals and the performance of the authentication system. For authentication systems based on PPG sensors, sport straps (see Figure 1) can be a better choice. We showed that HRV can be used for continuous authentication without interrupting the activity of the user. We applied a signal removal procedure based on the overall RR interval quality measure; above an 80% quality threshold, higher quality leads to better performance. The performance of the scheme varies between individuals, which is consistent with the literature [8], [14]. The minimum amount of recording required to apply our system for authentication is 2 minutes; once that is satisfied, authentication can be validated in seconds thanks to the sliding window approach. The system logs the user out once he/she takes off the watch.
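The sliding-window behaviour mentioned above can be sketched as a buffer that always holds the most recent 120 seconds of RR intervals: once the first 2 minutes have been recorded, every new beat yields a fresh window to re-score, so decisions can update roughly once per beat. This Python sketch assumes beat times are reconstructed by summing RR intervals; the class name and its methods are hypothetical, and off-wrist detection is not modelled.

```python
from collections import deque

class SlidingRRWindow:
    """Hypothetical helper holding the most recent `span_s` seconds of RR intervals."""

    def __init__(self, span_s=120.0):
        self.span_s = span_s
        self.elapsed_s = 0.0
        self.buffer = deque()  # (beat end time in s, RR interval in ms)

    def push(self, rr_ms):
        """Add one beat; return True once a full window is available for scoring."""
        self.elapsed_s += rr_ms / 1000.0
        self.buffer.append((self.elapsed_s, rr_ms))
        # Drop beats that ended at or before the start of the current window.
        while self.buffer and self.buffer[0][0] <= self.elapsed_s - self.span_s:
            self.buffer.popleft()
        return self.elapsed_s >= self.span_s

    def window(self):
        """RR intervals (ms) currently inside the window, oldest first."""
        return [rr for _, rr in self.buffer]
```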
Our system can be implemented on any wrist-worn device which can provide RR intervals without a need for the raw PPG. The proposed methodology can be used with various applications requiring continuous authentication.
This study also has some limitations. The performance of the system on data collected on different days is still unknown. A larger sample size is required for a better accuracy assessment. The number of samples from the different device brands is not the same. The perception of the stimuli may affect the HRV metrics; therefore, this system should be examined under different conditions in daily life. We halved the recording time required by the work of Vhaduri and Poellabauer [14]; however, the required 2 minutes might still create problems in terms of user-friendliness and usability.
As future work, we plan to apply our system in completely in-the-wild settings with more participants and longer physiological recordings and to report the performance of the framework. All of the evaluations were done in the same context; in other types of contexts, the system might achieve better performance. Features from other domains, for example bispectrum or time-frequency features, might be useful [57], [58]. Wearable sensors create new opportunities for authentication systems, and other physiological signals that are easy to acquire can also be examined.
DENIZ EKIZ received the M.S. degree from the Department of Computer Engineering, Boğaziçi University, Turkey, in 2019, where he is currently pursuing the Ph.D. degree. His research is focused on the health-related applications of wearable technology.
YEKTA SAID CAN studied Computer Engineering in the Faculty of Engineering, Boğaziçi University, İstanbul, Turkey. He received the B.Sc. and M.Sc. degrees from Boğaziçi University in 2012 and 2014, respectively, while working as a Researcher at TUBITAK BILGEM for two years, and the Ph.D. degree from the Department of Computer Engineering, Boğaziçi University, in 2020. He also worked there as a Teaching Assistant for six years. He is currently a Postdoctoral Researcher at Koç University, working on retrieving information from Ottoman population registers by applying computer vision methods. His research interests include watermarking, speech and speaker recognition, physiological signal processing, and machine learning.
YAGMUR CEREN DARDAGAN is currently a Senior Student with the Department of Computer Engineering, Boğaziçi University, Turkey. Her research interests include physiological signal processing, machine learning, and pervasive health applications.
CEM ERSOY (Senior Member, IEEE) received the Ph.D. degree from Polytechnic University, New York, in 1992. He was a Research and Development Engineer with NETAS A. S. from 1984 to 1986. He is currently a Professor of computer engineering with Boğaziçi University, Turkey. He is also the Vice Director of the Telecommunications and Informatics Technologies Research Center, TETAM. His research interests include wireless/cellular/ad-hoc/sensor networks, activity recognition, ambient intelligence for pervasive health applications, green 5G and beyond networks, and mobile cloud/edge/fog computing. He is also a member of the IFIP. He was the Chairman of the IEEE Communications Society Turkish Chapter for eight years.