Multimodal EEG and Keystroke Dynamics Based Biometric System Using Machine Learning Algorithms

Electroencephalography (EEG) based biometric systems are gaining attention for their anti-spoofing capability but lack accuracy due to signal variability at different psychological and physiological conditions. On the other hand, keystroke dynamics-based systems achieve very high accuracy but have low anti-spoofing capability. To address these issues, a novel multimodal biometric system combining EEG and keystroke dynamics is proposed in this paper. A dataset was created by acquiring both keystroke dynamics and EEG signals simultaneously from 10 users. Each user participated in 500 trials at 10 different sessions (days) to replicate real-life signal variability. A machine learning classification pipeline is developed using multi-domain feature extraction (time, frequency, time-frequency), feature selection (Gini impurity), classifier design, and score level fusion. Different classifiers were trained, validated, and tested for two different classification experiments – personalized and generalized. For identification and authentication, 99.9% and 99.6% accuracies are achieved, respectively for the Random Forest classifier in 5 fold cross-validation. These results outperform the individual modalities with a significant margin (~5%). We also developed a binary template matching-based algorithm, which gives 93.64% accuracy 6X faster. The proposed method can be considered secure and reliable for any kind of biometric identification and authentication.


I. INTRODUCTION
Numerous advancements in technology have enabled realtime applications through remote access in banking, healthcare, military, retail, enterprise, law enforcement, and many other sectors.However, in such sensitive applications, security is a vital aspect, which usually comes at the cost of convenience.Over the years, many types of authentication tokens have been established, such as a shared secret token, look-up token, out-of-bound token, event-based one-time password (OTP) token, cryptographic token, biometric token, and hybrid/multi-modal token [1].Biometric systems play an important role in the identification and authentication infrastructure.Biometric systems utilize the unique physiological and behavioral characteristics of an individual to generate authentication tokens.Nevertheless, there are several vulnerabilities in these methods due to the use of the same password for multiple accounts, brute force attacks, key logger attacks, etc.Even biometric systems are susceptible to spoofing attacks through fabrication or falsification.This is done by using photos for face recognition, recordings for voice recognition, lift tape for fingerprints, and high-resolution pictures for iris scanners, etc.
The biometric authentication systems that rely on Electroencephalogram (EEG) signals from the brain extract unique behavioral patterns to identify the user.EEG signals are obtained by recording electrical wave patterns at different parts of the brain, which is done by capturing the EEG signals from the brain using invasive or non-invasive techniques.The invasive technique, also considered as an intracranial EEG or Electrocorticography (ECoG) requires surgery to implant special needle electrodes inside the brain while the noninvasive technique simply requires placing small electrodes on the scalp [2].Such systems have many benefits: (i) they are robust against forgery and spoofing attacks as the signal acquisition is currently impossible through a remote interface and the signal depends on the state of the mind; (ii) the EEG signal has been shown to possess low variance within an individual and high variance among a group of individuals; (iii) it exists in all living people; (iv) different working states can be gathered which can be valuable for restricting authentication in the non-natural state; (v) EEG can be collected spontaneously from users without certain specific actions.However, authentication systems that solely rely on EEG signals achieve low accuracy and are prone to instability over time using multi-channel signals [3].On top of that, EEG has a low spatial resolution on the scalp, and a weak signal-tonoise ratio (SNR) [4].The emotional states at different sessions play a crucial role in EEG-based biometric systems.Arnau-González et al. [5] reported that identification accuracy can vary greatly due to different emotions.
Keystroke dynamics is a reliable and easy-to-implement biometric system that does not require additional hardware to acquire data.Keystroke dynamics analyses a user's typing patterns by recording strokes and gaps between keyboard inputs.The data are then matched with a learned pattern from the manner and rhythm of the authentic user.Keystroke dynamics produce distinctive features for each individual and can produce a very high identification rate.However, it is prone to spoofing attacks through falsification as it can be easily seen during typing and the typing pattern can be easily mimicked later.
In this paper, we propose a novel multi-modal authentication and identification system that combines EEG and keystroke dynamics with robust machine-learning algorithms.Unlike the previously proposed multimodal authentication systems, our model integrates EEG signals and keystroke dynamics into the most widely used PIN/password infrastructure.To the best of our knowledge, this is the first attempt to combine EEG with keystroke dynamics.The EEG data and keystroke dynamics were collected simultaneously while a user was typing a specific password.The EEG signals were acquired using a portable 5 channel Emotiv Insight EEG system, which is user-friendly, easy-to-setup, and a comparatively cheap EEG solution.The dataset was created with different age and gender groups to fairly evaluate the proposed multimodal system.Such a dataset is also a contribution of this research, as there are no publicly available datasets with these two modalities.Several machine learning techniques were implemented on the fused multimodal dataset, and the top 7 best-performing algorithms were evaluated.Furthermore, we also evaluated generalized and personalized models on the individual modality and the fused modalities.While the generalized models were a multi-class problem, the personalized models were trained as a two-class classifier for each user.Finally, a real-time framework with a graphical user interface (GUI) including a template matching approach was also developed.
The paper has three major contributions.Firstly, we collected a dataset combining EEG and keystroke at 10 different sessions from 10 subjects of different ages and genders.To the best of our knowledge, this is the first attempt to build this type of dataset.Secondly, we established a robust algorithmic pipeline that achieves very high accuracy for the combination of EEG and keystroke and this algorithm overcomes the limitation of individual modalities.Thirdly, we proposed a method to build efficient binary templates for fast and secured authentication in real-time.We tested our dataset and algorithm in different scenarios like closed set generalized and personalized classification for all the modalities (EEG, keystroke, and the combination of EEG and keystroke) with different feature sets and augmentation techniques.We also reported the Cumulative Matching Characteristic (CMC) curve for template matching in identification scenarios and the Equal Error Rate (EER) with Receiver Operating Characteristic (ROC) curve for authentication scenarios where our method achieved better performance for the fusion of EEG and keystroke than individual modality.
The rest of the paper is organized as follows.In Section II, the related works of this research domain are presented with comparative advantages and disadvantages.In Section III, the methodology for the proposed multimodal system, which includes experimental setup, data preparation, and processing are described.The experimental results are presented in Section IV and discussed in Section V. Finally, we leave some concluding remarks in Section VI.

II. RELATED WORKS
Several works have been done using only EEG for biometric authentication.Bai et al. [6] developed a system using the EEG signals related to visual evoked potential combined with different feature reduction techniques like Genetic Algorithm, Fisher Discriminant Ratio, and Recursive Feature Elimination and achieved an accuracy of 97.25% using Support Vector Machine (SVM).Phung et al. [7] used Shannon's Entropybased features and achieved an accuracy of 94.9%.Gui et al. [8] used Euclidean and Dynamic Time Warping (DTW) based matching technique and achieved an accuracy of 80% and 68% respectively.Zhang et al. [9] used single-channel EEG to identify 46 subjects using Rayleigh quotient feature selection and ensemble classifier with an accuracy of 95.48%.Yang et al. [10] proposed a method that utilizes the logarithm of wavelet coefficients and discrete cosine transform.Ruiz-Blondet et al. [11] utilized a discriminant function-based classifier on the event-related potential (ERP) of 50 users to achieve an accuracy of 100% in the identification scenario.Nakamura et al. [12] used a novel user-friendly EEG biometric system that can be placed inside the ear and used combined power spectral density (PSD) and autoregressive coefficients (AR) with linear discriminant analysis (LDA) to achieve an accuracy of 87.2%.Fraschini et al. [13] used functional connectivity of the brain to identify persons.Moctezuma et al. [14] designed an ERP-based identification system where they proposed a four objective optimization method and achieved 97.02% accuracy.Sabeti et al. [15] compared different feature sets and showed that correlation and spectral coherence-based features produce the best results with an accuracy of 97.36% and 97.08% respectively, for resting-state EEG in the verification scenario.However, most of the works discussed above used EEG systems that require a large number of channels like 32, 64, or 128, which is bulky, requires an hour or more to set up, and computationally expensive.
Several studies used keystroke dynamics to improve the verification and authentication process.Wang et al. [16] proposed a method named differential evolution and adversarial noise-based user authentication (DEANUA) for reducing equal error rate (EER) in smartphone-based keystroke biometrics.They extracted 146 features and selected the best features from that set using differential evolution.They also tried to increase the robustness of their system by synthesizing adversarial noise samples.Saevanee and Bhatarakosol [17] proposed a user authentication system that uses key-hold time, inter-key duration, and finger pressure as features and achieved 1% EER using finger pressure.Giot et al. [18] proposed a keystroke dynamics-based user authentication method that uses the hold time of a key, interkey time as features.
Several researchers have explored the multi-modal authentication system by combining EEG and other biometrics to add an extra layer of security.Saini et al. [19] and Kumar et al. [20] combined users' signatures with the EEG signals generated during the signature for identifying the user.They used machine learning techniques to extract features from the handwritten signature and the simultaneously recorded EEG signals.Using their multimodal framework, they have achieved higher authentication accuracy compared to that of individual biometric systems.Wang et al. [21] proposed an authentication system where they combined event-related EEG signals with a facial recognition algorithm.They have extracted features from EEG and facial data independently and then used cross-correlation to fuse the modality and produce a combined identification system.The authors suggested that by implementing deep neural networks, they could improve the accuracy of the multimodal system.Zhang et al. [22] developed a multimodal authentication system based on a recurrent neural network (RNN), called DeepKey, where they used EEG and gait signals in parallel to identify the user.They achieved a low false rejection rate (FRR) and false acceptance rate (FAR) of 0% and 1% respectively.Krishna et al. [23] incorporated EEG signal acquisition along with eye movement tracking in an augmented reality (AR) and virtual reality (VR) headset for user authentication.Bashar [24] explored the combination of brain signals (EEG) and heart signals (ECG) for authentication using wavelet domain statistical features with multiple classifiers and achieved a 90.5% F1 score.Ibtehaz et al. [25] proposed a framework called EDITH, which performs competitively using just a single ECG heartbeat (96-99.75%accuracy) and can be further enhanced by fusing multiple beats (100% accuracy from 3 to 6 beats).

III. METHODOLOGY
The proposed multimodal authentication system incorporated the user's keystroke dynamics and brain signals as illustrated in Fig. 1.The system setup and data acquisition are described below:

A. SYSTEM SETUP AND DATA ACQUISITION
This study was conducted using a personal computer for keystroke dynamics acquisition and Emotiv INSIGHT headset with 5 sensors + 2 references, a Bluetooth-based wireless headset [26] for EEG signal acquisition.A specific password was asked to be typed in the keyboard for each user repetitively 50 times to capture keystroke dynamics, and EEG (brain) signals are captured using the Emotiv INSIGHT device non-invasively.The letter in the user input was used as the start and end markers for the pre-processing and classification of EEG signal can be carried out on each password-typing event.
Then the captured EEG signal is sampled at 128 Hz using Xavier Test Bench software from Emotiv that displays realtime EEG signal, sensors' contact impedance, and battery level.The users were instructed to wear the headset on their head while keeping their eyes open and active and not to be involved in any other cognitive tasks.In this headset, EEG electrodes are oriented based on an international 10-20 system.The entire recording session is monitored where the contact impedance is greater than 1 kΩ, but less than 10 kΩ [27] that ensures similar signal throughput from all channels.EEG signals are recorded from temporal, parietal, frontal, and occipital (AF3, AF4, T7, T8, PZ) lobes of the user to ensure that the system is capturing the different important lobes of the human brain.Keystroke dynamics and EEG data were collected from 10 users for 10 sessions (days) and 50 trials per session.The user was typing a specific password "qu-ELEC371" 50 times while the EEG and keystroke dynamics were recorded.Therefore, the EEG dataset consists of 5000 samples, where each user contributes 500 samples.Since the data were acquired on 10 different days and therefore, involuntary motion artifacts, random noises, and other subjects' psychological and physiological variability were present in the different sessions.It was noticed that in the different sessions, severe motion artifacts and noises were present in the dataset.In summary, it can be said that it is a well-representation of real-life data.
For the keystroke dynamics, the raw data consisted of a time-trail of keystrokes for the characters of the password.An in-house written code was used to captures different time points of keystrokes in-synchronization to the EEG signal acquisition and extracted the required features (durations) from the timestamps.The study was approved by the ethical board of Qatar University (QU-IRB) and written consent was obtained from participants to join in the research.There were five male and five female subjects (mean age: 25 years, standard deviation: 20 years) who participated in this study.

B. DATA PREPROCESSING
EEG signal is inherently very weak in amplitude and despite different efforts to keep the acquisition environment quiet and the EEG signal motion artifacts minimum, some noises and signal deviation were observed in the recorded EEG signal.Therefore, the raw EEG signal was undergone several pre-processing steps to mitigate such errors.The signal processing steps are outlined below: Baseline correction: The baseline drift introduces unwanted amplitude shifts in the signal and this leads to poor performance in the classification model.A 6-degree polynomial curve fitting algorithm was used to approximate the baseline and it is subtracted from the raw signal to correct the baseline.
Filtering: Bandpass filtering of the EEG signal is very important to ensure the band-limited EEG signals are used for further analysis.The bandwidth of the Emotiv Insight EEG system is 0.5-63 Hz and therefore, a band-pass infinite impulse response (IIR) filter with a cutoff frequency of 0.5 Hz and 63 Hz was used for removing any potential unwanted noises.
Segmentation: As mentioned earlier, the starting and ending time-stamp of a password entry were marked in the EEG data.In-house-built MATLAB code was used to automatically identify the start and end markers and extract the EEG signal segment corresponding to a password entry.
Resampling: Different users take a different amount of time to type a full password and also typing duration can vary from trial to trial.Therefore, segmented EEG signal length can vary for different trials.However, the feature extraction process and machine learning models require equal sample EEG length for any trial independent of session or subjects.Therefore, all the signal samples are resampled to 1024 samples.This signal length is determined by observing the mean signal length in the entire dataset.

C. FEATURE EXTRACTION
Since the EEG signals and keystroke data were collected for 10 subjects while typing the password 50 times in 10 different sessions, it was possible to mimic the inter-session variability for the same subject while providing enough training data for machine learning models.For efficient training of classifiers and to extract meaningful information from raw data, feature extraction is necessary.Multimodal features (from keystroke dynamics and EEG signals) were extracted and used to train different machine learning models to identify the user in generalized and personalized experiments.

1) KEYSTROKE DYNAMICS
Four different features were calculated from the keystroke timestamps: key-hold time (H), key-down to key-down time (DD), key-up to key-up time (UU), and key-up to key-down time (UD) for each keypress, as shown in Fig. 2. The unique password used for all users was "qu-ELEC371" to show that even though everyone will type the same password, the machine learning model can still authenticate the correct user.This resulted in a total of 500 samples per user over multiple sessions.Each sample contains data for 12 keypresses including two caps lock (⇧) (one for activating capital letter mode and another for deactivating).Therefore, for each user, there are 45 features (11 features for UU, DD, and UD each, and 12 features for H).Fig. 3 illustrates and compares the key-hold time (H) features for two random subjects.As seen from Fig. 3, the H features may be stable for one subject (Subject B), while the H time for each keypress may vary for another user (Subject A).It was also noted that the special character '-' had large standard deviations for most users.This is most probably because it is

FIGURE 2. Features of keystroke dynamics, representing key-hold time (H), key-up to key-up time (UU), key-down to key-down time (DD), and key-up to key-down time (UD). FIGURE 3. Comparison of key-hold time (H) for two random subjects.
a less frequently used character on the keyboard.The typing speed and typing rhythm of each user resulted in a unique set of features for each subject.

2) EEG
For EEG signal, 13-time domain, 10 frequency domain, and 18 time-frequency features based on the feature reported in the literature were calculated for each channel and a feature called Signal Magnitude Area Total (SMAT) was calculated combining five channels.Therefore, the total number of features from the EEG signal is 206, where five times 41 plus one SMAT feature.The range of values in different channels of raw EEG signals can vary greatly due to several reasons.So, to bring all the features tothe same range, all the features were normalized to a value between 0 and 1.All the features with their definitions and relevant formulas are shown below in TABLE I.

Mean
Mean is the sum of all data divided by the number of entries.
is the i th data sample and n is the total number of samples.

Median
The median is in the middle data of the ordered set of data.

Standard Deviation
Standard deviation is the measure of variability and consistency of the sample.

Mean Absolute Deviation (MAD)
Mean Absolute Deviation (MAD) is the average distance between the mean and each data value.
25th Percentile 25th Percentile is the data value at which the percent of the value in the data set is less than or equal to 25.
75th Percentile 75th Percentile is the data value at which the percent of the value in the data set is less than or equal to 75.

Inter Quartile Range (IQR)
Inter Quartile Range (IQR) is the measure of the middle 50% of a data set.
Here, Q1, Q3 are the first and third quartiles, respectively.

Skewness
Skewness is the measure of the lack of symmetry from the mean of the dataset.

Kurtosis
Kurtosis is the pointedness of a peak in the distribution curve, in other words, it is the measure of the sharpness of the peak of the distribution curve.

Hjorth Activity
Hjorth Activity represents the variance of the amplitude of the signal.It is part of Hjorth Parameters introduced by [28].
It can indicate signal power.
Here, () is the the time-varying amplitude of the signal.
Hjorth Mobility Hjorth Mobility represents the mobility of the signal, which can indicate mean frequency.From the power spectrum, the

D. FEATURE SELECTION
To avoid overfitting and to get rid of irrelevant and superfluous features, feature selection is necessary.Therefore, the correlation coefficients between the features were calculated and highly correlated features (rxy > 0.95) were removed from any correlated pair.After removing highly correlated features, the remaining features were ranked according to the importance score obtained from the feature ranking model.Random Forest is an ensemble of decision trees, which was used for the feature ranking.It calculates feature ranking using "Gini impurity" or "mean decrease impurity" [29].The importance scores of a feature at all the nodes of a decision tree are calculated and then averaged over all trees.In a decision tree with child nodes, while partitioning at node m, the decrease in impurity at node m can be calculated by the following equation, Here, ∆  is the decrease of impurity at node m,   is the impurity value at node m,   is the weighted number of samples at node j,   is the impurity value at node j, and a is the total number of child nodes.Impurity at node m is proportion of standard deviation of power can be also inferred from this feature.It is also part of Hjorth Parameters.
Here, x'() is the 1 st derivative of the amplitude of the signal.

Hjorth Complexity
Hjorth Complexity represents the complexity, which can specify the change of the signal frequency.It is also part of the Hjorth Parameters.

Shannon's Entropy
Shannon's Entropy measures the degree of randomness in a set of data and higher entropy indicates greater randomness, and lower entropy indicates lower randomness.

Spectral Entropy (SEN)
Spectral Entropy (SEN) is the normalized Shannon entropy that is applied to the power spectrum density of the signal.
Here,   = the spectral power of the normalized frequency, N = the number of frequencies in binary.

Second Max Frequency (M2F)
The second Maximum Frequency (M2F) is the frequency at which the 2 nd maximum amplitude of the frequency spectrum occurs.

Second Max Frequency Amplitude
The second Maximum Frequency Amplitude is the 2 nd maximum amplitude of the frequency spectrum.

Second Max Relative Energy
Second Max Relative Energy is the summation of energy between M2F+ to M2F-.Then the summation was normalized by total energy below the cutoff.Here,  has a constant value of 5 Hz.

Average Band power
The average band powers of alpha, beta, gamma, theta, delta, and raw EEG signal.For this purpose, at first raw EEG signal is decomposed into alpha, beta, gamma, theta, delta bands, and then 6 band powers of EEG bands including the band power of raw EEG were computed as features.

Wavelet Features
Discrete wavelet decomposition was used to decompose the raw EEG signal into 18 frequency bands.Then average band powers of all of those bands were calculated.Here, order 8 Daubechies wavelet is used as a mother wavelet.14) Here, ψ() is the mother wavelet that is shifted at j and scaled at k and () is the original signal.calculated by "Gini Impurity Index" which is defined by the following formula, Here,  is the total number of classes, () is the probability of selecting a data-point of class q.Then the importance of a feature xk is calculated by averaging the decrease of impurity at all nodes over all trees as shown in the following formula, Here, ∆  is the decrease of impurity at node m,   is the weighted number of samples at the m th node, N is the total number of nodes where the feature   appears, and T is the total number of trees.
After ranking the features according to their importance, the performances of the classifiers were evaluated with respect to the number of features, and the performance saturation and over-fitting were observed to identify the best feature combinations.The same procedure was followed for the keystroke data.

E. IDENTIFICATION AND AUTHENTICATION 1) IDENTIFICATION BY CLASSIFICATION
In the identification setup, we classified 10 subjects using EEG, Keystroke, and the combination of EEG and Keystroke data.This type of classification is also known as generalized classification.We validated and tested several machine learning classifiers for identifying 10 subjects.A 5 fold random stratified cross-validation experiment was conducted to validate the proposed method.In this experiment, the data were split into 5 folds and the models were trained, validated, and tested 5 times.During each training time, the proportion of train and test set was 80:20.20% of the training set was used for validation.In this manner, the train, validation, and test sets are changed alternatively 5 times.The reported performance metrics are average accuracy (A), precision (P), sensitivity/recall (R), and F1 score (F1) of the test set.

2) AUTHENTICATION BY CLASSIFICATION
In the authentication setup, we set the problem as a two-class (genuine, imposter) classification problem.For this purpose, we have treated one subject as the genuine class and the other nine as the imposter class.This scheme of the dataset was also categorized into train, validation, and test folds using 5fold cross-validation.In this manner, we have tested all the subjects and reported the average test scores.This setup is also called personalized classification.While treating one subject as genuine and others as imposters, a class imbalance problem arises which reduces the performance of the classifiers.As there are very few training samples of the minority class, the classifier struggles to learn and detect the minority class.To solve this problem, we applied several data augmentation algorithms on the training data, such as Jitter, Time Warping (TimeW), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN).Brief descriptions of these algorithms are given below: Jitter: In Jitter augmentation, we tried to simulate the addition of noise in the synthetic data.For that purpose, a small amount of Gaussian noise was added without altering the actual labels [30].So, the Jitter augmentation of a signal can be defined by the following equation, Here,   ′ is the augmented n th data point,   is the actual n th data point of a signal segment,   ∼ (0, ) is a random noise point drawn from the Gaussian distribution of 0 mean and standard deviation of .
Using this strategy, we synthesized data for the genuine class to make the samples equal to the imposter class.The noise was added equally in 5 channels of the raw EEG data.Here, we used a Gaussian distribution of 0 mean and standard deviation,  = 0.05 to generate noise samples.It should be noted that this type of augmentation is only applicable to raw EEG data.

Time Warping (TimeW):
TimeW is an augmentation technique that randomly modifies the temporal position of a signal [31].It simulates the variation of the temporal location of an event in a time window.At first, data was segmented into N temporal slices (where N varies from 2 to 4) and a random slice was chosen.Then the signal in that time range is randomly shifted followed by a random compression or expansion.TimeW can be defined by the following equation, Here,  ′ () is the augmented signal, () is the actual signal segment,   ∼ (1, ) is a random noise point drawn from the Gaussian distribution of mean = 1 and standard deviation,  = 0.05.Fig. 4 shows the comparison of Jitter and TimeW.

Synthetic Minority Oversampling Technique (SMOTE):
Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique that increases the minority class samples by synthesizing new samples on the line drawn between several random existing samples [32].In order to generate new samples, at first, this algorithm randomly selects a minority sample and its k nearest neighbors.Then it draws lines between the selected samples on the feature space.Finally, new examples are generated along those lines.

Adaptive Synthetic Sampling (ADASYN):
To increase the minority samples, we used another technique called Adaptive Synthetic Sampling (ADASYN).This algorithm simultaneously focuses on the region where the density of the minority class is low and harder to learn [33].Unlike SMOTE, which generates samples randomly, ADASYN generates new samples adaptively in the mentioned regions and this strategy gives better performance.SMOTE and ADASYN both works on the extracted feature and not on the raw signals like Jitter and TimeW.Fig. 5 shows the comparison of different augmentation techniques applied to feature space.

3) IDENTIFICATION BY TEMPLATE MATCHING
For real-life identification, we analyzed our system's performance using a template matching method.In this method, at first, we calculated some binary templates from the extracted feature vectors of EEG and Keystroke data.The binary templates are suitable as they enable us to perform very quick Hamming distance calculation for template matching.It also allows various template encryption methods for security [34], [35].To quantize the real feature values in the range [0, 1], we have calculated a binary template, () of a feature  using the following equation, Here,   is the real value of feature ,   = (  ), where   is the threshold for feature  and   is the list of all values of feature .
After quantization, the train sets were considered as the gallery and test sets as the probe.For each subject in the gallery, there were several samples and all the samples were grouped.Thus, there were 10 groups for 10 subjects.Then for each sample in the probe, Hamming distances were calculated between the probe and all the samples of a gallery group.Then the lowest Hamming distance was selected from all the distances and later placed in a matching matrix.This procedure was repeated for 10 groups.Finally, a matching matrix of P×G was constructed, where P is the number of samples in the probe and G is the number of groups in the gallery.This matrix can be illustrated by the following equation, After constructing the matching matrix, all the elements in a row were sorted in an ascending manner according to their values and it was repeated for all rows.Then the matching scores were replaced with their corresponding subject index.Finally, the matrix was evaluated using the Cumulative Match Characteristic (CMC) curve and Rank (N) recognition rate [36].So, for each query in the probe set, an algorithm ranked all the gallery samples according to their Hamming distances in the ascending order, and the CMC Rank(N) identification rate is measured as follows, Here,  is the total number of samples in the probe set.Using this formula, identification rates for  = 1, 2, 3,…,10 were evaluated and a graph of identification rate vs rank was drawn.Finally, the identification rates for Rank(1), Rank(2), and Rank(3) were reported.

4) AUTHENTICATION BY TEMPLATE MATCHING
In the authentication scenario, there are only two classes (genuine and imposter).So, the matching matrix has only 2 columns.In this setup, a sample in the probe set is considered genuine if its genuine matching score is less than the imposter's matching score.In this manner, predictions for probes are generated.Finally, this method is evaluated using False Acceptance Rate (FAR), False Rejection Rate (FRR), and Equal Error Rate (EER).Here, FAR and FRR are defined by the following formulas,

𝑭𝑹𝑹 = 𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒂𝒄𝒄𝒆𝒑𝒕𝒆𝒅 𝒊𝒎𝒑𝒐𝒔𝒕𝒆𝒓𝒔 𝑻𝒐𝒕𝒂𝒍 𝒏𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒊𝒎𝒑𝒐𝒔𝒕𝒆𝒓𝒔 (𝟐𝟒)
For different threshold values, FAR and FRR are calculated and two curves of FAR vs threshold and FRR vs threshold are drawn.In that graph, the intersection point of two curves is observed and the error value in that point is considered as EER.So, EER is the value where FAR is equal to FRR.Finally, EER for each subject and average EER are also calculated.
All the experiments were carried out using a PC with Intel(R) Core(TM) i5-4200 CPU with a clock speed of 1.6 GHz, 4 GB RAM, and a Windows 10 professional 64 bit.All the pre-processing and feature extraction was done using MATLAB 2018b and classifiers were trained and tested using different Python libraries like Scikit-learn [37], NumPy [38], pandas [39], Matplotlib [40], etc.

F. SCORE LEVEL FUSION
Two separate classifiers were trained and tested for EEG and Keystroke data.The classifiers give probability scores for each class during prediction and the scores are summed using the following formula, Here,   (  = ) is the probability of assigning the identity  to the person in modality   .By using this formula the score for each class is calculated and the highest performing class is taken as the final prediction.This strategy can be also named "Late Fusion".Other fusion strategies likeearly fusion (feature fusion), Canonical Correlation Analysis (CCA), Discriminant Correlation Analysis (DCA), and multiplying the probability scores were also tested but they did not give satisfactory performance.So, the sum of the probability scores was taken as the best method.

IV. EXPERIMENTAL RESULTS
This section demonstrates the performance of different experiments conducted in this study.Firstly, the important feature selection results for EEG and Keystroke data were reported.The remaining section is divided into several subsections where we showed the experimental results achieved by our system in closed set identification and authentication.We showed the results of several classifiers in identification or generalized classification and then subject-wise authentication or personalized classification for all modalities.Then, the identification and authentication results using the proposed template matching method are also shown.Finally, a comparison of inference time for different methods is reported.

A. FEATURE IMPORTANCE
Fig. 6 shows the feature importance plot for EEG and Keystroke data.The most important feature for EEG is PZ_cD2, which is the band power of the level 2 wavelet decomposition of the PZ channel (parietal channel).It can be noticed that the other important features are also from wavelet decomposition (PZ_cD3, T7_cD2, PZ_cD4, etc.).For Keystroke, the most important feature is downdown8, which is the key-down to key-down time of character 8 or the key-down to key-down time between 'C' and 'Caps Lock' keys.Fig. 7 shows the performance vs the number of features for EEG and Keystroke data.It is observed that the performance starts to saturate at 54 features for EEG and at 34 features for Keystroke data.So, finally, 54 and 34 features are selected for EEG and Keystroke, respectively.

B. GENERALIZED IDENTIFICATION
Several machine learning algorithms were trained with a single modality and fused-modality for identifying 10 subjects.Stratified 5-fold cross-validation was used for performance evaluation.The test scores of 7 top classifiers for EEG with and without feature selection are shown in TABLE II.

EEG and Keystroke:
The achieved results using the fused modality are also shown in TABLE II.It can be seen from TABLE II that the RnF achieved the highest accuracy, precision, recall, and F1 score of 99.9%, 99.9%, 99.9%, and 99.9%, respectively without feature selection.But with feature selection, LDA achieved the highest accuracy, precision, recall, and F1 score of 99.6%, 99.6%, 99.6%, and 99.6%, respectively among all the classifiers.A comparison of fused modality with EEG reveals that fused modality achieved 4.1% higher accuracy, 3.9% higher precision, 4.1% higher recall, and 4.1% higher F1 score than EEG.Fusing the techniques marginally improved performance over Keystroke alone, and the result is improved by 0.4%, 0.6%, 0.6%, and 0.6% for accuracy, precision, recall, and F1 score, respectively.Therefore, it is evident from these experiments that the fused modality always performs better than the individual modalities.Fig. 8 shows the performance comparison of the classifiers using independent modality and fused modalities using a bar chart and the confusion matrix for the fused multimodal dataset.The model was able to identify subject 4 with the highest sensitivity and the most confusing subjects are subjects 3 and 5. Supplementary Fig. 1 shows the macro averaged Receiver operating characteristic (ROC) curve for the best classifier using independent modality and fused modalities.

C. PERSONALIZED AUTHENTICATION
Several classifiers were evaluated using stratified 5-fold cross-validation for classifying genuine and imposter classes in different modalities.Random Forest performed the best in terms of average accuracy, precision, recall, and F1 score.TABLE III shows the test scores for EEG, Keystroke, and a combination of them for all the subjects using the RnF classifier.

EEG:
From Table III, it can be noticed that the average accuracy, precision, recall, and F1 score without feature selection are 98.3%, 98.7%, 91.7%, and 94.6%, respectively.Using feature selection, the average scores remain almost identical.Though accuracy and precision remain high for all the classes, recall and F1 scores drop in some classes.This is the consequence of a huge class imbalance between the genuine and imposter classes.In this experiment, the ratio of genuine and imposter class is 1:9.As a result, the classifier fails to learn genuine class examples due to a small number of representative data.To get rid of this problem, we applied two types of data augmentation on the raw EEG signal and they are Jitter and TimeW.Supplementary   respectively.There is a significant improvement in sensitivity that can be observed after using SMOTE and ADASYN.For SMOTE and ADASYN, the improvement is 4.3% and 4.8%, respectively.It is evident that ADASYN performs the best among the augmentation techniques.In the case of individual subject scores, significant improvements can be seen in subjects 2, 4, 6, and 10.

Keystroke:
The performance metrics of Keystroke data for personalized classification is also reported in Table III.It can be seen from Table III that the average accuracy, precision, recall, and F1 score without feature selection are 99.3%, 99.7%, 96.7%, and 98.0%, respectively.The average performance metrics did not improve significantly using feature selection.
Hence, to increase the overall performance, different augmentations techniques were applied.SMOTE and ADASYN were applied on the Keystroke feature vectors and the results are reported in Table IV.It can be noticed from TABLE IV that the average accuracy, precision, recall, and F1 score with SMOTE are 99.5%, 98.9%, 98.1%, and 98.5%, respectively.On the other hand, the average accuracy, precision, recall, and F1 score with ADASYN are 99.4%, 98.9%, 97.5%, and 98.2%, respectively.The best average recall score is obtained by applying SMOTE augmentation which is 1.4% higher than without any augmentation.
EEG and Keystroke: Finally, we combined EEG and Keystroke and tested with and without augmentation.Table III shows the results of the fused modality with and without feature selection.It can be seen from Table III that a significant performance improvement in average metrics was achieved after combining EEG and Keystroke.The average accuracy, precision, recall, and F1 score without feature selection are 99.3%, 99.6%, 96.4%, and 97.9%, respectively.Feature selection did not significantly improve the performance.The sensitivity of fused modality is 4.7% and 4.4% higher than EEG alone for without and with feature selection, respectively.Due to the class imbalance, recall value was suffering here too.Thus, we applied different augmentations on the composite feature vectors.Table IV shows the results obtained after augmentation on the composite feature vectors of EEG and Keystroke data.The average accuracy, precision, recall, and F1 score with SMOTE are 99.6%, 99.2%, 98.7%, and 98.9%, respectively.On the other hand, the average accuracy, precision, recall,

D. IDENTIFICATION BY TEMPLATE MATCHING
As previously stated, a matching matrix is calculated for identifying 10 subjects and from that matrix, CMC curve, Rank(1), Rank(2), and Rank(3) recognition rates are obtained.Fig. 9 shows the CMC curve for EEG, Keystroke, and the combination of EEG and Keystroke.In comparison to EEG, this is 4.9% higher with all features and 10.04% higher with feature selection.In comparison to Keystroke, the improvement is even more (17.42%with all features and 17.86% with feature selection).Thus, it can be summarized that the performance is better with all features and EEG performs better than Keystroke.So, for template matching identification, a large number of features is necessary to achieve good performance.

E. AUTHENTICATION BY TEMPLATE MATCHING
In the authentication system, the performance is evaluated using EER and area under the ROC curve as shown in Fig. 10.Fig. 10   without feature selection.Thus, the best-detected subject is Subject 1 and the worst detected subject is Subject 2.

F. COMPARISON OF COMPUTATIONAL TIME
The average time required for pre-processing and feature extraction of one sample is 184.

G. REAL-TIME EVALUATION
In order to test the multimodal authentication system in a real-world scenario, we designed a GUI-based framework for the real-time processing of EEG signals and keystroke dynamics.The framework utilized the selected machine learning model trained with EEG and keystroke features.Fig. 11 shows the initial GUI for the authentication system.The first step in this system was to enter the User ID of the test subject and the Trials Count.The trial court was set by the test subject, which indicated the number of times the subject would enter the password.Depending on this value, the data would be segmented and pre-processed accordingly.Furthermore, additional trials resulted in better approximations of the features represented in the input data.The in-house python-based algorithm was used to detect the keypress and key-release intervals for each of the keys inputted by the test subject.The test subject would enter the password for the mentioned number of trials.The feature extraction and recognition will follow and the classifier response would be reported with accuracy.It is observed that for the same number of trials, the authentication and identification of each user can be done through this real-time system with very high accuracy and the accuracy of detection will reach 100% with more than 5 trials for most of the subjects.

V. DISCUSSION
The key findings, advantages, and disadvantages of this study are discussed below in detail.
The main objective of this study was to build a system that has better anti-spoofing capability (low false-positive rate or high precision) and also higher accuracy and recall (low false-negative rate).From every experimental result of the previous section, it is evident that the proposed multimodal setup has outperformed the individual modalities in terms of accuracy, precision, recall, and F1 score.So, it is clear that our proposed system has overcome the limitation of individual modalities: EEG (low accuracy and high falsenegative rate issue) and Keystroke (high false-positive rate).
An important objective of this research was to find out the most suitable fusion scheme for this type of signal or biometric system.From the analysis, it is evident late fusion (summing/multiplying the predicted probabilities of 2 different classifiers for 2 modalities) is the most suitable fusion scheme for this type of system.As EEG and Keystroke produced very different types of features, they are not suitable for any early feature fusion or concatenation.When two separate classifiers are trained on EEG and Keystroke, the classifiers become experts in their corresponding modality, and then in the late fusion stage, they can complement each other.Other fusion methods like -CCA and DCA produced 87.2% and 93.2% accuracy, respectively whereas our method produced 99.9% accuracy.
This research also aimed to build a more interpretable system so that we can understand which features are most important and which parts of the brain are most responsible for producing distinctive features during typing a word.From our analysis, it is evident that the parietal lobe produced the most important features for EEG because it contributes 3 important features (PZ_cD2, PZ_cD3, PZ_cD4) in the top 5 features in the feature ranking of Fig. 6.Most other important features of the top 10 features are related to the frontal lobe (AF3, AF4).From this analysis, it is evident that the area close to the motor cortex is producing important features because, during motor activity (such as typing), the brain area should be active in the motor cortex.Here, it is also evident that wavelet features at different levels are also playing important roles.For example, the wavelet decomposition at levels 2, 3, and 4 are cD2, cD3, cD4, and the top 9 features out of 10 are from these features.This indicates that not all but certain frequency bands of EEG contain biometric traits.As EEG is a non-stationary signal, a composite time-frequency analysis like wavelet analysis plays a crucial role.The reason is, wavelet analysis identifies not only the slowly varying components but also the abruptly changing components of a non-stationary signal.In any nonstationary signal, the frequency can change with respect to time.Wavelets can identify the frequency components in each time bin which is not possible for time or frequency domain alone.For this reason, a comprehensive analysis can be done and distinctive patterns of non-stationary signals can be identified by wavelets.The experimental results show that ensemble methods with bagging (RnF) and boosting (XGBoost) have performed well in most cases.In both cases, N learners are created from 1 learner.This method matches the ensembling of multiple modalities.That is why these classifiers are chosen and they show better performance.Other reasons are that both classifiers reduce variance and thus overfitting.In reducing overfitting, it was evident that bagging outperforms boosting.Both classifiers make the final decision by averaging the N learners but bagging applies an equallyweighted average but boosting applies a non-equally weighted average.Equally weighted average matches the score level fusion method and thus bagging performs better than boosting.
Although our proposed multimodal authentication system achieved very high accuracy and reliability with additional layers of security, there are still a few areas that can be improved.The models were trained on EEG and keystroke data from 10 different users.The performance of the model is yet to be tested on a rich dataset containing inputs from a large number of users.Another limitation of this study is the use of only one specific password for all users.This authentication system can be extended by implementing several distinct passwords into the multimodal dataset and learning algorithms.Even though EEG biometrics and keystroke dynamics have been integrated into an existing password/PIN framework, the requirement of the EEG headset may not appeal to finance or retail sectors.But with the advancement in technologies, it may be possible soon to integrate EEG biometrics into wireless portable earphones.For such cases, the proposed authentication system can be evaluated using a single channel or dual channel EEG data, along with keystroke dynamics from smartphones.This will enable faster adoption of the authentication system into mainstream devices.Finally, in this study, the features from the data were manually handpicked, and then machine learning techniques were applied to find a pattern in the dataset.This process can be automated using an autoencoder or convolutional neural network (CNN), which might enhance the real-time system performance from single-trial evaluation.

VI. CONCLUSION
In this paper, we demonstrated a multi-modal system that merged EEG and keystroke dynamics for user identification and authentication.We integrated our multimodal system into the widely used password/PIN-based authentication system, which reduces the inconvenience of the end-users as they do not need to adapt to a new authentication process.Using the features extracted from the input data, the behavioral pattern can be used to identify the user with a confidence level, and the password can be used for authorization.We implemented a variety of machine learning techniques on the pre-processed data and selected a model with the highest accuracy and other performance metrics.Furthermore, we trained separate tailored models for each user to further improve the identification performance of the multimodal system.The personalized models were used for classifying users as genuine or impostors.We also developed a fast binary template matching method for identification and authentication.To the best of our knowledge, this is the first implementation of EEG biometrics and Keystroke dynamics merged into a multimodal system for identification and authorization.This implementation provides additional layers of security through the EEG signal, which is difficult to mimic, and Keystroke dynamics, which captures the behavior and rhythm of the user's typing with high accuracy.Despite the benefits, we have discussed a few limitations of our system, which we will address in our future studies.Furthermore, we believe that our demonstration will encourage further studies in such multimodal systems, which can then be deployed to different sectors for real-world application.Accuracy Precision Recall F1 score

FIGURE 1 .
FIGURE 1. Flow diagram for the proposed data acquisition and multimodal authentication method for EEG and Keystroke dynamics.

FIGURE 4 .
FIGURE 4. Comparison of different types of augmentation applied to an EEG segment of channel AF3 -Original raw EEG (left); after Jitter augmentation (middle) and after TimeW augmentation (right).

FIGURE 5 .
FIGURE 5. Scatter plot of Holdtime8 and Holdtime12 features without augmentation (left), after SMOTE augmentation (middle) and after ADASYN augmentation (right) from Keystroke data.Blue dots represent the imposter class whereas orange ones represent the genuine class.

FIGURE 7 .
FIGURE 7. Performance vs the number of features plot for EEG (Left) and Keystroke (right).

FIGURE 8 .
FIGURE 8. Comparison of overall accuracy for the 7 classifiers (left) and co using the combination of EEG

FIGURE 9 .
FIGURE 9. CMC curves obtained for EEG, Keystroke and EEG-Keystroke modalities: without feature selection (left) and with feature selection (right).

FIGURE 11 .
FIGURE 11.The GUI framework for the real-time evaluation of the proposed multimodal identification and authentication system.

Top-10 features using Random Forest feature ranking model for EEG signal (top) and Keystroke data (bottom).
TABLE II that the highest accuracy, precision, recall, and F1 score are obtained by Random Forest (RnF) classifier.RnF obtained 95.8% FIGURE 6. accuracy, 96.0% precision, 95.8 % recall and 95.8% F1 score.The results are

TABLE II COMPARISON
OF TOP 7 CLASSIFIERS USING EEG AND KEYSTROKE FORIDENTIFYING 10 SUBJECTS *highest score is indicated by bold numbers consistent in both the cases -with and without feature selection.The best hyper-parameters of RnF are chosen by random search algorithm (number of trees = 500, split criterion = 'Gini', min samples split = 2, and max features is the square root of the total number of features).The secondbest performing classifier is Extreme Gradient Boosting (XGBoost), which achieved 93% accuracy, 93.3% precision, 93% recall, and 93% F1 score without feature selection.XGBoost achieved slightly lower scores with feature selection.However, it is evident that ensemble algorithms with bagging and boosting techniques are the most suitable ones for EEG biometric classification.Keystroke: Same classifiers were tested on the Keystroke data as shown in TABLEII.Here, we can see that RnF again achieved the highest accuracy, precision, recall, and F1 score of 99.5%, 99.3%, 99.3%, 99.3%, respectively without feature selection.RnF also achieved the highest performance with feature selection; however, the performance metrics slightly reduce than that were reported with all features.The second-best performing classifier is XGBoost which achieved 99.3% accuracy, 99.3% precision, 99.3% recall, and 99.3% F1 score.This analysis clearly shows that Keystroke can achieve better performance than EEG.Fusion of EEG and Keystroke can improve the performance metrics even further.

TABLE IV PERFORMANCE
OF PERSONALIZED CLASSIFICATION AFTER AUGMENTATION ON FEATURE VECTORS OF EEG, KEYSTROKES, AND EEG-KEYSTROKESTo improve the sensitivity of EEG-based modality further, we have applied SMOTE and ADASYN augmentation on the extracted features from the EEG data.The results of these augmentations are shown in TABLE IV.It can be seen from TableIVthat the average accuracy, precision, recall, and F1 score with SMOTE are 99%, 98%, 96.5%, and 97.2%, respectively.The average accuracy, precision, recall, and F1 score with ADASYN are 98.9%, 97.2%, 97%, and 97.1%

TABLE V RANK
WISE IDENTIFICATION RATES FOR EEG, KEYSTROKES AND COMBINED MODALITIES

TABLE V
8 ms.The average training time for classifying 10 subjects is 10.5 s and for 2 subjects, 8.8 s.The mean authentication time of one sample is shown in Table VII for EEG, Keystroke, and EEG-Keystroke

TABLE VI EER
FOR ALL SUBJECTS IN ALL MODALITIES WITH AND WITHOUT FEATURE SELECTION

FIGURE 10. ROC curves obtained for EEG, Keystroke and EEG-Keystroke modalities: without feature selection (left) and with feature selection (right). modalities
. It can be observed from TABLE VII that the lowest times are required in the template matching configuration with feature selection which are 6.33 ms, 4.82 ms, and 7.84 ms for EEG, Keystroke, and EEG-Keystroke, respectively.Template matching is 6 times faster than classification methods on average.Though template matching has slightly lower performances in terms of performance metrics, it outperforms the classification methods in terms of prediction time.So, it is recommended for real-time use.