Sparse Common Feature Analysis for Detection of Interictal Epileptiform Discharges from Concurrent Scalp EEG

Temporal interictal epileptiform discharges (IEDs) are often invisible in the scalp EEG (sEEG). However, due to within-electrode temporal correlation and between-electrode spatial correlation, they still leave signatures in the sEEG. The IEDs are therefore expected to share some common spatial and temporal features. In this paper, we first present a novel method, called the common feature analysis (CFA)-based method, for IED detection via an existing common orthogonal basis extraction (COBE) algorithm. In the second approach, we exploit the sparsity of IED waveforms to develop a new algorithm, namely sparse COBE, and based on that, a sparse CFA (SCFA)-based method for IED detection. The proposed CFA and SCFA models are compared with two state-of-the-art IED detection methods. Two types of approaches, namely within- and between-subject classification approaches, are employed for evaluating the methods. SCFA outperforms the others and achieves accuracies of 75.1% and 67.8% using the within- and between-subject classification approaches, respectively. This enables the proposed techniques to capture the intracranial biomarkers of epilepsy and improve the performance of a classifier in automatically detecting the scalp-invisible IEDs from sEEG.


I. INTRODUCTION
EPILEPSY is a chronic brain disorder that can affect people at any age [1]. It causes recurrent and erratic interruptions in brain functionality, called epileptic seizures, arising due to dysfunction of the brain electrophysiological system and uncontrolled electrical discharges in a group of neurons in the cerebral cortex [2], [3]. Between two seizure onsets, abnormal patterns occur, called interictal epileptiform discharges (IEDs), which can be captured by the EEG [4]. Nonetheless, scalp EEG (sEEG) suffers from low sensitivity in capturing these discharges and, consequently, around 30% to 40% of patients considered for epilepsy surgery require invasive intracranial EEG (iEEG) recording [5]. As a result, improving the sensitivity of sEEG, a low-cost noninvasive approach, for epilepsy diagnosis and management becomes very important. Furthermore, findings from sEEG are crucial in presurgical assessment to decide if and where to implant iEEG electrodes. Therefore, developing an effective method for identifying IEDs from the scalp can greatly enhance the effectiveness of surgical treatment.
Recording from mesial temporal structures through multicontact foramen ovale (FO) electrode bundles [6] paves the way to investigate the scalp fields associated with mesial temporal lobe epilepsy, the most common form of human focal epilepsy [7]. The FO electrodes are bilaterally introduced via the FO into the ambient cistern [8], [9] and provide an opportunity to simultaneously record sEEG and iEEG without disruption to brain coverings [9], [10]. Simultaneous investigation of sEEG and iEEG has shown that only a small percentage of IEDs, for instance, 9% [11] or 22% [12], can be recognized in sEEG by visual inspection; these are here called scalp-visible IEDs. A likely reason is the relatively high attenuation of electrical fields with distance from the source [13], [14]. However, it has been shown that the spikes from deep sources contribute to the scalp EEG regardless of how deep their locations are [15].
Most IED detection algorithms are applicable to either sEEG [16], [17] or iEEG [18], [19]. Algorithms developed for detection of IEDs scored from sEEG use only scalp-visible IEDs for training the model and, consequently, are not sufficiently suitable, as they hardly detect scalp-invisible IEDs. There are few studies investigating scalp IEDs from concurrent sEEG-iEEG recordings [20]-[23]. Spyrou et al. [20] detected scalp-visible and scalp-invisible IEDs, scored by iEEG, from sEEG by using time-frequency (TF) features. In [21], the authors mapped sEEG to iEEG by using an asymmetric-symmetric autoencoder, then detected IEDs by using a convolutional neural network. This method is computationally intensive and does not exploit the data statistics effectively. Quite recently, we developed a model to map the sEEG to iEEG recordings by tensor factorization [22]. In our mapping model, the TF features were extracted by applying the continuous wavelet transform. Time, frequency, and channel modes of IED segments from iEEG recordings were concatenated into a four-way tensor. Then, the tensor was decomposed into temporal, spectral, spatial, and segmental factors by employing Tucker and CANDECOMP/PARAFAC decomposition techniques. Finally, TF features of both IED and non-IED segments from the scalp recordings were projected onto the temporal components for detecting IEDs. Furthermore, we already proposed a method based on tensor factorization to detect scalp-visible and scalp-invisible IEDs from concurrent sEEG-iEEG recordings [23], [24]. Here, we likewise apply our proposed models to concurrent sEEG and iEEG recordings, in which the IEDs were scored based on iEEG waveforms but detected from the sEEG recordings. In other words, both scalp-visible and scalp-invisible IEDs are included in the dataset because the iEEG is used as the ground truth for scoring the IEDs.
Group component and common feature analyses (CFA) have recently been a hot topic in biomedical signal processing [25]-[27]. Zhang et al. [25] recognized the steady-state visual evoked potential by analyzing common components. In [26], the authors proposed an artifact rejection method based on CFA. On the other hand, only very few papers have employed group component analysis for IED detection [28], [29]. In [28], the authors constructed a four-way tensor of time, channel, frequency, and segment information. Then, they factorized the tensor using the Tucker model into temporal, spatial, and frequency modes. Each mode consisted of a number of components or signatures that were common between the trials. Finally, they detected the IEDs using the spatial factors. Thanh et al. [29] used tensor decomposition for detecting epileptic and non-epileptic spikes. They built a four-way tensor of time, channel, wavelet-scale, and epileptic spike, and decomposed it using nonnegative Tucker decomposition. Then, the extracted tensor factors and core tensor were used for epileptic spike detection. However, to the best of our knowledge, there exists no IED detection study that extracts common components across all IED segments in the most discriminatory time-space domain only. Therefore, we aim to present two models based on CFA and sparse CFA to detect the IEDs from sEEG using a unique limited set of concurrent sEEG and iEEG recordings.
In this study, we consider that the IED segments for each subject are naturally linked and share some common spatial and temporal features. These common features, which are latent in EEGs, may reflect the IED characteristics more accurately. Zhou et al. [30] developed an algorithm, namely common orthogonal basis extraction (COBE), for extracting common and individual features to boost image classification performance. We adopt the COBE algorithm to exploit the latent common features among the IED segments in order to enable detection of a higher percentage of IEDs from the scalp using a unique set of simultaneously recorded sEEG and iEEG. This method is referred to as the CFA-based method for IED detection. In the second approach, as the main contribution of this paper, we extend the COBE algorithm to exploit the common features with a sparsity constraint, referred to as sparse common orthogonal basis extraction (SCOBE). We extract common features among the IED segments with sparsity constraints (sparse common features) using our developed SCOBE algorithm. This method is called the sparse CFA (SCFA)-based method for IED detection. It should be noted that, in our dataset, the IEDs are scored from the iEEG by an expert clinician, while they are detected from the sEEG. This provides an opportunity to automatically detect scalp-invisible IEDs from sEEG, which is not feasible in the sEEG-based algorithms for IED detection. For classification, three types of classifiers, namely support vector machines (SVM), diagonal linear discriminant analysis (DLDA), and naïve Bayes (NB), are employed.
The rest of the paper is structured as follows: CFA and SCFA models are described in Section II, data description and preprocessing are provided in Section III, the results are reported in Section IV, the discussion is presented in Section V, and Section VI concludes our work.

II. METHODS
The IEDs are associated with abnormal patterns, thus it can be assumed that they are independent from the other brain activities. Moreover, they have much similarity in shape and morphology and therefore, some features are expected to be shared among them. In contrast, the non-IEDs are random and there is no shared feature between them. Therefore, we are interested in a feature space that spans the IEDs only.
In the proposed CFA and SCFA methods, common features and sparse common features among the IED segments are exploited respectively by COBE and SCOBE algorithms. Then, both IED and non-IED segments are projected onto them using Khatri-Rao product. Finally, the features of projected segments are extracted for classification. FIGURE 1 shows the flowchart of the proposed methods representing the overall IED detection system. The details of the methods are explained in the following subsections.

A. COMMON FEATURE ANALYSIS
In the proposed CFA model, the common features are exploited using the COBE algorithm [30], explained below.

1) COBE
Suppose the training dataset consists of $N$ IED segments, $\mathcal{X} = \{X_n \in \mathbb{R}^{L \times M} : n \in \mathcal{N}\}$, $\mathcal{N} = \{1, 2, \ldots, N\}$, where $L$ and $M$ are respectively the number of time samples and channels. Our goal is to extract the common features among all IED segments. Following the matrix factorization approach, for each matrix $X_n$ we attempt the following minimization:
$$\min_{S_n, W_n} \| X_n - S_n W_n^T \|_F^2, \tag{1}$$
where the columns of $S_n \in \mathbb{R}^{L \times P_n}$ denote the sources in $X_n$, $W_n \in \mathbb{R}^{M \times P_n}$ indicates the corresponding mixing matrix, and $\|\cdot\|_F$ is the Frobenius norm operator. It is assumed that $P_n < \min(L, M)$, implying that $S_n W_n^T$ is a low-rank representation of $X_n$.
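The sources of the data ensemble $X_n$ are linked together, thereby sharing some common features. Hence, we can define $S_n$ as follows:
$$S_n = [\bar{S} \;\; \breve{S}_n], \tag{2}$$
where the sub-matrix $\bar{S} \in \mathbb{R}^{L \times C}$ consists of common features shared by all the matrices in $\mathcal{X}$, and the sub-matrix $\breve{S}_n \in \mathbb{R}^{L \times (P_n - C)}$, $C \le \min\{P_n : n \in \mathcal{N}\}$, presents the individual sources of each $X_n$. By doing so, we are able to re-factorize the data matrices $X_n$ in an augmented way as:
$$X_n = S_n W_n^T = \bar{S} \bar{W}_n^T + \breve{S}_n \breve{W}_n^T, \tag{3}$$
where $\bar{W}_n$ and $\breve{W}_n$ consist of the mixing matrices corresponding to $\bar{S}$ and $\breve{S}_n$, respectively. There are numerous solutions to minimization of (1), which are not unique. To reduce the solution space, the following three constraints are applied:
iii. There is no interaction (correlation) between the spaces of common and individual features, i.e., $\bar{S}^T \breve{S}_n = 0$.
By substituting (3) in (1) and considering the above constraints, (1) can be reformulated to:
$$\min \sum_{n \in \mathcal{N}} \| X_n - \bar{S} \bar{W}_n^T - \breve{S}_n \breve{W}_n^T \|_F^2 \quad \text{s.t.} \quad \bar{S}^T \bar{S} = I, \;\; \bar{S}^T \breve{S}_n = 0, \tag{4}$$
where the notation $0$ denotes a $C \times (P_n - C)$ zero matrix.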
There is a close relationship between the factorization problem (4) and principal component analysis (PCA) when $P_n = C$, $\forall n \in \mathcal{N}$. In this case, $\bar{S} = S$ can be found from
$$\min_{S, W_n} \sum_{n \in \mathcal{N}} \| X_n - S W_n^T \|_F^2 \quad \text{s.t.} \quad S^T S = I. \tag{5}$$
Problem (5) can be considered as a partitioned version of the global PCA of $\tilde{X}$ when the data matrices $X_n$ are stacked to construct a global matrix $\tilde{X} = [X_1 \; X_2 \; \cdots \; X_N]$ and, similarly, $\tilde{W} = [W_1^T \; W_2^T \; \cdots \; W_N^T]^T$. However, the factorization problem (4) is not equivalent to PCA when $C < P_n$. The main difference between problems (4) and (5) is owing to the individual components $\breve{S}_n \breve{W}_n^T$, meaning that the common components found by (4) can be interpreted as the principal components of the common subspace $\bar{X}_n = X_n - \breve{S}_n \breve{W}_n^T$. For more details, the reader is referred to [30].
To solve (4), finding the common features $\bar{S}$ plays a vital role. From (3), we have:
$$\bar{S} = X_n \bar{W}_n^{T\dagger}, \tag{7}$$
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of a matrix. To estimate $\bar{S}$, we can employ QR decomposition to decompose $X_n = Q_n R_n$, where $Q_n$ is an orthogonal matrix and $R_n$ is an upper triangular matrix. By defining $Z_n = R_n \bar{W}_n^{T\dagger}$, (7) can be reformulated to:
$$\bar{S} = Q_n Z_n. \tag{8}$$
Therefore, for any given $n_1, n_2 \in \mathcal{N}$, $n_1 \neq n_2$, we have:
$$Q_{n_1} z_{n_1,k} = Q_{n_2} z_{n_2,k} = \bar{s}_k, \tag{9}$$
where $z_{n,k}$ and $\bar{s}_k$ are respectively the $k$th columns of $Z_n$ and $\bar{S}$. It should be noted that condition (9) is valid when there is a similarity among all the $X_n$ segments (and consequently among the $Q_n$). For this reason, $\mathcal{X}$ consists of only IED segments, which fall in the same frequency range and have similar morphologies (i.e., spikes and sharp waves). From (9), we can compute the first column of $\bar{S}$, denoted by $\bar{s}_1$, by minimizing
$$J_1 = \sum_{n \in \mathcal{N}} \| Q_n z_{n,1} - \bar{s}_1 \|_2^2 \quad \text{s.t.} \quad \|\bar{s}_1\|_2 = 1, \tag{10}$$
where $\|\cdot\|_2$ denotes the $l_2$-norm. Based on (8) and (9), the objective function $J_1$ has to be very small (very close to zero) to ensure that $\bar{s}_1$ is a common basis vector among the trials.
An alternating least-squares (ALS) optimization algorithm can be utilized to minimize (10). First, by fixing $z_{n,1}$, the optimal $\bar{s}_1$ is obtained by
$$\bar{s}_1 = \sum_{n \in \mathcal{N}} Q_n z_{n,1}, \tag{11}$$
which is then normalized to have a unit norm. Then, for the fixed $\bar{s}_1$, we calculate $z_{n,1}$ as
$$z_{n,1} = Q_n^T \bar{s}_1, \tag{12}$$
and repeat until convergence. For the proof of convergence of the ALS algorithm, see [31]. The vector $\bar{s}_1$ is considered to be a common basis vector as long as $\min J_1 \le \epsilon$ for a very small threshold $\epsilon \ge 0$; otherwise, there is no common feature among the trials and iterations (11) and (12) stop.
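Given the estimated set of common basis vectors, $[\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_k]$, it needs to be ensured that the newly sought vector $\bar{s}_{k+1}$ does not repeat an earlier one. We can achieve this by considering the following property of $Z_n$. Suppose $Z_{n,C} = [z_{n,1} \; z_{n,2} \; \cdots \; z_{n,C}]$, then according to (8) we have:
$$Z_{n,C}^T Z_{n,C} = I. \tag{13}$$
This means $z_{n,k+1}^T z_{n,k} = 0$, so $z_{n,k+1}$ lies in the null space of $z_{n,k}^T$, allowing us to update $Q_n$ as
$$Q_n^{(k+1)} = Q_n^{(k)} \left( I - z_{n,k} z_{n,k}^T \right). \tag{14}$$
Finally, this leads to finding $\bar{s}_{k+1}$ through minimizing the following objective function:
$$J_{k+1} = \sum_{n \in \mathcal{N}} \| Q_n^{(k+1)} z_{n,k+1} - \bar{s}_{k+1} \|_2^2 \quad \text{s.t.} \quad \|\bar{s}_{k+1}\|_2 = 1. \tag{15}$$
The ALS algorithm is repeated until $J_{k+1}$ is minimized.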
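The alternating updates and the deflation step described above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `cobe`, the parameters `n_common`, `n_iter`, and `tol`, the random initialization, and the division of the deflation projector by $z^T z$ (a numerical safeguard for the case where $z$ is not exactly unit norm) are all assumptions made for illustration.

```python
import numpy as np


def cobe(segments, n_common, n_iter=500, tol=1e-12, seed=0):
    """Illustrative COBE-style extraction of common basis vectors by ALS.

    segments : list of arrays X_n, each of shape (L, M) with L >= M
    n_common : number of basis vectors to extract (hypothetical parameter)
    Returns (S_bar, J): S_bar has shape (L, n_common); J[k] is the residual
    of the k-th vector, to be compared against a small threshold.
    """
    rng = np.random.default_rng(seed)
    # Thin QR: the columns of Q_n are an orthonormal basis of range(X_n).
    Qs = [np.linalg.qr(X)[0] for X in segments]
    basis, residuals = [], []
    for _k in range(n_common):
        # Random unit-norm starting point for every z_n (assumption).
        zs = [rng.standard_normal(Q.shape[1]) for Q in Qs]
        zs = [z / np.linalg.norm(z) for z in zs]
        J_prev = np.inf
        for _it in range(n_iter):
            # Fix z_n: the common vector is the normalised sum of Q_n z_n.
            s = sum(Q @ z for Q, z in zip(Qs, zs))
            s = s / np.linalg.norm(s)
            # Fix the common vector: z_n = Q_n^T s.
            zs = [Q.T @ s for Q in Qs]
            J = sum(np.sum((Q @ z - s) ** 2) for Q, z in zip(Qs, zs))
            if abs(J_prev - J) < tol:
                break
            J_prev = J
        basis.append(s)
        residuals.append(J)
        # Deflation: remove the direction z_n from Q_n so that the next
        # vector cannot repeat this one. Dividing by z^T z is an assumption.
        Qs = [Q @ (np.eye(Q.shape[1]) - np.outer(z, z) / max(z @ z, 1e-12))
              for Q, z in zip(Qs, zs)]
    return np.column_stack(basis), np.array(residuals)
```

In this sketch, a vector would be accepted as common only when its residual falls below the chosen threshold; with segments that share exactly one direction, the first residual is close to zero and the second is not.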

2) IED Detection Based on CFA
In the proposed CFA-based method for IED detection, we employ the COBE algorithm to extract the common basis vectors $\bar{S} \in \mathbb{R}^{L \times C}$ among the IEDs. Then, both IED and non-IED segments $X_k \in \mathbb{R}^{L \times M}$ are projected onto the extracted vectors using the Khatri-Rao product as follows:
$$P_k = \bar{S}^T \odot X_k^T, \tag{16}$$
for $k = 1, \ldots, K$, where $K$ is the total number of both IEDs and non-IEDs in the training and test data, the symbol '$\odot$' denotes the Khatri-Rao product, $X_k$ is an IED or non-IED segment, and $P_k \in \mathbb{R}^{(MC) \times L}$ represents the same segment after projection. The epileptiform spikes (no matter if they are scalp-visible or scalp-invisible IEDs) have similar behavior, meaning that most channels have the same trends, including a sharp excitatory and a damped inhibitory oscillation during the spike onsets. In addition, these trends are similar to the common basis vector trend. By the Khatri-Rao product, the time samples of each channel are separately multiplied elementwise by each of the common vectors $[\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_C]$. Therefore, the epileptiform spikes (or the background activities of scalp-invisible IEDs) are magnified by projection.
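As a minimal sketch of the projection in (16), the Khatri-Rao (column-wise Kronecker) product can be written with NumPy broadcasting. The function names `khatri_rao` and `project_segment` are hypothetical; the row ordering (basis vector index varying slowest, channel index fastest) follows the usual Kronecker convention and is an assumption about the layout of the projected segment.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I, L) and B (J, L) -> (I*J, L)."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    # Row i*J + j of the result is the elementwise product A[i] * B[j].
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])


def project_segment(S_bar, X_k):
    """Project a segment X_k (L, M) onto basis vectors S_bar (L, C).

    Returns an array of shape (C*M, L): every channel time course
    multiplied elementwise by every basis vector.
    """
    return khatri_rao(S_bar.T, X_k.T)
```

With 96 time samples, 18 channels, and C basis vectors, `project_segment` returns an array with 18C rows and 96 columns, matching the dimensions stated for the projected segment.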
On the other hand, since there is no common feature among the non-IEDs, this projection has no significant impact on them. We show this in the results (FIGURE 5). The kurtoses of the projected segments are then extracted and used as the classification features, as described in Section III-D. The amplification caused by the projection increases the IED kurtosis, while it does not significantly affect the non-IED kurtosis.

B. SPARSE COMMON FEATURE ANALYSIS
During the last decade, sparse representation has attracted much attention in various signal processing areas including epilepsy study [32], [33]. The train of spikes emitted from individual neurons in the brain can be considered sparse in some domains such as time and space domains. One of the interesting characteristics of an IED is its sparsity in the time domain. The original COBE algorithm does not exploit this property, making it inefficient for spike detection. Therefore, we develop a new algorithm, namely SCOBE, with sparsity constraint to exploit the common features. Then, we propose a model based on sparse common features to detect the IEDs, called SCFA.

1) SCOBE
In this approach, we extract the common basis vectors with a sparsity constraint. In other words, the number of non-zero elements of each basis vector is sparsified. To this end, the sparsity condition is incorporated into (10) to change it into a constrained problem as follows:
$$J_1 = \sum_{n \in \mathcal{N}} \| Q_n z_{n,1} - D a_1 \|_2^2 \quad \text{s.t.} \quad \|a_1\|_0 \le T_0, \tag{17}$$
for $\bar{s}_1 = D a_1$, where $T_0$ is a small threshold set empirically, $D \in \mathbb{R}^{L \times F}$ is the dictionary (whose columns are the atom signals), $a_1 \in \mathbb{R}^{F}$ is the first sparse representation vector of the signals, and $\|\cdot\|_0$ denotes the $l_0$-norm, which counts the number of non-zero entries. Problem (17) is NP-hard, but it can be solved efficiently using several [...]

Algorithm 1. The SCOBE pseudocode.
Input: $X_n$, $n \in \mathcal{N}$, $\epsilon \ge 0$
1: Decompose $X_n = Q_n R_n$ s.t. $Q_n^T Q_n = I_M$, $n \in \mathcal{N}$.
2: Train a dictionary $D$ using the K-SVD algorithm.
[...]
Initialize $a_k$ randomly and sparsely with a unit norm
6: while not converged do
7: $y = \sum_{n \in \mathcal{N}} Q_n z_{n,1}$
[...]
Apart from $D$ and $a_1$, we also need to optimize over $z_{n,1}$, for which ALS iterations are employed. Suppose $D a_1$ is fixed; then $z_{n,1}$ is computed as:
$$z_{n,1} = Q_n^T D a_1. \tag{18}$$
Then, by keeping $z_{n,1}$ fixed, we have
$$\min_{a_1} \| y - D a_1 \|_2^2 \quad \text{s.t.} \quad \|a_1\|_0 \le T_0, \tag{19}$$
where $y = \sum_{n \in \mathcal{N}} Q_n z_{n,1}$. To optimize the objective function (19), the OMP technique [35] is used to approximate the sparsity and the K-SVD algorithm [38] to train the dictionary $D$. More details are given in Appendices A and B. In terms of stability, it should be noted that in our proposed method the objective (19) is a matrix-based problem optimized by the OMP technique. The stability of OMP has been proven in [39]. After solving (19), from (17) the first sparse common basis vector will be
$$\bar{s}_1 = D a_1, \tag{20}$$
which is normalized to have a unit norm. $\bar{s}_1$ and $z_{n,1}$ are computed iteratively in an alternating manner. It should be noted that the condition $\min J_1 \le \epsilon$ needs to be met for a very small threshold $\epsilon \ge 0$ for $\bar{s}_1$ to be a sparse common basis vector among the trials. In order to avoid repeating the sparse common basis vectors, we need to update $Q_n$. Here, the property $Z_{n,C}^T Z_{n,C} = I$ is also verifiable as in COBE, (13). Therefore, $Q_n$ is updated through (14).

[Figure: The proposed SCFA-based (or CFA-based) model for IED detection. $\mathcal{X}$ includes the IED segments ($N$) from the training set only. $\bar{S}$ denotes the sparse common basis vectors (or common basis vectors) extracted by applying the SCOBE (or COBE) algorithm to $\mathcal{X}$. $X_k$ can be an IED or non-IED segment from the training and test datasets and $P_k$ represents the same segment after projection. The notation '$\odot$' denotes the Khatri-Rao product.]
Finally, after computing $Q_n^{(k+1)}$, the new sparse common basis vector is obtained by solving the following objective function:
$$J_{k+1} = \sum_{n \in \mathcal{N}} \| Q_n^{(k+1)} z_{n,k+1} - D a_{k+1} \|_2^2 \quad \text{s.t.} \quad \|a_{k+1}\|_0 \le T_0, \tag{21}$$
which can be minimized by repeating the procedure used for solving (17). A new vector is accepted as a sparse common basis vector if it makes $J_k$ smaller than a very small threshold ($J_k < \epsilon$). In other words, the number of common or sparse common components is determined by $\epsilon$. Accordingly, $\epsilon$ should be small enough to avoid extracting uncommon factors. The pseudo-code of SCOBE is illustrated in Algorithm 1.
To avoid confusion, it should be noted that the number of non-zero elements of each basis vector, not the number of basis vectors, is sparsified. From (17) and (19), it can be seen that the number of non-zero elements of the vector $a_c$, where $\bar{s}_c = D a_c$, is sparsified.
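The alternation between the $z_{n,1}$ update and the sparse coding step (19) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary `D` is taken as given (K-SVD training is not shown), its columns are assumed to have unit norm, `omp` is a plain greedy orthogonal matching pursuit written for this example, and the names `scobe_first_vector`, `n_nonzero` (playing the role of $T_0$), and `n_iter` are hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit.

    Approximates argmin ||y - D a||_2 subject to ||a||_0 <= n_nonzero,
    assuming the columns of D have unit norm.
    """
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Least-squares refit on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    if support:
        a[support] = coef
    return a


def scobe_first_vector(Qs, D, n_nonzero, n_iter=50, seed=0):
    """Illustrative ALS loop for the first sparse common basis vector.

    Qs : list of matrices Q_n (L, M) with orthonormal columns
    D  : dictionary (L, F), assumed to be already trained
    """
    rng = np.random.default_rng(seed)
    # Sparse random initialisation of a, scaled so that D a has unit norm.
    a = np.zeros(D.shape[1])
    idx = rng.choice(D.shape[1], size=n_nonzero, replace=False)
    a[idx] = rng.standard_normal(n_nonzero)
    a = a / np.linalg.norm(D @ a)
    s = D @ a
    for _ in range(n_iter):
        zs = [Q.T @ s for Q in Qs]               # fix D a, update z_n
        y = sum(Q @ z for Q, z in zip(Qs, zs))   # y = sum_n Q_n z_n
        a_new = omp(D, y, n_nonzero)             # sparse coding of y
        norm = np.linalg.norm(D @ a_new)
        if norm == 0:
            break
        a = a_new / norm                         # unit-norm basis vector
        s = D @ a
    J = sum(np.sum((Q @ (Q.T @ s) - s) ** 2) for Q in Qs)
    return s, a, J
```

The returned residual `J` would then be compared with the threshold to decide whether the vector is accepted; deflation of the `Qs` for further vectors proceeds as in COBE.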
In the proposed SCFA-based method for IED detection, the sparse common basis vectors among the IEDs from the training dataset are extracted using the developed SCOBE algorithm. After obtaining the sparse common basis vectors, the IEDs and non-IEDs from the training and test datasets are projected onto them using the Khatri-Rao product according to the procedure given in (16). The schematic diagram of the CFA- and SCFA-based methods for IED detection is illustrated in FIGURE 2.

A. DATA DESCRIPTION
We analyzed 20-minute EEG recordings from 18 subjects suffering from temporal lobe epilepsy. Informed consent was obtained from all the individual participants included in the study. sEEG and iEEG signals were simultaneously recorded at a sampling frequency of 200 Hz at King's College Hospital London. The sEEG was recorded by using 18 standard silver chloride electrodes placed on the scalp according to the 'Maudsley' electrode placement system, which is essentially similar to the 10-20 system except that the lateral electrodes have lower positions in order to improve recording from the temporal lobes [11], and 2 electrodes placed on the ear lobes. For recording iEEG, 12 intracranial multi-contact FO electrodes, arranged as two bundles of six electrodes, were used. Both sEEG and iEEG signals were recorded with respect to Pz as a common reference, and filtered by a bandpass filter with cutoff frequencies of 0.3 Hz and 70 Hz.

B. IED SCORING
IEDs were scored by an expert epileptologist based on the morphology and spatial distribution of the observed waveforms from the iEEG. In other words, the iEEG recordings are used as ground truth since all scalp-visible and scalp-invisible IEDs are observable in the iEEG signals. The scoring details are described in our previous work [20]. Briefly, each IED is classified into one of the following groups: (I) scalp-invisible IED, (II) scalp-visible IED by considering the concurrent iEEG, and (III) scalp-visible IED without considering the concurrent iEEG. FIGURE 3 shows a sample IED from each group. In FIGURE 3 (b), showing a scalp-invisible IED, there is no sign of spikes or sharp waves over the scalp electrodes, while the FO channels captured the epileptiform discharges. From FIGURE 3 (c), showing a scalp-visible IED by considering the iEEG, we can see that some signatures of IEDs were captured by the scalp channels, but without reference to the iEEG signals we cannot consider these waveforms as IED waveforms. In the scalp-visible IED without considering the concurrent iEEG, shown in FIGURE 3 (d), the IED waveforms are observable over both scalp and FO channels. Note that all three groups of IEDs fall within the same IED class in classification.

C. DATA FILTERING AND SEGMENTATION
In order to increase the SNR and avoid the 50 Hz grid frequency, a bandpass filter with cutoff frequencies of 4 Hz and 48 Hz was applied to the sEEG signals. In addition, the contra-lateral (CL) reference method was used to re-reference the sEEG signals [40]. In CL, the right and left hemisphere electrodes are re-referenced to the right and left earlobe electrodes, respectively. In our work, the "Z" electrodes are re-referenced to the average of the two earlobe electrodes.
For analysis and classification, the length of the IED segments was selected to be 480 ms (96 samples): 160 ms before and 320 ms after the peak positions marked as IED. The non-IED segments, also 480 ms long, were extracted from time intervals in which no scored IED exists, and did not overlap with the IED segments. The number of non-IED segments was the same as the number of IED segments for each subject; the numbers of trials are summarized in TABLE 1. Then, both IED and non-IED segments were linearly detrended to alleviate the undesired drifts. Finally, the scalp segments were normalized using the z-score method to have unit norm per electrode channel.
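The segmentation and normalization steps can be illustrated with the following NumPy sketch. It assumes a recording stored as an array of shape (samples, channels) sampled at 200 Hz, so that 160 ms and 320 ms correspond to 32 and 64 samples; the function names are hypothetical, the linear detrending is done by a per-channel least-squares line fit, and the z-score is implemented in its usual form (zero mean and unit standard deviation per channel), which is an assumption about the exact normalization used.

```python
import numpy as np

FS = 200             # sampling frequency in Hz
PRE, POST = 32, 64   # 160 ms before and 320 ms after the marked peak


def extract_segment(eeg, peak):
    """Cut a 96-sample window around a marked peak; eeg is (samples, channels)."""
    start, stop = peak - PRE, peak + POST
    if start < 0 or stop > eeg.shape[0]:
        raise ValueError("segment extends beyond the recording")
    return eeg[start:stop].astype(float)


def detrend_linear(seg):
    """Remove a least-squares straight line from every channel."""
    t = np.arange(seg.shape[0], dtype=float)
    A = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, seg, rcond=None)
    return seg - A @ coef


def zscore(seg):
    """Zero mean and unit standard deviation per channel."""
    return (seg - seg.mean(axis=0)) / seg.std(axis=0)


def preprocess_segment(eeg, peak):
    """Segment, detrend, and normalise in the order described in the text."""
    return zscore(detrend_linear(extract_segment(eeg, peak)))
```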

D. FEATURE EXTRACTION
We construct $\mathcal{X} \in \mathbb{R}^{96 \times 18 \times N}$ (whose dimensions 96, 18, and $N$ correspond respectively to the time samples, scalp channels, and IED segments from the training set). Common (or sparse common) basis vectors $\bar{S} \in \mathbb{R}^{96 \times C}$, where $C$ is the number of vectors, are extracted by applying COBE (or SCOBE) to $\mathcal{X}$. Then, both IED and non-IED segments from the training and test sets, $X_k \in \mathbb{R}^{96 \times 18}$, are projected onto the extracted vectors using the Khatri-Rao product (16), $P_k = \bar{S}^T \odot X_k^T$. Finally, the kurtosis of each component of the projected IEDs and non-IEDs, $P_k \in \mathbb{R}^{(18C) \times 96}$, is computed as a classification feature.
Kurtosis is a statistical measure of whether the data are heavy-tailed or light-tailed and describes the shape of a distribution. For each component $p$ (here, $18C$ components), the kurtosis can be computed as:
$$\mathrm{Kurt}(p) = \frac{E\{(p - \mu)^4\}}{\sigma^4}, \tag{22}$$
where $\mu$ and $\sigma$ are respectively the component mean and standard deviation. Each scalp IED or non-IED segment consists of $18 \times C$ features.
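A minimal sketch of the kurtosis feature computation is given below. It assumes the projected segment is stored with one component per row, uses the plain (non-excess) fourth standardized moment with the population standard deviation, and the function name `kurtosis_features` is hypothetical; whether the original work subtracts 3 or uses a bias correction is not stated here.

```python
import numpy as np


def kurtosis_features(P):
    """Kurtosis of every row of a projected segment P of shape (18*C, 96).

    Returns one feature per component: E[(p - mu)^4] / sigma^4.
    """
    mu = P.mean(axis=1, keepdims=True)
    sigma = P.std(axis=1)
    fourth_moment = np.mean((P - mu) ** 4, axis=1)
    return fourth_moment / sigma ** 4
```

A row containing a single sharp spike yields a much larger value than a row oscillating evenly around its mean, which is the property the projection is intended to exploit.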

E. COMPETING MODELS
We compare the performance of our proposed models with those claimed by very recent publications in this area.

1) Kurtosis features
We compare our proposed methods with a method in which the kurtosis features (KFs) are extracted by (22) from the raw data after preprocessing. The corresponding method is referred to as KFs method. This method is selected for comparison mainly because here we extract the kurtosis features from the projected segments in our proposed methods. Therefore, it is interesting to see the effects of kurtosis on the feature space and IED detection performance.

2) Time-frequency features
We previously found out that the time-frequency (TF) features are superior to continuous wavelet transform and chirplet transform for this particular application [20]. Therefore, we compare our proposed models with a model based on TF features. Common average reference (CAR) method is employed as re-reference method for artifact rejection, as it is recommended in the paper proposing the TF model for IED detection [20]. In the CAR method, the reference signal is the average over all the electrode signals, which is subtracted from each of them. Then, each scalp electrode is linearly detrended to alleviate the undesired drifts, and normalized to have zero mean and unit norm. Finally, the TF features are obtained by the spectrogram method with a Hanning window of size 80 ms (16 samples) and an overlap of 50% between the windows. A total of 11 windows slide over each segment. The squared magnitudes of short-time Fourier transform are calculated from the spectrogram as TF features. The number of discrete Fourier transform points has been set to 16 (the same as the number of time samples in a window) resulting in 9 frequency features. Finally, each scalp IED or non-IED segment consists of 1782 features (18 scalp channels ×11 temporal ×9 frequency).
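The feature count stated above can be checked with a short NumPy sketch of the spectrogram computation. The function name `tf_features` and the array layout are assumptions; the window length (16 samples), the 50% overlap (hop of 8 samples), and the 16-point transform follow the text.

```python
import numpy as np


def tf_features(segment, win_len=16, hop=8, n_fft=16):
    """Squared-magnitude STFT features of a segment of shape (96, channels).

    Returns an array of shape (channels, windows, frequencies).
    """
    window = np.hanning(win_len)
    n_windows = (segment.shape[0] - win_len) // hop + 1
    # Stack the windowed frames: shape (windows, win_len, channels).
    frames = np.stack([segment[i * hop:i * hop + win_len] * window[:, None]
                       for i in range(n_windows)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    power = np.abs(spectrum) ** 2
    return power.transpose(2, 0, 1)
```

For a 96-sample, 18-channel segment this gives (96 - 16) / 8 + 1 = 11 windows and 16 / 2 + 1 = 9 frequency bins, that is, 18 x 11 x 9 = 1782 features.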

3) Simultaneous multilinear low-rank approximation of tensors
The proposed models are compared with the method called simultaneous multilinear low-rank approximation of tensors (SMLRAT), proposed quite recently by Thanh et al. for EEG epileptic spike detection [29]. The model is summarized as follows. In the SMLRAT model, the authors decomposed epileptic and non-epileptic spikes through the continuous wavelet transform (CWT) and built a three-way tensor for each trial, $\mathcal{Y} \in \mathbb{R}^{H \times L \times M}$ (whose dimensions $H$, $L$, and $M$ correspond respectively to wavelet-scale, time, and channel). They concatenated only the three-way epileptic tensors into a four-way tensor $\mathcal{Y}_{ep} \in \mathbb{R}^{H \times L \times M \times N}$, and then applied nonnegative Tucker decomposition (NTD) to $\mathcal{Y}_{ep}$, as shown below, to obtain the $B_1$, $B_2$, and $B_3$ factors:
$$\mathcal{Y}_{ep} \approx \mathcal{G} \times_1 B_1 \times_2 B_2 \times_3 B_3 \times_4 B_4, \tag{23}$$
where $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3 \times N}$ denotes the core tensor, and $B_1 \in \mathbb{R}^{H \times r_1}$, $B_2 \in \mathbb{R}^{L \times r_2}$, $B_3 \in \mathbb{R}^{M \times r_3}$, and $B_4 \in \mathbb{R}^{N \times N}$ span the parameter spaces respectively representing the wavelet-scale, time, channel, and epileptic spikes. Finally, in order to obtain the feature space of each trial, the epileptic and non-epileptic spikes were projected onto the wavelet-scale, temporal, and spatial factors as follows:
$$\mathcal{F}_i = \mathcal{Y}_i \times_1 B_1^{\dagger} \times_2 B_2^{\dagger} \times_3 B_3^{\dagger}, \tag{24}$$
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose matrix pseudo-inverse.
Here, we decompose the IED and non-IED segments through CWT and construct a three-way tensor for each segment, $\mathcal{Y}_i \in \mathbb{R}^{38 \times 96 \times 18}$. The three-way IED tensors are concatenated into a single four-way tensor, $\mathcal{Y}_{ep} \in \mathbb{R}^{38 \times 96 \times 18 \times N}$. Then, NTD is employed to obtain the factor matrices $B_1 \in \mathbb{R}^{38 \times 10}$, $B_2 \in \mathbb{R}^{96 \times 15}$, and $B_3 \in \mathbb{R}^{18 \times 18}$, and the features $\mathcal{F}_i \in \mathbb{R}^{10 \times 15 \times 18}$. We should note that CL or CAR re-referencing and z-score normalization are not applied to SMLRAT, both because these methods were not employed in the paper proposing SMLRAT and because they deteriorate the spatial components in tensor-based methods.
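The projection of a three-way segment tensor onto the pseudo-inverses of the factor matrices can be sketched with NumPy as follows. The nonnegative Tucker decomposition itself is not shown; the factor matrices are assumed to be available, and the function names `mode_product` and `project_onto_factors` are hypothetical.

```python
import numpy as np


def mode_product(T, A, mode):
    """Mode-n product of a tensor T with a matrix A of shape (J, T.shape[mode])."""
    out = np.tensordot(A, T, axes=(1, mode))  # contracted axis replaced by J, in front
    return np.moveaxis(out, 0, mode)          # move it back to position `mode`


def project_onto_factors(Y, B1, B2, B3):
    """Project Y (H, L, M) onto the pseudo-inverses of the three factors."""
    F = mode_product(Y, np.linalg.pinv(B1), 0)
    F = mode_product(F, np.linalg.pinv(B2), 1)
    F = mode_product(F, np.linalg.pinv(B3), 2)
    return F
```

With factors of sizes 38 x 10, 96 x 15, and 18 x 18, a 38 x 96 x 18 segment tensor is mapped to a 10 x 15 x 18 feature tensor, in agreement with the dimensions given above.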

F. FEATURE SELECTION
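We utilized the Fisher score algorithm for finding significant features. The Fisher score of the $f$-th feature is defined as follows:
$$F(f) = \frac{\sum_{c} n_c \left( \mu_{fc} - \mu_f \right)^2}{\sum_{c} n_c \, \rho_{fc}}, \tag{25}$$
where $\mu_{fc}$ and $\rho_{fc}$ are respectively the mean and the variance of the $f$-th feature in the $c$-th class, $n_c$ is the number of instances in the $c$-th class, and $\mu_f$ is the mean of the $f$-th feature.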
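As an illustration, the Fisher score can be computed per feature with a few lines of NumPy. The sketch assumes a feature matrix with one row per segment and uses the population variance within each class; the function names `fisher_score` and `top_k_features` are hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Fisher score of every column of X (samples, features) given labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / within


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    return np.argsort(fisher_score(X, y))[::-1][:k]
```

A feature whose class means are far apart relative to its within-class variance receives a high score, while a feature distributed identically in both classes scores zero.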

G. CLASSIFICATION AND CROSS-VALIDATION
In order to classify the IED and non-IED segments, we employed three different classifiers, namely SVM, DLDA, and NB. SVM and NB are popular classifiers for biomedical data analysis [41], [42] particularly for epileptic seizure prediction [23], [43]. We use linear SVM in all methods, which outperformed other kernel-SVMs in our experiment. In addition, DLDA is superior to LDA for our dataset. Therefore, DLDA is employed.
Classification of the IEDs and non-IEDs is performed in two approaches, namely within- and between-subject classification approaches. In the within-subject classification approach, an individual classifier is trained for each subject and k-fold (k=5) cross-validation is employed to validate the models. Increasing the number of folds did not change the outcome. In this approach, subjects 13, 14, and 16 are excluded from classification because they have too few trials; thus, the results of 15 subjects are reported. In the between-subject classification approach, leave-one-subject-out cross-validation is employed to validate the models. In other words, one subject is used as the test data and the other 17 subjects are employed to train a classifier. This is repeated for all subjects. Accuracy (ACC), sensitivity (SEN), specificity (SPEC), and F1 score (F1-S) are obtained as the evaluation criteria as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{26}$$
$$\mathrm{SEN} = \frac{TP}{TP + FN}, \tag{27}$$
$$\mathrm{SPEC} = \frac{TN}{TN + FP}, \tag{28}$$
$$\mathrm{F1\text{-}S} = \frac{2\,TP}{2\,TP + FP + FN}, \tag{29}$$
where TP and TN are respectively the number of IED and non-IED samples classified correctly in their classes, FP is the number of non-IED samples recognized incorrectly as IED samples, and FN is the number of IED samples categorized wrongly in the non-IED class. Accuracy indicates the percentage of IED and non-IED samples that are classified correctly. Sensitivity and specificity respectively indicate the performance of a classifier in correctly detecting the IED and non-IED samples.
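The evaluation criteria and the leave-one-subject-out split can be sketched as follows. The label convention (1 for IED, 0 for non-IED) and the function names `detection_metrics` and `leave_one_subject_out` are assumptions made for this example; the classifier itself is not shown.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """ACC, SEN, SPEC, and F1 for binary labels (1 = IED, 0 = non-IED)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),
        "SPEC": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, train indices, test indices) for each subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == subject)
        train_idx = np.flatnonzero(subject_ids != subject)
        yield subject, train_idx, test_idx
```

In a between-subject run, the metrics would be computed once per held-out subject and then averaged, together with the standard error across subjects.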

IV. RESULTS
The obtained results are presented in three sections. In Section IV-A, the extracted common components and the impact of their projection on the IED segments are investigated.
The results of the within- and between-subject classification approaches (performance ± standard error (SE)) are respectively reported in Sections IV-B and IV-C. DLDA, SVM, and NB classifiers were employed for classification. We made use of the first 36 significant features according to Fisher scores in CFA and SCFA, 18 significant features in KFs, 100 significant features in SMLRAT, and 200 significant features in TF. Those numbers of features gave the highest accuracy in their models.

A. COMPONENTS AND PROJECTION
The first common basis vector and sparse common basis vector extracted respectively using COBE and SCOBE are illustrated in FIGURE 4. The sparse common basis vector is not only sharper but also has a higher amplitude than the common basis vector. Furthermore, the sparse common basis vector does not fluctuate as much as the common basis vector does. FIGURE 5 shows a non-IED and an IED segment before and after projection onto a common basis vector and a sparse common basis vector. Although slow waves appear over a few scalp channels after projecting the non-IED segment onto the first common and sparse common basis vectors (FIGURE 5 (b) and (c), respectively), strong spikes and sharp waves appear over all scalp channels after projecting the IED segment onto the first common and sparse common basis vectors (FIGURE 5 (e) and (f), respectively). Before projecting the IED, FIGURE 5 (d), the IED waveforms are observable only over channels P3 and P4. After projecting the IED segment onto the common basis vector obtained using COBE, FIGURE 5 (e), IED waveforms in the shape of spikes and sharp waves are observable over almost all channels. After projecting the IED segment onto the sparse common basis vector, FIGURE 5 (f), the IED waveforms become sharper.

B. IED DETECTION BASED ON WITHIN-SUBJECT CLASSIFICATION APPROACH
Both scalp-visible and scalp-invisible IEDs scored by an expert clinician based on the iEEG recordings were detected from the sEEG recordings. The obtained IED detection results based on within-subject classification approach are illustrated in TABLE 2.
Using DLDA, SCFA outperforms the other methods and provides the best performance with 74.2% accuracy, 63.7% sensitivity, 84.6% specificity, and an F1-score of 0.70. CFA presents the best sensitivity value of 64.1%, which is approximately 4%, 18%, and 7% more than the sensitivity values of TF, SMLRAT, and KFs, respectively. In SMLRAT, the DLDA classifier is biased towards the non-IED class, meaning that most segments are recognized as non-IED segments.
Using SVM, the best accuracy, 74.3%, is obtained with SCFA. The SCFA model also outperforms the other methods in terms of SEN and F1-S. In terms of SPEC, KFs provides the best value. TF is the worst method in all criteria except SEN.
Using NB, SCFA achieves the best accuracy of 75.1%, specificity of 84.7%, and F1-score of 0.71. The best sensitivity, 68%, is obtained by TF. CFA classifies the IEDs and non-IEDs with 73.6% accuracy, which is higher than the accuracies of TF, SMLRAT, and KFs.

C. IED DETECTION BASED ON BETWEEN-SUBJECT CLASSIFICATION APPROACH
The IED detection results obtained using the between-subject classification approach are shown in TABLE 3. Here, the performance of SMLRAT, KFs, CFA, and SCFA is reported. The TF method is not employed in the between-subject classification approach. The paper proposing the TF method for IED detection [20] detected IEDs using both within- and between-subject classification approaches. However, in their between-subject classification approach, a classifier was trained for each subject using the data of that subject only; then, all trained classifiers were combined into an ensemble to detect the IEDs of a new subject. Since the data of different subjects are not combined in the TF model, it is not employed in this approach.
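The pooled between-subject protocol used here (one subject held out for testing, the data of all remaining subjects combined into a single training set) can be sketched as follows. This is a minimal illustration; the function name `leave_one_subject_out` and the representation of the data as one subject label per segment are assumptions of this sketch.

```python
def leave_one_subject_out(subject_ids):
    """Yield (test_subject, train_idx, test_idx) for every subject.

    subject_ids: sequence with one subject label per segment.
    All segments of the held-out subject form the test set; the segments
    of all other subjects are pooled into the training set.
    """
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx


# Hypothetical example: six segments recorded from three subjects.
folds = list(leave_one_subject_out([1, 1, 2, 3, 3, 3]))
```

Each fold trains a single classifier on the pooled training indices, in contrast to the per-subject ensemble of [20] described above.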
Using DLDA, SCFA achieves the best accuracy of 67.6% which is approximately 10% and 6% more than the accuracy obtained respectively by SMLRAT and KFs. In addition, it presents the best sensitivity and F1-score values. In terms of specificity, KFs achieves the best value of 87.8%.
Using SVM, CFA provides the best accuracy of 67.3%, which is slightly higher than the accuracy of SCFA, 67.1%. In terms of specificity, KFs achieves significantly better performance. However, KFs and SMLRAT are biased towards the non-IED class. While they achieve specificity values of 90.4% and 82%, respectively, they detect IEDs with sensitivity values of only 32.8% and 29%, respectively.
Using NB, SCFA outperforms CFA, KFs, and SMLRAT in all criteria except specificity. It achieves the accuracy of 67.8% which is respectively 1.5%, 3.0%, and 16.6% more than those of CFA, KFs, and SMLRAT. The SMLRAT model provides the worst performance with sensitivity value of 17.5%. Generally, the NB classifier is biased to the non-IED class in all methods.

V. DISCUSSION
In both IED classification approaches, SCFA, a new sparse common feature analysis method, outperforms TF and SMLRAT (which is based on non-negative Tucker decomposition). The major advantage of our proposed models is that they extract the components in a trial-, subject-, and channel-independent manner, which enables the algorithms to effectively capture the background EEG activities and the intracranial biomarkers of epilepsy. Furthermore, SCFA outperforms CFA, although in both of them common components are extracted and used for classification. The only difference between CFA and SCFA is that SCFA exploits common components with sparsity constraints. This shows that our proposed algorithm, SCOBE, is superior to the plain COBE algorithm.
SMLRAT [29] has been reported to have a high performance, where scalp-visible epileptic and non-epileptic spikes were detected with an accuracy as high as 95.8% from scalp recordings. In contrast, here, both scalp-visible and scalp-invisible IEDs are detected, causing a fall in the performance of all methods. SMLRAT performs significantly better in the within-subject approach than in the between-subject approach. Its best accuracy in the within-subject classification approach is 72.9%, whereas it is 57.1% in the between-subject classification approach. SMLRAT is based on spectral, temporal, and spatial components. In our dataset, the locations of IED sources differ among subjects: IEDs may originate from the right, left, or both temporal lobes of each subject. Consequently, when the data of different subjects are combined, the spatial components not only become meaningless but also deteriorate the performance of a classifier.
In many studies, IEDs were detected with high accuracy [44], [45], whereas the performance of both the proposed and the compared methods is lower in our study. However, only scalp-visible IEDs were included in their datasets, while a large proportion of IEDs is not observable over the scalp. Therefore, those studies missed a large proportion of IEDs by design. Our dataset consists of both scalp-visible and scalp-invisible IEDs. The iEEG and sEEG were recorded simultaneously. The iEEG recordings were used as the ground truth for scoring IEDs, and the IEDs are detected from the concurrent sEEG only. The significance of this setup is that concurrent iEEG and sEEG need to be recorded only from the training subjects. After a model is trained, it can be employed to detect scalp-visible and scalp-invisible IEDs of a new subject from the scalp recordings alone. The proposed SCFA and CFA methods detect the IEDs of new subjects with accuracy values of 67.8% and 67.3%, respectively. This can bring a huge benefit to clinicians in monitoring epilepsy.
There is no limitation with respect to the data we have. However, the IED morphologies are different and also change with age [46]. Therefore, to generalize the application, we should have access to the data from a wider age range. As part of our future study we intend to include the IED shape diversity into a higher dimensional tensor decomposition approach.

VI. CONCLUSION
Automated detection of as many IEDs as possible from over the scalp is of paramount importance for epilepsy diagnosis and management. This is because the majority of IEDs are invisible on the scalp. Therefore, when observing IEDs in sEEG signals, we often miss a large proportion of them. To overcome this deficiency, we effectively use a limited set of concurrent iEEG-sEEG recordings to design an algorithm which can be applied to the sEEG only. In this work, we adopt the COBE algorithm proposed in [30] to extract the common components among the IEDs, and then extend it to exploit the common features with sparsity constraints; the extended algorithm is called SCOBE. We propose two models, namely CFA and SCFA, based on COBE and SCOBE, respectively. We show that, by employing the proposed models, scalp-invisible IEDs become detectable from the sEEG signals. We employed SVM, DLDA, and NB for classification, and compared our proposed methods with two benchmark models, i.e., TF [20] and SMLRAT [29]. IEDs were detected using two different classification approaches, namely the within- and between-subject classification approaches. The SCFA model outperforms the other methods in both approaches and achieves the best accuracy values of 75.1% and 67.8%, respectively, using the NB classifier. These findings show that common component analysis can be very effective in capturing IED signatures, and that exploiting the common components among IED segments with sparsity constraints is superior to exploiting them without any constraint.

APPENDIX A OMP ALGORITHM
The OMP algorithm is an iterative algorithm that finds the sparse vector $a_1$ element by element. At each step, the atom $d_f$ (the $f$th column of $D$) with the highest correlation to the current residue, denoted by $r$, is selected: $\hat{f} := \arg\max_f |d_f^T r|$. Once the atom is selected, the signal is orthogonally projected onto the span of the selected atoms $I$: $(a_1)_I := D_I^{\dagger} y$. After recalculating the residue, $r = y - D_I (a_1)_I$, the procedure repeats until a stopping condition is met. For implementation of the OMP algorithm, we utilized the Matlab toolbox provided by Rubinstein et al. [35].
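The steps above can be sketched in a few lines of NumPy. This is a plain textbook OMP using a least-squares solve for the orthogonal projection, not the Matlab toolbox of [35]; the function name `omp`, the stopping rule (a fixed number of non-zero coefficients or a negligible residue), and the assumption that the columns of D have unit norm are choices made for this illustration.

```python
import numpy as np


def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit (illustrative sketch).

    D: dictionary of shape (L, F) whose columns are assumed unit-norm.
    y: signal of shape (L,).
    n_nonzero: maximum number of atoms to select.
    Returns the sparse coefficient vector (F,) and the selected atom indices.
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    coef = np.zeros(D.shape[1])
    support = []                 # index set I of selected atoms
    sol = np.zeros(0)
    residual = y.copy()          # current residue r
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Orthogonal projection of y onto the span of the selected atoms.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef, support
```

For instance, with an identity dictionary and `y = [0, 3, 0, -2]`, two iterations select atoms 1 and 3 in that order and recover `y` exactly.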

APPENDIX B K-SVD ALGORITHM
The K-SVD algorithm trains a dictionary for sparse approximation through singular value decomposition (SVD) [38]. The goal of the algorithm is to iteratively learn a dictionary that achieves the sparsest representations of the signals in $\Psi \in \mathbb{R}^{L \times T}$ by optimizing a constrained objective function over $D$ and $\Gamma$, where $\Gamma \in \mathbb{R}^{F \times T}$ is the sparse representation matrix of the signals $\Psi$ using the dictionary $D \in \mathbb{R}^{L \times F}$. It is important to note that $T \gg F \gg L$ and that the columns of $D$ need to be normalized. At first, the columns of $D$ are selected randomly from $\Psi$. Then, a sparse approximation algorithm, here the OMP algorithm, is utilized to compute the sparse representation vector $\gamma_i$ for each example $\psi_i$. To update an atom $d_f$, the group of examples using this atom, $\phi_f = \{i \mid 1 \le i \le T,\ \gamma_T^f(i) \neq 0\}$, where $\gamma_T^f$ is the $f$th row in $\Gamma$, is first defined, and the overall error matrix, $E_f$,

BAHMAN ABDI-SARGEZEH received the M.Sc. degree in digital electronic engineering from Iran University of Science and Technology, Tehran, Iran, in 2018. He is currently pursuing the Ph.D. degree in biomedical signal processing at Nottingham Trent University, Nottingham, UK. His research interests include the application of multiway analysis (e.g., tensor factorization and subspace analysis) and deep neural networks in EEG.
ANTONIO VALENTIN is Lecturer in Epilepsy in the Department of Clinical Neuroscience, King's College London, and Honorary Clinical Fellow in Clinical Neurophysiology at King's College Hospital. He graduated in Medicine at the University Complutense, Madrid, Spain, and was accredited as University Specialist in Informatics Related with Health by the University Complutense. His main research interest is in the diagnosis and treatment of epilepsy, mainly with brain stimulation techniques including single pulse electrical stimulation (SPES), transcranial magnetic stimulation (TMS), and deep brain stimulation for epilepsy and for severe dystonia in children. He co-edited the book "Introduction to Epilepsy" and is the co-director of the International League Against Epilepsy (ILAE) VIREPA EEG Advanced course.
GONZALO ALARCON graduated in Medicine from Madrid, Spain, in 1984 and received his PhD from the Universidad Complutense. In 1987, he obtained a Fleming Award to pursue his research on spinal processing of pain at the University of Bristol. He later specialised in clinical neurophysiology in the UK. Over 25 years at King's College London and later on, his focus has been on EEG, epilepsy, telemetry, intracranial recordings, and intra-operative electrocorticography, and on running an MSc in epileptology. He has published two books and over 250 peer-reviewed papers. Currently, he is with the University of Manchester, UK.
SAEID SANEI, SM'05, FBCS, received his PhD from Imperial College London, UK. His current research focus is on the application of adaptive and nonlinear signal processing, subspace analysis, and tensor factorization to EEG, speech, and medical images. He has published five monographs, several book chapters, and over 400 papers in peer-reviewed journals and conference proceedings. He has served as an Associate Editor for the IEEE Signal Processing Letters, IEEE Signal Processing Magazine, and Journal of Computational Intelligence and Neuroscience. Currently, he is with Nottingham Trent University, UK, and is a Visiting Academic in digital health at Imperial College London, UK.