Multivariate Analysis of Bivariate Phase-Amplitude Coupling in EEG Data Using Tensor Robust PCA

Cross-frequency coupling is emerging as a crucial mechanism that coordinates the integration of spectrally and spatially distributed neuronal oscillations. Recently, phase-amplitude coupling, a form of cross-frequency coupling, where the phase of a slow oscillation modulates the amplitude of a fast oscillation, has gained attention. Existing phase-amplitude coupling measures are mostly confined to either coupling within a region or between pairs of brain regions. Given the availability of multi-channel electroencephalography recordings, a multivariate analysis of phase amplitude coupling is needed to accurately quantify the coupling across multiple frequencies and brain regions. In the present work, we propose a tensor based approach, i.e., higher order robust principal component analysis, to identify response-evoked phase-amplitude coupling across multiple frequency bands and brain regions. Our experiments on both simulated and electroencephalography data demonstrate that the proposed multivariate phase-amplitude coupling method can capture the spatial and spectral dynamics of phase-amplitude coupling more accurately compared to existing methods. Accordingly, we posit that the proposed higher order robust principal component analysis based approach filters out the background phase-amplitude coupling activity and predominantly captures the event-related phase-amplitude coupling dynamics to provide insight into the spatially distributed brain networks across different frequency bands.


I. INTRODUCTION
NEURONAL oscillations and the interactions between them are believed to play a key role in understanding the cognitive function of the brain [1], [2]. The interaction and coordination between oscillations at different frequencies is defined as cross-frequency coupling (CFC) [3], [4]. CFC plays a fundamental role in large scale neuronal encoding and communication by providing temporal and spatial dynamics necessary for routing information through brain networks [3], [5]. CFC is an umbrella term to define a variety of phenomena like phase-phase coupling (PPC), amplitude-amplitude coupling (AAC), and phase-amplitude coupling (PAC).
One of the most studied forms of CFC is PAC, which quantifies the interplay between the phase of a slower oscillation and the envelope of a faster oscillation [4]. PAC has been reported to play a critical role in the execution of cognitive functions [6], attention selection [7], long-term memory processing [8], and sensory signal detection [9]. Although a variety of metrics have been developed to quantify PAC, most of these measures are bivariate in nature, i.e., they quantify local PAC observed between different frequencies of the same signal (within-channel PAC) or two different signals (cross-channel PAC). With the availability of multi-channel EEG data, there is a growing need for methods that can quantify neuronal couplings across the whole brain and across different frequency bands.
Recently, several methods have been introduced to quantify multivariate phase amplitude coupling [10]- [12]. The existing methods are based on matrix factorization and as a result, they model multivariate PAC for a pre-defined pair of frequencies rather than capturing the variation of PAC across space and frequency, simultaneously. For example, phase coupling estimation (PCE) is limited to quantifying CFC between one high frequency and N low frequency signals. As such, PCE cannot capture the whole brain CFC dynamics. [11]. Generalized eigendecomposition (gedCFC), on the other hand, is based on generalized eigendecomposition of multi-channel covariance matrices [10]. This is a hypothesis-driven framework and requires a priori knowledge about the low and high frequency bands that are coupled with each other. Thus, even though it is a multivariate PAC analysis method, it is limited to two pre-determined frequency bands. In recent work [12], we have introduced a tensor based framework using PARAFAC decomposition to identify multivariate PAC. While this method can capture the multi-way nature of PAC across channels and frequency bands, it is highly dependent on model order selection. Moreover, it identifies the phase and amplitude providing channels separately rather than directly determining the coupled channel pairs. Thus, this method requires additional significance testing of all possible combinations of phase and amplitude providing channels identified through the outer product of the extracted factors.
In this paper, we first compute bivariate PAC values across all phase and amplitude providing channels, frequency bands, and subjects and then represent these pairwise PAC values as a multi-way tensor. We propose a Higher order Robust Principal Component Analysis (HoRPCA) based multivariate framework to capture the spatial and spectral variation of the event-related PAC. HoRPCA decomposes an input tensor into low-rank and spare parts and has been widely used in a variety of outlier detection applications such as distinguishing sparse event-related EEG from the background non-task related brain activity [13]- [15]. It has been previously reported that different behavioral tasks evoke distinct patterns of PAC across the cortex [16]. For example, enhanced PAC in the temporal cortex region is correlated with working memory maintenance tasks [17] whereas an increase in PAC in the medial frontal cortex is related to error processing tasks [18]. During an event-related study, as all subjects perform the same task and respond to the same stimulus, we assume a relatively similar PAC network structure exists for all subjects. A similar assumption was previously made about within frequency synchronization networks [15], [19]. Based on this assumption for event-related activity, HoRPCA based multivariate PAC approach captures the background PAC network in the low-rank part of the tensor while the sparse tensor captures the PAC activity associated with the event of interest. We evaluate the performance of HoRPCA based multivariate PAC approach on synthetic EEG data with varying signal parameters such as noise, subject variability, and volume conduction and compare it with existing multivariate PAC approaches. Finally, we apply the proposed method to EEG recordings collected during a cognitive control study to identify the spatial and spectral dynamics of pairwise PAC associated with different response types and to differentiate between the response types using multivariate analysis of PAC networks.

RID-
Rihaczek time-frequency distribution of a signal x(t) is defined as 1 [20]: (1) where exp(− θτ 2 σ ) corresponds to the Choi-Williams kernel, and exp( j θτ 2 ) corresponds to the kernel function of the Rihaczek distribution [21], A(θ, τ) refers to the ambiguity function of x(t) and is defined as: This is a complex-valued distribution that can be employed to extract the amplitude and phase components of a given signal. In this paper, this time-frequency distribution will be used to extract the phase of the low frequency and amplitude of the high frequency oscillations to compute PAC as detailed in Section III-A.

B. Tensor Notation
A multidimensional array with N modes given as ∈ ℝ where x i 1 ,i 2 ,..i N denotes the (i 1 , i 2 ,..i N )th element. Column vectors obtained by fixing all indices of the tensor except the one that corresponds to nth mode are called mode-n fibers.
The process of rearranging the tensor elements into a matrix is defined as unfolding or matricization. The mode-n unfolding of the tensor , which is denoted by X (n) , can be obtained by arranging the mode-n fibers as the columns of the resulting matrix. The vectorization of is defined as vec( ).

A. Time-Frequency PAC (t-f PAC)
PAC for a given signal is defined as the modulation between the amplitude A f a (t) of the high frequency (f a ) with the phase φ f p (t) of the low frequency (f p ) oscillations. The first step in computing PAC between two neuronal oscillations, x(t) and y(t) (where the amplitude of x(t) is coupled with the phase of y(t)) is to compute the RID-Rihaczek distributions C x (t, f) and C y (t, f) following (1). The amplitude component of x(t) at the desired frequency f a is extracted from the frequency constrained time marginal of C x (t, f) as follows [22], [23]: where f a 1 and f a 2 refer to the bandwidth around the high frequency of interest, f a . The phase component of y(t) at the desired frequency f p is computed from the complex RID-Rihaczek distribution C y (t, f) as follows [22], [23]: After quantifying the amplitude and phase components, bivariate PAC between x(t) and y(t) can be computed using the amplitude normalized modulation index (MI) proposed by Özkurt and Schnitzler [24] as follows: where K is the number of trials, A x, k f a t is the extracted amplitude component from x(t) at high frequency f a for the kth trial and ϕ y, k f p t is the extracted phase component at low frequency f p from y(t) for the kth trial. This metric quantifies PAC as a function of time, low frequency (f p ) and high frequency (f a ) and is between 0 and 1. As time-frequency based phase and amplitude component estimates were used to compute MI x, y (f p , f a , t), for the remainder of the paper, this PAC index will be referred to as t-f PAC to differentiate it from conventional PAC metrics. MATLAB implementation of this metric is given in [25].

B. Statistical Significance Testing
The significance of pairwise PAC values can be determined by surrogate data testing. The surrogate data is generated by following the block swapping procedure suggested in [26]. In this approach, a surrogate amplitude time series was generated by splitting the amplitude time series at a random point and swapping the two obtained time series. This surrogate amplitude time series was then associated with the original phase time series to compute PAC values. This procedure was repeated 100 times to generate 100 random time averaged surrogate PAC values for each frequency pair. Using these values, a threshold value, Th i,j (f p , f a ) is obtained at the 95% confidence interval for each low frequency (f p ) -high frequency (f a ) combination between phase channel i and amplitude channel j. Time averaged PAC values that surpass this threshold value are considered significant for that low-high frequency pair and used for constructing the PAC network.
We also obtain a threshold Th i, j F p , F a , for all 1 ≤ i, j ≤ N channel pairs and low and high frequency band pairs by averaging all the threshold values within those frequency bands, where F p is the low frequency band of interest, F a is the high frequency band of interest, |F p | is the number of frequencies in the low frequency band F p and |F a | is the number of frequencies in the high frequency band F a . In this paper, we focus on four different low frequency bands (F p s), delta (1-3 Hz), theta (4-7 Hz), alpha (8)(9)(10)(11)(12), and beta (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) Hz) that may modulate the amplitude of the high frequency (F a ) band gamma (31-100 Hz). After constructing these adjacency matrices for all frequency bands of interest, the goal of multivariate PAC analysis is to detect the spatial and spectral localization of the coupled channel pairs.

C. PAC Tensor Construction Across Frequency Bands and Subjects
Given A F p ,F a , a 4-way tensor ∈ ℝ N × N × M × S is constructed where the first mode corresponds to N phase providing channels, second mode corresponds to N amplitude providing channels, third mode corresponds to the M different low-high frequency band pairs and the fourth mode of the tensor is included to capture the variation of PAC values across S subjects.

D. PARAFAC-Based Multivariate PAC
In this paper, we compare our proposed approach with a previously proposed PARAFAC based multivariate analysis. PARAFAC is a generalization of PCA for multi-way data and has been commonly used to extract information from multi-channel EEG data [27]- [29]. For PARAFAC based multivariate PAC, following [12], a 4-way PARAFAC decomposition is used to express in terms of its factors across each mode. As is non-negative, a nonnegativity constraint was imposed on the PARAFAC factors as follows: i jkl = r 1 R λ r a ir b jr c kr d lr (6) where, a ir ≥ 0, b jr ≥ 0, c kr ≥ 0 and d lr ≤ 0 are the elements of the loading vectors across each mode. The first mode factors a r ∈ ℝ N × 1 provide the spatial loading for the different phaseproviding channels whereas b r ∈ ℝ N × 1 provide the spatial loading for the different amplitude-providing channels. The third mode factors, c r ∈ ℝ K × 1 correspond to the low frequency bands that modulate the high frequency band. The factors for the fourth mode d r ∈ ℝ S × 1 contain the signature for each subject. As described in [12], the peaks in the factors of the first two modes allow us to determine the spatial locations of the phase-and amplitude-providing channels while the peaks in the factors of third mode provide the coupled low-high frequency band pairs. In this analysis, the rank, R was determined using DIFFerence in FIT (DIFFIT) method proposed in [30] as it was reported to be more suitable for offline analysis [31].

E. HoRPCA-Based Multivariate PAC
For matrix or two-way data, the limitations of PCA associated with outliers and non-Gaussian errors have been addressed using robust PCA (RPCA) [32]. In this method, a given matrix is decomposed into a low-rank and a sparse matrix. The extension of RPCA to higher-order data, i.e., tensors, has been proposed recently by Goldfarb and Qin [33]. In HoRPCA, the nuclear rank of a matrix is altered by the Tucker rank (Trank) of a tensor. To generalize RPCA to tensors, Trank and l 0 norms are replaced with their convex surrogates CTrank and l 1 norm. Given a tensor, , the corresponding optimization problem is given as: Munia  min ℒ, CTrank ℒ + λ 1 subject to ℒ + = , (7) where ℒ is the low-rank part of the tensor and is the sparse part of the tensor and 1 : = vec 1 .
In this paper, we propose to employ HoRPCA [34]- [36] to capture the dynamics of multivariate PAC. HoRPCA is applied to to obtain the estimates of low-rank component ℒ and sparse component . The low-rank tensor ℒ accounts for the background or taskindependent PAC network whereas the sparse component represents the task-relevant PAC network.
For computing HoRPCA, we considered the Singleton model proposed in [33], which evaluates the nuclear norm of the tensor as the sum of the nuclear norms of the mode-n unfolding of the tensor, so CTrank L : = i 1 4 L i . HoRPCA with the Singleton lowrank tensor model can be written as the following convex optimization problem: The low-rank component ℒ and sparse component can be extracted from by solving (8) using an alternating direction augmented Lagrangian (ADAL) method [33]. To apply HoRPCA algorithm, we need to tune the regularization parameter λ, which controls the trade-off between the sparse and low-rank parts of the tensor . If λ is small, more data points will be allowed to move from ℒ to , and as λ increases fewer data points will be considered anomalous. For this study, following [33], λ was set as λ ≡ 1

I max
where, For multivariate PAC detection, we are interested in the sparse component , as the nonzero elements of this tensor will provide us the phase and amplitude providing channels. Each row of : , : , m, l will provide the phase providing channels while each column of : , : , m, l will provide the amplitude providing channels for the mth frequency band and lth subject.

IV. DESCRIPTION OF DATASETS AND EXPERIMENTS
The proposed multivariate PAC approach was validated first on multiple synthesized data sets, and then on EEG data collected during a cognitive control study.

A. Synthesized Multi-Channel Data
To generate synthesized EEG data, time series data were created at 2, 004 dipole locations in the brain. These dipole locations were based on the standard MRI brain. EEG data were created by considering the fact that the spectrum of EEG data follow the power law, i.e., higher the frequency, weaker the amplitude ). The rate of decay of the amplitude is defined by the parameter β with β = 0, 1, and 2 indicating white noise, pink noise, and Brownian (red) noise, respectively [37]. The frequency spectrum of the human brain has been reported to be similar to Brownian noise [38], [39], hence EEG data was generated from Brownian noise. Time series data from Brownian noise was generated at each of the 2, 004 uncorrelated dipole locations in the brain at a sampling frequency of F s Hz. Cross-voxel correlations were imposed across all the dipoles by generating a random dipole-to-dipole correlation matrix. The new data matrix was constructed as Y = X T V D 1/2 , where V is the eigenvector, D is the eigenvalue, and X is the data matrix. To project each dipole location to the scalp EEG locations, the forward model was generated by following the openmeeg [40] algorithm implemented in Brainstorm [41].
To introduce phase-amplitude coupling in the simulated data, first the dipole locations corresponding to the high frequency amplitude component and low frequency phase component were chosen. Brownian noise signal x B (t) was generated with a sampling frequency of F s Hz. Low frequency phase signal x f p (t) was generated by bandpass filtering x B (t) at the phase providing frequency f p with a bandwidth of 2 Hz (f p ± 1 Hz). To create the amplitude signal, x B (t) was bandpass filtered at the amplitude providing frequency f a with a bandwidth of 10 Hz (f a ± 5 Hz) to ensure that the high frequency activity is broadband. PAC was generated using the procedure described by Kramer and Eden [42]. The time locations of relative maxima of the phase signal x f p (t) were detected. At each maxima, a DC shifted Hanning window with a duration of 42 ms, was multiplied with the amplitude time series.
The amplitude signal x f a (t) with monophasic coupling was generated at high frequency f a by multiplying the Hanning window with the filtered amplitude time series centered at the relative maxima (peaks) of the phase signal x f p (t). The coupling intensity is controlled by multiplying the Hanning window itself with a multiplier I, where I = 1 indicates full coupling and I = 0 indicates no coupling. Finally, the time series of the chosen dipole locations were replaced with x f p (t) for the low frequency and x f a (t) for the high frequency signal. The whole procedure was repeated d times to generate d distinctly coupled neuronal oscillations between 2d dipole locations (d phase providing dipoles and d amplitude providing dipoles).

B. EEG Data
The EEG dataset used to assess the proposed PAC measure was collected during a cognitive control-related error processing study. The experiment conducted was a letter version of the speeded-reaction Flanker task [43]. The study was designed following the experimental protocol and guidelines approved by the Institutional Review Board (IRB) of the Michigan State University (IRB: LEGACY13-144). The data acquisition was performed following the guidelines and regulations established by this protocol. Written and informed consent was collected from each participant before data collection. A string of five letters, which could Munia  be congruent (e.g., SSSSS) or incongruent (e.g., SSTSS), was exhibited in front of each participant for each trial. The participants have to select the center letter with a standard mouse with respect to the Flanker letters. The trials began with a 35ms of flanking stimuli (e.g., SS SS). Then, the target stimuli were embedded in the center of the flankers (e.g., SSSSS/SSTSS) and remained for 100 ms (total presentation time is 135ms) followed by an inter-trial interval of varying duration ranging from 1200 to 1700 ms.
Continuous EEG responses were recorded using the ActiveTwo system (BioSemi, Amsterdam, The Netherlands) by placing the 64 Ag-AgCl channels following the international 10/20 system. All EEG signals were sampled at 512 Hz. The trials with artifacts were removed, and volume conduction was minimized using the Current Source Density (CSD) Toolbox [44]. The artifact removed data were considered for analysis in this paper. The trials corresponding to the error and correct responses were separated from each other for analysis. Each trial was one second long. A total of 20 participants were considered for the ongoing study. The inclusion criteria were that the number of trials for error response should be at least 20 or higher. Since the total number of correct/error response trials varies for different participants, the number of trials considered for both responses were kept the same for a fair comparison. The correct response trials were randomly sub-sampled from the whole set of correct responses and the whole procedure was repeated 10 times to select the correct response trials.

C. Synthesized Data Experiments
The proposed HoRPCA based multivariate PAC method is first tested on the synthesized data set described in Section IV-A. Experiments were conducted to evaluate the robustness of the proposed method against spurious coupling due to noise, subject variability and linear mixing with respect to existing approaches. In these experiments, HoRPCA based method was compared to three other methods, i.e., simple averaging, matrix factorization based gedCFC [10] and PARAFAC based method [12]. For simple averaging method, we first average the delta-gamma, theta-gamma, alpha-gamma and beta-gamma PAC networks for each subject. We also average the thresholds for delta-gamma, theta-gamma, alpha-gamma and beta-gamma networks obtained from the statistical significance testing described in Section III-B for each subject. To detect the coupled channels, we compare the averaged PAC network with the thresholds for each electrode pair. PAC values for gedCFC were computed following the codes provided in [10]. As gedCFC computes PAC between two pre-defined low and high-frequency bands, we compute gedCFC based PAC individually for each frequency band pair combination. PARAFAC based analysis is conducted as described in section III-D. For both gedCFC and PARAFAC based methods, to differentiate the coupled channel pairs from background PAC values, the outputs of these two methods are compared to the average threshold PAC value Th i, j F p , F a for that channel pair obtained through the statistical significance testing described in Section III-B.
In this section, we conducted two experiments. Experiment 1 focuses on whether the methods can differentiate true coupling from no coupling whereas experiment 2 focuses on the accuracy of the detected channel pairs. The performance evaluation metrics and the description of the experiments are as follows.
Performance evaluation metrics: The performance metrics are:

Precision =
T P T P + F P ,

1) Experiment 1: Evaluation of Detection Power of Different Approaches:
In this experiment, we are interested in quantifying the ability of the different methods to differentiate between multi-channel data with and without PAC for different noise levels. This simulation is conducted to ensure that our method is robust against external noise, e.g., measurement or biological noise, and is able to differentiate true coupling from uncoupled oscillatory activity. Two multi-channel EEG datasets were generated. First dataset was generated with theta-gamma coupling (I = 0.7) between any two randomly selected dipole locations. Signals for amplitude and phase providing dipoles were generated following the procedure described in Section IV-A. For all other dipoles, the signals were time series data generated from Brownian noise. This procedure was repeated 20 times to generate a dataset with 20 subjects with different dipole locations and this dataset was referred to as 'dataset with coupling'. For the second dataset, low and high frequency signals were generated at the same dipole locations as the first dataset but the coupling intensity was set to zero (I = 0) to generate no-coupling condition. This procedure was also repeated 20 times to generate a dataset with 20 subjects and this dataset was referred to as 'dataset without coupling'. The performance of each approach was determined by evaluating whether each method detected any coupled pairs for each subject or not. Based on this evaluation, detection accuracy in terms of precision, recall, F-measure and G-mean is quantified across subjects for varying signal to noise ratio (SNR) levels (−6dB to 6dB). We repeat the whole experiment 20 times to quantify the variability of each method.

2) Experiment 2: Performance Evaluation With Varying Signal Parameters:
We next conduct experiments assessing the accuracy of the detected coupled channel pairs with respect to various signal parameters, including noise, variability in dipole locations and number of coupled channel pairs. As gedCFC cannot detect multiple channel pairs across multiple frequencies, it was not included in these comparisons. For evaluating the performance of the remaining three methods (HoRPCA, PARAFAC and averaging), for a fair comparison, the number of channel pairs detected by each method was fixed as the actual number of coupled pairs in the data.

Robustness to noise:
To evaluate the robustness of the proposed method, we generated synthesized data with additive white Gaussian noise of varying SNR from −6dB to 6dB added to all dipole locations. Assessing the robustness to noise is important as real EEG signals are not simple linear combinations of dipole sources and noise can be generated by both neural and external measurement systems. Eight coupled pairs (5 theta-gamma and 3 alpha-gamma) were generated for each subject between 16 dipole locations (8 phase providing and 8 amplitude providing dipoles) following the steps described in Section IV-A. The projections of these dipole locations are shown in Fig.2. This procedure was repeated ten times to generate a dataset with ten subjects for each SNR level. The whole experiment was repeated 20 times to quantify the variability of each method.

Robustness to subject variability:
For multi-subject analysis, methods such as our method which tries to identify coupled channels across subjects, variability across subjects is natural. In this experiment, we evaluate the performance of the proposed method by introducing variability in the location of the coupled channels to determine the robustness of the method against variability across subjects. First, eight multivariate coupled pairs of oscillations (5 theta-gamma and 3 alpha-gamma) were generated for each subject between 16 locations (8 phase providing and 8 amplitude providing) following the steps described in Section IV-A and for the dipoles illustrated in Fig.2. Using this procedure, a dataset was generated with ten subjects. SNR was fixed at 6dB for all subjects. Variability was introduced by changing 5% to 25% of the dipole locations with a step size of 5% for different datasets. We repeated the whole experiment 20 times to quantify the variability of each method.

Robustness to volume conduction:
The performance of the proposed measure was also evaluated with increasing number of coupled channel pairs to determine the robustness of the proposed method against volume conduction. Volume conduction is very commonly encountered in EEG and refers to the phenomenon where nearby electrodes are highly correlated with each other due to the conductivity of the scalp [45]. This is important as the number of coupled dipoles increases there will be more interference across channels which may make it harder to detect the truly coupled channels. Synthesized datasets (10 subjects in each dataset with the same number of coupled pairs) were generated by varying the number of coupled channel pairs from 4 to 20 following the steps described in Section IV-A. SNR was fixed at 6dB for all subjects. 75% of the coupled channel pairs were theta-gamma coupled while 25% were alpha-gamma coupled. The whole experiment was repeated 20 times to quantify the variability of each method.

1) Detection of PAC Networks:
Using the average PAC within a frequency band and time window of interest (MI) defined in section III-C, pairwise PAC values were computed across all 64 electrodes for delta-gamma, theta-gamma, alpha-gamma and beta-gamma frequency band combinations for each subject and response type, error and correct. As previous studies indicate increased synchronization associated with the ERN in the time window 0-100 ms [46], the connectivity matrices were constructed by averaging the time- The detected channel combinations for error (X er ) and correct (X cr ) responses are referred to as PAC networks for error and correct responses, respectively.

2) Classification Based on Detected PAC Network:
A classification experiment was conducted to determine if the detected multivariate PAC networks can be used as features to discriminate between error and correct responses. As the proposed method uses the sparse part of the tensor for classification and since this sparse part is data dependent, the classification was performed by following the sparse representation classification (SRC) approach described in [47], [48].

end
We compare the classification performance of HoRPCA with simple averaging, gedCFC and PARAFAC based methods. As simple averaging and gedCFC methods are not data dependent, to classify the error and correct responses for these two methods, we employ simple NN classifier without the SRC based algorithm. For simple averaging method, the classification was performed using 5-fold cross-validation with an NN classifier. The feature matrices in this case are the 64 × 64 PAC networks averaged across frequency bands corresponding to each subject and response type. The classification for gedCFC was also performed by using 5-fold cross-validation with NN classifier. In this case, the features are the multivariate PAC values for the theta-gamma band extracted by gedCFC. For PARAFAC based method, as it is also data dependent similar to HoRPCA, the classification was performed following the SRC based NN-classifier approach described in Algorithm 1. For all cases, the performance was evaluated in terms of precision, recall, F-measure and Gmean.

1) Results for Experiment 1:
The evaluation performance of the detection power for the four methods for various SNR values is reported in Table I. From Table I, it can be seen that the proposed HoRPCA based PAC approach outperforms gedCFC in all cases and can detect the presence of PAC even for highly noisy data (−6dB) with a F-measure close to 0.85. The detection power of PARAFAC based approach was also high and comparable to the proposed HoRPCA based approach. The high performance exhibited by the HoRPCA and PARAFAC based methods indicates that the tensor based multi-way analysis takes full advantage of the multi-linear structure of the data, which improves the classification performance compared to matrix factorization-based methods like gedCFC. Averaging the networks has the worst performance (i.e., performance around the chance level) for all SNR levels indicating loss of information due to averaging.

2) Results for Experiment 2:
The performance of the three approaches for various SNR levels, variability in dipole locations and number of coupled channel pairs are depicted in Tables II, III and IV, respectively. From Tables II, III and IV, we can see that the proposed HoRPCA based method outperforms PARAFAC and the averaging based methods in all cases. HoRPCA detected the coupled channels correctly with more than 80% precision and recall even for high noise levels, e.g. −6dB, (Table II). HoRPCA is also shown to be more robust against subject variability (Table III). It is also less sensitive to the number of coupled channel pairs (Table IV). While PARAFAC based multivariate PAC detection performs well for low noise levels and less variability in the data, its performance deteriorates quickly as the uncertainty in the data increases. For example, PARAFAC is unable to detect the correct coupled channel pairs in the presence of variability across subjects. This can be explained by the fact that PARAFAC based method detects the channel pairs which are common across all subjects. In all cases, averaging the networks yielded the worst performance and was found to be more sensitive to different signal parameters. This is due to the fact that averaging cannot capture variability across subjects and frequency bands. The standard deviations reported in these Tables reflect that the variability of the HoRPCA method is smaller than others across multiple realizations of the experiments. This indicates the stability of the proposed method.

1) PAC Network Detection Result:
The low-rank and sparse networks extracted for the theta band of error and correct responses averaged across subjects is shown in Fig.3. This figure shows that the low-rank parts of the tensor for both response types are similar, while the sparse parts illustrate the differences in the coupled brain regions for the two response types.
This observation is also verified by statistical testing. A Wilcoxon Signed Rank Sum Test (p < 0.05) with Bonferroni correction is conducted to determine the statistical significance of the difference between ERN and CRN PAC networks across subjects. The statistical testing revealed no significant difference between the low rank networks but significant difference was observed between sparse networks. We constructed a p-value PAC network that shows the channel pairs with significant difference between the sparse networks corresponding to error and correct responses. The p-value PAC network is depicted in Fig.4.
From the extracted sparse networks, we detected the channel combinations for error (X er ) and correct (X cr ) responses as shown in Fig.5 (a) and (b), respectively. The colormap shows the number of subjects for which a channel combination was detected. The yellow pixels indicate channel pairs with coupling across the majority of subjects, while the light blue pixels correspond to channel pairs with coupling for only a few number of subjects.
The channel pairs (phase-amplitude coupled channels) detected for a majority of subjects for error response are:  Table V shows the comparison of classification performance for 50 repetitions of the 5-fold cross-validation in terms of precision, recall, Fmeasure and G-mean. From this Table, it can be seen that the proposed HoRPCA based multivariate method results in very high accuracy (F-measure and G-mean > 0.98) compared to averaging (F-measure and G-mean close to 0.70) and matrix factorization-based gedCFC (F-measure and G-mean close to 0.85). The discrimination power of PARAFAC was slightly lower than the proposed method. The increased discrimination power of the tensor based methods indicates that the tensor based multivariate t-f PAC approaches are more effective at identifying the brain regions central to PAC.

VI. DISCUSSION
In this paper, a HoRPCA based approach was proposed to identify the spatial and spectral components of multivariate PAC across all channels, frequency bands, and subjects. As illustrated in Fig.5, HoRPCA can determine the coupled channel pairs directly from the row and column indices of the non-zero elements of the sparse tensor component for a given frequency band. This approach provides certain advantages over PARAFAC decomposition of the same tensor. First, with the PARAFAC model, order selection is an open problem. While the first factor across each mode can be used to identify the coupled channel pairs and frequency bands, these factors are only effective at capturing the largest variance in the data and are not necessarily robust to outliers and variations in the data such as the ones considered in Tables II, III and IV. Second, PARAFAC identifies the amplitude and phase providing channels individually rather than identifying the actual channel pairs. This results in P × Q possible channel pairs, where P is the number of amplitude providing channels and Q is the number of phase providing channels. In order to determine the actual coupled channel pairs, one needs to do significance testing which is computationally expensive. Estimation of coupled channel pairs from the indices of the sparse matrix in the HoRPCA method overcomes these limitations. Based on the F-measure values reported for various simulations, HoRPCA based method outperforms PARAFAC and averaging based method and is more robust to varying signal conditions such as noise level, subject variability and number of couplings. HoRPCA is also more robust to noise compared to gedCFC indicating the advantage of tensor representation with respect to matrix factorization based methods (Table I).
In the analysis of EEG data, the proposed approach detected event-related theta-gamma PAC during error and correct responses with higher PAC for error response. This finding is consistent with prior studies where theta-gamma coupling was reported during visual tasks like working memory processing and serial memory recall [9]. Theta-gamma PAC was also reported in an error processing MEG study [18] and in error/correct trial learning [49]. A possible explanation for the presence of higher theta-gamma PAC during error response is that error trials may reflect a miscoding of information, which leads to a large-scale functional integration across theta-gamma frequency bands to improve the performance after an error response [49].
Cross-frequency networks extracted through HoRPCA are sparse networks as the background couplings are discarded (Fig.3, Fig.6). Analysis of the low-rank parts of the tensors for both error and correct responses showed that there was no significant difference between the two response types based on solely the low-rank tensor (Fig.3). This observation justifies our hypothesis that the low-rank part of the tensor captures background PAC activity while the sparse part corresponds to response-evoked PAC. Observation of the detected PAC networks shows that the networks corresponding to the correct response were concentrated between frontal theta phase providing channels and parietal gamma amplitude channels. On the other hand, networks for error response were concentrated between frontalcentral theta phase activity and parietal gamma activity. Comparison of ERN and CRN networks through statistical significant testing revealed that the difference was centered around the medial frontal cortex, which has been previously reported to be active during error processing [18]. Prior studies also hypothesized that error-related negativity initiates the medial frontal based top-down control mechanisms to improve the performance after an error response [50], [51]. Thus, the PAC networks extracted by the proposed HoRPCA approach are consistent with previous literature reflecting higher theta-gamma coupling in the medial frontal cortex and relating this with error-related negativity. The high classification accuracy achieved by HoRPCA based multivariate PAC also indicates that multivariate PAC can be successfully employed to characterize cross-frequency connectivity networks associated with error and correct responses.
Although the proposed multivariate approach is promising, there are some limitations. First, the multivariate analysis was conducted by constructing the network with amplitude extracted only from the gamma frequency band. The low-high frequency band combinations were chosen based on the literature indicating the presence of PAC mainly between the low frequency phase (typically 5-12 Hz) and high frequency amplitude (typically 30-100Hz) [9]. However, other low-high frequency combinations such as delta-theta/alpha/beta, thetaalpha/beta and alpha-beta can also be studied by adding those networks in . Another limitation is that the multivariate measure focuses on the average PAC values across a predetermined time window. However, there is growing empirical evidence that PAC is dynamic, varying across time. Consequently, future work could include extending the proposed measure to capture the temporal and spatial dynamics of PAC.

VII. CONCLUSION
In this study, we proposed a HoRPCA based multivariate PAC framework to capture the dynamics of cross-frequency PAC across various brain regions and frequencies. The empirical results obtained from the analyses of simulated and EEG data supported our hypothesis that the sparse tensor extracted through HoRPCA can capture the spectral and spatial dynamics of event-related multivariate PAC network while discarding the background perturbations as part of the low-rank tensor. With these unique properties, the proposed HoRPCA based multivariate approach can lead to obtaining a complete understanding of connectivity across frequency bands and brain regions.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by The Institutional Review Board at Michigan State University under Approval No. LEGACY13-144.

Fig. 1.
Illustration of the proposed method: (a) Generation of the high-order tensor by concatenating the N × N PAC networks across ℳ various frequency bands and subjects.
(b) Low-rank (ℒ) and sparse tensor ( ) decomposition of through HoRPCA. Munia     Low-rank and sparse networks extracted for the error and correct responses averaged across subjects. (a) Average low-rank PAC network for error response; (b) Average low-rank PAC network for correct response; (c) Average sparse PAC network for error response; (d) Average sparse PAC network for correct response. Munia   The detected channel combinations across all subjects for (a) error response (X er ); and (b) correct response (X cr ) for theta frequency band. Munia   Theta-gamma PAC networks detected for (a) Error response and (b) Correct response. The color scale from white to black indicates the number of subjects for which a channel pair was detected. The arrows originate from phase providing channel. Munia