Shift-Invariant Canonical Polyadic Decomposition of Complex-Valued Multi-Subject fMRI Data With a Phase Sparsity Constraint

Canonical polyadic decomposition (CPD) of multi-subject complex-valued fMRI data can be used to provide spatially and temporally shared components among groups with both magnitude and phase information. However, the CPD model is not well formulated due to the large subject variability in the spatial and temporal modalities, as well as the high noise level in complex-valued fMRI data. Considering that the shift-invariant CPD can model temporal variability across subjects, we propose to further impose a phase sparsity constraint on the shared spatial maps to denoise the complex-valued components and to model the inter-subject spatial variability as well. More precisely, subject-specific time delays are first estimated for the complex-valued shared time courses in the framework of real-valued shift-invariant CPD. Source phase sparsity is then imposed on the complex-valued shared spatial maps. A smoothed $\ell _{\mathbf {{0}}}$ norm is specifically used to reduce voxels with large phase values after phase de-ambiguity based on the small phase characteristic of BOLD-related voxels. The results from both the simulated and experimental fMRI data demonstrate improvements of the proposed method over three complex-valued algorithms, namely, tensor-based spatial ICA, shift-invariant CPD and CPD without spatiotemporal constraints. When comparing with a real-valued algorithm combining shift-invariant CPD and ICA, the proposed method detects 178.7% more contiguous task-related activations.


Shift-Invariant Canonical Polyadic Decomposition of Complex-Valued
Multi-Subject fMRI Data With a Phase Sparsity Constraint

I. INTRODUCTION
T ENSOR decomposition applied to multi-subject functional magnetic resonance imaging (fMRI) data is of growing interest, as such approaches enable us to make use of the multiway structure of fMRI data in terms of space, time and subject/trial [1]- [3]. Canonical polyadic decomposition (CPD), also called CANDECOMP or parallel factor analysis (PARAFAC), is a common method for tensor decomposition [4], [5] because it can provide a unique decomposition under mild conditions [6], [7]. CPD has been widely applied to multi-trial/subject fMRI data since 2004 to extract shared spatial maps (SMs), shared time courses (TCs), and trial/subject-specific intensities [1], [2], [8]- [11]. CPD is a complementary method to independent vector analysis (IVA), which emphasizes the capture of inter-subject variability and the decomposition of multi-subject fMRI data into individual SMs and TCs [12].
CPD was first applied to the analysis of fMRI data in [1]. The multi-trial fMRI data from a single subject generally accords with the CPD model that decomposes a K -way tensor into a linear combination of a series of rank-one tensors, and each rank-one tensor is the out product of K loading vectors. By contrast, multi-subject fMRI data tend to violate the CPD model since the data involve larger inter-subject spatial and temporal variability than multi-trial fMRI data. For this reason, some constraints on spatial and temporal modalities were imposed to make the CPD model more rigid. Most This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ studies have focused on imposing spatial constraints on CPD. Beckmann and Smith proposed a tensor-based spatial independent component analysis (ICA) (T-sICA) algorithm, which imposed spatial independence on the SMs of multi-subject fMRI data via ICA, followed by a rank-one approximate of CPD to obtain shared TCs and subject-specific intensities [2]. Zhou and Cichocki proposed a novel CPD algorithm incorporating a priori knowledge about the components in one mode, thus the spatial independence can be readily constrained by extracting a factor matrix using ICA [10]. De Vos et al. also proposed a CPD algorithm where independence is imposed on the factors in one mode [13]. This algorithm may require heavy computation in the analysis of multi-subject fMRI data due to working with the covariance matrix and fourth-order cumulant tensor.
In addition to the spatial variability, time shifts occur naturally in multi-subject fMRI data and even in multitrial fMRI data from an individual, due to the hemodynamic delay. As such, Mørup et al. extended the CPD model to incorporate time shifts, called shift-invariant CPD, and then applied the algorithm to detect variable latencies in multi-trial fMRI data from one single subject [8]. The shift-invariant CPD employs a frequency-domain method to estimate time shifts since the exhaustive searching strategy is time-consuming in the time domain when many time delays are involved [14]. The frequency-domain method estimates more accurate time delays and uses less computation time than the time-domain algorithm [8]. Under the same experimental conditions, the inter-subject temporal variability from multisubject fMRI data tends to be larger than the inter-trial temporal variability from single-subject fMRI data, due to different response times or hemodynamic delays for different subjects. As such, we leveraged both inter-subject spatial and temporal variability and proposed an improved method combining ICA and shift-invariant CPD (named ICA-sCPD) for analyzing multi-subject fMRI data. The results showed that our method outperformed T-sICA and shift-invariant CPD [11].
To our best knowledge, CPD has not yet been applied to complex-valued multi-subject fMRI data. Motivated by the benefits of using both magnitude and phase information in fMRI data [15]- [20], we propose to exploit the phase in a CPD algorithm to take advantage of the full brain information contained in complex-valued fMRI data. Different from spatial independence, we propose the incorporation of a spatial sparsity constraint on CPD. The consideration is that the spatial sparsity is in line with the intrinsic characteristic of brain activation [21]- [23], and sparse representation utilizing the SM sparsity exhibited promising results in the analysis of magnitude-only fMRI data [24]- [29]. Because magnitudes of complex-valued SM components are highly noisy, we seek to utilize the small phase characteristic of blood oxygenation-level dependent (BOLD)-related voxels. Specifically, BOLD-related voxels show small phase changes within the range of [−4/π, 4/π], while unwanted voxels tend to have large phase changes relative to a baseline phase image ([−π, −4/π) and (4/π, π]) [15]. We call the phase map of the complex-valued SM component the spatial source phase, to distinguish it from the observed phase data. In addition, time shifts are still used to incorporate a temporal constraint. Simulated fMRI data with different SM and TC changes and noise levels as well as experimental fMRI data are used to examine the performance of our proposed method.
To summarize, our contributions are as follows: (1) We propose to exploit the spatial source phase in the CPD algorithm with spatiotemporal constraints for decomposing complex-valued multi-subject fMRI data into shared SMs, shared TCs, and subject-specific time delays and intensities.
(2) We propose to use the sparsity of the spatial source phase to constrain the complex-valued shared SM in the CPD algorithm. After performing phase de-ambiguity for complexvalued SM estimates, the voxels with larger phase values are reduced based on the smoothed 0 norm [31].
(3) We derive the update rule to estimate subject-specific time delays for complex-valued shared time courses in the framework of a real-valued shift-invariant CPD algorithm [8].
The remainder of this paper is organized as follows. Section II describes the details of our proposed method. Section III introduces the simulated and experimental fMRI dataset as well as the performance indices for algorithm evaluation. Section IV gives the experimental results to exhibit the advantage of our proposed method. A discussion is included in Section V.
II. METHODS In this section, we first present the complex-valued, shiftinvariant CPD algorithm (named csCPD). We then impose a phase sparsity constraint on the shared SMs of the csCPD (named pcsCPD) to form our proposed method. Finally, we present a detailed implementation of the proposed pcsCPD algorithm.

A. Complex-Valued Shift-Invariant CPD Algorithm
1) The Model: Let the three-way (voxel × time × subject) multi-subject fMRI data be X = {x v, j,k } ∈ C V ×J ×K , where V is the number of in-brain voxels, J is the number of time points for each subject, K is the number of subjects, v = 1, · · · , V , j = 1, · · · , J , and k = 1, · · · , K . Assume N is the number of components, and S [s 1 , · · · , [c 1 , · · · , c N ] = {c k,n } ∈ C K ×N (n = 1, · · · , N) correspond to shared SMs, shared TCs, and subject-specific intensities, respectively. The model for csCPD is as follows [8], [11]: where T = {τ k,n } ∈ R K ×N is the time delays for N components of K subjects; b n ( j − τ k,n ) denotes b j,n with the time delay τ k,n , and τ k,n is the time delay for component n of subject k. We, thus, have the shared TCs with time delays for T is obtained by cyclic left shifting b n with τ k,n points, otherwise it is obtained by cyclic right shifting b n with τ k,n points. E = {e v, j,k } ∈ C V ×J ×K is the residual reflecting the intersubject variability effect.
Performing a discrete Fourier transform on (1), we have the model for the csCPD algorithm in the frequency domain, as follows: where f = 1, · · · , F is the frequency point, and here, In the csCPD algorithm, the loading matrices S, B, C and T are unknown. Thus, we randomly initialize S, B, C, and T , and then update these four loading matrices in each iteration until convergence or reaching the number of maximum iterations. We first present update rules for S, B, and C, and then derive the update rule for T in the framework of the real-valued shift-invariant CPD method [8].
2) Updates of S, B, and C: Assume vectors s v , x (1)v and e (1)v are constructed by the vth row of S, X (1) (2) f by the f th row ofB, X (2) ∈ C F ×V K , andẼ (2) ∈ C F ×V K ; and c k , x (3)k and e (3)k by the kth row of C, X (3) ∈ C K ×J V , and E (3) ∈ C K ×J V , respectively. Based on (1) and (2), the csCPD model can be further rewritten as follows: x (2) where Z ∈ C J K ×N is the aggregating mixing matrix and its element is z j +k( J -1),n = c k,n b n ( j − τ k,n ), i.e., the Khatri-Rao product between C and B (k) (k = 1, · · · , K );C ( f ) = C · exp{−i 2π( f − 1)T /J }, and '·' is the dot product. Thus, the alternative least squares (ALS) updates of S, B and C can be derived from (3) to (5) [8] as follows: where S and C are updated in the time domain, while B is updated in the frequency domain.
3) Update of Subject-Specific Time Delays T : As stated earlier, T = {τ k,n } ∈ R K ×N totally includes K × N time delays. Each of the time delays is optimized using the rule of ALS, i.e., the remaining time delays are fixed when a time delay is updated. Next, we take an exampling time delay τ k,n for component n (n = 1, · · · , N) of subject k (k = 1, · · · , K ) to derive the update rule.
Referring to (5), the update of the time delay τ k,n minimizes the following least squared error: Because the shared SMs S ∈ C V ×N is not relevant to T , and the number of in-brain voxels V is very large (usually larger than 5×10 4 ), we project (3) into the S subspace to reduce the dimensionality of the tensor as follows: The size ofX is significantly smaller than that of X ∈ C V ×J ×K , as N V . As a result, the estimate of T is accelerated and the computation memory is saved.
With the new tensorX, we rewrite (5) as follows: We then have the following vector presentation for the Khatri-Rao product between B (k) andŜ: To explicitly show the time delay τ k,n , we rewrite (9) based on (12) as follows: Let q (k) n =x (3)k − n =n c k,n (b (k) n ŝ n ) denote the remaining signal when projecting all but component n (n = 1, · · · , N) out ofx (3)k , thus (13) becomes the following: In the square brackets of (14), the first term does not contain the information of τ k,n , and the second term does not vary with τ k,n , as b (k) n is cyclic shifted based on the time delay τ k,n . Thus, the update of the time delay τ k,n maximizes the sum of the third and fourth terms as follows: By expanding each term in (15) to extract the time delay, τ k,n is finally obtained by the following: The detailed derivation of (16) from (15) can be found in the supplementary materials. 1 The time delay we estimate here is an integer. A noninteger time delay can also be estimated using a gradient based iteration update, but this can result in a local minima [30] and, therefore, be time-consuming.

B. Phase Sparsity Constraint on Shared Spatial Maps S
In the above-described csCPD algorithm, the shared SMs S is estimated using (6). However, each component of S includes a large number of high-magnitude unwanted voxels due to the high noise nature of complex-valued fMRI data [15]. As such, we propose the addition of a sparsity constraint on S to remove these unwanted voxels.
The sparsity constraint is usually imposed on the magnitude of a complex-valued signal. However, the high-magnitude, unwanted voxels involved in the shared SM components lead to the failure of placing a sparsity constraint directly on the magnitude. For this reason, we propose to impose sparsity on the phase of shared SMs according to the small phase characteristic of BOLD-related voxels [15]. To achieve more direct and efficient effects on the newest estimate of shared SMs, the phase sparsity constraint is performed as a sequential update for S, instead of being incorporated into the csCPD algorithm given in (1). Next, we obtain the corrected phase of shared SMs via phase de-ambiguity; we then present the model of phase sparsity constraint based on the smoothed 0 norm and the second update rule of S. 1) Phase de-Ambiguity: Given a shared SM component s n = [s 1,n , · · · , s V ,n ] T of S (n = 1, · · · , N), its associated TC estimate is z n , the nth column of the aggregating mixing matrix Z. The rotation angle θ n for s n is determined by maximizing the power of the real part of the rotated z n as follows [15]: The phase without ambiguity (i.e., spatial source phase) for s n is then obtained as follows [15]: Note that the phase de-ambiguity in (18) does not change the magnitude is capable of identifying the BOLD-related voxels (with smaller phase θ(s v,n ) ∈ [−4/π, 4/π]) from the unwanted voxels (with larger phase θ(s v,n ) ∈ [−π, −4/π), or θ(s v,n ) ∈ (4/π, π]). As such, the phase sparsity constraint can be imposed on the voxels with larger θ(s v,n ) in the shared SMs.
2) The Phase Sparsity Constraint Model: Given s v = [s v,1 , · · · , s v,N ] estimated by (6), we use a smoothed 0 function to minimize large-phase voxels in s v with an additional regularization term satisfying (3), as follows: where and where θ th n denotes a threshold of θ(s v,n ) ∈ [0, π] for adding a sparsity constraint on the unwanted voxels in s n . Although larger phase changes of the unwanted voxels account for theoretically 75% ( θ(s v,n ) ∈ (π/4, π]) and actually 60-70% of the full phase changes [15], a smaller percentage of the voxels should be removed in the separation process to retain more BOLD-related voxels. Here, we select θ th n as the phase value segmenting the largest V /3 values of θ(s v,n ) . As a result, only 33% of the total voxels are gradually removed in the separation process via phase sparsity. (19) is equivalent to the minimization of the 0 norm for a sufficiently small σ . When 3) Update of S Based on Phase Sparsity: For the model of the phase sparsity constraint given in (19), we use the steepest descent method to minimize the smoothed 0 function on the feasible set satisfying x (1) The update rule for s v is obtained as follows: where and λ is a positive step size.
In summary, the proposed algorithm updates S twice by using (6) and (22), correspondingly. To escape from local minima and singular values, σ in (24) should be slowly decreased [31], [32]. We let σ iter = γ σ iter-1 , where 'iter' denotes the iteration index, γ is the decrease rate, and 0.9 < γ < 1 (we set γ = 0.99). The update order for each iteration is B, T , S, and C. A detailed procedure for the proposed algorithm is described in Table I.

III. EXPERIMENTAL METHODS
To evaluate the efficacy of the proposed pcsCPD method, we carry out extensive experiments using both simulated and experimental complex-valued fMRI data, with comparison to the following three algorithms: (1) csCPD (without a spatial constraint, see subsection II.A); (2) CPD (without spatiotemporal constraints) using the COMFAC algorithm, which is a fast implementation of trilinear ALS [33]; and (3) T-sICA using the complex-valued entropy bound minimization (EBM) algorithm [34] to perform ICA. Each algorithm is repeated 20 times for each case. The estimated integer time delays range from -10 points to 10 points. We test σ 0 values from 0.5 to 20 and λ values from 0.1 to 5 according to the suggestions in [31], [32] and finally select σ 0 = 1 and λ = 0.5 for the simulated fMRI data, as well as σ 0 = 2 and λ = 4 for the experimental fMRI data. In addition, we choose ε iter_min = 10 −6 , ε iter_min = 10 −4 and iter max = 500 for csCPD, CPD, and pcsCPD.
For each of shared SM estimate, i.e., s n , phase de-ambiguity is first performed using (17) and (18) to obtain θ(s v,n ), phase de-noising is then performed based on θ(s v,n ). Specifically, a binary mask m n = {m n (v)}, v = 1, · · · , V , is generated as follows: 1, if |θ(s v,n )| ≤ π/4 and |s v,n | ≥ 0.5 0, otherwise The phase-denoised s n is obtained by masking the following: Each shared TC estimate b n is also phase corrected. Replacing z n in (17) with b n to estimate its rotation angle θ n , we obtain the phase corrected TC as follows: The denoised SMs s n by (26) and the phase-corrected TCs b n (n = 1, · · · , N) are used for evaluating the performance of the proposed algorithm. When comparing with magnitude-only fMRI data analysis, we select the real-valued ICA-sCPD algorithm, which simultaneously utilizes spatial independence and temporal delay constraints [11]. Comparisons with real-valued sCPD [8] and CPD are provided in the supplementary materials. 1 The ICA in ICA-sCPD is selected as the fastICA algorithm in deflation mode with a "tanh" nonlinearity function. The stopping criterion is set to less than 10 −6 or a maximum iteration of 1000.

A. Simulated fMRI Data
We generate simulated complex-valued fMRI data sets with K = 10 subjects, and each subject includes eight fMRI-like complex-valued sources (http://mlsp.umbc.edu/ resources.html). One is task-related (S1), two are transiently task-related (S2, S6), and five are artifact related (S3, S4, S5, S7, and S8). Each simulated source is a spatial map of V = 60 × 60 voxels with a time course of J = 100 time points. Reshaping each spatial map into a one-dimensional vector (8 × 3600) and mixing with eight time courses (100 × 8), we obtain simulated complex-valued fMRI data sets (3600 × 100 × 10). To simulate the experimental fMRI data with spatial and temporal variability, we further change the above-generated ideal SMs and TCs by adding various spatial changes and different time delays for each subject. Specifically, we randomly decrease the activated SM voxels in the range of 0% (no change), 10%, 20%, and 30% (denoted as s = 0, 1, 2, 3), with respect to the original dataset. As the periods of the task-related and transiently task-related TCs are 25 points and 20 points, respectively, we randomly shift the time courses in the range of 0%, 2%, 4%, 8% and 10% (denoted as t = 0, 1, 2, 3, 4), with respect to the total points of 100. Thus, the time delay change is within [-10, 10]. Moreover, we add complex-valued Gaussian noise to the simulated fMRI data with different signal-tonoise ratio (SNR) levels from -20 dB to 20 dB with a 2.5 dB interval (a total of 17 different levels of SNR). The SNR is defined as 20 lg(σ s /σ n ), whereσ s andσ n are the temporal standard deviations of the source signal and Gaussian noise, respectively. We set the number of components for each algorithm to be the true number of components N = 8.
We use the simulated fMRI data to evaluate the effects of spatial and temporal changes and the noise effect on pcsCPD, csCPD, CPD, and T-sICA.

B. Experimental fMRI Data
In this study, we use complex-valued fMRI data sets from 10 subjects performing a finger-tapping motor task while receiving auditory instructions [15], [35]. All participants signed IRB-approved informed consent at the University of New Mexico. The experimental paradigm is a block design with alternating periods of 30 seconds on (finger tapping) and 30 seconds off (rest). The experiments were performed with a 3T Siemens TIM Trio system with a 12-channel receive coil. The fMRI experiment used a standard Siemens gradientecho EPI sequence modified to store real and imaginary data separately. The following parameters were used: fieldof-view = 24 cm, slice thickness = 3.5 mm, slice gap = 1 mm, number of slices = 32, matrix size = 64 × 64, TE = 29 ms, TR = 2 s, and flip angle = 70 degrees. The Statistical Parametric Mapping (SPM) software package was used to perform the preprocessing of the data. The magnitude datasets were coregistered to compensate for movements in the fMRI time series images. The images were then spatially normalized into standard Montreal Neurological Institute space. Following spatial normalization, the data (real and imaginary) were slightly resampled using trilinear interpolation, resulting in 53 × 63 × 46 voxels. Motion correction and spatial normalization parameters were calculated from the magnitude data and applied to the phase data. Both the real and imaginary images were then spatially smoothed with a 10 × 10 × 10 mm 3 full-width-at-half-maximum Gaussian kernel.
Prior to the analysis, we filtered the experimental fMRI data by an 8-150 mHz band-pass filter [36], [37]. The model order of each algorithm was obtained by first estimating the model order for each single subject of fMRI data using the minimum description length (MDL) criterion with a subsampling scheme [40] and then averaging over 10 subjects. The final model order was set to 50, which is consistent with our previous study [35].

C. Performance Indices
We calculate (1) the absolute correlation coefficient |ρ c | between the estimates of the shared SM and TC, the subjectspecific time delay and intensity and their corresponding references; and (2) the mean and standard deviation of |ρ c | over all 20 runs for each case, to evaluate the algorithm performance for both the simulated and experimental fMRI data.
The simulated fMRI data includes references as the ground truth. For the experimental fMRI data, we use the model TC as the reference for the task-related TC, which is acquired by convolving the stimuli with the canonical SPM hemodynamic response function; and we use a group general linear model (GLM) map as the reference for the task-related SM. The GLM reference is specifically obtained by performing one sample t-test ( p < 0.05) on the single-subject GLM results [11], [15]. The time delay reference is the estimates corresponding to the largest |ρ c | values between the lag-shifted TC estimates of complex-valued EBM and the model TC. The subject intensity is measured via the similarity between the proposed method and the other algorithms in terms of |ρ c |.
In addition, the quality of the phase of the SM and TC estimates is evaluated by the performance indices described above for the experimental fMRI data. Since the SM phase has frequently varying polarity, we use the source phase mask m n given in (25) to calculate the correlation with the GLM reference.

A. Simulated fMRI Data
We first compare the proposed pcsCPD with csCPD, CPD and T-sICA under 20 cases of spatial and temporal changes (s = 0, 1, 2, 3; t = 0, 1, 2, 3, 4), and take SNR = -10 dB, 0 dB and 10 dB as examples to test the noise effects. Fig. 1 shows the results. The average |ρ c | over all eight components is analyzed and denoted as |ρ c |. We observe from Fig. 1 that our proposed algorithm yields the highest means and the lowest standard deviations of |ρ c | for all cases when compared with the other three algorithms. CPD can obtain better performance for the case of s, t = 0 where the simulated fMRI data conforms to the CPD model, but it sharply degrades with the increase in s and t.
We examine the effects of spatial and temporal changes on the algorithms by considering the case of SNR = 10 dB, as shown in Fig. 1(1). The proposed approach exhibits the slightest effects, whereas CPD shows the most severe effects in terms of both the means and standard deviations of |ρ c |. As expected, csCPD without a spatial constraint demonstrates larger degradation with the increase in s than in t. In contrast, T-sICA without a temporal constraint illustrates a larger degradation with the increase in t than in s. Generally speaking, T-sICA is better than csCPD in detecting SM, while csCPD is better than T-sICA in detecting TC.
We investigate the effects of the noise levels on the algorithms by decreasing the SNR from 10 dB to 0 dB and -10 dB, see Figs. 1 (2) and (3). T-sICA shows considerable degradation for SNR = -10 dB, while the other three algorithms exhibit reasonable decreases in the means and standard deviations of |ρ c | for the shared SM and TC estimates. Fig. 2 includes t-values calculated by a two-sample t-test for evaluating the difference in the mean |ρ c | for all 20 cases of spatial and temporal changes under SNR = -10 dB, 0 dB and 10 dB, as displayed in Fig. 1, between the proposed pcsCPD and each of the csCPD, CPD and T-sICA. The t-values range from 3.913 to 140.812, all of which are larger than 2.093 ( p-value = 0.05, degree of freedom = 19). This result verifies that the proposed approach achieves significant improvements over csCPD, CPD and T-sICA.
We further present the results of the noise influence from -20 dB to 20 dB (17 levels of SNR) in two example cases, a smaller spatial and temporal change case (s = 1, t = 1) and a larger spatial and temporal change case (s = 2, t = 3). Fig. 3 includes the means and standard deviations of the |ρ c | values for all four algorithms. The proposed algorithm exhibits the highest mean |ρ c | values with smaller standard deviations for the two cases. CPD and T-sICA alternatively obtain the lowest mean |ρ c | values from high to low noise levels. The csCPD algorithm obtains moderate performance among these four algorithms.
For the larger spatiotemporal change (s = 2, t = 3), the distinction between the algorithms becomes wider than the smaller spatiotemporal change (s = 1, t = 1). T-sICA benefits from incorporating spatial independence in estimating the shared SMs (see Fig. 3B(1)), whereas csCPD benefits from incorporating the time delay in estimating the shared TC (see Fig. 3B(2)). The t-value of the two-sample t-test ranges from 5.927 to 16.335, all of which are larger than 2.120 ( p-value = 0.05, df = 16), as displayed in Fig. 4, confirming the significant improvement of the proposed pcsCPD over csCPD, CPD and T-sICA.
Finally, we compare the means and standard deviations of the |ρ c | values for the shared SM and TC, and the subject-specific time delay and intensity estimates obtained in two example cases, i.e., an easy case s = 1, t = 1,   4. The t-values of the two-sample t-test for evaluating the difference in the mean |ρ c | values for the two cases, as displayed in Fig. 3, between pcsCPD and csCPD/CPD/T-sICA. The t-value = 2.120 (p-value = 0.05, df =16) is shown as a threshold. SNR = 10 dB, and a relatively hard case s = 2, t = 3, SNR = -10 dB. Components 1, 6 and 8 are selected to represent the task-related, transiently task-related, and noise components, respectively. Fig. 5 shows the results. The proposed method yields the best performance for the three types of components under the two cases. In contrast, CPD, csCPD and T-sICA show sensitivity to the type of components, and the sensitivity increases in the hard case. Generally speaking, these three algorithms obtain the best estimations for the taskrelated component, which is a component of interest with a higher SNR for task fMRI data.

B. Experimental fMRI Data
We examine the task-related component for motor task fMRI data and analyze the magnitude and phase of the taskrelated SM and TC. Table II illustrates the means and standard deviations of |ρ c | for our proposed pcsCPD, as well as the csCPD, CPD and T-sICA. Our approach yields the highest average |ρ c | values, followed by T-sICA, csCPD and CPD. In addition, the proposed method achieves relatively lower standard deviations compared to the other three algorithms. CPD obtains the worst performance in terms of both the means and standard deviations of |ρ c |, indicating that the experimental fMRI data does not conform to the CPD model. Fig. 6 shows the magnitude and phase images of the shared SM estimates (|Z | ≥ 0.5). For the SM magnitudes given in Fig. 6 (1), the proposed method detects the largest contiguous and reasonable activations in the left primary motor areas (LPMA), right primary motor areas (RPMA), and supplementary motor areas (SMA). T-sICA is better than  III  COMPARISON OF PCSCPD, CSCPD, CPD, AND T-SICA IN TERMS OF  THE TOTAL NUMBER OF ACTIVATED VOXELS AND THE VOXELS  INSIDE AND OUTSIDE THE GLM MASK FOR THE  csCPD in terms of having less noisy voxels involved. CPD ranks last due to fewer motor-related activations and more noisy voxels. Table III includes the number of total activated voxels and the voxels inside and outside the GLM mask. Our proposed approach detects the highest number of voxels both inside and outside the GLM mask compared to T-sICA, and the voxels outside the GLM mask are mostly located in motorrelated areas (see the supplementary materials for details 1 ). CPD provides the largest number of total activated voxels, but with the minimal number of voxels inside the GLM mask and the maximal number of voxels unrelated to motor activations. When observing the SM phase maps shown in Fig. 6(2), the proposed pcsCPD obtains the highest |ρ c | value, followed by T-sICA, csCPD, and CPD.
The shared TC, subject-specific time delay and intensity estimates are shown in Fig. 7. The proposed method yields the highest |ρ c | values for the TC magnitude and phase compared to the other three algorithms. The TC phase is highly correlated with the model TC (|ρ c | = 0.76), which is consistent with previous studies [38]. Fig. 7(3) displays the time delays estimated by the proposed pcsCPD and csCPD. The average |ρ c | values between the time delay estimates and the references from all subjects are 0.78 for the proposed method and 0.31 for csCPD. The time delays are accurately estimated for 5 subjects by pcsCPD and for 2 subjects by csCPD. Finally, the subject intensities are shown in Fig. 7(4). The |ρ c | values between our method and the other three algorithms are 0.76 for T-sICA, 0.54 for CPD, and 0.32 for csCPD. The higher |ρ c | between pcsCPD and T-sICA indicates the high quality of the subject intensity estimates, as these two methods provide better results than csCPD and CPD for the SM and TC estimates. Fig. 8 shows the comparison of the proposed method with ICA-sCPD in terms of the magnitude maps, |ρ c | values and the number of activated voxels for the shared taskrelated SM estimates (|Z | ≥ 0.5). The proposed method obtains a lower |ρ c | value but a larger number of activated voxels than ICA-sCPD, as shown in Figs. 8(1) and 8(2). Figs. 8(3) and 8 (4) show the activated voxels detected by the proposed pcsCPD (in red), by ICA-sCPD (in green), or by both (in blue) in task-related regions, including RPMA and SMA (denoted as "RPMA+SMA") and LPMA. The number of voxels detected individually by the proposed method is much larger than ICA-sCPD. More precisely, the proposed algorithm detects 73.9% more contiguous voxels in RPMA+SMA (932 vs. 536) and 746.5% more contiguous voxels in LPMA (838 vs. 99) than ICA-sCPD. In summary, our proposed method extracts 178.7% more contiguous task-related activations than ICA-sCPD, consistent with the previous finding that phase data provides additional brain information [15], [18], [20], [35], [39].

V. DISCUSSION
This study proposes the use of the spatial source phase in the complex-valued CPD algorithm for decomposing complexvalued multi-subject fMRI data into shared SMs and TCs and subject-specific time delays and intensities. The spatial source phase sparsity and inter-subject temporal delays are simultaneously incorporated to address the inter-subject spatiotemporal variability among multi-subject fMRI data. The results from both the simulated and experimental fMRI data demonstrate the improvements of the proposed pcsCPD method over the complex-valued T-sICA, csCPD, and CPD algorithms, as well as the real-valued ICA-sCPD algorithm, in detecting more contiguous task-related activations.
The incorporation of spatial source phase sparsity is one contribution of this study to the analysis of complex-valued fMRI data. Although the spatial sparsity yields promising results in the magnitude-only analysis [24]- [29], it is not straightforward to incorporate sparsity for the high noise complex-valued fMRI data analysis. A huge number of unwanted voxels have magnitude values as high as or even much higher than the BOLD-related voxels in the complexvalued SMs [15]. Therefore, the magnitude sparsity does not hold for complex-valued components as magnitude-only components do. Fortunately, the spatial source phase is qualified due to the small-phase characteristic of the BOLD-related voxels after phase de-ambiguity [15], which facilitates the sparse representation of complex-valued fMRI data to examine the intrinsic characteristic of brain activation.
In the proposed method, the spatial activation is mainly affected by three phase-sparsity-related parameters, the threshold θ th n for adding the sparsity constraint on the unwanted voxels in s n , the initial σ (σ 0 ) in the smoothed 0 function given σ is slowly decreased, and the step size λ in the second update of S based on phase sparsity. The selection of these parameters has a more dramatic influence on the experimental fMRI data than the simulated one because the simulated fMRI data have much sparser spatial activations than the experimental data. A reasonable criterion for choosing θ th n is removing a smaller percentage (here 33%) of the total voxels in the separation process to retain more BOLD-related voxels. A larger θ th n leads to the loss of BOLD-related voxels, while a smaller one causes the interference of unwanted voxels. In addition, a different sparsity of the shared SMs supports a different selection of σ 0 and λ for the simulated and experimental fMRI data. Larger σ 0 and λ values should be used for the experimental fMRI data to encourage greater sparsity.
Imposing both spatial and temporal constraints is essential for analyzing multi-subject complex-valued fMRI data. The CPD without spatiotemporal constraints obtains the worst separation performance. T-sICA shows better SM estimates than csCPD when the spatial changes are larger due to imposing the spatial independence constraint, and csCPD exhibits better TC estimates than T-sICA when the TC changes are larger due to incorporating the inter-subject TC delays. In view of the performance of the above three compared algorithms, our proposed method demonstrates the least sensitivity to spatiotemporal changes and robustness to noise in both the simulated and experimental fMRI data experiments, indicating the match between the multiple-subject fMRI data and the CPD model with spatiotemporal constraints.
Our previous studies have verified that the complexvalued fMRI data analysis can detect more contiguous and reasonable activations than magnitude-only fMRI data analysis [15], [18], [20], [35]. By using complex-valued ICA and IVA algorithms, the complex-valued analyses extracted 139% more voxels for ICA and 393% more voxels for IVA than the magnitude-only analyses for the task fMRI data [15], [35]. The proposed method also extracts more contiguous task-related activations (73.9% more for RPMA+SMA and 746.5% more for LPMA) than the magnitude-only ICA-sCPD method, supporting the uniqueness and complementarity of phase fMRI data to the magnitude fMRI data.
The use of the number of activated voxels and spatial correlation as performance metrics may be a possible limitation of the study. Future work includes the following: exploiting new parameter metrics; applying our proposed method to resting-state fMRI data, which are more easily obtained and analyzed from patients with neurological disorders, to exact shared spatial activations and shared time courses with subjectspecific time delays; examining how well a complex-valued approach can predict, e.g., patients from controls compared to a magnitude-only approach; and incorporating the identification of the real rank before calculating the loading matrices [41], [42]. Adaptive selection of the threshold of the spatial source phase in imposing the phase sparsity constraint on spatial maps also deserves future investigation.