Analyzing Dynamical Brain Functional Connectivity as Trajectories on Space of Covariance Matrices

Human brain functional connectivity (FC) is often measured as the similarity of functional MRI responses across brain regions when a brain is either resting or performing a task. This paper aims to statistically analyze the dynamic nature of FC by representing the collective time-series data, over a set of brain regions, as a trajectory on the space of covariance matrices, or symmetric-positive definite matrices (SPDMs). We use a recently developed metric on the space of SPDMs for quantifying differences across FC observations, and for clustering and classification of FC trajectories. To facilitate large scale and high-dimensional data analysis, we propose a novel, metric-based dimensionality reduction technique to reduce data from large SPDMs to small SPDMs. We illustrate this comprehensive framework using data from the Human Connectome Project (HCP) database for multiple subjects and tasks, with task classification rates that match or outperform state-of-the-art techniques.


I. INTRODUCTION
THERE has been great interest in studying the brain functional connectome generated from functional MRI (fMRI), which measures the blood oxygen level dependent (BOLD) contrast signal inside the brain over a period of time. The recent Human Connectome Project (HCP) [1] facilitates such research by providing high-quality, publicly available datasets. The functional connectome, also referred to as functional connectivity (FC), is estimated as a set of statistical dependencies of fMRI signals among remote anatomical regions. These dependencies are expressed as quantifications of similarity, correlation, or covariance between simultaneous BOLD measurements across regions of the human brain. Recent studies [2] have shown that FC is a dynamic process and changes continuously, even in the resting state. Short-term FC is often represented as a correlation or covariance matrix of fMRI data over a small time window, with the matrix size equal to the number of brain regions being considered.
Our primary goal is to quantitatively represent and compare dynamic FC between different anatomical regions, over long intervals representing either performance of certain tasks or the brain at rest. We achieve this by representing dynamic FCs as trajectories on the space of covariance matrices, and comparing these trajectories using a Riemannian metric on this space. This, in turn, leads to classification of dynamic FCs and prediction of various kinds of disease or stimuli using fMRI data. One challenge is the high dimensionality of the observed data: it becomes expensive to compare covariance trajectories when the dimension of the covariances (i.e., the number of regions being studied) is very large. To handle this issue, we provide a metric-based dimension-reduction method that maps large covariance matrices to small covariance matrices and significantly improves the computational efficiency of the ensuing statistical analysis.
Significant progress has been made over the last few years in the field of fMRI data analysis. One focus has been the classification of tasks and external stimuli using fMRI data. In [3], the authors used a multivariate pattern analysis (MVPA) approach to test the hypothesis that motivation-related enhancement of cognitive control results from improved encoding and representation of task set information. In the study presented in [4], fMRI data were analyzed using deep learning tools for Alzheimer's disease prediction. Barachant et al. [5] studied classification of covariance matrices using SVM classifiers with a Riemannian-metric-based kernel. Zhao et al. [6] proposed a Riemannian framework for performing group analysis on longitudinal RS-fMRI data; their experiments studied the potential use of this framework in longitudinal connectivity analysis. Dodero et al. [7] extracted traces of brain function by constructing trajectories in a low-dimensional space. Their approach allows studying dynamic FC in detail from the moment-to-moment fluctuations of brain signals during resting state or other conditions. Qiu et al. [8] proposed a framework for computing the mean of a set of brain functional networks and embedding brain functional networks into a low-dimensional space, using Locally Linear Embedding (LLE) under the log-Euclidean Riemannian metric. Dodero et al. [9] presented a general computational framework for classifying connectivity graphs using kernels defined on the Riemannian manifold of positive definite matrices. Based on a representation through a regularized graph Laplacian, this method can also be applied to classification of any kind of connectivity graph with nonnegative weights. Huang et al.
[10] proposed a regression approach that smooths out noise by exploiting the geometric structure of correlation matrices. They demonstrated that the method is useful for identifying distinct state spaces in the resting-state connectivities of males and females through the transition matrices of the common states. Despite these achievements, these methods either did not provide quantitative measures to represent and compare dynamic FCs at the network level, or did not have efficient solutions for region-based, large-scale, multi-class classification problems. Here we aim to address these issues using a novel combination of: representing dynamic FC as a sequence of symmetric positive definite matrices (SPDMs), a Riemannian structure on SPDMs, and a tool for dimension reduction from large SPDMs to small SPDMs.
We first parcellate the human brain into functional units or regions, and represent short-term FC as SPDMs, with entries corresponding to covariances of localized fMRI signals. We use a sliding window to segment the multivariate time series into overlapping temporal blocks, and estimate a covariance matrix for each block. The original time-series data (BOLD signal) can thus be converted into a time series of covariance matrices, termed a covariance trajectory or SPDM trajectory. Next, we utilize a Riemannian structure (described in several places, including [11]) on the space of SPDMs to facilitate the comparison of trajectories. To improve computational efficiency, we propose a novel dimension-reduction approach based on the chosen Riemannian metric on SPDMs. The novel contributions of this paper are as follows: 1) Dynamic FC Representation Using SPDM Trajectories: We represent dynamic FCs as SPDM trajectories and compare these trajectories using a Riemannian framework. 2) Metric-Based Task Classification: We use a Riemannian metric to perform registration, comparison, and classification of dynamic FCs as covariance trajectories.

3) SPDM Dimension Reduction:
We introduce a metric-based dimension-reduction technique to map large SPDMs into small ones, thus improving computational efficiency and facilitating statistical analysis of data involving hundreds of regions. The rest of this paper is organized as follows. In Section II, we describe the mathematical components of the proposed framework, including the estimation and representation of FC using SPDMs, Riemannian metrics for comparing SPDM trajectories, and dimension reduction for large SPDMs. Section III presents experimental results using both simulated and real HCP data. The paper ends with a short discussion and summary in Section IV.

II. METHODOLOGY
In this section we lay out the different components of our framework. After standard preprocessing steps (see more details in the real data analysis section), we obtain a time series of fMRI BOLD signal for each brain region of interest (ROI). We then select a set of ROIs and consider the corresponding vector-valued or multivariate time series. A sliding-window method is used to temporally group elements of this multivariate time series into overlapping blocks. Within each block we estimate a covariance matrix, and thus obtain a time series of covariance matrices, or an SPDM trajectory, for each fMRI recording. These data are then ready for statistical analysis and classification. We describe the main pieces of this analysis next.

A. Covariance Estimation
The first step is to estimate a correlation or covariance matrix that represents the short-term FC between distinct ROIs in the brain. More specifically, we pick n ROIs and represent FC over a small time interval using the n × n covariance matrix of the corresponding BOLD signals over that interval. In this paper we use the approach from [12] to estimate the covariance matrix. The estimate Σ̂ is an optimal linear combination of the sample covariance matrix S and the identity matrix I, i.e., Σ̂ = ρ_1 I + ρ_2 S, where the optimal weights ρ_1 and ρ_2 are estimated from the data. Optimality is defined with respect to a quadratic loss function, asymptotically as the number of observations and the number of variables go to infinity together. Extensive Monte Carlo simulations confirm that the asymptotic results tend to hold well even in finite-sample situations; please refer to [12] for more details. After estimating a covariance matrix for each time interval segmented by a sliding window, we obtain a covariance trajectory for each multivariate time series.
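As an illustration, the shrinkage estimator of [12] and the sliding-window construction can be sketched in Python. This is a minimal sketch: the weight formulas follow the scaled-identity shrinkage target of Ledoit and Wolf, and the function names and window defaults are ours, not from the paper.

```python
import numpy as np

def shrinkage_covariance(X):
    """Ledoit-Wolf-style shrinkage toward a scaled identity.

    X : (T, n) array of T observations of an n-dimensional signal.
    Returns Sigma = rho1 * I + rho2 * S, a well-conditioned SPD estimate,
    with weights following the quadratic-loss-optimal formulas of
    Ledoit & Wolf (2004) for the scaled-identity target.
    """
    T, n = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / T                        # sample covariance
    mu = np.trace(S) / n                     # scale of the identity target
    d2 = np.sum((S - mu * np.eye(n)) ** 2)   # squared distance of S from mu*I
    b2 = 0.0
    for t in range(T):
        xk = Xc[t][:, None]
        b2 += np.sum((xk @ xk.T - S) ** 2)
    b2 = min(b2 / T ** 2, d2)                # estimation-error term, capped
    a2 = d2 - b2
    return (b2 / d2) * mu * np.eye(n) + (a2 / d2) * S

def covariance_trajectory(X, win=80, step=10):
    """Slide a window over the (T, n) series and estimate one SPDM per block."""
    return np.stack([shrinkage_covariance(X[s:s + win])
                     for s in range(0, X.shape[0] - win + 1, step)])
```

Applied to a 300 × n BOLD series with the defaults, this yields a trajectory of 23 SPDMs, each guaranteed positive definite by the identity component of the shrinkage.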

B. Riemannian Structure on Symmetric Positive-Definite Matrices (SPDMs)
In order to quantify differences in FCs, represented by covariance matrices, we need a metric structure on the manifold of covariance matrices, or SPDMs. While there are several Riemannian structures in the literature [11], [13], [14], we use the one introduced in [11], since it has the advantage of closed-form expressions for many of the operations we need on the SPDM manifold, e.g., geodesic distance, parallel transport, the exponential map, and the inverse exponential map. Zhang et al. [11] have also demonstrated that this metric is superior to other metrics, such as the log-Euclidean one [13], in analyzing dynamic FCs.
Let P̃_n be the space of n × n SPDMs, and let P_n be its subset of matrices with determinant one; when the dimension is clear from context we write P̃ and P. In our approach, we impose separate distances on the determinant-one part and on the determinant itself. Any square matrix G with unit determinant, i.e., G ∈ SL(n), can be written as a product of two square matrices G = PS, where P ∈ P and S ∈ SO(n) (SO(n) is the set of all n × n rotation matrices). This is the polar decomposition. It motivates us to analyze P by representing it as such a G after removing S. More formally, we identify P with the quotient space SL(n)/SO(n). This identification is based on the map π : SL(n)/SO(n) → P, given by π([G]) = sqrt(G G^T) for any G ∈ [G], where the square root is the symmetric, positive-definite square root of a symmetric matrix. The notation [G] stands for all possible rotations of the matrix G, given by [G] = {GS | S ∈ SO(n)}. The inverse map of π is given by π^{-1}(P) = [P] ≡ {PS | S ∈ SO(n)}. This establishes a one-to-one correspondence between the quotient space SL(n)/SO(n) and P. Skipping further details, this process leads to the following geodesic distance between points in P. For any P_1, P_2 ∈ P:

d_P(P_1, P_2) = || A_12 ||_F ,        (1)

where A_12 = log(P_12), P_12 = sqrt(P_1^{-1} P_2^2 P_1^{-1}), and ||·||_F denotes the Frobenius norm of a matrix. Since det(P̃) > 0 for any P̃ ∈ P̃, we can express P̃ as the pair (P, (1/n) log(det(P̃))) with P = P̃ / det(P̃)^{1/n} ∈ P. Thus, P̃ is identified with the product space of P and the log-determinant component, and we take a weighted combination of distances on these two components to reach a metric on P̃:

d_P̃(P̃_1, P̃_2)^2 = d_P(P_1, P_2)^2 + κ ((1/n) log(det(P̃_2)/det(P̃_1)))^2 ,        (2)

where κ > 0 is a relative weight. For any two covariances P̃_1 and P̃_2, the first term is evaluated after optimally rotating one unit-determinant part onto the other, i.e., using the optimal S_12 ∈ SO(n) obtained by Procrustes alignment; also note that the ratio det(P̃_2)/det(P̃_1) plays the role of the determinant of the relative matrix. Therefore, the resulting squared geodesic distance between P̃_1 and P̃_2 is:

d_P̃(P̃_1, P̃_2)^2 = || A_12 ||_F^2 + κ ((1/n)(log det(P̃_2) − log det(P̃_1)))^2 ,        (3)

with A_12 computed from the unit-determinant parts P_1 and P_2 as in (1). Under this framework, we can also obtain closed-form solutions for the exponential map, inverse exponential map, parallel transport, and Riemannian curvature tensor, which are often useful.
We refer the reader to [14] for more details. Note that in Eqn (3), the distance between covariance matrices is made up of two components: the determinant term and the unit-determinant symmetric-matrix term. We can choose arbitrary relative weights to combine these two components. While it has been shown that, in some simple cases, one can obtain decent classification performance using only the determinant term, in general the second component provides critical information about the actual difference between covariance trajectories. As an example, Fig. 1 shows evolutions of log(det(α(t)))/n versus t, where α denotes a covariance trajectory, for different experimental conditions. It can be seen from this plot that the determinant information alone is not sufficient to classify tasks and experimental conditions. Therefore, in the following sections, we will mainly focus on the contributions of the unit-determinant symmetric-matrix term (the first term on the right side of Eqn (3)).
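A numerical sketch of this distance follows, under our reading of Eqns (1) and (3): the relative weight kappa is left as a user parameter, any overall constant factor in the metric is omitted, and the Procrustes rotation step is skipped.

```python
import numpy as np

def _sym_fun(A, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def spdm_distance(P1t, P2t, kappa=1.0):
    """Distance between two SPDMs, a sketch of Eqn (3).

    Splits each matrix into a unit-determinant part and its log-determinant,
    then combines the two components with relative weight kappa (a free
    choice; the paper leaves the weighting arbitrary).
    """
    n = P1t.shape[0]
    ld1 = np.linalg.slogdet(P1t)[1]
    ld2 = np.linalg.slogdet(P2t)[1]
    P1 = P1t / np.exp(ld1 / n)              # project onto det = 1
    P2 = P2t / np.exp(ld2 / n)
    P1inv = _sym_fun(P1, lambda w: 1.0 / w)
    M = P1inv @ P2 @ P2 @ P1inv             # P1^{-1} P2^2 P1^{-1}, SPD
    A12 = 0.5 * _sym_fun(M, np.log)         # log of its symmetric square root
    det_term = (ld2 - ld1) / n
    return np.sqrt(np.sum(A12 ** 2) + kappa * det_term ** 2)
```

For diagonal inputs the formula reduces to a Euclidean distance between log-eigenvalues, which gives a quick sanity check; the function is also symmetric in its arguments, as a distance must be.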

C. Riemannian Metric for FC Trajectories
To compare FC trajectories from different observations, we need a method for calculating distances between such trajectories. While the previously defined metric on SPDMs provides a way forward, an important issue remains. The execution rates of different tasks across subjects and experiments may not be tightly synchronized. Even if cues are offered at exact times, the reaction times of different subjects can differ, and even the same subject can differ in response times across experiments. One needs to account for this execution-rate variability in order to better understand pure FC variation. Mathematically, this variability is represented as follows.
Let α denote a smooth trajectory on the Riemannian manifold of SPDMs P, where P is endowed with the Riemannian distance in (2). Let M denote the set of all such trajectories, and let Γ = {γ : [0, 1] → [0, 1] | γ(0) = 0, γ(1) = 1, γ is a diffeomorphism} denote the set of time-warping functions that preserve the end points. Elements of Γ represent different execution rates across different experimental recordings. If α is a trajectory on P, then α ∘ γ is a trajectory that follows the same sequence of points as α but at the evolution rate governed by γ. That is, α and α ∘ γ differ only in their execution rates, or response times; the actual FC responses are identical. It is important to note that Γ forms a group under the composition operation.
Let α_1 and α_2 be two smooth trajectories in M. A simple way to establish a metric between them is to use ∫_0^1 d(α_1(t), α_2(t)) dt. Although this represents a natural way to quantify differences between α_1 and α_2, it suffers from the problem that, in general, d(α_1, α_2) ≠ d(α_1 ∘ γ_1, α_2 ∘ γ_2) when the activities are observed under different execution rates γ_1 and γ_2. This implies that even identical FC trajectories will have non-zero distance between them if they are performed at different execution rates. The execution rate γ is considered a nuisance variable if we only care about the values or the shapes of the trajectories. We will later demonstrate that removing the effects of γ facilitates better classification of FCs in certain scenarios. To make the comparison of trajectories invariant to their execution rates, we use the Riemannian metric described in [11]. This construction is described next.
Definition 1: Let α : [0, 1] → P be a smooth trajectory with starting point α(0) = P ∈ P. Define the transported square-root vector field (TSRVF) of α to be the scaled parallel transport of the velocity field α̇ along α to the starting point P:

q(t) = ( α̇(t) )_{α(t)→α(0)} / sqrt(|α̇(t)|) ,

where |·| denotes the norm defined through the Riemannian metric on P, and V_{α(t)→α(0)} denotes the parallel transport of a vector V from α(t) to α(0) along α. A covariance trajectory α is then represented by the pair (P, q(t)) in this analysis.
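To make the definition concrete, the following sketch computes the TSRVF of a discretely sampled trajectory in the Euclidean special case, where parallel transport is the identity; on the SPDM manifold one would additionally apply the closed-form transport of [11]. The function name and the finite-difference scheme are ours.

```python
import numpy as np

def tsrvf_euclidean(alpha, dt):
    """TSRVF of a discretely sampled trajectory, Euclidean special case.

    alpha : (T, n) samples of a trajectory; dt : spacing of the time grid.
    Computes q(t) = v(t) / sqrt(|v(t)|) with v the finite-difference
    velocity. On a curved manifold such as the SPDM space, each v(t) would
    additionally be parallel-transported to alpha(0); in Euclidean space
    that transport is the identity, which keeps this sketch short.
    """
    v = np.gradient(alpha, dt, axis=0)            # velocity field
    speed = np.linalg.norm(v, axis=1)
    safe = np.where(speed > 1e-12, speed, 1.0)    # avoid division by zero
    return v / np.sqrt(safe)[:, None]
```

For a constant-speed straight line the TSRVF is constant, reflecting the fact that the representation encodes both direction and (square-root-scaled) speed.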
For any point P ∈ P, denote the set of all square-integrable curves in the tangent space T_P(P) by C_P ≡ L^2([0, 1], T_P(P)). The space of all smooth SPDM trajectories then becomes an infinite-dimensional vector bundle C = ⋃_{P∈P} C_P, the indexed union of C_P over all P ∈ P.
Given two points (P_1, q_1) and (P_2, q_2) in C, we want to find the geodesic path connecting them.
To be a geodesic, a path (x(s), v(s, ·)) needs to satisfy certain conditions given in [11], [15], and these conditions lead to a geodesic-shooting method for finding the geodesic between two points on C. Skipping details, we provide the final expressions for geodesic distances.
Definition 2: Given two trajectories α_1, α_2 with representations (P_1, q_1), (P_2, q_2) ∈ C, let (x(s), v(s)), s ∈ [0, 1], be the geodesic between (P_1, q_1) and (P_2, q_2) on C. The geodesic distance is given by:

d_c((P_1, q_1), (P_2, q_2)) = sqrt( l_x^2 + || q_1^∥ − q_2 ||^2 ) ,        (4)

This distance has two components: (1) the length between the starting points on P, l_x = ∫_0^1 |ẋ(s)| ds; and (2) the standard L^2 norm on C_{P_2} between the TSRVFs of the two trajectories, where q_1^∥ denotes the parallel transport of q_1 ∈ C_{P_1} along x to C_{P_2}. In what follows, we will also write d_c((P_1, q_1), (P_2, q_2)) as d_c(α_1, α_2) for simplicity.
The main motivation for using the TSRVF representation of an SPDM trajectory is the following important property: for any γ ∈ Γ,

d_c(α_1 ∘ γ, α_2 ∘ γ) = d_c(α_1, α_2) .

This key property allows us to study dynamic FCs in a manner that is invariant to their evolution rates. That is, we can compare the equivalence classes, or orbits, [α] = {α ∘ γ | γ ∈ Γ}.
Definition 3: Define the geodesic distance d_q on C/Γ as the shortest distance between two orbits in C:

d_q([α_1], [α_2]) = inf_{γ_1, γ_2 ∈ Γ} d_c(α_1 ∘ γ_1, α_2 ∘ γ_2) .        (5)

It has been shown in [11] that this is a proper metric on the space C/Γ. To solve the above equation, we use the invariance property and optimize over a single warping:

d_q([α_1], [α_2]) = inf_{γ ∈ Γ} d_c(α_1, α_2 ∘ γ) .        (6)

This optimization is very expensive, since we need to iteratively search for the optimal path (x, v) on C and the reparameterization γ. Instead, we use a fast approximation: we first find the baseline x connecting the two trajectories using the geodesic between α_1(0) and α_2(0) on P, and then align their TSRVFs accordingly using the dynamic programming algorithm. This substantially speeds up the solution, albeit at the cost of potentially differing from the optimal solution under the theoretical formulation.
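The dynamic-programming alignment step can be illustrated with plain DTW over discretized warping paths, a simplified stand-in for the elastic alignment of TSRVFs described above; the function name and the unconstrained step pattern are our choices.

```python
import numpy as np

def dtw_align(q1, q2):
    """Align two vector-valued sequences by dynamic programming (plain DTW).

    q1 : (T1, n), q2 : (T2, n), e.g. TSRVFs of two covariance trajectories
    after transport to a common tangent space. Returns the total matching
    cost and the warping path as index pairs. This is a simplified stand-in
    for the elastic alignment over diffeomorphisms used in the paper.
    """
    T1, T2 = len(q1), len(q2)
    cost = np.linalg.norm(q1[:, None, :] - q2[None, :, :], axis=2)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            D[i, j] = cost[i-1, j-1] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    # backtrack the optimal path
    path, (i, j) = [], (T1, T2)
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        k = int(np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        i, j = (i-1, j-1) if k == 0 else (i-1, j) if k == 1 else (i, j-1)
    path.append((0, 0))
    return D[T1, T2], path[::-1]
```

Aligning a sequence with itself recovers the identity warping (the diagonal path) at zero cost, which is the discrete analogue of the invariance property above.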

D. Dimension Reduction for SPDMs
As mentioned above, the number of ROIs in brain connectivity studies can become very large [16], resulting in FC trajectories made of large SPDMs. Comparing such trajectories can become computationally expensive, and we therefore seek a method for data reduction that preserves the symmetric, positive-definite nature of covariance matrices. Our main idea is to find a linear projection that maps high-dimensional SPDMs to low-dimensional SPDMs in a principled, near-optimal manner. In addition to providing computational simplification, the low-dimensional SPDMs also facilitate analyses in the following ways: 1) such a projection can bring FC trajectories associated with different numbers of ROIs to the same smaller dimension, making comparisons between them possible; 2) in the case that not all ROIs carry the same amount of information, dimension reduction can help filter out extraneous components. The problem of dimension reduction of SPDMs has been studied and used in a variety of computer vision and pattern recognition problems; see, e.g., [17], [18]. In this paper, we develop a novel dimension-reduction technique adapted to the Riemannian metric presented in Section II-B, so that the reduced SPDMs are especially suitable for analysis under the proposed Riemannian framework.
Given a set of n × n unit-determinant SPDMs {P_i}, where n is a large integer, our goal is to find a matrix B ∈ R^{n×d} with orthonormal columns, where d ≪ n and B^T B = I_d, to project each P_i to Q_i = B^T P_i B ∈ R^{d×d}. The space of such matrices is called a Stiefel manifold, denoted S_{n,d}. The next question is: what should be the optimality criterion for defining B? A simple yet important idea is that pairwise distances between the given SPDMs should be preserved as much as possible after the projection. That is, find B ∈ S_{n,d} such that d_{P_d}(Q_i, Q_j) ≈ d_{P_n}(P_i, P_j) for all i, j in the training set. This criterion can be formulated as:

B* = argmin_{B ∈ S_{n,d}} Σ_{i<j} ( d_{P_d}(Q_i, Q_j) − d_{P_n}(P_i, P_j) )^2 .

A direct optimization of this quantity over B ∈ S_{n,d} is complicated, due to the complexity of the chosen Riemannian metric. Instead, we develop an approximation in which the comparison of distances is replaced by a direct comparison of the relevant matrices.
In the original space, the distance between matrices P_i and P_j is governed by the matrix P_ij = sqrt(P_i^{-1} P_j^2 P_i^{-1}) (see Eqn (1)). Similarly, the distance in the smaller space is determined by the matrix Q_ij = sqrt(Q_i^{-1} Q_j^2 Q_i^{-1}). In order to compare these matrices, we need to bring them into the same space. Let P̂_i denote the reconstruction of P_i from its projection Q_i, i.e., P̂_i = B Q_i B^T ∈ R^{n×n}. Our goal is then to find the B ∈ S_{n,d} that minimizes the discrepancy between the matrices governing distances in the original space and their counterparts formed from the reconstructions P̂_i. However, this specification requires the following proviso: since P̂_i has rank d, it is not invertible, and one needs to use its pseudoinverse instead. Let P̂_i^− = B Q_i^{-1} B^T denote the pseudoinverse of P̂_i. (In the appendix we establish this expression as a pseudoinverse.) Then, we have the following result.
Lemma 1: Under the conditions specified above, for all i, j, the distance determined by the reconstructed matrices P̂_i agrees with the distance determined by the reduced matrices Q_i. To prove this, one only needs to show that P̂_i^− = B Q_i^{-1} B^T is indeed the Moore-Penrose pseudoinverse of P̂_i; the proof is given in the appendix. We solve the resulting optimization problem on the Stiefel manifold using the Matlab toolbox Manopt [19].
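The projection, reconstruction, and pseudoinverse introduced above can be checked numerically. In this sketch B is drawn at random from the Stiefel manifold via QR rather than optimized with Manopt, and the helper names are ours.

```python
import numpy as np

def random_stiefel(n, d, rng):
    """A random element of the Stiefel manifold S_{n,d} via reduced QR."""
    B, _ = np.linalg.qr(rng.standard_normal((n, d)))
    return B

def reduce_spdm(P, B):
    """Project an n x n SPDM to a d x d SPDM: Q = B^T P B."""
    return B.T @ P @ B

def reconstruct(Q, B):
    """Lift the reduced SPDM back: P_hat = B Q B^T (rank d)."""
    return B @ Q @ B.T

def pseudoinverse(Q, B):
    """P_hat^- = B Q^{-1} B^T, claimed Moore-Penrose pseudoinverse of P_hat."""
    return B @ np.linalg.inv(Q) @ B.T
```

Because B^T B = I_d, the product of the reconstruction and its pseudoinverse is B B^T, from which the Moore-Penrose conditions follow directly; the test below verifies them on a random example.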

III. EXPERIMENTS
In this section, we provide a number of simulations and real-data experiments to illustrate the proposed methodology. The first simulated example illustrates the proposed dimension-reduction technique. The second simulation studies the combined effects of dimension reduction and temporal alignment. In the real-data experiments, we present several classification experiments based on different ROIs, activities, and classifiers, with and without dimension reduction and alignment.
A. Simulation Studies on Dimension Reduction and Alignment
1) Simulation Experiment 1: We first randomly generate k sets of SPDMs, each set containing T independent n × n SPDMs. Let P_i(j) be the jth SPDM in the ith set; it is generated as P_i(j) = K_i K_i^T + n I_n + ε_ij ε_ij^T, where K_i and ε_ij are n × n matrices with i.i.d. standard normal entries. In this simulation we used the values n = 100, T = 20, and k = 10. Next, we apply our dimension-reduction method to reduce the SPDMs from n = 100 to d = 20, 10, and 5. To evaluate this reduction, we calculate pairwise-distance matrices before and after dimension reduction, and display them in Fig. 2. The results show that, although the dimension of the SPDMs changes dramatically, the patterns of distances within and across sets are preserved well for d = 20 and d = 10, with a slight deterioration for d = 5. To quantify changes in the distance matrices, let D be the distance matrix between covariance matrices in the original dimension and D_d the distance matrix after reducing the dimension to d; we display the Frobenius norm ||D − D_d||_F versus d in Fig. 2(e). We also use some real data associated with the social task, along with the AAL template [20], to display functional connectivities using reconstructed SPDMs in different dimensions, visualized with BrainNet Viewer [21]. We threshold the precision matrices associated with these SPDMs to define edges in graphs, as shown in Fig. 3. The similarity between these graphs indicates the success of data compression with our dimension-reduction method.
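A small-scale version of this generation scheme can be sketched as follows; dimensions are shrunk from the paper's n = 100 for speed, and sharing K_i within a set while drawing ε_ij afresh for each matrix is our reading of the subscripts.

```python
import numpy as np

def simulate_spdm_sets(k=10, T=20, n=100, rng=None):
    """Generate k sets of T SPDMs as in Simulation Experiment 1:

        P_i(j) = K_i K_i^T + n I_n + eps_ij eps_ij^T,

    with K_i shared within a set (between-set structure) and eps_ij drawn
    afresh for every matrix (within-set noise). Returns a (k, T, n, n) array.
    """
    if rng is None:
        rng = np.random.default_rng()
    sets = np.empty((k, T, n, n))
    for i in range(k):
        K = rng.standard_normal((n, n))        # fixed per set
        base = K @ K.T + n * np.eye(n)
        for j in range(T):
            eps = rng.standard_normal((n, n))  # fresh per matrix
            sets[i, j] = base + eps @ eps.T
    return sets
```

Every generated matrix is symmetric and positive definite by construction, since it is a sum of Gram matrices and a positive multiple of the identity.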
2) Simulation Experiment 2: Next, we study the combined effects of dimension reduction and temporal alignment. We simulate covariance trajectories in the space of large SPDMs and time warp them to generate new trajectories. Then, we perform dimension reduction and solve for temporal registration in the reduced space, in that order, and compare the warping results to the original time warpings.
We start by generating several covariance sequences, labeled {S_i}, each with 20 time points and covariance size 100 × 100. They are generated as follows: first we simulate a time series in R^100 from the multivariate normal distribution N(0, I_100) for t = 1, ..., 300, and then use a sliding window with window size 80 and step size 10 to segment the time series into overlapping blocks. We then estimate the covariance of each block to form a 20-length sequence of 100 × 100 covariances. The resulting sequence is resampled at 20 time points after temporal smoothing with a Gaussian kernel, resulting in S_i. Next, we randomly generate warping functions γ_i and form new sequences S_i ∘ γ_i, denoted S̃_i. Finally, we apply dimension reduction to S_i and S̃_i to obtain trajectories of size d × d × 20, with d ≪ 100, denoted S_i^d and S̃_i^d. Then, we use the optimization stated in Eqn (6) to align S_i^d to S̃_i^d. In Fig. 4, we show the recovered warping functions for different cases. Each panel represents a different experiment with different simulated data. The ground-truth warping function is shown in black, while the other curves represent the recovered warping functions for different values of d. From these results, we can see that for a moderate dimension reduction, with d = 50 or d = 20, the alignment results are near perfect. However, a large reduction (d ≤ 5) can result in some loss of information and lead to a deterioration in alignment performance. The computational cost of analyzing a large set of SPDM trajectories is a big concern, motivating the need for dimension reduction. In Fig. 5 we plot the computational cost of comparing trajectories, with and without temporal alignment, for different sizes of SPDM trajectories. This cost was computed on a PC with an Intel i7-6700HQ and 8 GB RAM.
One can see that the cost of performing temporal alignment grows rapidly with the matrix size and becomes impractical for n larger than 100. This is an important consideration, and sometimes the gains of alignment are nullified by the increase in computational cost.
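One simple way to simulate the warping functions γ_i and apply them to a trajectory is sketched below; the exponentiated-noise recipe and the entrywise linear interpolation are our choices (exact warping would interpolate along geodesics on the SPDM manifold).

```python
import numpy as np

def random_warping(T, rng, roughness=1.0):
    """A random diffeomorphism gamma of [0, 1], sampled at T points.

    Built by exponentiating Gaussian noise and normalizing the cumulative
    sum, so gamma(0) = 0, gamma(1) = 1 and gamma is strictly increasing --
    one simple recipe among many for simulating execution-rate variability.
    """
    increments = np.exp(roughness * rng.standard_normal(T - 1))
    gamma = np.concatenate([[0.0], np.cumsum(increments)])
    return gamma / gamma[-1]

def warp_trajectory(traj, gamma):
    """Resample a (T, n, n) SPDM trajectory at warped times by entrywise
    linear interpolation (an approximation; exact warping would follow
    geodesics on the SPDM manifold)."""
    T = traj.shape[0]
    t = np.linspace(0.0, 1.0, T)
    flat = traj.reshape(T, -1)
    warped = np.stack([np.interp(gamma, t, flat[:, m])
                       for m in range(flat.shape[1])], axis=1)
    return warped.reshape(traj.shape)
```

Applying the identity warping returns the trajectory unchanged, which provides a quick consistency check before running the alignment experiments.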

B. Real Data Experiments
Next, we use the HCP dataset to perform quantitative analysis of fMRI responses from different ROIs under different activities. The majority of the HCP fMRI data were acquired at 3T. A series of 4D fMRI imaging data were acquired for each subject while they performed different tasks involving different neural systems, e.g., the visual, motor, and cognitive systems. The acquired images have an isotropic spatial resolution of 2 mm and a temporal resolution of 0.7 s. All fMRI data in the HCP are preprocessed by removing spatial distortions, realigning volumes to compensate for subject motion, registering the fMRI to the structural MRI, reducing the bias field, normalizing the 4D image to a global mean, masking the data with the final brain mask, and aligning the brain to a standard (MNI) space [22]. The preprocessed fMRI data are then available for connectivity analysis.
We begin by segmenting the brain into multiple ROIs using an existing template [23]. With this template, the cortical area is divided into several networks: visual, auditory, default, frontoparietal, dorsal attention, cingulo-opercular, ventral attention, and salience. The fMRI time-series data for each region (333 regions in total) are extracted using the CONN functional connectivity toolbox [24]. As in the simulation study, we use a sliding-window method to obtain covariance trajectories representing dynamic FCs. For each activity we use the first 176 time points of the fMRI time series. Despite the incompleteness of the data, this setup ensures that the resulting sequences are compared on the same time scale. In total, we extract data from 196 subjects under five different tasks (motor, gambling, social cognition, language, and emotion) and the resting state, and use them in our analysis.
In each classification experiment, we calculate the pairwise distance d_q, as defined in Eqn (5), between individual trajectories, and feed these distances into two types of classifiers. One is an SVM with an RBF kernel. The other is a simple deep neural network (DNN) classifier with four dense layers, each with dropout and ReLU activation, followed by a final dense layer with softmax activation, implemented in Keras with the TensorFlow [25] backend. We find that alignment helps in some cases, although in the more general multi-class classification tasks we can obtain the same level of accuracy using d_c in Eqn (4); one can therefore use Eqn (4) in those tasks to save computation time. The results of the individual experiments are presented next.
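The classification step can be sketched as follows: a precomputed distance matrix is turned into an RBF kernel of the kind that would be fed to a kernel SVM, and a simple nearest-neighbour rule serves as a stand-in for the SVM/DNN classifiers used in the paper. Function names and the kernel bandwidth parameterization are ours.

```python
import numpy as np

def rbf_kernel_from_distances(D, sigma):
    """Turn a pairwise-distance matrix into an RBF (Gaussian) kernel matrix,
    K_ij = exp(-D_ij^2 / (2 sigma^2)), suitable for a kernel SVM that
    accepts precomputed kernels."""
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))

def knn_predict(D_test_train, train_labels, k=1):
    """Classify test items from their distances to training items -- a
    simple nearest-neighbour stand-in for the SVM/DNN classifiers.

    D_test_train : (n_test, n_train) distances; train_labels : integer
    labels of the training items. Returns predicted labels by majority
    vote among the k nearest training items.
    """
    idx = np.argsort(D_test_train, axis=1)[:, :k]
    votes = train_labels[idx]
    return np.array([np.bincount(v).argmax() for v in votes])
```

Because both classifiers consume only pairwise distances, the same precomputed matrix can be reused across the full-dimensional and dimension-reduced pipelines.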

1) Dynamic FCs Under Different Sets of ROIs and a Fixed Task: In this experiment, we restrict attention to the gambling task and use different sets of ROIs to study the corresponding dynamic FCs. The first set of 4 ROIs is from the Salience network and the second set of 4 ROIs is from the CinguloParietal network. Fig. 6 shows the locations of these ROIs in the human brain. We extract the fMRI signals from 40 subjects in the HCP dataset, obtaining 80 trajectories in total, 40 for each set of ROIs. We then calculate two 80 × 80 pairwise distance matrices, before and after temporal alignment. In Fig. 7, we display these distance matrices and a histogram of the relative changes in pairwise distances due to alignment (over all trajectories). From this figure, we see that the dynamic FCs generated by the Salience network are more homogeneous than those generated by the CinguloParietal network. This is reasonable because ROIs in the CinguloParietal network are actively involved in the gambling task [26]. The histogram in the last panel shows a significant reduction (approximately 10-20%) in distances due to temporal alignment of the trajectories. Next, these distance matrices are used as features in a binary classification task, to study whether we can distinguish the dynamic FCs generated from different brain regions when people perform the same task. The average classification rates before and after alignment using the DNN classifier are shown in Table I. From this result, we can see that alignment helps improve the classification rate for the class with higher variation.

2) Dynamic FCs Under Different Tasks But for the Same ROIs: Here we are interested in classifying dynamic FCs under different tasks using the same set of ROIs. To save computational cost, in the following experiments we use d_c to compute distances between trajectories. We present results from three scenarios:

a) Scenario 1: Classification and prediction based on ROIs in the VentralAttn and DorsalAttn networks: We use 196 subjects from the HCP dataset. For four selected tasks (gambling, social, emotion, and resting state) we construct dynamic FC trajectories for each subject using ROIs either in the VentralAttn or in the DorsalAttn network. The VentralAttn network has 20 ROIs, resulting in 784 dynamic FC trajectories of size 20 × 20 × 20. The DorsalAttn network has 29 ROIs, resulting in 784 trajectories of size 29 × 29 × 20. We then calculate pairwise distances between trajectories, with dimension reduction, and feed the distance features to the classifiers. The average overall classification rates based on five-fold cross-validation are shown in Fig. 9. There we can see that the classification rates are relatively stable, except when we reduce the dimension all the way to d = 5. Notably, dimension reduction lets us cut the computational cost of comparing trajectories by a significant amount while still retaining good performance. In Fig. 9 we also present results obtained using trajectories directly as input to an SVM classifier, denoted SVM-t. The results show that using distances as features gives higher success rates in this low-dimensional classification task.
A confusion matrix of classification results using all ROIs in the DorsalAttn network and a DNN classifier is shown in Fig. 8. We see that for one class (resting state) the success rate reaches 100%, while it is lower for some other classes.
b) Scenario 2: Four-class versus six-class classification: Using the same settings as in Scenario 1, the average overall classification rates for the two classifiers are shown in Fig. 10. We study a four-class classification problem (gambling, social, emotion and resting state) versus a more difficult six-class problem (motor, gambling, social, language, emotion and resting state). We see that the success rates using the proposed metric reach 93% in the four-class classification, but drop to 72% in the six-class classification. In this setup, despite the simple architecture of the given DNN classifier, its overall performance is as competitive as that of the SVM. One can seek performance improvements using more advanced neural network architectures. For comparison, we also compute distances between covariance trajectories using the log-Euclidean metric between SPDMs, denoted as SVM-L in Fig. 10. We see that our proposed metric gives better overall performance.
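As a reference point, the log-Euclidean distance between two SPDMs P1 and P2 is d(P1, P2) = ||log(P1) − log(P2)||_F, where log is the matrix logarithm. A minimal sketch, computing the logarithm via eigendecomposition:

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPDM via its eigendecomposition."""
    w, V = np.linalg.eigh(P)          # w > 0 for an SPDM
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(P1, P2):
    """Log-Euclidean distance between SPDMs: ||log(P1) - log(P2)||_F."""
    return np.linalg.norm(spd_log(P1) - spd_log(P2), 'fro')

# sanity checks: zero self-distance; log(e * I) = I, so
# d(I, e * I) = ||I||_F = sqrt(2) for 2 x 2 matrices
P = np.array([[2.0, 0.5], [0.5, 1.0]])
print(np.isclose(log_euclidean_dist(P, P), 0.0))                       # True
print(np.isclose(log_euclidean_dist(np.eye(2), np.e * np.eye(2)),
                 np.sqrt(2)))                                          # True
```

A distance between covariance trajectories can then be obtained by accumulating this pointwise distance along two equal-length trajectories.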

c) Scenario 3: Classification of dynamic FCs with a large number of brain regions, and comparison of dimension-reduction approaches:
In this experiment, we involve all 333 ROIs and four activities (gambling, social, emotion and resting state), with 100 subjects for each activity. Here we also compare the performance of our proposed pipeline with that of standard PCA. To perform PCA, for each averaged ROI-based multivariate time series in R^n (n = 333 here), we first reduce its dimension to R^d using PCA, then transform it into a covariance trajectory and calculate pairwise distances as earlier. We did not run the full-dimensional case (no reduction, n = 333) because of its high computational and storage cost.
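The PCA baseline described above can be sketched as follows (a NumPy-only illustration; the helper names and window parameters are hypothetical):

```python
import numpy as np

def pca_reduce(X, d):
    """Project a (T, n) time series onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    # rows of Vt are principal directions of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                            # (T, d)

def pca_covariance_trajectory(X, d, win, step):
    """PCA baseline: reduce the n-dim series to d dimensions first,
    then build the d x d covariance trajectory over sliding windows."""
    Xd = pca_reduce(X, d)
    starts = range(0, X.shape[0] - win + 1, step)
    return np.stack([np.cov(Xd[s:s + win].T) for s in starts])

# toy stand-in for an averaged ROI series (50 "ROIs", 200 time points)
X = np.random.default_rng(2).standard_normal((200, 50))
traj = pca_covariance_trajectory(X, d=5, win=20, step=10)
print(traj.shape)  # (19, 5, 5)
```

Note that this reduces the time series before estimating covariances, whereas the proposed pipeline reduces the covariance matrices themselves.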
The pairwise distance matrices between the 400 trajectories, at different degrees of dimension reduction, are shown in Fig. 11. Despite the dimension reduction, the block patterns are well preserved. Classification results using the SVM classifier for these setups are presented in Table II. We see that the proposed pipeline provides the highest classification rates across the different dimensions, highlighting the power of the proposed method.
Existing fMRI pattern classification literature utilizes data from different sources. Some works use particular datasets to classify mental disorders, while others aim at classifying task/resting activities. To our knowledge, only a few works use the HCP dataset for classification of activities. One example is Zhang et al. [27], where the classification accuracy reaches 100% when classifying tfMRI/rsfMRI in a two-class classification problem. [28] and [29] also reach high classification rates in two-task classification problems. In contrast, our results achieve high classification rates in a multi-task classification problem, and also reach 100% classification rates for activities such as the resting state.
(Fig. 12 caption: Pairwise distances of 300 trajectories from 10 networks (FrontalParietal, SMhand, Salience, Visual, CinguloOperc, VentralAttn, DorsalAttn, RetrosplenialTemporal, CinguloParietal and SMmouth, respectively) after reducing dimension to d = 4.)
Another use of the proposed framework is for comparing covariance trajectories of different dimensions, i.e., with different numbers of ROIs. Using the dimension reduction in our method, one can bring different trajectories to the same dimension, thus making comparisons between these trajectories possible. As an example, we use resting-state data and different numbers of ROIs in 10 networks to form covariance trajectories, with 30 subjects per network, resulting in a total of 300 trajectories. We project all covariance trajectories using d = 4 and compute pairwise distances between all 300 trajectories. The results, shown in Fig. 12, display a strong, consistent pattern in the Salience network under the resting state [30].
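Bringing SPDMs of different sizes to a common dimension amounts to a projection Q = B^T P B with an orthonormal B in R^{n×d}. The sketch below uses the top-d eigenvectors of each matrix as an illustrative choice of B; the actual pipeline instead optimizes B jointly over the whole trajectory collection:

```python
import numpy as np

def project_spdm(P, d):
    """Project an n x n SPDM to a d x d SPDM via Q = B^T P B.

    For illustration only, B is taken as the top-d eigenvectors of P
    (orthonormal columns); the paper's method optimizes B globally.
    """
    w, V = np.linalg.eigh(P)    # eigenvalues in ascending order
    B = V[:, -d:]               # n x d, orthonormal columns
    return B.T @ P @ B

# two SPDMs of different sizes brought to a common 4 x 4 representation
rng = np.random.default_rng(3)
A1 = rng.standard_normal((20, 20)); P1 = A1 @ A1.T + 20 * np.eye(20)
A2 = rng.standard_normal((29, 29)); P2 = A2 @ A2.T + 29 * np.eye(29)
Q1, Q2 = project_spdm(P1, 4), project_spdm(P2, 4)
print(Q1.shape, Q2.shape)  # (4, 4) (4, 4)
```

Once both trajectories live on the space of 4 × 4 SPDMs, the same trajectory distance applies to them directly.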

IV. DISCUSSION
In this paper we have presented a comprehensive framework for comparing multivariate fMRI time series, based on a distance between trajectories in the space of SPDMs. We have also derived a method for SPDM dimension reduction, which saves significant computational cost while preserving pairwise distances as much as possible. This method can further be used to compare covariance trajectories of different dimensions. Experimental results show high classification accuracies in multi-class classification problems, especially for the emotion task and resting-state data.
Through extensive experiments, we demonstrate the usefulness of the proposed framework in fMRI data analysis. For different tasks, the same set of brain regions can display different dynamic FC patterns. Some brain regions are very informative in distinguishing the tasks, while others are not. These findings are consistent with the existing literature; e.g., in [31], the authors discovered that FCs are dominated by common patterns, and tasks can only modestly influence brain networks. Classification of tasks using full-brain fMRI does not always result in the best performance. Hence, particular brain regions should be considered for task-based fMRI analysis and classification. The selection of brain regions can be done through prior neuroscientific knowledge or data-driven methods [32].
We also notice that in a multi-class classification problem, distance features can be similar for certain tasks due to similar functional connectivity patterns of the selected ROIs; as a result, the classification rates for these tasks are relatively low. In terms of classification, we expect that, given the distance features, using advanced deep learning frameworks and ensemble models may further improve performance.

ACKNOWLEDGEMENT
The authors are thankful to the organizers of the HCP database for making it available to the public for this research. HCP is funded in part by the WU-Minn consortium (PIs: Van Essen and Ugurbil), 1U54MH091657.

A. Discussion on the Pseudoinverse of $\hat{P}$
For an $n \times d$ orthogonal matrix $B$ (i.e., $B^T B = I_d$), an $n \times n$ SPDM $P$, and a projection $Q = B^T P B \in \mathbb{R}^{d \times d}$, the reconstruction of $P$ is given by $\hat{P} = B Q B^T$. Since $\hat{P}$ is singular, we cannot invert it, but we can use the notion of a pseudoinverse instead. Let the SVD of $Q$ be $Q = U S U^T$, and let an SVD of $\hat{P}$ be
$$\hat{P} = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S_1 & 0 \\ 0 & S_2 \end{bmatrix} \begin{bmatrix} U_1 & U_2 \end{bmatrix}^T,$$
where $S_1 \in \mathbb{R}^{d \times d}$ and $S_2 \in \mathbb{R}^{(n-d) \times (n-d)}$. Since $\hat{P}$ is not full rank, $S_2$ is a matrix of zeros, and $\hat{P} = U_1 S_1 U_1^T$. Its pseudoinverse is thus defined to be $\hat{P}^- = U_1 S_1^{-1} U_1^T$. On the other hand, $\hat{P} = B Q B^T = B U S U^T B^T$. Since $U$ is a rotation matrix, $BU$ is just a rotation of $B$, and is still orthonormal. Setting $BU = U_1$ and $S = S_1$, we obtain an SVD of $\hat{P}$, and its pseudoinverse can be written as $\hat{P}^- = B U S^{-1} U^T B^T$.
For the dimension-reduction step, the optimal $B$ and $\{Q_{ij}\}$ minimize the total reconstruction error
$$\sum_{i,j} \|P_{ij} - B Q_{ij} B^T\|_F^2 = \sum_{i,j} \mathrm{tr}\big(P_{ij} P_{ij} - 2 B^T P_{ij} B Q_{ij} + Q_{ij} Q_{ij}\big).$$
To simplify notation, we replace the trace operator with the matrix inner product, i.e., $\mathrm{tr}(AB) = \langle A, B \rangle = \sum_{k,l} a_{kl} b_{kl}$ for symmetric $A$ and $B$. The problem then becomes
$$\operatorname*{argmin}_{B, \{Q_{ij}\}} \sum_{i,j} \big( \langle P_{ij}, P_{ij} \rangle - 2 \langle B^T P_{ij} B, Q_{ij} \rangle + \langle Q_{ij}, Q_{ij} \rangle \big).$$
It is easy to see that, to minimize the objective function, we need to set $Q_{ij} = B^T P_{ij} B$ for each $i, j$. By substituting $Q_{ij} = B^T P_{ij} B$, we have
$$\operatorname*{argmin}_{B} \sum_{i,j} \big( \langle P_{ij}, P_{ij} \rangle - \langle B^T P_{ij} B, B^T P_{ij} B \rangle \big) = \operatorname*{argmax}_{B} \sum_{i,j} \| B^T P_{ij} B \|_F^2.$$
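The pseudoinverse identity above can be checked numerically against the generic Moore-Penrose pseudoinverse (a small sketch with hypothetical dimensions):

```python
import numpy as np

def reconstruction_pinv(B, Q):
    """Pseudoinverse of Phat = B Q B^T, with B an n x d matrix having
    orthonormal columns and Q a d x d SPDM, computed as
    B U S^{-1} U^T B^T from the eigendecomposition Q = U S U^T."""
    w, U = np.linalg.eigh(Q)
    return B @ U @ np.diag(1.0 / w) @ U.T @ B.T

rng = np.random.default_rng(4)
n, d = 8, 3
B, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal n x d
A = rng.standard_normal((d, d))
Q = A @ A.T + d * np.eye(d)                        # d x d SPDM
Phat = B @ Q @ B.T                                 # rank-d, singular

# agrees with the generic Moore-Penrose pseudoinverse of Phat
print(np.allclose(reconstruction_pinv(B, Q), np.linalg.pinv(Phat)))  # True
```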