
Taking the 4D Nature of fMRI Data Into Account Promises Significant Gains in Data Completion




Abstract:

Functional magnetic resonance imaging (fMRI) is a powerful, noninvasive tool that has significantly contributed to the understanding of the human brain. FMRI data provide a sequence of whole-brain volumes over time and hence are inherently four dimensional (4D). Missing data in fMRI experiments arise from image acquisition limits, susceptibility and motion artifacts, or confounding noise removal. Hence, significant brain regions may be excluded from the data, and the large number of missing voxels can seriously undermine the quality of subsequent analyses. We take advantage of the 4D nature of fMRI data through a tensor representation and introduce an effective algorithm to estimate missing samples in fMRI data. The proposed Riemannian nonlinear spectral conjugate gradient (RSCG) optimization method uses tensor train (TT) decomposition, which enables compact representations and provides efficient linear algebra operations. Exploiting the Riemannian structure boosts algorithm performance significantly, as evidenced by the comparison of RSCG-TT with state-of-the-art stochastic gradient methods, which are developed in the Euclidean space. We thus provide an effective method for estimating missing brain voxels and, more importantly, clearly show that taking the full 4D structure of fMRI data into account provides important gains when compared with three-dimensional (3D) and the most commonly used two-dimensional (2D) representations of fMRI data.
Society Section: IEEE Engineering in Medicine and Biology Society Section
Published in: IEEE Access ( Volume: 9)
Page(s): 145334 - 145362
Date of Publication: 19 October 2021
Electronic ISSN: 2169-3536
PubMed ID: 34824964

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
SECTION I.

Introduction

Functional magnetic resonance imaging (fMRI) provides a noninvasive and indirect measure of brain activity and is a vital methodology in basic and applied neuroscience research. A common problem in fMRI analysis is missing data. The data collected during fMRI experiments often suffer from scanner instabilities, magnetic susceptibility artifacts [1], subject motion, and task-related noise [2]. An additional challenge arises when the fMRI data are processed in their native, complex domain, as the phase images exhibit a challenging noise structure [3]. Denoising is often used in complex-valued fMRI as a preprocessing step to increase the sensitivity of analyses [4], [5]. However, the denoising process results in the exclusion of a significant number of voxels from neuroimaging studies [3], [4]. The excluded brain regions at the cortical boundaries increase the risk of false negatives. The removal of voxels with missing data may also increase the probability of false positives when spatial extent thresholds are applied to establish significance, because fewer voxels generate smaller cluster sizes to trigger significance [2]. Hence, missing data present an important challenge for fMRI analysis: they can reduce statistical power and affect the consequent inferences, motivating the use of data completion techniques to recover or estimate the excluded or missing voxels.

The missing data in fMRI studies can be either missing at random (MAR) or not missing at random (MNAR), corresponding to randomly scattered or contiguous groups of missing voxels, respectively [6]. Magnetic field inhomogeneity is the main source of MAR values due to the variability in the response of various tissues to magnetization [7]. The variability in tissue magnetization may result in geometric distortions, which are especially pronounced at tissue borders/edges, the frontal lobe proximal to the sinuses, and the temporal lobe proximal to the mastoid air cells and ear canals [8]. The MNAR pattern is a more challenging missing value pattern in fMRI data. For example, task-related or resting-state fMRI (rs-fMRI) experiments may lead to the nonrandom exclusion of voxels that are associated with brain activity or the effect of interest due to standard fMRI preprocessing and motion censoring techniques [9], [10]. Another example of the MNAR pattern arises in complex-valued fMRI data quality control, when noisy regions are eliminated from the phase images [4].

One approach for solving the incomplete fMRI data problem is to use robust estimation techniques to infer missing voxels. Data completion methods are suitable for this purpose [11]–[14]. Data completion methods infer the missing values from partially observed entries and structural properties of the data. Real-world fMRI datasets exhibit correlations among their samples and hence highly correlated latent factors, which make data completion a feasible task. The completion methods used to estimate missing values fall into three broad categories: methods based on expectation maximization (EM) [13], nonlinear optimization [15], and nuclear norm minimization [11], [16], [17]. These approaches are further categorized into low-rank matrix and tensor completion methods. The algorithms used in [18]–[21] rely on low-rank matrix completion, in which the fMRI dataset is flattened into a two-dimensional (2D) matrix $\mathbf {Y} \in \mathbb {R}^{T\times V}$, where $T$ denotes the number of timepoints and $V$ denotes the number of voxels in each brain volume.
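To make the contrast between these representations concrete, the following is a minimal NumPy sketch of the flattening step; the array sizes are borrowed from the acquisition described in Section IV-A, and random data stand in for a real scan.

```python
import numpy as np

# A 4D fMRI scan: three spatial modes plus one temporal mode. The sizes
# mirror the COBRE acquisition in Section IV-A (64x64 matrix, 33 slices,
# 145 retained volumes); random data stand in for a real scan.
fmri_4d = np.random.randn(64, 64, 33, 145)

# The common 2D representation Y in R^{T x V} flattens the three
# spatial modes into a single voxel axis.
T = fmri_4d.shape[-1]
Y = fmri_4d.reshape(-1, T).T          # shape (145, 64 * 64 * 33)

# The reshape is lossless, but any method operating on Y alone cannot
# exploit the 3D spatial neighborhood structure that the 4D tensor
# representation keeps explicit.
restored = Y.T.reshape(fmri_4d.shape)
assert np.array_equal(restored, fmri_4d)
```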

The majority of tensor completion methods developed for medical imaging focus on three-dimensional (3D) fMRI data [22]–[24]. In [22], the authors used tensor completion to recover continuously missing 3D scans at select timepoints with the multiway delay embedding transform (MDT) and an alternating least squares (ALS) approach based on Tucker decomposition. However, the fMRI dataset used in [22] was transformed into a 3D representation for data completion purposes. In [23] and [24], the authors formulated the task of sampling and reconstruction of fMRI data as a 3D tensor completion problem. Hence, the advantages of the four-dimensional (4D) structure of fMRI data have not been widely exploited for data completion purposes. The state-of-the-art tensor-based methods that focus on a higher-order fMRI data representation for brain imaging analyses include [25]–[27].

In [25], block term decomposition (BTD) is used to present a 4D analysis approach based on blind source separation (BSS), demonstrating the potential of such a representation. There has been considerable interest in tensor-based methods for analysis [25], [28] and functional connectivity estimation [29] that use higher-order models in the medical imaging context. However, less attention has been paid to the full utilization of all available spatial and temporal modes of fMRI data for the data completion task, which could directly highlight the importance of taking the full 4D information into account.

Recently, a nonconvex tensor train (TT) rank minimization method for 4D fMRI tensor completion was developed in [30], with promising results reported for the MAR voxel pattern. Similar to this work, we view 4D fMRI data as a sequence of 3D whole-brain volumes observed at a number of consecutive time points for data completion. Mathematically speaking, an fMRI dataset is a 4D tensor with 3D space $\times $ time. Tensors capture correlation information across multiple dimensions via a high-order decomposition. Since fMRI data are inherently 4D, the 4-way representation allows us to naturally preserve the global structure of the data and maximize the use of both the spatial and temporal structures, i.e., the correlations, in the data. In addition, we make use of TT decomposition, which provides scalable numerical algebra operations and hierarchical compressed representations of structurally rich datasets; hence, it results in both reduced computational complexity and the efficient utilization of the covariance information.

In addition to not considering the time dimension, the majority of the algorithms for estimating missing data in medical imaging modalities were derived from schemes originally developed for Euclidean spaces. However, representing the geometry of medical images in Euclidean spaces has been shown to undermine the ability to account for joint variability at either the subject or group level [31], [32]. It was shown in [32] that brain images belong to a low-dimensional manifold that can provide a better approximation for magnetic resonance imaging (MRI) data.

To overcome this problem, we design a Riemannian framework for statistical learning in the context of fMRI tensor completion directly on a manifold equipped with a Riemannian affine-invariant metric [33]. In [33], the authors show that when tensors are endowed with the affine-invariant Riemannian distance, they provide a powerful framework for robust interpolation and restoration of missing data by using generalized geometrical statistics. Therefore, the estimation error can be substantially reduced by learning tensor-valued fMRI data using Riemannian methods. As demonstrated in [32], [34]–[36], the use of Riemannian methods yields a high discriminative power in fMRI/MRI data analyses. Riemannian geometry offers a principled mathematical framework for applying standard optimization algorithms that were designed for Euclidean geometry. The proposed approach involves a projection of the quantities expressed in the Euclidean space onto the tangent space, which is equipped with an invariant, information-preserving distance. Our study adapts classical optimization algorithms into their Riemannian counterparts by substituting the Euclidean distance with the Riemannian distance and optimizing the cost function over the Riemannian space.

In recent years, much progress has been made in research on conjugate gradient methods that use Riemannian geometry [37], [38]. The extension of the conjugate gradient algorithm to Riemannian manifolds is accomplished by replacing the classical gradient with the natural gradient [39]. This extension is relatively simple, as it is sufficient that all the vector operations take into account the Riemannian nature of the problem space. Yao et al. [40] proposed the Riemannian Fletcher-Reeves conjugate gradient method to solve the stochastic eigenvalue problem, and Steinlechner [41] proposed a Riemannian nonlinear conjugate gradient scheme for the approximation of high-dimensional multivariate functions. To account for the nonlinearity of the search space of the original data, in this paper, we propose a novel Riemannian approach for 4D tensor completion of fMRI data. We combine the strength of two ideas: a natural 4D tensor representation of fMRI data and a model parametrization that best fits the fMRI data. TT decomposition guarantees numerical stability, and the affine-invariant Riemannian distance allows volume-preserving transformations of the shape of the data. We show the validity of this approach and its desirable properties with fMRI data from the Center for Biomedical Research Excellence [42].

In addition to the successful demonstration of the great potential of considering the full 4D information in fMRI data, our contributions in this paper are summarized as follows:

  • Viewing fMRI data as a 4D hypervolume allows us to take both temporal and spatial correlations into consideration, and therefore, allows us to establish a joint 4D learning scheme. The joint 4D learning scheme fully captures the intrinsic relationship between the spatial and temporal modes, thus guaranteeing superior performance compared to 2D and 3D fMRI completion methods in recovering missing voxels.

  • We cast the problem of estimating missing fMRI data as a smooth optimization problem on Riemannian manifolds due to the nonlinear structure of the parameter space. To the best of our knowledge, this is the first time Riemannian geometry has been used for the estimation of missing fMRI data in the form of a 4D tensor in a practical setting.

  • We develop a new geometric version of the spectral-scaling conjugate gradient method. Our method exploits second-order information and offers enhanced numerical stability.

  • We extend our Riemannian optimization method to the TT format, which allows efficient numerical algebra operations without exponential scaling of the memory consumption and computational complexity with the number of dimensions [43]. It has been shown in [17] that TT decomposition captures the global correlation of the tensor entries, and thus, it is an efficient tool for tensor completion methods.

  • We demonstrate that the proposed Riemannian nonlinear spectral conjugate gradient with TT (RSCG-TT) method provides a robust estimate of the original data for both the MAR and the MNAR missing value patterns encountered in real fMRI datasets and has performance superior to that of the state-of-the-art tensor completion methods across different rates of missing values. We show that the proposed method outperforms the state-of-the-art gradient algorithms based on canonical polyadic decomposition (CPD) and Tucker decomposition formulated in Euclidean space.

The remainder of the paper is organized as follows. Section II-A introduces notation and preliminaries for TT decomposition. Appendix B describes the Riemannian manifold learning framework with TT decomposition, followed by the formulation of the tensor completion problem on TT manifolds in Section II-B. In Section II-C, we propose a new spectral conjugate gradient scheme in conventional Euclidean space. In Section III, we present the algorithmic development of the proposed RSCG-TT method in full. We provide a description of the experimental setup in Section IV. The efficiency and robustness of the proposed algorithm on real fMRI data with missing data patterns that reflect simulated and real-world scenarios are demonstrated in Section V. We discuss the experimental results in Section VI. Conclusions and future work are given in Section VII.

SECTION II.

Theory

In this work, we consider TT tensors as our structured tensor format of interest [44], [45]. As noted in Appendix B, the set of all such tensors of a fixed multilinear rank $\mathbf {r}$, which we equip with an affine-invariant Riemannian metric [46], forms a smooth embedded submanifold of $\mathbb {R}^{I_{1} \times \cdots \times I_{N}}$ and was originally studied in [47]. With this differentiable Riemannian structure, we can construct optimization algorithms to solve the tensor completion problem for $N$-dimensional tensors. In Section II-B, we formulate high-dimensional data completion as a Riemannian tensor completion problem on TT manifolds. We provide the preliminaries on Riemannian manifold learning for tensor decomposition in TT format in Appendix B, and the notation and definitions of the necessary ingredients used in Riemannian TT learning are summarized in Table 2.

TABLE 1. Notation Used Throughout the Paper.
TABLE 2. Notations and Symbols Used in Riemannian TT Learning.

To provide the reader with some context on TT decomposition, we present the basic concepts and results in Section II-A. For an overview of the most common choices of tensor formats, we refer the reader to [48]. Although our presentation is self-contained, we encourage the reader to consult the original works [44], [47], [49] for further clarification. The mathematical notation and definitions are adopted from [50] and [43], [51]. In this paper, we denote vectors with boldface lowercase letters ($\mathbf {x}$, $\mathbf {y}$, $\mathbf {z}$, …), matrices with boldface capital letters ($\mathbf {X}$, $\mathbf {Y}$, $\mathbf {Z}$, …), and tensors with bold calligraphic letters ($\boldsymbol {\mathcal {X}}$, $\boldsymbol {\mathcal {Y}}$, $\boldsymbol {\mathcal {Z}}$, …); see Table 1.

A. TT Decomposition

A TT decomposition [44], shown in Fig. 1, represents an $N$th-order tensor $\boldsymbol {\mathcal {X}} \in \mathbb {R}^{I_{1} \times I_{2} \times \cdots \times I_{N}}$ as \begin{equation*} \boldsymbol {\mathcal {X}} = \boldsymbol {\mathcal {G}}^{(1)} \times ^{1} \boldsymbol {\mathcal {G}}^{(2)} \times ^{1} \cdots \times ^{1} \boldsymbol {\mathcal {G}}^{(N)},\tag{1}\end{equation*} where $\boldsymbol {\mathcal {G}}^{(n)} \in \mathbb {R}^{R_{n-1}\times I_{n} \times R_{n}}$ is the $n$th 3D TT core and $\boldsymbol {\mathcal {G}}^{(1)} \in \mathbb {R}^{I_{1} \times R_{1}}$ and $\boldsymbol {\mathcal {G}}^{(N)} \in \mathbb {R}^{R_{N-1} \times I_{N}}$ are matrices. The rank of TT decomposition (TT rank) [44] is an $(N+1)$-tuple of ranks \begin{align*} \mathrm {rank}_{\mathrm {TT}}(\boldsymbol {\mathcal {X}}) = \mathbf {r} &= [R_{0}, R_{1}, R_{2}, \cdots, R_{N-1}, R_{N}], \\ R_{n} &= \mathrm {rank}(\mathbf {X}_{ < n > }),\tag{2}\end{align*} where $R_{n}$ is the $n$th component of the TT rank, which controls the complexity of TT decomposition, and $R_{0}=R_{N}=1$. The rank $R_{n}$ is the rank of the matrix $\mathbf {X}_{ < n > } \in \mathbb {R}^{I_{1}I_{2} \cdots I_{n} \times I_{n+1} \cdots I_{N}}$ obtained by the $n$th splitting (reshaping) operator [44] applied to tensor $\boldsymbol {\mathcal {X}}$ as follows: \begin{equation*} \mathbf {X}_{ < n > } = \mathrm {reshape}\left({\boldsymbol {\mathcal {X}}, \prod _{i=1}^{n}{I_{i}}, \prod _{i=n+1}^{N}{I_{i}}}\right).\tag{3}\end{equation*}

FIGURE 1. TT decomposition of an $N$th-order tensor $\boldsymbol {\mathcal {X}} \in \mathbb {R}^{I_{1} \times I_{2} \times \cdots \times I_{N}}$ with TT rank $\mathbf {r} = (R_{0}, R_{1}, R_{2}, \cdots, R_{n}, \cdots, R_{N-1}, R_{N})$, where $R_{0}=R_{N}=1$. Conceptually, TT decomposition is a collection of TT cores connected in a cascade, where $\mathcal {G}^{(n)} \in \mathbb {R}^{R_{n-1} \times I_{n} \times R_{n}}$ are 3D tensors and $\mathcal {G}^{(1)} \in \mathbb {R}^{I_{1} \times R_{1}}$ and $\mathcal {G}^{(N)} \in \mathbb {R}^{R_{N-1} \times I_{N}}$ are matrices [43].

Fig. 1 shows that the TT cores $\boldsymbol {\mathcal {G}}^{(n)}$ are connected in a cascade, and the leaf nodes are identities. The entries of TT decomposition for an $N$th-order tensor can be computed as a multilayer multiplication of slices of TT cores, where the internal edges represent the TT rank [43]. The inner product of TT tensors can be defined in terms of the standard Euclidean product for vectors: \begin{equation*} \langle \boldsymbol {\mathcal {X}}, \boldsymbol {\mathcal {Y}} \rangle = {\mathrm {vec}}(\boldsymbol {\mathcal {X}})^{T} {\mathrm {vec}}(\boldsymbol {\mathcal {Y}}) = \mathrm {tr}\Big ((\mathbf {X}_{ < n > })^{T} \mathbf {Y}_{ < n > }\Big),\tag{4}\end{equation*} where ${\mathrm {vec}}(\boldsymbol {\mathcal {X}})$ is the vectorization operator and $n \in \{1, \cdots, N\}$.
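To make the cascade in (1) concrete, the following minimal NumPy sketch contracts a list of TT cores back into the full tensor; the core sizes are illustrative and not tied to the fMRI data.

```python
import numpy as np

def tt_to_full(cores):
    """Contract TT cores G^(1), ..., G^(N) into the full tensor, Eq. (1).

    Each core has shape (R_{n-1}, I_n, R_n); the boundary ranks
    R_0 = R_N = 1 are kept as explicit singleton dimensions.
    """
    full = cores[0]
    for core in cores[1:]:
        # Mode-1 contraction: sum over the shared rank index R_n.
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the singleton boundary ranks R_0 and R_N.
    return full.squeeze(axis=(0, -1))

# A 4th-order example with TT rank r = (1, 2, 3, 2, 1):
cores = [np.random.randn(1, 4, 2), np.random.randn(2, 5, 3),
         np.random.randn(3, 6, 2), np.random.randn(2, 7, 1)]
X = tt_to_full(cores)
print(X.shape)  # -> (4, 5, 6, 7)
```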

B. Tensor Completion on TT Manifolds

We consider the completion of an $N$-dimensional ($N \ge 3$) tensor $\boldsymbol {\mathcal {T}} \in \mathbb {R}^{I_{1} \times \cdots \times I_{N}}$ with partially observed entries defined in TT format. We assume the data are low-rank, and we present tensor completion as an optimization problem on a Riemannian manifold with a fixed TT rank as follows: \begin{align*} \min _{\boldsymbol {\mathcal {X}}} f(\boldsymbol {\mathcal {X}}) &:= \frac {1}{2}\Vert \text {P}_{\Omega } \boldsymbol {\mathcal {X}} - \text {P}_{\Omega } \boldsymbol {\mathcal {T}}\Vert ^{2}_{F} \\ \text {subject to} ~\boldsymbol {\mathcal {X}} \in \mathcal {M}_{r} &:= \big \{ \boldsymbol {\mathcal {X}} \in \mathbb {R}^{I_{1} \times \cdots \times I_{N}} \,|\, \text {rank}_{\mathrm {TT}}(\boldsymbol {\mathcal {X}}) = \mathbf {r} \big \},\tag{5}\end{align*} where $\text {P}_{\Omega }$ is the projection onto the sampling set $\Omega \subset \{ 1, 2, \cdots, I_{1}\} \times \cdots \times \{ 1, 2, \cdots, I_{N}\}$, which defines the indices of the known entries of $\boldsymbol {\mathcal {T}}$, \begin{align*} \text {P}_{\Omega }\boldsymbol {\mathcal {X}} := \begin{cases} \boldsymbol {\mathcal {X}}(i_{1}, i_{2}, \cdots,i_{N}) & \text {if } (i_{1}, i_{2}, \cdots,i_{N}) \in \Omega \\ 0 & \text {if } (i_{1}, i_{2}, \cdots,i_{N}) \notin \Omega. \end{cases}\tag{6}\end{align*}

We choose TT decomposition as our structured tensor representation to solve the problem in (5). The choice of the TT format is driven by the requirement to address tensor completion for very high-dimensional ($N \ge 4$) problems and by the property of TT decomposition that it captures the global correlation of the tensor entries [17] due to its well-balanced matricization scheme. The TT format allows us to develop an algorithm that scales linearly in the number of dimensions $N$ for a tensor with mode sizes of order $I$ and polynomially in the TT rank $\mathbf {r}$, yielding a total asymptotic complexity of $\mathcal {O}(N \vert \Omega \vert \mathbf {r}^{2})$. We exploit the Riemannian structure of the TT manifold [47] by viewing (5) as an unconstrained optimization problem on $\mathcal {M}_{r}$, which allows for the use of Riemannian optimization techniques [37], [49].

The Euclidean gradient for problem (5) with respect to (w.r.t.) $\boldsymbol {\mathcal {X}}$ is given by \begin{equation*} \nabla f_{\boldsymbol {\mathcal {X}}} (\boldsymbol {\mathcal {X}}) = \text {P}_{\Omega } \boldsymbol {\mathcal {X}} - \text {P}_{\Omega } \boldsymbol {\mathcal {T}},\tag{7}\end{equation*} where $\nabla f_{\boldsymbol {\mathcal {X}}} (\boldsymbol {\mathcal {X}}) \in \mathbb {R}^{I_{1} \times \cdots \times I_{N}}$, and the Riemannian gradient is obtained by projecting (7) onto the tangent space $T_{\boldsymbol {\mathcal {X}}}\mathcal {M}_{r}$: \begin{align*} \text {grad}f(\boldsymbol {\mathcal {X}}) &= \text {P}_{T_{\boldsymbol {\mathcal {X}}}\mathcal {M}_{r}} \nabla f_{\boldsymbol {\mathcal {X}}} (\boldsymbol {\mathcal {X}}) \\ &= \text {P}_{T_{\boldsymbol {\mathcal {X}}}\mathcal {M}_{r}}(\text {P}_{\Omega } \boldsymbol {\mathcal {X}} - \text {P}_{\Omega } \boldsymbol {\mathcal {T}}).\tag{8}\end{align*}
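For dense arrays, the sampling projection (6), the cost function (5), and the Euclidean gradient (7) reduce to elementwise masking, as in the following minimal NumPy sketch; the tangent-space projection in (8) is manifold-specific and is left to the TT toolbox in our implementation.

```python
import numpy as np

def project_omega(X, omega_mask):
    """P_Omega: keep observed entries, zero out the rest (Eq. (6))."""
    return np.where(omega_mask, X, 0.0)

def completion_cost(X, T, omega_mask):
    """f(X) = 0.5 * ||P_Omega(X) - P_Omega(T)||_F^2, Eq. (5)."""
    residual = project_omega(X - T, omega_mask)
    return 0.5 * np.sum(residual ** 2)

def euclidean_gradient(X, T, omega_mask):
    """Sparse Euclidean gradient of f, Eq. (7)."""
    return project_omega(X - T, omega_mask)
```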

C. A Nonlinear Spectral Conjugate Gradient (SCG) Scheme

To solve the tensor completion problem in (5), we use a nonlinear SCG scheme as our method of interest due to its global superlinear convergence property, low memory requirements, and suitability for solving large-scale unconstrained optimization problems.

We develop a nonlinear SCG procedure that seeks the conjugate gradient direction $\boldsymbol {\mathcal {N}}_{k}$ closest to the direction of the double-parameter self-scaled preconditioned Broyden-Fletcher-Goldfarb-Shanno (BFGS) update. To improve the numerical stability of our method, we use the spectral-scaling secant condition [52] while deriving the search direction. The values of the spectral parameters in the search direction are chosen to minimize the spectral condition number of the BFGS approximation to the inverse Hessian of the objective function. To achieve global convergence for general functions, we use a modified BFGS update (MBFGS) [53], which possesses the global convergence property without the convexity assumption on the objective function. Hence, by incorporating the spectral-scaling preconditioner in the MBFGS update, we simultaneously use second-order information and improve the numerical stability of our method.

Next, we introduce the key ingredients of our nonlinear SCG method in Euclidean space and provide a definition of our nonlinear SCG method in Riemannian space in Section III.

Starting from the initial guess $\boldsymbol {\mathcal {X}}_{0} \in \mathbb {R}^{I_{1} \times \cdots \times I_{N}}$, the nonlinear conjugate gradient method generates a sequence $\boldsymbol {\mathcal {X}}_{k}$ as follows: \begin{equation*} \boldsymbol {\mathcal {X}}_{k+1} = \boldsymbol {\mathcal {X}}_{k} + \alpha _{k} \boldsymbol {\mathcal {N}}_{k},\quad k = 0, 1, \cdots,\tag{9}\end{equation*} where $\alpha _{k} > 0$ is the step length determined by a suitable line search strategy. The spectral search direction $\boldsymbol {\mathcal {N}}_{k}$ [54] is given by \begin{equation*} \boldsymbol {\mathcal {N}}_{k} = -\theta _{k}\boldsymbol {\mathcal {G}}_{k} + \beta _{k}\boldsymbol {\mathcal {N}}_{k-1},\quad \boldsymbol {\mathcal {N}}_{0} = -\theta _{0} \boldsymbol {\mathcal {G}}_{0},\tag{10}\end{equation*} where $\theta _{k}$ and $\beta _{k}$ are the spectral and conjugate gradient parameters, respectively, and $\boldsymbol {\mathcal {G}}_{k}$ denotes the Euclidean gradient $\nabla f(\boldsymbol {\mathcal {X}}_{k})$. We denote $\boldsymbol {\mathcal {S}}_{k-1} = \boldsymbol {\mathcal {X}}_{k} -\boldsymbol {\mathcal {X}}_{k-1} =\alpha _{k-1} \boldsymbol {\mathcal {N}}_{k-1}$ and $\boldsymbol {\mathcal {Y}}_{k-1} = \boldsymbol {\mathcal {G}}_{k} - \boldsymbol {\mathcal {G}}_{k-1}$.

A well-known drawback of spectral conjugate gradient methods with the search direction (10) is that they cannot always satisfy the sufficient descent condition [55], although they perform very well in practice for solving large-scale unconstrained optimization problems. Here, the sufficient descent condition means that there exists a positive constant $c$ such that \begin{equation*} \boldsymbol {\mathcal {G}}_{k}^{T}\boldsymbol {\mathcal {N}}_{k} \leq -c \Vert \boldsymbol {\mathcal {G}}_{k}\Vert ^{2} \quad \forall k \ge 0.\tag{11}\end{equation*} The condition in (11) is crucial in establishing the global convergence of our method. To take advantage of the attractive properties of spectral gradient methods and enforce the generation of search directions that satisfy (11), we require that the step length $\alpha _{k}$ satisfies the improved Wolfe conditions [56], which ensure that condition (11) always holds. To support this requirement, we employ a nonmonotone line search strategy, which produces a step length $\alpha _{k}$ that satisfies the improved Wolfe conditions [56]. In Section II-C1, we derive the search direction (10) in Euclidean space and propose an SCG method based on the revised MBFGS update [53]. In Section III, we generalize the proposed SCG method to the Riemannian space and present our RSCG-TT method using the Riemannian optimization scheme on the TT manifold.

1) Derivation of the Spectral Search Direction

It was shown in [57], [58] that the BFGS variable metric method can be regarded as a conjugate gradient algorithm by incorporating the BFGS updating scheme of the inverse Hessian approximation within the framework of a memoryless quasi-Newton method. We propose a new efficient spectral conjugate gradient algorithm based on the MBFGS update [53]. We require that the search direction (10) is close to a quasi-Newton direction; that is, \begin{equation*} \boldsymbol {\mathcal {N}}^{SCG}_{k} \approx -\boldsymbol {\mathcal {H}}_{k}\boldsymbol {\mathcal {G}}_{k},\tag{12}\end{equation*} where $\boldsymbol {\mathcal {H}}_{k}$ is the symmetric approximation to the inverse Hessian $\nabla ^{2} f(\boldsymbol {\mathcal {X}}_{k})^{-1}$, expressed as a memoryless BFGS update [59]. During each iteration, the quantity $\boldsymbol {\mathcal {H}}_{k} \approx \nabla ^{2} f(\boldsymbol {\mathcal {X}}_{k})^{-1}$ is replaced with a diagonal preconditioner $\theta _{k} \boldsymbol {\mathcal {I}}$ such that the spectral-scaling secant condition [52] is satisfied: \begin{align*} \boldsymbol {\mathcal {H}}_{k}\boldsymbol {\mathcal {Z}}_{k-1} &= \boldsymbol {\mathcal {S}}_{k-1}, \quad \theta _{k} \boldsymbol {\mathcal {I}} \boldsymbol {\mathcal {Z}}_{k-1} = \boldsymbol {\mathcal {S}}_{k-1}, \tag{13}\\ \boldsymbol {\mathcal {Z}}_{k-1} &= \boldsymbol {\mathcal {Y}}_{k-1} + h_{k-1} \Vert \boldsymbol {\mathcal {G}}_{k-1}\Vert ^{q} \boldsymbol {\mathcal {S}}_{k-1}, \tag{14}\\ h_{k-1} &= p + \max \left\{{ \frac {-\boldsymbol {\mathcal {S}}_{k-1}^{T} \boldsymbol {\mathcal {Y}}_{k-1}}{\Vert \boldsymbol {\mathcal {S}}_{k-1}\Vert ^{2}}, 0}\right\} \Vert \boldsymbol {\mathcal {G}}_{k-1}\Vert ^{-q},\tag{15}\end{align*} where $q > 0$, $p > 0$, and the remaining variables are defined after (10). The spectral-scaling secant condition (13) takes into account the second-order information of the objective function, and by extension, it incorporates the proven properties of the MBFGS update (14)–(15) [53]. Furthermore, since we require our line search strategy to fulfill the improved Wolfe conditions [56], the curvature condition holds; that is, $\boldsymbol {\mathcal {S}}_{k-1}^{T} \boldsymbol {\mathcal {Y}}_{k-1} > 0$. As a result, the expression for $\boldsymbol {\mathcal {Z}}_{k-1}$ in (14) simplifies to \begin{equation*} \boldsymbol {\mathcal {Z}}_{k-1} = \boldsymbol {\mathcal {Y}}_{k-1} + p\Vert \boldsymbol {\mathcal {G}}_{k-1} \Vert ^{q} \boldsymbol {\mathcal {S}}_{k-1}.\tag{16}\end{equation*}

To further enhance the numerical stability of our method, we propose to employ a double-parameter variable BFGS metric update [60], which is given by \begin{align*} \boldsymbol {\mathcal {H}}^{\theta, \tau }_{k} &= \theta _{k} \boldsymbol {\mathcal {I}} - \theta _{k} \frac {\boldsymbol {\mathcal {Z}}_{k-1} \boldsymbol {\mathcal {S}}_{k-1}^{T} + \boldsymbol {\mathcal {S}}_{k-1} \boldsymbol {\mathcal {Z}}_{k-1}^{T}}{\boldsymbol {\mathcal {Z}} _{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}} \\ &\quad + \left({1 + \tau _{k} \frac {\boldsymbol {\mathcal {Z}}^{T} _{k-1} \boldsymbol {\mathcal {Z}} _{k-1}}{\boldsymbol {\mathcal {Z}} _{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}}\right)\frac {\boldsymbol {\mathcal {S}}_{k-1} \boldsymbol {\mathcal {S}}_{k-1}^{T}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T}\boldsymbol {\mathcal {S}}_{k-1}},\tag{17}\end{align*} where $\tau _{k} > 0$ is set to satisfy the bound $\tau _{k} = \gamma \theta _{k}$, $1 \le \gamma < 2$. Alternatively, (17) can be written as \begin{equation*} \boldsymbol {\mathcal {H}}^{\theta, \tau }_{k} = \boldsymbol {\mathcal {H}}^{\theta }_{k} + (\tau _{k} - \theta _{k}) \frac {\Vert \boldsymbol {\mathcal {Z}}_{k-1}\Vert ^{2}}{(\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1})^{2}}\boldsymbol {\mathcal {S}}_{k-1}\boldsymbol {\mathcal {S}}_{k-1}^{T}.\tag{18}\end{equation*}

It is easy to see that if $\boldsymbol {\mathcal {H}}^{\theta }_{k}$ is symmetric positive definite, the choice of $\theta _{k} > 0$, $\tau _{k} \ge \theta _{k}$ results in the sufficient descent condition for the search direction (12). The choice of the spectral parameter $\theta _{k} > 0$ is driven by the requirement to minimize the condition number of $\nabla ^{2} f(\boldsymbol {\mathcal {X}}_{k})^{-1}$, which can enhance the numerical stability of the tensor-based computational process. Therefore, the quantity $\theta _{k}\boldsymbol {\mathcal {I}}$ should be an approximation of $\nabla ^{2} f(\boldsymbol {\mathcal {X}}_{k})^{-1}$. We obtain the parameter $\theta _{k}$ as suggested in [61], [62] by minimizing $\Vert \theta _{k} \boldsymbol {\mathcal {Z}}_{k-1} - \boldsymbol {\mathcal {S}}_{k-1} \Vert _{F}$, which yields \begin{equation*} \theta _{k}^{*} = \frac {\boldsymbol {\mathcal {S}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}.\tag{19}\end{equation*}

By its derivation, (19) minimizes the spectral condition number of (17) and thereby improves the numerical stability of our method. The optimal upper bound of the parameters $\theta _{k}$ and $\tau _{k}$ was derived in [60], [63], and the parameters $\theta _{k}$ and $\tau _{k}$ are computed by \begin{equation*} \theta _{k} = \frac {1}{2-\gamma }\frac {\boldsymbol {\mathcal {S}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}, \quad \tau _{k} = \gamma \theta _{k},~1 \le \gamma < 2.\tag{20}\end{equation*}

To ensure the sufficient descent condition for the search direction (12), we restrict $\theta _{k}$ as follows: \begin{equation*} \tilde {\theta }_{k} = \max \left\{{\min \left\{{\theta _{k}, \frac {1}{m_{1}} }\right\}, m_{1}}\right\}, \quad \tilde {\tau }_{k} = \gamma \tilde {\theta }_{k},~1 \le \gamma < 2, \tag{21}\end{equation*} where $m_{1}$ is a small positive constant.

By multiplying (17) by the gradient $\boldsymbol {\mathcal {G}}_{k}$ and after some algebraic transformations, our spectral search direction (12) is given by \begin{equation*} \boldsymbol {\mathcal {N}}^{SCG}_{k} = - \tilde {\theta }_{k} \boldsymbol {\mathcal {G}}_{k} + \beta _{k} \boldsymbol {\mathcal {S}}_{k-1} + \zeta _{k} \boldsymbol {\mathcal {Z}}_{k-1},\tag{22}\end{equation*} where the spectral parameter $\tilde {\theta }_{k}$ is given in (21) and the conjugate gradient parameter $\beta _{k}$ and the coefficient $\zeta _{k}$ are computed as follows: \begin{align*} \beta _{k}({\tilde {\theta }_{k}, \tilde {\tau }_{k}}) &= \tilde {\theta }_{k} \frac {\boldsymbol {\mathcal {G}}_{k}^{T} \boldsymbol {\mathcal {Z}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}} - \left({1 + \tilde {\tau }_{k} \frac {\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {Z}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}}\right)\frac {\boldsymbol {\mathcal {G}}_{k}^{T} \boldsymbol {\mathcal {S}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}}, \tag{23}\\ \zeta _{k} &= \tilde {\theta }_{k} \left({\frac {\boldsymbol {\mathcal {G}}_{k}^{T} \boldsymbol {\mathcal {S}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}} }\right).\tag{24}\end{align*}
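As a concreteness check on (16) and (20)–(24), the following minimal NumPy sketch computes the Euclidean spectral search direction for flattened iterates; the function name is ours, and the parameter defaults mirror the values reported later in Section IV-B.

```python
import numpy as np

def scg_direction(G_k, G_prev, S_prev, p=0.001, q=3, gamma=1.2, m1=1e-8):
    """Euclidean spectral CG direction from Eqs. (16) and (20)-(24).

    Iterates are treated as flattened 1D arrays. Under the improved
    Wolfe conditions the denominator Z^T S is positive.
    """
    Y = G_k - G_prev                                   # gradient difference
    Z = Y + p * np.linalg.norm(G_prev) ** q * S_prev   # Eq. (16)
    zs = Z @ S_prev                                    # Z^T S > 0 under Wolfe
    theta = (S_prev @ S_prev) / ((2.0 - gamma) * zs)   # Eq. (20)
    theta = max(min(theta, 1.0 / m1), m1)              # safeguard, Eq. (21)
    tau = gamma * theta                                # Eq. (20)
    beta = (theta * (G_k @ Z) / zs
            - (1.0 + tau * (Z @ Z) / zs) * (G_k @ S_prev) / zs)  # Eq. (23)
    zeta = theta * (G_k @ S_prev) / zs                 # Eq. (24)
    return -theta * G_k + beta * S_prev + zeta * Z     # Eq. (22)
```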

Here, we discuss the global convergence properties of the SCG method inherited from the MBFGS update. Notably, the MBFGS update formula in (14) has the favorable property that $\boldsymbol {\mathcal {Z}}_{k-1}^{T}\boldsymbol {\mathcal {S}}_{k-1} > 0$ holds for each $k$, which ensures that $\boldsymbol {\mathcal {H}}_{k}$ inherits the positive definiteness of $\boldsymbol {\mathcal {H}}_{k-1}$ [64] as long as we start with an initial positive definite quantity $\boldsymbol {\mathcal {H}}_{0}=\theta _{0}\boldsymbol {\mathcal {I}}$. It was shown in [53, Theorem 2.1] and [64, Theorem 3.3] that if the update for $\boldsymbol {\mathcal {H}}_{k}$ is given by (13)–(15), then the global convergence property holds. Therefore, as previously argued [53], [64], the SCG method possesses the global convergence property. For more details regarding the MBFGS method, we refer the reader to the original articles [53], [64], and [65].

SECTION III.

A Riemannian SCG Method for Tensor Completion

This section is devoted to the algorithmic development for solving the tensor completion problem (5).

A. RSCG-TT Iterative Algorithm

With the concepts introduced in Sections II-B and II-C, we have all the requirements for performing Riemannian optimization on the TT manifold $\mathcal {M}_{r}$. The SCG algorithmic framework developed in Section II-C yields Algorithm 1, which can be seen as an extension of the SCG scheme introduced in Section II-C with the Euclidean gradient replaced by the Riemannian gradient. Here, we generalize the quantities derived in Section II-C to the Riemannian setting. When the Euclidean space $\mathbb {R}^{I_{1} \times \cdots \times I_{N}}$ is replaced by the Riemannian TT manifold $(\mathcal {M}_{r}, \mathfrak {g})$, the components of the spectral search direction (22) belong to different tangent spaces; hence, the Euclidean vector addition on $(\mathcal {M}_{r}, \mathfrak {g})$ makes no sense. To turn the vector addition into a suitable operation, we use the vector transport $\tau _{\boldsymbol {\mathcal {X}}_{k-1} \rightarrow \boldsymbol {\mathcal {X}}_{k}}:=\text {P}_{T_{\boldsymbol {\mathcal {X}}_{k}} \mathcal {M}_{r}}(\boldsymbol {\mathcal {X}}_{k-1})$ (see Table 2). Thus, the Riemannian SCG update rule is as follows: \begin{align*} \xi _{k} &= \text {grad} f(\boldsymbol {\mathcal {X}}_{k}), \quad \text {defined in } (8), \tag{25}\\ \boldsymbol {\mathcal {S}}_{k-1} &= \tau _{\boldsymbol {\mathcal {X}}_{k-1} \rightarrow \boldsymbol {\mathcal {X}}_{k}}(\alpha _{k-1}\boldsymbol {\mathcal {N}}_{k-1}), \tag{26}\\ \xi _{k-1}^{\text {transp}} &= \tau _{\boldsymbol {\mathcal {X}}_{k-1} \rightarrow \boldsymbol {\mathcal {X}}_{k}}(\xi _{k-1}), \tag{27}\\ \boldsymbol {\mathcal {Y}}_{k-1} &= \xi _{k} - \xi _{k-1}^{\text {transp}}, \tag{28}\\ \boldsymbol {\mathcal {Z}}_{k-1} &= \boldsymbol {\mathcal {Y}}_{k-1} + p\Vert \xi _{k-1}^{\text {transp}}\Vert ^{q} \boldsymbol {\mathcal {S}}_{k-1}, \tag{29}\\ \beta _{k}({\tilde {\theta }_{k}, \tilde {\tau }_{k}}) &= \tilde {\theta }_{k} \frac {\xi _{k}^{T} \boldsymbol {\mathcal {Z}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}} _{k-1}} - \left({1 + \tilde {\tau }_{k} \frac {\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {Z}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}} _{k-1}}}\right)\frac {\xi _{k}^{T} \boldsymbol {\mathcal {S}} _{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}} _{k-1}}, \\ \zeta _{k} &= \tilde {\theta }_{k} \left({\frac {\xi _{k}^{T} \boldsymbol {\mathcal {S}}_{k-1}}{\boldsymbol {\mathcal {Z}}_{k-1}^{T} \boldsymbol {\mathcal {S}}_{k-1}} }\right), \\ \alpha _{k}^{0} &= \arg \min _{\alpha } f (\boldsymbol {\mathcal {X}}_{k} + \alpha \boldsymbol {\mathcal {N}}_{k}), \quad \text {computed by } (37), \tag{30}\\ \alpha _{k} &= \alpha _{k}^{*}, \quad \text {satisfying } (38)\text{ and }(39),\tag{31}\\ \boldsymbol {\mathcal {N}}^{SCG}_{k} &= - \tilde {\theta }_{k} \xi _{k} + \beta _{k} \boldsymbol {\mathcal {S}}_{k-1} + \zeta _{k} \boldsymbol {\mathcal {Z}}_{k-1}, \tag{32}\\ \boldsymbol {\mathcal {X}} _{k+1} &= R_{\boldsymbol {\mathcal {X}} _{k}}(\alpha _{k} \boldsymbol {\mathcal {N}}_{k}). \tag{33}\end{align*}

Algorithm 1 RSCG-TT Algorithm

Require: The observed tensor $\boldsymbol {\mathcal {T}} \in \mathcal {M}_{r}$, sampling set $\Omega $, multilinear TT rank $\mathbf {r}$, prescribed tolerance $\epsilon $, and positive parameters $p$, $m_{1}$, $\gamma $, $q$.

Ensure: The completed tensor $\boldsymbol {\mathcal {X}}$ is an approximation of $\boldsymbol {\mathcal {T}}$.

1: Initialize $\boldsymbol {\mathcal {X}}_{0} \in \mathcal {M}_{r}$ randomly
2: Compute the initial gradient: $\xi _{0}\gets \text {grad}f (\boldsymbol {\mathcal {X}}_{0})$
3: Compute the initial search direction as the steepest descent: $\theta _{0} \gets 1$, $\boldsymbol {\mathcal {N}}_{0} \gets -\theta _{0} \xi _{0}$
4: Compute the initial step size by the exact minimizer (37): $\alpha _{0} \gets \arg \min _{\alpha } f (\boldsymbol {\mathcal {X}}_{0} + \alpha \boldsymbol {\mathcal {N}}_{0})$
5: Obtain the next iterate by retraction (33): $\boldsymbol {\mathcal {X}}_{1} \gets R_{\boldsymbol {\mathcal {X}}_{0}}(\alpha _{0} \boldsymbol {\mathcal {N}}_{0})$
6: while $\frac {\Vert \text {P}_{\Omega }\boldsymbol {\mathcal {X}}_{k} - \text {P}_{\Omega }\boldsymbol {\mathcal {T}}\Vert _{F}^{2}}{\Vert \text {P}_{\Omega }\boldsymbol {\mathcal {T}} \Vert _{F}^{2}} > \epsilon $ do
7:   Compute the gradient (25): $\xi _{k} \gets \text {grad}f (\boldsymbol {\mathcal {X}}_{k})$
8:   Compute the spectral search direction (32): $\boldsymbol {\mathcal {N}}^{SCG}_{k} \gets - \tilde {\theta }_{k} \xi _{k} + \beta _{k} \boldsymbol {\mathcal {S}}_{k-1} + \zeta _{k} \boldsymbol {\mathcal {Z}}_{k-1}$
9:   Compute the step length by updating rules (30) and (31): $\alpha _{k} \gets \alpha _{k}^{*}$
10:  Compute the next iterate by retraction (33): $\boldsymbol {\mathcal {X}}_{k+1} \gets R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha _{k} \boldsymbol {\mathcal {N}}_{k})$
11: end while

The resulting optimization scheme is given in Algorithm 1. We call it the Riemannian SCG algorithm for tensor completion via TT. Algorithm 1 represents a geometrical version of the nonlinear SCG algorithm proposed in Section II-C. The major difference is that we have to ensure that every point $\boldsymbol {\mathcal {X}}_{k}$ in the optimization sequence stays on the Riemannian manifold $\mathcal {M}_{r}$ . To move in the direction of the steepest ascent of $f(\boldsymbol {\mathcal {X}}_{k})$ , we use the Riemannian gradient $\xi _{k} \in T_{\boldsymbol {\mathcal {X}}_{k}}\mathcal {M}_{r}$ , which points in the direction of the greatest increase within tangent space $T_{\boldsymbol {\mathcal {X}}_{k}}\mathcal {M}_{r}$ . The geometrical version of the algorithm is visualized in Fig. 2a and consists of the following core concepts.

FIGURE 2. SCG on a Riemannian TT manifold. (a) Riemannian SCG iteration. (b) Minimizing along the search direction $\boldsymbol {\mathcal {N}}$ on $\mathcal {M}_{r}$. The step length $\alpha $ is obtained by minimizing the tangent line passing through $\boldsymbol {\mathcal {X}} \in \mathcal {M}_{r}$ with direction $\boldsymbol {\mathcal {N}}_{k} \in T_{\boldsymbol {\mathcal {X}}_{k}}\mathcal {M}_{r}$. We minimize along the search curve to obtain an optimal estimate of the step length $\alpha $.

In the first step, we initialize tensor $\boldsymbol {\mathcal {X}}_{0}$ with random entries and move in the direction of the negative Riemannian gradient $\boldsymbol {\mathcal {N}}_{0}= - \theta _{0} \xi _{0}$. This produces a new iterate that is not an element of $\mathcal {M}_{r}$. We bring the new iterate $\boldsymbol {\mathcal {X}}_{0} + \alpha _{0}\boldsymbol {\mathcal {N}}_{0}$ back to the manifold $\mathcal {M}_{r}$ using retraction (33). The spectral search direction $\boldsymbol {\mathcal {N}}_{k} \in T_{\boldsymbol {\mathcal {X}}_{k}}\mathcal {M}_{r}$ is computed by using the updating rule in (32). It requires taking a linear combination of the Riemannian gradient $\xi _{k}$ and the quantities $\boldsymbol {\mathcal {S}}_{k-1}$ and $\boldsymbol {\mathcal {Z}}_{k-1}$, which belong to a different tangent space $T_{\boldsymbol {\mathcal {X}}_{k-1}}\mathcal {M}_{r}$. We map $\boldsymbol {\mathcal {S}}_{k-1}$ and $\boldsymbol {\mathcal {Z}}_{k-1}$ to the current tangent space with the vector transport $\tau _{\boldsymbol {\mathcal {X}}_{k-1} \rightarrow \boldsymbol {\mathcal {X}}_{k}}$. To generate the next iterate in a direction of sufficient descent, we compute the step length $\alpha _{k}$ using a nonmonotone line search strategy based on the improved Wolfe conditions proposed in [56]. Next, with the spectral conjugate direction $\boldsymbol {\mathcal {N}}_{k}$, we perform a step along the search curve $\gamma $ and obtain the next iterate by retraction (33) so that $\boldsymbol {\mathcal {X}}_{k+1} \in \mathcal {M}_{r}$. Evaluating the cost function (5) reduces to computing the entries of the sparse tensor $\text {P}_{\Omega }\boldsymbol {\mathcal {X}}$, since the observed tensor $\text {P}_{\Omega }\boldsymbol {\mathcal {T}}$ is provided as input. The convergence condition is reached when the relative error between the observed tensor $\text {P}_{\Omega }\boldsymbol {\mathcal {T}}$ and its approximation $\text {P}_{\Omega }\boldsymbol {\mathcal {X}}$ is smaller than a prescribed tolerance $\epsilon $. We find the step length $\alpha $ that minimizes the cost function along the geodesic search curve $\gamma (\alpha):= R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha \boldsymbol {\mathcal {N}}_{k})$, where $R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha \boldsymbol {\mathcal {N}}_{k})$ is the retraction of $\alpha \boldsymbol {\mathcal {N}}_{k}$ at point $\boldsymbol {\mathcal {X}}_{k}$. The cost function for the step length $\alpha $ is given as \begin{equation*} \phi (\alpha):= f(\gamma (\alpha)):= f(R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha \boldsymbol {\mathcal {N}}_{k})). \tag{34}\end{equation*}

We determine the step length $\alpha _{*}$ such that the value $\phi (\alpha _{*})$ provides a sufficient reduction in cost function (5) and generates a descent direction such that $\boldsymbol {\mathcal {N}}_{k}^{T} \xi _{k} < 0$. The one-dimensional minimizer for (34), given in [37], is \begin{equation*} \alpha _{k}^{*}:= \mathop {\mathrm {argmin}}_{\alpha > 0} \phi (\alpha):= \mathop {\mathrm {argmin}}_{\alpha > 0} f(R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha \boldsymbol {\mathcal {N}}_{k})). \tag{35}\end{equation*}

We minimize along the search curve to obtain the optimal value of the step length; the concept is illustrated in Fig. 2b. Our tensor completion problem (5) is a smooth quadratic function. Therefore, a good initial guess for the step length can be obtained by exact minimization on the tangent space, without retraction: \begin{equation*} \min _{\alpha } f (\boldsymbol {\mathcal {X}} + \alpha \boldsymbol {\mathcal {N}}) = \frac {1}{2} \min _{\alpha }\Vert \text {P}_{\Omega }\boldsymbol {\mathcal {X}} + \alpha \text {P}_{\Omega }\boldsymbol {\mathcal {N}} - \text {P}_{\Omega }\boldsymbol {\mathcal {T}}\Vert _{F}^{2}.\tag{36}\end{equation*}

The closed-form solution of (36) is derived by setting the $\alpha $-derivative to 0 and is given by \begin{equation*} \alpha _{*} = -\frac {\langle \text {P}_{\Omega }\boldsymbol {\mathcal {N}}, \text {P}_{\Omega }(\boldsymbol {\mathcal {X}} - \boldsymbol {\mathcal {T}}) \rangle }{\langle \text {P}_{\Omega }{\boldsymbol {\mathcal {N}}}, \text {P}_{\Omega }\boldsymbol {\mathcal {N}}\rangle }. \tag{37}\end{equation*}
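A minimal NumPy sketch of this closed-form step length, under the same masking convention as the earlier snippets:

```python
import numpy as np

def exact_initial_step(N_dir, X, T, omega_mask):
    """Closed-form initial step length alpha_* from Eq. (37).

    Minimizes 0.5 * ||P_Omega(X + alpha * N_dir - T)||_F^2 over alpha;
    the leading minus sign comes from setting the derivative to zero.
    """
    pn = np.where(omega_mask, N_dir, 0.0)        # P_Omega(N)
    pr = np.where(omega_mask, X - T, 0.0)        # P_Omega(X - T)
    return -np.sum(pn * pr) / np.sum(pn * pn)
```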

To guarantee the global convergence of our RSCG-TT method, we use the nonmonotone line search strategy based on the improved Wolfe conditions [56] combined with a curvature condition as follows: \begin{align*} \phi _{k}(\alpha) &\leq \phi _{k}(0) + \min \{\epsilon _{c} \vert \phi _{k}(0)\vert, \delta \alpha \phi '_{k}(0) + l_{k} \}, \tag{38}\\ \phi '_{k}(\alpha) &\ge \sigma \phi '_{k}(0), \tag{39}\\ l_{k} &= \frac {1}{k^{2}}, \tag{40}\end{align*} where $\epsilon _{c} > 0$, $0 < \delta < \sigma < 1$, and $l_{k}$ is a positive term chosen so that the sequence $\{l_{k}\}$ is summable, i.e., $\sum _{k \ge 1} l_{k} < +\infty $. As noted in [56], the positive term in (40) allows a slight increase in the function value, which helps to avoid the numerical drawback of the standard Wolfe conditions [66] in the presence of numerical round-off errors or when the tolerance error is very low [56].
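The following sketch checks conditions (38)–(40) for a candidate step length; phi and dphi are assumed to evaluate (34) and its derivative along the search curve, and eps_c is an illustrative default, while delta and sigma follow the values in Section IV-B.

```python
def satisfies_improved_wolfe(phi, dphi, alpha, k,
                             eps_c=1e-6, delta=1e-4, sigma=0.9):
    """Check the nonmonotone improved Wolfe conditions (38)-(40).

    phi(a) evaluates the line-search function (34) at step a, and
    dphi(a) evaluates its derivative; k >= 1 is the iteration index.
    """
    l_k = 1.0 / k ** 2  # summable relaxation term, Eq. (40)
    decrease_ok = phi(alpha) <= phi(0.0) + min(
        eps_c * abs(phi(0.0)), delta * alpha * dphi(0.0) + l_k)
    curvature_ok = dphi(alpha) >= sigma * dphi(0.0)  # Eq. (39)
    return decrease_ok and curvature_ok
```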

B. Complexity Analysis of the RSCG-TT Method

We analyze the time complexity of one iteration of RSCG-TT in Algorithm 1. The computational complexity of RSCG-TT is determined by the following computations: the Riemannian gradient $\mathrm {grad} f(\boldsymbol {\mathcal {X}}_{k})$ in (8), the search direction $\boldsymbol {\mathcal {N}}_{k}$ in (32), the step length $\alpha _{k}$ in (37), and the next iterate $\boldsymbol {\mathcal {X}}_{k+1}$ obtained by the retraction $R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha _{k} \boldsymbol {\mathcal {N}}_{k})$ in (33). The computational complexities of TT decomposition and the Riemannian operators are given in [44] and [67], respectively. For simplicity, we assume $I=\max \{I_{1}, I_{2}, \cdots, I_{N}\}$ and the upper bound on the TT rank is $R_{\mathrm {max}}=\max \{R_{0}, R_{1}, \cdots, R_{N}\}$. The Riemannian gradient $\mathrm {grad} f(\boldsymbol {\mathcal {X}}_{k})$ is computed as a projection of the sparse Euclidean gradient in (7) onto the tangent space at $\boldsymbol {\mathcal {X}}$, which results in a cost of $\mathcal {O}(NIR_{\mathrm {max}}^{3} + N \vert \Omega \vert R_{\mathrm {max}}^{2})$. The search direction $\boldsymbol {\mathcal {N}}_{k}$ and the next iterate $\boldsymbol {\mathcal {X}}_{k+1}$ can be calculated in $\mathcal {O}(NIR_{\mathrm {max}}^{3})$, and the cost of the step length $\alpha _{k}$ is $\mathcal {O}(\vert \Omega \vert (N-1) R_{\mathrm {max}}^{2})$. Hence, for one iteration of RSCG-TT, the total computational cost is dominated by the computation of the Riemannian gradient and is given by \begin{equation*} \mathcal {O}(NIR_{\mathrm {max}}^{3} + N \vert \Omega \vert R_{\mathrm {max}}^{2}).\tag{41}\end{equation*}

SECTION IV.

Numerical Experiments

We assess the numerical performance of the RSCG-TT method with real fMRI data from the Center for Biomedical Research Excellence (COBRE), which is available from the COllaborative Informatics and Neuroimaging Suite (COINS) data exchange repository [68]. We compare RSCG-TT with two other tensor completion stochastic gradient descent (SGD) methods, namely, Tucker-SGD and CPD-SGD.

A. Image Acquisition and Processing

We employed data from the COBRE study of the Mind Research Network (MRN). The rs-fMRI data consist of 149 $\text {T2}^{*}$-weighted functional volumes per subject, acquired through a gradient-echo EPI sequence: $\text {TR} = 2 \,\,\text {s}$, $\text {TE}=29 \,\,\text {ms}$, flip angle $=75^\circ $, number of axial slices = 33 in sequential ascending order, slice thickness = 3.5 mm, slice gap = 1.05 mm, field of view = 240 mm, matrix size $=64 \times 64$, and voxel size = 3.75 mm $\times$ 3.75 mm $\times$ 4.55 mm. To remove the T1 equilibration effects, the first four volumes were discarded. Slice-timing correction and image realignment were applied. The data were spatially normalized to the standard Montreal Neurological Institute space, resampled to 3 mm $\times$ 3 mm $\times$ 3 mm voxels, and smoothed using a Gaussian kernel with a full width at half maximum of 5 mm.

B. Execution Details

All experiments were performed with Python in TensorFlow on a Linux workstation with 4 quad-core Intel Xeon 3.1 GHz processors and 16 GB RAM. For all experiments, the input is the tensor $\text {P}_{\Omega }\boldsymbol {\mathcal {T}}$ projected onto the sampling set $\Omega $, which reflects the positions of the observed entries, where $\boldsymbol {\mathcal {T}}$ is the fully observed true tensor and $*$ denotes the Hadamard (element-wise) product. The completed tensor $\hat {\boldsymbol {\mathcal {X}}}$ is computed as \begin{equation*} \hat {\boldsymbol {\mathcal {X}}} = (\boldsymbol {\mathcal {I}} - \text {P}_{\Omega }\boldsymbol {\mathcal {I}})* \boldsymbol {\mathcal {X}} + \text {P}_{\Omega }\boldsymbol {\mathcal {T}},\tag{42}\end{equation*} where $\boldsymbol {\mathcal {I}}$ is the tensor of all ones of the same size as $\boldsymbol {\mathcal {T}}$, and $\boldsymbol {\mathcal {X}}$ is the output tensor estimated by each algorithm. RSCG-TT (Algorithm 1) was implemented using TT decomposition in the Tensor Train for TensorFlow (T3F) [67] toolbox with its built-in operations for manipulating tensors in TT format as well as its Riemannian operations. To find the step length $\alpha $, we used the nonmonotone Wolfe line-search algorithm in [56].
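A one-line sketch of the assembly rule (42) with a boolean observation mask, equivalent to the Hadamard-product form above:

```python
import numpy as np

def assemble_completed_tensor(X_est, T_obs, omega_mask):
    """Combine observed entries with estimated ones, Eq. (42).

    Observed entries are copied from T_obs; the entries outside the
    sampling set are filled with the algorithm output X_est.
    """
    return np.where(omega_mask, T_obs, X_est)
```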

The following parameters were used in the implementation of the proposed method:

$\epsilon =10^{-8}$, $p=0.001$, $m_{1}=10^{-8}$, $\gamma =1.2$, $q=3$, $\delta =0.0001$, and $\sigma =0.9$. Here, the values for $p, m_{1}, \gamma, q$ are the values recommended in [60], and the values for $\delta, \sigma $ were selected based on the typical values used in the nonmonotone Wolfe line-search algorithms suggested in [55], [56], [69]. We further fine-tuned the selection of the optimization hyperparameters based on the recommended ranges in the literature using a grid search with cross validation on the validation set $\Omega _{\Gamma }$. We created a training set based on the sampling set $\Omega $ by randomly selecting the indices $i_{k}$ for each tensor dimension $I_{k}$ from a uniform distribution on $\{1, \cdots, I_{k}\}$. We generated a test set $\Omega _{\mathrm {T}}$ and a validation set $\Omega _{\Gamma }$ by random sampling from the missing value set $\overline {\Omega }$, where 80% of the set was allocated to the test set and 20% to the validation set. The stopping criteria for RSCG-TT were reaching a maximum of 500 iterations or achieving the convergence tolerance $\epsilon \le 10^{-8}$.

C. Experimental Design

Our goal is to study the performance of the proposed method in terms of its ability to recover missing voxels with different missing patterns and proportions in real fMRI data. We conduct hybrid experiments in different tensor dimensions and study the effectiveness of exploiting the multidimensional structure through 3D and 4D representations of fMRI data in terms of reconstruction errors. In addition, we compare the numerical performance and convergence rate of the proposed algorithm with the state-of-the-art SGD optimization [70], [71] for tensor completion in the form of Tucker and CPD decompositions. The experiments were carried out considering the percentage and pattern of missing values and tensor dimensionality. The design conditions are summarized in Table 3. We executed 50 runs for each setup and reported the averages.

TABLE 3. fMRI Data Completion Design Factors.

D. Application to Real fMRI Data

As we stated in Section II-B, tensor completion performs best on low-rank datasets. We exploit the low-rank property of fMRI data [19] and cast fMRI tensor completion as a specific case of the tensor completion problem on the TT manifold (5). TT decomposition applied to fMRI data captures the global correlation between the spatial and temporal modes and thereby allows the optimal estimation of missing voxels. We evaluate our method on two types of missing value patterns: a simple random missing value pattern and a more realistic structural missing value pattern based on 3D spatial ellipsoids. We briefly describe the background and the experimental setup of these test cases in Sections IV-E–IV-G.

E. Magnetic Susceptibilities in fMRI Data

FMRI is inherently susceptible to geometric distortions caused by magnetic field inhomogeneity [8] given its relatively long total readout times; this distortion is often most significant in the phase encoding direction. The primary source of field inhomogeneity is differences in the susceptibilities of neighboring tissues [72], with significant spatial variations at tissue borders/edges. The areas of brain MRI most vulnerable to geometric distortion are the regions where air-filled sinuses border bone or tissue: the frontal lobe proximal to the paranasal sinuses and the temporal lobe proximal to the mastoid air cells and ear canals [72], [73]. These spatial variations produce rapid changes in the magnetic field gradients. Different techniques for correcting fMRI image distortion have been reported [3], [72], [74]–[77]. Most of these approaches exclude brain regions with significant variations in the magnetic field gradients in the phase encoding or slice selection direction based on a field quality map, which is computed from the magnitude or phase derivative variance [3], [72], [74], [77]. Field quality maps are robust in identifying noisy areas in fMRI images and are used to address geometric distortions in fMRI studies; the acquired quality maps are converted into binary quality masks to threshold the brain images. One such method is the quality map phase denoising (QPMD) method for fMRI data [3]; this method identifies noisy regions based on the partial derivatives $\nabla _{x,y,z}$ of the phase image, where high gradient values correspond to low signal quality, and the corresponding voxels are therefore excluded from the analysis. This results in MNAR missing voxels in areas susceptible to gradient changes, including the sinuses and the ventral/medial prefrontal areas. In this study, we propose to address this issue through 4D fMRI tensor completion, where the missing voxels are represented by a 3D brain mask.
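Although the exact QPMD implementation is described in [3], the general idea of quality-map masking can be sketched schematically: compute a gradient-based quality proxy on the phase volume and exclude voxels above a threshold. The sketch below is illustrative only; the gradient operator and the threshold choice are our assumptions, not the published method.

```python
# A schematic sketch of quality-map-based masking in the spirit of [3];
# the gradient proxy and threshold here are illustrative assumptions.
import numpy as np

def quality_mask(phase_volume, threshold):
    """Exclude voxels whose phase-gradient magnitude exceeds `threshold`.

    phase_volume : 3D array of phase values.
    Returns a binary mask (True = keep voxel).
    """
    gx, gy, gz = np.gradient(phase_volume)        # partial derivatives of the phase
    grad_mag = np.sqrt(gx**2 + gy**2 + gz**2)     # field quality proxy
    return grad_mag <= threshold                  # high gradient => voxel excluded
```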

F. Random Missing Value (RMV) Pattern

This problem is identical to the one described in [7], [8] and corresponds to the MAR pattern [6] observed in fMRI studies. To simulate an RMV pattern, we generated a sampling set $\Omega$ w.r.t. different missing value rates ($MR$) defined as \begin{equation*} MR_{RMV} = \frac {m}{\prod _{k=1}^{N}I_{k}},\tag{43}\end{equation*} where $m$ is the number of missing entries. The indices of the observed entries in the sampling set $\Omega$ are chosen by randomly sampling each dimension index from a uniform distribution on $\{1, \cdots, I_{n}\}$.
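A direct way to simulate the RMV pattern of (43) is to draw the $m$ missing index tuples uniformly per dimension, as described above. The sketch below illustrates this; the names are ours, and because sampling is with replacement, the achieved rate can fall slightly below the target.

```python
# Sketch of RMV mask generation per (43); names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rmv_mask(shape, mr):
    """Binary mask with a fraction `mr` of entries marked missing at random."""
    m = int(mr * np.prod(shape))                   # target number of missing entries
    idx = tuple(rng.integers(0, s, size=m) for s in shape)
    mask = np.ones(shape, dtype=bool)              # True = observed
    mask[idx] = False                              # mark sampled entries as missing
    return mask

mask = rmv_mask((53, 63, 46, 144), mr=0.5)
print("achieved MR:", 1 - mask.mean())             # close to 0.5 (duplicates possible)
```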

G. Structural Missing Values (SMV) Pattern

This test case is equivalent to the MNAR fMRI pattern described in Section IV-E. To evaluate the proposed algorithm on a realistic MNAR test case, we generate analytical ellipsoids with missing voxels that satisfy the 3D ellipsoid equation \begin{equation*} \frac {(x-x_{0})^{2}}{r_{x}^{2}} + \frac {(y-y_{0})^{2}}{r_{y}^{2}} + \frac {(z-z_{0})^{2}}{r_{z}^{2}} = 1,\tag{44}\end{equation*} where $(x_{0}, y_{0}, z_{0})$ is the origin of the ellipsoid and $(r_{x}, r_{y}, r_{z})$ are the radii in the direction of each spatial dimension. A spatial location $(x_{0}, y_{0}, z_{0})$ was manually selected in a brain volume, and the ellipsoid (44) with missing voxels was injected into this region. Furthermore, to simulate the case where several 3D fMRI brain volumes are impacted by the spatial ellipsoid (44), we place the spatial ellipsoid at multiple randomly selected timepoints in the 4D fMRI brain scan. The number of affected timepoints is selected to simulate the percentage of missing values in the temporal sense. In addition, we study the effect of the size of the spatial ellipsoid to evaluate the estimation capabilities of the proposed method under conditions close to real-world scenarios.

Next, we define the quantities that we use to measure the performance of the algorithms completing data affected by SMV patterns as follows:\begin{align*} V_{e^{3D}}&=(4/3) \pi r_{x} r_{y} r_{z} \quad (\text {voxels}^{3}), \tag{45}\\ V_{e^{4D}}&=(4/3) \pi r_{x} r_{y} r_{z} \times T \quad (\text {voxels}^{3} \times \text {time}),\tag{46}\\ V_{3D}&=X \times Y \times Z \quad (\text {voxels}^{3}),\tag{47}\end{align*} where $V_{e^{3D}}$ and $V_{e^{4D}}$ are the 3D spatial volume and 4D hypervolume of the ellipsoid (44), $V_{3D}$ is the total volume of the fMRI brain scan, $X, Y, Z$ are the numbers of voxels in each spatial dimension, and $T$ is the number of timepoints comprising the fMRI brain scan. The metadata of the generated ellipsoids are given in Table 4. The missing value rate metrics for the SMV pattern are given as \begin{align*} MR_{V_{e^{3D}}}&=\frac {V_{e^{3D}}}{V_{3D}}, \tag{48a}\\ MR_{T}&=\frac {M_{T}}{T}, \tag{48b}\end{align*} where $MR_{V_{e^{3D}}}$ is the spatial missing value rate, measured as the ratio of the spatial volume of the ellipsoid with missing voxels in (45) to the total spatial volume in (47), and $MR_{T}$ is the temporal missing value rate across the entire 4D fMRI brain scan, with $M_{T}$ being the number of timepoints affected by the artifact in (44). In the SMV experiment, we use the discrete missing ratio set $MR_{T}(\%) \in \{5, 10, 15, 20\}\%$, and for each value of $MR_{T}$, we vary the volume of the ellipsoid artifact according to the values in Table 4 and the tensor dimension levels in Table 3. Conceptually, the 4D fMRI tensor completion process for the SMV pattern is visualized in Fig. 3. The 4D fMRI hypervolume is corrupted by adding spatial ellipsoids with missing voxels into 3D whole-brain volumes at randomly selected timeframes. We feed the entire 4D fMRI tensor with corrupted 3D whole-brain volumes at timepoints $T_{i}$ into the proposed algorithm, which completes the 4D tensor simultaneously for all corrupted 3D spatial brain volumes.
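The SMV corruption can be reproduced by evaluating the ellipsoid inequality in (44) on a voxel grid and clearing the enclosed voxels at $M_{T}$ randomly chosen timepoints. The following sketch illustrates this construction; the function names and the example center/radii are our own, not values from Table 4.

```python
# Sketch of SMV corruption per (44): inject an ellipsoid of missing voxels
# at randomly chosen timepoints. Names and example values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def ellipsoid_mask(shape_xyz, center, radii):
    """Boolean 3D mask, True inside the ellipsoid defined by (44)."""
    x, y, z = np.ogrid[:shape_xyz[0], :shape_xyz[1], :shape_xyz[2]]
    x0, y0, z0 = center
    rx, ry, rz = radii
    return ((x - x0)**2 / rx**2 + (y - y0)**2 / ry**2
            + (z - z0)**2 / rz**2) <= 1.0

def inject_smv(omega4d, center, radii, mr_t):
    """Mark ellipsoid voxels as missing at a fraction mr_t of timepoints."""
    X, Y, Z, T = omega4d.shape
    ell = ellipsoid_mask((X, Y, Z), center, radii)
    m_t = int(mr_t * T)                            # M_T affected timepoints, (48b)
    for t in rng.choice(T, size=m_t, replace=False):
        omega4d[..., t][ell] = False               # these voxels become missing
    return omega4d

omega = np.ones((53, 63, 46, 144), dtype=bool)     # all observed initially
omega = inject_smv(omega, center=(26, 31, 23), radii=(4, 5, 3), mr_t=0.15)
```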

TABLE 4. Ellipsoid Volumes
FIGURE 3. 4D fMRI tensor completion of missing voxels corrupted by the SMV pattern. (a) The 4D fMRI hypervolume (3D voxels $\times$ T) is corrupted by spatial ellipsoids with missing voxels embedded in 3D brain volumes. (b) Completed 4D fMRI hypervolume (3D voxels $\times$ T).

H. Rank Estimation

The tensor completion problem in (5) requires that the rank of the high-dimensional dataset be known in advance. Specifically, for TT decomposition, we have to determine the multilinear TT rank in (2). The TT ranks $R_{n}$ have an upper bound determined by a CPD approximation with tensor rank $R$ and accuracy $\epsilon _{\mathrm {TT}}$, such that $R_{n} \le R$ and the TT decomposition can be computed with a relative accuracy of $\sqrt {N-1}\epsilon _{\mathrm {TT}}$ [44], where $N$ is the tensor dimension. We estimate the CPD rank $R$ and the multilinear Tucker rank with the aid of automatic relevance determination (ARD) [78] and the L-curve algorithm [79], [80]. The ARD algorithm is stopped when the relative change in the negative log-likelihood falls below $10^{-8}$, which results in an explained variance in the range of 0.9998–0.9999. The ranks computed by the ARD algorithm are $R_{\mathrm {CPD}_{\mathrm {4D}}} = 81$ and $R_{\mathrm {CPD}_{\mathrm {3D}}} = 46$ for CPD decomposition and $\mathbf {r}_{\mathrm {Tucker}_{\mathrm {4D}}} = (53, 63, 46, 91)$ and $\mathbf {r}_{\mathrm {Tucker}_{\mathrm {3D}}} = (1630, 43, 144)$ for Tucker decomposition.

We use the computed CPD rank $R$ as the maximal rank constraint $R_{\mathrm {max}}$ and as the input to the TT-SVD algorithm [44], and the values of the TT rank are determined by a quasi-optimal TT approximation. The multilinear TT ranks $\mathbf {r}_{\mathrm {4D}}$ and $\mathbf {r}_{\mathrm {3D}}$ are estimated by the TT-rounding algorithm with the imposed maximal rank constraints $R_{{\mathrm {4D}}_{\mathrm {max}}} = R_{\mathrm {CPD}_{\mathrm {4D}}} = 81$, $R_{3D_{\mathrm {max}}} = R_{\mathrm {CPD}_{\mathrm {3D}}} = 46$, and $R_{2D_{\mathrm {max}}} = 144$ for the 4D, 3D, and 2D tensor representations, respectively. The maximal rank for the 2D representation is selected as the column rank of the reshaped 2D matrix of the input data tensor. The following TT rank values were estimated for each tensor dimensionality: $\mathbf {r}_{\mathrm {4D}_{\mathrm {TT}}} = (R_{0}, R_{1}, R_{2}, R_{3}, R_{4}) = (1, 81, 81, 81, 1)$, $\mathbf {r}_{\mathrm {3D}_{\mathrm {TT}}} = (R_{0}, R_{1}, R_{2}, R_{3}) = (1, 46, 46, 1)$, and $\mathbf {r}_{\mathrm {2D}_{\mathrm {TT}}} = (R_{0}, R_{1}, R_{2}) = (1, 144, 1)$. The ranks $R_{0}$ and $R_{N}$ are restricted to one by the definition of TT decomposition, whereas each rank $R_{n}$ is selected as the minimum of the column rank of the unfolding matrix $\mathbf {X}_{< n >} \in \mathbb {R}^{I_{1} I_{2} \cdots I_{n} \times I_{n+1} \cdots I_{N}}$ [44] and the maximal rank $R_{\mathrm {max}}$.
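To make the TT-rank truncation concrete, the sketch below implements a plain TT-SVD in the spirit of [44], capping each unfolding rank at $R_{\mathrm{max}}$. It is an illustrative reimplementation, not the T3F routine used in our experiments.

```python
# A compact truncated TT-SVD sketch in the spirit of [44]; illustrative only.
import numpy as np

def tt_svd(tensor, r_max, tol=1e-12):
    """Return TT cores and TT ranks (with R_0 = R_N = 1), capped at r_max."""
    shape = tensor.shape
    cores, ranks = [], [1]
    C = np.asarray(tensor, dtype=float)
    r_prev = 1
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)       # k-th sequential unfolding
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = int(min(r_max, np.sum(S > tol)))       # cap the TT rank at R_max
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        ranks.append(r)
        C = S[:r, None] * Vt[:r]                   # carry the remainder forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    ranks.append(1)
    return cores, ranks

# Small illustrative call; the real fMRI tensor is 53 x 63 x 46 x 144 with R_max = 81.
cores, ranks = tt_svd(np.random.rand(12, 10, 8, 14), r_max=5)
print(ranks)                                       # e.g. [1, 5, 5, 5, 1]
```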

I. Quantitative Evaluation

We compare RSCG-TT with two other methods: Tucker-SGD and CPD-SGD. The compared algorithms employ the SGD optimization method called adaptive moment estimation (Adam) [70], which is widely used for its strong empirical performance. Adam's update is given as follows:\begin{equation*} x_{k+1} = x_{k} - \frac {d}{\sqrt {v_{k}(f_{k}(x_{k}), \beta _{2}, v_{k-1})}+\epsilon }\, m_{k}(f_{k}(x_{k}), \beta _{1}, m_{k-1}),\tag{49}\end{equation*} where $d$, $\epsilon$, $\beta _{1}$, and $\beta _{2}$ are hyperparameters and $m_{k}$ and $v_{k}$ are the first- and second-moment estimates of the gradient $\nabla f_{k}(x_{k})$. We selected the default values for the hyperparameters and the learning rate $d$: $\epsilon =10^{-8}$, $\beta _{1}=0.9$, $\beta _{2}=0.999$, and $d=0.001$, as suggested in [70]. The CPD-SGD and Tucker-SGD comparison algorithms are implemented in the TensorLy [81] framework.
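For reference, one Adam step with the default hyperparameters above can be written as follows. This is a transcription of the standard Adam recursions [70] (with bias correction), using our own variable names.

```python
# A minimal Adam step per (49), following the standard recursions in [70].
import numpy as np

def adam_step(x, grad, m, v, k, d=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; k is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad             # first-moment estimate m_k
    v = beta2 * v + (1 - beta2) * grad**2          # second-moment estimate v_k
    m_hat = m / (1 - beta1**k)                     # bias-corrected moments
    v_hat = v / (1 - beta2**k)
    x = x - d * m_hat / (np.sqrt(v_hat) + eps)     # the update in (49)
    return x, m, v
```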

To measure the numerical performance of the algorithms, we report the relative square error (RSE) between the completed tensor $\hat {\boldsymbol {\mathcal {X}}}$ and the true tensor $\boldsymbol {\mathcal {T}}$, which is defined as \begin{equation*} \text {RSE} = \frac {\Vert \hat {\boldsymbol {\mathcal {X}}} - \boldsymbol {\mathcal {T}} \Vert _{F}}{\Vert \boldsymbol {\mathcal {T}} \Vert _{F}}.\tag{50}\end{equation*}

To assess the reconstruction quality over the missing values specifically, we use the tensor completion score (TCS) [14], which is given as \begin{equation*} \text {TCS} = \frac {\Vert (\boldsymbol {\mathcal {I}} - P_\Omega \boldsymbol {\mathcal {I}}) * (\hat {\boldsymbol {\mathcal {X}}} - \boldsymbol {\mathcal {T}})\Vert _{F}}{\Vert (\boldsymbol {\mathcal {I}} - P_\Omega \boldsymbol {\mathcal {I}}) * \boldsymbol {\mathcal {T}} \Vert _{F}}.\tag{51}\end{equation*}

The RSE metric in (50) is used to evaluate the overall global numerical performance, whereas the TCS in (51) is used to assess the RSE over the missing values [14]. To accommodate the specific needs of fMRI data, we define a new thresholded metric based on (51), which measures the recovery of missing voxels in the data regions after thresholding the $Z$-scores at two times the standard deviation: \begin{equation*} \text {TCS}_{Z} = \frac {\Vert (\boldsymbol {\mathcal {I}} - P_\Omega \boldsymbol {\mathcal {I}}) * (\hat {\boldsymbol {\mathcal {X}}}_{Z} - \boldsymbol {\mathcal {T}}_{Z})\Vert _{F}}{\Vert (\boldsymbol {\mathcal {I}} - P_\Omega \boldsymbol {\mathcal {I}}) * \boldsymbol {\mathcal {T}}_{Z} \Vert _{F}}.\tag{52}\end{equation*}

We detect stagnation and stop the algorithms when the relative residual $\epsilon _{\Omega }$ of the objective function between successive iterations drops below the prescribed tolerance $\epsilon$:\begin{equation*} \epsilon _{\Omega }(f(\boldsymbol {\mathcal {X}}_{k})) = \frac {\vert f(\boldsymbol {\mathcal {X}}_{k}) - f(\boldsymbol {\mathcal {X}}_{k-1})\vert }{\vert f(\boldsymbol {\mathcal {X}}_{k-1}) \vert } \le \epsilon,\tag{53}\end{equation*} where $\epsilon =10^{-8}$, and the maximum number of iterations is $\text {max}_{\text {iter}}=500$.
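The metrics in (50)–(52) and the stopping rule in (53) translate directly into a few lines of NumPy. The helpers below are a sketch under the assumption that the $Z$-thresholding in (52) zeroes out entries with $|Z| \le 2$; the function names are ours.

```python
# Metric helpers for (50)-(53); a direct NumPy transcription.
import numpy as np

def rse(X_hat, T):
    return np.linalg.norm(X_hat - T) / np.linalg.norm(T)           # (50)

def tcs(X_hat, T, omega):
    miss = ~omega                                                  # I - P_Omega I
    return (np.linalg.norm(miss * (X_hat - T))
            / np.linalg.norm(miss * T))                            # (51)

def tcs_z(X_hat, T, omega, z_thr=2.0):
    # Assumption: Z-thresholding keeps only entries with |Z| > 2.
    zX = np.where(np.abs(X_hat) > z_thr, X_hat, 0.0)
    zT = np.where(np.abs(T) > z_thr, T, 0.0)
    return tcs(zX, zT, omega)                                      # (52)

def stagnated(f_k, f_prev, eps=1e-8):
    return abs(f_k - f_prev) / abs(f_prev) <= eps                  # (53)
```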

Furthermore, we evaluate whether there is a significant effect between study groups using one-way analysis of variance (ANOVA) and report the $F$-statistics and $p$-values. Post hoc analyses with two-tailed parametric t-tests, corrected for multiple comparisons using the false discovery rate (FDR), are performed to determine whether the differences between the contrasts of interest are significant; the results are described by the $p$-value, $T$-value, and $H_{0}$-value. The latter reports the outcome of the null hypothesis test: $H_{0}=1$ indicates that the null hypothesis is rejected (i.e., a significant difference between two contrasts), and $H_{0}=0$ indicates that it is not rejected. We also provide the mean ($M$) and standard deviation ($SD$) of the measure of interest. In this paper, we use a 95% confidence interval (CI), i.e., $p < 0.05$, to reject the null hypothesis. Of note, the relevant statistical results of the group fMRI analyses are corrected for multiple comparisons with the familywise error (FWE) rate at $p < 0.05$.
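As an illustration of this statistical pipeline, the sketch below runs a one-way ANOVA followed by FDR-corrected post hoc t-tests using SciPy and statsmodels. The synthetic group arrays stand in for per-subject mean TCS values and are not our measured data.

```python
# Sketch of the ANOVA + FDR-corrected post hoc workflow described above.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# Illustrative stand-ins for per-subject mean TCS scores (N = 50 per group).
tcs_2d, tcs_3d, tcs_4d = (rng.normal(mu, 0.001, 50)
                          for mu in (0.017, 0.005, 0.0008))

F, p = stats.f_oneway(tcs_2d, tcs_3d, tcs_4d)      # one-way ANOVA across groups
print(f"F = {F:.2f}, p = {p:.2e}")

# Post hoc two-tailed t-tests, FDR-corrected across the three contrasts.
pvals = [stats.ttest_ind(a, b).pvalue
         for a, b in [(tcs_4d, tcs_2d), (tcs_4d, tcs_3d), (tcs_3d, tcs_2d)]]
reject, p_corr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject, p_corr)                              # H0 decisions and corrected p-values
```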

SECTION V.

Results

This section is divided into three subsections. The first subsection describes the data completion results for the RMV pattern (see Section IV-F) as a performance baseline for RSCG-TT. The second and third subsections evaluate the robustness of RSCG-TT on more challenging tasks, namely, SMV pattern completion (see Section IV-G) and resting-state network estimation. Throughout the paper, we provide results averaged across $N=50$ subjects from the COBRE dataset described in Section IV-A. Each individual subject dataset is normalized to zero mean and unit variance before it is input into each tensor completion algorithm. Fig. 4 presents examples of three randomly selected subjects with fMRI scans corrupted by the RMV pattern and subsequently recovered by the RSCG-TT algorithm as well as the other algorithms for comparison.

FIGURE 4. Visual comparison of 4D fMRI data completion with RSCG-TT, CPD-SGD, and Tucker-SGD on multiple subjects when subject scans are corrupted by the RMV pattern with $MR_{RMV}(\%)=50\%$. The subject-specific scans are shown from left to right using an axial projection: original scan, corrupted scan, and completed scan. (a)-(c) RSCG-TT. (d)-(f) CPD-SGD. (g)-(i) Tucker-SGD.

A. RMV Pattern

In this section, RSCG-TT is evaluated for RMV pattern completion using the 4D, 3D, and 2D representations. We compare the accuracy and convergence of RSCG-TT with those of the state-of-the-art CPD-SGD and Tucker-SGD methods for different problem sizes and tensor dimensionalities.

1) Cost Comparison by Missing Value Rates

We investigate the performance of RSCG-TT using the 4D spatiotemporal structure of fMRI scans from the COBRE dataset with 10–90% of the voxels missing at random. See Section IV-F for a detailed description of the setup. The size of a 4D fMRI tensor is $\boldsymbol {\mathcal {X}} \in \mathbb {R}^{53\times 63 \times 46 \times 144}$. We show an example of a completed fMRI brain volume with $MR_{RMV}(\%)=60\%$ at specific timepoints in Fig. 5. We use the TCS in (51) and its thresholded counterpart $\text {TCS}_{Z}$ in (52) to evaluate the quality of data completion. We summarize the results of 4D tensor completion for the RMV pattern using the RSCG-TT algorithm with different $MR_{RMV}(\%)$ in Fig. 6. As shown in Fig. 5, RSCG-TT yields very low TCS values, and the recovered image is almost identical to the original image. The qualitative assessment of the completed images with $MR_{RMV}(\%) \in \{25\%, 50\%, 75\%, 90\%\}$, shown in Fig. 6a–d, suggests that RSCG-TT can recover the 4D fMRI tensor with high accuracy even when a significant proportion of values is missing. We present the convergence curves of the mean RSE and the objective function w.r.t. the number of iterations for RSCG-TT in Fig. 6e–f. The convergence curves illustrate the monotonic convergence of the cost function (5). Furthermore, for the RMV pattern, the RSCG-TT algorithm exhibits rapid convergence even for larger $MR_{RMV}(\%)$. The loss curve in Fig. 6e demonstrates that a stationary point is reached in fewer than fifty iterations.

FIGURE 5. 4D fMRI tensor completion for the RMV pattern with $MR_{RMV}(\%)=60\%$ at the 69th timepoint by applying the RSCG-TT algorithm. Orthogonal projections of the image are shown from left to right: coronal, sagittal, and axial views. (a) Original fMRI image. (b) fMRI image with randomly missing voxels within the brain mask. (c) Completed fMRI image with the following performance metrics: $\text {TCS} = 0.773 \times 10^{-3}$ and $\text {TCS}_{Z} = 1.04 \times 10^{-3}$.

FIGURE 6. RMV pattern 4D fMRI data completion with the RSCG-TT algorithm for different missing ratios. Left to right: the original image, its copy with the prescribed missing ratio, and the completed image. (a) $MR_{RMV}(\%)=25\%$, (b) $MR_{RMV}(\%)=50\%$, (c) $MR_{RMV}(\%)=75\%$, (d) $MR_{RMV}(\%)=90\%$. Convergence curves for the RMV pattern using the 4D tensor structure and the RSCG-TT algorithm as a function of the number of iterations with missing values $\in \{25\%, 50\%, 75\%\}$. (e) Cost function value. (f) Mean RSE.

2) Performance of RSCG-TT w.r.t. Tensor Dimensionality

In this section, we show the results of studying the performance of RSCG-TT for different tensor dimensions and different $MR_{RMV}(\%)$ values. We conduct fMRI tensor completion experiments for three different dimensionalities, yielding tensors of size $153594 \times 144$ (2D), $3339 \times 46 \times 144$ (3D), and $53 \times 63 \times 46 \times 144$ (4D). The $MR_{RMV}(\%)$ varies from 10% to 90%. We cast the original 4D fMRI tensor to 2D and 3D tensors with TT ranks of $\mathbf {r}_{2\text {D}_{\mathrm {TT}}}=(1,144,1)$ and $\mathbf {r}_{3\text {D}_{\mathrm {TT}}}=(1,46,46,1)$, respectively, whereas the 4D tensor has a TT rank of $\mathbf {r}_{4\text {D}_{\mathrm {TT}}}=(1,81,81,81,1)$. The rank estimation procedure is described in Section IV-H. We summarize the effect of tensor dimensionality on the performance of RSCG-TT for the RMV patterns in Figs. 7 and 8 and Tables 5 and 6. The main effect of tensor dimensionality after the post hoc tests and the distribution of the mean TCS of RSCG-TT for each $MR_{RMV}(\%) \in \{10-90\%\}$ are shown in Fig. 7 and Tables 6 and 7. Fig. 8 shows that when the fMRI scan is represented by a 4D tensor, the RSCG-TT algorithm has the lowest TCS for each $MR_{RMV}(\%)$ and numerically outperforms the 2D and 3D representations. Furthermore, the performance of the RSCG-TT method clearly degrades as the dimension of the data decreases from 4D to 3D and 2D. The superior performance of 4D tensor completion can be explained by the ability of the 4D TT manifold to take full advantage of the spatiotemporal correlations among the modes of the fMRI tensor.

TABLE 5. Mean and Standard Deviation of the TCS for the RMV Pattern With Different Percentages of Missing Values by Applying the RSCG-TT Algorithm. The Plots Corresponding to This Table Can be Seen in Fig. 7
TABLE 6. Comparison of the Mean TCS by Tensor Dimension for the RMV Pattern With the RSCG-TT Algorithm. The Plots Corresponding to This Table Can be Seen in Fig. 7
TABLE 7. Comparison of the Mean Voxel Intensity Between the Original fMRI Image and the Completed Images Using the RSCG-TT Algorithm for the RMV Pattern and Different $MR_{RMV}(\%)$. The Plots Corresponding to This Table Can be Seen in Fig. 9
FIGURE 7. Effect of the tensor dimensionality on the distribution of the mean TCS ($N=50$) for the RMV pattern with the RSCG-TT algorithm. The boxplots summarize the distribution of the mean TCS scores for the 2D, 3D, and 4D tensor dimensions. The median values are represented by black lines inside the boxplots, and the tops of the whisker lines indicate the 25th and 75th percentile values. The mean values are plotted in blue, and outliers are represented by red circles. Error bars represent the standard error of the mean. (a) Main effect of the tensor dimensionality on the mean TCS: 2D vs 3D vs 4D. Post hoc analyses with two-tailed t-tests (FDR corrected) indicate that the mean TCS for the 4D tensor ($M=8.354 \times 10^{-4}, SD=1.184 \times 10^{-4}$) is significantly lower than that for the 2D ($M=1.715 \times 10^{-2}, SD=3.632 \times 10^{-3}$) and 3D ($M=5.406 \times 10^{-3}, SD=8.285 \times 10^{-4}$) tensors with $p < 0.0001$. **** indicates $p < 0.0001$ (post hoc and FDR corrected), i.e., significantly different. Distribution of the mean TCS ($N=50$) with $MR_{RMV}(\%) \in \{10-90\%\}$. (b) Distribution of the mean TCS for tensor dimension $D=2$. (c) Distribution of the mean TCS for tensor dimension $D=3$. (d) Distribution of the mean TCS for tensor dimension $D=4$. The mean TCS results are listed in Table 5.

FIGURE 8. Effect of tensor dimensionality on fMRI data completion by missing value rates for tensor dimensions $D=2,3,4$ with the RMV pattern. The mean TCS results are shown ($N=50$). (a) TCS. (b) $\mathrm {TCS_{Z}}$.

Fig. 8b shows that the $\mathrm {TCS}_{Z}$ metric is slightly higher than the TCS. This observation reflects how RSCG-TT performs in the areas of the fMRI scan with high activation, since the $\mathrm {TCS}_{Z}$ metric takes only high values ($\vert Z \vert > 2$) into consideration, while the TCS takes all values into consideration. Both the 4D and 3D representations still yield better numerical performance, in the range of $0.639 \times 10^{-3}$ to $1.162\times 10^{-2}$, than the 2D representation.

We performed ANOVA and showed that the main effect of tensor dimensionality (2D vs 3D vs 4D) was significant; $F(2, 1347)=553.85, p < 0.0001$. Post hoc analyses with two-tailed t-tests and corrections for multiple comparisons with FDR indicated that the mean TCS for the 4D tensor ($M=8.354 \times 10^{-4}, SD=1.184 \times 10^{-4}$) was significantly lower than that for the 2D ($M=1.715 \times 10^{-2}, SD=3.632 \times 10^{-3}, t(98)=31.757, p < 0.0001$) and 3D ($M=5.406 \times 10^{-3}, SD=7.285 \times 10^{-4}, t(98)=38.615, p < 0.0001$) tensors (see Table 6 and Fig. 7). We further studied the ability of RSCG-TT to fully recover original images corrupted with varying $MR_{RMV}(\%)$ levels for different tensor dimensions in the RMV pattern. We illustrate the results of this experiment in Fig. 9 and Table 7. We quantitatively evaluated the statistical significance of the difference between the mean voxel intensity values of the original fMRI images and the images after data completion for tensor dimensions $D = 2, 3, 4$ and $MR_{RMV}(\%) \in \{10-90\% \}$. Post hoc analyses with two-tailed t-tests (FDR corrected) at the significance level $p < 0.05$ showed that RSCG-TT can recover the original image with $MR_{RMV}(\%)$ values in the range of {10−60%} for the 3D tensor, {10−80%} for the 4D tensor, and only 10% for the 2D tensor format. As shown in Table 7 and Fig. 9, the mean voxel intensity values of the original and the completed images were statistically different for the 2D format with $MR_{RMV} \in \{20-90\%\}$. In contrast, the higher-order tensor formats failed to fully recover the original data only for extremely high missing value rates: $MR_{RMV}(\%) \in \{70-90\%\}$ for 3D and $MR_{RMV}(\%) = 90\%$ for 4D. Together, the results presented in this section suggest that tensor dimensionality does have an effect on fMRI data completion. Specifically, our results demonstrate that the original 4D spatiotemporal representation of fMRI data yields higher accuracy than the 2D and 3D representations of the original data. In summary, our experiments show that RSCG-TT can recover corrupted fMRI data in the presence of high missing value rates of up to 80% when the data are kept in their natural 4D format.

FIGURE 9. Comparison of 2D, 3D, and 4D tensor completion on fMRI data for missing value rates $\in \{10-90\% \}$ in the RMV pattern using the RSCG-TT algorithm. (a) Left to right: original, corrupted, and completed fMRI images for the 2D, 3D, and 4D tensor dimensions using sagittal views. (b) Main effect of the tensor dimensionality on the ability to recover the original image for the RMV pattern using different percentages of missing value rates. The boxplots summarize the distribution of the mean voxel intensity values for $MR_{RMV}(\%) \in \{10-90\% \}$ and tensor dimensions $D = 2, 3, 4$. Post hoc analyses with two-tailed t-tests (FDR corrected) indicate that RSCG-TT can recover the original image with $MR_{RMV}(\%)$ in the range of {10−60%} for the 3D tensor and with $MR_{RMV}(\%) \in \{10-80\%\}$ for the 4D tensor. **** indicates $p < 0.0001$ (post hoc and FDR corrected), i.e., significantly different.

3) Comparison of RSCG-TT With the State-of-the-Art Methods

In this section, we compare the performance of RSCG-TT and the state-of-the-art methods on the reconstruction of 4D fMRI data corrupted by the RMV pattern with $MR_{RMV}(\%)$ ranging from 10% to 90% in increments of 10%. We evaluate two aspects of performance: the numerical accuracy in terms of the RSE and TCS, and the computational cost w.r.t. the convergence rate and the CPU time. We summarize the computational cost results of all algorithms in Fig. 10 and present the mean numerical reconstruction performance in Fig. 11 and Table 8. The main effect of the tensor completion method (CPD-SGD vs Tucker-SGD vs RSCG-TT) after post hoc analyses is shown in Table 9 and Fig. 12.

TABLE 8. Mean and Standard Deviation of the TCS With the RMV Pattern With Different Percentages of Missing Values for RSCG-TT and the State-of-the-Art Methods. The Plots Corresponding to This Table Can be Seen in Fig. 11
TABLE 9. Comparison of the Mean TCS Results of the Tensor Completion Methods for the RMV Pattern and Tensor Dimension D=4. The Plots Corresponding to This Table Can be Seen in Fig. 12
FIGURE 10. Computational cost comparison of RSCG-TT and the state-of-the-art methods with different percentages of missing values in the RMV pattern. (a) Convergence curves with $MR_{RMV}(\%) = 50\%$ w.r.t. the number of iterations. (b) Convergence curves with $MR_{RMV}(\%) = 50\%$ w.r.t. the CPU time.

FIGURE 11. Comparison of RSCG-TT, Tucker-SGD, and CPD-SGD 4D tensor completion on fMRI data for a missing value rate of 75% in the RMV pattern. The 1st, 69th, 119th, and 144th timepoints are shown from left to right with a coronal projection. (a) Original fMRI scan. (b) Missing fMRI data. (c) CPD-SGD. (d) Tucker-SGD. (e) RSCG-TT. Comparison of the numerical accuracies of RSCG-TT and the state-of-the-art methods with different percentages of missing values in the RMV pattern. The mean results are shown ($N=50$). (f) RSE comparison. (g) TCS comparison. (h) $\text {TCS}_{Z}$ comparison.

FIGURE 12. Main effect of the tensor completion method on the mean TCS ($N=50$) for the RMV pattern and $D=4$. The boxplots summarize the distribution of the mean TCS of the following tensor completion methods: CPD-SGD, Tucker-SGD, and RSCG-TT. The median values are represented by black lines inside the boxplots, and the tops of the whisker lines represent the 25th and 75th percentile values. The mean values are plotted in blue, and outliers are represented by red circles. The error bars represent the standard error of the mean. Post hoc analyses with t-tests (FDR corrected) indicate that the mean TCS of RSCG-TT ($M=8.354 \times 10^{-4}, SD=1.184 \times 10^{-4}$) was significantly lower than those of the CPD-SGD ($M=3.921 \times 10^{-2}, SD=1.087 \times 10^{-2}$) and Tucker-SGD ($M=7.831 \times 10^{-3}, SD= 1.918 \times 10^{-3}$) tensor completion methods with $p < 0.0001$. **** indicates $p < 0.0001$ (post hoc and FDR corrected), i.e., significantly different.

We show the convergence rate of the objective function for all methods w.r.t. the iteration number in Fig. 10a and the convergence curves of the objective function w.r.t. the CPU time in Fig. 10b. The convergence curves presented in Fig. 10 and the numerical results in Fig. 11 demonstrate that RSCG-TT outperforms both competitor algorithms in terms of numerical accuracy measures (the RSE and TCS) and computational costs. In Fig. 10, we compare the computational efficiency of RSCG-TT and the state-of-the-art algorithms as a function of the number of iterations and the elapsed time. As shown in Fig. 10a, RSCG-TT converges rapidly compared with both the CPD-SGD and Tucker-SGD algorithms, which require a much larger number of iterations to converge. As shown in Fig. 10, while the total elapsed time to converge is similar, the cost function accuracy of RSCG-TT is superior to that of the state-of-the-art algorithms by a factor of two to three. We present the mean ($N=50$) TCS and RSE results of 4D tensor completion with RSCG-TT and compare them with the results of the state-of-the-art methods, CPD-SGD and Tucker-SGD, in Fig. 11f–h. For the fixed-rank model in the 4D experiments, we estimate the fixed rank parameters as $\mathbf {r}_{\mathrm {4D}_{\mathrm {TT}}}=(1,81,81,81,1)$ for the TT rank, $\mathbf {r}_{\mathrm {Tucker}_{\mathrm {4D}}} = (53, 63, 46, 91)$ for the Tucker rank, and $R_{\mathrm {CPD}_{\mathrm {4D}}} = 81$ for the CPD rank, as described in Section IV-H. In terms of the RSE, both RSCG-TT and Tucker-SGD demonstrate good numerical performance, with $\text {RSE}_{{\text {RSCG-TT}}}$ in the range of $0.391 \times 10^{-3}$ to $0.857 \times 10^{-3}$ and $\text {RSE}_{\text {Tucker-SGD}}$ in the range of $5.529 \times 10^{-3}$ to $1.709 \times 10^{-2}$. The RSE for CPD-SGD ($\text {RSE}_{\text {CPD-SGD}}$) is in the range of $2.782 \times 10^{-2}$ to $5.261 \times 10^{-2}$, which is an order of magnitude higher than the RSE of both RSCG-TT and Tucker-SGD. The RSE results for the RMV pattern in the case of $D=4$ are shown in Fig. 11f. In terms of the TCS, we noted a similar range of values as observed for the RSE of all the compared algorithms: $\text {TCS}_{{\text {RSCG-TT}}} \approx 0.639 \times 10^{-3} - 1.154 \times 10^{-3}$, $\text {TCS}_{\text {Tucker-SGD}} \approx 3.981 \times 10^{-3} - 5.987 \times 10^{-3}$, and $\text {TCS}_{\text {CPD-SGD}} \approx 2.243 \times 10^{-2} - 2.893 \times 10^{-2}$. The TCS results for the RMV pattern in the case of $D=4$ are shown in Figs. 11g, 11h, and Table 8. As seen in Fig. 10, CPD-SGD and Tucker-SGD demonstrate lower computational efficiency than RSCG-TT in terms of both the convergence rate and the numerical accuracy of the objective function w.r.t. the CPU time (see Figs. 10a and 10b). Moreover, it is worth noting that RSCG-TT yields consistently smaller $\mathrm {TCS}_{Z}$ values than the state-of-the-art methods, as demonstrated in Fig. 11h. This can be interpreted as RSCG-TT having higher sensitivity when completing data in areas with very low activation values. Fig. 11a–e presents the results on the completed 4D fMRI data for $MR_{RMV}(\%) = 75\%$ with the state-of-the-art methods and RSCG-TT. A visual inspection of the completed images clearly shows that CPD-SGD (Fig. 11c) produces blurred areas and suffers from poor reconstruction, Tucker-SGD (Fig. 11d) shows better performance, and the images completed by RSCG-TT are almost indistinguishable from the original images shown in Fig. 11a.

Based on the ANOVA results, we identified that the main effect of the tensor completion method (CPD-SGD vs Tucker-SGD vs RSCG-TT) was significant; i.e., $F(2, 1347)=628.83, p < 0.0001$. Post hoc analyses (FDR corrected) with t-tests indicate that the mean TCS of RSCG-TT ($M=8.354 \times 10^{-4}, SD=1.184 \times 10^{-4}$) was significantly lower than those of the CPD-SGD ($M=3.929 \times 10^{-2}, SD=1.087 \times 10^{-2}, t(98)=25.003, p < 0.0001$) and Tucker-SGD ($M=7.831 \times 10^{-3}, SD=1.918 \times 10^{-3}, t(98)=25.735, p < 0.0001$) tensor completion methods (see Table 9 and Fig. 12).

B. SMV Pattern

In this section, we evaluate RSCG-TT on the SMV pattern described in Section IV-G. We investigate the joint effect of the spatiotemporal missing value rate and the effect of tensor dimensionality on estimating the missing voxels originating from the SMV pattern. We show an example of SMV pattern completion in Fig. 13.

FIGURE 13. 4D fMRI tensor completion on the SMV pattern with $MR_{T}=15\%$ and $MR_{V_{e^{3D}}}=1.75\%$ at the 20th timepoint. The orthogonal projections of the image are shown from left to right: coronal, sagittal, and axial views. (a) Original fMRI image. (b) fMRI image with the spatial ellipsoid in (44) filled with missing voxels (size 1) within a brain mask. (c) Completed fMRI image with the following performance metrics: $\text {TCS} = 6.931 \times 10^{-5}$ and $\text {TCS}_{Z} = 3.92 \times 10^{-3}$.

1) Performance of RSCG-TT for Tensors of Different Dimensions by Varying the Spatiotemporal Missing Effect Size

In this section, we study the joint effect of the missing value rates in the spatial and temporal dimensions affected by the SMV pattern for tensors of different dimensions with RSCG-TT. We summarize the results of fMRI data completion for the SMV pattern with different $MR_{T}(\%)$ and $MR_{V_{e^{3D}}}(\%)$ values using RSCG-TT in Figs. 13 and 14. The results shown in Fig. 13 are part of the experiment in which the SMV pattern is placed at the randomly selected timepoints $T_{i}=\{1, 4, 5, 14, 20, 33, 37, 62, 64, 65, 71, 78, 94, 104, 110, 113, 120, 122, 129, 135, 136\}$ and the temporal missing value rate is $MR_{T}(\%)=15\%$. Fig. 13b shows an example in which a spatial ellipsoid with missing voxels of size 1 (see Table 4) is placed in the brain volume at spatial location $(x,y,z)=(2,32,22)$ and timepoint $T_{i}=20$. The completed fMRI image at timepoint $T_{i}=20$ is shown in Fig. 13c. Fig. 14 illustrates the joint effect of both spatiotemporal factors, $MR_{T}(\%)$ and $MR_{V_{e^{3D}}}(\%)$, on the quality of data completion. As we can observe from Fig. 14, a simultaneous increase in both $MR_{T}(\%)$ and $MR_{V_{e^{3D}}}(\%)$ impacts the numerical accuracy of data completion. However, as shown in Fig. 14, RSCG-TT achieves good data completion quality, with TCS values in the range of $1 \times 10^{-4} - 2 \times 10^{-3}$, for spatial ellipsoid sizes $\in \{1,2,3,4\}$ and $MR_{T}(\%)$ of up to 20%. The observed results for the SMV pattern are in agreement with those of previous works [13] and [14]: increasing the effect size of the SMV pattern proportionally decreases the number of available neighboring voxels and, as a result, reduces the inferential power of tensor completion methods.

FIGURE 14. Results of 4D fMRI tensor completion on fMRI data with RSCG-TT and the SMV pattern by varying the spatial volume of the missing values for different percentages of missing temporal ratios. Left: the original brain volume at the 20th timepoint. (a) Left to right: brain volumes at the 20th timepoint with embedded spatial ellipsoids with missing voxel sizes $\in \{1, 2, 3, 4, 5, 6\}$ (see Table 4). (b)-(e) Top to bottom: completed brain volumes for each spatial ellipsoid size and $MR_{T} \in \{5\%, 10\%, 15\%, 20\%\}$. Comparison of fMRI tensor completion for the 3D and 4D tensor representations on the SMV pattern for different percentages of spatiotemporal missing value ratios. The mean TCS results are shown ($N=50$). (f) $MR_{T}(\%)= 5\%$. (g) $MR_{T}(\%)= 10\%$. (h) $MR_{T}(\%)= 15\%$. (i) $MR_{T}(\%)=20\%$.

2) Effect of the Tensor Dimensionality on Tensor Completion for the SMV Pattern

In this section, we study the effect of tensor dimensionality on the estimation of missing voxels with RSCG-TT when fMRI scans are affected by the SMV pattern. We summarize the effect of tensor dimensionality for the SMV pattern in Figs. 15–17 and Tables 10 and 11.

TABLE 10. Mean and Standard Deviation of the TCS for the SMV Pattern With Different Percentages of Temporal Missing Values $MR_{T}(\%)$ and Tensor Dimensions $D \in\{2, 3, 4\}$ With RSCG-TT. The Plots Corresponding to This Table Can be Seen in Fig. 16 and Fig. 15
TABLE 11. Comparison of the Mean TCS Results for the RSCG-TT Algorithm by Tensor Dimension for the SMV Pattern
FIGURE 15. Qualitative comparison of 2D, 3D, and 4D tensor completion using the RSCG-TT algorithm on fMRI data for spatiotemporal missing value rates $MR_{T}(\%) \in \{5-20\% \}$ in the SMV pattern. Left to right: original, corrupted, and completed images for the 2D, 3D, and 4D representations are shown with a sagittal projection. (a) $MR_{T}(\%) = 5\%$. (b) $MR_{T}(\%) = 10\%$. (c) $MR_{T}(\%) = 15\%$. (d) $MR_{T}(\%) = 20\%$.

FIGURE 16. Results of fMRI tensor completion affected by the SMV pattern by varying the spatial volumes of missing values for different percentages of missing temporal ratios. The mean TCS results are shown ($N=50$). (a) $MR_{T}(\%)= 5\%$. (b) $MR_{T}(\%)= 10\%$. (c) $MR_{T}(\%)= 15\%$. (d) $MR_{T}(\%)=20\%$.

FIGURE 17. Effect of tensor dimensionality on the mean TCS ($N=50$) for the SMV pattern with the RSCG-TT algorithm. The boxplots summarize the distribution of the mean TCS scores for the 2D, 3D, and 4D tensor dimensions. The median values are represented by black lines inside the boxplots, and the tops of the whisker lines indicate the 25th and 75th percentile values. The mean values are plotted in blue, and outliers are represented by red circles. The error bars represent the standard error of the mean. (a) Main effect of tensor dimensionality on the mean TCS: 2D vs 3D vs 4D. Post hoc analyses with two-tailed t-tests (FDR corrected) indicate that the mean TCS for the 4D tensor ($M=4.575 \times 10^{-3}, SD=4.474 \times 10^{-3}$) was significantly lower than those for the 2D ($M=2.75 \times 10^{-2}, SD=4.849 \times 10^{-3}$) and 3D ($M=8.467 \times 10^{-3}, SD=6.398 \times 10^{-4}$) tensors with $p < 0.0001$. **** indicates $p < 0.0001$ (post hoc and FDR corrected), i.e., significantly different. (b) Distribution of the mean TCS with $MR_{T}(\%) \in \{5\%, 10\%, 15\%, 20\%\}$. The mean TCS results are listed in Table 10.

Fig. 15 presents a visual comparison of the completed images obtained with RSCG-TT for the 2D, 3D, and 4D tensor formats corrupted by the SMV pattern with different $MR_{T}(\%)$. Despite the challenging SMV pattern, with simultaneous temporal and volumetric corruption, successful reconstruction was obtained for the 4D and 3D formats but not for the 2D tensor format. In particular, the completed 4D and 3D fMRI images appear visually almost indistinguishable from the original ground-truth 4D fMRI tensor. RSCG-TT obtained good data completion quality for $MR_{T}(\%)$ in the range of {5−15%}. However, RSCG-TT was unable to successfully recover the fMRI data in the 2D representation for any $MR_{T}(\%)$. The poor performance of RSCG-TT for the 2D tensor format can be explained by the lack of spatiotemporal structure when the original 4D fMRI tensor is flattened to 2D form, in which the natural strength of the spatiotemporal relationships is lost.

Table 10 and Fig. 16 show quantitative evaluations of RSCG-TT for the 2D, 3D, and 4D formats with the SMV pattern and different $MR_{T}(\%)$. We calculated the mean TCS ($N=50$) between the data completion results and the ground-truth original 4D fMRI data. In terms of the TCS, the reconstruction of the 2D representation demonstrated suboptimal performance, with a mean TCS in the range of 0.159 to 0.396. Conversely, RSCG-TT was able to obtain good numerical performance for both the 4D and 3D formats, with a mean TCS for 4D in the range of $0.0146 \times 10^{-3}$ to $0.3289 \times 10^{-3}$ and a TCS for the 3D format in the range of 0.005 to 0.057. To evaluate the statistical significance of the differences between tensor formats using RSCG-TT with the SMV pattern, we performed ANOVA followed by post hoc analyses with two-tailed t-tests corrected for multiple comparisons at $p < 0.05$ (see Table 11 and Fig. 17). The boxplots shown in Fig. 17 summarize the distribution of the mean TCS metric for the 2D, 3D, and 4D tensor dimensions. The results underline the fact that spatiotemporal 4D data completion leads to statistically significant improvements compared with the volumetric 3D and flattened 2D tensor formats. Furthermore, from Figs. 15, 16, and 17, it can be seen that the natural spatiotemporal 4D tensor representation outperforms both alternative views of the fMRI tensor cast to the 3D and 2D forms across all $MR_{T}(\%)$.

We present the main effect of tensor dimensionality on the results for the SMV pattern in Fig. 17a and Table 11 and the distribution of the mean TCS in Fig. 17b. The main effect of tensor dimensionality (2D vs 3D vs 4D) in the presence of the SMV pattern was significant; i.e., $F(2, 71997)=40362.579, p < 0.0001$ . The subsequent post hoc analyses with t-tests (FDR corrected) indicated that the mean TCS for the 4D tensor ($M=4.575 \times 10^{-3}, SD=1.474 \times 10^{-3}$ ) was significantly lower than that for the 2D ($M=2.758 \times 10^{-1}, SD=4.832 \times 10^{-2}, t(11998)=431.28, p < 0.0001$ ) and 3D ($M=8.475 \times 10^{-2}, SD=6.405 \times 10^{-2}, t(11998)=96.735, p < 0.0001$ ) tensors. These results suggest that tensor dimensionality does have an effect on fMRI data completion when the original data are affected by the SMV pattern.

C. Evaluation of the Quality of the Tensor Completion Methods on Downstream fMRI Analyses

In this section, we evaluate the effect of each tensor completion method on the sensitivity of fMRI analyses to estimate resting-state network (RSN) components. We decomposed the rs-fMRI datasets (see Section IV-A) into functional networks using spatial independent component analysis (ICA) to derive maximally independent RSNs. We conducted group ICA (GICA) [82], [83] to evaluate the sensitivity of RSCG-TT and the state-of-the-art tensor completion methods CPD-SGD and Tucker-SGD to voxel activation. We generated four datasets for each of the $N=50$ subjects: one original dataset and three datasets produced by the completion algorithms for $MR_{RMV}(\%)=50\%$. We used GICA to decompose the datasets and direct GICA3 [83], [84] back-reconstruction to obtain subject-specific RSN components. We used the Infomax algorithm [85] and estimated $C=50$ components, as suggested by [86]. The whole analysis was implemented with the GIFT toolbox. The Infomax estimation was repeated ten times, and the final decomposition was chosen based on the most consistent run selection scheme [87]. We identified 16 independent components (ICs) as RSNs (as opposed to noise artifacts) by inspecting the aggregate ICs and their power spectra [88]. The RSNs were divided into different groups according to their functional and anatomical properties [88] and categorized as auditory (AUD), visual (VIS), sensorimotor (SM), cognitive control (CC), frontoparietal (FP), and default-mode (DMN) networks.

1) Group-Level Sensitivity Comparison by Tensor Completion Method

To compare the group-level sensitivity of the three completion methods, we computed voxel-wise $t$-statistics from a one-sample $t$-test on the subject-level RSNs and plotted the $t$-values as a statistical image called a T-map. The T-maps are thresholded at $p < 0.05$ and corrected for multiple comparisons with FWE. We evaluated the group-level sensitivity of each method by studying its ability to recover independent RSNs across six functional RSNs. We present the T-maps of the RSNs comprising six functional networks, DMN, AUD, VIS, SM, CC, and FP, in Fig. 18 and Supplementary Fig. S.1. We focus on the reliable estimation of major RSNs, as they are important biomarkers in a number of disorders; see, e.g., [88]–[90].

FIGURE 18. Estimation of the DMN component by GICA following the different tensor completion algorithms. The group-level 2D and 3D T-maps for the reference DMN, CPD-SGD, Tucker-SGD, and RSCG-TT are shown. The T-maps (one-sample t-test) are thresholded at $p < 0.05$ and corrected for multiple comparisons by FWE. (a)-(c) Reference DMN component. (d)-(f) DMN component after CPD-SGD. (g)-(i) DMN component after Tucker-SGD. (j)-(l) DMN component after RSCG-TT.

The T-maps following different tensor completion methods for the canonical DMN component are shown in Fig. 18, and the T-maps for the other RSNs are shown in Supplementary Fig. S.1. All six RSNs resulting from RSCG-TT match the reference RSNs (see Fig. 18 and Supplementary Fig. S.1). The results illustrate that the RSN components estimated from the datasets completed by RSCG-TT are the most similar to those estimated from the original datasets. In contrast, it is evident from Fig. 18 and Supplementary Fig. S.1 that the spatial clusters comprising the RSNs are noticeably smaller following the CPD-SGD and Tucker-SGD methods. We characterize the performance of each tensor completion method w.r.t. the proportion of significant activations in Table 12 and Supplementary Fig. S.3. In summary, the results presented in Table 12, Fig. 18, Supplementary Fig. S.1, and Supplementary Fig. S.3 demonstrate that the $t$ -statistics, the proportion of significant voxels, and the spatial cluster extents are higher for RSCG-TT, indicating better sensitivity.

TABLE 12. Peak Activations of RSN Components After the Tensor Completion Algorithms. Cluster ID: the Enumerated Cluster ID of Peak Activations Within the RSN Component; $t_{\max}$ : the Maximum t-Statistic in Each Cluster; Coordinate: $x, y, z$ (in mm) of $t_{\mathrm{max}}$ in MNI Space; Cluster Size: the Cluster Size (in mm$^{3}$ ); $V_{l}$ : the Number of Voxels in Each Cluster; $V_{l}(\%)$ : the Percentage of Activated Voxels.

To quantitatively compare the RSN components across all tensor completion methods, we performed a two-sample t-test on the spatial maps between the reference RSN component estimated from the original datasets and the RSN component generated after each tensor completion method. Fig. 19 and Supplementary Fig. S.2 illustrate the T-map contrasts obtained from the two-sample t-tests: reference RSN − CPD-SGD RSN, reference RSN − Tucker-SGD RSN, and reference RSN − RSCG-TT RSN. The T-map contrasts are thresholded at the significance level of $p < 0.05$ (FWE-corrected). The T-maps in Fig. 19 and Supplementary Fig. S.2 show significant changes in sensitivity to voxel activation between the reference RSN and the RSN component after CPD-SGD and Tucker-SGD for the studied functional RSNs, but not after RSCG-TT. We further studied the statistical significance of the T-map contrasts between the reference RSN component and the RSN component after each tensor completion method by executing post hoc t-tests (FWE-corrected, $p < 0.05$ ). Supplementary Table S.1 shows statistically significant group differences in sensitivity to voxel activation for both the reference RSN − CPD-SGD RSN and reference RSN − Tucker-SGD RSN contrasts ($p_{\mathrm {corr}} < 0.0001$ ). Thus, compared to CPD-SGD and Tucker-SGD, RSCG-TT recovers the RSN components much better. The good performance of RSCG-TT in preserving the signal of interest is most likely due to its efficiency in the estimation of missing voxels.

FIGURE 19. Comparison of the reference DMN with the DMN component estimated by GICA following different tensor completion methods. The T-maps (two-sample t-tests) between the reference DMN component and the DMN component after each tensor completion method are shown. The threshold of the T-maps is $p < 0.05$ , and the T-maps are corrected for multiple comparisons by FWE. (a) Reference DMN - CPD-SGD DMN. (b) Reference DMN - Tucker-SGD DMN. (c) Reference DMN - RSCG-TT DMN.

FIGURE 20. Differential manifold concepts. (a) A smooth $m$ -dimensional submanifold of $\mathbb {R}^{n}$ . A coordinate chart $\phi$ maps a neighbourhood $\mathcal {S} \cap \mathcal {M}$ of a point $\mathcal {X} \in \mathcal {M}$ to a subset $\mathcal {V} \subset \mathbb {R}^{m}$ . (b) A tangent space $T_{\mathcal {X}}\mathcal {M}$ of an embedded submanifold $\mathcal {M}$ at $\mathcal {X} \in \mathcal {M}$ . The tangent vectors $\gamma '_{1}$ and $\gamma '_{2}$ are realizations of the two curves $\gamma _{1}$ and $\gamma _{2}$ .

FIGURE 21. Geometrical concepts within the framework of Riemannian optimization. (a) Retraction mapping $R_{\boldsymbol {\mathcal {X}}}:T_{\boldsymbol {\mathcal {X}}}\mathcal {M} \rightarrow \mathcal {M}$ . (b) Vector transport mapping $\tau _{\boldsymbol {\mathcal {X}} \rightarrow \boldsymbol {\mathcal {Y}}}:T_{\boldsymbol {\mathcal {X}}}\mathcal {M} \rightarrow T_{\boldsymbol {\mathcal {Y}}}\mathcal {M}$ .

SECTION VI.

Discussion

A significant problem in fMRI research involving group studies is the loss of statistical power and of sensitivity in downstream analyses. We have demonstrated that the RSCG-TT algorithm can significantly increase the statistical power and enables the estimation of RSNs even in the presence of a large number of missing voxels. As a result, the RSCG-TT $t$ -statistics had a substantially larger spatial extent of significant clusters: the significant clusters were, on average, 58% larger (see Supplementary Fig. S.3) than those found by the state-of-the-art completion methods. As a consequence of the loss of statistical power, the sizes of significant clusters were reduced after the CPD-SGD and Tucker-SGD methods. RSCG-TT has advantages over the state-of-the-art methods because the TT manifold allows the algorithm to take full advantage of the global correlation structure between the modes of an fMRI tensor. Our study demonstrates that the RSCG-TT method is a valuable approach for fMRI data completion that can increase the sensitivity of downstream analyses in the identification of important biomarkers such as RSNs.

Detection of statistically significant effect sizes in human neuroscience continues to be a field of active research, and the empirical validity of the inferential methods in fMRI is still being studied [91]. To evaluate the significance of the RSNs, we employed voxel-based inference, a principled parametric correction for multiple comparisons that controls the FWE rate. However, due to its principled protection from false positives (Type I errors), voxel-based statistics may exhibit lower sensitivity to voxel activation [92]. Additional methods to control the FWE rate include those making use of cluster-level statistics, namely cluster-based thresholding [93] and threshold-free cluster enhancement (TFCE) [94]; however, they have their own drawbacks, as studied in [91], [95]. The voxel-based inference used in this work is widely adopted and desirable due to its rigorous control of false positives in fMRI [96].

A. Numerical Performance of RSCG-TT

We investigated the performance of the RSCG-TT algorithm for the completion of different missing value patterns and as a function of tensor dimensionality. Unlike most works on tensor completion, which primarily utilize the RMV pattern, we applied the proposed method to realistic 4D representations in which selected voxels were missing in spatial ellipsoids across multiple timepoints. To the best of our knowledge, this is the first work applying a tensor completion algorithm to the realistic scenario of missing voxels following the SMV pattern. The experimental results demonstrate that the proposed RSCG-TT method exhibits a rapid convergence rate, with order-of-magnitude differences compared with the state-of-the-art methods and tensor decompositions such as CPD-SGD and Tucker-SGD. The fast convergence rate can be attributed to the theoretical foundations of our RSCG-TT algorithm (see Section II-C1): minimizing the spectral-scaling condition implies a lower condition number, which in turn decreases the number of iterations required for the algorithm to converge. The results demonstrate the superior performance of RSCG-TT in terms of two accuracy measures compared with state-of-the-art SGD-based algorithms, such as Adaptive Moment Estimation using the CPD and Tucker tensor decompositions. We observe that the mean TCS (see Fig. 14 and Table 10) is in the range of $3.289 \times 10^{-4}$ to $9.407 \times 10^{-4}$ for the 4D fMRI tensor impacted by the SMV pattern with a realistic temporal missing value rate ${MR}_{T}(\%)~\in ~\{5, 10\}\%$ . The observed desirable numerical performance of RSCG-TT can be attributed to several factors: the full utilization of the 4D data structure, the use of TT decomposition, and the exploitation of the Riemannian manifold structure. TT decomposition improves the accuracy by taking into account the global correlations across tensor modes. RSCG-TT exhibits the best numerical results in terms of both the TCS and RSE, followed by Tucker-SGD, whereas CPD-SGD has the worst numerical accuracy. The poor performance of CPD-SGD can be explained by the fact that the CPD is better suited for the recovery of unique factors, whereas the TT and Tucker decompositions perform best in data completion scenarios.
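
For reference, the relative error measure can be sketched as follows; the exact TCS definition follows the earlier sections of this paper, so only a standard Frobenius-norm relative error, optionally restricted to the missing entries, is shown, with illustrative names.

import numpy as np

def rse(estimate, truth, mask=None):
    # Relative error of the completed tensor; with mask = ~observed the
    # error is evaluated on the missing entries only.
    if mask is not None:
        estimate, truth = estimate[mask], truth[mask]
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)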

B. Effect of Tensor Dimensionality on fMRI Data Completion

We studied the numerical performance of the proposed algorithm with fMRI data represented as 4D, 3D, and 2D tensors. The experimental results suggest that the proposed algorithm performs best when we take the full 4D nature of fMRI data into account with a 4D tensor. Our experiments indicate a reasonable completion threshold, confirmed by both quantitative and visual assessments, of a TCS below $10^{-2}$ . The 3D tensor representation performs worse than the 4D tensor representation (see Fig. 8 and Fig. 14). The 2D tensor representation results in a TCS above $10^{-2}$ for the majority of missing value rates for both the RMV and SMV patterns (see Fig. 8 and Fig. 14). Our results suggest that the 4D tensor is the natural structure of fMRI data, represented as 3D space $\times$ time, and allows us to utilize the richness of both spatial and temporal information to its full potential.
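
The three representations compared here can be illustrated with a short numpy fragment; the mode sizes are illustrative, not those of our datasets.

import numpy as np

X4 = np.random.rand(53, 63, 46, 150)   # X x Y x Z x time (sizes illustrative)
X3 = X4.reshape(53 * 63, 46, 150)      # (X*Y) x Z x time: one spatial mode folded
X2 = X4.reshape(-1, 150)               # voxels x time: all spatial structure folded
assert X4.size == X3.size == X2.size   # same entries, progressively less structure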

C. Role of the Missing Value Pattern in the Capability of Full Data Recovery

Based on the experimental results presented in Section V-B, we conclude that the RMV pattern is relatively easy to complete, which allows for a small TCS and consequently better numerical performance. RSCG-TT successfully recovers data corrupted by the RMV pattern using the 4D tensor format even with a very high missing ratio of 80% of the total number of voxels covered by a brain mask (see Fig. 8 and Fig. 9). As the missing voxels are random, the neighboring voxels contain enough information to infer them, which enables the successful completion of the data tensor. On the other hand, as confirmed by earlier studies [13], [14], data corrupted by the SMV pattern are more challenging to complete than those affected by the RMV pattern. The experiments indicate that a simultaneous increase in both ${MR}_{T} (\%)$ and $MR_{V_{e^{3D}}} (\%)$ reduces the numerical performance of RSCG-TT in completing the data tensor (see Fig. 14). In this case, since the missing voxels follow the SMV pattern, the neighboring voxels do not carry sufficient information for the inference, and therefore, missing voxels under the SMV pattern are more difficult to recover.
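
A hedged sketch of the two patterns, with illustrative shapes and parameters: RMV removes entries uniformly at random, while SMV removes a spatial ellipsoid across a block of consecutive timepoints.

import numpy as np

def rmv_mask(shape, rate, rng):
    # RMV: every entry is missing independently with probability `rate`.
    return rng.random(shape) < rate

def smv_mask(shape, center, radii, t_start, t_len):
    # SMV: a spatial ellipsoid is missing over `t_len` consecutive timepoints.
    X, Y, Z, T = shape
    x, y, z = np.ogrid[:X, :Y, :Z]
    ellipsoid = (((x - center[0]) / radii[0]) ** 2
                 + ((y - center[1]) / radii[1]) ** 2
                 + ((z - center[2]) / radii[2]) ** 2) <= 1.0
    mask = np.zeros(shape, dtype=bool)
    mask[..., t_start:t_start + t_len] = ellipsoid[..., None]
    return mask

rng = np.random.default_rng(0)
missing_rmv = rmv_mask((53, 63, 46, 150), rate=0.5, rng=rng)
missing_smv = smv_mask((53, 63, 46, 150), center=(26, 31, 23),
                       radii=(8, 10, 6), t_start=40, t_len=15)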

D. Comparison of the Computational Complexities of RSCG-TT and State-of-the-Art Methods

In this section, we compare the computational complexities of RSCG-TT and the state-of-the-art methods, as summarized in Table 13. Here, similar to Section III-B, we assume that $R_{\mathrm {max}}$ is the upper bound on the ranks of the tensor decompositions under consideration. The CPD-SGD and Tucker-SGD methods are based on the distributed Adam SGD, which averages the parameters after each iteration. The update for one entry of the SGD scheme requires $\mathcal {O}(1)$ operations, and the number of observed tensor entries for the CPD and Tucker models is $\vert \Omega \vert$ . Therefore, the total computational complexity for one iteration w.r.t. the free parameters and the number of observed entries is $\mathcal {O}(\vert \Omega \vert NR_{\mathrm {max}})$ for the CPD model, and the estimate of the Tucker model can be calculated in $\mathcal {O}(\vert \Omega \vert NR_{\mathrm {max}} + R_{\mathrm {max}}^{N})$ [43], [97].

TABLE 13. Summary of Computational Complexities of RSCG-TT and the State-of-the-Art Tensor Factorization Algorithms. For Simplicity We Assume That the Length of Every Mode is Equal to $I$ and $R_{\mathrm{max}}$ is the Upper Bound on the Ranks of the Considered Tensor Decompositions.

To complete the entire 4D fMRI tensor, RSCG-TT, on average, requires $3\times 10^{3}$ ms of CPU time (see Fig. 10b) on a 2.7 GHz Xeon processor with 8-16 GB of memory. In Fig. 10b, we compare RSCG-TT and the state-of-the-art completion algorithms w.r.t. the computational time. As Fig. 10b shows, the computational time is another key differentiator between the methods: RSCG-TT is at least two orders of magnitude faster than CPD-SGD and Tucker-SGD. Moreover, RSCG-TT demonstrates a sharp decrease in the value of the objective function within the same time interval compared with the state-of-the-art methods. The presented results demonstrate the ability of RSCG-TT to approach the solution more quickly in terms of both computational measures, i.e., the complexity per iteration and the complexity per unit of time. In contrast, the state-of-the-art methods require significantly more computational time to achieve a reasonable reduction in the value of the model cost function and, therefore, to approach the solution. The results shown in Fig. 10 can be explained by the properties of the algorithms listed in Table 13. The rate of change in the objective function value of the RSCG-TT method is superlinear, since it belongs to the family of nonlinear conjugate gradient algorithms [55], [59]. The state-of-the-art methods exhibit a linear rate due to their design, inherited from the gradient descent scheme [55], [59]. With respect to computational time, the CPD-SGD method is clearly the least efficient of the three methods, requiring the largest computational time, whereas RSCG-TT performs best. Other notable state-of-the-art tensor completion algorithms are TT-ALS [98], [99] and TMac-TT [17]. The former is based on the alternating least squares (ALS) scheme, and the latter is based on parallel matrix factorization using adaptive rank estimation. In each TT-ALS step, all TT-cores but one are kept fixed, and the overall optimization problem is reduced to a small optimization problem for a single core. Compared with the per-iteration complexity of RSCG-TT, the TT-ALS procedure scales with the fourth power of the rank instead of the square. While its space complexity is similar to that of RSCG-TT, TT-ALS possesses slower theoretical [100] and practical [99] convergence properties.

The advantages of TMac-TT over RSCG-TT include adaptive rank estimation and bypassing the computationally expensive SVD. Despite these advantages, the computational complexity $\mathcal {O}(3(N-1)I^{N}R_{\mathrm {max}})$ , exponential in the tensor dimension, makes the TMac-TT method impractical in large-scale tensor completion scenarios. Although, as shown in Table 13, the CPD-SGD method possesses better computational and space complexity (the total space taken by the algorithm with respect to the input size), the determination of the canonical tensor rank is an NP-hard problem [101]. Moreover, the best rank-$R$ CPD decomposition might not exist, since the space of low-rank tensors is not closed [102]. It is worth noting that the Tucker-SGD model offers a more flexible tensor model. However, the space complexity of the Tucker format scales exponentially in the tensor dimension $N$ , which makes Tucker-based methods impractical for tensor dimensions $N \ge 4$ due to the curse of dimensionality.
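
A back-of-the-envelope Python check of the parameter counts behind these space complexities, for a hypothetical order-$N$ tensor with all modes of length $I$ and all ranks equal to $R$ : CPD grows linearly in $N$ , TT grows linearly in $N$ but quadratically in $R$ , and the Tucker core alone grows as $R^{N}$ .

def cpd_params(N, I, R):
    return N * I * R                        # N factor matrices of size I x R

def tucker_params(N, I, R):
    return N * I * R + R ** N               # factor matrices plus the R^N core

def tt_params(N, I, R):
    return 2 * I * R + (N - 2) * I * R * R  # boundary TT-ranks equal 1

for N in (4, 6, 8):
    print(N, cpd_params(N, 100, 20), tucker_params(N, 100, 20),
          tt_params(N, 100, 20))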

From this perspective, our assessment suggests that TT-ALS, TMac-TT, CPD-SGD, and Tucker-SGD either provide suboptimal computational complexity or may not be able to deal with large-scale, high-order tensors. Finally, as illustrated by Table 13, Fig. 10, and the above considerations, RSCG-TT is theoretically and practically superior to the state-of-the-art algorithms and is computationally preferable for data completion tasks.

E. SMV Pattern in Practical Scenarios

In this section, we discuss a practical application of the RSCG-TT method when fMRI data are analyzed in the complex-valued domain. fMRI data are natively acquired as complex-valued spatiotemporal images; however, most studies have focused only on the magnitude data. Several studies have shown that employing fMRI data in their native form, with both the magnitude and phase, improves the sensitivity of analyses [3], [72], [74]–​[77]. Since phase images pose a unique challenge, a typical solution is to remove noisy regions from fMRI data prior to analyses. Consequently, regions are eliminated through thresholding and edge-reducing techniques or by using QMPD methods [3], [72], [74]–​[77]. The QMPD maps are used to generate quality masks to exclude voxels that should not be further analyzed, as described in Section IV-E. The SMV pattern studied in this paper aims to demonstrate one application of the RSCG-TT method when 3D brain regions are excluded from the analysis. Here, we highlight the importance of our findings demonstrating the viability of SMV-pattern fMRI data completion, which can be used in the broader context of the MNAR voxel pattern. Finally, we conclude that, indeed, taking the 4D structure of fMRI data into account provides significant performance gains. We hope this will help encourage and accelerate research in the very promising area of tensors for fMRI data, not only for data completion but also for the development of other tools for analyzing fMRI data, where the computational challenges might be even more significant.

SECTION VII.

Conclusion

We introduce a novel approach to the fMRI data completion problem based on TT decomposition, along with a corresponding algorithm for its solution. We develop a Riemannian nonlinear spectral conjugate gradient method, RSCG-TT, to solve higher-order fMRI data completion in the form of a 4D tensor. The RSCG-TT algorithm exploits second-order information and offers additional guardrails to ensure that computational performance is coupled with numerical stability. Since we deal with high-order datasets, we extend our method to the TT format, which provides scalable numerical algebra operations and hierarchical compressed representations of structurally diverse datasets. Exploiting the Riemannian structure and the efficient utilization of the covariance information by TT decomposition boosts the numerical performance significantly when the proposed method is compared with state-of-the-art SGD-based methods, such as Tucker-SGD and CPD-SGD. Parallelizing tensor computations can further improve the execution time of RSCG-TT. RSCG-TT can also benefit from sparse data structures and graphics processing units (GPUs) specialized for high-performance computing (HPC) to reduce the elapsed time. Shortening the computational time of RSCG-TT is a subject of future research.

One of the significant advantages of our approach is that we suggest viewing fMRI data as 4D spatiotemporal hypervolumes observed during a scanning session. This view takes into account the natural representation of fMRI data in the form of a fourth-order tensor represented as 3D space $\times$ time. As demonstrated by the experiments in Section V, representing a 4D tensor as a TT tensor allows us to fully exploit the global structure of fMRI data and introduces a joint learning scheme, which captures the intrinsic relationship between the spatial and temporal modes. We consider the practical use of the RSCG-TT algorithm when select brain volumes of a 4D fMRI scan are affected by the exclusion of brain regions that resemble the MNAR missing voxel pattern. Our experiments suggest that the RSCG-TT algorithm can estimate the missing brain voxels affected by the MNAR pattern and provides a reliable recovery even in areas with very low activation thresholds and Z-scores below two times the standard deviation. Comparisons with lower-dimensional representations demonstrate that taking the full 4D structure of fMRI data into account provides significant gains over the 3D and 2D representations of the data. We evaluated the ability of RSCG-TT to recover the RSN components after data completion. Our results suggest that RSCG-TT provides advantages in retaining the RSN components compared to the state-of-the-art tensor completion methods.

Another promising direction in data completion research includes fMRI data completion in the native k-space. The k-space intrinsically possesses very desirable properties, such as fMRI image sparsity in wavelet bases and the incoherence between Fourier and inverse wavelet bases. These properties might allow one to significantly reduce the order of a model, i.e., the TT-rank, and to employ nonlinear data completion through a nonsmooth formulation of the optimization problem.

Future potential studies include a number of exciting directions, such as simultaneous estimations of an optimal multilinear TT-rank and robust tensor completion, complex-valued tensor completion, incorporating nonparametric cluster-level statistics, and working directly in the $k$ -space with a nonsmooth objective function.

ACKNOWLEDGMENT

The authors would like to thank Dr. Vince Calhoun for providing the fMRI COBRE dataset used in this study and Dr. Evrim Acar for the useful comments and feedback.

Appendix A

List of Abbreviations

2D: two-dimensional.
3D: three-dimensional.
4D: four-dimensional.
ARD: Automatic Relevance Determination.
BFGS: Broyden-Fletcher-Goldfarb-Shanno.
CPD: Canonical Polyadic Decomposition.
CPD-SGD: Stochastic Gradient Descent based on CPD.
EM: Expectation Maximization.
fMRI: Functional Magnetic Resonance Imaging.
MAR: Missing at Random.
MBFGS: Modified Broyden-Fletcher-Goldfarb-Shanno.
MNAR: Missing not at Random.
MRI: Magnetic Resonance Imaging.
RMV: Random Missing Values.
RCT: Riemannian Curvature Tensor.
RSCG-TT: Riemannian nonlinear Spectral Conjugate Gradient via Tensor Train.
RSE: Relative Square Error.
RSN: Resting-State Networks.
SCG: Scaled Conjugate Gradient.
SGD: Stochastic Gradient Descent.
SMV: Structural Missing Values.
TCS: Tensor Completion Score.
$\mathrm {TCS}_{Z}$ : Tensor Completion Score with Z-score above two times the standard deviation.
TT: Tensor Train.
Tucker-SGD: Stochastic Gradient Descent based on Tucker Decomposition.

Appendix B

Preliminaries on Riemannian Optimization

Definition 1 (Manifold):

A real $n$ -dimensional manifold is a topological space $\mathcal {M}$ , where every point $\mathcal {X} \in \mathcal {M}$ has a neighbourhood homeomorphic to Euclidean space $\mathbb {R}^{n}$ .

Definition 2 (Smooth Manifold [103]):

A subset $\mathcal {M} \subset \mathbb {R}^{n}$ is an $m$ -dimensional smooth manifold of $\mathbb {R}^{n}$ if each $\mathcal {X} \in \mathcal {M}$ has an open neighbourhood $\mathcal {S} \subset \mathbb {R}^{n}$ such that $\mathcal {S} \cap \mathcal {M}$ is diffeomorphic to an open subset $\mathcal {V} \subset \mathbb {R}^{m}$ . A diffeomorphism $\phi: \mathcal {S} \cap \mathcal {M} \rightarrow \mathcal {V}$ is called a coordinate chart of $\mathcal {M}$ , while its inverse $\phi ^{-1}: \mathcal {V} \rightarrow \mathcal {S} \cap \mathcal {M}$ is called a parametrization of $\mathcal {S} \cap \mathcal {M}$ . We illustrate the concept of a smooth manifold in Fig. 20(a).

Definition 3 (Riemannian Manifold):

A Riemannian manifold $(\mathcal {M}, \mathfrak {g})$ (B.1) is a real smooth manifold $\mathcal {M}$ equipped with a Riemannian metric $\mathfrak {g}$ .

Definition 4 (Riemannian Metric):

A Riemannian metric $\mathfrak {g}$ (B.2) on a smooth manifold $\mathcal {M}$ is a smoothly chosen inner product $\mathfrak {g}_{\mathcal {X}}:T_{\mathcal {X}}\mathcal {M} \times T_{\mathcal {X}}\mathcal {M} \rightarrow \mathbb {R}$ on each of the tangent spaces $T_{\mathcal {X}}\mathcal {M}$ of $\mathcal {M}$ such that:

  1. $\mathfrak {g}(\xi,\eta) = \mathfrak {g}(\eta,\xi)\,\,\forall \,\,\xi,\eta \in T_{\mathcal {X}}\mathcal {M}$ ;

  2. $\mathfrak {g}(\xi,\xi) \ge 0 ~\forall ~\xi \in T_{\mathcal {X}}\mathcal {M}$ ;

  3. $\mathfrak {g}(\xi,\xi) = 0$ if and only if $\xi =0$ .

Definition 5 (Tangent Vector, Space, and Bundle):

A vector $\xi$ is called a tangent vector of $\mathcal {M}$ at a point $\mathcal {X} \in \mathcal {M}$ if there is a smooth curve $\gamma: \mathbb {R} \rightarrow \mathcal {M}$ such that \begin{equation*} \gamma (0) = \mathcal {X}, \quad \gamma '(0) = \lim _{t \rightarrow {0}} \frac {\gamma (t) - \gamma (0)}{t} = \xi.\end{equation*}

The set of tangent vectors of $\mathcal {M}$ at $\mathcal {X}$ forms the tangent space of $\mathcal {M}$ at $\mathcal {X}$ :\begin{equation*} T_{\mathcal {X}}\mathcal {M}:= \{\gamma '(0)\,\vert \,\gamma: \mathbb {R} \rightarrow \mathcal {M} \,\,\text {smooth}, \gamma (0) = \mathcal {X}\}. \tag{B.3}\end{equation*} The tangent bundle is the disjoint union of all tangent spaces, \begin{equation*} T\mathcal {M}:= \bigcup _{\mathcal {X} \in \mathcal {M}} \{\mathcal {X}\}\times T_{\mathcal {X}}\mathcal {M},\tag{B.4}\end{equation*} which can be identified with $\{(\mathcal {X},\xi) \in \mathbb {R}^{I_{1}I_{2} \cdots I_{N}} \times \mathbb {R}^{I_{1}I_{2} \cdots I_{N}}:\mathcal {X} \in \mathcal {M}, \xi \in T_{\mathcal {X}}\mathcal {M}\}$ . The concept of a tangent space is illustrated in Fig. 20(b). We restrict the Euclidean inner product of two tensors, \begin{equation*} \langle \mathcal {X}, \mathcal {Y} \rangle = \mathrm {tr}(\mathcal {X}^{T} \mathcal {Y}) ~\text {with} ~\mathcal {X}, \mathcal {Y} \in \mathbb {R}^{I_{1}I_{2} \cdots I_{N}},\end{equation*} to the tangent bundle, $\mathfrak {g}_{\mathcal {X}}:T_{\mathcal {X}}\mathcal {M} \times T_{\mathcal {X}}\mathcal {M} \rightarrow \mathbb {R}$ .

Therefore, we can convert a smooth manifold $\mathcal {M}$ into a Riemannian manifold with the Riemannian metric \begin{equation*} \mathfrak {g}_{\mathcal {X}}(\xi, \eta) = \langle \xi, \eta \rangle = \mathrm {tr}(\xi ^{T} \eta),\tag{B.5}\end{equation*} where $\mathcal {X} \in \mathcal {M}$ and the tangent vectors $\xi, \eta \in T_{\mathcal {X}}\mathcal {M}$ are tensors in $\mathbb {R}^{I_{1}I_{2} \cdots I_{N}}$ . The inner product defined by the Riemannian metric (B.2) induces the following norm on $T_{\mathcal {X}}\mathcal {M}$ :\begin{equation*} \Vert \xi \Vert = \sqrt {\mathfrak {g}_{\mathcal {X}}(\xi, \xi)}. \tag{B.6}\end{equation*}

We further define the orthogonal projection of a tensor in $\mathbb {R}^{I_{1}I_{2} \cdots I_{N}}$ onto $T_{\mathcal {X}}\mathcal {M}$ as \begin{equation*} \mathrm {P}_{T_{\mathcal {X}}\mathcal {M}}: \mathbb {R}^{I_{1}I_{2} \cdots I_{N}} \rightarrow T_{\mathcal {X}}\mathcal {M}.\end{equation*}

Definition 6 (Retraction):

The retraction $R_{\mathcal {X}}$ is a smooth mapping that takes a tangent vector $\xi \in T_{\mathcal {X}}\mathcal {M}$ at $\mathcal {X} \in \mathcal {M}$ to a point on the manifold, \begin{equation*} R_{\mathcal {X}}: T_{\mathcal {X}}\mathcal {M} \rightarrow \mathcal {M}, \tag{B.7}\end{equation*} and it satisfies the local rigidity condition [37].

By its properties, it guarantees a natural way to move on the manifold along a search direction. The retraction $R$ can be computed by the projection operator with rank truncation by TT-SVD [44] as follows:\begin{equation*} R_{\mathcal {X}}(\xi) = \mathrm {P}_{\mathbf {r}}^{\mathrm {TT{-}SVD}}(\mathcal {X} + \xi), \quad \mathrm {P}_{\mathbf {r}}^{\mathrm {TT{-}SVD}}: \mathbb {R}^{I_{1}\times \cdots \times I_{N}} \rightarrow \mathcal {M}_{\mathbf {r}}. \tag{B.8}\end{equation*}
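
As a minimal matrix-case analogue (not the TT-SVD itself), retraction onto a fixed-rank matrix manifold can be realized by a truncated SVD of the ambient-space point; TT-SVD plays the same role for the TT manifold in (B.8). The names and sizes below are illustrative.

import numpy as np

def retract_rank_r(Z, r):
    # Project an ambient-space point back onto the rank-r matrix manifold
    # via the truncated SVD (the best rank-r approximation).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.random((50, 5)) @ rng.random((5, 40))   # a rank-5 point on the manifold
xi = 1e-2 * rng.random((50, 40))                # a step in the ambient space
X_new = retract_rank_r(X + xi, r=5)             # back on the rank-5 manifold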

Definition 7 (Vector Transport):

A vector transport on a manifold $\mathcal {M}$ is a smooth mapping (B.9) that transports a tangent vector $\xi \in T_{\mathcal {X}}\mathcal {M}$ at $\mathcal {X} \in \mathcal {M}$ to a vector in the tangent space $T_{\mathcal {Y}}\mathcal {M}$ at a point $\mathcal {Y} \in \mathcal {M}$ .

It was shown in [37] that for an embedded submanifold of $\mathbb {R}^{I_{1}\times \cdots \times I_{N}}$ , the vector transport is given by the orthogonal projection $\mathrm {P}_{T_{\mathcal {Y}}\mathcal {M}}\xi$ of $\xi$ onto the tangent space $T_{\mathcal {Y}}\mathcal {M}$ as follows:\begin{align*}&\tau _{\mathcal {X} \rightarrow \mathcal {Y}}: T_{\mathcal {X}}\mathcal {M} \rightarrow T_{\mathcal {Y}}\mathcal {M}, \tag{B.10}\\&\xi _{\mathcal X} \mapsto \mathrm {P}_{T_{\mathcal {Y}}\mathcal {M}}\xi. \tag{B.11}\end{align*}

The concept of vector transport $\tau _{\mathcal {X} \rightarrow \mathcal {Y}}$ is depicted in Fig. 21(b).

Definition 8 (Riemannian Gradient):

Given a smooth real-valued function $f: \mathcal {M} \rightarrow \mathbb {R}$ on a Riemannian manifold $\mathcal {M}$ , its gradient $\mathrm {grad}\,f(\mathcal {X})$ at $\mathcal {X} \in \mathcal {M}$ is defined as the unique tangent vector in $T_{\mathcal {X}}\mathcal {M}$ satisfying \begin{equation*} \langle \mathrm {grad}\,f(\mathcal {X}), \xi \rangle _{\mathcal {X}} = \mathrm {D} f(\mathcal {X})[\xi], \quad \forall \xi \in T_{\mathcal {X}}\mathcal {M}, \tag{B.12}\end{equation*} where $\mathrm {D}f$ denotes the directional derivative.

Since the TT-manifold $\mathcal {M}_{r}$ is embedded in the Euclidean space $\mathbb {R}^{I_{1} \times \cdots \times I_{N}}$ [47], the Riemannian gradient for the TT decomposition is computed as the orthogonal projection of the Euclidean gradient onto the tangent space (B.3) [37]:\begin{equation*} \mathrm {grad}\,f(\mathcal {X}) = \mathrm {P}_{T_{\mathcal {X}}\mathcal {M}_{r}}(\nabla f(\mathcal {X})), \tag{B.13}\end{equation*} where $f: \mathbb {R}^{I_{1} \times \cdots \times I_{N}} \rightarrow \mathbb {R}$ is the cost function with Euclidean gradient $\nabla f(\mathcal {X})$ and $\mathcal {X} \in \mathcal {M}_{r}$ .

Definition 9 (Geodesic):

A smooth curve $\gamma: I \rightarrow \mathcal {M}$ , where $I \subseteq \mathbb {R}$ is any interval, is a geodesic if \begin{equation*} \gamma ''(\alpha) = \nabla _{\gamma '}\gamma '(\alpha)=0 \,\,\forall \alpha \in I. \tag{B.14}\end{equation*}

SECTION A.

Riemannian Manifold Learning for Tensor Decomposition in a TT Format

Riemannian optimization on tensor manifolds is a growing field that generalizes the theory and algorithms of unconstrained Euclidean optimization to problems on smooth manifolds [33], [37]. It consists of optimizing a scalar function of a tensor variable on a curved manifold instead of in Euclidean space. In the Riemannian setting, the Euclidean notions of straight lines and ordinary differentiation are replaced with geodesics and covariant differentiation. Mathematically speaking, Riemannian tensor optimization seeks an optimum of a scalar cost function $f(\boldsymbol {\mathcal {X}})$ of a tensor variable defined on a smooth differentiable manifold $\mathcal {M}$ in (B.1), equipped with a Riemannian metric structure $\mathfrak {g}$ , given in (B.2), that is both symmetric and positive definite [104], [105]:\begin{equation*} \underset {\boldsymbol {\mathcal {X}} \in \mathcal {M}}{\mathrm {argmin}}\, f(\boldsymbol {\mathcal {X}}). \tag{B.15}\end{equation*}

1) Riemannian Optimization Concepts

Continuous optimization on Riemannian manifolds requires us to construct a descent search curve $\gamma$ such that \begin{align*} \gamma '(\alpha)=&-\mathrm {grad}\, f(\gamma (\alpha)) \quad \forall \alpha, \tag{B.16}\\ \gamma (\alpha):=&R_{\boldsymbol {\mathcal {X}}}(\alpha \eta),\tag{B.17}\end{align*} where $\boldsymbol {\mathcal {X}} \in \mathcal {M}$ , the search direction $\eta \in T_{\boldsymbol {\mathcal {X}}}\mathcal {M}$ , $T_{\boldsymbol {\mathcal {X}}}\mathcal {M}$ is the tangent space (B.3) to the manifold at the point $\boldsymbol {\mathcal {X}}$ , and $\alpha$ is the scalar step size following the geodesics (B.14) along the curve $\gamma$ . Moving along the curve $\gamma$ can be interpreted as moving in the search direction $\eta$ while every iterate $\boldsymbol {\mathcal {X}}_{k}$ is constrained to the manifold $\mathcal {M}$ . In practice, since it is not feasible to compute geodesics in closed form, their computation is replaced with a smooth mapping, the retraction $R_{\boldsymbol {\mathcal {X}}}:T_{\boldsymbol {\mathcal {X}}}\mathcal {M} \rightarrow \mathcal {M}$ (B.7). The concept of retraction is illustrated in Fig. 21(a). Additionally, to compute the new search direction, we linearly combine the current gradient vector and the previous search direction. Technically, we need to transport information from one tangent space $T_{\boldsymbol {\mathcal {X}}}\mathcal {M}$ to another tangent space $T_{\boldsymbol {\mathcal {Y}}}\mathcal {M}$ using the vector transport $\tau _{\boldsymbol {\mathcal {X}}\rightarrow \boldsymbol {\mathcal {Y}}}$ in (B.9); an intuitive drawing of a vector transport is depicted in Fig. 21(b). Finally, with the retraction $R$ , the updating formula for the curvilinear search is given by \begin{equation*} \boldsymbol {\mathcal {X}}_{k+1} = R_{\boldsymbol {\mathcal {X}}_{k}}(\alpha _{k} \eta _{k}),\tag{B.18}\end{equation*} where the step size $\alpha _{k}$ is computed to satisfy the Wolfe conditions [59] and $\eta _{k}$ is a search direction satisfying the descent condition [55]. A schematic sketch of this iteration is given below.
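
The Python skeleton below sketches the curvilinear search (B.16)-(B.18); grad_f, retract, transport, inner, and line_search are assumed to be supplied by the TT machinery above, tangent vectors are assumed to support scaling and addition, and the Fletcher-Reeves rule stands in for the spectral conjugacy rule used by RSCG-TT.

def riemannian_cg(X, grad_f, retract, transport, inner, line_search, n_iter=100):
    # grad_f(X): Riemannian gradient (B.13); retract(X, v): retraction (B.7);
    # transport(X, Y, v): vector transport (B.10); inner(u, v): metric (B.5).
    g = grad_f(X)
    eta = -g                                         # steepest-descent start (B.16)
    for _ in range(n_iter):
        alpha = line_search(X, eta)                  # Wolfe-condition step size
        X_new = retract(X, alpha * eta)              # curvilinear update (B.18)
        g_new = grad_f(X_new)
        beta = inner(g_new, g_new) / inner(g, g)     # Fletcher-Reeves rule; RSCG-TT
                                                     # instead uses a spectral rule
        eta = -g_new + beta * transport(X, X_new, eta)
        X, g = X_new, g_new
    return X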

2) Manifold Structure for TT Decomposition

In [47], the authors show that the set $\mathcal {M}_{r}$ of tensors of fixed multilinear rank $\mathbf {r}=(R_{0}, R_{1}, R_{2}, \ldots, R_{N})$ in TT format forms a smooth embedded submanifold of $\mathbb {R}^{I_{1}I_{2} \cdots I_{N}}$ ,\begin{equation*} \mathcal {M}_{r} = \{ \boldsymbol {\mathcal {X}} \in \mathbb {R}^{I_{1}I_{2} \cdots I_{N}}\,\vert \,\text {rank}_{\mathrm {TT}}(\boldsymbol {\mathcal {X}}) = \mathbf {r} \}, \tag{B.19}\end{equation*} whose dimension is given by \begin{equation*} \text {dim}~(\mathcal {M}_{r}) = \sum _{n=1}^{N} R_{n-1}I_{n} R_{n} - \sum _{n=1}^{N-1} R_{n}{^{2}},\tag{B.20}\end{equation*} where the tensor $\boldsymbol {\mathcal {X}}$ admits the TT decomposition (1) with multilinear TT-rank $\mathbf {r}$ in (2).
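
A quick numerical check of the dimension formula (B.20), with illustrative mode sizes and TT-ranks:

def tt_manifold_dim(I, r):
    # I: mode sizes (I_1, ..., I_N); r: TT-ranks (R_0, ..., R_N), R_0 = R_N = 1.
    N = len(I)
    cores = sum(r[n - 1] * I[n - 1] * r[n] for n in range(1, N + 1))
    gauge = sum(r[n] ** 2 for n in range(1, N))
    return cores - gauge

print(tt_manifold_dim([53, 63, 46, 150], [1, 20, 30, 20, 1]))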

We use the inner product in (4), defined for TT decomposition, as a metric on $\mathcal {M}_{r}$ . Therefore, when we equip $\mathcal {M}_{r}$ with this metric, $\mathcal {M}_{r}$ becomes a Riemannian manifold $\left(\mathcal {M}_{r}, \mathfrak {g}\right)$ . As a result, the Riemannian structure of the TT manifold allows us to use any method from optimization on Riemannian manifolds with the Riemannian gradient defined in (B.13).
