Constrained Magnetic Resonance Spectroscopic Imaging by Learning Nonlinear Low-Dimensional Models

Magnetic resonance spectroscopic imaging (MRSI) is a powerful molecular imaging modality but suffers from poor tradeoffs among speed, resolution, and SNR. Constructing a low-dimensional model to effectively reduce the dimensionality of the imaging problem has recently shown great promise in improving these tradeoffs. This paper presents a new approach to model and reconstruct spectroscopic signals by learning a nonlinear low-dimensional representation of general MR spectra. Specifically, we trained a deep neural network to capture the low-dimensional manifold where high-dimensional spectroscopic signals reside. A regularization formulation is proposed to effectively integrate the learned model and the physics-based data acquisition model for MRSI reconstruction, with the capability to incorporate additional spatiospectral constraints. An efficient numerical algorithm was developed to solve the associated optimization problem, which involves back-propagating through the trained network. Simulation and experimental results demonstrate the representation power of the learned model and the ability of the proposed formulation to produce SNR-enhancing reconstructions from practical MRSI data.

Because the molecules of interest typically have very low concentrations and thus yield very low SNR, in vivo applications of MRSI have been limited to single-voxel spectroscopy or low-resolution, time-consuming acquisitions [2], [3].
In the past several years, a number of constrained MRSI methods have been proposed to improve the speed, resolution and SNR tradeoffs. The key methodologies in these works typically rely on constructing a reduced-complexity signal model of the high-dimensional MRSI data by exploiting either sparsity in the spatial and/or spectral domains [4]-[10] or low-rankness of the desired spatiospectral function [11]-[17]. Additional spatial constraints have also been introduced to take further advantage of the anatomical prior information readily available in an MR experiment [18]-[21]. Recently, a subspace MRSI method called SPICE has been developed to successfully achieve fast, high-resolution MRSI by jointly designing data acquisition and processing in a subspace imaging framework [22]-[24]. The key feature of this approach is modeling individual voxel spectra as linear combinations of a small number of basis functions, which can be predetermined using specially designed navigator data, thus significantly reducing the degrees of freedom and allowing better tradeoffs among speed, resolution and SNR. However, to capture more complicated spatiospectral variations, the dimension of the linear subspace can increase substantially, reducing its efficiency and thus motivating the need for a more general nonlinear low-dimensional model.
Learning a general nonlinear model for high-dimensional functions is a challenging problem. In the context of MRSI, locally linear embedding (LLE) [25] and Laplacian eigenmaps (LE) [26] have been applied to classify spectra from normal and diseased tissues by estimating the low-dimensional manifolds where each class of spectra was assumed to reside. However, incorporating such classification models into the imaging process, which requires a more accurate representation, is difficult. Meanwhile, the recent success of deep neural network based methods in learning complex functional mappings and extracting nonlinear features from high-dimensional data presents new opportunities to address the model learning problem [27]-[29]. A number of works have been proposed in MRSI, mainly focusing on spectral quantification [30]-[32] or spectral artifact removal [33]. A common approach among these methods is to directly learn the entire inverse function that maps the noisy and artifact-containing signals to the desired artifact-free ones or to the spectral parameters (e.g.,
molecule concentrations or lineshape parameters), by training a deep neural network (DNN) [30], [33], [34]. This approach requires the learned function to simultaneously capture all the nuances in the noise, artifacts, and underlying true signals at all possible SNRs, dramatically increasing the complexity of the function and the learning problem. Moreover, it is also sensitive to SNR levels and data acquisition designs. In this work, we propose a different approach: to learn a nonlinear low-dimensional model that captures the inherent degrees of freedom in spectroscopic signals and to incorporate such a learned representation into constrained MRSI reconstruction. Specifically, we recognized that a general NMR spectrum can be characterized by a small number of parameters based on the well-defined spectral quantification model, and thus should reside in a low-dimensional manifold embedded in the original high-dimensional space. Accordingly, we designed a deep autoencoder (DAE) network to capture this manifold. The effectiveness of DAEs in learning nonlinear low-dimensional representations for high-dimensional data has been well documented [27]-[29] but requires a large amount of training data. To this end, we acquired a small amount of training data to estimate the distributions of molecule-dependent parameters and then used the spectral fitting model to generate the data needed to train the DAE. A regularization formulation was devised to integrate the learned representation and the spatiospectral encoding model for constrained MRSI reconstruction. An efficient algorithm was developed to solve the associated optimization problem. The proposed model was evaluated against linear dimensionality reduction and demonstrated a more efficient representation of MRSI data. Simulation and experimental results were obtained to demonstrate the capability of the proposed method in producing improved spatiospectral reconstructions over existing methods.
The rest of the paper is organized as follows. Section II provides background on MR spectroscopy modeling and the need for nonlinear dimension reduction, as well as a brief review of autoencoder-based neural networks. Section III presents details on the proposed DAE-based learning method, reconstruction formulation, and optimization algorithm. Section IV summarizes the experimental results, followed by discussion and conclusion in Sections V and VI.

II. BACKGROUND

A. Spectroscopy Signal Modeling
The imaging problem in MRSI can be defined as recovering a high-dimensional spatiotemporal function ρ(r, t) from a set of (k, t)-space measurements d_p(k, t) that can be expressed as

$$d_p(\mathbf{k}, t) = \int_V s_p(\mathbf{r})\, \rho(\mathbf{r}, t)\, e^{-i2\pi \delta f(\mathbf{r}) t}\, e^{-i2\pi \mathbf{k} \cdot \mathbf{r}}\, d\mathbf{r} + n_p(\mathbf{k}, t), \qquad (1)$$

where V is the imaging volume of interest, s_p(r) is the sensitivity map for the pth coil used in data acquisition, δf(r) denotes the B₀ field inhomogeneity, k the coordinates of the Fourier encoding space, and n_p(k, t) the measurement noise. The k-space sampling can be designed based on resolution and time requirements. While ρ(r, t) is very high-dimensional and challenging to recover at high resolution from the typically noisy d_p(k, t), there is abundant prior information that can be exploited for improved reconstruction. Specifically, the FID signals at individual voxels can be generally modeled as

$$\rho(\mathbf{r}, t) = \sum_{m=1}^{M} c_m(\mathbf{r})\, \phi_m(t)\, e_m(t; \theta_m(\mathbf{r})), \qquad (2)$$

where c_m and φ_m(t) represent the concentration and time-domain basis function of the mth molecule, respectively, and e_m(t; θ_m) captures molecule-dependent time-domain modulation functions that can be described by a few experiment- and physiology-relevant parameters in θ_m. The Fourier transform of φ_m(t) is a resonance structure containing the relative frequencies, amplitudes and phases of the different spectral peaks of the mth molecule, which is data-independent and can be predicted by spin physics. The most common specification of e_m(t; θ_m) is the exponential lineshape model [35]:

$$e_m(t; \theta_m(\mathbf{r})) = e^{-t/T_{2,m}^{*}(\mathbf{r})}\, e^{i2\pi \delta f_m(\mathbf{r}) t}\, e^{-\beta(\mathbf{r}) t^2}, \qquad (3)$$

where {T*₂,ₘ} and {δfₘ} denote the tissue- and physiological-condition-dependent lineshapes and frequency shifts for each molecule, respectively, and e^{−β(r)t²} captures additional lineshape distortion due to system imperfections such as intravoxel B₀ inhomogeneity and eddy currents. Accordingly, all spectral signals specified by Eqs. (2) and (3) with varying {c_m}, {T*₂,ₘ}, {δfₘ} and β can be treated as points residing on a low-dimensional manifold embedded in the high-dimensional vector space [36], [37].
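As a concrete illustration of the voxel-signal model above, a few lines of code suffice to synthesize an FID from concentrations, lineshapes, frequency shifts, and a Gaussian broadening term. This is a minimal numpy sketch: the two single-resonance "basis functions" are toy stand-ins for QM-simulated φ_m, and all parameter values are invented.

```python
import numpy as np

def synthesize_fid(t, basis, c, T2s, df, beta):
    """Voxel FID: sum over metabolites of c_m * phi_m(t) * e_m(t), with
    e_m(t) = exp(-t/T2*_m) * exp(i*2*pi*df_m*t) * exp(-beta*t^2)."""
    fid = np.zeros_like(t, dtype=complex)
    for m, phi in enumerate(basis):
        e_m = (np.exp(-t / T2s[m])
               * np.exp(1j * 2 * np.pi * df[m] * t)
               * np.exp(-beta * t ** 2))
        fid += c[m] * phi * e_m
    return fid

# Toy setup: 512 samples at 5 kHz bandwidth, two single-resonance "metabolites"
t = np.arange(512) / 5000.0
basis = [np.exp(1j * 2 * np.pi * f0 * t) for f0 in (0.0, 300.0)]
fid = synthesize_fid(t, basis, c=[1.0, 0.5], T2s=[0.1, 0.05], df=[5.0, -8.0], beta=2.0)
```

Every signal produced this way is fully determined by the handful of parameters (c, T2s, df, beta), which is exactly the low-dimensionality that the learned model exploits.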
This manifold can be well approximated by a linear subspace if the parameters are in a relatively narrow range. This property has been exploited in recent subspace-based reconstruction and spectral quantification methods to successfully achieve fast, high-resolution MRSI (by modeling spectroscopic signals as linear combinations of a number of basis functions) [22]-[24], [38]. However, as the ranges of the parameters increase, e.g., due to physiological variations, increased system nonideality or the presence of a certain pathology, the dimension of the subspace can increase substantially to ensure an accurate approximation, reducing its efficiency. Therefore, a general nonlinear low-dimensional model that can accurately capture all the physiologically relevant spectral variations is desired. A toy example illustrating this point is shown in Fig. S1 (Supplementary Materials).

B. Deep Autoencoders
A number of methods have been developed to learn nonlinear features from MRSI data for classification purposes, e.g., [25], [26], [39]. The incorporation of learned features into constrained reconstruction, however, requires representations with much higher accuracy and flexibility. Motivated by the recent success of deep neural networks in representation learning [27]-[29], we propose in this work to use a deep autoencoder to learn an accurate and efficient low-dimensional model for MR spectroscopy signals. To provide context for the proposed method, we present in this section a brief introduction to autoencoders and deep autoencoders. An autoencoder (AE) is a special type of artificial neural network that is typically composed of fully connected layers and has been developed to learn the underlying representations of data of various types [27]. Figure 1a illustrates a commonly used basic autoencoder, with the first layer being linear units to represent an input vector x ∈ ℝ^D, the middle layer being a set of nonlinear units to capture the "encoded features" h ∈ ℝ^d of x, and the last layer being again linear units to represent an output vector x̂ with the same dimension as x. The mapping from x to h is referred to as the encoder, denoted f(x), while the mapping from h to x̂ is referred to as the decoder, denoted g(h). For the simple AE in Fig. 1a, the relationships between x, h and x̂ can be mathematically expressed as

$$\mathbf{h} = a(\mathbf{W}_1 \mathbf{x} + \mathbf{b}), \qquad \hat{\mathbf{x}} = \mathbf{W}_2 \mathbf{h}, \qquad (4)$$

where W₁ and W₂ denote the weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively, and a(·) contains element-wise nonlinear activation functions with bias vector b. This model allows one to extract nonlinear low-dimensional features of data for which linear dimension reduction techniques do not offer satisfactory performance. Theoretical connections to nonlinear manifold modeling have also been made [40], [41].
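The encoder/decoder relations above can be sketched directly in code (toy dimensions and untrained random weights, for illustration only, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 8, 3                                   # input dimension D, feature dimension d
W1 = rng.normal(size=(d, D))                  # input -> hidden weights
W2 = rng.normal(size=(D, d))                  # hidden -> output weights
b = rng.normal(size=d)

def encoder(x):                               # h = a(W1 x + b), with a(.) = ReLU here
    return np.maximum(W1 @ x + b, 0.0)

def decoder(h):                               # x_hat = W2 h (linear output units)
    return W2 @ h

x = rng.normal(size=D)
x_hat = decoder(encoder(x))
err = np.sum((x - x_hat) ** 2)                # per-sample L2 training error
```

Training adjusts W1, W2, and b to drive this error down over a dataset; the nonlinearity in a(·) is what lets the features go beyond linear dimension reduction.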
Deep autoencoders (DAEs) are deep neural networks constructed by stacking multiple nonlinear encoding and decoding layers from the standard AEs (Fig. 1b). The multiple nonlinear layers introduce more complexity, thus offering stronger representation and feature extraction power than the standard AEs. Mathematically, the features extracted at the multiple nonlinear layers can be denoted as h_l = a_l(W_l h_{l−1} + b_l) (with a_l(·) being the nonlinear functions at different layers), and the multi-layer encoders and decoders can be expressed generally as f(x; θ_f) and g(h; θ_g), respectively. The vectors θ_f and θ_g contain all the weights and biases in the encoding and decoding layers. DAEs have been trained to learn compact low-dimensional representations of complicated, high-dimensional data (e.g., audio and images) [27], [29]. Specifically, the training can be done by solving the following problem, which minimizes an error metric ε with respect to θ_f and θ_g such that the output of the network x̂_s = g(f(x_s; θ_f); θ_g) accurately approximates the corresponding input x_s (s being the sample index), i.e.,

$$\{\hat{\theta}_f, \hat{\theta}_g\} = \arg\min_{\theta_f, \theta_g} \sum_{s} \varepsilon\big(\mathbf{x}_s,\, g(f(\mathbf{x}_s; \theta_f); \theta_g)\big). \qquad (5)$$

Regularization can be imposed on θ_f, θ_g, f(·; θ_f) and/or g(·; θ_g) so that the encoded features are not required to be lower-dimensional than the input while still avoiding overfitting or simply learning an identity transform [28], [42], [43]. A common choice for ε is the L₂ error, i.e., ε = ‖x_s − x̂_s‖₂². Applications of DAEs to MR image reconstruction and MRSI spectral quantification have recently been explored [32], [44], where the networks were trained to generate images or spectral parameters directly from the measured data by extracting low-dimensional features.

III. PROPOSED METHOD

A. Low-Dimensional Spectral Model Learning Using DAE
Recognizing that an inherent low-dimensional representation should exist for the MR spectra specified by Eq. (3), we propose here a strategy to learn this general nonlinear model for MR spectroscopic signals using a DAE. Two major issues arise for this learning task. First, deep neural networks require a large amount of high-quality training data, which is generally a luxury for many MRSI applications. Second, proper designs of the architecture and training procedure are needed to learn a representation of the signals that is useful in the imaging process (e.g., reconstruction from noisy measurements).
We address the first issue by combining the physical model in Eq. (3), quantum mechanical (QM) simulations, and experimentally acquired training data. More specifically, the molecule basis functions φ_m(t) can be obtained from QM simulations [45], [46]. These resonance structures are molecular-structure specific and can be assumed to be invariant across subjects and experiments. Meanwhile, we acquired high-SNR, low-resolution CSI data from healthy volunteers. These data were subjected to spectral fitting, from which empirical distributions of the spectral parameters, i.e., c_m, T*₂,ₘ, and δfₘ, were obtained. The empirical distributions were fitted to parametric Gaussian distributions to allow the generation of more randomly distributed parameters (Fig. 2). Finally, a large collection of FIDs can then be synthesized by combining φ_m(t) with randomly generated c_m, T*₂,ₘ, and δfₘ for training a DAE. Randomly distributed β values, corresponding to Gaussian linewidths with a 1 Hz mean and 0.5 Hz standard deviation, were generated and incorporated into the training data. Additional metabolite-dependent phases, assumed to follow a uniform distribution for simplicity, can also be incorporated to better capture realistic signal variations.

To effectively extract the low-dimensional features of these complex-valued signals, a special multi-layer DAE was designed. More specifically, we rearranged all the data by concatenating the real and imaginary parts as inputs. The first hidden layer of the proposed DAE was designed to have more units than the input layer such that the data can be lifted to a higher-dimensional space to i) increase the model's capacity and ii) enable the extraction of complex nonlinear features.

Fig. 2. Illustration of the data generation and DAE-based learning strategy. Spectral parameter distributions estimated from experimental data are fed into the commonly used spectral fitting model (blue box) to generate a large collection of FID data for training, denoted as X. (Rightmost column) A DAE network with fully connected layers is trained to encode X into L-dimensional features that can accurately reconstruct the original data; the mean-squared error is minimized for training. See the text for a detailed description of the network design.
This layer was then followed by a "bottleneck" structure that gradually reduces the dimensionality, with the middle layer being linear to ensure a sufficient dynamic range for the final encoded low-dimensional features. The overall data generation and DAE-based learning strategy are illustrated in Fig. 2. More comparisons of network designs can be found in the Supplementary Materials. The entire trained network is denoted by a function mapping C(·) with fixed parameters, which minimizes the error between input and output, i.e., ‖C(X) − X‖₂². We will use ³¹P spectroscopy data in this work to demonstrate the capability of the proposed approach, given its clearly defined spectral features and the absence of nuisance signals and a macromolecule baseline (see Discussion for more details on extensions to MRSI of other nuclei). To this end, the QM-simulated resonance structures include the commonly observed ³¹P-containing metabolites, as described in [46]. The CSI data used to estimate the empirical parameter distributions were acquired on a 7T scanner (more details in the Results section).
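The synthesis-based training-data generation described above can be sketched as follows. This is a hypothetical miniature: the parameter means, standard deviations, and single-resonance basis are invented stand-ins for the distributions fitted to the CSI scans and the QM-simulated resonance structures.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, S = 512, 3, 1000                         # FID length, metabolites, training samples
t = np.arange(T) / 5000.0

# Invented Gaussian parameter distributions (stand-ins for the fitted ones)
c_mean, c_std = np.array([1.0, 0.6, 0.3]), np.array([0.2, 0.1, 0.1])
T2_mean, T2_std = np.array([0.08, 0.05, 0.12]), np.array([0.02, 0.01, 0.03])
phi = np.exp(1j * 2 * np.pi * np.array([0.0, 250.0, -180.0])[:, None] * t)  # toy basis

X = np.zeros((S, T), dtype=complex)
for s in range(S):
    c = np.clip(rng.normal(c_mean, c_std), 0.0, 2.0)        # bounded concentrations
    T2 = np.clip(rng.normal(T2_mean, T2_std), 0.005, 0.2)   # 5 ms to 200 ms bounds
    df = rng.normal(0.0, 5.0, size=M)                       # frequency shifts (Hz)
    env = np.exp(-t / T2[:, None]) * np.exp(1j * 2 * np.pi * df[:, None] * t)
    X[s] = (c[:, None] * phi * env).sum(axis=0)

# Real and imaginary parts concatenated as network inputs, as described above
X_train = np.concatenate([X.real, X.imag], axis=1)          # shape (S, 2T)
```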

B. Constrained MRSI Using the Learned Model
With the trained DAE C(·) to extract the low-dimensional features of spectroscopy data, the key issue is how to incorporate this learned model into the imaging process. To this end, we propose a constrained reconstruction formulation that allows an effective integration of the imaging acquisition model and the learned model through a regularization functional. Specifically, the reconstruction is formulated as follows:

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \|\mathbf{d} - \Omega\{\mathbf{F}(\mathbf{B} \odot \mathbf{X})\}\|_2^2 + \lambda_2 \|\mathbf{D}_w \mathbf{X}\|_F^2 + \lambda_1 \|\mathbf{X} - C(\mathbf{X})\|_F^2, \qquad (7)$$

where X is a matrix representation of the spatiotemporal function of interest, with a size of N × T and each row being a length-T voxel FID, B models the effects induced by B₀ inhomogeneity, ⊙ denotes point-wise multiplication, F the Fourier transform, Ω the sampling operator, and d is a vector containing the (k, t)-space data; C(·) is applied to each row of X. The first term enforces the imaging acquisition model and data consistency; the second term imposes spatial edge-preserving regularization, with D_w being a weighted finite-difference operator [20], [22]; and the third term incorporates the prior that each reconstructed FID (X_n) should reside in the low-dimensional manifold captured by C(·). ‖·‖_F denotes the Frobenius norm. Compared to the common approach of directly learning the highly complex inverse mapping (through either a single neural network or cascaded neural networks), the proposed method represents a different but rigorous way of integrating physics-based forward modeling and data-driven priors within a unified formulation. Solving the optimization problem in Eq. (7) is challenging, as it requires nonlinear programming to handle the regularization term containing the learned neural network, which is inefficient for the linear least-squares terms.
C. Optimization Algorithm

We describe here an efficient algorithm to solve Eq. (7). Specifically, to decouple the linear least-squares terms and the nonlinear regularization functional, we introduce an auxiliary variable S = B ⊙ X and rewrite the problem as

$$\{\hat{\mathbf{X}}, \hat{\mathbf{S}}\} = \arg\min_{\mathbf{X}, \mathbf{S}} \|\mathbf{d} - \Omega\{\mathbf{F}\mathbf{S}\}\|_2^2 + \lambda_2 \|\mathbf{D}_w (\bar{\mathbf{B}} \odot \mathbf{S})\|_F^2 + \lambda_1 \|\mathbf{X} - C(\mathbf{X})\|_F^2 \quad \text{s.t.} \quad \mathbf{X} = \bar{\mathbf{B}} \odot \mathbf{S}, \qquad (8)$$

where B̄ represents the conjugate of B (element-wise). With this form, the alternating direction method of multipliers (ADMM) can be applied to solve this equivalent problem [47]. The ADMM algorithm decomposes the original problem into simpler subproblems which can be solved efficiently. More specifically, the following three steps were performed iteratively in our proposed algorithm:

(I) Update X with fixed S^(i) and Y^(i), with i denoting the iteration number and Y the Lagrange multiplier:

$$\mathbf{X}^{(i+1)} = \arg\min_{\mathbf{X}} \lambda_1 \|\mathbf{X} - C(\mathbf{X})\|_F^2 + \frac{\mu}{2} \|\mathbf{X} - \bar{\mathbf{B}} \odot \mathbf{S}^{(i)} + \mathbf{Y}^{(i)}\|_F^2; \qquad (9)$$

(II) Update S with fixed X^(i+1) and Y^(i):

$$\mathbf{S}^{(i+1)} = \arg\min_{\mathbf{S}} \|\mathbf{d} - \Omega\{\mathbf{F}\mathbf{S}\}\|_2^2 + \lambda_2 \|\mathbf{D}_w (\bar{\mathbf{B}} \odot \mathbf{S})\|_F^2 + \frac{\mu}{2} \|\mathbf{X}^{(i+1)} - \bar{\mathbf{B}} \odot \mathbf{S} + \mathbf{Y}^{(i)}\|_F^2; \qquad (10)$$

(III) Update the multiplier:

$$\mathbf{Y}^{(i+1)} = \mathbf{Y}^{(i)} + \big(\mathbf{X}^{(i+1)} - \bar{\mathbf{B}} \odot \mathbf{S}^{(i+1)}\big). \qquad (11)$$

The first subproblem requires general unconstrained optimization solvers due to the highly nonlinear function C(·). Although the original problem has high dimensionality, it can be solved in a voxel-by-voxel fashion as the Frobenius norm is separable. Furthermore, based on the autoencoder network design, we can derive the expression of the gradients for individual voxels. More specifically, denoting

$$f_n(\mathbf{x}) = \lambda_1 \|\mathbf{x} - C(\mathbf{x})\|_2^2 + \frac{\mu}{2} \|\mathbf{x} - \bar{\mathbf{B}}^{(n)} \mathbf{s}_n + \mathbf{y}_n\|_2^2 \qquad (12)$$

as the cost function for the nth voxel, the gradient of f_n(·) can be written as

$$\nabla f_n = 2\lambda_1 (\mathbf{I} - \mathbf{J}_C)^H \big(\mathbf{x} - C(\mathbf{x})\big) + \mu \big(\mathbf{x} - \bar{\mathbf{B}}^{(n)} \mathbf{s}_n + \mathbf{y}_n\big), \qquad (13)$$

where J_C ∈ ℝ^{T×T} is the Jacobian of the neural network mapping C(·), I is a T × T identity matrix and B̄^(n) denotes a diagonal matrix formed by the nth row of B̄. For a neural network with L layers and a final linear layer, J_C can be derived through back-propagation as

$$\mathbf{J}_C = \mathbf{W}_{L+1} \mathbf{U}_L \mathbf{W}_L \cdots \mathbf{U}_1 \mathbf{W}_1, \qquad (14)$$

where W_l contains the linear weights for the lth layer and the U_l's are diagonal matrices containing the derivatives of the nonlinear activation functions at the lth layer. For instance, if the rectified linear unit (ReLU) is chosen, the diagonal elements of U_l are simply unit step functions u([x_{l−1} W_l + b_l]_k), with b_l being the bias vector and k the output index.
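The back-propagation expression for J_C can be checked numerically. The sketch below builds a small ReLU network with a final linear layer, assembles J_C as the product of layer weights and activation-derivative diagonals, and verifies one column against a finite difference (toy sizes and random weights; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
dims = [6, 10, 4, 6]                           # toy net: 6 -> ReLU(10) -> ReLU(4) -> linear 6
Ws = [0.5 * rng.normal(size=(dims[i + 1], dims[i])) for i in range(3)]
bs = [0.1 * rng.normal(size=dims[i + 1]) for i in range(3)]

def forward(x):
    """Returns the network output and the ReLU derivative masks U_l."""
    Us = []
    for l in range(2):                         # the two nonlinear layers
        z = Ws[l] @ x + bs[l]
        Us.append((z > 0).astype(float))       # ReLU derivative: a unit step per output
        x = np.maximum(z, 0.0)
    return Ws[2] @ x + bs[2], Us               # final linear layer

def jacobian(x):
    """J_C = W_3 U_2 W_2 U_1 W_1, assembled as in the back-propagation formula."""
    _, Us = forward(x)
    J = Us[0][:, None] * Ws[0]                 # U_1 W_1
    J = Us[1][:, None] * (Ws[1] @ J)           # U_2 W_2 U_1 W_1
    return Ws[2] @ J

x0 = rng.normal(size=6)
J = jacobian(x0)

# Finite-difference check of the first Jacobian column
eps = 1e-6
e0 = np.zeros(6); e0[0] = eps
fd = (forward(x0 + e0)[0] - forward(x0 - e0)[0]) / (2 * eps)
```

The diagonal masks make the product cheap to form, which is what makes the per-voxel gradient computation practical.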
These separable problems can then be solved in parallel and efficiently using a quasi-Newton algorithm, i.e., the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [48], [49].
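A per-voxel solve of this form can be sketched with SciPy's BFGS implementation. Here a linear subspace projector stands in for the trained DAE C(·) (so the Jacobian term is trivial), and the quantities playing the roles of B̄^(n)s_n − y_n, λ₁ and μ are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T = 16
U = np.linalg.qr(rng.normal(size=(T, 4)))[0]   # orthonormal basis of a toy 4-dim subspace
C = lambda x: U @ (U.T @ x)                    # linear projector standing in for the DAE
lam1, mu = 1.0, 0.5
s_tilde = rng.normal(size=T)                   # plays the role of B^(n) s_n - y_n (invented)

def f_and_grad(x):
    r = x - C(x)                               # residual from the low-dimensional model
    # For this linear stand-in, (I - J_C)^H r = (I - U U^T) r:
    g = 2 * lam1 * (r - C(r)) + mu * (x - s_tilde)
    f = lam1 * (r @ r) + 0.5 * mu * np.sum((x - s_tilde) ** 2)
    return f, g

res = minimize(f_and_grad, np.zeros(T), jac=True, method="BFGS")
```

With a real DAE, the gradient would use the back-propagated J_C in place of the projector; since each voxel is an independent problem of this shape, the solves can run in parallel.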
With the updated X^(i+1), the second subproblem is equivalent to solving the following linear equations:

$$\Big(\mathbf{F}^H \Omega^H \Omega \mathbf{F} + \frac{\mu}{2}\mathbf{I}\Big)\mathbf{S} + \lambda_2\, \mathbf{B} \odot \big(\mathbf{D}_w^H \mathbf{D}_w (\bar{\mathbf{B}} \odot \mathbf{S})\big) = \mathbf{F}^H \Omega^H \mathbf{d} + \frac{\mu}{2}\, \mathbf{B} \odot \big(\mathbf{X}^{(i+1)} + \mathbf{Y}^{(i)}\big), \qquad (15)$$

which can be easily solved by exploiting the special structures of F^H Ω^H Ω F and D_w^H D_w, as suggested in [50]. The iterations were terminated when either i reached a maximum number or the relative change between X^(i+1) and X^(i) fell below a given threshold (e.g., 10⁻³).

D. Single-Voxel Spectroscopy Denoising
The proposed method can also be applied to spectroscopy data acquired without spatial encoding. In this case, the formulation in Eq. (7) can be simplified into

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{d} - \mathbf{x}\|_2^2 + \lambda \|\mathbf{x} - C(\mathbf{x})\|_2^2, \qquad (16)$$

where x is a vector representing the desired noiseless FID and d is the acquired data. This can be viewed as a special case of the problem in Eq. (12), with the following gradient:

$$\nabla f = 2(\mathbf{x} - \mathbf{d}) + 2\lambda (\mathbf{I} - \mathbf{J}_C)^H \big(\mathbf{x} - C(\mathbf{x})\big). \qquad (17)$$

The problem can again be solved efficiently using BFGS or other nonlinear optimization algorithms. This simplification can be useful for many single-voxel spectroscopy experiments, where denoising has mostly been limited to time-domain or wavelet-based filtering using only the noisy data [2], [51]. The proposed method provides an effective and flexible means to incorporate data-driven priors obtained from other high-quality data to improve the results of general single-voxel acquisitions.
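A miniature of this denoising scheme can be run end to end, again with a linear projector standing in for the learned C(·) and plain gradient descent in place of BFGS (all sizes and values invented):

```python
import numpy as np

rng = np.random.default_rng(4)
T, lam = 64, 5.0
U = np.linalg.qr(rng.normal(size=(T, 4)))[0]   # toy low-dimensional model (DAE stand-in)
C = lambda x: U @ (U.T @ x)

x_true = C(rng.normal(size=T))                 # "noiseless FID" lying exactly on the model
d = x_true + 0.3 * rng.normal(size=T)          # acquired noisy data

x = d.copy()
for _ in range(500):                           # plain gradient descent on Eq.-(16)-style cost
    r = x - C(x)
    grad = 2 * (x - d) + 2 * lam * (r - C(r))  # (I - J_C)^H applied to the model residual
    x -= 0.05 * grad

err_noisy = np.linalg.norm(d - x_true)
err_denoised = np.linalg.norm(x - x_true)
```

The data-consistency term keeps the estimate near the measurement while the model term attenuates the noise component lying off the learned manifold, which is where the SNR gain comes from.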

E. Training and Other Implementation Details
We generated 300,000 ³¹P MR spectra as described in Section III.A for model learning. The molecules considered in this work are ³¹P-containing compounds commonly observed in in vivo ³¹P MRS/MRSI experiments, i.e., PCr, α-ATP, β-ATP, γ-ATP, Pi, tNAD, PE, PC, GPC, GPE, and MP [46]. The empirical Gaussian distributions for the spectral parameters were estimated from five brain CSI datasets acquired on a 7T scanner (Siemens Magnetom). The T*₂,ₘ values were lower-bounded by 5 ms and upper-bounded by 200 ms, and the relative concentrations c_m were bounded between 0 and 2. For tNAD, GPC, GPE and MP, whose spectral parameters were unreliably estimated (large variances), literature values were used with standard deviations set to half of the mean values [46], [52]. Metabolite-dependent phases uniformly distributed between −π/4 and π/4 were included. Among the 300K FIDs, 200K samples were used for training and the remaining 100K for testing. We first implemented the DAE in TensorFlow [53], constructed with fully connected layers in both the encoder and decoder with a dimension structure of T−1000−250−100−L−100−250−1000−T (T = 512). ReLU activation functions were used in the hidden layers except for the middle layer. Training was performed using the mean-squared error (MSE) loss and stochastic gradient-based optimization with the Adam optimizer [54], with a batch size of 500, a learning rate of 0.001, 300 epochs, and default parameters for the moment estimates. The trained parameters of the DAE were then ported to Matlab and integrated into the proposed formulation and algorithm.
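The stated layer structure can be sanity-checked with a bare numpy forward pass (random, untrained weights). Note that the paper concatenates real and imaginary parts at the input, so the exact input length depends on that arrangement; here the stated T−1000−…−T structure is taken literally:

```python
import numpy as np

rng = np.random.default_rng(5)
T, L = 512, 24
dims = [T, 1000, 250, 100, L, 100, 250, 1000, T]
Ws = [0.01 * rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]

def dae_forward(x):
    """ReLU in all hidden layers except the linear middle (feature) layer
    and the linear output layer."""
    h = None
    for l, (W, b) in enumerate(zip(Ws, bs)):
        x = W @ x + b
        if l == 3:                  # the layer producing the L-dim features: linear
            h = x.copy()
        elif l < len(Ws) - 1:       # other hidden layers: ReLU
            x = np.maximum(x, 0.0)
    return x, h

x = rng.normal(size=T)
x_hat, h = dae_forward(x)
```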

IV. RESULTS

A. Numerical Simulations
Simulation studies have been conducted to evaluate the proposed low-dimensional model learning and reconstruction method. We first investigated the approximation accuracy of our learned low-dimensional representation by comparing the dimensionality reduction errors at different model orders (denoted as L) with those from a linear dimensionality reduction. The linear dimension reduction was done by projecting the data onto an L-dimensional subspace determined by PCA of the training data [12], [22]. The results are shown in Fig. 3. As can be seen, the learned DAE-based model yielded higher approximation accuracy (lower relative ℓ₂ errors) than the linear subspace approximation across different model orders (Fig. 3a), especially at smaller L. As L increases, the difference becomes smaller. The better representation efficiency offered by the learned model is further demonstrated by comparing the approximations of a fitted, experimentally acquired in vivo spectrum (Fig. 3b). A low-dimensional subspace approximation (L = 16) leads to noticeable distortion of spectral features, which are captured by the learned model.

Fig. 3. Reconstructions of a single spectrum using a 16-dimensional DAE and subspace, respectively. The improved accuracy offered by the DAE is clearly visible. The black arrows in (b) indicate spectral features lost by the linear subspace approximation but recovered by our learned model.
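For context, the linear PCA baseline used in this comparison can be reproduced in miniature on a synthetic family of decaying oscillations (invented data, not the paper's spectra): the relative ℓ₂ approximation error shrinks as the subspace dimension L grows.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(256) / 2000.0
S = 2000
rates = rng.uniform(10.0, 60.0, size=S)       # decay rates (1/s), invented ranges
freqs = rng.uniform(-100.0, 100.0, size=S)    # frequencies (Hz), invented ranges
X = np.exp(-rates[:, None] * t) * np.cos(2 * np.pi * freqs[:, None] * t)

# PCA subspace from the "training" data
_, _, Vt = np.linalg.svd(X, full_matrices=False)

def rel_l2_error(L):
    P = Vt[:L].T @ Vt[:L]                     # projector onto the top-L subspace
    return np.linalg.norm(X - X @ P) / np.linalg.norm(X)

errors = {L: rel_l2_error(L) for L in (4, 8, 16, 32)}
```

Because this family is generated by only two nonlinear parameters per signal, a nonlinear model can in principle match a given error level at a much smaller L than the linear subspace, which is the gap Fig. 3 quantifies for real spectra.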
A numerical phantom was constructed to validate the proposed MRSI reconstruction method using the learned nonlinear model. Specifically, segmented brain tissue compartments, i.e., gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF), were obtained from an experimentally acquired T₁-weighted structural image. Regional concentrations and linewidth parameters for the 11 molecules mentioned above were assigned based on literature values [46], [52]. These compartmental images (constant concentrations and linewidths in each region) were then combined using the tissue fraction maps (obtained from segmentation) as weightings to generate continuously varying concentration and linewidth maps, which, along with the metabolite basis {φ_m}, were fed into Eq. (3) to synthesize spatially varying ³¹P spectra. To mimic a more practical scenario, we simulated voxel-dependent random frequency shifts for the different molecules (zero mean and a standard deviation of 10 Hz) and residual B₀ field inhomogeneity induced spectral shifts (zero mean and a standard deviation of 10 Hz). A lesion-like feature was also included to evaluate the proposed method's capability to discern novel spatiospectral features. The concentrations of PCr and ATP were reduced by a factor of 2 in the lesion area compared to the GM, while the concentrations of Pi, PE, and PC were increased by a factor of 3. Figure 4 shows the simulated maps of PCr, α-ATP, and Pi as well as representative voxel spectra from the GM and lesion, respectively. The T*₂, B₀ inhomogeneity and random frequency shift maps used for the simulation are also illustrated. Noisy data were simulated by adding complex white Gaussian noise corresponding to different SNRs for the PCr peak; the SNR is defined as the ratio between the maximum PCr peak amplitude and the noise standard deviation σ. A set of denoising results obtained by the proposed method from simulation data with an SNR of 20 is shown in Fig.
5, with comparison to the noisy data (obtained by the standard Fourier reconstruction) as well as results produced by a spatially regularized reconstruction [20] and a subspace-constrained reconstruction (SPICE), also with the spatial regularization term [12], [22]. The spatially regularized reconstruction is equivalent to solving Eq. (7) with λ₁ = 0. For the SPICE reconstruction, we used a 24-dimensional subspace estimated from the same data used for training the DAE. Normalized mean-squared errors (MSEs) were calculated for each method. The regularization parameters were manually selected to minimize the MSE for the spatially constrained and SPICE reconstructions. As shown by the results (Fig. 5), the spatially constrained reconstruction improved the SNR but tended to oversmooth the images and led to additional spectral lineshape distortions (third row). The subspace method provided further improved reconstruction (fourth row), while the proposed method produced the best result (last row), both qualitatively and quantitatively, as demonstrated by the reconstructed metabolite maps (obtained by peak integration), voxel spectra, and the normalized MSEs. Figure 6 further compares the errors for the different methods under various SNRs. The proposed method consistently achieved the lowest errors in all cases. It is worth noting that the spatial regularization serves an auxiliary role to provide additional improvement in the proposed method; thus λ₂ was not optimized and was set to a small value, which helps avoid oversmoothing.

Fig. 5. Comparison of the noisy data (Noisy Data), anatomically constrained reconstruction (Spatial), SPICE reconstruction (Subspace) and the proposed method (Proposed), respectively. The normalized MSEs are shown for each case under the method labels. The first three columns compare the reconstructed maps of PCr, α-ATP and Pi, and the last three columns show selected voxel spectra from the GM, WM and lesion areas. As can be seen, the proposed method produces significant SNR enhancement and the lowest error while preserving spatiospectral features.

Fig. 6. Reconstruction errors for the different methods (identified by different colored curves) under various SNR levels (SNR defined in the text). As can be seen, the proposed method consistently yields the lowest errors.

To further investigate the effects of the regularization parameters, we performed reconstructions with different values of λ₁ and λ₂ at an SNR of 20. The errors are compared in Fig. 7 and Fig. S2. As expected, a too-small regularization parameter has minimal denoising effect (larger errors), while a too-large value can also increase the error. However, the reconstruction performance remains robust over a large range of parameter values.

B. Experimental Studies
The performance of the proposed method under practical experimental conditions was evaluated using brain ³¹P-MRSI data acquired from healthy volunteers on a 7T system (Siemens Magnetom) using a double-tuned ³¹P-¹H surface coil. Data were acquired using a CSI sequence [55] with the following parameters: TR/TE = 170/2.3 ms, field-of-view (FOV) = 200×200×100 mm³, matrix size = 32×32×10 (corresponding to a nominal voxel size of approximately 6×6×10 mm³), and 512 FID samples at 5 kHz spectral bandwidth (BW). The total acquisition time was 26 min with elliptical sampling and two signal averages. Gaussian-distributed zeroth-order phases were included in generating the data for training the DAEs used for in vivo reconstruction (with mean and standard deviation estimated from the experimental data). The acquisition delay was also considered by defining proper sampling time vectors during data generation and reconstruction. Spatiospectral reconstructions obtained by the conventional Fourier reconstruction and the proposed method are shown in Fig. 8. The Fourier reconstruction produced very noisy results; the proposed method achieved significantly improved SNR and spectral quality, revealing spatiospectral features previously concealed by noise.

Fig. 8. Reconstructions from the in vivo data acquired using a surface coil. The first row contains the anatomical images within the imaging volume (T₁w). The maps of PCr, α-ATP and γ-ATP for the corresponding slices are shown in the subsequent rows, and representative voxel spectra in the last two rows. Results from the Fourier reconstruction are on the left and those from the proposed method on the right. Apparent SNR enhancement in both the metabolite maps and spectra can be observed for the proposed method. The visual differences in the FOVs between the anatomical and ³¹P images are due to the different sensitivity profiles of the ¹H and ³¹P channels.
Another dataset was acquired using a double-tuned ³¹P-¹H volume coil on the same 7T system with matched TR, TE and spectral BW. The FOV was changed to 220×220×150 mm³ and the matrix size to 32×32×8 (total acquisition time of about 25 min). The spatiospectral reconstruction results from these data are shown in Fig. 9 and Fig. S3 (Supplementary Materials). Figure 9 compares the results from the SPICE (subspace) reconstruction and the proposed method. As can be seen, the proposed method produced metabolite maps with higher SNR and qualitatively better spectra. Note that while the volume coil has a larger and more uniform coverage, the raw data from this experiment were considerably noisier than in the previous case. Nevertheless, the proposed method was still able to achieve a significant SNR improvement. A more comprehensive comparison of the noisy Fourier reconstruction, the SPICE reconstruction and the proposed method is shown in Fig. S3.

C. Special Case: Single-Voxel Spectroscopy Denoising
As discussed in Section III.D, the proposed method can also be applied to denoise single-voxel spectroscopy (SVS) data by solving the optimization problem in Eq. (16), which allows data-driven priors to be incorporated to improve the SNR of general single-voxel acquisitions. Figure 10 shows an example demonstrating this capability. Specifically, a noisy brain ³¹P spectrum was acquired in vivo and denoised by the proposed method. In addition, a "reference" spectrum was generated by spectral fitting of these data (the residual was inspected to ensure high-quality fitting) and compared to the denoising result. As can be seen, the proposed method achieved effective noise reduction with excellent preservation of spectral features relative to the fitted spectrum. The peaks of PCr, α-ATP, and γ-ATP, and even the weak signals of β-ATP and Pi, were well recovered, although the performance for some of the weaker and overlapping peaks (e.g., GPC and GPE) can be further improved. A more comprehensive and quantitative evaluation of the proposed method for SVS studies is left for future research.
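To make the idea concrete, the following toy sketch mimics single-voxel denoising with a learned low-dimensional model: a PCA-like projection P stands in for the trained decoder-encoder composition D(E(·)), and gradient descent minimizes a data-fidelity term plus a model-consistency penalty. This is our illustration under stated assumptions; Eq. (16) itself is not reproduced in this excerpt.

```python
import numpy as np

# Toy sketch (not the authors' code): denoise a "spectrum" y by fitting it
# while penalizing its distance from a learned low-dimensional manifold.
# A linear projection P onto a k-dim subspace stands in for the DAE's D(E(.)).
rng = np.random.default_rng(0)
n, k = 64, 4
U, _ = np.linalg.qr(rng.standard_normal((n, k)))  # "learned" k-dim subspace
P = U @ U.T                                       # surrogate for D(E(.))

x_true = U @ rng.standard_normal(k)               # signal on the manifold
y = x_true + 0.3 * rng.standard_normal(n)         # noisy observation

lam, x = 5.0, y.copy()
for _ in range(200):
    # gradient step on ||x - y||^2/2 + lam * ||x - P x||^2/2
    # (for a symmetric idempotent P, the penalty gradient is lam * (I - P) x)
    x -= 0.1 * ((x - y) + lam * (x - P @ x))

err_noisy = np.linalg.norm(y - x_true)
err_denoised = np.linalg.norm(x - x_true)
```

The denoised estimate keeps the on-manifold component of y essentially intact while shrinking the off-manifold (noise-dominated) component, which is the qualitative behavior described above.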

V. DISCUSSION
The proposed method has several key differences from other learning-based denoising/reconstruction strategies. First, by not directly learning the inverse transform that maps the noisy, artifact-containing data to the desired signals or parameters, we simplified the learning problem and allowed the constructed DAE to focus on extracting the physiologically meaningful, molecule-specific low-dimensional features of spectroscopic data (for a particular field strength) rather than the nuisance noise and artifacts. Second, the proposed formulation represents a different way to incorporate deep learning into the imaging process, effectively combining the physics-based data acquisition model with the learned signal model. It offers greater flexibility and can accommodate different noise levels, acquisition parameters, and other spatiospectral constraints without retraining the network. Third, our algorithm directly solves the resulting optimization problem, as opposed to previous works that mapped an iterative process to a cascade of networks approximately solving a similar regularized reconstruction formulation (e.g., [56], [57]). These "unrolling"-based methods can be efficient but still fall into the category of directly learning the entire inverse mapping and can be sensitive to SNR levels and acquisition designs.
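Schematically, and in our own notation (the paper's exact formulation, e.g., Eq. (16) and the subproblem in Eq. (9), is not reproduced in this excerpt), the combination of the two models can be written as a regularized least-squares problem:

$$
\hat{\rho} \;=\; \arg\min_{\rho}\; \big\| d - \mathcal{A}(\rho) \big\|_2^2
\;+\; \lambda_1 \sum_{r} \big\| \rho(r,\cdot) - \mathcal{D}\big(\mathcal{E}(\rho(r,\cdot))\big) \big\|_2^2
\;+\; \lambda_2\, R(\rho),
$$

where $\mathcal{A}$ is the physics-based encoding operator, $d$ the measured data, $\mathcal{E}/\mathcal{D}$ the trained encoder/decoder pair of the DAE, and $R(\cdot)$ an optional additional spatiospectral regularizer. The voxel-wise learned-model penalty (weight $\lambda_1$) and the additional spatiospectral constraint (weight $\lambda_2$) correspond to the two regularization parameters discussed next.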
An important practical issue for the proposed method is the selection of the regularization parameters. As shown by our comparison of reconstructions obtained with different combinations of λ₁ and λ₂ (Figs. 7 and S2), the denoising performance remains robust over a wide range of parameter values. Furthermore, λ₁ can be selected by performing single-voxel denoising (due to the voxel-wise nature of this constraint), which, according to our simulation and experimental studies, serves as a good initial guess for the overall reconstruction problem. The other parameter, λ₂, can be chosen based on the discrepancy principle and then fine-tuned together with λ₁ by visual inspection of the spatiospectral reconstruction quality. Notably, a careful choice of λ₂ is not required for effective denoising, as the learned model is the main contributor, as shown by the result with λ₂ = 0 in Fig. S2. More sophisticated parameter selection strategies can also be explored in future research.
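As an illustration of discrepancy-principle selection (our sketch with a generic quadratic smoothness regularizer standing in for the λ₂ term, not the paper's code), one increases the regularization weight until the data residual reaches the expected noise level σ√m:

```python
import numpy as np

# Sketch of discrepancy-principle parameter selection (our illustration):
# grow lambda until the residual ||x(lambda) - y|| matches sigma * sqrt(m).
rng = np.random.default_rng(1)
m, sigma = 256, 0.2
x_true = np.sin(np.linspace(0.0, 8.0, m))
y = x_true + sigma * rng.standard_normal(m)

D = np.diff(np.eye(m), axis=0)     # first-difference (smoothness) operator
K = D.T @ D

def solve(lam):
    # x(lam) = argmin ||x - y||^2 + lam * ||D x||^2  (closed-form solve)
    return np.linalg.solve(np.eye(m) + lam * K, y)

target = sigma * np.sqrt(m)        # expected norm of the noise
lam = 1e-3
while np.linalg.norm(solve(lam) - y) < target:
    lam *= 2.0                     # coarse doubling search; fine-tune afterwards
```

The loop stops at the first weight whose residual meets the noise level; in practice this coarse value would then be fine-tuned, e.g., by the visual inspection described above.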
Several other issues are worth investigating in future research. In particular, the network for learning the nonlinear low-dimensional representation can be further optimized. For example, structures for better handling complex-valued data, different layer designs (convolutional versus fully connected), asymmetric encoder and decoder networks, and choices of activation functions may be studied. Characterizations of the connections between network complexity, the dimensionality of the nonlinear features, and the complexity of the spectral functions will be of both theoretical and practical interest. High-quality in vivo data can be acquired to improve the estimation of spectral parameter distributions or to directly augment the neural network training data, which should make the learned model more adaptive to experimental variations not captured by synthesized data alone. One unique advantage of the proposed nonlinear model is that its dimensionality is not affected by the range of the spectral parameters, making it more generalizable to patient populations with larger parameter changes. However, if certain patient groups exhibit novel molecular features absent from the training data, patient-specific data will need to be collected to adapt the learned model to capture such features. A more general lineshape distortion function h(r, t) can also be considered, incorporated into the system encoding operator, and separated from the ideal spectral signals. For the proposed algorithm, the most computationally intensive step is solving the subproblem in Eq. (9), which is, however, highly parallelizable and should benefit greatly from translating the current implementation to parallel computing platforms.

Fig. 9. Results from the data acquired using a volume coil. The maps of PCr and α-ATP are shown for the SPICE (Subspace) and the proposed method (Proposed), along with spatially-resolved spectra from voxels marked by red symbols in the T₁-weighted anatomical images shown on the far left (T₁w). As shown by the metabolite maps and spectra, the proposed method yields a qualitatively better reconstruction. The spectra for a voxel from outside the brain (diamond-shaped symbol) exhibit less noise-induced bias for the proposed method than for the subspace reconstruction, implying fewer modeling errors.
While we demonstrated the capability of the proposed approach using ³¹P-MRSI data, application to other nuclei is possible. Extending the proposed method to ¹H-MRSI will require addressing a number of important issues, such as residual nuisance water/lipid signals, the presence of a macromolecule baseline in short-TE data, richer spectral features, and potentially stronger lineshape distortion due to larger intravoxel B₀ inhomogeneity. These issues, although nontrivial, are, we believe, addressable. For example, a separate nonlinear low-dimensional model can be learned for the macromolecule signals by exploiting their unique spectral structures and lineshapes [34], [58]. This model can then be incorporated into a modified reconstruction formulation that imposes the learned models for metabolites and macromolecules separately.

Fig. 10. Application of the proposed method to denoise a single-voxel ³¹P spectrum. The noisy data and denoised spectra are shown in the first and second rows, respectively, with comparison to a noiseless spectrum (last row) obtained by spectral fitting of the noisy data.
It is also worth noting that because the proposed method integrates the data acquisition model and the learned nonlinear model through a regularization formalism, it can be readily extended to incorporate other spatiospectral constraints (e.g., spatiospectral sparsity via non-quadratic regularizations) and even potentially learned spatial priors. The improved SNR can enable faster acquisitions (e.g., with fewer averages and/or shorter TRs) and higher resolutions. Although we only demonstrated denoising reconstruction, the proposed formulation can be extended to other scenarios, such as sparse sampling in (k,t)-space, with modifications of the sampling operator. These directions are currently being pursued and could lead to new opportunities in synergizing physics-based modeling and machine learning to push the resolution and SNR limits of in vivo MRSI.
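For the sparse-sampling extension, only the forward operator changes: identity sampling is replaced by, e.g., a masked Fourier encoding in (k,t)-space. A minimal sketch of such an operator and its adjoint (our illustration, assuming a single-channel 1D-space-plus-time model):

```python
import numpy as np

# Minimal (k,t)-space sampling operator sketch (our illustration): a binary
# mask applied after a spatial FFT at each time point, plus its adjoint.
rng = np.random.default_rng(2)
nx, nt = 16, 32
mask = rng.random((nx, nt)) < 0.5             # sampled (k,t) locations

def A(rho):
    # forward: unitary FFT along the spatial axis, then keep sampled points
    return mask * np.fft.fft(rho, axis=0, norm="ortho")

def A_adj(d):
    # adjoint: zero-fill unsampled points, then inverse unitary FFT
    return np.fft.ifft(mask * d, axis=0, norm="ortho")

# adjoint consistency check: <A x, y> == <x, A^H y>
x = rng.standard_normal((nx, nt))
yk = rng.standard_normal((nx, nt)) + 1j * rng.standard_normal((nx, nt))
lhs = np.vdot(A(x), yk)
rhs = np.vdot(x, A_adj(yk))
```

With a correct adjoint pair in hand, the same regularized formulation (and the same learned model) applies unchanged; only the data-fidelity term sees the new operator.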

VI. CONCLUSION
We have presented a new method for modeling MRSI data by learning a low-dimensional nonlinear representation and using the learned model for spatiospectral reconstruction. The model was learned using a deep autoencoder-based neural network that can accurately capture the inherent low-dimensional features of high-dimensional spectral variations and enable effective dimensionality reduction. The proposed constrained reconstruction method incorporates the learned model through a regularization formalism, and the resulting problem was solved by an efficient ADMM-based algorithm. Significantly improved spatiospectral reconstruction over conventional methods has been demonstrated using both simulation and experimental data. The proposed method represents a new means of incorporating deep learning into the imaging process and may be extended in various ways, including improved network designs, training data generation, choices of error metrics, and combinations with other spatiospectral constraints.