PINQI: An End-to-End Physics-Informed Approach to Learned Quantitative MRI Reconstruction

Quantitative Magnetic Resonance Imaging (qMRI) enables the reproducible measurement of biophysical parameters in tissue. The challenge lies in solving a nonlinear, ill-posed inverse problem to obtain the desired tissue parameter maps from acquired raw data. While various learned and non-learned approaches have been proposed, the existing learned methods fail to fully exploit the prior knowledge about the underlying MR physics, i.e. the signal model and the acquisition model. In this paper, we propose PINQI, a novel qMRI reconstruction method that integrates the knowledge about the signal, acquisition model, and learned regularization into a single end-to-end trainable neural network. Our approach is based on unrolled alternating optimization, utilizing differentiable optimization blocks to solve inner linear and non-linear optimization tasks, as well as convolutional layers for regularization of the intermediate qualitative images and parameter maps. This design enables PINQI to leverage the advantages of both the signal model and learned regularization. We evaluate the performance of our proposed network by comparing it with recently published approaches in the context of highly undersampled $T_{1}$-mapping, using both a simulated brain dataset, as well as real scanner data acquired from a physical phantom and in-vivo data from healthy volunteers. The results demonstrate the superiority of our proposed solution over existing methods and highlight the effectiveness of our method in real-world scenarios.


I. INTRODUCTION
Magnetic Resonance Imaging (MRI) is a well-known method and an indispensable tool in clinical practice.However, the most commonly used protocols are qualitative, where the contrast in the images is determined by a mixture of different tissue parameters and acquisition properties.To overcome this issue, quantitative MRI (qMRI) techniques such as MR relaxometry have been proposed, which allow for the quantification of specific (bio)physical parameters of tissue such as spinlattice (T 1 ) and spin-spin relaxation times (T 2 ).These quantitative measurements allow greater comparability between different devices at different sites and can be used to create more specific clinical guidelines.Typically, in qMRI series of qualitative images with different acquisition parameters This paragraph of the first footnote will contain the date on which you submitted your paper for review This work was supported in part by the Metrology for Artificial Intelligence for Medicine (M4AIM) project that is funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWi) in the framework of the QI-Digital initiative.This work was funded in part by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Grant 372486779.F. F. Zimmermann (e-mail:felix.zimmermann@ptb.de),C. Kolbitsch, P. Schuenke, and A. Kofler are with Physikalisch-Technische Bundesanstalt (PTB), Braunschweig and Berlin, Germany The problem to be solved in quantitative MRI is obtaining maps of physical parameters from undersampled measurements in Fourier space.Most previous methods consider the two steps of A) reconstructing artifact-free qualitative images and B) obtaining the parameter maps as two disjoint and independent steps (see text for details).We propose a novel end-to-end method making use of prior knowledge about the physics of data acquisition and learned regularization.
are recorded and a non-linear inverse problem is solved to obtain the tissue parameters.The clinical application of MRI and especially qMRI is challenging due to the relatively long measurement times.Hence, the acquired data is typically undersampled in Fourier space (k-space) to reduce the scan time at the cost of making the problem more ill-posed.This leads to artifacts that need to be accounted for by adopting appropriate regularization methods.Both for the reconstruction of purely qualitative images, as well as quantitive parameter maps, different approaches utilizing parallel data recording with multiple receiver coils [1], compressed sensing, and varying regularization schemes [2]- [5], model-based reconstruction [6]- [8], and combinations thereof have been proposed.More recently, neural networks were introduced as learned image reconstructions and as data-driven regularization methods [9]- [13], or for mapping reconstructed images to quantitative maps [14]- [16].
The existing data-driven approaches to solve the qMRI inverse problem, i.e. to obtain the parameter maps from the acquired k-space data, can be broadly categorized as follows.The first type of approach splits the problem into two disjoint tasks: A) image reconstruction and B) parameter regression (see Figure 1).The second type jointly solves both tasks to obtain the parameter maps directly from the k-space data.For the first task of qualitative image reconstruction, datadriven methods have made great progress: State-of-the-art results can be achieved by incorporating deep learning into an unrolled model-based reconstruction by iteratively applying a neural network as a form of learned regularization as well as enforcing consistency with the recorded k-space data and the linear signal model [9], [10], [17]- [19].The second task, parameter regression, is then either carried out by classical regression using non-linear solvers [20]- [23], dictionary matching [24], or by learned methods.Within the latter, a further distinction can be made between pixelwise mappings, such as MyoMapNet for T 1 -mapping [15], [16] and methods using the information of multiple pixels via convolutional neural networks (CNNs), thereby implicitly learning a suitable spatial regularization [12], [14], [25].MANTIS [14], for example, uses a UNet [26] to predict the parameter maps (originally T 2 maps) from qualitative magnitude images with severe undersampling artifacts.DeepT1 [12], proposed for myocardial T 1 -mapping, consists of two separate data-driven parts.First, an iterative reconstruction using recurrent CNNs [18] and data-consistency layers for image reconstruction is trained to reconstruct artifact-free qualitative images.Second, a UNet is trained to predict the quantitative maps from the magnitude of the reconstructed images.A similar approach is used in the recently proposed CoRRECT [13] for motion-corrected R * 2 -mapping.It also uses an unrolled, CNN-regularized image reconstruction, followed by a parameter mapping UNet, with the major difference that both parts are trained simultaneously end-to-end.However, most learning-based methods entirely ignore prior knowledge about the physics of the signal and acquisition model relating the parameters with the quantitative images or use it only during training as part of a loss function [14], [25], [27].Few learned methods are trained in an endto-end manner, i.e. from k-space data to parameter maps, and fully incorporate the physical model at inference time.
PGD-Net [28] unrolls a proximal gradient descent scheme with an approximated signal function, which is implemented as a pre-trained neural network and used as a differentiable proxy of the true MR fingerprinting signal function.DOPAMINE [11] unrolls a first-order gradient descent scheme that includes an implicitly learned regularizer where small residual CNNs operating on the different parameter maps are used to learn the gradient of a regularizer.Neither employs image-space regularization nor makes use of the particular form of the forward model and both only use shallow networks.
Combining the knowledge about the physics of the acquisition model by data-consistency layers with deep learning regularization has greatly improved qualitative image reconstruction [10], [17], [29].Thus, we want to investigate a novel approach to incorporate the full knowledge of the physics of the signal model into a learned qMRI reconstruction, while employing learned regularization in image-and parameterspace.

A. Our Contributions
Our main contributions to the field of deep-learning-based quantitative imaging consist of • Validation of the proposed approach on the task of T 1mapping.We show the transferability of the network trained solely on synthetic data to in-vivo measurements.We also demonstrate the usage of implicit differentiation for efficient evaluation of gradients with respect to all inputs of the commonly data-consistency layers of linear inverse problems as well as the non-linear optimization layer.II.METHOLOGY First, we introduce the notation and formulate the inverse problem to be solved in quantitative imaging.Next, we give a short introduction to differentiable optimization.We present implicit differentiation as a technique to obtain the Jacobian of solutions of optimization problems with respect to the parameters of the problems.Finally, we introduce our proposed PINQI solution to the quantitative imaging problem using differentiable optimization.

A. Problem Statement
By p(r) = [p 1 (r), . . ., p N P (r)] T we denote for each location r the N P relevant physical parameters to be determined, such as relaxation times.For notational brevity, we will write p ∈ R Np•N for the vector representation of the parameters at N = N x • N y discrete 2D positions.In qMRI, one considers the forward model given by where k ∈ C Nr•Nc•N k is the vector of recorded k-space data, A is a linear acquisition operator, q a (non-linear) signal model, and e random noise.As the acquisition is performed N r times with different sampling parameters and changes in the MR sequence influencing the signal model, the forward model is given by A Here, the encoding operators at each acquisition A i : C N → C Nc•N k typically are undersampled Fourier operators.These can be written as A i = S i FC with F a being two-dimensional Fourier transform and S i undersampling operators, masking out all but N k discrete points in k-space.The undersampling masks are typically varied between different acquisitions.For N c receiver coils acquiring data in parallel, the coil sensitivity operator C expands each image to N c different views acquired by the receiver coils by pixelwise multiplying with their respective spatially varying complex-valued sensitivity maps c c ∈ C N , often normalized such that for each pixel ( As the inverse problem of obtaining p * is typically ill-posed, instead the regularized problem with a regularizer R has to be considered.

B. Differentiable Optimization
In many computer vision tasks, data-driven approaches with differentiable layers solving an inner optimization problem within a larger neural network have been used successfully [30]- [32].Yet so far, in medical imaging, optimization of an inner problem is mainly only used as data-consistency layers in unrolled reconstruction networks [10] for linear problems.
In general, to incorporate optimization problems as a layer into a larger network, which can then be trained end-toend with gradient descent algorithms, the gradients of the solution of the inner problem with respect to the inputs must be calculated.For data-consistency layers in linear problems, this can, for example, be achieved by automatic differentiation through the steps performed by the inner optimizer if each operation is differentiable [9].The downsides are 1) all intermediate results have to be kept available, resulting in a linear dependence of the memory required during training on the number required iterations, and 2) algorithms for solving nonlinear problems typically contain non-differentiable operations [23].Alternatively, for linear problems, the gradient of the output with respect to the previous solution estimate can be efficiently calculated with matrix calculus [10], [33].A more general approach to obtain gradients through an optimization layer uses a technique known as implicit differentiation [34].We first revise the concept before applying it in Section II-D.
Let F α : R n → R be the twice continuously differentiable objective of the inner optimization problem solved by the layer, α ∈ R p the parameters to backpropagate the gradient for and f : R p × R n → R n with f (α, x) = ∇ x F α (x) be the gradient of the objective function.For some x * ∈ R n to be a minimizer of F, the condition has to be fulfilled.The well-known implicit function theorem (IFT) in a suitable notation [35], [36] states: Theorem 1 (Implicit Function Theorem): Let f : R p ×R n → R n be a continuous differentiable function, α 0 ∈ R p and x 0 ∈ R n such that f (α 0 , x 0 ) = 0 with non singular Jacobian ∂f ∂x (α 0 , x 0 ) ∈ R n×n .Then, there exist an open set S ⊂ R n with α 0 ∈ S and a unique continuously differentiable function x * : S → R n such that x * (α 0 ) = x 0 and f (α, x * (α)) = 0 for all α ∈ S. Hence, Equation (4) implicitly defines the minimizer x * as a function of α.Differentiating both sides of Equation (4) yields where x * without explicit dependency on α it is treated as a fixed value.By rearranging we obtain an expression for the Jacobian of the solution mapping, i.e Therefore, by the chain rule, given the gradient of an outer loss ∂L ∂x as a row vector at x = x * , the row vector ∂L ∂α at α = α 0 can be written as Finally, dropping the explicit points of evaluation in the notation, we obtain the column vector where we have used the symmetry of the Hessian ∂ 2 F ∂x∂x T .This gives the general blueprint to implement the backpropagation step for optimization layers in deep-learning frameworks.Either the analytical Hessian or the vector-Hessian-product functionality of the framework can be used in conjunction with an iterative solver for linear problems to approximately obtain (9).For an approximate solution with an error with norm ϵ, it has been shown that the error of the estimated gradient is O(ϵ) [37].The derivative with respect to the parameters of the gradient of the inner optimization loss used in Equation ( 9), ∂ 2 F ∂α∂x , can also either be calculated analytically or by autograd functionality.

C. Proposed Reconstruction Network PINQI
For solving the qMRI inverse problem, we propose the following, physics-informed end-to-end approach PINQI: First, we choose the regularization in Equation ( 3) such that the objective is finding with p reg a regularizing prediction for the parameters.We introduce an auxiliary variable y := q(p) and include equality by a quadratic penalty constraint as well as an additional regularizing prediction for the images y reg , min , where all λ are positive regularization strengths.Next, we split Problem (11) into two subproblems which we solve in an alternating fashion [38] within our unrolled reconstruction network.
1) Linear Subproblem: Optimization in y: The subproblem of obtaining artifact-free qualitative images is similar to the problem commonly solved in MR image reconstruction networks [10] by the data-consistency blocks, but extended by the model-based [6]- [8] last term which penalizes a discrepancy between reconstructed qualitative images and predictions based on the estimate for the quantitative parameters.As we consider multi-coil imaging, the minimizer of Equation ( 12 We alternate between solving two subproblems, Problem 1 is a linear data-consistency problem and solved by a differentiable conjugategradient block.Subproblem 2 is solved by a differentiable non-linear optimization block.Y θ denotes a residual UNet operating on qualitative images, P θ the parameter prediction UNet.The predictions of these subnetworks are used as regularizers with (learnable) strength λy and λp, respectively.Consistency between both subproblems is relaxed to a quadratic penalty weighted by λq.For more details regarding the formulations of the two subproblems, see the main text.
2) Non-Linear Subproblem: Optimization in p: By introducing λ p = λp /λq, the second subproblem can be written as finding Due to the non-linear signal function q, this subproblem is non-linear.Hence, we introduce the non-linear optimization layer depicted in Figure 3.This layer uses L-BFGS [22], [23] to approximately solve the problem in the forward pass.
Depending on the concrete signal model, within this layer, different solvers and preconditioning techniques might be chosen [39] instead.Finally, we construct our proposed PINQI as shown in Figure 2 by alternating between both subproblems for a fixed number of iterations.In each iteration i = 1 . . .T , we obtain predictions for the qualitative, artifact-free images y i and for the quantitative parameters p i .For regularization, we use predictions obtained by trained subnetworks with parameters θ ∈ R n as y i reg = y i θ = Y θ (y i−1 , i) and p i reg = p i θ = P θ (y i , i), respectively.All learnable parameters of the UNet subnetworks are shared between iterations.

D. Gradients of the Subproblem Solutions
We propose to train PINQI end-to-end, i.e. we construct an objective function L which depends on the final predicted real-valued quantitative parameter maps.Also, we aim to use gradient descent-based algorithms to optimize the learnable network weights θ and use backpropagation to find the direction of the steepest descent.Thus, we need to be able to calculate the gradients of the solutions found by the solvers of both subproblems with respect to all variables depending on θ.We achieve this by implementing the solvers as differentiable optimization layers as presented in Section II-B.
First, we specialize the general concept to a regularized nonlinear regression problem, i.e. an inner problem of obtaining p * = arg min p F(p) with an objective as used in subproblem 2. Suppose the gradient of the inner objective at x * is continuously differentiable and its Jacobian is invertible.Then, the required gradient of L with respect to the trainable weights can simply be found by applying Equation (9).The resulting equations for the propagation of gradients to λ, p reg and y are shown in Figure 3, which summarizes our proposed non-linear regression layer for this subproblem.Within this layer, we use CG to approximate g, which denotes the product of the inverse Hessian of F with the gradient of L with respect to the output of the solver.We employ automatic differentiation to compute the Hessian-vector product applied inside the CG algorithm, as well as to compute the derivative (with respect to θ) of the gradient (with respect to p) of the inner loss.In Section III-D we investigate the influence of including a single instance of our differentiable non-linear optimization layer at the end of an otherwise unchanged qualitative image reconstruction network.
Similarly, backpropagation through the solver of the linear image reconstruction subproblem 1 can also be efficiently calculated, as demonstrated in [10] for gradients with respect to y reg .Extending upon these results, we use implicit differentiation to derive the gradients with respect to all inputs: For an inner optimization objective of the form and a solution y * ∈ C N , that fulfills the necessary condition ∇F (y * ) = 0, application of Equation ( 9) and defining gives the gradients of the outer loss (in Wirtinger calculus [40]): For multi-coil MRI, the linear operator defining the data recorded in a single acquisition by the j-th coil is of the form A j = SFC j , and, can be used to backpropagate the gradient of the outer loss to the j-th estimated unnormalized sensitivity map c j .Here ⊙ denotes the Hadamard product, g the complex conjugate of g, and k j the data recorded by the j-th coil.

III. APPLICATION TO T 1 -MAPPING
The proposed PINQI can be used for many quantitative MR imaging techniques by adapting q to the respective signal model and A to the acquisition operator.
For our experiments, we chose a T 1 -mapping of the brain using a saturation recovery sequence as a typical, yet conceptually simple, qMRI problem.Here, the signal model q i at the i-th acquisition is given for each pixel by with Re M 0 and Im M 0 denoting the real and imaginary part of the complex initial magnetization M 0 , T 1 the longitudinal relaxation time, and τ i the i-th saturation recovery time, i.e. the delay between the magnetization preparation pulse and the data acquisition.As encoding operator A, we chose a Fourier transform with Cartesian sampling, 4-8-fold undersampling, and 8 receiver coils.
All code relating to a PyTorch implementation of PINQI, the optimization layer, our implementations of the reference methods, the synthetic data generation as well as the MR sequences will be made available after peer-review at github.com/fzimmermann89/PINQI.

A. Training Dataset and Data Aquisition
We utilized synthetic data for training and validation and conducted testing on both synthetic data and real scanner data.
1) Synthetic Data: The synthetic data was generated from the BrainWeb dataset [41], which consists of threedimensional segmentation masks of 20 healthy human brains, which we split into 16/1/3 for training/validation/testing.During the training process, we randomly assigned anatomically plausible values [21] for M 0 and T 1 to each of the 11 tissue classes in every sample on-the-fly.These values were combined with axial slices of the masks, resampled to 192 × 192 pixels, into initial synthetic parameter maps.To increase spatial variability and prevent overfitting to piecewise constant images, we augmented the T 1 -maps by multiplying them with low-variance 2D polynomials.Similarly, we augmented the M 0 -maps by a random spatially slowly varying complex phase and bias field.As additional augmentations during training, we performed flips and rotations <10°.The resulting parameter maps were considered ground truth labels.In addition, we generated sample-specific masks indicating the presence of brain tissue within the maps.During training, these masks were used to weigh down the losses outside the brain region.During the testing phase, all quantitative measures used to report the performance of the methods were restricted to the relevant brain tissue.
The saturation recovery times were set as either 0.5 s, 1 s, 1.5 s, 2 s, and 8 s (synthetic comparison, phantom, and invivo experiments), or as 0.5 s, 0.7 s, 0.9 s, 1.1 s, 1.3 s, 1.6 s, 2 s, and 8 s (ablation study).We used variable density 1Dunder-sampling with 8 randomly generated [42] bird-cagelike receiver coils with random orientation.For each coil, we used a Gaussian amplitude profile with randomly varying halfwidth in [0.2, 0.5] times the field-of-view and a random phase modulated by a slowly varying random 2D polynomial.The undersampling patterns were randomly chosen for each recovery time and for each sample.Finally, for each sample, we added complex Gaussian noise with randomly chosen standard variation σ ∈ (0.001, 0.04) to simulate noisy measurements.
Our synthetic validation and testing datasets consisted of a fixed set of generated labels and simulated noisy undersampled k-space measurements.Here we used the ground truth sensitivity maps, thus assume a fully known forward model.
2) Data Aquisition: For further validation of our proposed method, we acquired two sets of scanner data on a 3 T MRI system (Siemens Verio): 1) data from a physical phantom and 2) data from the brains of healthy volunteers.The examination was approved by the local ethics committee and is in accordance with the relevant guidelines and regulations.Written informed consent was received from all subjects prior to the examination.The phantom consisted of 9 tubes filled with liquids prepared to have T 1 times in the range of approx.250 ms-1750 ms.We used adiabatic saturation pulses, 5 saturation recovery times, spoiled GRE readouts with inside-out order and 1 mm×1 mm×8 mm resolution, 192 × 192 matrix size, 6 deg flip angle, a 32-channel head coil.For each saturation time, the 1D Cartesian undersampling was chosen randomly, analogously to our training.As a reference, we performed fully sampled saturation recovery measurements with 18 delay times and obtained the T 1 values by pixel-wise regression on the reconstructed images.Coil sensitivity maps were either estimated from the autocalibration region, i.e. the fully sampled 12/10/8 central lines, of the 4x/6x/8x undersampled image with the longest τ , or, for the reference, from the fully-sampled image [43]- [45].All MR sequences were implemented using the vendor-agnostic Pulseq framework [46].

B. PINQI: Implementation Details
We set the number of alternations between the subproblems in our implementation of PINQI as T = 5 (empirically chosen).In the non-linear subproblem, we used the L-BFGS algorithm [22], [23] with a trigonometric transformation of the variables [39] to enforce the bounds Re(M 0 ) ∈ (−2, 2), Im(M 0 ) ∈ (−2, 2), and R 1 := 1/T 1 ∈ (−1s −1 , 20s −1 ).These bounds are much wider than any plausible predictions and only served to increase stability at the beginning of training.We initialized the solver in the first iteration with p reg , in subsequent iterations with the result obtained in the previous iteration.The linear problems in Equation ( 12) and Equation ( 9) were approximately solved by CG [23] with the norm of the residual < 10 −6 as stopping criterion.Both regularizing networks, P θ and Y θ , are UNets [26] with residual blocks, each consisting of two SiLU [47] activated convolutions (window size 3) and group normalizations (group size 16) [48].In Y θ , the two convolutions of each block handle all temporal points as batched samples, and a third convolution operates along the temporal direction, handling all spatial points as batched samples [49].In both UNets, we condition on the iteration of the unrolled reconstruction by performing learned projections to scale and shift values for each feature map [50], [51] after the first convolution in each encoder and decoder block.The downscaling in the UNet is done with stride 2, kernel 2 max-pooling operations in the spatial dimensions.Each upscaling is performed with 2x bilinear interpolation in the spatial dimensions followed by a 3x3 convolution.The number of output features at the different layers in Y θ is empirically chosen as (16,32,48,64), and as (32,64,96,128) in P θ .This results in 597,323 and 2,235,308 learnable weights, respectively.We found initializing the network such that each block mainly operates pixel-wise [52] improved training stability.
All regularization strengths λ p,i , λ y,i , and λ q,i are iterationdependent learnable parameters.We enforced the positivity of these parameters by setting λ i = log 1  5 (1 + exp{5 λi }) and optimizing for each λi .We empirically chose an initialization corresponding to λ p,i = 3, λ y,i = 0.1, and λ q,i = 0.1 + 0.05i.The training was performed by minimizing the MSE between the predicted quantitative parameters, R 1 = 1/T 1 , Re(M 0 ), Im(M 0 ), and the corresponding target parameters.Deep supervision [53], [54] was applied by also incorporating the MSE loss of the predicted parameters during all previous iterations, strongly weighted down by a factor of 0.05.We pretrained P θ for one epoch with random linear interpolation between the zero-filled reconstructions of the noisy k-space data and the ground truth (noise-and artifact-free) qualitative images as input with MSE loss on the estimated parameters.
The training was performed for 80 epochs.Both training phases were performed using the Adam optimizer [55] with a maximum learning rate 0.004 for the UNet parameters and 0.001 for all λ's, a linear warmup and cosine learning rate schedule [56], weight decay [57] 0.01, and a batch size of 16.

C. Methods of Comparison
Besides our proposed PINQI approach, we provide results of our implementations of four recently published learned reconstruction methods trained and tested on the same dataset.As the general superiority of the selected learned comparison methods to non-learned methods has been established by the respective authors [10]- [14], we abstain from comparing against non-learned baselines.a) DeepT1 [12]: Our reimplementation of DeepT1 tries to stay as close as possible to the provided description in the original publication while adapting to our synthetic data and the saturation recovery signal model.The task of magnitudeonly image reconstruction by an unrolled reconstruction uses a recurrent CNN (297,538 learnable weights).We replaced the original data-consistency layers, which were only suitable for a single coil setting, with CG-based ones.The parameter estimation from the reconstructed magnitude images was done with a UNet (20,547,906 learnable weights).
b) CoRRECT [13]: We implemented the network of CoRRECT as described in the original publication, using a CNN (187,010 learnable parameters) in the unrolled image reconstruction and a UNet (46,741,698 parameters) for the parameter estimation.We adapted the method to our setting.Besides a supervised reconstruction loss and a self-supervised parameter estimation loss, we also included a supervised MSE loss of the quantitative parameters.This addition was motivated by the availability of ground truth labels in our setting and improved the results on our validation set, ensuring a fair comparison.For combining the loss terms, we empirically found weighting factors of 1, 0.1, and 1, respectively.c) MANTIS [14]: We adopted the MANTIS method for T 1 -mapping and training on our simulated dataset.As MAN-TIS takes zero-filled magnitude images as input, it cannot predict the phase of M 0 .The particular UNet architecture proposed for MANTIS uses 29,248,258 learnable parameters.We used the proposed combination of the MSE of the quantitative parameters and in k-space as loss and tried different weighting of the latter.Although we noticed only a minimal influence on the performance on the validation set, we report results for a network trained with the optimal weighting found in our experiments of 0.01.d) DOPAMINE [11]: We modified the DOPAMINE method for our signal model and supervised training.Our implementation performs 10 iterations of unrolled gradient descent.Each overall stepsize and each weighting of the neural network predicted step were scalar trainable parameters with softplus enforced positivity.The two regularizing CNNs and the network for starting point prediction combined had 317,894 trainable parameters in total.The training was performed on MSE of p.As the training was highly unstable, we slightly modified DOPAMINE by soft-clipping each update in the unrolled gradient descent via a scaled tanh.A clipping threshold higher than the parameters' dynamic range over the training data was sufficient to stabilize training.We emphasize that during inference on the validation set, each update step proposed by the network was much smaller than allowed by the clipping, thus our modification should not negatively influence test performance.

D. Ablation Study
Ablations of different parts of our proposed unrolled network are used to demonstrate the incremental benefits of the major components of our method.For each ablation, the respective learning rate of the parameters of the UNet was tuned based on the validation set MSE.The study was performed for eight acquisition times and 8-fold undersampling.We use an unrolled model-based image reconstruction with a UNet-predicted regularization (5 iterations of network application and data-consistency) [9], [10].The quantitative parameters are predicted by a separate UNet.Both UNets have the same architecture as the corresponding ones in PINQI.We performed a pretraining of the reconstruction network for 5 epochs (optimizing on MSE of the complex-valued qualitative images) followed by end-to-end optimization for 60 epochs.This baseline approach does not use the non-linear signal model q at inference time but benefits from architectural improvements to the CNNs over the comparison methods.We remove the proposed differentiable non-linear optimization layer for subproblem 2 from our network and instead directly consider the network prediction of the parameters as the solution of Equation ( 13).Note that we still iterate between the subproblems and train end-to-end.This ablation results in a learned reconstruction scheme similar to, for example, PGD-Net [28].
e) Gradient Descent: Instead of the proposed L-BFGSbased optimization with implicit gradient calculation, we perform 5 steps of gradient descent on Equation (13) in each alternation.We use standard backward-mode autograd to obtain the gradients of the steps and make the stepsizes for each alternation trainable parameters.The memory consumption during training is approximately 10% higher than in PINQI.
f) Fixed p reg and y reg across all iterations: We only apply the image-space and parameter-space UNets once, i.e. p reg = P θ (y 0 ) and y reg = Y θ (y 0 ), instead of updating the UNet predictions in each iteration of the unrolled reconstruction.While the number of training parameters stays constant, this significantly reduces the computation necessary in a single forward pass.We keep the number of iterations T = 5, iterating between both subproblems to solve Equation (11).
g) Single Iteration / Two Iterations: We set either T = 1 or T = 2, performing either a single iteration or two iterations of PINQI, respectively.Note, that T = 1 differs from the experiment in f), where we still alternate between solving two subproblems.
For further investigation of the advantage of incorporating differentiable non-linear regression into the reconstruction, we used the same UNet architecture as used in PINQI for Y θ also in an unrolled reconstruction of the qualitative images y and compare the following two cases: h) Neural Network Reconstruction and Separate Regression: We optimized the network by minimizing the MSE of the qualitative images.At inference time, we used the resulting reconstructed images to perform a non-linear least-squares regression using BFGS to obtain p.
i) Neural Network with Non-linear Optimization Layer for End-to-End Regression: We include the non-linear regression by means of the proposed optimization layer (Figure 3 in the network.As a result, we were able to train the network in an end-to-end fashion, minimizing the MSE of the quantitative parameters.

A. Comparison with Reference Methods
The results of training PINQI and the comparison methods on the synthetic dataset and application on the test set are shown in Figure 5.Our proposed PINQI method achieves at the highest acceleration factor we investigated, 8-fold, a normalized root mean squared error (nRMSE) of the T 1 maps of less than 0.10 whereas the lowest nRMSE achieved by one of the comparison methods is 0.13 (DeepT1 and the similar performing CoRRECT) and both MANTIS and DOPAMINE have nRMSE exceeding 0.2.The mean absolute error (MAE) of T 1 is again lowest for PINQI (0.05) compared to 0.07, 0.15, and 0.13 for DeepT1, MANTIS, and DOPAMINE, respectively.In terms of the Structural Similarity Index (SSIM) [58] (calculated with 7×7 pixel uniform windows, only considering windows fully inside the brain), PINQI achieves the best, i.e. highest, result of 0.939 compared to the other methods.Examples of parameter maps for T 1 as well as the magnitude of M 0 obtained for a random slice of the test dataset are shown in Figure 4. Here, only PINQI was able to resolve the fine details.At lower undersampling factors, PINQI also achieves superior results compared to all comparison methods.For example, at 4-fold undersampling it yields 0.073/0.042/0.961 in terms of nRMSE/MAE/SSIM compared to the second best results of 0.091/0.054/0.938.

B. Ablation Study
The performance results obtained from the ablations are summarized in terms of nRMSE and MAE of T 1 in Figure 6.The greatest increase in both MAE and nRMSE compared to PINQI as proposed is observed if, instead of the proposed differentiable optimization layer, gradient descent steps with a learned step size is used within the network (Figure 6, e)).This showcases the importance of the proposed optimization layer for PINQI.The next biggest increases are if either the UNets are only applied once to obtain p reg and y reg independent of the iteration of the unrolled scheme (Figure 6, f)), the iteration number is drastically decreased to T = 1 (Figure 6, g)), or the parameter regularization network is completely removed (Figure 6, b)).These observations highlight the importance of learned regularization.The removal of the L-BFGS solver in the non-linear subproblem 13, corresponding to setting λ i p = ∞, resulting in p i = p i θ , also severely degrades the performance.In this ablation, the composition of parameter estimation UNet and q can be interpreted as a learned proximal mapping, as used in other unrolled reconstruction methods [28].We observed further degradation by full removal of the explicit knowledge about the signal function q, i.e.only solving the linear image reconstruction problem with Fourier-space data-consistency and using a UNet for parameter prediction, similar as in other recent methods [12], [13].
The comparison between a learned reconstruction of qualitative images with the separate parameter regression as a post-processing step (Figure 6, h)), and the inclusion of the regression in the network with end-to-end training utilizing the proposed differentiable optimization layer (Figure 6, i)), shows a reduction of the mean nRMSE of the T 1 -maps of >20%.

C. Phantom and In-Vivo Application
We applied the final trained network to the data acquired from the physical phantom and used the reference measurement to obtain the mean T 1 for each region-of-interest (ROI) for comparison.The RMS deviation between our proposed method and these reference values was 35 ms, the full results for all nine ROIs are shown in Table I.
To qualitatively demonstrate the applicability of the proposed method to real, unseen in-vivo measurements, we present the results of the volunteer study in Figure 7.Even though the acquisition was severely undersampled, the network predicts T 1 -maps in agreement with the fully-sampled references and only minor artifacts remain.

V. DISCUSSION
We have demonstrated the superiority of PINQI for saturation recovery T 1 -mapping compared to four current stateof-the-art qMRI reconstruction techniques, as evidenced by the lower T 1 -errors on the synthetic test dataset (Figure 5).Given that only T 1 holds clinical significance among the parameters obtainable in a saturation recovery sequence, and considering that most of the compared methods do not provide access to the phase of M 0 , our quantitative comparison here was primarily focused on T 1 .The methods used for comparison, MANTIS, DeepT1, CoRRECT, and DOPAMINE all have their own characteristics and limitations.MANTIS utilizes a fully learned mapping from magnitude qualitative images with artifacts to quantitative parameter maps, but only incorporates the physical model during training.DeepT1 and CoRRECT enforce data-consistency between qualitative images and recorded k-space data, but do not enforce consistency between the final predicted parameter maps and recorded  data at inference time.DOPAMINE explicitly uses knowledge about the non-linear signal model during inference, but does not employ image-space regularization and can use only shallow CNNs for parameter-space regularization due to the high number of iterations necessary.In contrast, PINQI incorporates data-consistency for the non-linear signal model through nonlinear optimization layers, allowing for the inclusion of prior knowledge about the underlying physics at each iteration of the unrolled reconstruction.Additionally, PINQI makes use of both image-and parameter-space regularization and explicitly formulates the influence of all UNets as a learned regularizer.In total, these distinctions position PINQI as a unique method in qMRI reconstruction.Note that for a fair comparison, we reimplemented and adjusted the comparison methods to our dataset and signal model, as well as optimized hyperparameters rather than relying on the choices of the respective works.We were unable to use DOPAMINE as published, as training was highly unstable.We speculate, that the combination of an iteration-dependent but data-independent stepsize factor and a first-order gradient step without any thresholds was causing high susceptibility to even minor outliers.We had to modify the training to overcome this issue.During training with a batch size of 8, our PyTorch implementation of PINQI utilizes approximately 36 GB of memory in our specific configuration.This is comparable to our implementations of CoRRECT (34 GB), DeepT1 (23 GB), and DOPAMINE (17 GB), but significantly higher than MANTIS (2 GB).It is crucial to note that these figures are highly dependent on the implementation details and the nature of the problem.Therefore, they should be interpreted as rough guidance for estimating memory consumption.
In our ablation study, we highlighted the importance of both image-and parameter-space regularization networks, as well as the utilization of the non-linear optimization layer and the signal model.Removal of any of these major components resulted in severe degradation, as summarized in Figure 6.By employing an unrolling approach and alternating between the two subproblems, we were able to update the predictions used as learned regularizations in each iteration, which proved to be highly beneficial compared to a single application of the UNets.This suggests, that the UNets were able to make use of the gradually restored information within the unrolled algorithm.Moreover, we have demonstrated that the addition of a single non-linear optimization, as a final layer, to an otherwise unchanged learned reconstruction, can significantly reduce the error in the resulting parameter maps compared to performing a separate regression as a post-processing step outside the network (see Figure 6, End-to-End Regression (i) vs. Separate Regression (h)).This approach can be seamlessly integrated into existing reconstruction methods, enabling training with an objective that aligns more closely with the true objective in quantitative MRI.Essentially, it can be viewed as a task-specific loss function, which effectively penalizes reconstruction errors of the intermediate qualitative images that contribute to poor final regression results.Unlike a simple L p loss on the qualitative intermediate images, this approach allows for more efficient learning by focusing on important features [59], [60].We emphasize the potential of adding a single differentiable regression layer to both existing and future quantitative imaging methods as a straightforward means of incorporating knowledge about the signal model.As the implicit differentiation-based numerical gradient calculation only provides an approximation in a neighborhood of the possibly local, minimum obtained by the inner optimization, the usage may necessitate additional regularization, as employed in PINQI.Additionally, gradient noise might be an issue for certain problems.
Finally, we demonstrated that PINQI, even when trained solely on synthetic data, can be successfully applied to real data.This was confirmed both quantitatively, with low deviations from the reference measurements of the physical phantom in Table I, and qualitatively, through the reconstruction of nearly artifact-free maps from undersampled data in Figure 7.In contrast, the comparison methods cannot faithfully reconstruct the parameter maps in this setting with significantly higher undersampling than in the original publications and without specifically acquired in-vivo training data.In our synthetic evaluation, the data-consistent reconstruction methods were able to utilize the ground truth forward model, specifically q and the true sensitivity maps used the acquisition operator A, whereas in the tests on real scanner data, these are not available.Here, we also acknowledge the possibility of self-supervised fine-tuning of our method for adaptation from the assumed forward model during training to the partially unknown and potentially different true forward model during testing, or a shifted distribution of the parameter maps [61]- [64].In particular, the sensitivity maps estimated from undersampled data could be enhanced by an additional subnetwork [9].However, for the sake of simplicity and comparability, we refrained from implementing such modules in our network or in any of the comparison methods.Nevertheless, Equation (18), obtained through implicit differentiation, would facilitate an efficient extension.
Although the formulation of an optimization objective with learned regularization provides some interpretability compared to a direct learned mapping of the quantitative parameters, as in, for example, CoRRECT, MANTIS, or DeepT1, the inner workings of PINQI remain largely a black-box.The unrolling, the repeated application of the subnetworks, and the inner optimizers increase the memory requirements during training as well as the computational requirements at inference time of PINQI compared to the comparison methods, potentially limiting its application in settings without suitable accelerator hardware or in 3D acquisitions.Notably, in our synthetic setting, we demonstrated that the results of the proposed PINQI at an acceleration factor of 8 are superior to those obtained by all other comparison methods at 4-fold acceleration.Despite the increased reconstruction time (2.3 s for 8 slices, using an Nvidia A6000) compared to, for example, DeepT1 (0.6 s) and CoRRECT (0.3 s), opting for faster acquisition at lower errors can be an acceptable trade-off.Finally, PINQI requires a known, twice-differentiable closed-form signal model, which may limit its direct application to techniques such as magnetic resonance fingerprinting.
While we have only demonstrated the application of PINQI to Cartesian undersampled T 1 -mapping, the splitting of PINQI can be trivially adapted to many inverse problems of the form described by Equation ( 3) by adjusting the non-linear signal model and the linear acquisition operator.Thus, while its effectiveness for other signal models remains to be further explored, PINQI represents a general physics-informed network for many quantitative imaging inverse problems.

VI. CONCLUSION
Our proposed method, PINQI, presents a novel approach to quantitative image reconstruction by combining unrolled optimization, differentiable optimization layers, and learned regularization to fully utilize prior knowledge regarding the underlying physics.On a representative qMRI task, T 1 mapping of the brain, we demonstrated superiority compared to three established data-driven reconstruction methods on a synthetic test dataset.Finally, we showed that our model, while trained on synthetic data alone, is transferable to in-vivo data.
Figure1: The problem to be solved in quantitative MRI is obtaining maps of physical parameters from undersampled measurements in Fourier space.Most previous methods consider the two steps of A) reconstructing artifact-free qualitative images and B) obtaining the parameter maps as two disjoint and independent steps (see text for details).We propose a novel end-to-end method making use of prior knowledge about the physics of data acquisition and learned regularization.

Nci
|c i | 2 = 1.A qMRI reconstruction aims to obtain the tissue parameters p from the acquired data k.For uncorrelated Gaussian noise e, the maximum likelihood estimate is p * = arg min p ∥Aq (p) − k∥ 2 .

Figure 2 :
Figure 2: Schematic of the unrolled physics-informed network to solve Equation (10) by quadratic splitting as used by our proposed PINQI.We alternate between solving two subproblems, Problem 1 is a linear data-consistency problem and solved by a differentiable conjugategradient block.Subproblem 2 is solved by a differentiable non-linear optimization block.Y θ denotes a residual UNet operating on qualitative images, P θ the parameter prediction UNet.The predictions of these subnetworks are used as regularizers with (learnable) strength λy and λp, respectively.Consistency between both subproblems is relaxed to a quadratic penalty weighted by λq.For more details regarding the formulations of the two subproblems, see the main text.

Figure 3 :
Figure 3: Proposed non-linear optimization layer, finding p * = arg min p F(p) with an off-the-shelf solver while allowing backpropagation of the gradients (red) to the regularization parameters, λ and preg, and data y.

Figure 4 :
Figure 4: Examplary results of the different methods for one simulated measurement from the test dataset at 8-fold acceleration.More details regarding the different methods are provided in the text.For each method, the magnitude of M0 and deviation from the ground truth are shown in the top two rows.The calculated T1 and deviation from ground truth are shown in the bottom two rows.
b) No Image-Space Regularization: We set λ y = 0 and remove Y θ from PINQI to investigate the importance of the learned regularization in image-space while retaining the regularization in the non-linear subproblem.The training setup and all other parameters remain the same.c) No Parameter-Space Regularization: We set λ p = 0 and remove P θ from PINQI, effectively removing the learned regularization in parameter-space.Instead, we only perform a non-regularized regression on the signal model in the nonlinear subproblem 2. All other parameters remain unchanged.d) No Non-Linear Solver:

Figure 6 :Figure 7 :
Figure 6: Results of the ablation study for T1, highlighting the importance of iteratively applying both regularizing UNets as well the inclusion of the signal function via a non-linear regression layer.Note, here Nr = 8 and the saturation times τ differ from those used in Figure 5.For further details, see the full descriptions of the experiments in Section III-D with corresponding captions.

Table I :
T1-values (in s) and standard deviations over ROIs obtained for the physical phantom by the fully-sampled reference method with 18 saturations delays compared to PINQI with 8-fold undersampling and 5 delays.
[14]arison of our proposed PINQI with four different state-of-the-art learned qMRI reconstruction methods[11]-[14]in terms of nRMSE, MAE, and SSIM of T1 for each sample of the test set at 4-fold (darkest, bottom), 6-fold and 8-fold (brightest, top) undersampling.The mean values over all samples at 4-fold/6-fold/8-fold undersampling are provided as labels.PINQI improves upon all of the comparison methods in all three metrics.