Joint $\text{B}_{0}$ and Image Reconstruction in Low-Field MRI by Physics-Informed Deep-Learning

<italic>Objective:</italic> We present a model-based image reconstruction approach based on unrolled neural networks which corrects for image distortion and noise in low-field (<inline-formula><tex-math notation="LaTeX">$B_{0} \sim$</tex-math></inline-formula> 50 mT) MRI. <italic>Methods:</italic> Utilising knowledge about the underlying physics, a novel network architecture (SH-Net) is introduced which involves the estimation of spherical harmonic coefficients to guarantee a spatially smooth field map estimate. The SH-Net is integrated in an end-to-end trainable model which jointly estimates the <inline-formula><tex-math notation="LaTeX">$B_{0}$</tex-math></inline-formula>-field map as well as the image. Experiments were conducted on retrospectively simulated low-field data of human knees. <italic>Results:</italic> We compare our model to different model-based approaches at distinct noise levels and various <inline-formula><tex-math notation="LaTeX">$B_{0}$</tex-math></inline-formula>-field distributions. Our results show that our physics-informed neural network approach outperforms the purely model-based methods, improving the PSNR by up to 11.7% and the RMSE by up to 86.3%. <italic>Conclusion:</italic> Our end-to-end trained model-based approach outperforms existing methods in reconstructing images and <inline-formula><tex-math notation="LaTeX">$B_{0}$</tex-math></inline-formula>-field maps in the low-field regime. <italic>Significance:</italic> Low-field MRI is becoming increasingly popular as it enables access to MR in challenging situations such as intensive care units or resource-poor areas. Our method allows for fast and accurate image reconstruction in such low-field imaging with <inline-formula><tex-math notation="LaTeX">$B_{0}$</tex-math></inline-formula>-inhomogeneity compensation under a wide range of environmental conditions.


I. INTRODUCTION
ULTRALOW-FIELD magnetic resonance imaging (MRI) has recently experienced a surge in popularity. Portable, potentially low-cost scanners are easier to operate and maintain, making them particularly interesting for use in various clinical and research settings [1], [2], [3], [4], [5], [6], [7]. The first scanners have received FDA and MDR approval for examinations of the brain [8], [9], while other systems focus on portable imaging of the extremities [10], [11]. Due to the low field strength (B0 ∼ 50 mT), scanners can be moved into emergency units and operating rooms during surgery. First evaluations showed potential clinical applications in stroke diagnosis [12]. A complete low-field scanner can be constructed for a material cost of about €20,000 [13], and first open-source prototypes have already been constructed [14], [15], [16]. This will improve global access to this critical diagnostic modality, which is currently not sufficiently available to half of the world's population [17].
The main challenges of portable low-field MRI for diagnostic imaging include 1) low signal-to-noise ratio (SNR) [17], [18] due to the reduced magnetic field strength and 2) strong B0-field inhomogeneities caused by magnet design, material imperfections and manufacturing tolerances [1], [19], [20]. The SNR can be improved by averaging repeated acquisitions. However, this approach leads to excessively long scan durations, which hinders clinical application. Geometric distortions in the image caused by strong field inhomogeneities can be corrected, but a B0-map describing the field inhomogeneities must be known. In comparison to standard clinical scanners with a field strength of 3 T, the relative B0-field inhomogeneities observed in low-field MRI are approximately 1000 times larger [21] and the SNR is roughly 15 times lower [22]. Consequently, existing B0-field map estimation techniques developed for high-field MRI are not directly suitable for low-field applications.
A method targeting the joint estimation of image and field map at a low field strength of about 50 mT was proposed in [19]. Models for B0-field reconstruction based on the relative phase difference of two acquisitions and for B0-informed image reconstruction were derived. Image and field map estimation is carried out using a model-based, alternating optimization scheme. Despite promising results, the reconstruction is computationally expensive.
Recently, methods have utilized machine learning to estimate the B0-field map using neural networks (NNs) [23], [24]. In [23], the authors proposed a residual convolutional NN that estimates an undistorted complex image from a single distorted, complex-valued input image. The authors of [24] used a very similar network architecture but estimate the field map from the complex image data, which is then used in a subsequent model-based reconstruction to correct for the distortions. Although neither method was applied in the low-field regime, it has already been shown that the strong B0-field inhomogeneities expected in low-field MRI can be estimated by NNs [25].
Apart from B0-distortion correction, NNs have shown promising results in denoising MRI acquisitions [26], [27]. The authors of [28] presented an end-to-end deep-learning model to reconstruct images with enhanced SNR from low-field k-space data. In [29], a dual-acquisition 3D super-resolution method for low-field MRI is proposed which suppresses noise and artifacts. A more general solution applicable to various medical imaging modalities is suggested in [30], in which the authors employ a recurrent residual U-Net for denoising. Nevertheless, all these approaches rely exclusively on the suggested deep-learning model, without incorporating any information about the underlying physics. Unrolled optimization has proven to be a robust, flexible, and powerful deep-learning technique in which the NN learns the regularization term of an image reconstruction problem [31], [32].
We present a novel approach in which an end-to-end trainable model-based reconstruction operator, describing the physics of the data acquisition process including B0-inhomogeneities, is integrated with physics-informed NNs. Our approach combines model-driven and data-driven reconstruction on low-field MRI data for the first time. The field inhomogeneities are parameterised with spherical harmonic (SH) coefficients to guarantee a spatially smooth field map estimate. This approach yields images from low-field MR raw data which are both free of distortion artifacts and have high SNR.

II. PROBLEM FORMULATION
The forward problem can be described by

y = E(x, ω) + ε,   (1)

where the forward operator E : C^N × R^N → C^N maps the (unknown) complex-valued ground-truth image x ∈ C^N and the (unknown) real-valued static magnetic field deviation ω ∈ R^N to a set of measurements y in k-space. The complex-valued Gaussian noise is denoted by ε. We restrict ourselves to equispaced, two-dimensional single-coil Cartesian acquisitions, where the number N of acquired k-space sample points equals the number of image voxels. The values of the discrete vector ω are given in radians per second, which equals the frequency deviation in Hertz multiplied by 2π. We consider spin-echo acquisitions, where the echoes are shifted by t_shift. More precisely, the time-dependent, discrete signal equation in (1), considering field inhomogeneities ω and image x for the discrete time points t = [t_1, t_2, ..., t_N]^T ∈ R^N, is given by

y_m = Σ_{n=1}^{N} x_n e^{−iω_n (t_m + t_shift)} e^{−i k_m · r_n} + ε_m,   m = 1, ..., N.   (2)

Hereby, x_n = x(r_n) and ω_n = ω(r_n) depend on the spatial voxel location r_n and the k-space trajectory k_m := k(t_m), while the acquired signal y ∈ C^N, with y_m := y(t_m), depends on the vector t of sample time points.
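As an illustrative aside (not part of the original text), the discrete signal model (2) can be evaluated directly; with ω = 0 and a uniform Cartesian trajectory it reduces to a plain DFT. All names below are ours, and the 1-D formulation is a simplification of the 2-D acquisition.

```python
import numpy as np

def forward_signal(x, omega, k, r, t, t_shift):
    """Discrete spin-echo signal model of Eq. (2), 1-D sketch.
    x, omega: (N,) image and field map; k, t: (N,) trajectory and sample
    times; r: (N,) voxel positions. Returns the noise-free k-space signal."""
    N = x.size
    y = np.empty(N, dtype=complex)
    for m in range(N):
        # phase from field inhomogeneity plus spatial encoding
        phase = np.exp(-1j * omega * (t[m] + t_shift)) * np.exp(-1j * k[m] * r)
        y[m] = np.sum(x * phase)
    return y
```

For ω = 0 and k_m = 2πm/N, r_n = n, this coincides with the unnormalized DFT, which provides a simple sanity check of the implementation.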

A. Forward Operator
Due to the dependency on space and time, the operator E in (1) becomes computationally expensive to evaluate. To overcome this problem, we introduce an approximation of the term e^{−iω_n t_m} based on time segmentation as proposed in [33], which is computationally more favorable. The approximation is given by

e^{−iω_n t_m} ≈ Σ_{l=1}^{L} c_{l,m} e^{−iω_n t̂_l}.   (3)

This reduces the number of Fourier transforms from M to L. The acquisition time is divided into uniform time segments; accordingly, we define the vector t̂ ∈ R^L of equidistant segment time points t̂_l. The approximation is obtained as a linear combination with the coefficients c_{l,m}, which are introduced later in this subsection. For a fixed x, we define the function

f_x(ω) := e^{−i (t̂ ⊗ ω)} ∘ (1_L ⊗ x) ∈ C^{LN},

where 1_L denotes an L-dimensional vector of ones and the exponential function is applied component-wise. We denote the Kronecker product by "⊗" and the Hadamard product by "∘". The operator diag : C^N → C^{N×N}, diag(v) = Σ_{j=1}^{N} v_j e_j e_j^T, where {e_j}_{j=1}^{N} denotes the standard basis of the space R^N, transforms a vector into a diagonal matrix.
For fixed ω, we further define the linear operator B_ω by the matrix B_ω = [diag(e^{−iω t̂_1}), ..., diag(e^{−iω t̂_L})]^T ∈ C^{LN×N}. Note that the linear operator B_ω applied to x equals the evaluation of f_x at ω, i.e. we have f_x(ω) = B_ω x. This notation serves to highlight that the function f_x is non-linear with respect to the field map ω, while B_ω defines a linear operator with respect to the image x. The transformation from image to Fourier space is achieved by the construction of a block-diagonal Fourier operator F_L := I_L ⊗ F, where I_L denotes the L × L identity matrix.
In (2), the field inhomogeneities ω cause a phase accumulation. In the impulse response function, this phase term can be neglected if the field map varies slowly enough. The number of segments L is chosen such that the time segments are sufficiently small and this prerequisite is met [34], [35].
To compute the coefficients c_{l,m} in (3), we define the function γ(ω_n, t) := e^{−iω_n t} and the matrices Γ_ω ∈ C^{N×N} with entries γ(ω_n, t_m) and Γ^L_ω ∈ C^{N×L} with entries γ(ω_n, t̂_l). Both matrices contain the exponential expression from (3), whereby Γ_ω contains all time points and Γ^L_ω the segmented time points. The approximation in (3) for all n = 1, ..., N and m = 1, ..., N can be compactly written as the system Γ^L_ω C_ω = Γ_ω, whose solution is calculated as

C_ω = (Γ^L_ω)^† Γ_ω,   (5)

where (Γ^L_ω)^† is the Moore-Penrose inverse given by

(Γ^L_ω)^† = ((Γ^L_ω)^H Γ^L_ω)^{−1} (Γ^L_ω)^H.   (6)

The matrix C_ω from (5) contains L row vectors with N coefficients each, i.e. C_ω = [c_1, ..., c_L]^T with c_l ∈ C^N, l = 1, ..., L. To apply the coefficients to the Fourier-transformed vector of L images, we define C̃_ω = [diag(c_1), ..., diag(c_L)] ∈ C^{N×LN}. Note again that the time-segmented operator E^L(x, ω) depends on two variables. For fixed ω we define E^L_ω : C^N → C^N, and for fixed x we define E^L_x : R^N → C^N, by

E^L_ω x := C̃_ω F_L B_ω x,   (7a)
E^L_x(ω) := g(ω) F_L f_x(ω),   (7b)

where g(ω) = C̃_ω. Note that E^L_ω is linear with respect to x, while E^L_x is non-linear with respect to ω due to the functions f_x(ω) and g(ω). The operators from (7a) and (7b) will be used in the formulation of two sub-problems which we solve with the proposed reconstruction algorithm. The proposed reconstruction operator will be used in end-to-end training, which motivates the detailed notation.
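To illustrate the least-squares computation of the segmentation coefficients in (5), the following sketch (our own, with hypothetical variable names; the uniform placement of the segment times t̂_l is our assumption) solves Γ^L_ω C = Γ_ω with a pseudo-inverse:

```python
import numpy as np

def time_segment_coeffs(omega, t, L):
    """Least-squares time-segmentation coefficients, cf. Eq. (5).
    omega: (N,) field map [rad/s]; t: (M,) sample times.
    Returns C of shape (L, M) and the segment times t_hat of shape (L,)."""
    t_hat = np.linspace(t.min(), t.max(), L)          # uniform time segments (assumed)
    Gamma = np.exp(-1j * np.outer(omega, t))          # (N, M): all time points
    Gamma_L = np.exp(-1j * np.outer(omega, t_hat))    # (N, L): segmented time points
    # Moore-Penrose solution of Gamma_L @ C = Gamma
    C, *_ = np.linalg.lstsq(Gamma_L, Gamma, rcond=None)
    return C, t_hat
```

The quality of the approximation e^{−iωt_m} ≈ Σ_l c_{l,m} e^{−iωt̂_l} can then be checked by comparing Γ^L_ω C against Γ_ω.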

III. METHODS
We use the forward operator defined in Section II-A to formulate the reconstruction problem. Then we introduce the NNs used for regularization, and finally the solution strategies used within the unrolled optimization.

A. Reconstruction Problem
We formulate the reconstruction as a joint minimization problem over the field map ω and the image x,

(x*, ω*) ∈ arg min_{x,ω} ‖E^L(x, ω) − y‖²₂ + λ_x R_x(x) + λ_ω R_ω(ω),   (8)

where we use two different terms to regularize the problem with respect to the variables ω and x. The regularization of the image is achieved by R_x(x) = ‖x − x_NN‖²₂, i.e. by imposing the sought image x to be close to an NN-estimate x_NN. The regularization of the field map ω is analogously given by R_ω(ω) = ‖ω − ω_NN‖²₂, yielding

(x*, ω*) ∈ arg min_{x,ω} ‖E^L(x, ω) − y‖²₂ + λ_x ‖x − x_NN‖²₂ + λ_ω ‖ω − ω_NN‖²₂.   (9)

Solving (9) directly is challenging. Thus, a variable splitting approach is applied to obtain two simpler sub-problems, which are solved alternately.
1) Sub-Problem A: For fixed ω, the resulting sub-problem reads

x* ∈ arg min_x ‖E^L_ω x − y‖²₂ + λ_x ‖x − x_NN‖²₂,   (10)

where E^L_ω was defined in (7a). Because E^L_ω is linear with respect to x, by convexity the unique solution of (10) can be obtained by solving the system H_ω x = b_ω with

H_ω = (E^L_ω)^H E^L_ω + λ_x I,   b_ω = (E^L_ω)^H y + λ_x x_NN,

using a conjugate gradient (CG) method.
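A minimal CG sketch for the normal equations of sub-problem A (our illustration; `E`/`EH` stand in for the forward operator and its adjoint, which are supplied by the caller):

```python
import numpy as np

def solve_subproblem_a(E, EH, y, x_nn, lam, n_iter=5):
    """Conjugate gradient on H x = b with H = E^H E + lam*I and
    b = E^H y + lam*x_nn (regularized data-consistency step)."""
    H = lambda v: EH(E(v)) + lam * v
    b = EH(y) + lam * x_nn
    x = np.zeros_like(b)
    r = b - H(x)
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        Hp = H(p)
        alpha = rs / np.vdot(p, Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = np.vdot(r, r)
        if np.sqrt(abs(rs_new)) < 1e-12:   # early exit on convergence
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

For a scaled-identity operator the closed-form solution (E^H E + λI)^{−1}(E^H y + λ x_NN) is recovered exactly, which makes a convenient unit test.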
2) Sub-Problem B: For fixed x, the corresponding sub-problem reads

ω* ∈ arg min_ω ‖E^L_x(ω) − y‖²₂ + λ_ω ‖ω − ω_NN‖²₂.   (12)

Similar to [36], we consider a first-order Taylor approximation to linearly approximate E^L_x. In this context, J_{E^L_x}|_{ω=ω₀} denotes the Jacobian of the forward operator E^L_x, evaluated at the point ω = ω₀. Thus, the linear approximation of E^L_x at a linearization point ω₀ is given by

E^L_x(ω) ≈ E^L_x(ω₀) + J_{E^L_x}|_{ω=ω₀} (ω − ω₀).   (13)

By (13), we obtain the linearized sub-problem

ω* ∈ arg min_ω ‖E^L_x(ω₀) + J_{E^L_x}|_{ω=ω₀} (ω − ω₀) − y‖²₂ + λ_ω ‖ω − ω_NN‖²₂.   (15)

Similar to sub-problem (10), solving for ω amounts to solving a linear system G_x ω = d_x with

G_x = J^H J + λ_ω I,   d_x = J^H (y − E^L_x(ω₀) + J ω₀) + λ_ω ω_NN,   J := J_{E^L_x}|_{ω=ω₀},

which can, again, be solved by the CG method.
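A dense-matrix sketch of one linearized field-map step, i.e. solving G_x ω = d_x (our illustration; in practice the system is solved matrix-free with CG, and keeping only the real part of the solution for the real-valued ω is our simplification):

```python
import numpy as np

def linearized_field_update(E_x, J, omega0, y, omega_nn, lam):
    """One Gauss-Newton-style field-map update around omega0:
    solve (J^H J + lam*I) w = J^H (y - E_x(omega0) + J @ omega0) + lam*omega_nn.
    E_x: callable forward map in omega; J: dense Jacobian at omega0."""
    n = J.shape[1]
    G = J.conj().T @ J + lam * np.eye(n)
    d = J.conj().T @ (y - E_x(omega0) + J @ omega0) + lam * omega_nn
    # omega is real-valued, so we keep the real part (our simplification)
    return np.real(np.linalg.solve(G, d))
```

When E_x happens to be exactly linear (E_x(ω) = Jω), this reduces to the Tikhonov-regularized least-squares solution, which is easy to verify.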

B. Neural Network Regularization Terms
We propose two different NN architectures, which provide estimates for image and field map used to regularize the problem (9).

1) Spherical Harmonic Network (SH-Net):
For the field map estimation, we propose a new NN architecture named spherical harmonic network (SH-Net), following an approach similar to that described in [24]. The off-resonance network or OffResNet (ORN) described in [24] and initially proposed in [23] estimates a B0-field map from a single complex-valued distorted image. The estimated field map is interpreted as a non-stationary convolution kernel which is learned by the network. Instead of learning the entire mapping from the complex-valued image to the field map distribution, we propose a physically motivated NN structure.
A B0-field map can be expressed by a linear combination of spherical harmonic basis functions and coefficients. Using the SH approach automatically enforces spatial smoothness, in contrast to directly estimating the field distribution, which might be corrupted by noise. In general, the basis functions Y^η_ι(θ, φ) are given by the degree ι and order η and depend on the polar angle θ and the azimuthal angle φ. In this work we consider the two-dimensional case. Thus the azimuthal angle is fixed at 90°, and the basis functions simplify to Y^η_ι(θ). Dependent on the radius r, the field distribution can be described by a weighted sum of real spherical harmonic functions and coefficients in the polar coordinate system. The field distribution of a field map ω in polar coordinates, dependent on the SH coefficients ρ^η_ι, can be defined by

ω(r, θ) = Σ_ι Σ_η ρ^η_ι r^ι Y^η_ι(θ).   (18)
Fig. 1. SH-Net to estimate a field map prior ω_NN from a complex-valued B0-uninformed fast Fourier transform (FFT) reconstruction. The network is built of an encoding path with two convolutional layers per stage and a constant number of 16 filters. The last layer is a fully connected layer that maps the values to 25 spherical harmonic coefficients, which are used to calculate a spatially smooth field map by (19).
Equation (18) can be compactly written in vectorized form as

ω = Σ_{ι,η} ρ^η_ι Ψ_{ι,η},   (19)

where Ψ_{ι,η} denotes the application of the set of spherical basis functions of degree ι and order η. Our NN involves an encoder path to estimate the spherical harmonic coefficients of the underlying field distribution from the complex-valued input image. Fig. 1 illustrates the architecture of our network. Following empirical evaluation, we decided to use a fixed number of 16 feature channels for each convolutional layer. At the last stage, the features are flattened to a vector to which a fully connected layer is applied to estimate the SH coefficients. The network is constructed to generate coefficients up to degree 5, which is enough to sufficiently describe the field map [20]. The input image is given by an initial B0-uninformed image reconstruction, i.e. by x_0 = F^H y. We denote the network which calculates the B0-field map from the estimated coefficients using (19) by ω_NN = SH_Ω(x_0). The SH-Net is built on the physical implication that the field map is always smooth, which is enforced by estimating the SH coefficients instead of the field map directly. By this construction, we prevent local structures which could be introduced by other architectures such as the ORN used in [24].
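To make the smoothness guarantee of the coefficient parameterization concrete, the following sketch builds a 2-D field map from low-degree harmonic basis functions in polar coordinates, in the spirit of (19). It is our illustration only: we use the real circular harmonics r^l cos(lθ) and r^l sin(lθ), which may differ from the authors' exact basis.

```python
import numpy as np

def sh_field_map_2d(a, b, grid_size=64):
    """Smooth 2-D field map from harmonic coefficients (illustrative basis).
    a[l], b[l]: weights of r^l*cos(l*theta) and r^l*sin(l*theta), l = 0..L."""
    yy, xx = np.mgrid[-1:1:grid_size * 1j, -1:1:grid_size * 1j]
    r = np.hypot(xx, yy)            # radius on a normalized grid
    theta = np.arctan2(yy, xx)      # polar angle
    omega = np.zeros_like(r)
    for l in range(len(a)):
        omega += a[l] * r**l * np.cos(l * theta) + b[l] * r**l * np.sin(l * theta)
    return omega
```

Because the map is a finite sum of smooth basis functions, any coefficient vector yields a spatially smooth field; e.g. the first-degree cosine term reproduces a pure linear gradient along x.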
2) Denoising Network: Because of the low magnetic field used for the data acquisition, the k-space data y is highly corrupted by noise. Thus, in (9), the NN-image prior x_NN should correspond to an estimate of the noise-free and undistorted image underlying the data y. For denoising, we use an NN that operates in the image domain and estimates a denoised image x_NN. Assuming that a good estimate of the field map ω is already available, e.g. ω_NN, we can define the operator E^H_{ω_NN} to reconstruct an image and correct for geometric distortion. In the image domain, we subsequently apply a U-Net [30], [37], [38], which consists of an encoder and a decoder path, for denoising. Real and imaginary parts of the complex-valued image are treated as separate channels. The estimated image is given by

x_NN = u_Θ(E^H_{ω_NN} y).

C. Proposed Unrolled Reconstruction Network
We approach the solution of problem (9) by solving the sub-problems (10) and (15) in an alternating fashion for a fixed number of iterations T. We construct an end-to-end trainable NN which consists of a sequence of T alternations of the blocks required to minimize the previously described sub-problems, and which we denote by Rec^T_{Θ,Ω}. More precisely, Rec^T_{Θ,Ω}(y) = z_T, where z_T = (x_T, ω_T) is the T-th element of the sequence (z_t)^T_{t=1} defined by

x_{t+1} := arg min_x ‖E^L_{ω_t} x − y‖²₂ + λ_x ‖x − x_NN‖²₂,
ω_{t+1} := arg min_ω ‖E^L_{x_{t+1}}(ω) − y‖²₂ + λ_ω ‖ω − ω_NN‖²₂,   (21)

with x_1 := x_NN and ω_1 := ω_NN. Thus, Rec^T_{Θ,Ω} maps the measured k-space data to an estimate of the ground-truth image and an estimate of the ground-truth field map. The NNs are only applied once, before unrolling the algorithm, to obtain the priors. Algorithm 1 gives an overview of the reconstruction method including both sub-problems.
The strategy is to always define the operators E^L_ω and E^L_x with the currently best available solutions for ω and x, respectively. This means that at the very first iteration, the NN priors ω_NN and x_NN are used to define the operators E^L_{ω_NN} and E^L_{x_NN}.
Fig. 2 illustrates the optimization process. The B0-uninformed noisy image reconstruction by FFT is used to estimate a spatially smooth field map with the proposed SH-Net. With the field map estimate, a B0-informed image reconstruction is performed, yielding a noisy image corrected for geometric distortions. The second NN is applied to obtain a denoised version of the reconstructed image. The final result after T iterations consists of an optimized field map and an optimized, denoised image corrected for geometric distortions.
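The alternating scheme can be summarized in a few lines of Python (our sketch; every callable here — `sh_net`, `unet`, `adjoint`, `solve_image`, `solve_field` — is a placeholder supplied by the caller, not the authors' implementation):

```python
import numpy as np

def unrolled_reconstruction(y, sh_net, unet, adjoint, solve_image, solve_field, T=3):
    """Sketch of the unrolled operator Rec^T: the NN priors are computed
    once, then the two CG-based sub-problem solvers alternate T times."""
    x0 = np.fft.ifft2(y)                # B0-uninformed FFT reconstruction
    omega_nn = sh_net(x0)               # smooth field-map prior (SH-Net)
    x_nn = unet(adjoint(y, omega_nn))   # denoised, distortion-corrected prior
    x, omega = x_nn, omega_nn
    for _ in range(T):
        x = solve_image(y, omega, x_nn)              # sub-problem A
        omega = solve_field(y, x, omega, omega_nn)   # sub-problem B
    return x, omega
```

Because the priors enter only through the quadratic penalty terms, swapping in different solvers or networks leaves the overall structure unchanged.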
For training the entire physics-informed reconstruction network in an end-to-end manner, we use a dataset D = {(y, ω, x) | y = E(x, ω) + ε} and minimize the loss

L(Θ, Ω) = Σ_{(y,ω,x)∈D} ‖x_T − x‖²₂ + α ‖ω_T − ω‖²₂,   (22)

where (x_T, ω_T) = Rec^T_{Θ,Ω}(y) and α > 0 is a weighting factor chosen such that both errors are well balanced. The unrolled network architecture and the training were implemented in PyTorch. The gradients are calculated using the automatic differentiation engine of PyTorch.
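As a minimal stand-in for such a weighted image/field-map training loss (our sketch; the exact per-sample normalization used by the authors is not specified):

```python
import numpy as np

def joint_loss(x_T, x_gt, w_T, w_gt, alpha):
    """Image MSE plus alpha-weighted field-map MSE for one sample."""
    mse = lambda a, b: np.mean(np.abs(a - b) ** 2)
    return mse(x_T, x_gt) + alpha * mse(w_T, w_gt)
```

In an actual training loop this scalar would be accumulated over the dataset and backpropagated through the unrolled solver.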

IV. COMPUTATIONAL EXPERIMENTS
We rely on supervised learning and therefore on simulated low-field MRI data. In this section, we first describe the simulation process used to obtain the data. Afterwards, we focus on the pre-training of the respective NNs, the different cases for the optimization of our proposed approach, and the conventional methods used for comparison.

A. Simulation of Training Data
For supervised learning of NNs, a relatively large amount of training data is usually required. Since no low-field in-vivo data is available so far, synthetic low-field MRI data was simulated based on three-dimensional single-coil spin-echo acquisitions of the knee at 3 T from the fastMRI dataset [39]. By selecting a single two-dimensional slice from each sample, we obtained 973 different complex-valued samples. SNR and B0-field inhomogeneities were estimated based on measured values from current low-field Halbach systems, and the results should in principle be transferable to other low-field magnetic resonance (MR) systems [1], [2], [19]. The retrospective simulation of low-field MRI data from ground-truth samples was implemented according to (1).
The number of images for training, validation and testing was 1260 (70%), 360 (20%) and 180 (10%), respectively. To obtain a variety of smooth random field maps, we generated spherical harmonic coefficients up to order 10. We used a higher order in the simulation than in our network defined in Section III-B to mimic the continuous character of real field distributions. The entries of the coefficient vector in (19) were set by exponentially decreasing the strength of randomly chosen coefficients such that the impact of low frequencies dominates. With (19), we could then generate ground-truth field maps from the randomized spherical harmonic coefficients. The mean and standard deviation of the correlation coefficients between the field maps showed no or only weak correlation, from which we conclude that the field maps are sufficiently diverse.
Given ground-truth field maps ω and images x, we retrospectively computed the k-space data y according to (1). Note that in the retrospective simulation we did not use the approximation of the forward operator; the latter is only used to accelerate the reconstruction. This also counteracts the inverse crime of using the identical operator in simulation and reconstruction. During training, the standard deviation of the Gaussian noise σ_y was randomly chosen from the interval [0.2, 0.6]. For testing and comparison, we then used the specific values σ_y ∈ {0.2, 0.4, 0.6}.
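The retrospective simulation y = E(x, ω) + ε can be sketched as follows (our illustration; `forward` stands in for the exact forward operator, and splitting σ evenly between real and imaginary noise components is our assumption, as the paper does not specify the convention):

```python
import numpy as np

def simulate_kspace(forward, x, omega, sigma, rng):
    """Retrospective low-field simulation with complex Gaussian noise.
    forward: callable (x, omega) -> k-space; sigma: total complex noise std."""
    y = forward(x, omega)
    # assumed convention: total std sigma, split evenly over real/imag parts
    eps = sigma * (rng.standard_normal(y.shape)
                   + 1j * rng.standard_normal(y.shape)) / np.sqrt(2)
    return y + eps
```

With σ = 0 the output is exactly the noise-free forward model, which separates operator bugs from noise-model choices during debugging.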

B. Pre-Training of Neural Networks
Because end-to-end training of the proposed unrolled network may take several days, depending on the number of optimization steps, we used a pre-training strategy. Before employing the NNs and fine-tuning them by training the unrolled network that resembles Algorithm 1, we trained both networks SH_Ω and u_Θ separately for their respective tasks. Once the pre-training was carried out, the effects of the end-to-end training, which further fine-tunes SH_Ω and u_Θ as well as λ_ω and λ_x, could be investigated. The separate pre-training also helped to identify overfitting and assess generalizability during fine-tuning.

1) Pre-Training the SH-Net:
During pre-training of SH_Ω, the mean squared error (MSE) between the network estimate and the ground-truth field map was minimized, i.e. we used the loss function

L^Pre_SH(Ω) = ‖ω_NN − ω‖²₂,   (23)

where ω_NN = SH_Ω(F^H y). The network was defined as shown in Fig. 1. We used 5 encoding stages, 2 convolutional layers per stage with a kernel size of 3, and a constant filter size of 16. Estimating 25 SH coefficients (maximum order of 4) required a total of 858 parameters to be optimized. This parameterization prevents overfitting and yields good generalizability. The pre-training was performed until full convergence using 400 epochs with a batch size of 1 and the ADAM optimizer with a learning rate of 10^−4. The training duration was 44 minutes and 1 second.
2) Pre-Training the Denoising U-Net: Denoising of the image was performed by a U-Net architecture with residual connections [30]. The architecture was defined by 4 encoding steps, each consisting of 2 convolutional layers with a kernel size of 3, similar to the SH-Net encoder. The initial filter size was 32. For pre-training the U-Net, we used a dataset D^Pre_u := {(x_0, x) | x_0 = F^H y, y = Fx + ε} and the loss function L^Pre_u(Θ) = ‖u_Θ(x_0) − x‖²₂. Thereby, we could guarantee that the trainable parameters were optimized purely for the task of denoising rather than for a mixture of denoising and correction of geometric distortions, which should be entirely compensated for by the SH-Net. Again, the training was performed over 400 epochs using a batch size of 1 and the ADAM optimizer with a learning rate of 10^−4. The training duration of the U-Net was 21 minutes and 32 seconds.

C. Experimental Design
Different experiments with varying numbers of alternations were performed to investigate the impact of the alternating optimization on the reconstruction result. The different cases are distinguished by T, the number of alternations between image and field map optimization steps. In addition, we compared the case where only sub-problem A is solved according to (22).
The following cases were distinguished:
SP-A: Only the image sub-problem is minimized, and the field map is taken from the neural network.
T = 1: Each sub-problem is minimized once.
T = 3: Alternation between sub-problems A and B, three times.
For all CG blocks, which increase the data consistency of the respective NN outputs, we used 5 CG iterations to solve the two sub-problems (10) and (15). With the alternating optimization, the hyperparameters of the pre-trained NNs are further optimized using the ADAM optimizer. The regularization coefficients λ_x and λ_ω in the problem formulation (9) were initially chosen from empirical evaluations and are likewise optimized during the training process. The learning rate was 10^−6 for the NN hyperparameters and 10^−5 for the regularization parameters. The training duration over 400 epochs with a batch size of 1 was 3 days 8 hours 3 minutes 54 seconds for SP-A, 4 days 15 hours 33 minutes 3 seconds for T = 1, and 7 days 5 hours 37 minutes 14 seconds for T = 3.

D. Comparison Methods
The proposed SH-Net was compared to the ORN published in [23], [24]. The proposed end-to-end reconstruction method was compared to two slightly different approaches. The first one jointly estimates the field map and image (joint model) [36], [40], [41], [42], [43]. The second method of comparison was developed specifically for low-field MRI and involves two different models, one for the image and one for the field map (bi-model) [19]. Both methods use the phase difference map as a starting point, which is obtained from two acquisitions with different echo times.
1) Field Map Estimation Methods: Spatial B0-field inhomogeneities cause relative phase differences dependent on the echo time. A common way to reconstruct the B0-field map is to acquire multiple echoes and calculate the B0-map from the relative phase difference map. For this purpose, we simulated additional k-space data with a second echo time.
The ORN, published in [23], is a convolutional neural network (CNN) with equally shaped convolutional layers and residual connections which estimates an undistorted image from a single complex-valued distorted image. In [24], the approach was adapted to estimate a field map from the initial distorted reconstruction using the same architecture. In both cases, the CNN architecture consists of 8 convolutional layers with 3 residual connections, a kernel size of 5 × 5 and 128 filters. We implemented the CNN as proposed by the authors and compare it to our SH-Net from Section III-B. The training was carried out as described in Section IV-B by replacing the SH-Net SH_Ω with the ORN ORN_Ξ in the loss function (23).

2) Methods for Joint Field Map and Image Reconstruction:
The joint-model approach is based on the problem formulation from (8). In [36], the quadratic regularization term R = ‖∇x‖²₂ is used instead of the NN-based quadratic penalty terms. Although the method has never been applied to the domain of low-field MRI, we used it for comparison, since the optimization scheme is similar to our proposed Algorithm 1. Since the use of the ℓ2-norm tends to smooth image details, we replace it by the ℓ1-norm, yielding a total variation (TV)-minimization approach which has been used exhaustively for MRI reconstruction, see e.g. [44], and solve the sub-problem by the primal-dual hybrid gradient (PDHG) method [45]. As starting value ω_0, we consider the relative phase difference map, and the image starting value is given by x_0 := F^H y. We used T = 10 alternations between the sub-problems, 100 iterations of the CG method with a tolerance value of 10^−5 for the field map optimization, and 128 iterations of the PDHG algorithm for the image optimization. To ensure the best choice of the regularization parameters λ_ω and λ_x, we performed a grid search for each sample from the test dataset by evaluating the MSE with respect to the ground truth. The range was chosen to cover a sufficient search space for each parameter. In the end, we consider the parameter combination λ_ω and λ_x for which the MSE with respect to the image is smallest. This grid search is not applicable in real applications; however, it serves the purpose of comparing the best regularization parameters for this method given knowledge of the ground truth.

The bi-model approach uses two separate models to optimize field map and image [19]. Despite the slightly different mathematical foundation, the comparison is useful as, in contrast to the joint model, this method was specifically developed for heavily distorted low-field MRI. The initial field map is calculated based on the relative phase difference of two acquisitions, and a Tikhonov regularization is added to obtain spatially smooth field maps. A maximum of 100 iterations and a tolerance value of 10^−5 for the CG method is used.
In contrast to the previously introduced methods, the full forward operator E is utilized instead of an approximation. This approach also reconstructs both y_1 and y_2, with E_{TE1} and E_{TE2}, respectively. The resulting images x_1 and x_2, i.e. their phases, are then used to update the field map. The problem with respect to the image is solved by the split Bregman algorithm with 2 outer iterations. Within each step of the split Bregman algorithm, the CG method is applied with a maximum of 100 iterations and a tolerance value of 10^−5. Through empirical evaluations, we found that 3 alternations worked best for the optimization of the simulated dataset. A grid search is performed using the ground-truth data to find the regularization parameter combination which yields the smallest image MSE.

E. Evaluation and Metrics
In order to generate distinct noise levels, the test data was simulated according to (1) using Gaussian noise with different standard deviations σ_y ∈ {0.2, 0.4, 0.6}. The field map obtained from the relative phase difference of two measurements with different echo times (echo time difference of 20 ms) was compared to the field map estimates from our proposed SH-Net and the ORN [24]. For assessing the obtained field maps, we evaluated the mean absolute error (MAE). In addition to the entire field map, the comparison was also carried out in a circular region of interest (RoI) located in the center of the field map to exclude phase noise from outside the imaged object. The radius is set to 32 pixels, such that the RoI contains 20% of all pixels. By data augmentation, we ensured that all sample slices contain meaningful information within this RoI.
For a comparison of the different methods with respect to the reconstructed images from the test data, the root mean squared error (RMSE), the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [46] were evaluated. In contrast to the global measures RMSE and PSNR, the SSIM is calculated based on a comparison of contrast, luminance and structure, and is more sensitive to blurring distortions, for instance [46]. Note that the sample-wise grid search for the regularization parameters of the comparison methods was performed separately for all three noise levels. The RMSE of complex-valued images is calculated separately for the real and imaginary parts, where both contribute equally to the total RMSE.
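The real/imaginary split of the complex RMSE can be sketched as follows (our illustration; the paper states only that both parts contribute equally, so the averaging convention here is an assumption):

```python
import numpy as np

def complex_rmse(est, ref):
    """RMSE of a complex image, computed separately over real and imaginary
    parts, with both parts contributing equally (assumed: simple average)."""
    err_re = np.sqrt(np.mean((est.real - ref.real) ** 2))
    err_im = np.sqrt(np.mean((est.imag - ref.imag) ** 2))
    return 0.5 * (err_re + err_im)
```

A shift of the real part by 1 then yields an RMSE of 0.5, since the imaginary part contributes no error.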

V. RESULTS
First, in this section we compare the B0-field map results from the SH-Net and the ORN to those from the phase difference map. In a next step, we compare the different end-to-end trained unrolled optimization cases which utilize the proposed SH-Net. Finally, we compare our method to the purely model-based joint approaches.

A. Field Map Estimation
The field map results from the SH-Net and the ORN, which are obtained from a single echo, are compared to the field map obtained from the relative phase difference of two measurements. Fig. 3 compares the MAE of the results at three distinct noise levels given by the standard deviation of the simulated k-space noise, i.e. σ_y ∈ {0.2, 0.4, 0.6}. Both for the entire field map and for the RoI, the SH-Net outperforms the other methods by showing lower MAE values. For the entire field map, the MAE of the SH-Net is 324.7 ± 145.3 (σ_y = 0.2), 324.6 ± 144.1 (σ_y = 0.4) and 327.2 ± 142.4 (σ_y = 0.6), while the ORN leads to an MAE of 711.3 ± 231.2 (σ_y = 0.2), 726.5 ± 240.7 (σ_y = 0.4) and 742.8 ± 247.5 (σ_y = 0.6). The highest error was found for the phase difference maps with 852.3 ± 326.2 (σ_y = 0.2), 945.7 ± 325.0 (σ_y = 0.4) and 1107.8 ± 350.4 (σ_y = 0.6).

B. Algorithm Unrolling Strategies
To validate the time-segmented reconstruction operator, we first evaluated the point spread of the full encoding operator E and of the approximation E_L by reconstructing the k-space of a single pixel, distorted by a linearly increasing field map with a maximum of 7517 ppm, obtained from our simulation pipeline. With the full encoding operator E, we recovered 99.99% of the original magnitude value; with the approximated operator E_L (L = 24), in contrast, we recovered 98.91%. Based on this evaluation, we assess the approximation as sufficiently accurate.
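The idea behind the approximation E_L can be illustrated in 1D: the exact operator applies a different off-resonance phase at every sampling time, while time segmentation evaluates the phase only at L representative times and reuses one FFT per segment. The sketch below uses the simplest (piecewise-constant, nearest-segment) interpolator; the paper's interpolation scheme and operator details may differ.

```python
import numpy as np

def encode_exact(x, omega, t):
    """Exact 1D encoding: y[m] = sum_r x[r] * exp(-1j*omega[r]*t[m])
    * exp(-2j*pi*m*r/n), i.e. an off-resonance-aware DFT."""
    n = x.size
    m, r = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    E = np.exp(-2j * np.pi * m * r / n - 1j * np.outer(t, omega))
    return E @ x

def encode_segmented(x, omega, t, L):
    """Time-segmented approximation: one FFT per segment time, with each
    k-space sample taken from its nearest segment."""
    n = x.size
    t_seg = np.linspace(t.min(), t.max(), L)
    idx = np.argmin(np.abs(t[:, None] - t_seg[None, :]), axis=1)
    ffts = np.stack([np.fft.fft(x * np.exp(-1j * omega * ts)) for ts in t_seg])
    return ffts[idx, np.arange(n)]

# toy check: 64-pixel object, linear field map up to 100 Hz, 5 ms readout
rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 0j
t = np.linspace(0.0, 5e-3, 64)
omega = 2 * np.pi * np.linspace(0.0, 100.0, 64)   # off-resonance in rad/s
y_exact = encode_exact(x, omega, t)
y_seg = encode_segmented(x, omega, t, L=24)
rel_err = np.linalg.norm(y_seg - y_exact) / np.linalg.norm(y_exact)
```

With L equal to the number of samples, the segment times coincide with the sampling times and the approximation becomes exact; smaller L trades accuracy for fewer FFTs.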
Fig. 4 displays the training performance for various end-to-end fine-tuning cases. We compare T = 1, T = 3 and a special case SP-A where only the sub-problem A is considered and the field map is entirely defined by the SH-Net.
Fig. 4 shows the behavior of the loss-function components defined in (22). Fig. 4(a) shows the weighted combination for β = 10^-6, while Fig. 4(b) and (c) show the validation error for image and field map, respectively. For each epoch, the validation error is represented by dots and the training error by crosses. Comparing the three cases, the NNs converge at similar levels; with a higher T, the error increases at the beginning of the training. Based on the error courses, the gap between training and validation error is lowest for the case T = 3. The case SP-A tends to have the lowest error in terms of the combined and the image-specific MSE. In contrast to the image MSE, which decreases by 25.12% (SP-A), 22.07% (T = 1) and 28.49% (T = 3), only small improvements of the MSE can be achieved for the field map: the field-map-specific MSE of SP-A slightly increases by 0.32% over the epochs, while T = 1 and T = 3 slightly decrease by 1.26% and 3.98%. Within the RoI, T = 3 has the highest error and T = 1 the lowest; here the validation error decreases relatively by 5.83% (SP-A), 6.71% (T = 1) and 5.15% (T = 3). In terms of performance, all approaches work well, and Fig. 4(c) shows that the non-linear sub-problem B is challenging. In contrast to the performance, the behavior with regard to overfitting can be distinguished more clearly, and T = 3 shows a considerably lower tendency to overfit.
As a consequence of the results from Fig. 4, T = 3 was used in the further analysis due to the small gap between training and validation error, its performance being comparable to T = 1, and its lowest tendency to overfit.

C. Field Map and Image Optimization
Fig. 5 shows the image-based comparison of RMSE, PSNR and SSIM for our unrolled algorithm with T = 3 against the B0-uninformed FFT reconstruction, the joint-model and the bi-model approach. Complementary to these results, median and interquartile range (IQR) of RMSE, PSNR, SSIM and MAE are reported in Table I for both image and field map at the distinct noise levels σ_y ∈ {0.2, 0.4, 0.6}. Median and IQR were chosen instead of mean and standard deviation, since the error distributions are non-normal according to the boxplots; both values were calculated per metric over the entire test data. The results show that, in comparison to the B0-uninformed FFT reconstruction, all other methods perform better with respect to the reported metrics. The errors of the B0-uninformed FFT, the joint-model and the bi-model approach increase with increasing σ_y, whereas our proposed method remains rather constant for different values of σ_y, with only a slight drop in SSIM. Furthermore, comparing the RMSE whiskers in Fig. 5 and the corresponding IQR, our proposed method shows fewer and less pronounced outliers.
The field map MAE values also support this observation. The field maps obtained from the phase difference map without further post-processing lead to the highest MAE, followed by the joint-model approach. Only the bi-model approach outperforms our method in terms of the field map MAE. Note that this method, as well as the joint-model approach, relies on the relative phase difference map from two acquisitions, whereas the SH-Net uses a single acquisition. Inside the RoI, all methods provide comparable performance (Table I).
Fig. 6 compares exemplary image reconstructions of two different samples from the test dataset by the B0-uninformed FFT, the joint-model, the bi-model and our proposed unrolled algorithm with T = 3, at the two distinct noise levels σ_y = 0.3 and σ_y = 0.6. All intensities were scaled according to the ground truth reference image. The joint-model and bi-model approaches deliver comparable performances regarding SNR, while the SNR of our method is noticeably lower. For our unrolled reconstruction algorithm, both noise levels and both samples show smooth images with only minor blurring artifacts, while blurring substantially increases for the joint-model and the bi-model.
The bi-model approach reconstructs the geometric distortions more accurately, but suffers mostly from blurring. The joint-model approach has the highest remaining error after correction, which is particularly noticeable at the margins of the imaged object; in the object center, in turn, the joint-model correction is much more accurate than the bi-model approach. Our method provides an overall relatively low error and a high correction accuracy in the image center. Still, all methods show a residual error, which is relatively constant over the compared noise levels σ_y ∈ {0.3, 0.6}. Fig. 6(b) shows the corresponding field maps of the images shown in Fig. 6(a). In comparison to the field map reconstructed from the relative phase difference, all compared methods deliver a smooth description of the spatial field distribution. The field map from the joint-model reconstruction shows a large error towards the edges, but a low error in the center. The bi-model approach and our method have a low overall error which is approximately on the same scale and corresponds to the metrics reported in Table I. For increasing noise, the error remains constant. The robustness of the field map estimation against increasing noise and the spatial accuracy of the field maps are consistent with the observations from Fig. 6.
The bi-model approach required ≈15 s of reconstruction time per sample, and the joint-model approach ≈5 s. Our model, in turn, requires < 1 s for a sample size of 128 × 128 once it is trained. Note that since neither of the implementations was optimized for fast graphics processing unit (GPU) computation, the comparison should be taken as a relative estimate.

VI. DISCUSSION
We proposed the novel SH-Net architecture to estimate the SH coefficients which describe the B0-map of a low-field system. The spatial field distribution can then be obtained as a linear combination of SH basis functions and the estimated coefficients, which are learned from B0-uninformed FFT reconstructions. This guarantees a spatially smooth description of the field map, which can be used for distortion correction by a time-segmented approximation of the conjugate phase reconstruction [34], [47], [48]. Techniques to jointly reconstruct image and field map were previously limited to 3 T MRI and to relatively homogeneous B0-field distributions of < 3 ppm [36], [40], [48]. The field inhomogeneities of ∼5000 ppm used in this work resemble realistic configurations of low-cost MR systems [1]. These systems may experience further fluctuations of the B0-field inhomogeneity, e.g. through environmental temperature changes that decrease the B0-field homogeneity [20]. The proposed technique can potentially address these issues by applying rapid corrections to strong field inhomogeneities. First, we evaluated our novel SH-Net and compared it against the field maps obtained from the relative phase difference and against the ORN proposed in [23], [24]. We found that involving SH coefficients provides the most accurate results, which are smooth by nature. The ORN does not incorporate prior knowledge of spatial smoothness and requires a total of 829 185 trainable parameters. The proposed SH-Net ensures realistic field maps and clearly outperforms the ORN while requiring only 858 trainable parameters.
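To make the SH parameterization concrete, the following sketch expands a coefficient vector into a field map using the real solid-harmonic polynomials common in B0 shimming (here only up to 2nd order and for a single axial slice at z = 0; the SH-Net estimates a larger coefficient set, and the exact basis convention may differ):

```python
import numpy as np

def sh_basis_2nd_order(x, y, z):
    """Real solid-harmonic basis functions up to 2nd order, evaluated on
    flattened coordinate arrays (polynomial forms standard in shimming)."""
    r2 = x**2 + y**2 + z**2
    return np.stack(
        [np.ones_like(x),                                    # order 0
         x, y, z,                                            # order 1
         x * y, z * y, 3 * z**2 - r2, z * x, x**2 - y**2],   # order 2
        axis=-1,
    )

def field_map_from_coeffs(coeffs, grid_shape=(128, 128)):
    """Field map as a linear combination of SH basis functions and the
    estimated coefficients, on a normalized [-1, 1]^2 slice grid."""
    ys = np.linspace(-1.0, 1.0, grid_shape[0])
    xs = np.linspace(-1.0, 1.0, grid_shape[1])
    X, Y = np.meshgrid(xs, ys)                   # shape = grid_shape
    B = sh_basis_2nd_order(X.ravel(), Y.ravel(), np.zeros(X.size))
    return (B @ coeffs).reshape(grid_shape)

coeffs = np.zeros(9)
coeffs[0] = 5.0                      # a purely constant 5 Hz offset
fm = field_map_from_coeffs(coeffs)   # uniform map, smooth by construction
```

Any field map produced this way is smooth by construction, which is the prior knowledge the SH-Net builds in.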
For the end-to-end fine-tuning step, we compared different cases of algorithm unrolling: the image-related sub-problem only, as well as one and three alternations between both sub-problems. Comparing the training results, we found that the overall performance is relatively comparable. In general, we only applied a relatively small number of optimization steps, as the intention is merely to guide the NNs. Although the final performance of the compared optimization schemes is comparable, we noticed different initial validation error values and thus differences in the relative performance; the validation error at the beginning of the training is proportional to the number of iterations T used in the approach. Since sub-problem B is not linear, it is not clear whether the performance can be improved. However, enforcing data consistency constrains the solution and can improve the robustness. In contrast to the performance, the overfitting, or generalizability, can be distinguished more clearly. We accept the small performance shortfalls in exchange for improved robustness and chose T = 3. The long training durations we observed could be reduced in the future by manually implementing the backpropagation step instead of using the automatic differentiation engine.
The training was evaluated at epoch 320, as we observed spontaneous error deflections in one of the NNs close to epoch 400 and as the combined validation error had already converged at epoch 320. The loss function for the training is composed of a convex combination of image and field map MSE with a constant weighting factor. In the validation errors we observed different convergence rates. A convex combination with static weighting might thus not be the optimal choice for joint end-to-end training and may cause the training instabilities observed from epoch 320 onwards. We speculate that a gradient normalization and rotation may help to optimize both networks simultaneously and more efficiently [49].
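As a concrete reading of this objective, a convex combination with a static weight β can be sketched as below (an illustration only; the exact form of (22) and the MSE definitions in the paper may differ):

```python
import numpy as np

def joint_training_loss(x_hat, x_gt, w_hat, w_gt, beta=1e-6):
    """One plausible form of the convex combination described in the
    text: image MSE and field map MSE with a constant weight beta."""
    mse_image = np.mean(np.abs(x_hat - x_gt) ** 2)   # complex-valued image
    mse_field = np.mean((w_hat - w_gt) ** 2)         # real-valued field map
    return (1.0 - beta) * mse_image + beta * mse_field

# with a perfect field map, only the image term contributes
loss = joint_training_loss(
    np.array([1.0 + 0.0j]), np.array([0.0 + 0.0j]),
    np.array([0.0]), np.array([0.0]),
)
```

With β as small as 10^-6, the field map term contributes only weakly to the combined loss, which is consistent with the different convergence rates observed for the two components.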
We compared our method with T = 3 alternations between both sub-problems to a joint-model and a bi-model approach. The joint-model approach does not involve SH coefficients to describe the field map and suffers from a lack of information in areas where no signal has been acquired. This explains its low field map error at the object center, which is comparable to the results obtained by the bi-model approach and our method. The bi-model approach also involves SH coefficients and provides field maps which are comparable to our approach in terms of accuracy. However, the bi-model approach requires reconstruction times in the order of minutes and at least two acquisitions with different echo times, whereas our approach provides reconstruction times in the order of milliseconds and requires only a single acquisition. The spatially more extensive description of the field distribution allows more accurate image reconstructions of the object boundaries. Remaining errors from imperfect geometric distortion correction are still visible in all of the compared cases. For the purely model-based comparison methods, we determined a reasonable search space for the regularization parameters and always used the best coefficient combination to reconstruct the test data. As we chose the parameter configuration with the lowest image MSE, the results become very blurry at a noise level of σ_y = 0.6. Note that this individual empirical search for the optimal parameter configuration is not applicable in practice, but is used to show that our method outperforms the comparison methods even when their hyperparameters are chosen with access to the ground-truth data. Independent of the distinct noise level, our NN-based approach clearly outperforms the classical regularization methods and provides sharper images with much better SNR. This might also explain the more substantial improvement in SSIM, in comparison to PSNR, in Fig. 5.
The simulated sample and ground truth pairs used for training, validation and testing cannot be generated in practice, but allowed a comparison at distinct SNR levels and for diverse field maps. In this study, we relied on the retrospective simulation of the k-space data.
In principle, we expect our model to be directly applicable to real data measured with a dedicated low-field MR system. With the simulated data employed in this work, we included a wide range of system characteristics to cover different experimental settings. For future work, we intend to apply and further adapt the method to measured data acquired by a system which is currently under development. In doing so, self-supervised learning strategies could be used to further fine-tune the model and tailor it to the specific system.

VII. CONCLUSION
We proposed a novel reconstruction method for low-field MRI data involving NNs, fine-tuned by end-to-end unrolled optimization, which allows fast and robust corrections of B0-field inhomogeneities and boosts the SNR of the reconstructed images. The performance was evaluated at distinct noise levels and for various field map scenarios. Our approach relies on a single acquisition, which is advantageous for achieving short acquisition times in clinical applications. In a next step, we plan to extend our approach to non-Cartesian k-space trajectories and to apply the method to measured low-field MRI data, possibly using transfer learning and additional fine-tuning to better match the MR system under consideration.

Joint B0 and Image Reconstruction in Low-Field MRI by Physics-Informed Deep-Learning

David Schote, Lukas Winter, Christoph Kolbitsch, Georg Rose, Oliver Speck, and Andreas Kofler

Index Terms: Low-field MRI, physics-informed deep learning, image reconstruction, unrolled optimization, field inhomogeneities.

Fig. 2. The field map prior is obtained by SH_Ω from a complex-valued B0-uninformed FFT reconstruction. A distortion-corrected image is obtained by applying E^H_ωNN and denoised by u_Θ. The final estimates x and ω are obtained by unrolling the algorithm for T iterations. Note that, although the two CNN blocks SH_Ω and u_Θ are applied only once, their parameters as well as the respective regularization parameters λ_ω and λ_x are obtained by training the entire unrolled reconstruction network.

Fig. 3. Comparison of SH-Net and ORN on the test dataset after pre-training. (a) Boxplot comparing the MAE of the field maps obtained from the relative phase difference of two measurements with different echo times, our proposed SH-Net and the ORN, whereby the latter two are based on a single echo. The comparison is carried out at the distinct noise levels σ_y ∈ {0.2, 0.4, 0.6} for the entire field maps (left) and within the RoI (right). (b) Field maps obtained from our proposed SH-Net and the ORN compared to the phase difference map and the ground truth at σ_y = 0.4.

Fig. 5. Image-based comparison of RMSE, PSNR and SSIM for the B0-uninformed FFT reconstruction, the joint-model, the bi-model and our proposed unrolled algorithm with T = 3 at the distinct noise levels σ_y ∈ {0.2, 0.4, 0.6}.

Fig. 4. Comparison of different metrics during further fine-tuning of the pre-trained u_Θ and SH_Ω by unrolling the proposed scheme in Algorithm 1 with SP-A, T = 1 and T = 3 alternations. Note that for SP-A, the field map is entirely defined by the output of the SH-Net and only the image is ensured to be data-consistent by solving problem (10). (a) Comparison of training (dashed) and validation (solid) loss defined by (22) in arbitrary units. (b) MSE of the image in arbitrary units over the epochs for the validation set. (c) MSE of the field map over the epochs for the validation set.


Fig. 6. Comparison of reconstruction results from the B0-uninformed FFT or phase difference map, the joint-model, the bi-model and the proposed end-to-end trained unrolled optimization. The first and third rows show the magnitude images or the field maps, respectively, for noise levels σ_y = 0.3 and σ_y = 0.6. Rows two and four depict the absolute error map with respect to the ground truth image or field map, respectively. (a) Reconstructed magnitude images, error in arbitrary units. (b) Reconstructed B0-field maps, error in Hz.

TABLE I: MEDIAN AND IQR OF RMSE, PSNR, AND SSIM FOR THE RECONSTRUCTED IMAGE x WITH JOINT-MODEL, BI-MODEL, AND UNROLLED OPTIMIZATION (T = 3)