Robust Image Reconstruction With Misaligned Structural Information

Multi-modality (or multi-channel) imaging is becoming increasingly important and more widely available, e.g. hyperspectral imaging in remote sensing, spectral CT in material sciences as well as multi-contrast MRI and PET-MR in medicine. Research in the last decades resulted in a plethora of mathematical methods to combine data from several modalities. State-of-the-art methods, often formulated as variational regularization, have shown to significantly improve image reconstruction both quantitatively and qualitatively. Almost all of these models rely on the assumption that the modalities are perfectly registered, which is not the case in most real world applications. We propose a variational framework which jointly performs reconstruction and registration, thereby overcoming this hurdle. Our approach is the first to achieve this for different modalities and outranks established approaches in terms of accuracy of both reconstruction and registration. Numerical results on simulated and real data show the potential of the proposed strategy for various applications in multi-contrast MRI, PET-MR, and hyperspectral imaging: typical misalignments between modalities such as rotations, translations, zooms can be effectively corrected during the reconstruction process. Therefore the proposed framework allows the robust exploitation of shared information across multiple modalities under real conditions.


I. INTRODUCTION
W E AIM at finding an approximate solution to an ill- posed inverse problem of the form where A denotes the forward operator and f is measured data which is typically corrupted by noise.Examples include magnetic resonance imaging (MRI), positron emission tomography (PET), computed tomography (CT), image denoising, deblurring, or super-resolution, or possibly a combination of these tasks.A widely used approach of obtaining approximate solutions of ( 1) is variational regularization where R is a regularization functional that enforces prior knowledge, like for instance sparsity of the solution or its gradient, see [1], [2], [3] and references therein.
In this work we adopt the scenario, that we are in possession of a specific piece of prior knowledge, namely a side information v which we know to have some "common structure" with the true solution u of (1).For instance, v could be a high-resolution photograph which assists the reconstruction of low-resolution hyperspectral images [4], [5], [6], [7], [8] or an anatomical (MRI or CT) image for the reconstruction of PET images [9], [10], [11], [12], [13], [14], see Figure 1.Other L. Bungert was with the Department of Mathematics, University of Erlangen, Germany.e-mail: leon.bungert@fau.de.
M. J. Ehrhardt was with the Institute for Mathematical Innovation, University of Bath, UK. e-mail: m.ehrhardt@bath.ac.uk.relevant applications include fMRI [15], spectral CT [16], EIT [17] and multi-contrast MRI [18], [19], [20], [21], [22], [23].Such approaches can greatly facilitate the reconstruction task of approximate solutions of (1) as shown in all of the aforementioned works, for instance.However, if the side information v badly aligns with the desired solution u as observed through f , most of these methods yield unsatisfactory reconstructions since they try to bring images into correspondence that do not correspond, see Figure 2 for an illustration and [24] for a survey on hyperspectral fusion algorithms exhibiting this problem.
In real-world applications, however, such misalignment typically cannot be avoided since frequently the acquisitions of v and f happen at different times and through different modalities.To mitigate this problem mathematically, one can for example use a blind deconvolution approach [8], [25] in order to correctly link the side information and the measured data.However, such methods are limited to cases where the u and v can be connected through a simple translation.This is due to the fact that translation of an image corresponds to convolution with a translated kernel.To be able to correct for more complicated deformations, we consider the following joint reconstruction and registration model This model seeks to reconstruct an image u together with a arXiv:2004.00589v1[cs.CV] 1 Apr 2020 deformation field φ, such that u has common structure with the side information v (enforced through the regularization term R(u; v)) and the deformed image u • φ explains the data f well.In addition, functional S can be chosen to suitably regularize φ.Note that the minimization can be restricted to subsets of admissible images or deformation fields.In particular, one can use parametric deformations in order to only allow for rigid, affine, diffeomorphic, and other deformations.
A similar strategy has been proposed for PET-CT in [13], [14].In contrast to our proposed framework, they focus on smooth regularizers and employ an algorithm with nested iterations which is not guaranteed to converge to a critical point of (3).
In [26], [27] the related problem of simultaneously reconstructing an image while registering it to a template was studied.Models to jointly reconstruct and estimate the motion between a sequence of images observed though indirect measurements were proposed in [28], [29].However, all these methods are not applicable for our task since we are dealing with images from different modalities.Registering the reconstruction to a template of a different modality is not meaningful and hence we have to enforce this registration indirectly through the regularization functional R(u; v).

A. Notation
In this section we give sense to model (3) above which, up to now, only has symbolic character.The optimization in (3) takes place over images u ∈ U and deformation fields φ ∈ V. We assume that an image u ∈ U has its pixels located at some fixed positions x i in R d for i = 1, . . ., n, where d ∈ N denotes the image dimension (typically d ∈ {2, 3}), which allows us to identify U ∼ = R n where u i := u(x i ).Furthermore, this lets us identify a deformation field φ ∈ V, which we denote as mapping φ : R d → R d , x → φ(x), in order to stay close to mathematical notation, with an array in R n×d by letting the i-th row be given by φ i := φ(x i ) ∈ R d for i = 1, . . ., n.For u ∈ U and φ ∈ V we define their composition as where J : U×V → U denotes an interpolation operator.Mathematically speaking, being an interpolation operator requires two properties: firstly, it shall hold J(u; id) = u for all u ∈ U where id ∈ V denotes the identity deformation field id(x) = x.Secondly, for fixed φ ∈ V the map u → J(u; φ) shall be linear.In contrast, for fixed u ∈ U the map φ → J(u; φ) is nonlinear, in general.However, we assume that J is continuously differentiable in its second argument, which is the case for biquadratic (-cubic, -quartic, etc.) spline interpolation, for instance.Note that linear interpolation is not differentiable and is hence not considered here.The differentiability assumption is not only of mathematical relevance but is also important from an algorithmic point of view (cf.[30]) since it assures that optimization algorithms based on derivatives are well-defined.The forward operator in (3) is a linear map A : U → R m and f ∈ R m is the measured data.In some applications, e.g.MRI, one needs a complex-valued data space.This can be incorporated in our setting by identifying the complex numbers with R 2 .For defining the regularization functional R in (3) we will also need the notion of a gradient ∇u ∈ V of an image u ∈ U. To this end we use forward finite differences, as detailed for instance [8].Throughout the paper, we let x denote the standard Euclidean norm of a vector x.

B. Structure-promoting regularization
The main challenge of our model is that it combines data from different modalities.In particular, the image u, observed through the data f , and the available side information v typically have completely different contrasts.This makes the template-based methods [26], [27], and optical-flow methods [28], [29] inapplicable.We only assume structural similarities on u and v.While there are also other models [10], we model this similarity in terms of shared image edges.Mathematically, this means that u and v ought to have parallel gradients, which can be expressed as for all i = 1, . . ., n.Since gradients are orthogonal to level sets, this condition is also referred to as parallel level sets, see [31], [32].Note that the left hand side in (5) vanishes whenever ∇u and ∇v are collinear and ∇u if they are orthogonal.
To enforce this constraint, one defines the directional total variation (cf.[19]) with respect to v ∈ U as where 1 denotes the d × d identity matrix.The parameter γ ∈ [0, 1) steers the influence of the side information from no (γ = 0) to high (γ ≈ 1), where γ = 1 should be avoided for theoretical reasons, see [33], [8].For γ = 0 one observes that dTV reduces to the standard total variation TV.Furthermore, ε > 0 is a small parameter which assures that ( 8) is welldefined even if ∇v i = 0. Note that if γ = 1 and ε = 0, then ( 7) is the projection onto the orthogonal complement of span(∇v i ) and in this case ( 5) holds for all i = 1, . . ., n if and only if dTV(u; v) = 0.For many modalities the physical assumption that images are nonnegative makes sense.In these cases, we enforce this via the characteristic function and set the regularization function R in (3) as Here, α > 0 denotes a regularization parameter which steers the amount of regularization used.We would like to stress that the overarching framework of this contribution is not limited to this particular regularizer and can be used in conjunction with other models, too, for example [9], [10], [4], [18], [5], [20], [12], [15], [21].

C. Parametric deformations
In practice, frequently the kind of deformation is approximately known.For instance, a CT and an MRI image of the brain of the same patient can be assumed to be connected by an affine transform of the form and shears with the shear matrix However, one can also think of higher-order parametrizations of the form φ(x) = H(x, x) + M x + b for x ∈ R d where H : R d × R d → R d denotes a bilinear form parametrized by a tensor with d 3 degrees of freedom.For more examples of parametric deformations we refer to [30].For a review on different methods in image registration we refer to [30], [34].An advantage of using parametrized deformations is that the number of variables in the optimization problem ( 3) is dramatically reduced since not a whole deformation field with n×d entries has to be estimated but only the parameters of the parametrization.Remember that n denotes the number of pixel which can be very large, and d denotes the dimension of the physical space which is typically 2 or 3.In the case of affine transformations (11) the number of parameters is d 2 + d = 6 for d = 2 and for rigid motion in the plane it is only 3. Hence, in these cases the number of parameters is significantly smaller than n × d.Moreover, parametric deformations do not require sophisticated regularization, as used for instance in [28], [29].We incorporate the parametrization through a map P from the parameter space R p to the space of deformation fields V.In general, P can be nonlinear-as it is the case for rigid motion, for instance-but for affine deformations (11) it is linear and given by P : R 6 → V, defined as in two spatial dimensions.For a more compact notation we suppress the dependency on the parametrization P and write where we used the composition operator (4).Furthermore, incorporating parametric deformations into (3) we obtain

III. ALGORITHMIC FRAMEWORK A. Optimization
In this section, we detail the optimization scheme that we use to solve problem (3) numerically.Before we formulate the algorithm, we cast problem (3) into the form Algorithm 1: Proximal alternating linearized minimization (PALM) to solve (17).
where D(u, ϕ) = 1  2 Au ϕ − f 2 denotes the data fidelity term and we abbreviated R(u) = R(u; v).Note that-due to the smoothness assumption on the interpolation operator in (4)the function D is smooth in both variables.Furthermore, the objective function in ( 17) is non-convex in the joint variable (u, ϕ) and also non-convex in ϕ.However, it is convex in the variable u.Still, due to the overall non-convexity one can in general only expect to find critical points of (17) using gradient-based algorithms and the outcome depends on the initialization of the numerical optimization algorithm.Another difficulty arises from the non-smoothness of the regularization functionals in (17) which excludes many gradient-based algorithms.Here we employ the proximal alternating linearized minimization (PALM) algorithm [35] which amounts to alternating forward-backward splitting, see Algorithm 1.
The proximal operator for dTV with nonnegativity constraint (10) can be computed using the fast gradient projection algorithm (also known as FISTA) [39], see [19] for details.
Let us now study the gradients of the data fidelity used in Algorithm 1.With the chain rule and (15), these are given by (20) where the asterisk denotes the adjoint / transpose of a linear map.Note that since the interpolation operator J is linear in the first argument, it holds that ∂ u J = J.In our implementation we use the scipy [40] routine RectBivariateSpline which allows to evaluate bivariate splines of order two or higher together with their derivatives ∂ φ J. Furthermore, the linear operator ∂P (ϕ) * can be explicitly calculated in dependency on the type of parametrization used.For affine deformations, for instance, map P in ( 14) is linear; hence, it holds ∂P = P and the adjoint map P * : V → R 6 can be computed easily.
Finally, we address the choice of step sizes σ and τ appearing in Algorithm 1. From a theoretical point of view it suffices to choose them smaller than the reciprocal of the Lipschitz constants of the gradient maps in (19), (20) to obtain convergence of the algorithm.However, in practice those Lipschitz constants are hard to compute analytically and typically one can only find pessimistic upper bounds for them.

B. Initialization strategy
Since image registration alone is already a severely illposed and non-convex problem, one can expect that the degree of non-convexity of our joint model ( 3) is even higher.In particular, the outcome of Algorithm 1 will depend heavily on its initialization, in general.Hence, in order to be able to detect large deformations between the side information v and the image u such that Au ≈ f , we employ a scalespace strategy commonly used in image registration [30].The overcome large deformations we reduce the resolution of the problem such that these correspond to a few pixels only.We also employ an over-smoothing strategy based on the regularization parameter α in (10).It is well-known for a large class of regularizations of the form R = αJ with a regularization functional J , that solutions of the variational regularization method (2) converge to an element in the nullspace of J as α → ∞ (see [41] for total variation and [42] for the general case).For the case of directional total variation, i.e.J = dTV, this means that for high regularization parameters R the reconstructed images become less oscillatory and better align with the side information.However, to avoid a strong loss of contrast in the reconstructions [41], one would like to choose the regularization parameter as small as possible.
In this work we propose a combination of resolution-based and over-smoothing strategy to solve (3) which is detailed in Algorithm 2. In a nutshell, the algorithm first reconstructs lowresolution images with high regularization parameters first and then successively decreases the regularization parameter while reconstructing better resolved images.

IV. NUMERICAL RESULTS
In all numerical experiments we will use affine parametric deformations (cf.Section II-C), since they model many realistic deformations and are computationally efficient due to the low number of free parameters.Furthermore, we do not regularize the affine deformation fields, meaning that we choose S(ϕ) = 0 for all ϕ ∈ R p in (16).Correspondingly, the second proximal operator in Algorithm 1 reduces to the identity operator.The details of the scale-space Algorithm 2 are specified below but at this point we already remark that we use K = 100 iterations of the PALM algorithm in each loop of Algorithm 2. The model parameters in (8) were chosen similar to [19], [8] as γ = 0.9995, ε = 0.01 • max n i=1 ∇v i .The numerical experiments have been carried out in Python using ODL [43].The line integrals in the PET experiment were computed using the Python module scikit-image [44] and for interpolation we use biquadratic splines from the scipy [40] routine RectBivariateSpline.The source code which reproduces all experiments will be published upon acceptance of this manuscript.

A. Hyperspectral super-resolution
In this section we fuse a 100 × 100 spectral data channel with a highly resolved panchromatic side information of size 400 × 400.We consider three test cases rigid, shear, and nonlinear where we artificially introduce different deformations between data and side information.For rigid we utilize the deformation where R θ is a rotation matrix (12) with angle θ = 0.1 ≈ 5.7 • and b = (0.06, 0.04) T is a translation vector.The pixels of the images lie in [−1, 1] 2 and have width 0.02 (data image) and 0.005 (side information).Hence, we try to correct an effective off-set of approximately 12 pixels.The other deformations are given by φ shear (x) = S a x + b-where S a is the shear matrix ( 13) with a = 0.08-and We notice that φ nonlinear is a nonlinear perturbation of φ rigid and, in particular, is not affine.In the scale-space Algorithm 2 we chose five different resolutions n k = (25 2 , 50 2 , 100 2 , 200 2 , 400 2 ) and corresponding regularization parameters α k = 10 −3 • (10 4 , 10 3 , 10 2 , 10, 1).These parameters were used for all reconstructions involving dTV and for all test cases.In the experiment with standard total variation TV as the regularizer we used the parameters α k = 10 −5 • (10 4 , 10 3 , 10 2 , 10, 1) instead.
The first row of Figure 3 shows the data rigid together with the side information misaligned through a rigid transform.Below we show four different reconstructions, including a target reconstruction which was computed using aligned data.In the absence of a ground truth solution to the problems it serves as target for the comparison with other methods.Furthermore, we show a standard TV reconstruction, which does not utilize the side information at all, and a dTV reconstruction, which uses the misaligned data and side information and does not correct the deformation.The last reconstruction is obtained using our proposed method which corrects for the misalignment between data and side information.
In Figure 4 we show results for our method used on the other data sets shear and nonlinear.Also here our method was able to correct the deformation between data and side information sufficiently to achieve appealing reconstructions.Note that the slightly worse reconstruction of nonlinear is due to the nonlinear nature of the deformation field φ nonlinear which cannot be fully corrected by an affine transformation.

B. PET-MR reconstruction
In this experiment we consider the problem of PET reconstruction using a fully sampled T1-weighted MR image of size 120 × 120 as side information.The forward operator is modelled by a parallel beam X-ray transform with 200 angles equispaced in (0, π] and 192 bins.The sinogram data were simulated using a ground truth image which with respect to the side information was transformed with the rigid deformation φ rigid in (21) with θ = 0.1 ≈ 5.7 one obtained through filtered back-projection, the second one utilizing TV regularization, the third one using dTV without motion correction, and the fourth one being the proposed method.The first two methods, which do not use the side information or correct any motion, exhibit poor image quality.The misaligned dTV method again introduces severe artefacts due to the misaligned side information.In particular, all three methods introduce bright spots due to noise.On the other hand, the proposed method corrected the deformation and is in perfect agreement with the ground truth and the target reconstruction.
Figure 6 illustrates the scale-space Algorithm 2 to correct large deformations.It shows the reconstructed images and deformation fields at increasing resolutions, each after 100 iterations of the PALM Algorithm 1.For better interpretability we show the deviation of the reconstructed fields from the identity, e.g. the yellow arrows indicate the field φ rigid − id and the red arrows the corresponding expression for the reconstructed fields.One can observe that with increasing resolution the algorithm successively improves the reconstructed deformation fields and images.Already at the lowest resolution the deformation field is almost correctly reconstructed in structured parts of the image.Due to the parametric approach, for higher resolutions the deformation fields are perfectly estimated in the whole image.Correspondingly, the reconstructed images show more and more fine details and due to the decreasing regularization parameter also the contrast is enhanced.

C. Undersampled multi-contrast MRI reconstruction
The final experiment is multi-contrast MRI reconstruction from undersampled data.Since all contrasts are likely to share a structural information, less data is needed if the expected  redundancy is exploited.Here we assume that a T2-weighted has been acquired using "full data" and reconstructed without artefacts.We then use this image to reconstruct a T1-weighted image using only a fraction of the data.The forward operator is a discrete Fourier transform defined on complex-valued images of size 256 × 256, followed by a sampling operator corresponding to 30 equidistant spokes.This yields a sampling of approximately 6% of the k-space.Sampling pattern and side information as well as the ground truth image and a target reconstruction, computed using aligned data and side information, can be found in the first row of Figure 7.As before, the image domain is [−1, 1] 2 which corresponds to a pixel width of approximately 0.0078.The deformation field in this simulation is a combination of rotation, zooming, shearing, and displacement given by φ mix (x) = 0.9 0.04 sin(θ) 0.9 x + 0.02 0.08 with θ = 0.1.In particular, zooming is another interesting deformation which can be modelled by affine maps.Here, we use a zoom-factor of 0.9 which means that the size of the side information image is only 90% of the size of the image which underlies the k-space data.Since nonnegativity is not a meaningful constraint for complex-valued images, we drop this constraint in (10) for this test case.The resolutions and regularization parameters in Algorithm 2 were chosen as n k = (32 2 , 62 2 , 128 2 , 256 2 ) and α k = 10 −2 • (5 3 , 5 2 , 5, 1).For the TV reconstruction we used α k = 5 • 10 −4 • (5 3 , 5 2 , 5, 1).
In the second row of Figure 7 we show the zero-filled reconstruction (pseudo-inverse) and three more reconstructions, as before.Note that we only display the magnitudes of the complex images.The reconstructions using the pseudo-inverse, TV, and misaligned dTV all show the significantly larger skull compared to the side information which was encoded in the measured k-space data.Furthermore, they suffer from artefacts due to noise, missing data, and misaligned side information, respectively.The proposed method, however, corrects all deformations and, in this way, registers the underlying image to the side information.
The scale-space generated by Algorithm 2 is visualized in Figure 8 with observations similar to subsection IV-B.
V. CONCLUSIONS Multi-modality imaging is becoming ever more important in many disciplines from medicine to material sciences.Mathematical approaches to combine data from multiple modalities  exist but are prone to imperfect misalignment.In this work we proposed a generic framework to reconstruct an image from indirect measurements while registering it to a structural side information from a potentially different modality.Numerical experiments for a specific regularizer (directional total variation) and a variety of applications including hyperspectral imaging in remote sensing, PET-MR and multi-contrast MRI underpinned the aptness of the proposed approach to correct misalignments between data and structural side information which cause existing algorithms to fail.Missing robustness to misalignment has been the biggest hurdle for integrating structure promoting regularizers into routine use, for instance in the clinic.Thus, the proposed framework paves the way for their translation into various applications.

Figure 1 .Figure 2 .
Figure 1.In many applications two images of different contrast and resolution are acquired.Images courtesy of D. Coomes, P. Markiewicz and J. Schott.

Figure 3 .
Figure 3. Hyperspectral super-resolution.Reconstruction of data set rigid using three different methods.TV and misaligned dTV do not correct deformations and yield poor results.The proposed method corrects the deformation and the reconstruction perfectly agrees with the target.

Figure 4 .
Figure 4. Reconstructions of data sets shear and nonlinear.The proposed method corrects the deformations yields satisfactory reconstructions.

Figure 5 .
Figure 5. PET reconstructions with structural MR side information.Filtered back-projection, TV, and misaligned dTV do not correct motion and yield poor reconstructions.The proposed method corrects the deformation and the reconstruction perfectly agrees with ground truth and target.

Figure 6 .
Figure 6.From left to right: PET reconstructions of increasing resolutions generated by Algorithm 2. The deviation of the ground truth field φ rigid and the reconstructed fields from the identity are shown in yellow and red, respectively.The ground truth field is only visible in the first two images where the estimated deformation field is inaccurate.

Figure 7 .
Figure 7. Multi-contrast MRI reconstruction.Pseudo-inverse, TV, and misaligned dTV do not correct motion.The proposed method corrects the deformation and the reconstruction perfectly agrees with ground truth and target.

Figure 8 .
Figure 8. From left to right: MRI reconstructions of increasing resolutions generated by Algorithm 2. The deviation of the ground truth field φ mix and the reconstructed fields from the identity are shown in yellow and red, respectively.The ground truth field is only visible in the first image where the estimated deformation field is inaccurate.