Towards Computing Cross-modality Symmetric Non-Rigid Medical Image Registration

This paper describes a new non-rigid approach to register images from same- and cross-imaging modalities such as magnetic resonance imaging, computed tomography, and 3D rotational angiography. The deformation is a key challenge in medical image registration. We have proposed a diffeomorphism-based method to tackle this problem using an optimized framework. A non stationary velocity field is used to minimize the effect of forces that are derived from the image gradients. Furthermore, we propose a similarity energy function, based on the gray scale distribution, to limit the fluctuations while approaching the local minima. The proposed method is evaluated on both private and public datasets; the results show that the values of mean square error (MSE), normalized cross-correlation (NCC), structural similarity (SS), mutual information (MI), feature similarity index (FSIM), and mean absolute error (MAE) are 1.3136, 0.9962, 0.9897, 0.883, 0.9922, and 1.52 ± 2.09, respectively. Both qualitative and quantitative evaluation show promising registration accuracy reflecting the potential of the proposed method.


I. INTRODUCTION
There are different types of imaging modalities such as computed tomography (CT), ultrasound imaging (US), three dimensional rotational angiography (3DRA), and magnetic resonance imaging (MRI) that have been used by the clinicians to diagnose, plan and treat the health issue [1], [2]. The patient's real world coordinate must be registered with this data to keep both data and patient in the same perspective; this (a) (b) (c) (d) registration of MRI and CT images finds clinically relevant applications that include treatment planning, computer-aided diagnosis, multimodal diagnostics, surgery simulation [3], radiotherapy [4], image-guided interventions, assisted/guided surgery [5], and disease follow-up. In interventional radiology, the pre-operative CT can be registered with intra-operative US images for hepatobiliary procedures to better visualize the lesions spread across the liver. Same-modal registration is complicated due to motion or disease progression, whereas multi-modal fusion is further challenging mainly due to: 1-nonfunctional intensity mapping across MRI and CT, 2-locally varying contrast patterns along with others.

B. RELATED WORK
A number of multimodal medical image registration methodologies have been reported over the years [6]- [14]. Turco et al. [15] present the impact of positron emission tomography (PET)/CT attenuation correction on the registration between cardiac PET and a CT image. It is found that blurring introduced by the heart beating has negligible effect on the registration, however, the noise has adverse impact on the registration outcome. Pilutti et al. [16] propose a Non-Parametric Bayesian Registration that assumes model for the effect of distortions caused by heartbeat and peristalsis; here, a Gaussian filtering is applied for spatial regularization towards deriving the MR image registra-tion. Wang [18] propose a region-adaptive CT/MRI registration method with the help of multi-target regression forest to deal with the large appearance gaps.
Zhou et al. [20] present a framework based on correlation-weighted sparse representation in order to separate the contrast agent from the input dynamic contrast-enhanced (DCE)-MR images allowing the motion components to be effectively registered. Xu et al. [19] present a method for MR/CT registration that co-registers preoperative MR image with intra-operative binary CT image using a rigid approach to make it faster; subsequently, non-rigid approach is used to correct tiny misalignments.
In recent years, machine learning (ML) and deep learning (DL) have vigorously gained traction in medical image registration. Huang et al. [22] present an unsupervised learning-based framework, where the network consists of affine and deformable transformations. Hansen et al. [23] propose a sparse key point-based geometric network that leverages discrete dense displacement maps facilitating the registration process. Grim et al. [24] present a method that first trains a neural network to detect a set of anatomical landmarks, then, the combination of landmark locations and network is used in computing the initializations to incorporate the confidence of the network to detect the landmarks. Yu et al. [25] present an unsupervised network based on a metabolic constraint function and a multidomain similarity measure towards determining PET/CT registration that uses standard uptake value (SUV) distribution of hypermetabolic regions of the human organs or region of interest (ROI). Fechter et al. [26] employ a method that combines U-Net with a coarse-to-fine approach and a differential spatial transformer module to estimate deformable image registration. Guan et al. [27] present a multi-channel convolutional neural network that combines a periodic vascular diameter variation model with the convolutional neural network (CNN) registering digital subtraction angiography images with their 3D models. Thus, the literature shows that the research fraternity has shown more inclination towards these ML/DL-based methods, however, there have been several limitations that still remain unanswered: Deep learning (DL)-based methods, specifically neural networks for medical image registration, have received attention due to their end-to-end nature and state-of-theart performance. However, neural networks face several challenges that are not present in the conventional methods [46]. In a clinical setting, these variations are expected in the data due to multiple machines with several acquisition parameters that can cause the data distribution to change. One technical limitation for training the neural networks is due to the limited quantity of clinical data, resulting in overfitting (i.e., poor generalizability). Additionally, the training procedure of the neural networks does not provide any convergence guarantees. Other technical challenges include the blackbox nature of deep learning-based approaches, which downplays the reliability of the neural networks in clinically sensitive settings [47].
We realize the above mentioned issues, es-pecially, the shortage of clinical data and thus decided to focus on the conventional methods. Yipeng et al. [28] present a method for generating a subject-specific statistical shape model capturing the prostate deformation. Li et al. [29] introduce an objective function for similarity measurement that embeds the local phase features derived from monogenic signal in the modality independent neighborhood descriptor based on autocorrelation of local structure. Pai et al. [31] propose a multi-scale, multikernel shape, compactly supported kernel bundle framework for stationary velocity field-based image registration. Jarrod et al. [33] describe a human-to-phantom validation framework that transforms the surface collection patterns from in−vivo image-guided liver surgery procedures onto a well-characterized hepatic deformation phantom for validating surface-driven nonrigid registration. Sun et al. [34] use lower-order Bspline functions for registration, while preserving smoothness of the deformation. Chakraborty et al. [37] propose a 2D myocardial deformation imaging to develop a nonrigid image registration motion estimator adapted to radio frequency (RF) data sets. Darkner et al. [38] present collocation for diffeomorphic deformations as a numerical solution to diffeomorphic image registration using an implicit A-stable collocation method. Qin et al. [41] combine two separate methods: 1-a superpixel-based structure scale estimator to estimate the boundary-aware structure scale of each reference structure, 2-an edgeaware mismatch scale measuring the mismatch degree of the edge structures to be matched in the images. Sureerat et al. [32] present a symmetric diffeomorphic deformable registration algorithm incorporating a modality-independent neighborhood descriptor and a Huber metric for MR/CT registration. Zhang et al. [42] use localphase mean and phase-congruency values of different orientations, to improve the robustness and accuracy using filter-bank of Log-Gabor filters at different orientations and frequencies.
Yang et al. [43] use an adaptive weighted ob- VOLUME 4, 2016 jective function that formulates the alignment of two point sets as a mixture model estimation problem.
Despite having rich literature, the problems are many and they still remain: 1) different physical acquisition processes may generate statistical correlation between imaging structures that do not correspond to the same anatomical structures, violating one of the underlying assumptions for most intensity-based similarity measures, and 2) the deformation, spatial and temporal variabilities. The ability to capture the complex image deformations and establish accurate point-wise correspondence is key to many computer vision applications that involve image registration and atlas construction. These properties become particularly challenging, when the object depicted on the images undergoes a severe deformation or has a high shape variability. Dissimilarities can occur due to inter-and intrafractional anatomical variations from the preoperative image set, i.e. variations between different treatment sessions and during single treatment sessions, respectively. Furthermore, the imaging data is taken from different imaging devices (multi-modality) and may be taken within different time frames (multi-temporality). Deformable registration has a significantly greater number of degrees of freedom (DOF) to manage the aforementioned local distortions between anatomical structures. Interestingly, diffeomorphism methods [31], [32], [38], [42] register images slowly warp images until a satisfying overlap is attained; this method is particularly capable of preventing an invalid folding of the deformation field and guarantees a smooth oneto-one mapping between points. However, these registration methods have been domain specific and parameter sensitive. The update scheme relies on forces derived from the image gradients (even though partially) and is therefore fundamentally limited by the local scope. Therefore, we have proposed a new diffeomorphism-based method that designs a similarity energy function overcoming these problems.
The rest of the paper is organized as follows: Section II describes the clinical data used and the proposed method, Section III includes the results obtained by the method. Finally, Section V concludes the paper.

A. DATA
We have used both private (Hamad Medical Corporation) and publicly available datasets. The data that have been obtained from Hamad Medical Corporation have the following description: each dataset consists of 700 slices acquired along the long axes of the subjects. The average slice thickness, pixel spacing, and matrix size are 0.29 mm, 0.29 mm×0.29 mm, and 512×512, respectively. In the public dataset [44], and [45], a single inversion pulse is followed by a 400 ms delay time. As a result magnetizationprepared 180 degrees radio-frequency pulses and rapid gradient-echo (MP-RAGE) images are T1-weighted. The excitation pulse has a 10 degree flip angle, echo time (TE) is 4 ms, and TR is 10 ms. The resolution is 128×256×256. The ground truth data were provided by the respective organizations.

B. METHODOLOGY
Generally, an image registration aims at determining the spatial correspondence of two or more image sets for minimizing their differences. Let us consider two image sets: a static, F xd, and a dynamic/moving, M vg; image registration algorithms try to find the optimal transform minimizing the difference between F xd and M vg. Such algorithms can be rigid or deformable. For rigid image registration, the translation and rotation of all image pixels is uniform, such that all pixel-to-pixel relationships within the image set remain equal before and after the transformation. For deformable registration (also called non-rigid), those pixel-to-pixel relationships change, i.e., while two image sets are aligned on the same reference coordinate, a pixel in each image set on the same coordinate may not necessarily represent the same anatomical  structure. Therefore, deformable image registration can account for local distortions, occurring since organs and tissues are non-rigid structures and subject to deformation. In our method, we use the nearest-neighbor searches establishing the global correspondences between the images; the spectral forces capture the substantial deformations. This is a diffeomorphism-based method that uses an optimized framework venturing the global scope and the speed of nearestneighbor search to limit/capture the deformations. The theory of Lie groups states that a diffeomorphic transformation φ resides in a Lie structure [38]. Furthermore, φ is associated with velocity field by, φ = e v . However, in case of stationary velocity fields, φ = e v [30], where v is the velocity field and φ is the diffeomorphism.
In this work, we perform feature matching followed by the spectral representations. Section II-B1 describes the similarity metric used for symmetric registration, Section II-B2 discusses the velocity field inducing diffeomorphic mapping. The working flow-chart of the method is provided in Fig. 2.

1) Similarity Metric for Symmetric Registration
Intensity-based measures are divided into statistical measures, information theoretic measures, and measures that consider the dependency of neighboring pixels [35], [39], [36].  VOLUME 4, 2016 where p, •, Reg, λ φ , Ω, represent the mapping of pixel, transformation operator, regularization term, regularization parameter, and common region of two input images F xd and M vg after registration, respectively [51]; λ φ = αi 2 αs 2 and In order to minimize, a hidden variable is introduced that considers the regularization criterion as a prior on the smoothness of the transformation φ. Instead of requiring the point correspondences between the image pixels (a vector field l) to be the exact of the transformation, some error is introduced at each image point. Thus, we introduce a non-ruled spatial transform parameter l, that makes the new symmetric function as: where Dist (l, φ) = l − φ , λ 2 un represent uncertainty degree between l and φ, respectively. The displacement field u is produced using the space geometric transform, and two vectors are added directly to form a new vector. The energy functional then becomes: The energy function depends on two fields, l and φ, thus it is minimized with respect to l and φ. We first start with φ 0 = Id at iteration 0; then iteratively at iteration=n, 1) we find l n by minimizing ing gradient descent method [59], 2) we find φ n , minimizing 1 2 F xd − M vg • (l) 2 + 1 λun 2 l n − φ n 2 using single convolution [60].
After minimizing the energy function and solving the displacement field, the final displacement field is obtained. Diffeomorphic transform or space ensures reversible, smooth deformation and topologically invariant deformation.

2) Velocity Field Inducing Diffeomorphic Mapping
Since there is deformation, we consider a nonstationary velocity field (nSVF), where the initial velocity (or equivalently the momentum map) is different at different time points along a geodesic [50]. In this case, for more than two time points, it is necessary to choose a time point for the subject-specific template, and this time point is generally the average (or median) of the observed time points. The momentum maps (from the template to all the time points) can then be compared in the template reference space only. Diffeomorphic transform is the exponential map of nSVF, v, and it is a time independent vector that induces diffeomorphic mapping φ. Let us consider pairs of points in the image domain x 0 , x t ∈ Ω, t ∈ R + , such The inverse mapping is found by integrating −v, i.e. for y 0 , y t ∈ Ω, dy dt = −v (y) , y (0) = y 0 , ψ t (y 0 ) = y t . We suppose that the two input images are affine registered using conditional mutual information [49], i.e.
The optimal registration is found by minimizing (3) accommodating the conditional mutual information (C) with respect to deformation. The numerical approximate is The convergence condition is continuously monitored until E(n) − E(n − 1) < threshold. For this, we use mid-point rule: where φ i are the diffeomorphisms and i=1...n.

3) Summary of the Proposed Method
The key steps in the algorithm are summarized below (in Algorithm 1): The steps after 2 are iterative in nature and it may be noted that the updates convergence rate is a measure that defines the efficacy of a diffeomorphic registration method; the denser is the convergence the faster is the registration.

III. RESULTS
The proposed method is tested on the datasets obtained from Hamad Medical Corporation in Qatar, [44], and [45]. The data include brain aneurysm 3DRA data, brain tumor MR/CT data, and liver MR/CT data [48]. We have tested the algorithm on the images from same modality and cross-modalities as well. The results of registration processes are provided in Fig.  3, 4, 5, and 7. It may be observed that the proposed method has updates (indicated with the arrows) have faster convergence indicating a global scope. Fig. 3 provides the results when tested on brain aneurysm images. The reference image includes the aneurysm vessels that appear gradually, when the contrast agent is injected in the cerebral vessels, therefore, the two images are at different time instants. Similar is the case in Fig. 4. Fig. 5 and 6 provide results when tested on cross modalities, CT and MR. Fig.  7 provides the results of brain MR images but with deformations on moving image. We too have measured the similarities in terms of accuracy and variances as provided in Fig. 8. The algorithm has been implemented on MAT-LAB R2017b running on a workstation with 16 GB RAM and 2.8 GHz Intel processor; the average time to perform registration on a data with approximately 80 slices is little more than 1 minute that is quite reasonable from clinical perspective. We have also validated the method quantitatively. We have compared our method with various other diffeomorphic and non-diffeomorphic methods such as [65], [32], [39], [40], [38], and [31], using some popular measures such as mean square error (MSE), normalized cross-correlation (NCC), structural similarity (SS), mutual information (MI), feature similarity index (FSIM), and mean absolute error (MSE) [52]. The results of CT/CT, 3DRA/3DRA, MR/MR, and CT/MR are provided in Table 1 Table 5) the proposed method with some popular DL-based methods, including Voxel-Morph (VM) [67], LT-Net [66], and Symmetric Normalization (SyN) [35]. We have selected these methods because they have been regularly preferred by the research fraternity for comparison purpose. Among these, Voxel-Morph is probably the most famous methods in recent years. The results show that the proposed method fairly performs as compared to the DL-based methods although the margin is not significant.

IV. DISCUSSION
From the Table 1, 2, 3, and 4, this can well be observed that the registration performance on CT/CT registration is the best among the other imaging modalities. The energy function plays an important role in this registration process. Therefore, we have investigated the effect of l after its introduction in (3) by subtracting (1) from (3), VOLUME 4, 2016 Algorithm 1 diffeomorphic registration Require: A static image, F xd, and a moving image, M vg, such that Ω → Γ, where Ω ⊆ R N , Γ ⊆ R Ensure: Determine the optimal transform T(x) minimizing the difference between F xd and M vg 1: Seek a bijective mapping φ : Ω → Ω: F xd•φ to improve the similarity with M vg and M vg •φ −1 improves similarity to F xd under En (φ), φ being the diffeomorphic mapping 2: The displacement field, u, is initialized 3: A new symmetric function, En (l, φ) is calculated using spatial transform parameter l 4: The energy functional in the displacement field, En (u), is calculated 5: Parametrize the spectral features φ by time, t, as dx The driving force is calculated and the velocity field is updated 7: The updates in the velocity field during registering process are computed with spectral correspondence 8: The deformation field is regulated 9: Exponential mapping of deformation is obtained by diffeomorphic transform 10: Similarity measurement function En is calculated and it is solved to judge the convergence 11: Pairwise registration is performed In order to have a decent comparison, we have plotted the difference of the two energy functionalities and their difference (6). Also, we have plotted the energy functionality defined in (4) and the difference to assess them fairly. All of these figures are provided in Fig. 9. We have found that the energy function defined in (4) converges appropriately reflecting its significance in image registration.

V. CONCLUSION
In this paper, we have presented a non-rigid registration method that is based on diffeomorphic mapping. The results are promising with respect to large or small deformations; furthermore, it is not domain specific. The image gradients have little or no effect on the registration outputs. In future, we intend to test this method on large cross-modal platform with very large deformations.

APPENDIX. REGISTRATION EVALUATION MEASURES
In this section, some of the measures that evaluate the proposed registration method are included.

A. MUTUAL INFORMATION
Mutual information (MI) is defined as [53]: where IE (X) and IE (Y ) denote the information entropy of reference image X and float image Y , respectively. IE (X, Y ) is the joint entropy of the two images.

B. MEAN ABSOLUTE ERROR (MAE)
The mean absolute error is calculated over the neighborhood of the landmarks. The MAE of the real registration error RR i and the estimated one RR i is calculated by [54]: where N is total number of pixels.

C. FEATURE SIMILARITY INDEX
The feature similarity index (FSIM) is a similarity measure between two images X and Y is  calculated as [56]: Ω means the whole image spatial domain. Here S L (x) determines the similarity between X(x) and Y (x). The maximum phase congruence P C m (x) weighs the importance of S L (x).           SARADA PRASAD DAKUA is working as a Senior Research Scientist in the Department of Surgery, Hamad Medical Corporation (HMC). Prior to joining HMC in 2017, he has spent over six years in Qatar Robotic Surgery Centre, Qatar Science and Technology Park (Qatar Foundation) as a Research and Development Executive. He has 14+ years of research experience in computer vision and image processing. He holds a Ph.D. degree from Indian Institute of Technology Guwahati in medical image processing and analysis. He also holds an MBA degree from University of Leicester (United Kingdom) and is a certified PMP.