Fast and Robust Symmetric Image Registration Based on Distances Combining Intensity and Spatial Information

Intensity-based image registration approaches rely on similarity measures to guide the search for geometric correspondences with the high affinity between images. The properties of the used measures are vital for the robustness and accuracy of the registration. In this paper, a symmetric, intensity interpolation-free, affine registration framework based on a combination of intensity and spatial information is proposed. The excellent performance of the framework is demonstrated on a combination of synthetic tests, recovering known transformations in the presence of noise, and real applications in biomedical and medical image registration, for both 2D and 3D images. The method exhibits greater robustness and higher accuracy than similarity measures in common use, when inserted into a standard gradient-based registration framework available as part of the open source Insight Segmentation and Registration Toolkit. The method is also empirically shown to have a low computational cost, making it practical for real applications. The source code is available.


I. INTRODUCTION
I MAGE registration [1]- [3] is the process of establishing correspondences between images and a reference space, such that the contents of the images have a high degree of affinity in the reference space.An example of such correspondence is a mapping of an image (often referred to as floating image) of a brain to a reference space of another image (often referred to as reference image) of a brain where their important structures are well co-localized.There are two main categories of approaches for image registration: feature-based methods extract a set of feature points between which a correspondence is found, whereas intensity-based methods use the voxel-values directly, and evaluate candidate mappings based on a similarity measure (affinity).There are also two main categories of transformation models: linear (which include, as special cases, rigid, similarity, and affine transformations), and non-linear (deformable).The combination of differentiable transformation models and differentiable similarity measures enables the use of gradient-based local optimization methods.
Medical and biomedical image registration, [4], is an important branch of general image registration and a lot of effort has been invested over the last decades to refine the tools and techniques, [2].Although a majority of the recent research has been devoted to non-linear registration techniques, the most prevalent registration method used in the clinic is still The authors are with the Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden.E-mail: {johan.ofverstedt,joakim.lindblad,natasa.sladoje}@it.uu.se linear registration.In a number of situations, the deformations allowed by non-linear registration can be difficult to evaluate and may affect reliability of diagnosis, [2]; hence, physicians may prefer a more constrained rigid or affine alignment.Considering their wide usage as fundamental tools, improvement of rigid and affine registration in terms of performance and reliability is highly relevant in practice.
Feature-based image registration is dependent on the ability of the feature extraction method to locate distinct points of interest appearing in both (all) images.Feature-extractors (e.g.SIFT [5]) typically presuppose the existence and relevance of specific local characteristics such as edges, corners and other salient features; if no, or too few, such distinct points are found, the registration will fail.This is often the case in medical and biomedical applications, [6], [7], where intensitybased registration, therefore, tends to be the method of choice.Figure 1 shows an illustrative example of a biomedical application where feature-based method fails, whereas intensity-based method can be successful.
Intensity-based registration is, in general, formulated as a non-convex optimization problem.The similarity measures commonly used as optimization criteria typically exhibit a high number of local optima [8], [9]; a count which tends to rapidly increase under noisy conditions.A small region of attraction of a global optimum imposes that the starting position has to be set very close to the optimal solution for it to be found by an optimizer.This leads to reliability challenges for automated solutions.
In this study we develop a registration framework based on a family of symmetric distance measures, proposed in [9], which combine intensity and spatial information in a single measure.These measures have been shown to be characterized by smooth distance surfaces with significantly fewer local minima than the commonly used intensity-based measures, when studied in the context of template matching and object recognition.In this work we demonstrate that these distance measures can, with a slight modification, be successfully used for fast and robust affine image registration.By differentiating the distance measure we are able to use efficient gradientbased optimization.The proposed method outperforms the commonly used similarity measures in both synthetic and real scenarios of medical and biomedical registration tasks, which we confirm by (i) landmark-based evaluation on transmission electron microscopy (TEM) images of cilia [10], with the aim of improving multi-image super-resolution reconstruction, as well as (ii) evaluation on the task of atlas-based segmentation of magnetic resonance (MR) images of brain, on the LPBA40dataset [11].Intensity interpolation is typically a required tool in the context of intensity-based registration performed with commonly used similarity measures since the sought transformation (and intermediate candidates) is likely to map points to regions outside of the regular grid.Treating the reference and floating images differently in terms of the interpolation introduces a significant source of asymmetry [12] and can cause a registration task to succeed or fail depending on which image is selected as reference and which is floating.Our proposed approach requires no off-grid intensity values, and is interpolation-free in terms of intensities; empirical tests confirm that it is highly symmetric in practice.
Noting that intensity-based image registration can be computationally demanding, we also demonstrate that the proposed measure is fast to compute, in comparison with the implementations of the measures existing in the ITK-framework [12].The registration framework, with the proposed measure, is implemented in C++/ITK, and its source code is available 3 .

A. Images as Fuzzy Sets
First we recall a few basic concepts related to fuzzy sets [13], a theoretical framework where gray-scale images are conveniently represented.
A fuzzy set S on a reference set X S is a set of ordered pairs, S = {(x, µ S (x)) : x ∈ X S }, where µ S : X S → [0, 1] is the membership function of S.Where there is no risk for confusion, we equate the set and its membership function and let S(x) be equivalent to µ S (x) A gray-scale image can directly be interpreted as a spatial fuzzy set by rescaling the valid intensity range to [0, 1].We assume, w.l.o.g., that the images to be registered have an intensity range [0, 1] and we directly interpret them as fuzzy sets defined on a reference set which is the image domain, and is in most cases a subset of Z n .We use the terms image and fuzzy set interchangeably in this text.
A crisp set C ⊆ X C (a binary image) is a special case of a fuzzy set, with its characteristic function as membership function 2 imagej.net/LinearStack Alignment with SIFT 3 Source code available from www.github.com/MIDA-group The height of a fuzzy set S ⊆ X S is h(S ) = max An α-cut of a fuzzy set S is a crisp set defined as α S = {x ∈ X S : µ S (x) ≥ α}, i.e., a thresholded image.Let p be an element of the reference set X S .A fuzzy point p (also called a fuzzy singleton) defined at p ∈ X S with height h(p), is defined by a membership function

B. Intensity-Based Registration and Point-Wise Distances
Intensity-based registration is a general approach to image registration defined as a minimization process, where a distance measure between the intensities of overlapping points (or regions) is used as optimization criterion.Given a distance measure d and a set of valid transformations Ω, intensity-based registration of two images A (floating) and B (reference) can be formulated as the optimization problem, where T (A) denotes a valid transform of image A into the reference space of image B.
Intensity-based similarity/distance measures which are most commonly used for image registration are Sum of Squared Differences (SSD) [14], Pearson Correlation Coefficient (PCC) and Mutual Information (MI) [15].These measures are pointbased, i.e. they are functions of the intensities of points belonging to the overlapping regions of the two compared sets.Their evaluation, therefore, typically requires interpolation of image intensities.
For two images P and Q defined on a common reference set X P,Q of overlapping points, these measures are defined as (5) and MI(P, In ( 5) avg(P ), and avg(Q) denote means of the resp.intensity distributions over the evaluated region.In ( 6) the (joint and marginal) entropies H P , H Q and H P,Q of the image intensity distributions P and Q are defined in terms of the estimated probability p of a randomly selected point v having intensities P (v), Q(v), as and Intensity-based registration, as formulated in (3), is, in general, a non-convex optimization problem with a large number of local optima, especially for the commonly used point-based measures (SSD, PCC, and MI).To try to overcome this optimization challenge, a resolution-pyramid-scheme is normally used [16], [17], where smoothed low resolution images are first registered, followed by registration of images with increasing resolution and decreasing degree of smoothing, using the transform obtained from the previous stage as starting position (so-called coarse-to-fine approach).

C. Distances Combining Intensity and Spatial Information
While the distances of Sec.II-B only rely on intensities of overlapping points, the distances presented in this section incorporate also spatial information of non-overlapping points.For such spatial relations, we consider distances between two points, between a point and a set, and between two sets.The most commonly used point-to-point distance is the Euclidean distance, denoted d E .
Given a point-to-point distance d(a, b), the common crisp point-to-set distance between a point a and a set B is Closely related to the crisp point-to-set distance is the (external) distance transform of a crisp set B ⊆ X B (with point-topoint distance d) which is defined as Taking into the consideration the intensity, or equivalently, the height of a fuzzy point, the fuzzy point-to-set inwards distance d α , based on integration over α-cuts [9], between a fuzzy point p and a fuzzy set S, is defined as where d is a point-to-set distance defined on crisp sets.The complement distance [18] of a fuzzy point-to-set distance d is The fuzzy point-to-set bidirectional distance d ¯α is For an arbitrary point-to-set distance d, Sum of Minimal Distances (SMD) [19] defines a set-to-set distance as A weighted version can be defined [9], which may be useful if a priori information about relative importance of contributions of different points to the overall distance is available: Inserting distances (11) or ( 13) in ( 14) or (15) provides extensions of the SMD family of distances to fuzzy sets [9].We refer to them as d α SMD , d ¯α SMD , d α wSMD and d ¯α wSMD .It has been observed for fuzzy set distances [20] in general, and for distances based on (11) and (13) in particular, that distances between sets with empty α-cuts may give infinite or ill-defined distances.We follow a previous study and introduce a parameter d MAX ∈ R ≥0 , [21], to limit the underlying crisp point-to-set distance.This has a double benefit of (i) reducing the effect of outliers and (ii) making the distances well defined also for images with empty α-cuts.

D. Transformations, Interpolation, and Symmetry
Linear transformations relate points in one space to another via application of a linear function.A transformation is rigid if only rotations and translations are permitted, and affine if shearing and reflections are also permitted.Affine transformation T : R n → R n can be expressed as matrix multiplication, Linear transformations can, in general, transform points sampled on an image grid to positions outside of the grid, hence an interpolator is required for obtaining the image intensity at the transformed point's location.Interpolation is a large source of error, bias, and a significant contributing factor of asymmetry in intensity-based registration, [12].Commonly, interpolation is only required for one of the two images, where sampling (for optimization) is done from the grid of the other image space; hence, the two images are treated asymmetrically, yielding distinct similarity surfaces (over the transformation parameters) depending on which image is taken as reference.This can cause a registration task to succeed or fail, depending on the registration direction.

E. Optimization
Registration with a differentiable distance measure as objective function enables the use of gradient-based optimization algorithms, which typically are significantly more efficient than derivative-free algorithms for local iterative optimization.An effective and commonly used subset of gradientbased algorithms are the stochastic gradient descent methods [22], which consider a random subset of the points in each optimization iteration, incurring a two-fold benefit: utilizing randomness to escape shallow local optima in the implicit distance surface, while also decreasing the computational work required per iteration.The size of the random subset is usually given as a fraction of the total number of points, and denoted as the sampling fraction.Approximation of the cost function by random subset sampling (where a new random subset of points is chosen in every iteration) has been, in previous studies, [15], [23], shown to perform well for intensity-based registration.

III. PROPOSED IMAGE REGISTRATION FRAMEWORK A. Distances
To extend the family of distance measures given by (15), to be suitable for registration, optionally with random subsampling optimization methods [23], we here define a new related family of distance measures.

and a weight function w
We consider point-to-set distances defined by (11) or (13).
Building on the asymmetric distance, we formulate a symmetric distance as follows: Definition 2 (Average minimal distance).Given fuzzy set A on reference set X A ⊂ R n , fuzzy set B on reference set X B ⊂ R n , weight functions w A : X A → R ≥0 and w B : X B → R ≥0 , the Average minimal distance between A and B, is In the context of image registration, we utilize d − → AMD to express a (weighted) distance between transformed fuzzy points T (A(x)), and the image B, where the transformation of a fuzzy point A(x) = {(x, µ A (x))} is given by the transformation of the reference point x: To reflect the bounded image domain, only the transformed points falling on a predefined region M B ⊂ R n are considered.Note that, when A and B are digital images, X A and X B are typically subsets of Z n and the transformed points T (x)| x∈X A do not necessarily coincide with the points of the reference set X B ; an illustrative example is given in Fig. 2.
We, therefore, provide the following definitions suited for the task of image registration: Definition 3 (Asymmetric average minimal distance for image registration).Given fuzzy set A on reference set   average minimal distance for image registration from A to B, parameterized by a transformation T : where Definition 4 (Average minimal distance for image registration).Given fuzzy set A on reference set X A , fuzzy set B on X B , weight functions w A : X A → R ≥0 and w B : X B → R ≥0 , and crisp subsets (masks) M A , M B ⊂ R n , the Average minimal distance for image registration between A and B, parameterized by an invertible transformation T : R n → R n , with inverse T −1 , is defined as The distance d R AMD is based on full sampling, taking into account all points in the two sets which have non-zero weights, as long as they are transformed to points inside the mask associated with the other set.To reduce the computational cost of the distance and, in addition, to enable random iterative sampling, we propose an approximate version of d R AMD : Definition 5 (Subsampled average minimal distance for image registration).Given fuzzy set A on reference set X A , fuzzy set B on X B , weight functions w A : X A → R ≥0 and w B : B → R ≥0 , and crisp subsets (masks) M A , M B ⊂ R n , the Subsampled average minimal distance for image registration between A and B, parameterized by an invertible transformation T : R n → R n , with inverse T −1 , and crisp sets P A ⊆ X A and P B ⊆ X B , is defined as dR Inserting ( 11) or ( 13) as point-to-set distance in (20), and hence indirectly in ( 21) and ( 22), provides extensions of this family of distances to the α-cut-based distances, which we denote d αAMD and d ¯R αAMD .Normalization of the weights of the sampled points, introduced through Def. 1, renders the magnitude of the distance (and subsequently its derivatives) invariant to the size and aggregated weight of the sets or of the sampled subsets.Since the normalization is done separately from A to B and from B to A, both directions are weighted equally even if the total weights of the point subsets from the two sets are different.This normalization can simplify the process of choosing e.g.step-length of various optimization methods, and makes it more likely that default hyper-parameter values can be found and reused across different applications.

B. Registration
We propose to utilize symmetric distances d ¯αAMD and d ¯αAMD as cost functions in (3) to define concrete image registration methods.Inserting d ¯R αAMD into (3) we obtain For the case of subset sampling with sets P A and P B , registration is defined as (24) By selection of new random subsets P A and P B in each iteration, various stochastic gradient descent optimization methods can be realized.
To solve the optimization problems stated in ( 23) and ( 24) with efficient gradient-based optimization methods, the partial derivatives of the distance measures with respect to the transformation parameters of T are required.

C. Gradients
The derivative of ( 9), the crisp point-to-set distance measure d(T (x), S) (in n-dimensional space), with respect to parameters T i of the transformation T applied to a point x ∈ X, yielding y = T (x), can be written as The values ∂d ∂yj are the components (partial derivatives) of the gradient ∇d(y, S) of the point-to-set distance in point y ∈ Y ⊂ R n , and are not dependent on the transformation model.
The gradient of the fuzzy point-to-set distance measure ( 11) is given by the integral over α-cuts, of gradients of the (crisp) point-to-set distances:

D. Algorithms for Digital Images on Rectangular Grids
The distances and gradients can be computed efficiently for the special case of digital images on rectangular grids.For images quantized to ∈ N >0 non-zero discrete α-levels the integrals in (11) and ( 26) are suitably replaced by sums.The number of quantization levels is typically taken to be a small constant; a choice of = 7 non-zero equally spaced α-levels has previously shown to provide a good trade-off between performance, speed and noise-sensitivity [9], and we keep it for all experiments.
We need a discrete operator to approximate the gradient of d(x, S) for a set S defined on a rectangular grid with spacing s ∈ R n >0 .We propose to use the following difference operator providing a discrete approximation of ∇d(x, S) : where γ x is an indicator function, and u i is the unit-vector along the i-th dimension.The indicator function γ x causes the operator ∆[S](x) to be zero-valued for points included in S (i.e., where the distance transform is zero-valued).This prevents discretization issues along set boundaries, where the standard central difference operator yields non-zero gradients, which can cause the measure to overshoot a potential voxel-perfect overlap.
By creating tables for the distance and gradient sums for each image as a pre-processing step, using either of the procedures in Alg. 1 (∆αDT for inwards distances and ∆αDT BD for bidirectional distances), the distance and gradient may then be readily computed with Alg. 2. |T | denotes the number of parameters of the transformation, which is 6 for twodimensional (2D) affine transformations, and 12 for threedimensional (3D) affine transformations.
for i = 1 to do 4: Compute DT of α-cut.

5:
DT i ← min(DT i , d MAX ) 6: for all v ∈ X A do 7: end for 10: end for return D, G 12: end procedure for all v ∈ X A do 18: 6: end for 9: else return 0, 0, 0 Zero dist., grad., and weight. 12: end if 13: end procedure The procedures in Alg. 1 have linear run-time complexity O(( + 1) |X A |), achieved by using a linear-time algorithm for computing the distance transform (e.g.[24]) in line 4 of Alg. 1.The space complexity of the algorithm is O(( + 1) |X A |) and the D, G structures must remain in memory to enable fast lookup in Alg. 2. Figure 3 shows an example of the distance and gradient of a sample α-level.Alg. 2 computes the pointto-set distance and gradient w.r.t. the transformation using the pre-computed tables and has run-time complexity O(|T | n) thus being independent of and the size of A and B.
Output final distance value.9: T ← T Output final transformation estimate.
Algorithm 3 performs a complete registration given two images, their binary masks, weight functions, and an initial transformation.Algorithm 3 completes N full iterations, however other termination criteria may be beneficial (see Sec. IV).The registration consists of pre-processing, followed by a loop where the symmetric set distance and derivatives are computed and T is updated.∆T −1 ∆T , in line 6 of Alg. 3 denotes a matrix of partial derivatives of the parameters of the inverse transform T −1 w.r.t. the parameters of the forward transform T .This matrix relates the computed partial derivatives ∆d2 ∆T −1 with the parametrization of the forward transform.
The overall run-time complexity is Practical choices for N tend to be in the range [1000, 3000], depending on hyper-parameters (e.g.λ), and distance in parameter-space between starting position and the global optimum.The evaluation in Sec.V confirms empirically that convergence, according to (31) or (32), tends to be reached after 1000 to 3000 iterations, using an optimizer with a decaying λ.
The QUANTIZE procedure in Alg. 2 takes the membership of point v, µ A (v), and gives the index i of the minimal αlevel (α 1 , . . ., α ) for which µ A (v) ≥ α i .If the membership is below all α-levels, the index is 0. For equally spaced α-levels, the quantization can be expressed as The INTERPOLATE procedure in Alg. 2 computes the value of the discrete functions D and G in point v which may not be on the grid due to application of T .There are many interpolation schemes proposed in the literature, but  we suggest that either nearest neighbor (for maximal speed) or linear interpolation (for higher accuracy) are used here, since the distance and gradient fields are smooth.By linearity of integration and summation, nearest neighbor and linear interpolation may be performed on the pre-processed D and G and yield the same result as if each level was interpolated before integration, allowing efficient computation.The (discretized) measure does not require intensity interpolation; the interpolation operates on distances and gradients only.

IV. IMPLEMENTATION
We implemented the proposed distance measure and registration method in the open-source Insight Segmentation and Registration Toolkit (ITK) [12].We chose this particular software framework because it • enables the use of an existing optimization framework, • allows for a fair comparison against well written, tested, and widely used implementations of reference similarity measures, with support for resolution-pyramids, • supports anisotropic/scaled voxels in relevant algorithms, • facilitates reproducible evaluation, • makes the proposed measure easily accessible for others.The built-in ITK optimizer we have used for the registration tools and all the evaluation is RegularStepGradientDescen-tOptimizerv4.This is an optimizer based on gradient descent, with an initial step-length λ, and a relaxation factor which reduces the used step-length gradually as the direction of the gradient changes, in order to enable convergence with high accuracy.In addition to a maximum number of iterations N , two termination criteria are used: (i) a gradient magnitude threshold (GMT), and (ii) a minimum step-length (MSL), where r is the current relaxation coefficient.We use default values of 0.0001 for both of these criteria.A relaxation factor of 0.99 is used for all experiments, since it performed well in preliminary tests; in this study we are willing to trade some (potential) additional iterations for better robustness.To maximize utilization of the limited number of α-levels, images are normalized before registration to make sure that they are within the valid [0, 1] interval.We use the following robust normalization technique: Let P ρ (X) denote the ρ-th percentile of image X with respect to image intensities, NORM ρ (X) = max 0, min 1, (X − P ρ (X)) P 1−ρ (X) − P ρ (X) . (33)

V. PERFORMANCE ANALYSIS
We evaluate performance of the proposed method, both for 2D and 3D images, in two different scenarios; (i) we perform a statistical study on synthetically generated images, where we seek to recover known transformations and measure the registration error by comparing the ground truth locations of known landmarks with the corresponding registered ones; (ii) we apply the proposed framework to real image analysis tasks: landmark-based evaluation of registration of TEM images in 2D, and atlas-based segmentation evaluation of 3D MR images of brain.
To compare the proposed measure and registration method against the most commonly used alternative methods and similarity measures, we select the widely used ITK implementations of optimization framework and similarity measures (SSD, PCC and MI) as the baseline of intensity-based registration accuracy.Note that the PCC measure is denoted Normalized Cross Correlation (NCC) in the ITK framework.
All experiments are performed on a workstation with a 6core Intel i7-6800K, 3.4GHz processor with 48GB of RAM and 15MB cache.The operating system used is Ubuntu 16.04 LTS.The compiler used to build the framework is g++ version 5.4.0 (20160609).Version 4.9 of the ITK-framework is used for testing and evaluation.

A. Datasets
One biomedical 2D dataset and one medical 3D dataset are used for the evaluation.
1) TEM Images of Cilia (2D): The dataset of 20 images of cilia [10] is acquired with the MiniTEM4 imaging system.Each image is isotropic of size 129 × 129 pixels, with a pixelsize of a few nm.An example is shown in Fig. 1.A particular challenge is the near-rotational symmetry of the object: 9 pairs of rings are located around a central pair of rings, which gives 9 plausible solutions for a registration problem.The alignment of the central pair can be taken into special consideration to reduce the number of solutions to two.The dataset comes with a set of 20 landmarks per image, indicating the position of each of the relevant structures to be detected and analysed -20 rings (2 in the center and 18 in a circle around the center).The landmarks are produced by a domain expert and are only used for evaluation of the registrations.
2) LPBA40 (3D): LPBA40 [11] is a publicly available dataset of 40 3D images of brains of a diverse set of healthy individuals, acquired with MRI.The images are anisotropic, of size 256 × 124 × 256 voxels with voxel-size 0.86 × 1.5 × 0.86 mm 3 .The dataset comes with segmentations of the brains into 56 distinct regions marked by a medical expert, which are used in this study as ground-truth for evaluation.LPBA40 includes two atlases: first 20 out of 40 MR images of brain in the dataset are used to generate one brain atlas by Symmetric Groupwise Normalization (SyGN) [25]; another atlas is created analogously, from the last 20 brains in the dataset.The atlases contain both a synthesized MR image and the fused label category in all the voxels, as well as a whole brain mask which may be used for brain extraction.

B. Evaluation Criteria
We evaluate accuracy and robustness of the registration methods in presence of noise, their robustness w.r.t.change of roles of reference and floating image (symmetry), and their speed.We quantify the performance of the observed frameworks in terms of the following quality measures: 1) Average Error measure (AE): The registration result is quantified as the mean Euclidean distance between the sets of corresponding image corner landmarks L R and T (L F ) in the reference image space, after transformation of the floating image corner landmarks L F , where |L R | is the number of landmarks (4 in 2D; 8 in 3D).The quality measure is defined as A slight variation of this measure, the Average Minimal Error (AME), is used in the real task of cilia registration: For the central pair, the error is simply AME CP = AME, whereas for the outer rings we utilize the knowledge that an odd (even) landmark should be matched with an odd (even) landmark of the other image.The error function for the outer rings, [10], is therefore defined as: 2) Success Rate (SR): A registration is considered successful if its AE is below one voxel(pixel).Success rate (SR) at a given AE value corresponds to the ratio of successful registrations (w.r.t. the set of performed ones).
3) Symmetric Success Rate (SymSR): is defined as the ratio of performed registrations which are successful (i.e., AE ≤ 1) in both directions, i.e., when the roles of reference and floating image are exchanged.
4) Inverse Consistency Error (ICE), [26]: Given a set of interest X A ⊆ A, the transformations T AB : A → B, and T BA : B → A, the ICE of this pair of transformations is (37) We compute ICE considering all the points of the reference image for each of the cases where Symmetric Success is observed (AE ≤ 1 in both directions).
5) Jaccard Index for segmentation evaluation: For two binary sets, R 1 and R 2 , the Jaccard Index is defined as 6) Execution Time: We evaluate (i) the execution times required for one iteration in the registration procedure, i.e., times needed to compute the distance (similarity) measure and its derivatives, with full sampling, and in full image resolution, between two distinct images from the same set, as well as (ii) the execution time for complete registrations.

C. Parameter Tuning
The distance measure and optimization method have a number of parameters which must be properly chosen.Synthetic tests indicated that the following values lead to good optimization performance: three pyramid levels with downsampling factors (4, 2, 1) and Gaussian smoothing σ = (5.0,3.0, 0.0), max 3000 iterations per level and an initial step-length λ = 0.5.The number of α-levels used is = 7. Normalization percentile is normally 5%.This same parameter setting, if not stated differently, is used in all the tests, on both synthetic and real data.

D. Synthetic Tests
A synthetic evaluation framework is used to evaluate the performance of the proposed method, and to compare it with standard tools based on SSD, PCC, and MI, in a controlled environment.For this evaluation, we construct sets of transformed versions of a reference image and add (a new instance of) Gaussian noise to each generated image.The transformations are selected at random from a multivariate uniform distribution of rotations measured in degrees (1 angle for 2D images and 3 Euler angles for 3D images) and translations measured in fractions of the original image size.
1) 2D TEM Images of Cilia: Three sets of transformed images are built based on image Nr. 1 in the observed dataset, by applying on it the following three groups of transformations: Small, containing compositions of translations of up to 10% of image size (in any direction) and rotations by up to 10 • ; Medium, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Small, and falls within 10 − 20% of image size of translation (in at least one direction), or 10 − 20 • of rotation; and Large, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Medium, and falls within 20 − 30% of image size of translation (in at least one direction), or 20 − 30 • of rotation.The transformed images are also corrupted by additive Gaussian noise, from N (0, 0.1 2 ) (σ = 0.1, corresponding to a PSNR≈20 dB).Each group of transformations is applied 1000 times, and the resulting images are registered to image Nr. 1, each time corrupted by a new instance of Gaussian noise.
To evaluate symmetry, we performed 1000 registrations of images transformed by randomly selected translations of up to 30% of image size, and rotations by up to 30 • , and corrupted by additive noise from N (0, 0.1 2 ).Each of the registrations were performed twice, with exchanged roles of reference image and floating image.
Intensity-based registration with gradient-descent optimization can be computationally demanding, requiring the distance function and its derivative for each iteration of the optimization procedure.The time to compute the distance and derivatives is directly proportional to the number of sampled points.We, therefore, evaluate influence of the sampling fraction on registration success, observing registrations after Small transformations and added noise (with σ = 0.1), over a range of sampling fractions.For each evaluated sampling fraction, 1000 registrations are performed and SR and AE are computed for successful registrations (AE ≤ 1).No resolution pyramids are used for these tests.
2) 3D MR Images of Brain: Three sets of transformed images are built based on image Nr. 1 in the observed dataset, by applying to it the following three groups of transformations: Small, containing compositions of translations of up to 10% of image size (in any direction) and rotations by up to 10 • (around each of the rotation axes); Medium, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Small, and falls within 10 − 15% of image size of translation (in at least one direction), or 10 − 15 • of rotation (around at least one rotation axes); and Large, containing compositions of translations and rotations such that at least one of the parameters exceeds the range of Medium, and falls within 15 − 20% of image size of translation (in at least one direction), or 15 − 20 • of rotation (around at least one rotation axes).The transformed images are also corrupted by additive Gaussian noise, from N (0, 0.1 2 ).Each group of transformations is applied 200 times, and the resulting images are registered to image Nr. 1, each time corrupted by a new instance of Gaussian noise.

E. Results of Synthetic Tests
1) 2D TEM Images of Cilia: Figure 4 shows the distributions of registration errors (AE), for the three transformation classes.Superiority of the proposed measure, and the corresponding registration framework, is particularly clear for Medium and Large transformations; it reaches a 100% success rate, with subpixel accuracy, whereas the competitors not only exhibit considerably lower accuracy, but also much lower success rate, i.e., they completely fail in a large number of cases.
Overall registration performance is summarized in Table I, for complete sampling (a), and for random sampling of 10% of the points (b).The proposed method has 100% success rate and also 100% symmetric success rate.The other observed measures exhibit much lower success rate and poor symmetry scores; the second best, SSD, succeeds in 54% of the cases, and succeeds symmetrically in only 31% of the cases.The registration error for successful registrations is considerably smaller for the proposed method, while the execution time is considerably lower.The reduced sampling fraction in (b) has a small impact on the proposed method while substantially degrading the performance of the other measures.
Figure 5 shows registration performance for varying sampling fractions; Small transformations, in presence of noise (σ = 0.1) are considered.We observe that the registration performance flattens and stabilizes at approximately 0.01 sampling fraction (1% of the points).We conclude that previous findings of [15], [23], suggesting that random subsampling provides good performance even with very small sampling fractions, apply well for the proposed measure.
2) 3D MR Images of Brain: Figure 6 shows the observed distributions of registration errors (AE) for the three transformation classes, and clearly confirms that the proposed method is robust and with high performance, even for larger transformations, while the magnitude of the transformation has a substantial negative effect on the performance of the other observed measures.
Figure 7 presents bar plots corresponding to the performed synthetic tests on the LPBA40-dataset, consisting of 200 registrations of images after up to (and including) Large transformations (with additive Gaussian noise, N (0, 0.1 2 )).Successful registrations (AE ≤ 1) are observed.Here as well, the proposed method delivers 100% success rate, whereas the second best, SSD, succeeds in only 33% of the cases.The registration error for successful registrations is the smallest for the proposed method.We observe a relative increase in execution time of the proposed registration framework in 3D case, where it is slightly slower than the other measures.
3) Execution Time Analysis: The number of iterations required for convergence of the optimization (registration) typically range from 1000 to 3000.Measures SSD, PCC and MI use cubic spline interpolation.Lookups from the distance maps for d ¯R αAMD are done using linear interpolation.Table II shows the mean (and standard deviation) execution time of one iteration, which includes computation of the measures and their derivatives, repeated 1000 times for 2D, and 50 times for 3D affine image registrations.We observe that the proposed measure is the fastest per iteration both in 2D and 3D.Note that these execution time measurements exclude preprocessing.

F. Evaluation on Real Applications
1) Registration of Cilia: Registration of multiple cilia instances detected in a single TEM sample, for enhancement of diagnostically relevant sub-structures, requires a pixel-accurate and robust method which is able to overcome the challenges posed by the near-rotational symmetry of a cilium.At most two of the possible solutions properly align the central pair, which is vital for a successful reconstruction.
We compare the performance of the proposed method with reported results of a previous study [10] which uses intensitybased registration with PCC as similarity measure.We follow the general protocol described in [10] and perform, as a first step, a multi-start rigid registration (parameterized by angle θ in radians, and translation t = (t x , t y )), followed, in a second step, by affine registration initiated by the best (lowest final distance) registration of the 9 rigid ones.
No resolution pyramids are used since they were observed to interfere with the multi-start approach (by facilitating large movements).The registrations are performed in full resolution, without stochastic subsampling.For the rigid registration we use a small circular binary mask with radius of 24 pixels, positioned in the center, combined with a squared circular Hann window function.The affine registration is performed using a circular binary mask with radius of 52 pixels; the mask removes the outside background and the outer plasma which is not helpful in guiding the registration.No additional weight-mask is used for the affine registration.
Step length 0.1 was used for the rigid and 0.5 for the affine registration.We use = 7. Normalization percentile is set to 0% for the rigid stage and 1% for the affine stage.
2) Atlas-based Segmentation (LPBA40): In [27], a protocol for evaluation of distance/similarity measures in the context of image registration was proposed.The protocol starts with affine registration, for which results are reported, and then proceeds to deformable registration.Since this study focuses on the development of an affine (linear) registration framework based on the proposed distance measure, we compare with the reported affine-only performance; an improved affine registration is of great significance since a very high correlation between the performance of the affine registration and that of the subsequent deformable registration has been established.
We start from the two atlases created utilizing the Advanced Neuroimaging Tools (ANTs) registration software suite and the open-source evaluation script provided in the reference study [27].We utilize the atlas created using Mutual Information since that is the one found in [27] to be best performing and is used as the basis for the whole deformable registration study.Two-fold cross validation is utilized; the first atlas is registered to the last 20 brain images and the second atlas is registered to the first 20 brains, hence all registrations are done with brains that did not contribute to the creation of the atlas.
The multi-label segmentations defined by the atlas are transformed using the transformation parameters found during the registration and compared to the ground-truth segmentations for each brain.The Jaccard Index [28] is calculated per region, as well as for the entire brain mask.
For the proposed method based on d ¯R αAMD we use = 7, normalization percentiles 5%, N = 3000, 0.05 sampling fraction, and circular Hann windows as weight-masks.

G. Results of Real Applications
1) Results of Registration of Cilia: Performance of the proposed method, together with the best previously published results, are shown in Tab.III.The table shows the mean and standard deviation of registration error (AME, in pixels) of the 19 registrations, for the three considered sets of landmarks: the Central pair, the Outer rings, and All (1+9) ring pairs.The original study includes deformable registration as a final stage, after the rigid and affine steps.Here presented framework based on d ¯R αAMD includes linear (rigid and affine), but not deformable registration.However, as results included in Tab.III confirm, the proposed method outperforms the previous state-of-the-art, even if using only rigid and affine registrations.
We note that with only rigid registration we improve the alignment of the central pair while degrading the alignment of the outer rings.After the affine registration, the alignment of the central pair is improved further, plausibly due to the less constrained transformation model of affine compared to rigid, and we observe that the alignment of the outer rings and the total alignment are improved substantially.
2) Results of Atlas-Based Segmentation of Brains: Table IV shows results of atlas-based brain segmentation.The Mean Jaccard Index computed for each of the brain regions, for d ¯R αAMD and MI, with affine registration as reported in [27].For d ¯R αAMD , mean and std.dev.are displayed; for the comparative results (MIAff, [27]), only mean was reported.
We observe that for the whole brain mask, for the aggregated overlap, and for 43 out of the 56 distinct regions, the proposed measure outperforms the reported performance obtained with the MI metric; MI was the best performing measure out of the three evaluated in [27].

VI. DISCUSSION
Compared to the traditional similarity measures (SSD, PCC, MI), the proposed measure and associated registration method require substantial amounts of memory to store the auxiliary data-structures.A single 3D registration of two MR images of brains may require approximately 4GB of working memory with a reasonable set of parameters; contemporary machines for high-end data processing typically have a lot more memory than 4GB, but this requirement can affect how many registrations can be performed in parallel on a single machine.

VII. CONCLUSION
In this study we have adapted a family of distance measures [9] to gradient descent based image registration, for 2D and 3D images.We have shown that such an extension is feasible and that the very good performance of the measures observed previously for object recognition and template matching, and their property of a large catchment basin for local optimization, also hold in the context of registration.This has been shown by evaluating the method in four main ways: (i) on synthetic tests,

LPBA40 Label d ¯R αAMD
MIAff [27] brain images.The performance observed in this study shows that the proposed method provides outstanding performance for intensity-based affine registration, in terms of robustness, accuracy and symmetry.It is also faster or similar in speed to the commonly used measures, which allows its practical applications.Future work includes extending the measures to non-linear (deformable), as well as multi-modal registration.

Fig. 1 .
Fig.1.Illustrative example of a biomedical registration task where a widely used feature-based (FB) method (SIFT, as implemented in FIJI-plugin Linear Stack Alignment 2 ) fails, while the proposed intensity-based (IB) method (Sec.V-F1) performs well.Green points in (a) and (b) are incorrectly detected as having a match, and red points do not have a match.The feature-extractor fails to detect points corresponding to the relevant structures (one approximately correct match, indicated with arrows, can be found manually), and both the central rings and the outer rings are misaligned.
(a) A with weights w A (radius).(b) B with mask M B (surrounding rectangle).(c) A mapped into the space of B. Triangles mark points mapped outside M B , and thus discarded.(d) Distance contribution graph for d ¯α(T (A(x)), B); where x is central point of A.
Inwards part of d ¯α: d α (A(x), B) point-to-set distance.

Fig. 2 .
Fig. 2. Illustration of the Asymmetric average minimal distance for image registration.(a) Source set A (radius represents associated weight; gray-level represents membership).(b) Target set B with associated mask M B .(c) The transformed (by rotation and translation) A on top of B. (d) Illustration of the contributions to the point-to-set distance d ¯α by the central point of A. Thickness of lines show the α-integrated height contributed by each point in B. (e-f) The inwards and complement parts of d ¯α visualized as 1D graphs, where the x-axis is the Euclidean distance (in 2D) from the mid-point and the points at the left and right side (of the origin) respectively are the points on the left and right side of the mid-point T (A(x)) in (d).

Algorithm 2
Point-to-Set Distance and its Gradient w.r.t.T Input: Fuzzy point A(v) and associated weight w A (v), fuzzy set B and associated (binary) mask M B , distances D[0 . . .] and gradients G[0 . . .]; transformation T .Output: Point-to-set distance d, derivatives ∆d ∆T , weight w.

Algorithm 3
Symmetric Registration Input: Fuzzy sets A, B, with binary masks M A , M B , and weight functions w A , w B , initial transformation guess T .No. of α-levels , distance saturation d MAX , step-lengths λ[1 . . .N ], and iteration count N are hyper-parameters.Output: Symmetric set distance d (d ¯R αAMD ), estimated transformation T .1:

Fig. 4 .
Fig. 4. Registration error for 2D TEM images of cilia with Gaussian noise of σ = 0.1 added, for three observed transformation classes.(a-d) Examples of reference-floating image pair with corresponding masks.(e-g) Cumulative histograms of the fraction of registrations with registration error AE below a given value (left and up is better).The red vertical line shows the chosen threshold for success, AE ≤ 1.

Fig. 5 .
Fig. 5. (Left/Blue) SR for registrations of cilia images, and (Right/Red) AE of the successful registrations, as functions of sampling fraction for the proposed method.Both measures improve (almost) monotonically with sampling fraction and flatten out after approximately 0.01.

Fig. 6 .Fig. 7 .
Fig. 6.Registration error for 3D MR images of brain with Gaussian noise, σ = 0.1, added, for three observed transformation magnitudes classes.(a-f) Example of reference-floating image pair slices along each major axis.(g-i) Cumulative histograms of the fraction of registrations with registration error AE below a given value (left and up is better).The red vertical line shows the chosen threshold of success, AE ≤ 1.

TABLE I REGISTRATION
OF SYNTHETIC 2D IMAGES OF CILIA.THE TABLES SHOW SUCCESS RATE (SR), AVERAGE ERROR (AE) OF SUCCESSFUL REGISTRATIONS, SYMMETRIC SUCCESS RATE (SYMSR), AVERAGE INVERSE CONSISTENCY ERROR (ICE) AND AVERAGE RUNTIME FOR THE REGISTRATION WITH COMPLETE SAMPLING (A) AND WITH RANDOM SUBSAMPLING (B).SUCCESSFUL REGISTRATIONS (AE ≤ 1) OF TRANSFORMATIONS UP TO (AND INCLUDING) LARGE, ARE CONSIDERED.

TABLE II TIME
ANALYSIS OF DISTANCE (SIMILARITY) VALUE AND DERIVATIVE COMPUTATIONS FOR A FULL RESOLUTION IMAGE, REPEATED TO GENERATE STATISTICS.BOLD MARKS THE FASTEST MEASURE IN EACH CATEGORY (2D AND 3D).THE 2D IMAGES ARE OF SIZE 1600 × 1278, AND THE 3D IMAGES ARE OF SIZE 256 × 124 × 256.

TABLE III REGISTRATION
OF CILIA: PERFORMANCE OF THE PROPOSED METHOD COMPARED TO REFERENCE RESULTS, SHOWN AS THE 'MEAN (STD-DEV)' OF THE REGISTRATION ERROR (IN PIXELS) W.R.T. THE CONSIDERED SETS OF LANDMARKS FOR THE 19 REGISTRATIONS.'R' DENOTES RIGID, 'A' DENOTES AFFINE AND 'D' DENOTES DEFORMABLE REGISTRATION.BOLD MARKS THE SMALLEST ERROR FOR EACH SET OF LANDMARKS.

TABLE IV RESULTS
OF ATLAS-BASED BRAIN SEGMENTATION.THETABLE SHOWS THE MEAN JACCARD INDEX FOR EACH OF THE BRAIN REGIONS FOR d ¯R αAMD AND MUTUAL INFORMATION WITH AFFINE REGISTRATION AS REPORTED IN [27].FOR d ¯R αAMD , MEAN AND STD.DEV.ARE DISPLAYED; FOR THE COMPARATIVE RESULTS (MIAFF), ONLY MEAN WAS REPORTED.