Image Deformation Estimation via Multi-Objective Optimization

The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image. However, due to a large number of parameters, it is challenging to fit the free-form deformation model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape. In this paper, we cast the registration task as a multi-objective optimization problem (MOP) according to the fact that regions affected by each control point overlap with each other. Specifically, by partitioning the template image into several regions and measuring the similarity of each region independently, multiple objectives are built and deformation estimation can thus be realized by solving the MOP with off-the-shelf multi-objective evolutionary algorithms (MOEAs). In addition, a coarse-to-fine strategy is realized by image pyramid combined with control point mesh subdivision. Specifically, the optimized candidate solutions of the current image level are inherited by the next level, which increases the ability to deal with large deformation. Also, a post-processing procedure is proposed to generate a single output utilizing the Pareto optimal solutions. Comparative experiments on both synthetic and real-world images show the effectiveness and usefulness of our deformation estimation method.


Image Deformation Estimation via Multiobjective Optimization Takumi Nakane, Haoran Xie, and Chao Zhang
Abstract-The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image. However, due to a large number of parameters, it is challenging to fit the free-form deformation model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape. In this paper, we cast the registration task as a multi-objective optimization problem (MOP) according to the fact that regions affected by each control point overlap with each other. Specifically, by partitioning the template image into several regions and measuring the similarity of each region independently, multiple objectives are built and deformation estimation can thus be realized by solving the MOP with off-the-shelf multi-objective evolutionary algorithms (MOEAs). In addition, a coarse-to-fine strategy is realized by image pyramid combined with control point mesh subdivision. Specifically, the optimized candidate solutions of the current image level are inherited by the next level, which increases the ability to deal with large deformation. Also, a postprocessing procedure is proposed to generate a single output utilizing the Pareto optimal solutions. Comparative experiments on both synthetic and real-world images show the effectiveness and usefulness of our deformation estimation method.
Index Terms-Genetic algorithm, image deformation estimation, image registration, multi-objective evolutionary algorithm.

I. I
Estimation of the deformation parameters of a target (either objects or texture) is a fundamental technique mainly for computer vision applications such as registration [1]- [3] and tracking [4]- [6]. If the geometric deformation model is constrained to only rotation and translation then the deformation is rigid. Affine or projective transformations can express more complex deformation, while practical applications, such as medical image analysis [7], [8] and morphing [9], [10], often involve non-rigid deformation with more degrees of freedom. Furthermore, the deformation between two images can be global, local, and even space-invariant, which makes the problem more challenging because the movement vector of each pixel is required to be estimated independently while preserving the smoothness. This technique can estimate the deformation parameters by deforming a template image such This work was supported in part by the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP20K19568, and in part by the Support Center for Advanced Telecommunications Technology Research (SCAT) Grant.
Takumi Nakane is with the Department of Engineering, the University of Fukui, Fukui 910-8507, Japan.
Haoran Xie is with the Japan Advanced Institute of Science and Technology, Nomi 923-1211, Japan.
Chao Zhang is with the Department of Engineering, the University of Fukui, Fukui 910-8507, Japan (e-mail: zhang@u-fukui.ac.jp). that the similarity between the deformed template and a target image is maximized. The procedure of similarity maximization can be cast as an optimization problem by treating deformation parameters as decision variables and the similarity as the objective function.
As to the geometric deformation model, free-form deformation (FFD) [11] models the deformation by manipulating control points arranged in a regular lattice over the target. Each pixel moves based on weights by basis functions and displacements of surrounding control points. B-spline basis functions are generally used to weigh displacements [12]. The larger the number of control points is, the more finely the deformation can be modeled. In FFD, the influence of a control point is limited to neighbor pixels, which brings benefits with respect to ability in modeling and computational cost [13]. Due to these characteristics, FFD is allowed to model highly free and subtle deformation. However, optimization with FFD's parameters is challenging since an optimizer needs to treat the displacements of all control points as decision variables, and it is obvious that the expressive power of the deformation model is proportional to the number of parameters. Moreover, because each control point can affect multiple regions, an improvement of similarity in one region may negatively make similarity in other regions worse, i.e., there exist conflicts between regions.
To alleviate the above issues, we introduce a new idea to estimate the FFD parameters by casting the deformation estimation problem as a multi-objective optimization problem (MOP), which can be effectively solved by multi-objective evolutionary algorithms (MOEAs). The overview of our algorithm is shown in Fig. 1. A template is spatially divided into several groups and the similarity measure over each group is treated as a single-objective function and independently computed. Each group consists of patches, and the pixels in each patch are affected by the same control points. A MOP requires simultaneous optimization of two or more objectives that conflict with each other. In our problem setting, we aim to find Pareto optimal solutions given none of the groups can be improved without degrading some of the other groups, which can be solved by certain off-the-shelf MOEAs. In addition, we adopt a coarse-to-fine strategy using image pyramids to improve the estimation capability, especially for large deformations. Specifically, the optimization starts at the top of the pyramid (i.e., the lowest resolution image) and is executed at each level of the pyramid. The number of control points is gradually increased as the resolution increases, and the interpolation of control points is realized by mesh subdivision to allow fine-grained deformation. Also, a post-processing method is  proposed to integrate Pareto optimal solutions into a single output as the decision-making procedure. For each group in Fig. 1, the group-wise deformation parameters with the highest group-wise similarity are adopted. These group-wise deformation parameters are aggregated into a final solution. We perform comparative experiments using both synthetic and real-world data to show the effectiveness and usefulness of our method. In conclusion, our contributions are threefold.
• The deformation estimation problem is cast as a MOP by spatially dividing an image into multiple groups accompanied by independent similarity measures. • The estimation capability is improved by a coarse-to-fine strategy, which is realized by building image pyramids and conducting mesh subdivisions at each level. • A post-processing method is proposed to integrate Pareto optimal solutions into a final output. The rest of this paper is organized as follows. We present related works of deformation estimation and MOEAs in Sec. II followed by a brief review of FFD model in Sec. III. The overview of three off-the-shelf genetic algorithms (GAs) are given in Sec. IV. In Sec. V, we describe the details of the spatial multi-objective problem and the coarse-to-fine optimization strategy. The experimental results are shown in Sec. VI. Conclusion is given in Sec. VII.

A. Deformation Estimation Between Two Images
Many methods have been proposed to deal with deformable surfaces, which can be roughly categorized as feature-based methods and pixel-based methods. The former category estimates deformation parameters by using feature correspondences commonly extracted from two images. The latter cat-egory maximizes the similarity calculated using dense pixels directly. Also, hybrid methods have been studied to incorporate the advantages of both approaches [14]- [16].
Feature-based methods [5], [17]- [20] estimated deformation parameters based on the correspondences between feature points between the template image and the target image. The accuracy largely depends on the quality of the correspondences. Therefore, the elimination of outliers from the extracted feature set is an essential process. However, the large number of parameters in free-form deformation makes it difficult to apply standard methods such as RANSAC [19]. Other limitations are: 1) In the case of feature-less images, feature points are hard to be detected. Without inlier correspondences, the parameters cannot be appropriately estimated. Especially in the case of non-rigid transformations, more corresponding points are required [21]. 2) Local features such as SIFT [22] and ASIFT [23] are susceptible to complex transformation, which may largely degrade the confidence of correspondences when complex transformation occurs.
The purpose of pixel-based methods [1], [4], [21], [24]- [28] is to solve the minimization problem of the cost function consisting of a data term and some restrictions such as smoothness term calculated from pixel intensities. The data term is usually defined as the sum of the intensity differences between the pixels of the template image and the corresponding pixels in the deformed target image. Such methods are less dependent on image features compared to feature-based methods. In addition, the capability of dealing with self-occlusion is a notable point. Since only a few features typically exist near the self-occlusion boundary, the pixel-based methods are more reasonable in such cases [1], [15]. In [1], a penalty term called shrinker is incorporated into the cost function. The shrinker term acts to shrink the displacement in order to make self-occluded areas disappear. [15] employed a pixelbased approach to refine the deformation parameters given by a proposed feature-based method. When self-occlusion or strong deformations are involved, the hybrid method shows better results than only using the feature-based method. There also exist researches to maximize the similarity under the framework of evolutionary computation with a single-objective [27], [28]. There also exist methods achieving the minimization of the cost function by employing non-linear least squares solvers, such as the Gauss-Newton algorithm [1], [15], [21], the Levenberg-Marquardt algorithm [24], [26], and the learningbased methods [4]. To the best of our knowledge, exploiting evolutionary algorithms [29] or multi-objective optimization approaches [30], [31] to deal with deformable surfaces have been sparsely treated so far. Our previous work appearing in GECCO2019 addressed this problem by using a modified single-objective GA [32]. Different from the previous work, in this paper, we attempt to adopt evolutionary algorithms for solving this problem by casting it as a multi-objective optimization problem.

B. Multiobjective Evolutionary Algorithms (MOEAs)
EAs are optimization algorithms inspired by Darwin's evolutionary theory, such as the GA, evolutionary strategy, and evolutionary programming. These algorithms share a common framework in which many candidate solutions are simultaneously dealt with and stochastic operations are iteratively applied. Because of the powerful exploration capability, EAs have been applied in a variety of computer vision tasks. Interested readers can also refer to the survey [33]. EAs are also effective tools for solving MOPs. The population-based search procedure provides the advantage of finding the Pareto optimal solutions in a single run. MOEAs use dominance relation to rank solutions in an objective space consisting of conflicting objectives. In particular, representative MOEAs, such as non-dominated sorting GA-II (NSGA-II) [34], strength Pareto EA2 (SPEA2) [35], and Pareto enveloped based selection algorithm-II (PEAS-II) [36], include a mechanism that preserves non-dominated solutions in every generation, called elitism, and hence these algorithms can outperform non-elitist MOEAs by preventing the loss of good solutions [37]. Since the goal of MOEAs is to provide solutions that are widely distributed on the Pareto front, MOEAs are also required to maintain the diversity of solutions. In NSGA-II, a crowding distance was proposed which is the sum of the distances between the two nearest solutions for each objective. SPEA2 used the inverse of the distance to the k-th nearest solution as the density. PEAS-II divided the objective space into several hyperboxes and counted the number of solutions within them. The density was assigned to each hyperbox as the number of solutions contained. On the other hand, MOEAs are less effective for problems with four or more objectives, i.e., manyobjective optimization problems (MaOPs). The main reason is that as the number of objectives increases, the condition of dominance becomes more complex. More objectives lead to a greater proportion of non-dominated solutions, and hence the ability of convergence toward the Pareto front decreases [38]. There are several strategies to adapt MOEAs to MaOPs [39], such as dimensionality reduction [40], [41] and use of indicators [42], [43]. Among them, one representative strategy is via decomposition, e.g., MOEA based on decomposition (MOEA/D) [44], where a MaOP was decomposed into singleobjective sub-problems using weighting vectors, and referencepoint based many-objective NSGA-II (NSGA-III) [45], where an objective space was divided by reference vectors. There also exist various improved versions of MOEA/D [46], [47] and NSGA-III [48], [49]. Combination of both merits is also shown in [50].

III. D M
Deformation estimation is achieved by registering the template image to the target image . In order to deform , we employ FFD combined with cubic B-splines using control point meshes. For of × pixels, control points are arranged on the × lattice with horizontal spacing = /( −3) and vertical spacing = /( − 3) (i.e., the outmost control points are outside the region of ), as illustrated in Fig. 2. Each control point is assigned a displacement vector representing the distance and direction from the initial position, and the movement of a certain coordinate = ( , ) on is determined by surrounding control points, which is defined as: where = / , = / , = / − , = / − , and , are the cubic B-spline basis functions [51]. Then, the pixel coordinate on corresponding to is given by the transformation function : Since the transformation using Eq. 2 is a forward warping procedure and consists of real numbers, there is a problem that rounding operation is necessary to obtain pixel intensities. Such a process results in a large number of "holes" in the deformed template image, as illustrated in Fig. 3a and Fig. 3b. An alternative is to employ backward warping that corresponding to is computed using the inverse transformation −1 and interpolation scheme can be used to obtain the pixel intensity. According to [52], −1 can be defined using an approximation: The deformation obtained in Eq. 3 and its difference from Eq. 2 are illustrated in Fig. 3c and Fig. 3d, respectively. In conclusion, Eq. 1 is employed for modeling geometric deformation and Eq. 3 is employed as the image transformation for registration in this paper.
To further explain how we use the multi-objective optimizer in Fig. 1 to solve the optimization problem, we provide a short review on GAs in this section. GA is one of the leading algorithms in nature-inspired optimization methods. For a population consisting of a number of candidate solutions, the GA gradually optimizes the by iteratively applying genetic operators. One of the unique features of the GA is genotype representations for candidate solutions. Each candidate solution, called individual , is encoded into an internal representation, such as a bit string or a real-valued vector, in order to apply genetic operators. The genotype representations can allow genetic operators to adapt to different problems flexibly.
The genetic operators mainly consist of parents selection, crossover and mutation. The procedure of the simple GA [53] is briefly listed as follows: Step 1:Set = 0 and generate individuals in randomly.
Step 2:Select individuals as parents from with parents selection operator.
Step 3:Generate an offspring population from parents by crossover and mutation operators.
Step 4:Evaluate all individuals in .
Step 6:Set = +1 and return to Step 2 until the termination criterion is satisfied.
Parent selection is typically implemented as probabilistic selection biased by evaluation values (i.e., individuals with better evaluation values are assigned higher probabilities.) The crossover operation generates offspring through the genetic recombination of multiple parents. The mutation operation randomly changes genes of offspring with low probability.
Step 5 is also known as survivors selection, that is, all individuals in and compete in order to become members of +1 according to their evaluation values. The simple GA directly adopts as +1 . Besides, the elitism strategy, which preserves the best individual in the pool = ∪ is often adopted.
NSGA-II [34] is a representative GA-based algorithm for solving MOPs. The key idea of the NSGA-II is to introduce the selection criterion using two sorting approaches. The first one, called fast non-dominated sorting, iteratively extracts a non-dominated set from the population and assigns a rank to each according to the order in which they are extracted. Another one, called crowding distance sorting, determines the priority in by the crowding distance which represents the density of neighboring individuals in the solution space. At last, individuals are selected from consisting of 1 to −1 , where | | ≤ < | | + | |. Insufficient individuals are taken from according to the crowding distance. Fast non-dominated sorting promotes convergence to the Pareto front, while crowding distance sorting maintains diversity on the Pareto front. Moreover, elitism is ensured by using in survivors selection.
NSGA-III [45] is a variant of NSGA-II which focuses on solving MaOPs. Instead of crowding distance sorting, reference lines connecting the origin with reference points evenly distributed on the evaluation value space are used to maintain the diversity of the population on the Pareto front. Each is associated with the closest reference line in the perpendicular distance. After is determined by fast non-dominated sorting in survivors selection, the number of individuals in associated with each reference line, called niche count, is logged. NSGA-III then iteratively selects the individual in which is associated with the reference line of the lowest niche count. Reference points can relieve the algorithm in adaptively maintaining population diversity. In addition, users can obtain only a part of the Pareto front as required by manually distributing reference points. Fig. 4. Overview of the optimization procedure.

V. S M O
As described in Sec. III, the deformation of the template image is determined by the displacement of each control point. Therefore, the purpose of the optimization procedure is to calculate the displacements where the deformed template image matches the target in the target image most. The principal contribution of this work is to cast this task as a MOP by spatially partitioning the template into groups. Each group is assigned a single-objective function of the similarity measure. The overview of the optimization procedure is shown in Fig. 4. The procedure starts from building pyramids for both the template image and the target image, then the optimization is performed with the pyramids in a coarse-to-fine scheme.
The key advantage of this framework is that the population optimized at each level can be inherited as the initial population of the next level. To ensure the consistency of parameter inheritance from low-resolution level to high-resolution level, a subdivision method considering control point mesh is employed to achieve a natural interpolation of additional displacements, which allows the optimized population to be directly inherited.
Final candidate solutions are obtained by optimizing the population at the bottom of the pyramid (i.e., the image in the original resolution). The subsequent effort lies in how to determine a final solution as output from multiple solutions on the Pareto front. In addition to a direct selection approach, a post-processing approach using multiple candidate solutions is also proposed.

A. Optimizing Deformation Parameters via MOEAs
Three GAs described in Sec. IV are employed and compared to optimize the displacements in the experiment. Since each control point can move within the plane, the total number of decision variables is 2 . To represent these variables as genes, we use real-valued coding because each displacement is a 2D vector of real values. For clarity, we denote the displacement of a single point by , = ( , , , , , ) , and an individual is represented by the vector concatenating the displacements of all the points as follows, can directly represent a candidate control point mesh. The initial population is iteratively optimized by genetic operators. As to the evaluation of each , for simplicity, a singleobjective function is firstly introduced, which is combined with the simple GA and compared to multi-objective GAs in the experiment. In the objective function, mean absolute difference with respect to intensities is used as the similarity measure. As introduced in Sec. III, we use backward warping to find the correspondences for the calculation of objective function, hence sampling is performed on the target image. Let Ω and Ω denote the entire region of the template image and the target image , respectively, and we can further define the sampling region ⊆ Ω as: The single-objective function for the simple GA is given by: Based on Eq. 6, spatial multi-objective functions can be easily defined. We first define patch as a region consisting of pixels affected by the shared 4 × 4 control points, i.e., Ω consists of ( − 3) × ( − 3) patches. The group partitioning is achieved by dividing all the patches into groups, where is the number of objective functions. We denote the region of -th group as ⊆ Ω. One objective function is assigned over each group, which can be written by modifying Eq. 5 and Eq. 6: where = 1, 2, . . . , . Therefore, multi-objective GAs evaluate individuals based on the following vector function: Because each control point affects multiple patches, their similarity functions can "conflict" with each other, which is also considered in the case of groups consisting of multiple patches. This is an important motivation for adopting MOEAs because a single-objective optimizer can hardly solve such a conflicting problem efficiently [34]. The number of regions is adjustable as a hyper-parameter (increasing makes optimization significantly more difficult). Pareto optimal solutions are supposed to obtain more appropriate solutions than optimizing a single-objective.

B. Coarse-to-Fine Strategy
An iterative framework using image pyramids can be employed to alleviate the difficulty of deformation estimation with large displacements [1], [26]. Estimation starts from the lowest resolution image for a rough estimation, and more accurate estimations are achieved as the image resolution increases. Specifically, we adopt Gaussian pyramid which iteratively generates low resolution images through Gaussian smoothing. The assignment of pyramid level indices follows the order in which estimation is performed (i.e., the first and the -th level images are with the lowest and the highest resolution, respectively). Both the image width and height are halved and hence the -th level image has 1/4 resolution of the ( + 1)-th level image.
With the increase of image resolution, it is also necessary to increase the resolution of the control point mesh to inherit deformation parameters between different levels. To this end, for a certain level, interpolating new control points without destroying the mesh configuration of the previous level is required. In this work, we adopt the Catmull-Clark subdivision [54] for the mesh subdivision. The purpose of subdivision is to update the × control point mesh with respect to each individual from the optimized to +1 × +1 mesh, where +1 = 2 −3 and +1 = 2 −3 (i.e., and are fixed for all levels). The Catmull-Clark subdivision algorithm generates a subdivided mesh by inserting new control points and updating the existing control points. As illustrated in Fig. 5, the points on the subdivided mesh can be classified into three types: face points, edge points, and vertex points. A face point is inserted to a patch. Assuming that the vertices of a patch are 1 , 2 , 3 , and 4 , the face point +1 is computed as their centroid, An edge point is inserted to the edge shared by two patches.
Assuming that two face points are +1 1 and +1 2 , and two endpoints of the edge are 1 and 2 , the edge point +1 is computed as follow: A vertex point is the updated point of a vertex shared by the four patches. Denoting that the average of the four face points is¯, and the average of the midpoints of the four edges which share as one of the endpoints is¯, the vertex point +1 is computed as follows: Note that the edge points and vertex points outside the dashed rectangle are not calculated, as surrounding control points are needed for calculation. After interpolation of control points, all the displacements are doubled to fit the increase of resolution in the image pyramid. An example of the subdivision process is shown in Fig. 6. It can be observed that the subdivided mesh in Fig. 6b can maintain the shape of the previous mesh in Fig. 6a well. Therefore, this subdivision step is useful for providing a good initialization for optimizing the image of the next level in the pyramid.

C. Decision of the Final Output
We introduce a post-processing procedure to decide the final single solution as the output from the optimized population. A natural idea is to define the best solution as the individual with the smallest sum of the objective function values (i.e., maximum similarity). However, in MOEAs, such an approach can lose most of the valuable information of the Pareto optimal solutions. We propose a post-processed solution by exploiting these sub-optimal solutions, as illustrated in Fig. 7. For each group, the solution with the smallest value of the corresponding objective function is extracted, which provides the control points that only affect the corresponding group. The final output is created by aggregating control points provided from all the groups. For shared control points, the average of their displacements is computed.

VI. E R
The effectiveness and usefulness of solving the deformation estimation problem with the multi-objective scheme are verified using both synthetic data (Sec. VI-A) and real-world images (Sec. VI-B). We compared the following four settings: the simple GA with a single-objective, NSGA-II and NSGA-III with two-objectives respectively, and NSGA-III with fourobjectives. In the two-objective setting, patches are divided equally and vertically into two groups (e.g., for a 7 × 7 lattice including 4 × 4 patches, each group consists of 2 × 4 patches). Similarly, patches are divided vertically and horizontally into four groups in the four-objective setting (e.g., each group consists of 2 × 2 patches for a 7 × 7 lattice). The number of pyramid levels is fixed to three in all experiments. To reduce the computational cost, pixel sampling is performed for individual evaluation by scanning the target image at five pixel intervals. For the implementation of three GAs, the Platypus package is used, which is an evolutionary computation framework in Python which includes many MOEAs. To ensure the correctness and fairness of the experiment, we only manually set several essential parameters and fix other parameters following the default setting throughout the experiment. Specifically, the number of evaluations is set to 10000 and the number of reference points of NSGA-III is set to 100 for the two-objective setting and 120 for the four-objective setting. For fair comparisons, all experiments are executed five times with different random seeds considering probabilistic operations. The initial population 1 0 for all settings is kept the same with respect to each random seed. https://github.com/Project-Platypus/Platypus

A. Comparison with Synthetic Data
To only focus on verifying the estimation ability rather than robustness against noises, we prepare five images in 400 × 400 pixels for generating template images and deformed target images, as shown in Fig. 8. The corresponding template images are obtained by cropping 160 × 160 pixels regions from the center of each image. Eight types of deformations are used for generating a deformed target image, the parameters are: • Deformation (2 types): an image is deformed into a wavy shape by moving the control points according to a sine curve. In addition to vertical-only displacements, a combination of both vertical and horizontal displacements is used.   Two  Two  Four  variable  objective  objective  objective  objective  objective  objective  range  NSGA-II  NSGA-III  NSGA-III  NSGA-II  NSGA-III  NSGA- As a result, there are 40 (i.e., 5 images × 8 types of parameter settings) types of deformed target images in total. These images are generated by using backward warping, and hence the ground truth of displacements can be obtained. In this section, we evaluate each result based on not only the root mean square error (RMSE) but also the mean Euclidean distance error (MEDE). RMSE is calculated from all the pixels between the deformed template image and the sampling region . MEDE is calculated based on all the ground truth displacements.
1) Results of Vertical Wavy Images: Comprehensive numerical results of the best solutions with respect to the vertical  wavy images are summarized in Table I. For each combination of image setting and algorithm, the minimum, maximum, and average values based on five random trials are investigated. As can be observed by comparing the four algorithms in terms of RMSE and MEDE, it is clear that the two-objective algorithms can achieve better results in most cases. Examples of visual results are shown in Fig 9, from which we can observe that these methods can estimate deformation parameters correctly for all the test images. By contrast, GA with single-objective achieves the best result only once in terms of the average RMSE and zero times in terms of the average MEDE value. The accuracy of GA can degrade significantly, e.g., 11 × 11 lattice with [−10.0, 10.0] range as illustrated in Fig. 10a. These observations verify the effectiveness of the multi-objective approaches. In addition, we can observe that four-objective NSGA-III gets poorer results than two-objective algorithms. Focusing on average values of both evaluation criteria, fourobjective NSGA-III outperforms others for zero times regarding the RMSE and only once regarding the MEDE.
The final solutions after post-processing on Pareto optimal solutions obtained by multi-objective algorithms are compared in Table II. As can be observed, the post-processing successfully improves the estimation accuracy in many cases comparing to Table I. In particular, four-objective NSGA-III benefits most from the post-processing. Focusing on the number of improved results on the average value, four-objective NSGA-III performs the best 17 times on RMSE and 19 times on MEDE, while two-objective NSGA-II performs the best for 15 and 14 times, respectively.

2) Results of Vertical and Horizontal Wavy Images:
The results of the best solutions with respect to the vertical and horizontal wavy images are shown in Table III. Despite more complex deformations, the multi-objective approaches can still achieve good estimation. Several qualitative results are shown in Fig. 11. Comparing with Table I, we can observe that the best results are irregularly distributed in terms of both evaluation criteria. Nevertheless, two-objective NSGA-II and NSGA-III are still better choices overall. In the case of 7 × 7 lattice, GA can achieve the best result when the search space is small (e.g., the sea image). Four-objective NSGA-III shows good results in the case of 11 × 11 lattice setting, which achieves the best average MEDE for six times out of ten times. Although the number of groups is a hyper-parameter to be handled carefully as mentioned in Sec. VI-A1, our results show the trend that larger number of groups is more effective when dealing with complex and subtle deformations.

B. Comparison on Real-World Images
The usefulness of the proposed method under real-world scenarios is verified in this section. We use three different pairs of template and target images as shown in Fig. 12.
• Texture (Fig. 12a): the images capture a part of the undeformed/deformed texture printed on a piece of paper. The target image has two vertical bumps. We set the lattice as 7 × 7 and the decision variable range as [−20.0, 20.0]. • Sign (Fig. 12b): the images capture a sign with undeformed/deformed text printed on a piece of wrapping paper. The lattice and decision variable range are set to 7 × 7 lattice and [−10.0, 10.0], respectively.  II  C  -. E  ,  ,  . T  T  I  I  . T  RMSE  MEDE  ,  T  I,  .   RMSE  MEDE   Image  Lattice  size   Decision  Two  Two  Four  Two  Two  Four  variable  objective  objective  objective  objective  objective  objective  range  NSGA-II  NSGA-III  NSGA-III  NSGA-II  NSGA-III  NSGA- • Face (Fig. 12c): the template image and the target image show a frontal face with serious expression and smile, respectively. The goal is to obtain deformation parameters that express smiling. We use the 7 × 7 lattice and the [−20.0, 20.0] decision variable range.
The sizes of the template images and target images are set the same as Sec. VI-A. Fig. 12a and Fig. 12b are captured by a web camera and Fig. 12c is extracted from the FEI face database [55] and cropped. Because the ground truth of the https://fei.edu.br/~cet/facedatabase.html  Two  Two  Four  variable  objective  objective  objective  objective  objective  objective  range  NSGA-II  NSGA-III  NSGA-III  NSGA-II  NSGA-III  NSGA- real deformations is unknown, we evaluate results only using RMSE.
The results of best solutions and post-processed solutions are shown in Table IV. These results show that multi-objective approaches can outperform the single-objective approach in real-world situations. However, post-processing fails to im-prove estimation accuracy in a number of cases (e.g., the face image). The estimation results for each image are shown in Fig. 13. It can be observed that the deformed texture image contains some highlight areas and the eye area of the face image is also shadowed. Hence, the deformation results for these corresponding areas show worse accuracy than the other Fig. 11. Examples of visual results achieved by multi-objective methods on horizontal and vertical wavy images. The 1st∼2nd columns are ground truth and 3rd∼4th columns are results. Red markers represent the control points, and green dots represent the deformed surface generated by forward warping. Images in the 4th column show the deformed template images with estimated deformation.  areas. The groups including these areas cannot contribute to the post-processing. As a limitation, the proposed method suffers from illumination changes due to the characteristics of the similarity measure.

VII. C
In this paper, we proposed a novel deformation estimation method using MOEAs to tackle the conflicts based on the fact that each control point of the deformation model affects a local region rather than a single pixel. Our method casts deformation estimation as a MOP by dividing a template image into several groups consisting of patches with group-wise similarity defined as group-wise objective functions, which can be solved by off-the-shelf MOEAs. To handle large deformations, optimization is run hierarchically following a coarse-to-fine strategy powered by image pyramid and control point mesh subdivision. Besides, a post-processing procedure is proposed to integrate Pareto optimal solutions into a single output, which can improve the estimation accuracy. The observations from experimental results can be summarized threefold. First, our partitioning approach with two-objective algorithms can obtain deformation parameters more accurately than GA with a single objective. Second, although the four-objective algorithm performs not as well as expected due to a large number of objectives, it shows to be effective in dealing with complex and subtle deformations. Third, the post-processing procedure can improve estimation accuracy in many cases. We can observe the usefulness of the proposed method with real-world images.
The main limitation of our method is that high computational resources are required. As future work, we would like to address this issue by further tuning the hyper-parameters that can reduce the computational cost without degrading the performance. We are also interested in the referenced point distribution of the NSGA-III. A user-supplied setting may be able to focus solutions on regions that are desirable for the post-processing procedure.