Dense Feature Matching Based on Homographic Decomposition

Finding robust and accurate feature matches is a fundamental problem in computer vision. However, incorrect correspondences and suboptimal matching accuracies lead to significant challenges for many real-world applications. In conventional feature matching, corresponding features in an image pair are greedily searched using their descriptor distance. The resulting matching set is then typically used as input for geometric model fitting methods to find an appropriate fundamental matrix and filter out incorrect matches. Unfortunately, this basic approach cannot solve all practical problems, such as fundamental matrix degeneration, matching ambiguities caused by repeated patterns and rejection of initially mismatched features without further reconsideration. In this paper we introduce a novel matching pipeline, which addresses all of the aforementioned challenges at once: First, we perform iterative rematching to give mismatched feature points a further chance of being considered in later processing steps. In each iteration, we search for inliers that exhibit the same homographic transformation. The resulting homographic decomposition is used for refining matches, occlusion detection (e.g. due to parallaxes) and extrapolation of additional features in critical image areas. Furthermore, Delaunay triangulation of the matching set is utilized to minimize the repeated pattern problem and to implement focused matching. Doing so enables us to further increase matching quality by concentrating on local image areas, defined by the triangular mesh. We present and discuss experimental results with multiple real-world matching datasets. Our contributions, besides improving matching recall and precision for image processing applications in general, also relate to use cases in image-based computer graphics.


I. INTRODUCTION
Feature matching between two color images is an essential step in many computer vision applications, such as image-based rendering, 3D reconstruction, object tracking, change detection, stitching, image registration and photo mosaicking [15], [22], [40], [49]. The conventional feature matching pipeline can usually be divided into the following four subprocesses: feature detection, feature description, preliminary feature matching and outlier removal. For the first two substeps, algorithms such as SIFT [30], SURF [7] or ORB [47] are mostly used.
The preliminary feature matching is usually based on a simple Euclidean distance comparison between the previously computed feature descriptors. The outlier removal is typically performed by using the RANSAC algorithm [18]. It is based on successive attempts to fit a model (usually the fundamental matrix) to a maximum subset of matched features, the so-called inliers. A correctly estimated fundamental matrix describes the geometric relationships between all corresponding image pixels [32].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Robustness and accuracy are crucial for most feature-based image processing applications in practice. In some use cases, particularly in the context of image-based computer graphics, dense feature correspondence sets are also required. Examples include image morphing, warping, 3D reconstruction and photogrammetric modeling [13], [41], [52], [54], [61]. Such use cases rely on many precise matches, for example to reduce ghosting artifacts in interpolated intermediate views or to minimize the 3D reconstruction error. However, these requirements represent a fundamental challenge in computer vision, as the conventional feature matching pipeline has some intrinsic limitations, described below.
The first challenge in conventional feature matching is the degeneration of the fundamental matrix, because its estimation is sensitive if computed for scenes with complex structures or multiple depth layers. A wrong fundamental matrix estimation can also be caused by the RANSAC algorithm itself: In its basic implementation, it typically selects only prominent feature points, which are concentrated in just one local pixel or depth area, especially in real scenes with high depth complexities. In such cases the resulting fundamental matrix is not representative, which leads to false rejection of further (potentially correct) feature correspondences [57]. Fig. 1 illustrates this problem: Even with perfect feature correspondences (a), the estimated fundamental matrix (b) deviates from the correct fundamental matrix (c), which was computed using actual camera parameters in this example. In addition, by using the 5-point [38] or 8-point algorithm [29], RANSAC uses only a small number of potential correspondences from the matching set to estimate the model. This is potentially suboptimal for image pairs with wide baselines, as these will contain a large percentage of outliers [64].
A further problem is that feature matching is intended to find image pair correspondences, which represent the same physical point. In conventional feature matching, however, a matched feature point in one image corresponds just to the nearest neighbor based on Euclidean distance comparisons in the other image. So, initial mismatches can be propagated as false positives in following application steps, which is known (in case of ambiguities) as the ''repeated pattern matching problem'' of computer vision [45].
In this paper we introduce a new feature matching method, which addresses all the aforementioned problems in one pipeline. The corresponding goal is to output more precise and denser feature-based correspondences: Our solution extends conventional feature matching to an iterative rematching process, allowing us to reconsider previously rejected feature points as potentially correct matches. Furthermore, instead of using one fundamental matrix, our search for correct feature correspondences is executed per iteration with an individually estimated homographic transformation. The result of the rematching process is a set of homographies, which we call homographic decomposition. Using different homographies, each associated with a specific matching area, provides the following advantages: (a) matching feature points in the target image can be refined by using neighboring homographies from the source image to approximate their exact physical positions, (b) ''critical image areas'' (i.e. containing partially occluded objects) can be detected by combinatorial analysis of the homographic decomposition and considered in following pipeline steps, and (c) additional feature points, which cannot be detected by traditional matching, can be identified by using local homographies for feature extrapolation, especially in peripheral zones of critical image areas. Moreover, Delaunay triangulation of the feature point set makes it possible to utilize the resulting triangle mesh as a ''supporting structure'' to implement Delaunay outlier detection. On the one hand, this makes it possible to defuse the repeated pattern problem. On the other hand, it allows us to further increase feature point density. Here, we refer to focused matching, which incorporates the local re-execution of the matching pipeline within triangle cells that correspond in both images.
In our work we primarily target the above-mentioned use cases from image-based computer graphics. Therefore, we focus on datasets with the following properties, common in this application context: (i) inside-out shots of real scenes, typically with high depth complexity, (ii) sequences of pairwise overlapping images from different viewpoints, (iii) representation of only static scenes with predominantly stable illumination situations, and (iv) absence of strong lens distortions, as in fisheye photography, for example.
We present related work in the following section. Then, section III contains a detailed description of our pipeline, followed by the evaluation and discussion of qualitative and quantitative results in section IV. Conclusions and future work are addressed in section V.

II. RELATED WORK
Since the 1980s, different algorithms have been developed for the detection and description of image features: For example, Harris and Stephens [20] published the Harris Corner Detector. The ''Scale Invariant Feature Transform'' (SIFT) algorithm was presented by Lowe [30]. Bay et al. [7] presented the ''Speeded Up Robust Features'' (SURF) algorithm. The ''Binary Robust Independent Elementary Features'' (BRIEF) descriptor was introduced by Calonder et al. [10]. In 2011, based on the ''Features from Accelerated Segment Test'' (FAST) detector [46] and the BRIEF descriptor, the ''Oriented FAST and Rotated BRIEF'' (ORB) algorithm was introduced by Rublee et al. [47]. Recent methods also use deep learning for feature detection and description: For example, the ''Learned Invariant Feature Transform'' (LIFT) was proposed by Yi et al. [63]. It is a deep neural network that combines the components of standard pipelines for local feature detection and description into a single differentiable network, supervised by a common ''structure from motion'' process. DeTone et al. [14] published a self-supervised framework, called ''SuperPoint'', for training interest point detectors and descriptors. Ono et al. [39] introduced another deep neural architecture, which trains a detector and descriptor end-to-end in a two-branch setup. One branch is differentiable and feeds on the output of the other, non-differentiable branch. Luo et al. [31] published the ''ASLFeat'' learning framework for local features of accurate shape and localization. Truong et al. [58] introduced a CNN-based feature point detector for specific applications, like medical image matching. It is trained in a semi-supervised manner on pairs of images related by a homographic transformation.
Image feature descriptors and corresponding distance comparisons can only approximate relationships between physical features, which usually leads to a relatively high number of visual mismatches in practice. Therefore, ''outlier removal'' has become an important step in feature matching pipelines, typically by trying to approximate global geometric relationships between images. For example, Fischler and Bolles [18] presented the ''random sample consensus'' algorithm (RANSAC) to remove outliers during estimation of the fundamental matrix by considering epipolar geometry to filter out falsely corresponding feature points. Researchers have already developed a number of methods to improve the efficiency and robustness of the basic RANSAC algorithm, for example to solve the above-mentioned fundamental matrix degeneration problem and thus to obtain a better geometric model estimation. Examples, incorporating local optimization methods, are Chum et al. [12] and Frahm and Pollefeys [19], including ''Inner RANSAC''. Raguram et al. [42] implemented ''USAC'', a universal framework for random sample consensus, which extends the simple hypothesize-and-verify structure of standard RANSAC and makes it possible to consider various optimizations. The work of Tan et al. [55] includes improvements for achieving a more uniform spatial distribution of feature correspondences and filtering mismatches using a smoothed disparity check based on a pre-estimated fundamental matrix. Other researchers proposed alternative model fitting algorithms. For example, Barath et al. [4] presented the MAGSAC algorithm, which, unlike RANSAC, does not require a single inlier-outlier threshold. By exploiting the residual density, Tiwari and Anand [56] introduced the DGSAC algorithm. In the work of Ranftl and Koltun [44] outliers are removed via geometric model estimation and the underlying fundamental matrix is computed using deep neural networks. More recently, Skoryukina et al. [51] proposed a RANSAC scheme with geometrical restrictors, focusing on ID document classification. For this case of planar object matching, improvements in accuracy are achieved.
Especially for image pairs with wide baselines, there is a major drawback of outlier removal by basic fundamental matrix estimation: Corresponding algorithms typically rely only on small subsets of the data, required to generate the hypothesis. This can result in a high number of outliers [64]. Previous works try to address this limitation at the matching subprocess level. Therefore, they are related to our work, since they pursue the same goal of pruning false matches while finding a high number of robust and accurate correspondences: Ancuti et al. [2] use kernel feature correspondences to estimate geometric relationships between surrounding regions for the generation of additional positive matches. Bian et al. [9] reject outliers by converting motion smoothness constraints into statistical measures based on a limited number of feature matches between a region pair. Another related correspondence pruning method by Lin et al. [25] aims to detect a coherence-based separability constraint from noisy matches and embed it into a correspondence likelihood model. Exact matches are then obtained by varying the affine motion model. Ma et al. [34] proposed an outlier removal method based on preserving local neighborhood structures. They formulate their idea as a mathematical model and derive a closed-form solution with linear time complexity. Jiang et al. [21] presented a matching method using adaptive spatial clustering of putative matches based on motion consistency, considering also an additional ''mismatch cluster''.
Lee et al. [23] formulate the problem of the matching subprocess as a Markov random field. They use both local descriptor distances and relative geometric similarities to enhance robustness and accuracy. Liu et al. [28] presented a new matching method, contributing an advanced consensus of neighborhood topology. Combining it with a guided matching strategy from potential matches for neighborhood construction results in improved inlier detection. Recent work of Liu et al. [27] also includes a matching method particularly for remote sensing images. Inspired by region growing segmentation, they determine a high-ratio inlier subset as the seed (matching) set. It is then used to extract more reliable matches by a correspondence growing criterion based on motion consistency. Mohammed and El-Sheimy [37] presented a descriptorless feature matching method. In addition to geometrical constraints, it also uses template matching to achieve a reasonable prediction of correspondence locations and their distribution.
Yi et al. [64] use deep learning for feature matching. Their neural network requires a set of potential sparse matches and the ground truth camera intrinsic parameters as input. It is used to label the test matching set as inliers or outliers and to output the camera motion. In the work of Wang et al. [60] learning of local feature matches is realized by solving a differentiable optimal transport problem. Corresponding costs are predicted by a graph neural network. Ma et al. [33] and Li et al. [24] interpret mismatch removal as a binary classification problem. They use different sets of geometrical properties to describe the putative matches and to feed corresponding match representations to supervised learning procedures.

FIGURE 2. UML-based activity diagram of our feature matching pipeline, including the main stages and the individual processes. The pipeline works with two basic data structures: ''reiteration data structure'' (stage 1) and ''refinement data structure'' (stages 2-4). The colored symbols (triangles, squares and circles) represent contained data entries that a process uses as input, or that are modified by a process as output. Further explanations and details are elaborated in section III.

Other related work by Chen et al. [11] refers to stabilization of stereo image correspondences. Starting with a pre-computed set of reliable feature correspondences, each image is divided into triangles using Delaunay triangulation (similar to the triangulation process in our second pipeline stage). Then, the resulting triangle set is processed using a specific ''planarity test'' in order to reconstruct planes in 3D space by depth calculation. In the next step, further feature correspondences are computed in each planar region using the corresponding homographies. In contrast to our work, Chen et al. require the estimation of an initial fundamental matrix, which can often be error-prone in practice and thus can have a negative impact on following processing steps. Moreover, the detection of ''critical image areas'', like occlusions, and further pipeline optimizations are not considered in their solution. Further work on feature matching was presented by Dou et al. [16]. They also take advantage of Delaunay triangulation for outlier detection: After initial matching, Dou et al. try to remove false matches by utilizing sparse approximation theory. Then, the remaining feature points are triangulated separately in each image for final outlier removal by searching for triangles with non-corresponding vertices in the image pair. However, this approach strongly depends on the correctness of the initial matching stage. Even a small set of incorrect matches can lead to locally inconsistent triangulation and thus significantly reduce the final number of positive matches.

III. ITERATIVE DENSE FEATURE MATCHING PIPELINE
In the following sections we describe the four main stages of our matching pipeline. Each stage is composed of individual processing steps, as shown in Fig. 2.
VOLUME 10, 2022

A. ITERATIVE REMATCHING
Our first pipeline stage consists of the repetition of the following processing steps, which are re-executed on the set of yet unmatched feature points (including also preliminary mismatches from previous iterations): First, we run brute-force feature matching to determine for each point in the ''source image'' the nearest neighbor in the ''target image''. In the next step, we use the RANSAC algorithm for estimating a homography matrix with most inliers for the current matching set, rejecting associated outliers. A homographic relation can be used to describe feature correspondences for points which lie on the same plane in 3D space. However, in practice one homographic plane can span multiple surfaces, e.g. of different objects. Thus, we improve the quality of homography estimation by clustering feature points and recalculating a (more precise) homographic transformation per cluster. Our motivation here is visual coherence and the observation that cluster-based re-processing increases the probability of a better surface approximation. For automatic clustering we use the ''Density-Based Spatial Clustering of Applications with Noise'' (DBSCAN) algorithm [17].
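One rematching iteration can be illustrated with the following minimal Python sketch. It is not the authors' C++/OpenCV implementation: the function names, the nested-list representation of the homography and the `threshold` parameter are assumptions made for illustration, and the homography is taken as given here, whereas in the pipeline it is estimated with RANSAC and re-estimated per DBSCAN cluster.

```python
import math


def brute_force_match(src_desc, dst_desc):
    """For every source descriptor, find the nearest target descriptor
    (Euclidean distance). Returns (src_index, dst_index, distance) tuples."""
    matches = []
    for i, ds in enumerate(src_desc):
        j, dist = min(
            ((j, math.dist(ds, dt)) for j, dt in enumerate(dst_desc)),
            key=lambda item: item[1],
        )
        matches.append((i, j, dist))
    return matches


def apply_homography(h, p):
    """Map the point p = (x, y) with a 3x3 homography given as nested lists."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def split_inliers(matches, src_pts, dst_pts, h, threshold):
    """Separate matches that agree with homography h (reprojection error
    below threshold) from those that do not."""
    inliers, outliers = [], []
    for i, j, dist in matches:
        error = math.dist(apply_homography(h, src_pts[i]), dst_pts[j])
        (inliers if error < threshold else outliers).append((i, j, dist))
    return inliers, outliers
```

In this sketch, the source points of the returned outliers would remain in the pool of unmatched feature points and be matched again in the next iteration.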
We have also implemented the so-called target collapse filter, which is executed per iteration to detect ''degenerated matches''. Such matches occur if multiple feature points of the source image match the same point of the target image (see Fig. 3). Matching degeneration typically occurs in the following case: First, close feature points in the source image have a visually comparable local texture. Additionally, too few features have been detected in the corresponding target image area. Such collapsing feature points basically indicate wrong matches w.r.t. homography estimation. Consequently, they have to be excluded in the following steps to prevent pipeline failures. To support fast convergence of the collapse filter, the exclusion is performed in both feature sets, for the source and for the target image. Then, the last iteration is repeated to trigger the recalculation of the corrected homography matrix.
The termination of our iterative rematching stage is controlled by the relative rematching distance error (RRDE), which is calculated from $d_1$, the initial average matching distance (after the first iteration), and $d_i$, the average matching distance of the last executed iteration $i \geq 1$. Notice that the following always applies: $d_i \geq d_1$. Let $\varepsilon \in [0, 1[$ be a user-defined parameter. Then, rematching is terminated after iteration $i$, as soon as condition $\varepsilon \geq \mathrm{RRDE}_i$ is met the first time. $\varepsilon$ is a threshold parameter, which can be used to control the trade-off between the desired matching density and quality of the rematching stage: The larger its value is chosen, the more iterations can be performed. On the one hand, this allows the detection of more feature points and more differentiated homographic relations (each potentially corresponding to a plane in the scene presented by the images). On the other hand, this results in a potentially increased number of false matches due to the continuously increasing matching distances $d_i$ per iteration. For our evaluations in section IV, a heuristically chosen error value $\varepsilon = 2/3$ has proven to be a reasonable trade-off between achieving the presented high matching densities and preserving superior matching quality in terms of accuracy of recall.
The result of the rematching pipeline stage is a base set of feature point pairs (between the source and target image) and a set of homography matrices. Each feature pair is associated with a distinct homography.
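The target collapse filter described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: matches are represented as hypothetical `(source_index, target_index)` pairs, and the function name and return values are chosen for this example only.

```python
from collections import defaultdict


def target_collapse_filter(matches):
    """Detect 'degenerated matches', i.e. several source points matched to
    the same target point, and exclude them on both sides.

    matches: list of (source_index, target_index) pairs.
    Returns (kept_matches, excluded_source_indices, excluded_target_indices).
    """
    sources_per_target = defaultdict(list)
    for src, dst in matches:
        sources_per_target[dst].append(src)

    # A target point hit by more than one source point has collapsed.
    collapsed_targets = {
        dst for dst, srcs in sources_per_target.items() if len(srcs) > 1
    }
    excluded_sources = {src for src, dst in matches if dst in collapsed_targets}
    kept = [(src, dst) for src, dst in matches if dst not in collapsed_targets]
    return kept, excluded_sources, collapsed_targets
```

After such a filter step, the current iteration would be repeated on the kept matches so that the homography is re-estimated without the collapsing points.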

B. HOMOGRAPHIC DECOMPOSITION
The next pipeline stage implements a second stage of outlier removal and a refinement of the feature matching results from the previous step: First, the feature point set of the source image is triangulated using Delaunay mesh generation. Then, the resulting mesh is mapped to the matching point set of the target image in order to detect further outliers. This refers to the ''repeated image patterns problem'', mentioned in section I. We identify false matches based on a target mesh consistency check: Feature points which are incident to an overlapping mesh edge are successively removed from the base set of matching feature point pairs, as illustrated in Fig. 4. The next step aims at further improvement of the matching accuracy. Each source feature point p ∈ P is associated with a homography matrix h ∈ H and embedded in a triangular mesh structure. Hence, we can take advantage of its connectivity information and search for a better match in the target image as follows: (a) successive transformation of p using neighboring homographies, (b) recalculation of the matching distance at each transformed position, and (c) comparison with the initial distance. This feature point refinement is shown in algorithm 1. It turned out that this step can significantly contribute to the reduction of matching ambiguities, as illustrated in Fig. 5. We call the refined set of feature point pairs, including the homography associations, a homographic decomposition.
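The target mesh consistency check can be illustrated with a small Python sketch. Several points are assumptions rather than details from the paper: the mesh edges are expected as precomputed index pairs (for example from a Delaunay triangulation of the source points), vertex indices are integers, and the removal order (always the vertex with the most crossing incident edges) is one possible interpretation of "successively removed".

```python
def _orient(a, b, c):
    """Signed area test: > 0 if c lies to the left of the line a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if the segments p1-p2 and q1-q2 properly intersect."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def mesh_consistency_check(edges, target_pts):
    """Map the source mesh edges onto the target points and remove vertices
    until no two edges overlap. Returns the set of removed vertex indices.

    edges: iterable of (i, j) vertex index pairs of the source mesh.
    target_pts: dict mapping vertex index -> matched target position (x, y).
    """
    edges = [tuple(e) for e in edges]
    removed = set()
    while True:
        active = [e for e in edges if e[0] not in removed and e[1] not in removed]
        crossing_edges = set()
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                e, f = active[a], active[b]
                if set(e) & set(f):
                    continue  # edges sharing a vertex cannot properly cross
                if segments_cross(target_pts[e[0]], target_pts[e[1]],
                                  target_pts[f[0]], target_pts[f[1]]):
                    crossing_edges.update((e, f))
        if not crossing_edges:
            return removed
        counts = {}
        for e in crossing_edges:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        # Assumed heuristic: drop the vertex with the most crossing edges
        # (ties are resolved in favor of the smallest index).
        removed.add(max(sorted(counts), key=counts.get))
```

For a square with a center vertex whose target match lies far outside the square, the spokes of the center vertex cross a square edge in the target image, so only the center vertex is reported as an outlier.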

C. FOCUSED MATCHING
In the third pipeline stage we perform focused matching to find additional good corresponding features: We re-execute rematching and feature refinement (as described in sections III.A and III.B), considering the remaining set of unmatched points. But now, each execution is restricted to one corresponding triangular area of the source and target images. This restriction is intended to mimic ''visual focusing''. Thereby, it is possible to detect further detail features and to estimate new local homographies. Focused matching is most effective for (larger) triangles with complex visual structures. So, to assure robust RANSAC-based homography estimation, we skip triangles with too few feature points, recommending a threshold value of at least 16 points per triangle.

Algorithm 2 Feature Point Extrapolation
Input: Unmatched source feature point set P_U, vertex set V_t of triangle t, homography set H_{V_t} for V_t, minimal matching distance d_min between V_t and target V'_t
Output: Extrapolated target feature point set P'_E, corresponding source feature point set P_E
for each p ∈ P_U | p ∈ area(t) do
    for each h ∈ H_{V_t} do
        p_C := h * p
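The restriction of focused matching to corresponding triangles can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, points on a triangle edge are counted as inside, and applying the 16-point threshold to both the source and the target triangle is an assumption of this example.

```python
MIN_POINTS_PER_TRIANGLE = 16  # threshold recommended in the text


def _sign(p, a, b):
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])


def point_in_triangle(p, triangle):
    """True if p lies inside or on the border of triangle (a, b, c)."""
    a, b, c = triangle
    d1, d2, d3 = _sign(p, a, b), _sign(p, b, c), _sign(p, c, a)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)


def select_focus_region(src_triangle, dst_triangle, src_pts, dst_pts,
                        min_points=MIN_POINTS_PER_TRIANGLE):
    """Collect the unmatched points inside a pair of corresponding triangles.

    Returns (source_points, target_points), or None if the triangle pair
    holds too few points for a robust homography estimation.
    """
    src_inside = [p for p in src_pts if point_in_triangle(p, src_triangle)]
    dst_inside = [p for p in dst_pts if point_in_triangle(p, dst_triangle)]
    if min(len(src_inside), len(dst_inside)) < min_points:
        return None
    return src_inside, dst_inside
```

The returned point sets would then be passed to the same rematching and refinement steps as in the first two stages, but limited to this triangle pair.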

D. MATCHING EXTRAPOLATION
In the last stage, we concentrate on the detection of further feature points in ''critical image areas''. We identify these areas by ''inhomogeneous triangles''. Homogeneity, in the context of our feature point mesh, is defined for a triangle $t$ with vertices $v_i \in V_t$ as follows: Let $h_i \in H$ be the (initially) associated homography matrix of feature point vertex $v_i$. Then, $t$ is homogeneous if all of its vertices are associated with the same homography, i.e. $h_1 = h_2 = h_3$. The ''raw'' homographic decomposition typically exhibits a high degree of variance in homography-to-vertex associations (with initially one homography per vertex). Therefore, in order to improve the detection quality in inhomogeneous triangles we have implemented the so-called homogenization process: Every feature point $p$ is transformed successively from the source to the target image using a homography matrix from the set of neighboring feature points. Then, we search for transformed feature points in the target image whose reprojection error is smaller than the threshold parameter of the RANSAC algorithm. The neighboring homographies which satisfy the aforementioned condition are additionally assigned to $p$. Notice that a feature point in the resulting homogenized homographic decomposition can now be associated with multiple (locally equivalently transforming) matrices. Finally, we execute the feature point extrapolation algorithm to obtain further good matches in inhomogeneous triangles: First, each yet unmatched source feature point $p$ is transformed successively to the target image using local matrices of the homographic decomposition. The result is a set of candidate target feature points $P_C$. If the minimal matching distance between $p$ and $P_C$ is smaller than the minimal local matching distance $d_{min}$, then $p$ and the corresponding $p' \in P_C$ are added to the extrapolation sets $P_E$ and $P'_E$, respectively. The union of $P_E$ and $P'_E$ with the respective feature point sets from previous processing steps represents the final result of our matching pipeline. The extrapolation is shown in detail in algorithm 2 and illustrated in Fig. 6.
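The homogenization and extrapolation steps can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the paper: the dictionary-based data layout, the function names, the treatment of a triangle as homogeneous when its vertices share at least one homography after homogenization, and the callback `distance(p, q)`, which stands in for the descriptor-based matching distance between source point `p` and target position `q`.

```python
import math


def apply_homography(h, p):
    """Map the point p = (x, y) with a 3x3 homography given as nested lists."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def homogenize(points, targets, neighbors, assigned, homographies, threshold):
    """Additionally assign every neighboring homography to a point if it
    maps the point onto its matched target within the threshold."""
    result = {pid: set(ids) for pid, ids in assigned.items()}
    for pid, p in points.items():
        for nid in neighbors[pid]:
            for hid in assigned[nid]:
                mapped = apply_homography(homographies[hid], p)
                if math.dist(mapped, targets[pid]) < threshold:
                    result[pid].add(hid)
    return result


def is_homogeneous(triangle, assigned):
    """Assumed criterion: all three vertices share at least one homography."""
    a, b, c = triangle
    return bool(assigned[a] & assigned[b] & assigned[c])


def extrapolate(unmatched, local_homographies, distance, d_min):
    """Transform each unmatched source point with all local homographies and
    keep the best candidate if its matching distance is below d_min."""
    source_out, target_out = [], []
    for p in unmatched:
        candidates = [apply_homography(h, p) for h in local_homographies]
        if not candidates:
            continue
        best = min(candidates, key=lambda q: distance(p, q))
        if distance(p, best) < d_min:
            source_out.append(p)
            target_out.append(best)
    return source_out, target_out
```

In this sketch, `extrapolate` would only be called for unmatched points inside triangles for which `is_homogeneous` returns false, using the homographies assigned to the triangle vertices as `local_homographies`.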

IV. IMPLEMENTATION AND RESULTS
Our pipeline was developed in C++ using the OpenCV library. Apart from parallelization of the homogenization and extrapolation algorithms (subsection III.D), no other run-time or memory optimizations have been implemented yet. The following benchmarks have been performed on a PC with an Intel i9-9900K CPU, 32 GB RAM and Windows 10.

A. QUANTITATIVE EVALUATION
To evaluate our feature matching pipeline, we have used the following image data: From the classic Oxford matching dataset [35], [36], ''Wall'' and ''Graf'' scenes were picked, because (only) these images satisfy our target use cases, defined in section I.  (9 scenes) were picked from the MultiH dataset [3], published for evaluation of multi-plane fitting methods in stereo images. All datasets provide ground truth matching information.
Our quantitative evaluation is based on the numerical indicators ''precision'' ($p$) and ''recall'' ($r$), as described by Agarwal and Roth [1]. They are computed from the numbers of true positive ($n_{TP}$), false positive ($n_{FP}$) and false negative ($n_{FN}$) matches. Feature pairs whose distance to the ground-truth epipolar line is smaller than a certain threshold $d_{in}$ in both images (see below) are regarded as true positives, or in the other case as false negatives. To guarantee uniform comparability for different image resolutions, the threshold $d_{in}$ is determined by $\alpha \sqrt{h^2 + w^2}$, where $h$ and $w$ are the height and width of an image, respectively. The user-defined precision factor $\alpha$ is set to 0.003, as proposed by Bian et al. [8].
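The evaluation quantities can be made concrete with a short Python sketch. The threshold follows the expression given above; the precision and recall functions use the standard definitions $p = n_{TP} / (n_{TP} + n_{FP})$ and $r = n_{TP} / (n_{TP} + n_{FN})$, which are assumed here to be the ones used in the evaluation. The function names and the line representation $(a, b, c)$ with $ax + by + c = 0$ are illustrative choices.

```python
import math


def inlier_threshold(height, width, alpha=0.003):
    """Resolution-dependent threshold d_in = alpha * sqrt(h^2 + w^2)."""
    return alpha * math.hypot(height, width)


def epipolar_distance(line, point):
    """Distance of point (x, y) to the line (a, b, c) with a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c) / math.hypot(a, b)


def is_true_positive(src_line, src_pt, dst_line, dst_pt, d_in):
    """A pair counts as true positive if both points lie closer than d_in
    to their ground-truth epipolar lines."""
    return (epipolar_distance(src_line, src_pt) < d_in
            and epipolar_distance(dst_line, dst_pt) < d_in)


def precision(n_tp, n_fp):
    """Standard definition: share of reported matches that are correct."""
    return n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0


def recall(n_tp, n_fn):
    """Standard definition: share of correct matches that were found."""
    return n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
```

For a 640 x 480 image the diagonal is 800 pixels, so the threshold evaluates to about 2.4 pixels with the precision factor 0.003.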
Additionally, we propose a new numerical indicator for feature matching based on recall and precision. The Q-indicator is a simple single-value measure for determining the overall ''matching quality'', emphasizing precision.
In the first part of our evaluation we performed feature matching tests on the aforementioned datasets with the goal of demonstrating the flexibility and robustness of our pipeline. For this purpose, we used various feature detectors and descriptors, including a comparison to baseline matching (cf. Table 3): SIFT [30], SURF [7], FAST [46] with BRIEF [10], ORB [47], SuperPoint [14], LF-Net [39], LIFT [63] and ASLFeat [31].
In the second part, we evaluated the robustness of our pipeline, determined detailed matching results as well as time measurements of our algorithms and performed ablation studies to justify the contribution of the individual pipeline stages (cf. Table 2). For this purpose, we generated sparse and dense feature sets for the following image pairs, respectively: ''Wall 1'', ''Wall 3'' (W), ''Graf 1'', ''Graf 3'' (G) from the Oxford dataset and ''Neem'' (N), ''Elderhall-B'' (E) from the AdelaideRMF dataset. The sparse feature matching configurations have approximately three to four times fewer initial matches (pipeline stage 1) compared to the dense configurations. In these tests we also enabled the feature point extrapolation for inhomogeneous triangles (see section III.D), tagged with ''H+I'' in Table 2. This makes it possible to consider even more true positive matches and thus to get an even more representative recall value. A selection of corresponding visual results is shown in Fig. 8.
In the last part, we performed similar matching tests as in part 1, but now we compared our proposed pipeline to the following matching methods: GMS [9], LPM [34], CODE [25], PFM [23], RFM-SCAN [21], LC [64] and LMR [33] in combination with RANSAC for geometric model estimation (as we also set up our pipeline with RANSAC). The results are summarized in Table 1, including arithmetic and harmonic mean values, and in Fig. 7, showing cumulative distributions of precision and recall. For this evaluation, we have generated different ''training feature sets'' for the corresponding image pairs in order to tune the parameters of each matching algorithm as well as possible. The results reported in this paper are based on the execution of each corresponding matching algorithm only once (for each image pair per dataset). For this purpose, the initially estimated and fixed parameters are used, thus giving the final ''test feature sets''. For LC and LMR we used the pre-trained model released by the authors. In all tests we used the SIFT algorithm for feature detection and description.

B. RESULTS AND DISCUSSION
As can be seen in Table 1, the comparison with conventional matching shows that our solution achieves consistent and stable improvements in matching quality even for different descriptors and detectors. The use of our pipeline as a ''matching framework'' makes it possible to reach high precision and recall values also with ''traditional'' algorithms, like SIFT. These can then compete even with modern (ML-based) methods, like SuperPoint, LF-Net, LIFT and ASLFeat.
The detailed matching results in Table 2 (with all pipeline stages executed) can be summarized as follows: Stage 1 (iterative rematching) detects between 40% and 69% of all matches. Focused matching (stage 3) adds between 10% and 27% of the matches. For datasets with a planar scene setup (W and G), an additional 26% to 43% of the matches are included due to feature point extrapolation (stage 4). For datasets with higher depth complexities (N and E), extrapolation contributes 16% to 40% extra matches (test configuration ''I'') and 47% to 52% of the matches (test configuration ''H+I'').
The results of our ablation studies in Table 2 can be summarized as follows: The basic homographic decomposition step (stage 2) not only has a direct impact on the total number of matches, but is also crucial for matching precision: Discarding stage 2 results in a significant increase of n_FP by 27% up to 128%. In particular, we can see that the implemented Delaunay mesh consistency check has the potential to significantly reduce the number of false positives, which in this case are typically caused by repeated image patterns. Skipping focused matching (stage 3) results in a decrease of the overall matching count by 13% up to 27%, but n_TP, n_FP and n_FN remain roughly stable in proportion to the overall count. If feature point extrapolation (stage 4) is disabled, the total number of matches decreases by 16% up to 43% (also causing n_TP and n_FP to decrease). However, n_FN increases by up to 27%, which has a negative impact on the matching recall. Since the precision values remain largely stable with and without extrapolation, stage 4 supports the generation of true positive matches in discontinuity regions without causing side effects that could have a negative impact on the pipeline results.
[35], [36], AdelaideRMF [62] and MultiH [3]. A point (x, y) on one of the curves implies that (100 · x)% of the image pairs have a recall/precision value that does not exceed y.
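To make the relation between the match counts and the reported indicators explicit, the following minimal sketch computes precision and recall from n_TP, n_FP and n_FN, and builds a cumulative curve that can be read in the same way as the distribution curves (a point (x, y) means that a fraction x of the image pairs does not exceed the value y). The function names and the per-pair counts are hypothetical and serve only as an illustration; they are not taken from our evaluation.

```python
import numpy as np

def precision_recall(n_tp, n_fp, n_fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) > 0 else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) > 0 else 0.0
    return precision, recall

def cumulative_curve(values):
    """Return (x, y) such that a fraction x[i] of the values does not exceed y[i]."""
    y = np.sort(np.asarray(values, dtype=float))
    x = np.arange(1, len(y) + 1) / len(y)
    return x, y

# Hypothetical (n_TP, n_FP, n_FN) counts for four image pairs (made up for illustration).
counts = [(90, 10, 30), (80, 20, 20), (95, 5, 5), (60, 40, 40)]
recalls = [precision_recall(*c)[1] for c in counts]

# Each (x[i], y[i]) is one point of the cumulative recall curve.
x, y = cumulative_curve(recalls)
for xi, yi in zip(x, y):
    print(f"{100 * xi:.0f}% of the image pairs have a recall of at most {yi:.2f}")
```

The sketch also shows why a growing n_FN lowers the recall while leaving the precision untouched: n_FN only appears in the denominator of the recall.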
The performance bottlenecks of the current (not yet optimized) implementation are stages 3 and 4: Feature point refinement requires 31% to 41% and extrapolation 32% to 46% of the total run-time. Stage 1 (iterative rematching) requires 7% to 10% and stage 2 (initial homographic decomposition) 12% to 18% of the run-time. Beyond that, a direct interpretation or comparison of the DFM run-times would be of limited significance: On the one hand, a key requirement of our work was to achieve high feature densities while maintaining high precision (rather than high run-time performance). Consequently, and as motivated in section I, we focus on pre-processing, for example for image-based computer graphics applications, and not on real-time feature matching. On the other hand, our pipeline is a software framework with a multi-stage design, in which individual (''one-step'') matching methods can be integrated and then executed iteratively (as demonstrated in Table 1). In Table 4 we give an overview of the worst-case time complexities of all pipeline algorithms.

VOLUME 10, 2022

FIGURE 8. Visual results of each pipeline stage, including Delaunay mesh (blue), feature points from focused matching (yellow) and extrapolation (green).

FIGURE 9. Histograms of reprojection errors for the following datasets (from left to right): Oxford [35], [36], AdelaideRMF [62] and MultiH [3].

From the average recall and precision results in Table 3, we can see that our pipeline significantly improves both indicators on all datasets in comparison to the other evaluated matching methods. The precision and recall distributions are shown in Fig. 7. The key results can be summarized as follows: CODE and RFM-SCAN mostly have only slightly lower recall values than our DFM, but they do not achieve the same consistently high precision values. The evaluated ML-based methods (LC and LMR) have lower recall and precision values in all tests (except for a single peak recall value for LC on the AdelaideRMF dataset). We interpret this as a consequence of the limited or specific pre-trained models available.

C. APPLICATION EXAMPLES
One possible target use case of our pipeline is ''image morphing'', an image processing technique that generates smooth transitions between image pairs. Basic image morphing consists of two sub-steps: warping and blending [41]. In our example, we use triangle-based warping, in which the triangle vertices are the feature correspondences provided by our matching pipeline. A dense and accurate feature matching set is crucial to reduce visual distortions as well as ghosting artifacts caused by blending. Furthermore, image areas with occlusions or disocclusions are a challenge for image morphing, because visual artifacts occur especially there. Using our matching pipeline, we can detect such areas. We therefore plan to utilize this information in our future research on feature matching and, in particular, on image-based rendering applications. First visual results are shown in Fig. 10. We have chosen the Castle-P30 dataset [53] to illustrate feature-oriented image morphing because of its clearly visible parallaxes.
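The two sub-steps of triangle-based morphing can be illustrated for a single triangle pair. The following is a minimal sketch under simplifying assumptions, not the implementation used for Fig. 10: the intermediate triangle is obtained by linear interpolation of the matched vertices, one affine map per source image transfers points of the intermediate triangle back to that image (warping), and the sampled colors are cross-dissolved (blending). All function names and the example coordinates are hypothetical; rasterization and image sampling are omitted.

```python
import numpy as np

def affine_from_triangles(src, dst):
    """2x3 affine matrix A such that A @ [x, y, 1] maps each src vertex to its dst vertex."""
    src_h = np.hstack([np.asarray(src, dtype=float), np.ones((3, 1))])  # 3x3
    # Solve src_h @ A.T = dst for the six affine parameters.
    return np.linalg.solve(src_h, np.asarray(dst, dtype=float)).T

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A.T

def morph_triangle(tri_a, tri_b, t):
    """Intermediate triangle at time t in [0, 1] and the affine maps back to both images."""
    tri_a, tri_b = np.asarray(tri_a, dtype=float), np.asarray(tri_b, dtype=float)
    tri_t = (1.0 - t) * tri_a + t * tri_b        # warping target
    to_a = affine_from_triangles(tri_t, tri_a)   # where to sample in image A
    to_b = affine_from_triangles(tri_t, tri_b)   # where to sample in image B
    return tri_t, to_a, to_b

def blend(color_a, color_b, t):
    """Cross-dissolve of two colors sampled at corresponding positions."""
    return (1.0 - t) * np.asarray(color_a, dtype=float) + t * np.asarray(color_b, dtype=float)

# Hypothetical matched triangle (vertices are feature correspondences).
tri_a = [(0, 0), (10, 0), (0, 10)]
tri_b = [(2, 1), (14, 1), (2, 13)]
tri_t, to_a, to_b = morph_triangle(tri_a, tri_b, t=0.5)

# A point inside the intermediate triangle and its sampling positions in both images.
centroid_t = tri_t.mean(axis=0, keepdims=True)
print(apply_affine(to_a, centroid_t), apply_affine(to_b, centroid_t))
print(blend([255, 0, 0], [0, 0, 255], 0.5))
```

Because every triangle is warped independently, a wrong or inaccurate vertex correspondence distorts all triangles incident to that vertex, which is why the density and accuracy of the matching set directly affect the visual result.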
Finally, to demonstrate the accuracy of our multi-homography decomposition, we computed the reprojection error for each feature matching pair. For this purpose, we used the results from our pipeline and the ground truth data provided with the corresponding datasets. Minimal reprojection errors (i.e. on average clearly smaller than one pixel) are important for high-quality visual results, for example in the context of image pre-processing for multi-view 3D reconstruction and photogrammetric 3D modeling applications [13], [54]. In Fig. 9 we show the resulting reprojection error histograms of all our evaluation datasets (for a RANSAC reprojection threshold of 2.1 pixels). The corresponding ''root mean square reprojection error'' (RMSE) [48] is approximately 0.94 pixels for the Oxford dataset, 0.66 pixels for AdelaideRMF and 0.70 pixels for MultiH.
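As an illustration of the error measure, the following minimal sketch computes per-match reprojection errors and their RMSE. It assumes that the reprojection error of a match is the Euclidean distance between the matched point in the second image and the projection of the first point under a 3x3 homography H (a one-directional transfer error); the homography, the point coordinates and the function names are hypothetical and not taken from the evaluation datasets.

```python
import numpy as np

def reprojection_errors(H, pts_a, pts_b):
    """Distance in pixels between H-projected points of image A and their matches in image B."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    homog = np.hstack([pts_a, np.ones((len(pts_a), 1))])  # homogeneous coordinates
    proj = homog @ H.T
    proj = proj[:, :2] / proj[:, 2:3]                      # back to pixel coordinates
    return np.linalg.norm(proj - pts_b, axis=1)

def rmse(errors):
    """Root mean square of a set of reprojection errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical homography (pure translation) and four matches, two of them off by one pixel.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
pts_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pts_b = pts_a + [5.0, -3.0] + np.array([[0.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 0.0]])

errors = reprojection_errors(H, pts_a, pts_b)
print(errors, rmse(errors))
```

Since the RMSE squares the individual errors before averaging, a few matches close to the reprojection threshold weigh more heavily than many matches with sub-pixel errors.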

V. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a novel feature matching approach that detects significantly more robust and accurate feature correspondences than conventional and related state-of-the-art methods. A dense feature matching set can also be generated for scenes with high depth complexity. This opens up new application opportunities, e.g. for use cases in computer graphics, including morphing, warping, 3D reconstruction and image-based modeling. Our work addresses the prevailing challenges commonly encountered in the development of feature-based image processing applications and provides a single-pipeline solution: The first pipeline stage, iterative rematching, comprises the homographic decomposition and cluster analysis of the image space. This bypasses the ''fundamental matrix degeneration problem'' and makes it possible to handle visually disturbing effects in the following pipeline steps. Our Delaunay outlier detection (second stage) removes false positive matches, which are caused especially by ''repeated image patterns''. Additionally, matching accuracy is increased by refining the matching set on the basis of our multi-homographic decomposition. Focused matching (third stage) simulates ''visual focusing'', resulting in the identification of additional detail feature points. Homogenization (last pipeline stage) supports the detection of ''inhomogeneous'' image regions that are typically caused by parallax effects. Even their peripheral areas are difficult to match in practice, but they are still important for many of the aforementioned use cases. Our feature extrapolation makes it possible to detect further matches in these ''critical areas'', resulting in a refined multi-homography decomposition.
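The idea of decomposing a matching set into homography clusters can be illustrated with a plain sequential RANSAC. The sketch below is a strongly simplified stand-in and not our pipeline: it performs no descriptor rematching, no Delaunay consistency check and no refinement. It repeatedly fits a homography to the matches that are still unassigned, labels the largest inlier set as one cluster and continues with the remainder. The function names, the parameters min_inliers and iterations, and the synthetic data are assumptions made for this example; only the threshold of 2.1 pixels mirrors the RANSAC reprojection threshold used in our evaluation.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform (DLT) from at least four point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null-space vector of the DLT system

def apply_h(H, pts):
    """Project (N, 2) points with a 3x3 homography."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return p[:, :2] / p[:, 2:3]

def sequential_homographies(src, dst, threshold=2.1, min_inliers=8, iterations=500, seed=0):
    """Label each match with the index of its homography cluster (-1 = unassigned)."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(src), -1)
    for cluster in range(len(src)):
        free = np.flatnonzero(labels == -1)
        if len(free) < min_inliers:
            break
        best = np.array([], dtype=int)
        for _ in range(iterations):
            sample = rng.choice(free, size=4, replace=False)
            H = fit_homography(src[sample], dst[sample])
            err = np.linalg.norm(apply_h(H, src[free]) - dst[free], axis=1)
            inliers = free[err < threshold]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_inliers:
            break
        labels[best] = cluster
    return labels

# Synthetic matches lying on two planes with different (hypothetical) homographies.
data_rng = np.random.default_rng(1)
src = data_rng.uniform(0, 500, size=(40, 2))
H_a = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
H_b = np.array([[1.2, 0.0, 60.0], [0.0, 1.2, -40.0], [0.0, 0.0, 1.0]])
dst = np.vstack([apply_h(H_a, src[:20]), apply_h(H_b, src[20:])])

labels = sequential_homographies(src, dst)
print(labels)
```

Because each cluster is described by its own homography, no single fundamental matrix has to be estimated, which is the reason why a decomposition of this kind is not affected by the degenerate configurations mentioned above.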
Current limitations of our pipeline concern the types of input datasets supported: Our method is designed for image sequences of RGB-colored photo or video shots with sufficient pairwise overlap. The images should represent static scenes, i.e. scenes without significant object motion, lighting changes or (specular) effects. Image data with large camera distortions, such as in ultra wide-angle or fisheye photography, has not yet been tested. The present focus is on high-quality matching and on use cases in the context of offline (pre-)processing. Therefore, our algorithms are currently not optimized for performance-critical applications, much less for real-time application scenarios.
Our future work includes research on the automatic tuning of matching parameters, which would make homographic decomposition easier to use and applicable to a wider range of applications. Additionally, we plan to improve our algorithms, in particular with respect to run-time. This includes low-level and algorithmic optimizations, further multi-threaded processing and GPU acceleration.

ACKNOWLEDGMENT
This project was promoted by the Bavarian Academic Forum (BayWISS), as a part of the joint academic partnership digitalization program.