Robust Motion Blur Kernel Estimation by Kernel Continuity Prior

Accurate kernel estimation is key to blind motion deblurring. Many previous methods rely on image regularization to recover strong edges in the observed image for kernel estimation. However, the estimated kernel degrades when the recovered strong edges are inaccurate, especially in images full of small-scale edges. In contrast to previous methods, we focus on kernel regularization. Inspired by the fact that the blur kernel is closely related to the continuous camera motion trajectory during image capture, we propose a kernel prior that encourages kernel continuity. The proposed prior measures the continuity of each element in the kernel and generates a continuity map. By encouraging the sparsity of the map using the $L_{0}$ norm, discontinuous kernel elements are suppressed. Since the model with the proposed prior is non-convex and non-linear, an approximation method is proposed to minimize the cost function efficiently. Extensive experimental results show that our method outperforms state-of-the-art methods on both normal and challenging cases. Moreover, the proposed prior can further improve the performance of existing MAP-based methods.


I. INTRODUCTION
Motion blur is an image degradation caused by relative motion between the camera and the scene during the exposure. Blind deblurring aims to recover the latent image and the blur kernel from the observed blurred image, which is an ill-posed problem. Consequently, extra information is required to alleviate the ill-posedness. The maximum a posteriori (MAP) framework [1]-[5] is commonly used in conventional deblurring methods. This framework introduces extra information by way of priors:
$$\min_{x,k}\; \|y - k * x\|_2^2 + \lambda\,\varphi(x) + \gamma\,\rho(k), \quad (1)$$
where y, x and k denote the blurred image, the latent sharp image and the blur kernel, respectively, and λ and γ are weights that balance the terms. $\|y - k * x\|_2^2$ is the likelihood term that enforces the similarity between the blurred image y and the degraded latent image k * x. φ(x) is the image prior and ρ(k) is the blur kernel prior.
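To make the roles of the terms in the MAP formulation concrete, the following is a minimal sketch rather than the implementation used in this paper. It assumes circular convolution computed with the FFT, and it uses an L1 gradient penalty and a squared L2 norm as hypothetical stand-ins for φ(x) and ρ(k). The function names `blur` and `map_objective` and the weights `lam` and `gam` are illustrative only.

```python
import numpy as np

def blur(x, k):
    """Circular convolution k * x, computed in the Fourier domain."""
    kh, kw = k.shape
    padded = np.zeros(x.shape, dtype=float)
    padded[:kh, :kw] = k
    # Move the kernel center to index (0, 0) so the blur does not shift x.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(padded)))

def map_objective(y, x, k, lam=0.01, gam=1.0):
    """Likelihood + lam * phi(x) + gam * rho(k), with stand-in priors."""
    likelihood = np.sum((y - blur(x, k)) ** 2)
    # Hypothetical image prior: L1 norm of vertical and horizontal gradients.
    phi = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
    # Hypothetical kernel prior: squared L2 norm of the kernel.
    rho = np.sum(k ** 2)
    return likelihood + lam * phi + gam * rho

rng = np.random.default_rng(0)
x = rng.random((16, 16))                 # stand-in for a sharp image
k_true = np.zeros((3, 3))
k_true[1, :] = 1.0 / 3.0                 # short horizontal motion kernel
k_delta = np.zeros((3, 3))
k_delta[1, 1] = 1.0                      # "no blur" kernel
y = blur(x, k_true)

cost_true = map_objective(y, x, k_true)
cost_delta = map_objective(y, x, k_delta)
print(cost_true < cost_delta)
```

With the sharp image fixed, the kernel that actually produced y gives the lower cost, because its likelihood term vanishes. In the blind setting both x and k are unknown, which is why the priors carry so much weight.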
Previous methods rely on image statistics for the image prior φ(x), such as the mixture of Gaussians [6], the normalized sparsity [7], the $L_0$ sparse representation [2] and the hyper-Laplacian [8]. Their success relies on the recovery of strong edges during optimization [9], [10], since strong edges provide most of the accurate blur information. In addition to image priors, heuristic image filters are also used effectively to recover strong edges [11]-[13]. Meanwhile, Xu et al. [1] point out that strong edges are not always good for kernel estimation, because some strong but small-scale edges may introduce ambiguities. Moreover, the image prior φ(x) can only affect the kernel accuracy indirectly through the likelihood term.
The associate editor coordinating the review of this manuscript and approving it for publication was Hongjun Su.
On the other hand, an accurate kernel prior acts on the kernel more directly and is therefore important [5], [21], [22]. Since the motion blur kernel is generated by the continuous camera shake trajectory during the exposure, the kernel closely resembles a connected and continuous trajectory. Continuity is therefore a very important property of the kernel. To encourage continuity, the most popular kernel priors operate in the kernel gradient domain ∇k, such as $\|\nabla k\|_2^2$ [15], $\|\nabla k\|_{0.5}$ [17] and the new spatial term [16]. A model with a gradient term is easy to optimize, but it emphasizes local smoothness rather than the connectivity of the support area, which is the key property of continuity. Other methods use a post-processing step to enforce kernel continuity, i.e., the adaptive threshold [1] and noise pruning [3], [4]. However, these two-step methods lack a unified cost function, which makes convergence and global optimization difficult. Therefore, a unified model with proper continuity preservation is required for the kernel estimation problem.
Inspired by this, we propose a new continuity measurement for the kernel prior in a unified model. First, we define the continuity of each element within a window to form a map. Second, by enforcing the map to be sparse using the $L_0$ norm, less connected elements, e.g., isolated noise, are penalized. Thus the problem of encouraging kernel continuity becomes the problem of encouraging map sparsity. Finally, a unified model with the kernel continuity prior is proposed. As the kernel prior is non-linear and non-convex, an approximate solver is proposed to optimize the unified model efficiently.
The contributions of the paper are summarized as follows:
• We are the first to propose a kernel prior that measures the continuity of each element in a window and preserves the kernel continuity by encouraging the sparsity in a unified model.
• An approximate solver is proposed to minimize the non-convex and non-linear cost function and achieves fast convergence.
• Solid experimental results on large datasets and challenging cases show that our method outperforms previous methods in both accuracy and efficiency. Moreover, the proposed prior can further improve the performance of the state-of-the-art method.
The rest of the paper is organized as follows: Section II reviews the related work. Section III describes the proposed kernel continuity prior. In Section IV, the unified model and the optimization are presented. Section V analyzes the effectiveness of the proposed kernel prior. Section VI shows the experimental results. In Section VII, we draw the conclusion of our method.

II. RELATED WORK
Recent years have witnessed the fast development of kernel estimation in blind image deblurring. We review the related work in three aspects: image priors, kernel priors and post-processing methods.

A. IMAGE PRIORS
Most previous methods focus on image priors [1]-[5], [7], [8], [11], [16], [23]-[29]. The success of these methods relies on the recovery of strong edges for the kernel estimation. However, strong edges do not always benefit the kernel estimation; for example, strong but small-scale edges can even introduce ambiguities [1]. Here, small-scale edges are closely adjacent strong edges whose gap is smaller than the kernel size. Under the blur degradation, such high-frequency details are easily confused. Xu et al. [1] point out that the gradient magnitude of a small-scale edge in the blurred image is much lower than that of the original sharp one, making it hard to recover the original sharp edges. They propose a criterion that selects larger-scale edges in the image for kernel estimation to avoid the negative effect of small-scale edges. The criterion is effective and is also used by [16]. However, these edge selection methods assume that large-scale edges are available in the blurred image, which may be violated in some cases. Moreover, the image prior can only affect the kernel accuracy indirectly through the likelihood term.

B. KERNEL PRIORS
Other methods study the characteristics of the blur kernel and constrain the estimated kernel using priors, as shown in Table 1. Among them, the $L_2$ norm [2]-[4], [20], the $L_0$ norm [14], [15], the $L_1$ norm [7], [18], the $L_{0.5}$ norm [19] and the $L_{0.9}$ norm [17] are widely used as the blur kernel prior ρ(k). However, most of them do not take kernel continuity into consideration. To encourage continuity, some researchers propose priors with kernel gradients. Reference [17] proposes a prior that combines the kernel intensity and gradient: $\|k\|_{0.9} + \|\nabla k\|_{0.5}$. Similarly, [15] proposes another combination: $\|k\|_0 + \|\nabla k\|_2^2$. Reference [16] borrows the idea of the $L_0$ image smoothing method [30] to constrain continuity in the kernel gradient domain. Restricting the kernel gradient has the weakness that it emphasizes local smoothness rather than kernel continuity.

C. KERNEL POST-PROCESSING METHODS
In addition to kernel priors, post-processing methods [3], [4], [31], [32] are also widely used to refine the estimated kernel, as shown in Table 1. A threshold is the simplest way to denoise the kernel after the kernel estimation. However, it is not effective at removing noise that has higher intensity than the main trajectory. Some methods also remove noise using space consistency [31] or sparseness evaluation [32], but these methods only apply to certain cases. The methods in [3], [4] use noise pruning to remove small non-zero regions after the kernel estimation. However, when those regions are connected to the main trajectory of the kernel, they cannot be identified and removed by noise pruning. In short, a post-processing method can only refine the kernel, while the accuracy of the kernel mainly depends on the kernel estimation itself. Moreover, these two-step methods lack a unified cost function, which makes convergence and global optimization difficult.
In this paper, we propose a new kernel prior to preserve the kernel continuity. Different from the previous priors that preserve the kernel continuity in the gradient domain, we define the continuity of each element within a window and form a continuity map. By encouraging the map to be sparse, in a unified model, the unconnected kernel elements are suppressed.

III. THE KERNEL CONTINUITY PRIOR
In this section, we give details of the kernel continuity prior. A blur kernel $k \in \mathbb{R}^{f \times f}$ describes the motion trajectory of a camera during the exposure time, in which non-zero elements denote positions that the camera passes through. Usually, a kernel element with higher intensity denotes a longer integration at the corresponding position. Over a continuous exposure period, the camera trajectory is continuous. Consequently, the non-zero elements of the kernel should be connected and form the main structure related to the camera trajectory, as shown in Figure 1 (a). We therefore obtain two requirements for a possible kernel trajectory element:
• The element intensity should be relatively high, indicating the photon integration during the exposure.
• The element should have connected neighbors with similar or higher intensities.
Considering the above requirements, we measure the possibility of each kernel element and form a kernel continuity map.
First, we judge whether the intensity of each kernel element is high enough using a global threshold $T_g$. We set $T_g = a \cdot k_{max}$, where 0 < a < 1 and $k_{max}$ is the highest intensity of the kernel k. If the intensity of the current element is higher than $T_g$, the element meets the first requirement. Here, a low $T_g$ is necessary so that all possible trajectory elements are included in the evaluation. We therefore set a = 0.05, which previous methods [3]-[5] use as an empirical threshold for removing small values in an estimated kernel.

Algorithm 1 The Kernel Continuity Map Generation
Input: the blur kernel $k \in \mathbb{R}^{f \times f}$, the global threshold $T_g$, the local threshold $T_l$, the window size r
Output: continuity map M(k)
Initialize: the map M
for h = 1:f do
    for v = 1:f do
        if k(h, v) > $T_g$ then
            select the r × r patch centered at (h, v)
            count the number n of elements that are larger than $T_l$ in the patch
            M(h, v) = n
        end if
    end for
end for
Second, we use a sliding window of size r × r to extract the connected neighbors of each kernel element. A threshold $T_l = b \cdot k_c$ is used to evaluate all the neighbors in the window, where 0 < b < 1 and $k_c$ is the intensity of the center element in the window. Only neighbors with intensities higher than $T_l$ are counted. The value of $T_l$ directly affects the continuity measurement of the kernel element, and a relatively high $T_l$ is necessary to leave out noise elements, so we set b = 0.1. The window size is set to r = 3 to ensure that all the elements in the window are adjacent to the center element. We denote the number n of satisfying elements within the window of each kernel element as the continuity metric. A larger n means more connected neighbors and a higher possibility that the element belongs to the trajectory. In this way, we judge whether each kernel element meets the second requirement.
After all the elements of the kernel are measured, a map $M(k) \in \mathbb{R}^{f \times f}$ is formed. The steps of the map generation are shown in Figure 1 (c)-(d) and the detailed implementation is given in Algorithm 1. Figure 1 (b) shows a kernel with noise. Noise is inevitable during the optimization and it degrades the continuity of the kernel. In the proposed continuity map in (d), kernel noise is assigned lower values even when the noise has higher intensity. This is the main difference from threshold-based methods. Figure 2 compares the intensity maps and the continuity maps of kernels with and without noise. All noise elements, whether of higher or lower intensity, are assigned lower values in the continuity map. Thus, using the continuity map, we can easily distinguish noise.
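The map generation described above can be sketched in a few lines. This is an illustrative reading of Algorithm 1, not the authors' code. It assumes that the window is zero-padded at the kernel border and that the center element is included in the count, neither of which is stated explicitly. The function name `continuity_map` is hypothetical.

```python
import numpy as np

def continuity_map(k, a=0.05, b=0.1, r=3):
    """Count, for each sufficiently bright element, the bright neighbors in its r x r window."""
    k = np.asarray(k, dtype=float)
    half = r // 2
    t_g = a * k.max()                              # global threshold T_g = a * k_max
    padded = np.pad(k, half, mode="constant")      # assumption: zeros outside the kernel
    m = np.zeros_like(k)
    for h in range(k.shape[0]):
        for v in range(k.shape[1]):
            if k[h, v] > t_g:
                patch = padded[h:h + r, v:v + r]   # r x r window centered at (h, v)
                t_l = b * k[h, v]                  # local threshold T_l = b * k_c
                m[h, v] = np.count_nonzero(patch > t_l)
    return m

kernel = np.zeros((5, 5))
kernel[1, 1] = kernel[2, 2] = kernel[3, 3] = 1.0   # short diagonal trajectory
kernel[0, 4] = 2.0                                 # isolated noise, brighter than the trajectory
cmap = continuity_map(kernel)
l0_norm = np.count_nonzero(cmap)                   # value of the L0 norm of the map
print(cmap)
print(l0_norm)
```

In this toy kernel the isolated element has the highest intensity but the lowest non-zero continuity value, which is the behavior that separates the map from a plain intensity threshold.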
To encourage kernel continuity during the optimization, we encourage the continuity map to be sparse using the $L_0$ norm. The proposed kernel prior is $\|M(k)\|_0$. However, the non-convexity and non-linearity of the prior make it hard to optimize a cost function that contains it. The next section gives a detailed solver.
FIGURE 2. Comparison between kernel continuity map and intensity map. First row: real kernels and corresponding kernels with manually added noise. Second row: corresponding intensity maps in pseudo color. Third row: corresponding continuity maps in pseudo color. This figure shows that the main trajectories are more separable from noise in the continuity maps than in the intensity maps.

IV. FRAMEWORK AND OPTIMIZATION
In this section, we present the details of the kernel estimation framework and give the approximate solver for the non-convex and non-linear cost function.
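As mentioned in Section III, we use the $L_0$ norm to encourage the sparsity of the continuity map M(k). The cost function with the proposed prior is:
$$\min_{x,k}\; \|y - k * x\|_2^2 + \lambda\,\varphi(x) + \gamma\,\rho(k) + \alpha\,\|M(k)\|_0, \quad (2)$$
where λ, γ and α are the weights to balance the terms.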
To minimize (2), we alternately update the blur kernel k while fixing x:
$$\min_{k}\; \|y - k * x\|_2^2 + \gamma\,\rho(k) + \alpha\,\|M(k)\|_0, \quad (3)$$
and the image x while fixing k:
$$\min_{x}\; \|y - k * x\|_2^2 + \lambda\,\varphi(x). \quad (4)$$
During the alternating optimization, the updated x and the updated k in each iteration are called the intermediate image and the intermediate kernel, respectively. The flowchart of our framework is presented in Figure 3. After the output kernel is estimated, the final deblurred image is obtained by deconvolving the input image with the output kernel.

A. ESTIMATING THE INTERMEDIATE BLUR KERNEL
Estimating k using (3) is not easy due to the combination of the non-linear function M(·) and the non-convex $L_0$ norm. We use half-quadratic splitting [1] to handle the $L_0$-regularized term by introducing an auxiliary variable u with respect to M(k). We rewrite the cost function as (5), where β is the penalty parameter. According to [1], as β approaches infinity, the solution of (5) approaches the solution of (3). Similar to [1]-[3], we update β by β = β × 2 in each iteration, as shown in Algorithm 2.

Algorithm 2 Kernel Estimation
Input: blurred image y, latent image x
Initialize: initial kernel $k_{prev}$
Output: estimated kernel k
$M^T(u) = k_{prev}$
for j = 1:$n_k$ do
    solve for k by Equation (9)
    solve for u by Equation (8)
    β = β × 2
end for
We minimize (5) by alternately solving two sub-problems: estimating k while fixing u, which is (6), and estimating u while fixing k, which is (7). The details of the intermediate kernel estimation sub-model are shown in Algorithm 2. Solving (7) is relatively easy: when k is fixed we can compute M(k) directly, and estimating u then becomes an element-wise minimization problem [1] whose solution is given by (8). The detailed implementation is shown in Figure 4. First, we transfer a noisy kernel to a continuity map u using M(·). Second, we use (8) to update u, so that noise is identified in the map u. Third, we transfer u back to a blur kernel $k = M^T(u)$ by setting the noise elements to zero in k. Equation (6) is a standard quadratic function that can be solved using the FFT method [3], which gives (9), where $\overline{F(x)}$ is the complex conjugate of $F(x)$.
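The three steps above can be sketched as follows. This is a hedged illustration, not the authors' solver. It assumes that the element-wise update takes the hard-thresholding form that is standard for $L_0$ half-quadratic splitting (keep an entry of the map when its square is at least α/β, otherwise set it to zero). The names `continuity_map` and `suppress_discontinuous` are hypothetical, and the map follows the same border and counting assumptions as in Section III.

```python
import numpy as np

def continuity_map(k, a=0.05, b=0.1, r=3):
    """Continuity map: number of bright elements in the r x r window of each bright element."""
    k = np.asarray(k, dtype=float)
    half = r // 2
    t_g = a * k.max()
    padded = np.pad(k, half, mode="constant")      # assumption: zeros outside the kernel
    m = np.zeros_like(k)
    for h in range(k.shape[0]):
        for v in range(k.shape[1]):
            if k[h, v] > t_g:
                patch = padded[h:h + r, v:v + r]
                m[h, v] = np.count_nonzero(patch > b * k[h, v])
    return m

def suppress_discontinuous(k, alpha, beta):
    """Map the kernel to u, hard-threshold u, then zero the rejected elements of k."""
    u = continuity_map(k)                              # Step 1: kernel -> continuity map
    u = np.where(u ** 2 >= alpha / beta, u, 0.0)       # Step 2: assumed hard-threshold rule
    return np.where(u > 0, k, 0.0)                     # Step 3: map -> kernel

kernel = np.zeros((5, 5))
kernel[1, 1] = kernel[2, 2] = kernel[3, 3] = 1.0   # trajectory
kernel[0, 4] = 2.0                                 # isolated noise with high intensity
cleaned = suppress_discontinuous(kernel, alpha=16.0, beta=4.0)
print(cleaned)
```

Under this assumed rule the threshold α/β shrinks as β doubles, so fewer elements are rejected in later iterations. The values α = 16 and β = 4 in the example are chosen only to make the effect visible.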

B. ESTIMATING THE INTERMEDIATE IMAGE
Given k, we estimate x by minimizing (4). The choice of the image prior φ(x) heavily affects the performance of the intermediate image estimation. Previous methods have used image gradient priors that represent the image gradient distribution with the $L_0$ norm [2], the $L_{0.8}$ norm [5], the $L_1$ norm and the $L_1/L_2$ norm [7], as well as an image intensity prior with the $L_0$ norm [3]. With the rapid progress of deep neural networks, some methods [24], [34] replace the above handcrafted priors with data-driven priors learned from training datasets using neural networks.
FIGURE 4. Steps to encourage the sparsity of the continuity map u. Step 1: transfer the kernel to the continuity map. Step 2: update u in Equation 8. Step 3: transfer the continuity map back to the kernel.

C. ESTIMATING THE FINAL LATENT IMAGE
After estimating the blur kernel using Algorithm 2, we recover the final latent image using a non-blind deconvolution method. Unlike the intermediate image in Section IV-B, which is used for kernel estimation, the latent image is the final deblurred image with fine textures. As this paper focuses on kernel estimation, we use the existing non-blind deconvolution method in [4] to recover our deblurred image.

Algorithm 3
...
7: downsample y to the current image pyramid to get $y_i$
8: for g = 1:m do
9:     solve for x by Equation (10)
10:     solve for k by Equation (5)
11: end for
12: upsample k to fit the next scale
13: end for

D. MULTI-SCALE STRATEGY
To ensure fast convergence, many previous methods use a multi-scale strategy [3], [4], [11], [16]: the blur is reduced when the image is downsampled to coarse scales, and the reduced blur makes it easier to estimate an accurate kernel. We also adopt the multi-scale strategy [11] and estimate the blur kernel over a coarse-to-fine pyramid of image resolutions. To get the blurred image $y_i$ at each scale of the pyramid, we downsample y using a factor of $\sqrt{2}/2$, which is similar to [16]. The implementation of the multi-scale strategy in our framework is shown in Algorithm 3.
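As a rough illustration of how such a pyramid can be laid out, the sketch below derives per-scale kernel sizes and image scale factors from the $\sqrt{2}/2$ downsampling factor. The rounding to odd sizes and the stopping rule are assumptions made for this example and are not taken from the paper. The function names are hypothetical.

```python
import math

FACTOR = math.sqrt(2) / 2   # downsampling factor between adjacent scales

def pyramid_kernel_sizes(kernel_size, factor=FACTOR):
    """Kernel size per scale, coarse to fine (assumed: odd sizes, stop when the size stops shrinking)."""
    sizes = [kernel_size]
    while True:
        nxt = math.ceil(sizes[-1] * factor)
        if nxt % 2 == 0:
            nxt += 1            # keep an odd size so the kernel has a center element
        if nxt >= sizes[-1]:
            break               # the kernel can no longer shrink
        sizes.append(nxt)
    return sizes[::-1]

def image_scales(num_levels, factor=FACTOR):
    """Scale of the blurred image at each level, coarse to fine (finest is 1.0)."""
    return [factor ** (num_levels - 1 - i) for i in range(num_levels)]

sizes_31 = pyramid_kernel_sizes(31)
sizes_15 = pyramid_kernel_sizes(15)
scales_31 = image_scales(len(sizes_31))
print(sizes_31, sizes_15)
```

Under these assumptions a smaller kernel size yields fewer pyramid levels, and therefore fewer iterations in total.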

V. ANALYSIS OF THE PROPOSED KERNEL PRIOR
In this section, we analyze the effectiveness of the proposed kernel prior. First, we show the potential of an intermediate kernel to recover accurate strong edges in the intermediate image, which is key to conventional blind deblurring methods [2], [16], [20]. Second, we present the effect of the proposed kernel prior in improving the accuracy of the intermediate kernel. Third, we compare the proposed prior with other methods that preserve kernel continuity. Finally, we evaluate the convergence property of our method.

A. THE POTENTIAL OF AN INTERMEDIATE KERNEL IN RECOVERING STRONG EDGES
We revisit some previous edge selection methods [1], [16] and show that an accurate intermediate kernel is crucial for the strong edges recovery.
Similar to [1], we use a 1D signal with strong edges for illustration. Figure 5 (a), presented in [1], shows the ambiguity of the recovered strong edges in the intermediate image.
It has a small-scale edge on the left and a large-scale edge on the right. The blue curve denotes the blurred signal and the green curve denotes the ground truth signal. The magnitude of the blurred small-scale edge is much lower than the ground truth. As a result, without other information, the recovery tends toward the red line instead of the ground truth green one. Edge selection methods [1], [16] exclude the small-scale edge from the kernel estimation. However, if the blurred image does not have large-scale edges, these methods will fail.
In this case, an accurate intermediate kernel can provide the information needed to recover the ground truth edge. If the intermediate kernel is as accurate as the ground truth kernel $k_{GT}$, we can recover a sharp signal $\tilde{x}$ by (10). Figure 5 (b) shows the potential of the accurate kernel. The recovered signal (red dashed curve) has a much higher magnitude than the blurred one (blue curve) and is much closer to the ground truth, which means the ambiguity of the edge magnitude is well suppressed.
Even though the intermediate kernel may not be as accurate as the ground truth in practice, improving the kernel accuracy in each iteration allows small-scale edges to be recovered accurately in the intermediate image. Figure 6 (e) gives an example in which our method recovers the small-scale edge better than the other methods (b)-(d), as shown in the red box.

B. ABLATION STUDY
We verify the effectiveness of the proposed kernel prior by experiments in this section.
To compare the deblurring performance with and without the proposed prior, we set α = 16 and α = 0 in our framework, respectively. Figure 6 gives an example: an image full of small-scale edges, in which most edges are smaller than the blur size. Figure 6 shows the final deblurred images, intermediate images and intermediate kernels at different iterations. To ensure that each algorithm achieves its best result, we use different iteration numbers at the finest scale. (d) and (e) show the deblurred images and the intermediate images without and with the proposed prior, respectively. In (d), obvious ringing artifacts spread over the whole image, while (e) recovers faithful image edges. This is because, without the proposed prior, kernel noise is not suppressed in each iteration (as shown in (h)) and the final estimated kernel contains severe noise beyond the main trajectory of the blur kernel (i = 1, g = 5). Moreover, the main trajectory is less accurate than that of the kernel estimated with the proposed prior, as shown in the last image of (i). With the proposed prior, kernel noise is progressively suppressed over the iterations in (i). Consequently, the quality of the deblurred image improves considerably in (e). The comparison indicates that the proposed prior helps to estimate an accurate blur kernel in the challenging case where the image is full of small-scale edges.
We also conduct a quantitative evaluation on the commonly used dataset [33]. We choose this dataset because all of its blur kernels come from real camera shake, which makes the evaluation more convincing. Table 2 shows the average PSNR and SSIM of the deblurred images using various methods. Our method with the proposed prior outperforms the method without it in both PSNR and SSIM, which indicates that more accurate blur kernels are estimated with the proposed prior.
As shown in Table 1, previous methods that preserve the continuity of the blur kernel fall into two categories: 1) proposing a kernel prior, and 2) removing noise in a post-processing step. From each category, we choose a typical method for comparison. The first is the similar $L_0$ gradient kernel prior [16], which preserves the smoothness of the kernel to encourage continuity. We denote it as the smooth prior in our paper. The second is the noise pruning method [3], [4], which removes noise elements after the kernel estimation. Here our framework without the proposed prior acts as the baseline model. For the $L_0$ gradient kernel prior, we replace the kernel regularization term in our framework with the $L_0$ gradient kernel prior. For the noise pruning method, we use it as the post-processing step after kernel estimation in the baseline model. Figure 6 (b) and (c) show the deblurred images and the intermediate images of the two methods, and (f) and (g) show their kernel estimates over the iterations. The noise pruning method removes the isolated noise in the kernel background (shown in the last image of (f)), but it does not remove the noise connected to the main trajectory in each iteration. As a result, the deblurred image in (b) is degraded. The smooth prior does not remove noise in the kernel (shown in the last image of (g)) and the main trajectory is thinner than the others due to its smoothing effect, so the deblurred edges are still blurred in (c).
For the quantitative evaluation in Table 2, the noise pruning method [4] outperforms the baseline by improving the average PSNR from 26.41 to 26.49, but its value is still lower than that of the proposed prior (26.96). The result of the smooth prior is even lower than that of the baseline model. We also combine the proposed prior and the noise pruning method in our framework, but the result is worse than the one without noise pruning. This indicates that the noise pruning method has a negative effect when used together with the proposed prior.
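For reference, PSNR values such as those compared above are conventionally computed from the mean squared error between the deblurred image and the ground truth. The sketch below shows that conventional definition. It assumes intensities in [0, 1], so the peak value is 1.0, and the exact evaluation protocol behind Table 2 (for example any alignment or cropping) is not reproduced here.

```python
import numpy as np

def psnr(estimate, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((estimate - reference) ** 2)
    if mse == 0:
        return float("inf")     # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

reference = np.zeros((8, 8))
estimate = reference + 0.1      # uniform error of 0.1 gives an MSE of about 0.01
value = psnr(estimate, reference)
print(round(value, 2))
```

Because the scale is logarithmic, a gain of a few tenths of a dB corresponds to a reduction of several percent in mean squared error.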
To further evaluate the quality of the estimated kernel, we use the error ratio [33]:
$$\frac{\|\tilde{x} - x_{GT}\|_2^2}{\|\tilde{x}_t - x_{GT}\|_2^2},$$
where $\tilde{x}$ is the image deblurred with the estimated kernel, $x_{GT}$ is the ground truth sharp image and $\tilde{x}_t$ is the image estimated using the ground truth kernel. The error ratio describes the similarity between the deblurred image using the estimated kernel and the deblurred image using the ground truth kernel. The smaller the value, the higher the similarity. Figure 7 plots the success rates at different error ratios in the same way as [33]. We use the simple yet effective non-blind deconvolution method of [33] to recover the final deblurred image. Each value on the y-axis shows the percentage of test images whose error ratio is below a certain threshold, and the x-axis lists all the thresholds. It indicates that our method provides more reliable results than other methods.
FIGURE 8. ... Figure 13 (f). A smaller kernel size needs fewer scales in the multi-scale strategy [11], leading to fewer iterations. Using the appropriate kernel size, our method converges to a good estimation. The larger kernel size leads to a biased solution, while the smaller kernel size does not converge. The convergence of our method using another two kernel initialization methods is also evaluated.
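The error ratio and the success-rate curve can be computed as in the sketch below. It follows the definition of [33] as summarized in the text (the error with the estimated kernel divided by the error with the ground truth kernel). The function names and the toy ratios are illustrative only, and a threshold is counted as met when the ratio is less than or equal to it, which is an assumption about the boundary case.

```python
import numpy as np

def error_ratio(x_est, x_gt_kernel, x_gt):
    """SSD error with the estimated kernel divided by SSD error with the ground truth kernel."""
    num = np.sum((np.asarray(x_est, float) - x_gt) ** 2)
    den = np.sum((np.asarray(x_gt_kernel, float) - x_gt) ** 2)
    return num / den

def success_rate(ratios, thresholds):
    """Percentage of test images whose error ratio does not exceed each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return [100.0 * float(np.mean(ratios <= t)) for t in thresholds]

x_gt = np.zeros((4, 4))                  # toy ground truth image
x_gt_kernel = x_gt + 0.1                 # toy result with the ground truth kernel
x_est = x_gt + 0.2                       # toy result with the estimated kernel
ratio = error_ratio(x_est, x_gt_kernel, x_gt)

toy_ratios = [1.0, 1.5, 2.5, 4.0]        # made-up ratios for four images
rates = success_rate(toy_ratios, thresholds=[2.0, 3.0])
print(ratio, rates)
```

A curve that rises faster toward 100% at small thresholds means that more images are deblurred almost as well as they would be with the true kernel.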

D. CONVERGENCE PROPERTY
As our energy function is non-linear and non-convex, we quantitatively evaluate the convergence properties of our method on the Levin dataset [33]. We evaluate the similarity [35] between the intermediate kernel of each iteration and the ground truth kernel in Figure 8. As our method is implemented in a multi-scale manner (details in Section IV-D), we unroll the iterations of all scales and list them in sequence. Our method converges within the iterations of each scale, which results in the periodic increase in the curves. The convergence for different kernel sizes (appropriate, larger and smaller) is evaluated. A smaller kernel size needs fewer scales in the multi-scale strategy [11], leading to fewer iterations. The visualized examples show that a larger kernel size may converge to a biased solution, because it increases the degrees of freedom of the kernel estimation [20]. In contrast, a smaller size makes it difficult for the method to converge, as shown by the blue line in Figure 8. Consequently, choosing an appropriate kernel size is essential, and there have been several attempts to find it automatically [20], [36], [37]. The effect of the first estimation on the convergence is also evaluated. Our framework initializes the kernel following [7]. We also evaluate two other initialization methods: random and uniform. Different initialization methods lead to different first estimates of both images and kernels according to (4) and (3). Our model converges with any of the three initializations, as shown in Figure 8. Table 3 shows the average PSNR and SSIM on the Levin dataset [33] for the three initializations. It indicates that using [7] leads to the best deblurring performance.
Besides the kernel size and the first estimation, the image type also affects the convergence. For example, as recovering strong edges in each iteration is key to the success of MAP-based methods, images that lack strong edges make it difficult for the method to converge. Some failure cases are shown in Section VI-D.

VI. EXPERIMENTAL RESULTS
We compare the proposed method with state-of-the-art methods [2], [4], [7], [16], [38]. Recent years have witnessed rapid progress of deep learning methods in low-level image processing, especially in image deblurring [38], [40], [41]. However, most of them map the blurred image directly to the deblurred one without estimating a kernel, which is one of the main differences between conventional optimization-based methods and deep learning methods. We choose the recent deep learning method SRN [38] for comparison.
We test the proposed prior on some challenging cases for visual comparison. The parameter settings of our framework are the same as Section V-B. Moreover, the large deblurring dataset [27] with 640 images is used for evaluating how the proposed prior improves the previous method [4]. We also evaluate the efficiency of our method by comparing average runtime and PSNR on Köhler Dataset [42].

A. SOME CHALLENGING CASES
We conduct experiments on three challenging cases: images full of small-scale edges, noisy-blurred images and large blur images.

1) SMALL-SCALE EDGE IMAGE
We evaluate our method on two small-scale edge images. Figure 9 shows a synthetic blurred image with small-scale grass. Our estimated kernel in (f) is the most similar to the ground truth kernel in (a), while the other kernels are not as thin as the ground truth one. As a result, their deblurred images are degraded. The deblurred image of Xu et al. [2] in (b) contains obvious ringing artifacts around the boundary. The zoomed-in flower region of [16] in (c) is still blurred compared with ours in (f). With the dark channel method [4] in (d), the deblurred image is over-smoothed, as shown in the zoomed-in grass region, and black dots spread over the flower region. The result of SRN [38] does not remove the blur. The deep learning method does not show its superiority in recovering small-scale edge images: without the guidance of the blur kernel, SRN [38] cannot accurately find the original position of image edges, which leads to the blurred result. Our method not only removes the blur accurately but also recovers a natural sharp image with fine textures.
Figure 10 shows a real blurred image of a building. Different from the grass in Figure 9, the edges in Figure 10 are stronger. For better comparison, we show the iteration numbers of Dark Channel [4] and Salient Structure [16], taken from their published code. We do not show iteration numbers of other conventional methods because their code is not published. As most MAP-based methods share the same optimization framework, we use the same notation i and g to denote their iteration numbers. Coincidentally, all the competitors use the same iteration numbers. Kernel noise is obvious in (b), (c) and (d). Moreover, (b) and (d) fail to estimate the main trajectory of the kernel, which leads to ambiguous edges of the building. Without estimating a kernel, SRN [38] does not remove the blur accurately. Our method estimates both a clear kernel trajectory and a natural sharp image.
FIGURE 9. ... The kernel of (c) contains isolated noise around the main trajectory, so the result contains ringing artifacts too. The kernel in (d) is not as thin as the ground truth one. The deep learning method SRN [38] does not remove blur thoroughly.
FIGURE 10. ... For convenience, we use the same notation i and g as our framework, because the three methods share the same optimization framework. In (b) and (d), kernels are less accurate and deblurred images contain obvious ringing artifacts. The kernel of (c) contains isolated noise around the main trajectory, so the result contains ringing artifacts too. Ours achieves the best performance in both the kernel estimation and the deblurred image.

2) NOISY-BLURRED IMAGE
A noisy-blurred image is another challenge for most deblurring methods, because noise in the image can easily be mistaken for strong edges, which leads to errors in the kernel estimation. Figure 11 shows the deblurred results of a noisy-blurred image, in which Zhong et al. [39] is a kernel estimation method with a noise handling strategy. We also show the iteration numbers of each method, as in Figure 10. For the estimated kernels in the second row, neither Zhong et al. [39] nor the Dark Channel Prior method [4] estimates the main trajectory of the kernel, as a branch of the trajectory is lost in their red boxes. Consequently, the blur is not well removed in (c), as shown in the zoomed-in red box, and the cloud in the blue box is absent in (d). Both our method and the salient structure method [16] estimate accurate kernels and outperform the other methods. However, unlike [16], which involves both kernel continuity preservation and an edge selection strategy, our method uses only kernel continuity preservation to estimate the kernel. The performance shows that, even though our method is not designed for noisy-blurred images, continuity preservation makes the kernel estimation more robust to image noise.

3) LARGE BLUR IMAGE
Large blur is difficult to remove because a large kernel involves more unknown elements to be estimated. The image in Figure 12 is a real captured photo from [16]. The blur size is around 64 pixels, which is larger than those of the Levin dataset [33], whose blur sizes are all under 30 pixels. (b) shows the deblurred result of Xu et al. [2], in which the blur is not well removed, as in the red box, and there are obvious ringing artifacts in the blue box. The result in (c) also shows obvious ringing artifacts in both zoomed-in regions. In (d), the zoomed-in region of the red box is over-smoothed and most fine textures are lost. The result of SRN [38] is still blurred. Our result outperforms the others by recovering a sharp image with fine textures. We also evaluate our method on the Köhler Dataset [42] in Section VI-C, in which more large blur images are quantitatively evaluated.

B. PERFORMANCE IMPROVEMENT WITH OUR PRIOR
The proposed kernel prior can also be plugged into any previous method that uses the same MAP-based framework. By replacing their kernel estimation sub-model with ours in Section IV-A, the proposed kernel prior can improve the kernel estimation of previous methods. In this experiment, we choose the Dark Channel Prior method [4] as the baseline, as it has published code and achieves the best performance among conventional optimization-based methods. This method uses noise pruning as a post-processing step to remove noise in the estimated kernel. When using our kernel prior, we disable the noise pruning in [4] for a fair comparison. We test its performance on the Sun Dataset [27], which contains 640 blurred images. The images are divided into 8 groups, and the images in each group share the same blur kernel. The 8 blur kernels are shown in Figure 13. All the images in the Sun Dataset [27] are natural images with forests, lakes or buildings.
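To make the notion of noise pruning concrete, the following is a minimal sketch of a generic threshold-based pruning step. It is not the exact procedure of [4]; the function name `prune_kernel` and the parameter `rel_threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def prune_kernel(kernel, rel_threshold=0.05):
    """Zero out weak kernel entries and renormalize the kernel to unit sum.

    rel_threshold is a hypothetical parameter: entries smaller than
    rel_threshold * max(kernel) are treated as noise and set to zero.
    """
    k = np.asarray(kernel, dtype=np.float64).copy()
    k[k < 0] = 0.0                          # a blur kernel is non-negative
    k[k < rel_threshold * k.max()] = 0.0    # drop low-magnitude entries
    total = k.sum()
    if total <= 0:
        raise ValueError("kernel has no positive entries")
    return k / total                        # keep image brightness unchanged


noisy = np.array([[0.01, 0.50],
                  [0.48, 0.01]])
pruned = prune_kernel(noisy)
```

Such thresholding acts only on the magnitude of each element and ignores where the element lies relative to the kernel trajectory.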
Table 4 shows the average PSNR values of each group. The PSNR of Krishnan et al. [7] is taken from the benchmark [27]. We evaluate Pan et al. [4] and Tao et al. [38] on the dataset using their published code. We replace the kernel estimation sub-model of Pan et al. [4] with ours and evaluate the performance on the same dataset. With the proposed kernel prior, the average PSNR improves over Pan et al. [4] for all 8 kernels, especially the larger ones: k4, k6, k7 and k8. This indicates that even for images outside the challenging cases, the proposed prior still benefits the kernel estimation and improves the performance of previous methods.
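The per-group averages can be computed along the following lines. This is a minimal sketch using the standard PSNR definition, assuming images scaled to [0, 1]; the function names are illustrative, and any alignment or border cropping that the benchmark [27] may apply is not modeled here.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def mean_psnr_per_kernel(records):
    """Average PSNR per kernel group.

    records: iterable of (kernel_id, reference, estimate) tuples.
    Returns a dict mapping kernel_id to the mean PSNR over that group.
    """
    groups = {}
    for kernel_id, reference, estimate in records:
        groups.setdefault(kernel_id, []).append(psnr(reference, estimate))
    return {k: float(np.mean(v)) for k, v in groups.items()}
```

Grouping by kernel rather than averaging over all 640 images keeps the effect of kernel size visible, which is what the comparison on k4, k6, k7 and k8 relies on.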

C. EFFICIENCY
The proposed prior can be implemented more efficiently than other conventional methods. Our experiments are run in MATLAB 2016b on an Intel Core i7 CPU @ 4.2 GHz × 8.
Figure 14. PSNR vs. runtime of recent conventional deblurring methods [4], [5], [7], [16] and the proposed prior on the Köhler Dataset [42]. Even though the average PSNR of ours is a little lower than that of the Dark Channel method [4] (28.05 vs. 28.44), the proposed prior is almost 3 times faster (370 s vs. 990 s).
To further evaluate the efficiency of the proposed prior, we compare the average runtime and PSNR of the kernel estimation models of several conventional methods [4], [5], [7], [16] in the same environment. For a fair comparison, we use the same non-blind deconvolution method [4] to deblur images with their estimated kernels. We run the published code of each competitor on the Köhler Dataset [42]. The dataset contains 48 images of size 800 × 800, with kernel sizes ranging from 31 to 145. We compute the average PSNR and average runtime of each method, as shown in Figure 14. The average PSNR of our method is slightly lower than that of the top-ranked method, but ours is almost 3 times faster.
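As a quick check of this trade-off, using the values reported with Figure 14 (28.05 dB and 370 s for the proposed prior, 28.44 dB and 990 s for the Dark Channel method [4]):

$$\frac{990\ \mathrm{s}}{370\ \mathrm{s}} \approx 2.68, \qquad 28.44\ \mathrm{dB} - 28.05\ \mathrm{dB} = 0.39\ \mathrm{dB}.$$

That is, the runtime drops by roughly 63% at a cost of about 0.39 dB in average PSNR.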

D. FAILURE CASES
As mentioned in Section V-A, the proposed kernel prior helps to recover strong and small-scale edges in the intermediate image, which improves the accuracy of the estimated kernel. Thus, the proposed prior will fail if the blurred image contains only fine textures without strong edges, because the fine textures are removed during the intermediate image estimation and little information remains for estimating the kernel. Figure 15 shows two failure cases. As the two input images contain only weak and fine textures, the proposed method fails to estimate the blur kernel, which leads to degraded deblurred results.

VII. CONCLUSION
In view of the importance of describing and preserving kernel continuity, we propose a kernel prior that preserves kernel continuity. The prior forms a continuity map of the kernel and encourages the sparsity of the map to preserve the kernel continuity. To deal with the non-convex and non-linear prior, we also propose an approximation to minimize the cost function, which makes our method more efficient than many other conventional methods. Experimental results show that the proposed prior helps estimate a more accurate kernel even in some challenging cases: blurred images with small-scale edges, noisy-blurred images and large blur images. The proposed prior can also further improve the performance of previous MAP-based methods.

YANNING ZHANG received the B.S. degree from the Dalian University of Science and Engineering, in 1988, and the M.S. and Ph.D. degrees from Northwestern Polytechnical University (NPU), in 1993 and 1996, respectively. She is currently a Professor with the School of Computer Science, Northwestern Polytechnical University. She is also the Organization Chair of the Ninth Asian Conference on Computer Vision (ACCV2009). Her research works focus on signal and image processing, computer vision, and pattern recognition. She has published over 200 articles in international journals, conferences, and Chinese key journals.

VOLUME 8, 2020