Blind Text Image Deblurring Algorithm Based on Multi-Scale Fusion and Sparse Priors

The goal of blind text image deblurring is to recover a clean text image from a given blurry text image without knowing the blur kernel. Sparsity-based methods have proven effective in various blind text image deblurring models. However, blur kernel estimation methods based on sparse priors do not consider the brightness information of the blur kernel, which degrades the quality of the restored kernel. Moreover, previous methods seldom apply sparse priors to both spatial-domain and transform-domain information. We propose a novel blind text image deblurring model based on multi-scale fusion and sparse priors. In addition to the sparse gradient prior on the latent clean text image, we add a sparse prior on the high-frequency wavelet coefficients of the latent text image, which better constrains the solution space and yields cleaner images. The half-quadratic splitting method is used to alternately optimize the blur kernel and the latent clean image. We also account for the brightness of the restored blur kernel: using a multi-scale fusion technique based on Laplacian and saliency weights, we fuse the blur kernels computed in the three color channels to improve the quality of the blur kernel. Experimental results show that our algorithm performs well in restoring both blur kernels and text images.


I. INTRODUCTION
With the rapid rise in the performance of computers and mobile devices, text images captured by handheld cameras carry more and more important information. However, during image acquisition, text images are inevitably blurred by geometric aberrations, camera shake and object motion, which significantly reduces their quality. It is therefore important to recover clean text images from blurry ones [1], [2].
The relation between a blurry image and its latent clean image can be modeled as
$$B = U \otimes K + N, \tag{1}$$
where $B \in \mathbb{R}^{m \times n \times 3}$, $U \in \mathbb{R}^{m \times n \times 3}$, $K \in \mathbb{R}^{s \times t}$ and $N \in \mathbb{R}^{m \times n \times 3}$ are the blurry image, the latent clean image, the blur kernel and additive noise, respectively, and $\otimes$ denotes the convolution operator. Depending on whether the blur kernel is known, image deblurring is divided into non-blind deblurring and blind deblurring. Blind image deblurring aims to recover the blur kernel and the latent clean image simultaneously. It is a highly ill-posed problem because there are more unknown variables than known conditions. Therefore, it is necessary to impose reasonable prior constraints on the latent clean images and blur kernels. In recent years, a growing number of scholars have proposed methods for blind natural image deblurring. However, traditional blind image deblurring algorithms cannot be applied directly to blind text image deblurring because of the special characteristics of text images. To solve the blind text image deblurring problem, additional priors and constraints need to be added to the model (1). Prior constraints, including $L_0$-, $L_1$- and $L_2$-norm priors on the text image gradient or on the text image itself, were introduced in [3], [4], [5], and [6]. Although sparsity-based methods can achieve good deblurring results, there is still room for improvement. These methods consider only the sparsity of the image or of its gradient, and ignore the sparsity of the image in a transform domain. They also do not use the brightness characteristics of the recovered blur kernel. Figure 1 shows an example of a real-world blurry text image that cannot be successfully recovered by the sparsity-based methods in [7], [8], [9], and [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Sharif.
In this paper, we make use of the sparsity priors for text images proposed by Fang et al. [10]. Beyond that, we add a sparsity prior on the high-frequency wavelet coefficients of clean text images. Figure 2 shows the distributions of the high-frequency wavelet coefficients of a text image blurred by different blur kernels. Figure 2(a) is a clean text image. Figure 2(b) and Figure 2(c) show the same image blurred by two different blur kernels. Figure 2(d) shows the sparsity of the high-frequency wavelet coefficients of the clean text image. Figure 2(e) and Figure 2(f) show the sparsity of the high-frequency wavelet coefficients of the blurry text images. Figure 2 shows that the high-frequency wavelet coefficients of the clean image are sparser than those of the blurry images.
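The observation above can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses a decimated single-level Haar transform, a synthetic stroke image and a circular box blur, none of which are taken from the paper, and the helper names (`haar_details`, `nonzero_fraction`, `box_blur`) are hypothetical.

```python
import numpy as np

def haar_details(img):
    """Single-level 2-D Haar transform; returns the three detail sub-bands."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]  # crop to even size
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    horizontal = (a + b - c - d) / 2.0  # row differences (horizontal edges)
    vertical = (a - b + c - d) / 2.0    # column differences (vertical edges)
    diagonal = (a - b - c + d) / 2.0
    return horizontal, vertical, diagonal

def nonzero_fraction(img, tol=1e-3):
    """Fraction of high-frequency coefficients whose magnitude exceeds tol."""
    details = np.concatenate([band.ravel() for band in haar_details(img)])
    return float(np.mean(np.abs(details) > tol))

def box_blur(img, size=5):
    """Circular box blur, a stand-in for an arbitrary blur kernel."""
    out = np.zeros_like(img)
    for dy in range(size):
        for dx in range(size):
            out += np.roll(img, (dy - size // 2, dx - size // 2), axis=(0, 1))
    return out / size**2

# Synthetic "text": dark strokes on a white background (toy data).
clean = np.ones((64, 64))
clean[11:14, 9:56] = 0.0
clean[31:34, 9:56] = 0.0
clean[11:54, 21:24] = 0.0
blurry = box_blur(clean)

clean_frac = nonzero_fraction(clean)
blurry_frac = nonzero_fraction(blurry)
print(clean_frac, blurry_frac)
```

In this toy setting the blurred image has a larger fraction of non-negligible detail coefficients, because blurring spreads each sharp edge over several pixels.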
The estimation of blur kernels is crucial in blind text image deblurring. In 2014, Pan et al. [7] designed a kernel estimation method based on an $L_2$-norm prior. In 2020, Fang et al. [10] used the $L_0$-norm to regularize the blur kernel, because the $L_0$-norm can describe the sparsity of the kernel. In addition, Fang et al. proposed a skeleton extraction technique for filtering out noisy pixels in the recovered blur kernels. However, these methods do not take into account the effect of blur kernel brightness on the recovery of clean text images.
Inspired by the work of Ancuti et al. [11], we use multi-scale fusion to handle the brightness information of the recovered blur kernel. Based on Laplacian and saliency weights, we fuse the blur kernels computed in the three channels to adjust the brightness of the recovered blur kernel.
In this paper, we propose a blind text image deblurring algorithm based on Laplacian pyramid multi-scale fusion and sparse priors. We impose $L_0$-norm priors on the blur kernel and on the gradient and wavelet coefficients of the latent clean image, and embed the multi-scale fusion algorithm in the resulting model. We use the half-quadratic splitting method to alternately optimize the blur kernel and the latent clean image. The sparsity prior on the high-frequency wavelet coefficients of the latent clean image improves the restoration of image textures, and the multi-scale fusion algorithm improves the quality of the recovered blur kernel.
The main contributions of this paper can be summarized as follows:
1) For blur kernel restoration, we provide a restoration technique based on multi-scale fusion.
2) For text image restoration, we add a sparse prior on the high-frequency wavelet coefficients of the latent clean text image to the model proposed by Fang et al. [10].
3) Experimental results show that our proposed algorithm outperforms the compared algorithms.
The rest of this paper is organized as follows. Section II summarizes the related work. The proposed method is given in Section III. Section IV presents the experimental results and Section V concludes this paper.

II. RELATED WORK
For the blind deblurring of natural images, image sparsity combined with dictionary learning has been studied in [12] and [13]. However, the quality of image restoration depends heavily on the quality of the learned dictionary.
Therefore, most studies instead impose priors on the image. Shan et al. [14] used a local smoothness sparse prior to reduce ringing artifacts. Krishnan et al. [15] used the $L_1$- and $L_2$-norms to regularize the latent clean image. Amizic et al. [16] adopted a gradient prior on the blurry image and a total variation prior on the blur kernel to simultaneously estimate the latent clean image and the blur kernel. Ren et al. [17] used the low-rank property of the intensity and gradient of the latent image to estimate the latent clean image.
Wavelet transforms and Laplacian priors have been widely used in image processing. Based on wavelet corrected transforms, Jamadandi and Uma [18] proposed an efficient deep learning algorithm to enhance underwater images. Wu et al. [19] proposed a deblurring method based on a two-level wavelet convolutional neural network with an embedded discrete wavelet transform. Chen et al. [20] used a hyper-Laplacian prior to capture the structural information of non-locally similar patches. To overcome the ill-posedness of blind image deblurring, Almeida and Almeida [21] proposed a hyper-Laplacian prior, based on the $L_p$-norm, as the sparse constraint on the image gradient. To improve the texture information of the restored image, we add a sparsity prior on the high-frequency wavelet coefficients of the latent clean image.
Multi-scale fusion methods are also widely used in image processing. Using a Laplacian pyramid representation, Ancuti and Ancuti [22] introduced a multi-scale fusion algorithm for image defogging. Ancuti et al. [11] proposed an effective multi-scale fusion method to enhance degraded underwater images. Using a multi-scale fusion strategy, Jiang et al. [23] constructed a multi-scale pyramid structure for single-image rain streak removal. Based on the U-Net architecture, Dong et al. [24] proposed a multi-scale boosted dehazing network with dense feature fusion for image dehazing. To account for the influence of the blur kernel's brightness on the recovery of clean images, we use a multi-scale fusion algorithm to process the blur kernel.
Compared with natural images, text images are usually characterized by sparse tones, small contrast changes and constant color in the stroke area. Accordingly, text image deblurring methods differ somewhat from natural image deblurring methods. Using $L_0$-norm regularization, Pan et al. [7] developed intensity and gradient priors for text image deblurring and proposed an effective blur kernel optimization method. Building on the algorithm of Pan et al., Fang et al. [10] proposed a new blind text image deblurring algorithm that imposes an $L_0$-norm constraint on the blur kernel. In particular, Fang et al. [10] designed a skeleton extraction technique to post-process the computed blur kernel.
Text image motion deblurring is a significant problem in the extensive image deblurring area [25]. There are various methods to deal with this problem, including image segmentation methods, sparsity-based methods, deep learning methods and so on.
Image segmentation methods exploit the characteristics of the background layer and the text layer to perform blind deblurring of text images. Chen et al. [26] used the pronounced intensity difference between text and background to segment the blurry text image. Cho et al. [27] used the stroke width transform algorithm [28] to process the text and background parts of the text image separately. Cao et al. [29] used dictionary learning to separate the text from the background.
Sparsity-based methods add sparse priors as constraints in the blind deblurring model. Li et al. [3] proposed a framework for blindly removing motion blur from images based on an $L_0$-norm constraint. Exploiting the specific properties of latent clean images, Fang et al. [30] developed a new regularization term for plate image deblurring. By combining the dark channel and the light channel, Wen et al. [4] proposed an $L_0$-regularized blur kernel estimation method that can measure the similarity between adjacent kernels. Yang and Wu [31] analyzed the influence of the blur kernel on image contrast and proposed a dual-channel contrast prior algorithm. Hsieh et al. [5] proposed imposing a zero patch minimum constraint on the blind deblurring model. Combining a multi-scale recurrent network with sparse representation, Li et al. [32] proposed a novel edge extraction model to preserve the edges of latent images. Ge et al. [6] applied the $L_1$-norm to a nonlinear channel prior of latent clean images, defined as the ratio of the dark channel to the bright channel. However, the above algorithms consider only the sparsity of the image itself or of its gradient, and do not take full advantage of the sparsity of clean text images.
With the great success of deep learning in computer vision, several deep learning deblurring methods have been proposed. By combining multi-scale pyramid features and an attention mechanism, Zhang [33] constructed a generative adversarial network to obtain latent images. Esmaeilzehi et al. [34] proposed a two-stage convolutional network that performs up-sampling and deblurring. For blind deblurring of dynamic scenes, Wan et al. [35] proposed a novel multi-scale channel attention network based on a spatial pyramid pooling channel attention strategy. Chakrabarti [36] presented a blind motion deblurring network that predicts the Fourier coefficients of the deconvolution filter. Kupyn et al. [37] presented an end-to-end generative adversarial network model for motion deblurring. Zhang et al. [38] proposed a blind motion deblurring network composed of three deep convolutional neural networks and a recurrent neural network. However, when the brightness of the blur kernel is high, deep learning deblurring methods still cannot achieve good recovery.

III. PROPOSED METHOD
We propose a joint sparsity prior and multi-scale fusion model for blind text image deblurring, which contains sparsity priors on the gradient of the latent clean image, on the intensity of the blur kernel and on the high-frequency wavelet coefficients of the latent clean image. In this paper, the $L_0$-norm of a matrix, denoted by $\| \cdot \|_0$, is used to characterize the sparsity of the priors, since the $L_0$-norm counts the non-zero elements of the matrix.
In this paper, the blur kernel $K$ is an $s \times t$ grayscale image subject to a constraint on its entries. Let $W$ denote the wavelet transform, so that $WU$ is the wavelet transform of the latent image $U$. For simplicity, let $(WU)_H$ denote the high-frequency wavelet coefficients of $U$. We use a single-level wavelet decomposition to compute $(WU)_H$, which consists of the $m \times n \times 3$ matrices $(WU)_{H,1}$, $(WU)_{H,2}$ and $(WU)_{H,3}$, denoting the horizontal, vertical and diagonal detail coefficients, respectively. Let $\nabla = (\nabla_x, \nabla_y)^T$ denote the gradient operator, with partial derivative operators $\nabla_x$ and $\nabla_y$ in the $x$ and $y$ directions, and let $\nabla U = \left( \frac{\partial U}{\partial X}, \frac{\partial U}{\partial Y} \right) \in \mathbb{R}^{m \times 2n \times 3}$ denote the gradient image of the latent clean image $U$. We formulate blind text image deblurring as the optimization problem (5), which incorporates these sparse priors and in which $\alpha_U$ and $\alpha_K$ are the corresponding positive weight parameters.
We solve the optimization problem (5) by the half-quadratic splitting method [39], combined with blur kernel skeleton extraction and the multi-scale fusion technique.
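For orientation, one form of problem (5) that is consistent with the priors named in this section can be written as follows. This is a sketch rather than a transcription of the authors' display: the squared-error data term, the grouping of terms and the weight $\alpha_W$ on the wavelet prior (a parameter of that name is listed with the settings in Section IV) are assumptions.

```latex
\min_{U,\,K}\; \| U \otimes K - B \|_2^2
  + \alpha_U \, \| \nabla U \|_0
  + \alpha_W \, \| (WU)_H \|_0
  + \alpha_K \, \| K \|_0
```

Under this reading, half-quadratic splitting replaces each $L_0$ term by an auxiliary variable tied to its argument through a quadratic penalty, so that every subproblem is either a quadratic problem or an elementwise thresholding.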

A. BLUR KERNEL ESTIMATION
Unlike traditional methods, we estimate the blur kernel separately in the R, G and B channels. In other words, we compute $\hat{K} \in \mathbb{R}^{s \times t \times 3}$ as the solution of the optimization problem (6). We use the half-quadratic splitting algorithm [39] to solve problem (6). By introducing an $s \times t$-dimensional auxiliary variable $Z$, we rewrite problem (6) as problem (7), which can be solved by estimating $\hat{K}$ and $Z$ alternately.
VOLUME 11, 2023
Given $Z$ and $U$, we compute $\hat{K}$ by solving problem (8), using the method proposed by Pan et al. [7].
Problem (8) is a classical convex optimization problem that can be solved efficiently with the Fourier transform. Its closed-form solution (9) is expressed in terms of $F(\cdot)$ and $F^{-1}(\cdot)$, the Fourier transform and the inverse Fourier transform, $\overline{F(\cdot)}$, the complex conjugate of $F(\cdot)$, and $\circ$, the element-wise multiplication operator.
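A Fourier-domain solve of this kind can be sketched as below. The sketch assumes that problem (8) has the form $\min_K \|U \otimes K - B\|_2^2 + \theta_1 \|K - Z\|_2^2$ on a single channel, with circular convolution and with the kernel stored zero-padded to the image size; these choices, and the function name `update_kernel`, are assumptions for illustration and may differ from the authors' formulation (for example, the data term may be measured on gradients).

```python
import numpy as np

def update_kernel(U, B, Z, theta1):
    """Minimize ||U (*) K - B||^2 + theta1 * ||K - Z||^2 over K.

    U, B: 2-D arrays holding one image channel.
    Z: auxiliary kernel, zero-padded to the image size (assumed layout).
    Convolution is circular, so the normal equations diagonalize under the FFT.
    """
    FU = np.fft.fft2(U)
    numerator = np.conj(FU) * np.fft.fft2(B) + theta1 * np.fft.fft2(Z)
    denominator = np.abs(FU) ** 2 + theta1
    return np.real(np.fft.ifft2(numerator / denominator))

# Toy check: if Z already equals the true kernel, the update returns it.
rng = np.random.default_rng(0)
U = rng.random((32, 32))
K_true = np.zeros((32, 32))
K_true[0, :5] = 0.2  # horizontal motion blur stored at the array origin
B = np.real(np.fft.ifft2(np.fft.fft2(U) * np.fft.fft2(K_true)))
K_est = update_kernel(U, B, Z=K_true, theta1=0.0011)
print(np.max(np.abs(K_est - K_true)))
```

In practice the padded result would be cropped back to the $s \times t$ kernel support before the next step.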
Given $\hat{K}$ and $U$, we compute $Z$ by solving problem (10). By hard thresholding, the approximate solution of (10) is given by (11), where $T_{\alpha_K/\theta_1}(\cdot)$ is the hard threshold operator defined in (12).
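The hard-thresholding step can be illustrated as follows. The sketch assumes the usual elementwise rule for an $L_0$-penalized quadratic, namely that $\lambda |z|_0 + (x - z)^2$ is minimized by keeping $x$ when $x^2 \ge \lambda$ and setting it to zero otherwise; the exact definition used by the authors may differ in how ties are handled, and the input values are toy numbers.

```python
import numpy as np

def hard_threshold(x, lam):
    """Elementwise hard threshold: keep entries with x**2 >= lam, zero the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(x**2 >= lam, x, 0.0)

# Toy values; lam plays the role of alpha_K / theta_1 in the text.
x = np.array([[0.1, -2.0, 0.6], [0.4, 0.0, -0.7]])
Z = hard_threshold(x, lam=0.25)
print(Z)
```

Entries whose square falls below the threshold are zeroed, which is what makes the auxiliary variable sparse.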

B. BLUR KERNEL RESTORATION
The blur kernel computed by the optimization algorithm may contain noise. Therefore, we use the skeleton extraction method [10] to denoise the computed blur kernel $\hat{K} \in \mathbb{R}^{s \times t \times 3}$. First, the camera trajectory of the blur kernel is extracted with the skeleton extraction algorithm. Then the width of the blur kernel is calculated. Finally, the kernel is weighted within this width by a Gaussian mask, and the noise outside this range is removed.
Since the brightness of the blur kernel affects the image restoration result, we adopt a multi-scale fusion strategy to fuse the blur kernels of the three channels in order to correct the brightness difference between the restored blur kernel and the original blur kernel. The fusion process is shown in Figure 3.
First, for the blur kernel in each channel, we compute the Laplacian weight as the absolute value of the Laplacian filter response, and the saliency weight using the saliency estimation method proposed by Achanta et al. [40]. We compute the normalized weights of the blur kernel of each channel by (13), where $\delta$ is the regularization term and $Lw_i$ and $Sw_i$ denote the Laplacian weight and the saliency weight, respectively. Second, we implement multi-scale fusion of the blur kernel with a weighted Gaussian pyramid and a Laplacian pyramid. We apply a low-pass Gaussian filter at each layer of the pyramid generated from each channel of the blur kernel. We subtract an upsampled version of the low-pass image from the input image, and use the low-pass image as the input of the next layer. Finally, we apply Laplacian pyramid reconstruction to generate the fused single-channel blur kernel $K \in \mathbb{R}^{s \times t}$.
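The fusion procedure described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the 3x3 binomial filter, nearest-neighbor upsampling, the saliency proxy (distance of the smoothed kernel from its mean, in the spirit of [40]), the normalization $(Lw_i + Sw_i + \delta) / (\sum_j (Lw_j + Sw_j) + 3\delta)$, the value $\delta = 0.1$ and all function names are assumptions.

```python
import numpy as np

def blur3(x):
    """3x3 binomial low-pass filter with edge replication."""
    p = np.pad(x, 1, mode="edge")
    rows = 0.25 * p[:-2, :] + 0.5 * p[1:-1, :] + 0.25 * p[2:, :]
    return 0.25 * rows[:, :-2] + 0.5 * rows[:, 1:-1] + 0.25 * rows[:, 2:]

def down(x):
    """Low-pass filter, then keep every second sample."""
    return blur3(x)[::2, ::2]

def up(x, shape):
    """Nearest-neighbor upsampling, cropped to the target shape."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)[: shape[0], : shape[1]]

def gaussian_pyramid(x, levels):
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(x, levels):
    g = gaussian_pyramid(x, levels)
    return [g[i] - up(g[i + 1], g[i].shape) for i in range(levels - 1)] + [g[-1]]

def normalized_weights(kernels, delta=0.1):
    """Per-pixel weights from Laplacian contrast plus a saliency proxy."""
    raw = []
    for k in kernels:
        p = np.pad(k, 1, mode="edge")
        lap = np.abs(p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * k)
        sal = np.abs(k.mean() - blur3(k))
        raw.append(lap + sal)
    total = sum(raw) + len(kernels) * delta
    return [(r + delta) / total for r in raw]

def fuse_kernels(kernels, levels=3):
    """Blend Laplacian pyramids of the kernels with Gaussian pyramids of the weights."""
    fused = None
    for k, w in zip(kernels, normalized_weights(kernels)):
        terms = [gw * lk for gw, lk in zip(gaussian_pyramid(w, levels),
                                           laplacian_pyramid(k, levels))]
        fused = terms if fused is None else [f + t for f, t in zip(fused, terms)]
    out = fused[-1]
    for level in reversed(fused[:-1]):  # collapse the pyramid, coarse to fine
        out = level + up(out, level.shape)
    return out

rng = np.random.default_rng(0)
channels = [rng.random((15, 15)) for _ in range(3)]
channels = [c / c.sum() for c in channels]  # toy per-channel kernels
K_fused = fuse_kernels(channels)
K_same = fuse_kernels([channels[0]] * 3)
```

Because the normalized weights sum to one at every pixel and every pyramid level, fusing three identical kernels returns that kernel unchanged, which is a convenient sanity check for an implementation.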

C. TEXT IMAGE RECONSTRUCTION
For clean text image reconstruction, we impose sparsity priors on both the gradient domain and the transform domain of the latent clean image, which gives problem (14). Following the half-quadratic splitting algorithm [39], we introduce an auxiliary variable $V$ with the same dimensions as $(WU)_H$ and an auxiliary variable $M$ with the same dimensions as $\nabla U$, and rewrite problem (14) as problem (15). We solve problem (15) by optimizing $M$, $V$ and $U$ alternately, each time fixing the other two variables.
Given fixed $V$ and $U$, we compute $M$ by solving problem (16). Similar to problem (10), the approximate solution of problem (16) is given by (17). Given fixed $M$ and $U$, we compute $V$ by solving problem (18). Similarly, the approximate solution of problem (18) is given by (19).
Given fixed $M$ and $V$, we compute $U$ by solving problem (20). Similar to problem (8), problem (20) has the closed-form solution (21), which involves $W^{*}$, the conjugate transpose of the wavelet transform matrix, and a term $F_M$. The main steps of our algorithm are shown in Algorithm 1, where bilateral(·) denotes the bilateral filter operation.
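To make the structure of such a closed-form image update concrete, the sketch below solves a reduced version of the $U$ subproblem that keeps only the data term and the gradient penalty, $\min_U \|U \otimes K - B\|_2^2 + \theta \|\nabla U - M\|_2^2$, and omits the wavelet term. Circular boundary conditions, forward differences, the single-channel setting, the parameter name `theta` and the form of `F_M` used here are all assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def otf(kernel, shape):
    """Zero-pad a small kernel to `shape`, center it at the origin, take the FFT."""
    padded = np.zeros(shape)
    s, t = kernel.shape
    padded[:s, :t] = kernel
    padded = np.roll(padded, (-(s // 2), -(t // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def grad_otfs(shape):
    """Fourier multipliers of circular forward differences along x and y."""
    dx = np.zeros(shape)
    dx[0, 0], dx[0, -1] = -1.0, 1.0
    dy = np.zeros(shape)
    dy[0, 0], dy[-1, 0] = -1.0, 1.0
    return np.fft.fft2(dx), np.fft.fft2(dy)

def update_image(B, K, Mx, My, theta):
    """Closed-form minimizer of ||U (*) K - B||^2 + theta * ||grad U - M||^2."""
    FK = otf(K, B.shape)
    Dx, Dy = grad_otfs(B.shape)
    F_M = np.conj(Dx) * np.fft.fft2(Mx) + np.conj(Dy) * np.fft.fft2(My)
    numerator = np.conj(FK) * np.fft.fft2(B) + theta * F_M
    denominator = np.abs(FK) ** 2 + theta * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    return np.real(np.fft.ifft2(numerator / denominator))

# Toy check: with M equal to the true gradient, the update recovers the image.
rng = np.random.default_rng(1)
U_true = rng.random((32, 32))
K = rng.random((5, 5))
K /= K.sum()
B = np.real(np.fft.ifft2(np.fft.fft2(U_true) * otf(K, U_true.shape)))
Mx = np.roll(U_true, -1, axis=1) - U_true
My = np.roll(U_true, -1, axis=0) - U_true
U_est = update_image(B, K, Mx, My, theta=1e-3)
print(np.max(np.abs(U_est - U_true)))
```

Adding the wavelet penalty contributes one more term to the numerator and to the denominator; its exact form depends on the wavelet transform used and is not reproduced here.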

IV. EXPERIMENTAL RESULTS
All experiments were run in the MATLAB 2020a environment. Algorithm 1 shows the main steps of the blind text image deblurring process. The parameters of the algorithm are set as follows:
1) Level is determined by the size of the smallest pyramid image of $K$, following Pan et al. [7].
2) Iter is set to 5, following Fang et al. [10].
3) The constraint coefficients $\alpha_U$, $\alpha_W$ and $\alpha_K$ are all set to 0.004, following Fang et al. [10].
4) We assign 0.0011 and $4 \times 10^{-7}$ to $\theta_1$ and $\theta_2$, respectively. $\theta_3$ is set to 0.002, following Fang et al. [10].

A. RESULTS ON SYNTHESIZED IMAGE DATASETS
To verify the effectiveness of our proposed algorithm, we use the dataset from [10]. The blur kernel recovery and blind image deblurring results of our algorithm are compared with those of the methods proposed in [7], [8], [9], and [10]. Visual comparisons on test images are provided in Figure 4 - Figure 8. As these figures show, our proposed algorithm removes unpleasant visual artifacts effectively and recovers more details than the compared algorithms. Figure 4 shows the results of blind deblurring on classic text images. The algorithms proposed in [7], [9], and [10] restore clean text images to a certain extent, but exhibit ringing. The algorithm proposed in [8] reduces ringing, but the brightness of its restored blur kernel differs from that of the original blur kernel. The restoration result of our proposed algorithm is also shown in Figure 4.
Figure 5 - Figure 8 are text images with more complex background information. Experiments on these images test how broadly our algorithm applies. The method proposed in [7] tends to produce ringing. The methods in [8], [9], and [10] reduce ringing but easily lose detail in the recovered text images. Figure 5 - Figure 8 also show the experimental results of our proposed algorithm, which reduces ringing while retaining more detail.
We use both the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index as image quality metrics to evaluate the restored images and blur kernels. The numerical results are shown in Figures 9-11. They show that our proposed algorithm achieves the best performance among the compared methods.
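For reference, PSNR follows directly from the mean squared error between a reference image and a restored image. The sketch below uses the standard definition $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$ and assumes intensities scaled to $[0, 1]$; it is not the authors' evaluation script, and SSIM is omitted because it requires local statistics.

```python
import numpy as np

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with maximum value `peak`."""
    reference = np.asarray(reference, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak**2 / mse))

# Toy example: a uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB.
reference = np.zeros((8, 8))
restored = np.full((8, 8), 0.1)
value = psnr(reference, restored)
print(value)
```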
The average SSIM improvements of our algorithm for the restored image and blur kernel over the methods in [7], [8], [9], and [10] are respectively 0.1237, 0.0477, 0.0546, 0.0357 and 0.1044, 0.0529, 0.0360, 0.0226. Meanwhile, we compare the running time of five leading blind text image deblurring methods and the results are shown in Table 1.

B. RESULTS ON REAL IMAGES DATASETS
In this section, we evaluate our algorithm on real blurred text images. Unlike the dataset of Pan et al. [7], the true blur kernels of these images are unknown. Figures 12-14 show the images recovered from real blurry text images provided in [41]. Our method achieves the best visual effect. For example, ''214'' is clearly legible in the red close-up of the image recovered by our method.
At the same time, we compare the running time of five leading blind text image deblurring methods and the results are shown in Table 2.

V. ANALYSIS AND CONCLUSION
In this paper, we design a blind deblurring algorithm for text images based on Laplacian pyramid multi-scale fusion and an $L_0$ sparse prior framework. Our algorithm considers the effect of the luminance of the restored blur kernel on the restored text images. Building on the blur kernel recovery method proposed by Fang et al. [10], we propose a multi-scale fusion method that preserves the curvilinear trajectory and the luminance information of the blur kernel. Moreover, we add an $L_0$ sparse prior on the high-frequency wavelet coefficients of clean text images to the deblurring model proposed by Fang et al. [10]. Considering sparsity in both the gradient domain and the wavelet domain improves image restoration. The experimental results show that our algorithm recovers the brightness information of the blur kernel better than the compared algorithms, and its visual results are also better.
To analyze the effectiveness of the proposed algorithm more comprehensively, the following two subsections present the effects of the input parameters on its performance and its limitations, respectively.

A. EFFECT OF INPUT PARAMETERS
We analyze the effects of the input parameters $\theta_1$ and $\theta_2$ in energy functions (6) and (15) on the deblurring performance of our algorithm. For the parameter $\theta_1$, we fix $\theta_2 = 4 \times 10^{-7}$ and vary $\theta_1$ from 0.0006 to 0.0016 in steps of 0.0001. We conduct experiments on Pan's dataset [7] to compare the PSNR and SSIM values of the recovered text images at different values of $\theta_1$. The results are shown in Figure 15 (a) and (b); the PSNR and SSIM values are largest when $\theta_1 = 0.0011$. We take the same approach to verify the choice of $\theta_2$, with the results shown in Figure 15 (c) and (d). From these results, we conclude that the blind deblurring performance of our algorithm is best when $\theta_1 = 0.0011$ and $\theta_2 = 4 \times 10^{-7}$.
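A one-parameter sweep of this kind can be organized as in the sketch below. The grid matches the range and step stated above; the scoring function `toy_score` is a purely hypothetical stand-in for running the deblurring pipeline and averaging PSNR or SSIM over a dataset, and says nothing about the actual results.

```python
import numpy as np

def sweep(evaluate, values):
    """Evaluate a score at each candidate value and return the best one."""
    scores = [evaluate(v) for v in values]
    best = int(np.argmax(scores))
    return float(values[best]), scores

# 0.0006 to 0.0016 in steps of 0.0001; linspace avoids floating-point drift in arange.
theta1_grid = np.linspace(0.0006, 0.0016, 11)

def toy_score(theta1):
    """Hypothetical placeholder for the mean PSNR obtained with this theta1."""
    return -(theta1 - 0.0011) ** 2

best_theta1, scores = sweep(toy_score, theta1_grid)
print(best_theta1)
```

The same helper can be reused for $\theta_2$ by fixing $\theta_1$ and passing a grid for $\theta_2$.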

B. LIMITATIONS
Although our algorithm achieves a better blind deblurring effect for text images, Table 1 and Table 2 show that it is still slower than the methods of Pan et al. [7] and Fang et al. [10]. A likely reason is that our algorithm estimates the blur kernel separately in three channels and then fuses the three estimated kernels with a multi-scale fusion algorithm to generate the final blur kernel. Our future work will focus on more efficient blur kernel recovery algorithms to improve the running speed.