Wavelet-Based Enhanced Medical Image Super Resolution

Low-resolution medical images can seriously interfere with medical diagnosis, because poor image quality leads to the loss of detailed information. Improving the quality of medical images and accelerating their reconstruction is therefore of particular importance for diagnosis. To address this problem, we propose a wavelet-based mini-grid-network medical image super-resolution (WMSR) method, which is similar to the three-hidden-layer super-resolution convolutional neural network (SRCNN) method. Because of its upscaling property, a stationary wavelet transform (SWT) is used instead of a discrete wavelet transform (DWT). In addition, because the SWT is redundant from scale to scale, additional information about the image is retained and high-resolution images can be restored in detail. For a large amount of training data, wavelet sub-band images, including the approximation and frequency sub-bands, are combined for a predefined full-scale factor. The mapping between the wavelet sub-band images and the corresponding approximation image is then learned. To ensure faithful reconstruction of the image, a sub-pixel layer is added to the hidden layers, and replacing a hidden layer with the small mini-grid-network considerably speeds up image recovery. Experimental results in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) show that the model achieves better performance.


I. INTRODUCTION
Medical imaging systems provide detailed information about the anatomy and function of human organs. Typical conventional medical imaging systems for expert diagnosis include MRI, CT, PET-CT, and ultrasound [1]-[3]. However, these images are often of low quality and lack internal detail. Due to hardware and current imaging technology limitations, medical professionals and researchers prefer image super-resolution processing technology for medical diagnosis [4]. The Single Image Super-Resolution (SISR) problem is considered very complex in theory because the number of unknown variables in the High-Resolution (HR) image is greater than in the Low-Resolution (LR) image. To solve this problem, scientists have introduced several techniques in the field of super-resolution, which are mainly divided into three categories: edge-based methods [5], interpolation-based methods [6], and example-based methods [7]-[9].
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Yang.
The sparse-representation-based method was first proposed by Yang [10], and it has often ranked first in high-quality image restoration. Later, Yang et al. [11] introduced an improved and widely used technique for image super-resolution through sparse representation. In that article, the authors argue that image patches can be well represented with an appropriately chosen dictionary. Inspired by this observation, they seek a sparse representation for each low-resolution input patch and then use the coefficients of this representation to produce the high-resolution output. Nowadays, researchers are more interested in neural-network, deep-learning-based approaches to the SISR problem because of the enormous capacity of neural network models and their end-to-end learning. These neural networks subsume the functionality of earlier methods and have motivated many improved deep learning algorithms. These improved deep neural network (DNN) methods reduce the computational cost significantly while maintaining sufficient quality.
Our proposed method is inspired by wavelet-domain SISR algorithms [12]-[15]. Many of these algorithms produce excellent results, but their computational cost is too high. In [12], the author introduces a wavelet dictionary learning algorithm that learns a compact dictionary for SISR. Later, a related DWT-based dictionary learning method [13] was introduced. Deeba [14] used wavelet properties in conjunction with coupled dictionary learning methods. As deep learning algorithms have matured, the achievable quality has increased significantly while computational costs have fallen.
For deep convolutional networks, the most established method is SRCNN [8], which uses a CNN to recover a high-resolution image from a low-resolution image. SRCNN [8] has also been applied in the wavelet domain to improve visual quality [15]. In SRCNN [8], the authors used a three-layer network architecture to learn the complex nonlinear mapping between HR and LR image patches. Subsequently, a deeper network architecture was proposed in [16], whose authors train on residual images instead of HR and LR images and use adjustable gradient clipping to speed up convergence. In addition, the authors of [8] proposed an accelerated version of SRCNN called the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [17], which obtains better results without interpolating between LR and HR images and thereby reduces the mapping cost in the feature learning steps. Shi [18] proposed the first real-time image and video super-resolution method, using a sub-pixel convolutional neural network named ESPCN. However, compared to SRCNN, ESPCN lacks context information after reconstruction. Super-resolution for multiple degradations (SRMD), a deep network model, was proposed in [19]; it uses degradation maps obtained by dimensionality reduction with principal component analysis (PCA) followed by stretching. In this way, a single network model is learned for many scales.
Inspired by the three-layer SRCNN [8], Gao et al. [4] use a deep convolutional network to achieve single-image super-resolution on a medical data set. Diagnosis and treatment can be further improved by reconstructing such data sets. Although better quality can be obtained, restoring the HR images usually takes longer. Shortening the image recovery time has therefore become a pressing issue. The authors of [15] introduced a DWT-domain deep learning method and achieved good results, but did not make full use of the potential of deep learning and wavelets. The authors of [16] introduced fast medical image super-resolution by speeding up the network of [18].
Our proposed wavelet-based approach integrates the benefits of end-to-end network learning and large model capacity [16] with wavelet properties such as redundancy, directionality, and sparsity [20], [21]. For the wavelet analysis, we choose the SWT instead of the DWT because its upsampling property preserves more context information. Our proposed wavelet-based mini-grid network for medical image super-resolution focuses on faster reconstruction as well as enhanced visual quality for better diagnosis. To improve visual quality we adopt the SWT, and to reduce the super-resolution time we combine the network with sub-pixel convolution and a mini-grid-network. Specifically, we implement three hidden layers to preserve information while training on the images.
For model training, we designed a wavelet-domain deep neural network architecture that learns the mapping between the approximation wavelet subband and its corresponding subband images. The experiments show that adding wavelets to the end-to-end network improves the visual quality of the image at a reasonable cost for the SISR task. On a public dataset, the proposed method compares favorably with recent algorithms. Visual quality is measured by the peak signal-to-noise ratio (PSNR) and the structural similarity index measurement (SSIM).
Below, we first describe the proposed network in the wavelet domain and how its speed is improved. We then outline the experimental setup and present the experimental results. Finally, we summarize the proposed approach and discuss future work.

II. PROPOSED METHOD
In this paper, a mini-grid wavelet-based model is proposed, motivated by the characteristics of wavelets: wavelet sub-bands are sparse, and they support multi-scale modeling. We chose the SWT because of its upscaling property: the wavelet subbands keep the same size as the input while retaining their details. Figure 1 shows the DWT and SWT decompositions; the techniques used in [22]-[25] regard the LR image as the wavelet approximation image of the corresponding HR image.
Here the wavelet analysis filters are denoted $h_1^p$, $h_2^q$, $g_1^p$, $g_2^q$, and $A_{s-1}(p, q)$, $H_{s-1}(p, q)$, $V_{s-1}(p, q)$, and $D_{s-1}(p, q)$ are the approximation, horizontal, vertical, and diagonal wavelet sub-bands, respectively. The visual decomposition is shown in Figure 2. A first-order inverse transform (wavelet synthesis) is performed to obtain an HR image. It can be seen from Figure 2 that the wavelet sub-bands are very sparse and reveal the obvious directionality of the image, and that the wavelet coefficients are strongly dependent across sub-bands. Applying further dimensionality reduction would lose the fine features of this directionality.
In image processing, dimensionality reduction is a well-known problem, and many methods have been introduced for it [26]-[28]. In [26], the authors proposed a method that combines principal component analysis (PCA) and multidimensional scaling (MDS). This approach can discover the nonlinear degrees of freedom underlying complex natural observations, such as human handwriting or images of a face under varying viewing conditions. The authors of [27] reported that KPCA performs best among three methods (principal component analysis (PCA), independent component analysis (ICA), and kernel principal component analysis (KPCA)) for feature extraction with a support vector machine (SVM). The authors of [28] proposed a geometric optimization algorithm for representing higher-dimensional data that reduces computation relative to earlier PCA and MDS models. Our proposed method learns a single network model for several scales: wavelet-domain decomposition is applied before training, and the wavelet sub-band images are then used as the training input.
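As an illustration of the size-preserving property of the SWT described above, the following minimal sketch compares one level of a decimated (DWT-style) and an undecimated (SWT-style) 2-D Haar decomposition in plain Python. The Haar filter, the periodic boundary handling, the helper names (`haar_dwt_1d`, `haar_swt_1d`, `decompose_2d`), and the sub-band naming convention are assumptions chosen for illustration; the text does not state which wavelet family or implementation is used.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumed wavelet family)

def haar_dwt_1d(x):
    """Decimated Haar step: each output has half the input length."""
    lo = [(x[i] + x[i + 1]) * S for i in range(0, len(x) - 1, 2)]
    hi = [(x[i] - x[i + 1]) * S for i in range(0, len(x) - 1, 2)]
    return lo, hi

def haar_swt_1d(x):
    """Undecimated Haar step with periodic boundary: same length as input."""
    n = len(x)
    lo = [(x[i] + x[(i + 1) % n]) * S for i in range(n)]
    hi = [(x[i] - x[(i + 1) % n]) * S for i in range(n)]
    return lo, hi

def transpose(m):
    return [list(r) for r in zip(*m)]

def decompose_2d(img, step):
    """Apply `step` along rows, then along columns; return (A, H, V, D)."""
    rows = [step(r) for r in img]
    low = [lo for lo, _ in rows]
    high = [hi for _, hi in rows]

    def along_columns(m):
        pairs = [step(c) for c in transpose(m)]
        return transpose([p[0] for p in pairs]), transpose([p[1] for p in pairs])

    A, H = along_columns(low)    # approximation, horizontal detail (naming assumed)
    V, D = along_columns(high)   # vertical detail, diagonal detail (naming assumed)
    return A, H, V, D

img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
dwt = decompose_2d(img, haar_dwt_1d)
swt = decompose_2d(img, haar_swt_1d)
print([(len(b), len(b[0])) for b in dwt])  # every DWT sub-band is 2 x 2
print([(len(b), len(b[0])) for b in swt])  # every SWT sub-band is 4 x 4
```

Because the undecimated variant skips the downsampling step, all four sub-bands keep the input size, which is the property the text relies on when choosing the SWT over the DWT.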
The proposed method differs from prior neural-network and wavelet-based approaches in the following respects.
• SWT wavelet decomposition is utilized in our proposed method to evaluate the wavelet coefficients.
• We propose a deep network architecture similar to the fast medical image super-resolution method based on deep learning networks [29], but we train the network on wavelet-domain images instead of residuals. In contrast, the author of [15] used the DWT in conjunction with a three-layer neural network inspired by SRCNN [8].
• We designed a wavelet-domain deep neural network model that accelerates super-resolution, in order to obtain sparse outputs and improve accuracy, reconstruction speed, and training efficiency. Figure 3 shows the framework of the proposed model, a fast medical image reconstruction method based on a three-layer deep learning network comprising a ''mini-grid-network,'' hidden layers, and a sub-pixel layer. Because the ''mini-grid-network'' is a small convolutional neural network, sub-pixel convolution can be used directly as the output layer that produces the super-resolution image. The output sparsity is provided by the wavelet transform, which improves the accuracy of image reconstruction. Figure 4 shows the output of our three-layer network.

A. SUB-PIXEL CONVOLUTION LAYER
The sub-pixel convolutional layer is used as the last layer in the proposed model. Shi [18] introduced the sub-pixel convolutional layer to upscale a low-resolution image. As shown in Figure 3, by applying an upscaling filter to each feature map (R * R channels), the sub-pixel convolution layer obtains a high-resolution image directly from the low-resolution feature maps.
In the convolutional layer, W is a kernel of size k that is activated in low-resolution space. R * R is the number of activation patterns, as shown in Figure 3. Each activation pattern, according to its position, has at most $\lceil k/R \rceil^2$ weights activated. These patterns are activated periodically as the filter is convolved across the image, according to the sub-pixel location mod(x, R), mod(y, R). Here (x, y) are the high-resolution output pixel coordinates used to rearrange the elements.
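The rearrangement performed by the sub-pixel layer can be sketched in plain Python as follows. This is a minimal illustration for a single-channel output, assuming the channel ordering R * mod(y, R) + mod(x, R) used in the periodic-shuffling formulation of [18]; the function name `pixel_shuffle` and the nested-list data layout are hypothetical choices for this example, not the authors' implementation.

```python
def pixel_shuffle(features, r):
    """Rearrange r*r low-resolution channels of shape (H, W) into one (H*r, W*r) image.

    features[c][i][j] is the value of channel c at low-resolution position (i, j).
    """
    if len(features) != r * r:
        raise ValueError("expected exactly r*r channels for a single-channel output")
    h, w = len(features[0]), len(features[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            # The sub-pixel position (x mod r, y mod r) selects the channel;
            # the integer quotient selects the low-resolution pixel.
            c = r * (y % r) + (x % r)
            out[y][x] = features[c][y // r][x // r]
    return out

# Four 1x1 feature maps become one 2x2 high-resolution patch.
lr_maps = [[[0.0]], [[1.0]], [[2.0]], [[3.0]]]
print(pixel_shuffle(lr_maps, 2))  # [[0.0, 1.0], [2.0, 3.0]]
```

No arithmetic is performed in this step; the upscaling itself is learned by the convolution that produces the R * R channels, and the shuffle only reorders them.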

B. MINI-GRID-NETWORK
Two convolution kernels of size 3 * 3 are nested in the hidden layers to reduce the computation time; this structure is named the mini-grid-network. An analysis of the SRCNN (9-5-5) model shows that better feature maps are obtained when the second layer uses a 5 * 5 convolution kernel. In the proposed method, this 5 * 5 convolution kernel is replaced by the mini-grid-network to achieve the same results much faster, as in [29]. A larger receptive field can be obtained with larger convolution kernels, but this increases the number of parameters and therefore the number of calculations. Because the number of parameters grows with the kernel size, small kernels are preferable. Two stacked 3 * 3 convolution kernels in the mini-grid-network cover the same receptive field as one 5 * 5 kernel. In the proposed method, we use the ReLU function instead of the Tanh activation function because it is cheaper to compute: ReLU only checks whether its input is greater than zero.
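The trade-off described above can be checked with a short calculation. The sketch below counts the weights of one 5 * 5 layer against two stacked 3 * 3 layers and computes the receptive field of a stack of stride-1 convolutions. The channel width of 32 and the inclusion of bias terms are assumptions made for illustration, and the helper names are hypothetical; the authors' exact counts are those given in their Table 1.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Number of learnable parameters in one 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

def stacked_receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions with the given kernel sizes."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

channels = 32  # assumed channel width, for illustration only
single_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)

print(stacked_receptive_field([5]), stacked_receptive_field([3, 3]))  # 5 5
print(single_5x5, two_3x3)  # 25632 18496
```

Under these assumptions the two 3 * 3 layers cover the same 5 * 5 receptive field with roughly 28% fewer parameters, and they add one extra nonlinearity between the two convolutions.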
Here T represents the time, N the size of the input image, K the kernel size, F the number of filters, P the padding, and S the stride in both the horizontal and vertical directions. We used padding P = 0 and stride S = 1. Table 1 shows the time complexity and the number of parameters of the mini-grid-network.
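To make the roles of N, K, F, P, and S concrete, the sketch below uses the standard convolution output-size relation, floor((N - K + 2P) / S) + 1, together with a simple multiply-accumulate count as a proxy for the time T. This is a generic estimate under stated assumptions (square inputs and kernels, one input channel in the example), not the specific complexity expression of the paper, and the function names are hypothetical.

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size of a convolution on an n x n input."""
    return (n - k + 2 * p) // s + 1

def conv_macs(n, k, in_ch, f, p=0, s=1):
    """Multiply-accumulate operations for one layer with f filters (rough cost proxy)."""
    out = conv_output_size(n, k, p, s)
    return out * out * k * k * in_ch * f

n = 32  # example input size, chosen for illustration
one_5x5 = conv_macs(n, 5, 1, 1)
after_first = conv_output_size(n, 3)           # 30 with P = 0, S = 1
two_3x3 = conv_macs(n, 3, 1, 1) + conv_macs(after_first, 3, 1, 1)

print(conv_output_size(n, 5), conv_output_size(after_first, 3))  # 28 28
print(one_5x5, two_3x3)  # 19600 15156
```

With P = 0 and S = 1, both paths shrink a 32 x 32 input to 28 x 28, and in this single-channel example the two 3 * 3 layers need fewer multiply-accumulates than the single 5 * 5 layer.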

C. HIDDEN LAYERS
A deep network consists of multiple layers whose parameters are used for feature learning; as the number of layers increases, the feature learning capacity also increases. Additionally, the mini-grid-network is added to improve the speed of the network, following its good performance in [29]. The proposed model comprises three layers: conv1 and conv3 each have 32 kernels of size 3 * 3 with the Tanh activation function, while conv2 is the mini-grid-network.
This mini-grid-network consists of two conv layers with 3 * 3 kernels and the ReLU activation function. Table 2 lists all parameters.

III. EXPERIMENTAL SETUP FOR TRAINING AND TESTING
Because our model is wavelet-based, we apply a one-level wavelet decomposition before training and use the approximation image and its corresponding sub-band images to train the model. Wavelets are redundant across scales: given the approximation wavelet sub-band at a certain scale as input, the approximation image can be perfectly reconstructed thanks to this redundancy.
The LR image is decomposed by a one-level wavelet decomposition, represented by x, and the horizontal, vertical, and diagonal sub-bands are represented by Y. In our proposed method, we learn the relationship between the LR approximation image and the wavelet sub-band (horizontal, vertical, and diagonal) images. One problem with the SRCNN [8] network is that details of the input image must be carried through the network until the output image is produced, while only the learned features are used and the input image itself is discarded. If the network is very deep, this end-to-end learning increases the computational load and the memory requirements. As a result, the vanishing gradient problem [30] occurs and needs a solution. In the proposed network, we address this problem by learning the wavelet coefficients.
In our proposed network, the parameters are set as follows: the learning rate is $10^{-4}$, the momentum is 0.9, and the weight decay is 0. The mini-batch gradient descent algorithm is used for optimization.
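A minimal sketch of mini-batch gradient descent with momentum and weight decay is given below for a single scalar weight. The update form (velocity = momentum * velocity - lr * gradient) is the common momentum formulation and is assumed here rather than taken from the paper; the toy regression problem, the function names, and the larger learning rate of 0.1 (used only so the toy example converges quickly) are likewise illustrative choices.

```python
import random

def sgd_momentum_step(w, v, grad, lr=1e-4, momentum=0.9, weight_decay=0.0):
    """One parameter update; the defaults mirror the values quoted in the text."""
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

def fit_slope(data, epochs=200, batch_size=128, lr=0.1, seed=0):
    """Fit y = w * x on (x, y) pairs with mini-batch gradient descent."""
    rng = random.Random(seed)
    samples = list(data)
    w, v = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # new mini-batch composition every epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Gradient of the batch loss 1/(2n) * sum (w*x - y)^2 with respect to w.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w, v = sgd_momentum_step(w, v, grad, lr=lr)
    return w

rng = random.Random(1)
toy = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(256))]
print(fit_slope(toy))  # approaches the true slope 3.0
```

Each update uses only one mini-batch of 128 samples, so the cost per step is independent of the dataset size, while the momentum term smooths the step direction across batches.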

Mini-Batch Gradient Descent
Here $J(\cdot)$ represents the loss function. In the proposed method, the weights of each convolution layer are initialized from a Gaussian distribution, described in the equation below.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Here σ is the standard deviation, σ² is the variance, and µ is the mean of the distribution. The batch size is set to 128 for training and 32 for testing. The Euclidean loss function [31] is used to calculate the loss between the label values and the predicted values, as described below.
$$L = \frac{1}{2N} \sum_{n=1}^{N} \left\lVert \hat{Y}_n - Y_n \right\rVert_2^2$$
Here N is the total number of input images and n is the index of an input image, $\hat{Y}_n$ is the predicted value, and $Y_n$ is the label value.
VOLUME 8, 2020
FIGURE 5. Performance curve comparison of Bicubic, FMISR [29], and the proposed method at learning rates of 0.1 and 0.01.
For SISR, we learn the mapping between the approximation image and its corresponding coefficients by exploiting the redundancy property of wavelets. As can be seen from Table 3, the algorithm provides good results by applying a deep neural network architecture in the wavelet domain. We use a depth of 20 weights, as listed in Table 1. Table 3 shows the performance (PSNR) of the proposed and FMISR [29] algorithms on the knee image, and Figure 5 shows the performance curves for different learning rates (0.1, 0.01) at a scale factor of 2. Our approach provides better performance at these learning rates. For training and testing, our experimental environment consists of a Windows-based machine with an Intel(R) Core(TM) i5-7300HQ CPU @ 3.40GHz and an NVIDIA GeForce GTX 1080-Ti. Additionally, Matlab 2017 with the CUDA Toolkit and Anaconda is used for the setup.
The computational time of the compared algorithms is shown in Table 4.
As can be seen from Table 4, our method is faster and better than the bicubic, SRCNN [8], and VDSR [16] methods. Because it uses the same small network and hidden layers, the calculation time of our approach is almost the same as that of FMISR [29], while it still provides better results at scale 2.
The experiments use publicly available datasets, including Montgomery County X-ray, Teeth, Abdomen1, and knee images. Quantitative analysis was performed according to PSNR [14] and SSIM [14]. The mathematical definitions are as follows.

A. PEAK SIGNAL-TO-NOISE RATIO (PSNR)
The peak signal-to-noise ratio (PSNR) is used as a quantitative performance measure. Given a true image (the original HR image) $F$ and its estimate $\hat{F}$, both of size M×N pixels, the PSNR is defined as
$$\mathrm{PSNR}(F, \hat{F}) = 10 \log_{10} \frac{255^2}{\mathrm{MSE}(F, \hat{F})}$$
where $\mathrm{MSE}(F, \hat{F})$ is the mean-square error between the two images, defined as
$$\mathrm{MSE}(F, \hat{F}) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(F(i, j) - \hat{F}(i, j)\right)^2$$

FIGURE 9. Teeth image at a scaling factor of 2: (a) original, (b) Bicubic, (c) SRCNN [8], (d) VDSR [16], (e) FMISR [29], and (f) proposed approach.

The error is calculated between the real HR image and the reconstructed HR image. The higher the PSNR value, the better the reconstructed image.
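The PSNR computation can be sketched in a few lines of plain Python. The example assumes 8-bit images stored as nested lists with a peak value of 255, matching the definition in this section; the function names `mse` and `psnr` are chosen for illustration.

```python
import math

def mse(f, f_hat):
    """Mean-square error between two images of identical size M x N."""
    m, n = len(f), len(f[0])
    total = sum((f[i][j] - f_hat[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)

def psnr(f, f_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(f, f_hat)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

hr = [[10.0, 20.0], [30.0, 40.0]]
sr = [[15.0, 25.0], [35.0, 45.0]]   # every pixel off by 5, so MSE = 25
print(round(psnr(hr, sr), 2))       # 34.15
```

A uniform error of 5 grey levels already corresponds to about 34 dB, which gives a sense of scale for the PSNR values reported in the tables.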

B. STRUCTURAL SIMILARITY INDEX MEASUREMENT (SSIM)
SSIM (structural similarity index) is widely used to evaluate the quality of high-resolution reconstructions. Wang and Zhou [32] give the mathematical representation of the SSIM index as
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x$, $\mu_y$ are the means of x and y, $\sigma_x^2$, $\sigma_y^2$ are the variances of x and y, $\sigma_{xy}$ is the covariance of x and y, and $C_1$, $C_2$ are constants.
Our model is compared with four methods: the bicubic technique, SRCNN [8], VDSR [16], and FMISR [29]; the results are given in Table 5. The trained models of the compared algorithms were provided by their authors. The proposed algorithm gives better results than the compared algorithms. Figures 6-9 show the comparative visual results at scale 2, where the proposed wavelet-domain mini-grid-network for medical image super-resolution produces sharper edges and textures.
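As a simplified illustration of the SSIM definition in this section, the sketch below evaluates the index once over the whole image. Practical SSIM implementations average the index over local sliding windows, so this global version is an assumption made to keep the example short. The constants C1 = (0.01 * 255)^2 and C2 = (0.03 * 255)^2 are the commonly used defaults for 8-bit images and are not stated in the text; the function name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equally sized images given as nested lists."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (k1 * peak) ** 2  # stabilizes the luminance term
    c2 = (k2 * peak) ** 2  # stabilizes the contrast/structure term
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

a = [[0.0, 255.0], [255.0, 0.0]]
b = [[255.0, 0.0], [0.0, 255.0]]   # same statistics, inverted structure
print(ssim_global(a, a))            # identical images give 1.0
print(ssim_global(a, b) < 0)        # True: negative covariance lowers the index
```

Unlike PSNR, the index depends on the covariance between the two images, so a reconstruction with the right brightness but the wrong structure is penalized.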

V. CONCLUSION
In this paper, we propose an effective wavelet-based deep neural network model for single-image super-resolution. In the experiments, we used medical datasets of four types of images (abdomen, X-rays, knees, and teeth). Compared with other deep neural network methods, the proposed network expands the convolutional layer to obtain a more realistic image reconstruction, and the mini-grid-network significantly reduces the computing time. To shorten the image reconstruction time, we optimize the network structure for speed by combining sub-pixel convolution layers and the ''mini-grid-network.'' In addition, we implemented hidden layers that preserve information during training to improve the quality of reconstruction. By working in the wavelet domain, useful properties of neural networks, such as large model capacity, end-to-end learning, and high performance, are exploited for the SISR task. The SWT is used instead of the DWT because of its upscaling property, and experimental analysis is carried out to validate the efficiency of the proposed model. In the future, this work can be extended by applying other wavelet transforms, for example, the multiresolution discrete wavelet transform and the dual-tree complex wavelet transform.