Super Resolution of B-Mode Ultrasound Images With Deep Learning

Ultrasound offers a safe, non-invasive, and inexpensive way of imaging. However, due to its natural intrinsic imaging characteristics, it produces poor quality images with low resolution (LR) compared to other medical imaging modalities. Various image enhancement techniques have been extensively studied to overcome these shortcomings. Super-resolution (SR) is one of these methods, which endeavor to obtain high resolution (HR) images from LR images while enlarging them. Numerous studies have already utilized different SR techniques in various stages of ultrasound imaging (USI). Unlike other studies, which aimed at obtaining SR in the pre-processing phase or early stages of the post-processing phase of USI, we achieved SR on B-mode ultrasound images, which is the last stage of USI. We constructed a deep convolutional neural network (CNN) and trained it with a very large dataset of B-mode ultrasound images for the scale factors 2, 3, 4, and 8. We evaluated the performance of our proposed model quantitatively with eight image quality measures. The quantitative results revealed that our algorithm is much more successful than other methods at each magnification factor. Furthermore, we also verified that there is a statistically significant difference between our approach and others. Besides, qualitative analysis of the reconstructed images also confirms that it produces much better quality HR images than other methods in terms of the human visual system.


I. INTRODUCTION
Ultrasound imaging (USI) has become a standard imaging modality since it enables to execute a safe, non-invasive, and portable way of imaging [1]. Besides, it is an inexpensive modality compared to other imaging modalities, such as computerized tomography (CT), X-ray, and magnetic resonance imaging (MRI). In medical imaging, it is nearly the most preferred method for imaging of the cardiovascular system, abdominals, urology, vascular system, obstetrics, gynecology, and so on. On the other hand, USI suffers from the artifacts emerging during the travel of sound waves through the medium. A typical cause of artifacts is the transmission of ultrasound signals in a short duration in time. Thus, the signals are affected by multiple noise sources [2]. The speckle noise is another fundamental problem destructively affecting ultrasound images. The speckle is the outcome of the coherent interface of constructive and destructive backscattered echoes. Inhomogeneity of tissues also distorts sound waves.
The associate editor coordinating the review of this manuscript and approving it for publication was Gulistan Raja .
A variety of techniques have been extensively investigated to overcome the poor image quality in USI. The main goal of these techniques involves the reduction of speckle noise, removal of blurring effect, preserving high-frequency components, and enhancement of the resolution [2]. Improving the resolution is referred to as super-resolution (SR). The SR aims to produce finer details missing in the LR image while enlarging it by a scale factor to obtain a high resolution (HR) image. In this study, we obtained SR on B-mode ultrasound images. Almost 28 thousand ultrasound images were employed for training a deep convolutional neural network (CNN) to produce HR images. We denominate our proposed model as DECUSR 1 (deep convolutional network for ultrasound super-resolution). We compared DECUSR with four well-known interpolation methods (Bicubic, Bilinear, Lanczos, and Nearest interpolation), the method proposed in [3], and two deep learning (DL) models: super-resolution convolutional neural network (SRCNN) [4], and enhanced deep super-resolution network (EDSR) [5]. We used eight image quality measures (IQMs) in our quantitative evaluations. We used two separate test sets of images from two different sources. The findings in both test sets reveal that our method surpasses other methods significantly. We confirmed that there is a statistically significant difference between our method and others.
The qualitative comparisons of HR images reconstructed by each algorithm also confirm that our method produces much better quality images than others. In other words, the proposed model greatly surpasses other methods in terms of the human visual system (HVS) as well.
As a summary, the main contributions of this paper are as follows: • We obtain SR at multiple scales factors (2, 3, 4, and 8). • We developed a novel CNN model that offers highquality ultrasound image reconstruction.
• We compared our model with other models by using eight IQMs, and also qualitatively in terms of the human visual system.
• We also checked the statistical significance of our method against other methods.
The rest of the paper is organized as follows. Section 2 gives a summary of related works previously done. In Section 3, the architecture of the proposed model is described in detail. A comprehensive description of the experiments and results is given in Section 4. In the conclusion section, we briefly outline and discuss the contributions of this paper.

II. RELATED WORK
SR is being employed for increasing resolution while preserving or even improving details in images. It can be implemented on still images or multiple images in a sequence. Multi-frame SR algorithms first register the sequences of images to estimate relative motion. The relevant sequences of the same scene are then projected into a coordinate system introducing an interpolation method. As a final step, some image enhancement algorithms are applied to interpolate finer details to obtain better image quality.
On the other hand, a single image super-resolution (SISR) aims at obtaining an HR image from a single LR image. In this study, we adopted a learning-based SISR approach by exploiting a CNN. In the next two sections, we list related works on SR in two different topics: SR in real-world imaging, and USI.

A. SUPER RESOLUTION IN REAL WORLD IMAGING
Plentiful methods exploiting either single or multiple images have been introduced in the last few decades to solve the SR problem. Such as interpolation-based [6], reconstructional [7], frequency domain applications [8], and in recent studies mostly learning-based methods in SISR [9]- [12]. SRCNN network, proposed by Dong et al. [4], is the first DL algorithm applied to SR. After then, many different network architectures have been designed for SISR. The former networks were mostly a kind of CNN with some special architecture adjustments or loss functions. Some of them used residual learning approach; DRCN [13], VDSR [14], DRRN [15]. ESPCN [16], and EDSR [5], which has won the NTIRE 2017 challenge [17], have used the sub-pixel convolutional layer structure. The authors of FSRCNN [18] aimed to implement a fast and efficient algorithm. Densely connected layers were used with DenseNet [19] and RDN [20]. MemNet [21] proposed a special architecture consisting of the recursive unit and a gate unit. Laplacian pyramid architecture was used in LapSRN [22]. EnhanceNet [23] has focused on creating realistic textures. Very recent special CNNs are HCNN [24] introducing hierarchical structure, and MGEP-SRCNN [25], which proposes multilabel gene expression programming (MGEP).
Apart from CNN architecture, especially in recent studies, generative adversarial networks (GANs) have exhibited outstanding performance in SISR. The authors of SRGAN [26] intended to recover finer texture details, especially at large upscaling factors, by introducing a perceptual loss function established with an adversarial loss and a content loss. Reference [27] derived ESRGAN from SRGAN. RTSRGAN [28] combined the advantages of ESRGAN and the real-time performance of ESPCN. DGAN [29] used multi generators in the network structure. Cycle-in-cycle GAN is recently used in MCinCGAN [30]. CGAN [31] integrated Laplacian pyramid architecture into GAN. Wasserstein GAN architecture is utilized in WGAN [32]. Feature-guided SR is proposed in FG-SRGAN [33] by pointing out that it is impractical to super resolve LR images due to the absence of groundtruth HR images in the real world. The authors of GCN [34] proposed collaborative network architecture. Gradient Magnitude Similarity Deviation (GMSD) metric is adapted to GMGAN [35] in order to produce HR images in line with the human vision system (HVS). The authors of SRNTT [36] focused on the information loss in LR images, and, proposed a reference-based SR approach for generating better texture details from reference images. A probabilistic generative framework, PGM, which offers the low computational cost and robustness to noise, is proposed in [37]. Reference [38] proposed G-GANISR exploiting the least square loss function instead of cross-entropy. The authors targeted to exhaust all image details without loss of information by gradually increasing the charge of the discriminator.

B. SUPER RESOLUTION IN ULTRASOUND IMAGING
We summarize here the SR studies implemented in the postprocessing phase of USI since the pre-processing phase of USI is out of the scope of this paper. A multi-frame SR task was experimented in [39]. The authors mainly focused on the registration step to get a better ultrasound image quality. RF line sequences were used for bilinear deformable block matching (BDBM), which was adapted for motion estimation and small local deformations. Very similar implementation was also performed in [40]. To measure the thickness of intima-media more accurately, it is aimed at [41] to reduce speckle noise while obtaining an HR image by utilizing SR. VOLUME 8, 2020 For this purpose, anisotropic diffusion regularization was performed by employing a maximum a-posteriori (MAP) approach with transformation information in the frequency domain. In [42], a patch-based Gaussian process regression with multiple annotators (GPRMA) was performed for computing an estimated actual HR image. The authors endeavored to minimize the evidence logs utilizing gradient descend with a radial basis function (RBF) kernel for parameter estimation. Instead of utilizing a single image in breast tumor classification, a multi-image SR approach using complementary information from multiple images before tissue analysis is recommended in [43]. Because 1 -norm was proven to be more robust in comparison to 2 -norm, 1 -norm was chosen as a similarity cost function with total bilateral variation (BTV) regularization. The authors achieved the best results for all texture methods, especially for 5 LR images. Reference [44] proposed a p -norm regularizer for ultrasound tissue reflectivity function for motion estimation. The researchers proposed an approach to decimation and blurring operators for solving the related optimization problem.
The authors in [45], implemented a two-dimensional homomorphic deconvolution filter, ensuring real-time processing to the first and second harmonics in the RF image array. As a result, the second-harmonics resulted in better spatial resolution than the first-harmonics.
A sparse coding scheme was implemented in [46] with non-local structural similarity regularizer based on the fact that medical images often have many repetitive image structures. To improve the quality of the constructed HR image, the researchers adapted a non-local similarity constraint into the patch aggregation process after finding similar patches by the Gaussian neighborhood. The proposed algorithm was applied to MRI, CT, and USI. Another research for SR with sparse representation was carried out in [42] for the reconstruction of ultrasound images. Reference [47] employed the idea that RF signal envelopes have to be improved to increase the resolution of images. The authors implemented a multi-dimensional autoregressive (AR) modeling using inverse Fourier transform of non-demodulated In-phase and Quadrature (IQ) signals, instead of directly estimating the envelopes of RF signals. Another experiment in [48] was performed with the Alternating Direction Method of Multipliers (ADMM) technique on IQ signals.
Yoon and Ye [49] utilized a CNN model for achieving SR on randomly subsampled RF data, rather than using every bit of it since there is only a little incremental change in the sequences of RF signals. In this way, they achieved fast run-time reconstruction of ultrasound images. Reference [3] employed a weighted regression for SR for a 2x scale. Another DL architecture akin to the optical flow method was designed in [50] for implementing the multiframe SR approach with motion estimation. The researchers specifically focused on the motion estimation step of the multi-frame SR task for enhancing the quality of images. Reference [51] employed an SRGAN network with sub-pixel feature channels for generating intravascular ultrasound images.
Reference [52] also used SRGAN with small modifications to achieve SR on the lateral axis of USI. The authors obtained SR on RF data before scan conversion to B-mode for a 4x scale. An unsupervised learning-based method with dilated CNN and residual learning for a single ultrasound image has been proposed by Lu and Liu [53]. The method does not require a training dataset since it learns the LR-HR map from the given image. Reference [54] utilized repetitive data in the non-local neighborhood of samples by first denoising image and then applying a Bayesian approach for SR. Another Bayesian approach with the SR technique is implemented in [55] for ultrasound image restoration.

III. METHOD
In this study, we attained the SR of ultrasound images by using a CNN. We empowered our model to learn the functional relationship between LR and HR images by training it with a large training set of B-mode ultrasound images. We will explain the entire process in detail in the next sections.

A. SUPER RESOLUTION
The super-resolution (SR) is a process of obtaining high resolution (HR) images from a single or multiple low resolution (LR) images generated from either multiple observations of the same scene or produced synthetically from HR images. It strives to produce finer details as well while improving the actual resolution. Thanks to successful results, it has been used for many different applications, such as medical image processing, image conversion from standard-definition television (SDTV) to high definition television (HDTV), image mosaicking, aerial, and satellite imaging, and image enhancement [56]. There are two main attitudes of the SR task: software-based methods and hardware-based methods. One of the hardware-based methods aims to decrease the pixel size beyond a certain threshold. Current technologies have already reached the limit of such threshold sizes. Besides, decreasing the pixel size gives rise to a drop in the amount of light reaching the corresponding cell of the pixel on the sensor. Subsequently, it causes an increase of the shot noise. The other member of the hardware-based method aims to increase the capacitance of the imaging device. That is, it finally slows down the charge transfer rate. In addition to the issues mentioned above, hardware-based approaches are generally expensive, especially in large-scale imaging devices. Therefore, algorithmic-based methods are more preferred than hardware-based approaches.
The task of obtaining LR images from HR images is formulated with the following definition: where x and y are LR and HR images, respectively. M is a warping function, B is a blurring function, and D is a downsampling function. η stands for additive noise. As seen from (1), the SR task is an inverse problem that HR images can be estimated from LR images. From given HR images, LR images are simulated with a series of operations defined in Eq. (1). Afterward, the missing information between LR and HR images is being estimated using the relationships between them. Very recent algorithms try to recover these relations by exploiting a variety of learning algorithms. Likewise, the proposed model is also a learning algorithm that aims to comprehend these relationships between LR and HR images.

B. OUR PROPOSED METHOD
We provide theoretical background, implementation, and training details of our proposed algorithm in the next sections.

1) MOTIVATION
A convolutional neural network is comprised of convolutional layers with a particular activation function and parameters = {W, B}. Where W is a set of convolution kernels (filters), and B is a set of biases. At each layer, the input image x is convolved with a set of K kernels W = {W 1 ,W 2 ,. . . ,W K } and added biases B = {b 1 , b 2 , . . . , b K }. The convolution operation at each layer generates a new feature map x k . Afterward, these features are subjected to an activation function, which results in an element-wise non-linear transform. Such that, a = σ (W * x + b). Here, ' * ' and σ respectively denote the convolution operation and activation function.
The same process is repeated for x and at every convolutional layer l: Consequently, a convolutional neural network with an L number of layers is given as: where, W n represents the K number of kernels in layer n. Our proposed model, as a CNN, maps a given LR image x to estimated HR imageŷ = f (x; ) that is closed to the ground-truth image y. For the optimization of the network error over given a training dataset D x (i) , y (i) N i=1 , we define a loss function such that: where N is the number of training samples. In (4) we introduce the mean-squared-error (MSE) for computing the loss value by quantifying the discrepancy or distance between each predicted imageŷ and its corresponding ground-truth image y. MSE defined as follows: where P and R are pixel sizes in both direction, height, and width. So, it is required here to minimize the difference betweenŷ and y. That is, the network must learn to accurately predict the actual output y from given LR image x as much as possible.
Including a regularization term, we define an objective function such that; where r is a regularization term with weight λ > 0, and L is the loss function defined in (4). Now that our goal is to minimize the objective function J . In other words, we would like to find our network's parameters = {W, B}, such that make the objective function be the minimum. This optimization problem could be solved with the gradient descent algorithm.
Since our training set D is too large for computation, we use a mini-batch training method that makes our objective function as follows: Each time we draw M N number of samples (128 samples, in our case) from D, and compute L in the forward pass, and the gradient ∇L in the backward pass. The parameter update ∇(W, B) is found from the error gradient ∇L and regularization gradient ∇r. For updating the parameters = {W, B}, we used a well-known gradient-based optimization method, ADAM [57]. It is a first-order gradient-based optimization algorithm for stochastic objective functions.

2) IMPLEMENTATION OF THE NETWORK
We dominate our method as DECUSR (Deep Convolutional Network for Ultrasound Super Resolution) for brevity. It essentially consists of three different components: feature extraction block (FEB), repeating blocks (RBs), and upsampling layers. There are three different types of layers in the network: convolution, upsampling, and concatenation. We present the architecture in Fig. 1. Each convolution layer has 16 filters except for the last one, which has only one 1 × 1 filter. For a convolutional layer in our network, we can write (2) as follows since we use ReLU as the activation function: where k = 1 for the last layer and k = 16 for the rest. Traditional networks generally have a single upsampling layer in the case that the network upscales input image by itself. In contrast, others intake an LR image, which is already-upscaled by an interpolation method. The critical question is, at what stage and how to upscale LR image? E.g., directly upscaling after the input layer, or after a kind of feature extraction process. We think that there will be some amount of information loss in either way. Hence, we employed two separate upsampling layers in our network: feature upscaling layer L FUP and direct upscaling layer L DUP . L FUP enlarges the extracted features generated by FEB, whereas L DUP directly enlarges the LR image. We observed in our experiments that both upsampling layers positively contribute to the network.
Similar to the idea of densely connected layers [19], we adopted densely connected RBs to better propagate the information throughout the network. We use concatenation layers in RBs for dense connections. There are also three subsequent convolutional layers in RBs. The first and second convolution layers have 16 3 × 3 filters, whereas the last one has 16 1 × 1. A concatenation layer (L CONCAT ) combines the outputs of both the upsampling layer (L DUP and L FUP ) and the outputs of each preceding RB. The accumulated information by the help of the concatenation in the RBs enables the network to learn the non-linear transition from LR to HR images in an efficient and balanced manner. In this way, our model becomes more robust. Besides, it learns much better. In the next section, we explain the mechanism of RB with the concatenation in detail.

a: REPEATING BLOCKS AND CONCATENATION
Let X l and T l ∈ R wxhxk respectively be the output and the feature map (a tensor in our case) of th layer in a network. Here, w, h, and k ∈Z + , and respectively denote the width, height, and depth of the layer. Typically, each layer in ordinary CNN implements a non-linear transformation F(·). A traditional CNN deems the output of the th layer as the input of the ( + 1) th layer. Such that, the output of the th layer can be formulated as X l = F l (X l−1 ). Unlike ordinary CNN layers, a concatenation layer stacks its inputs into a single tensor in the same order. For a n number of input feature maps, the output of a concatenation layer can be written as X concat = F concat T wxhxk 1 1 , T wxhxk 2 2 , . . . , T wxhxk n n , which produces a new feature map T wxhx(k 1 +k 2 ...+k n ) .
The first layer in an RB is a concatenation layer, which stacks L FUP , L DUP, and the outputs of each preceding RBs in the given order. Namely, k th repeating block RB k is connected to L FUP , L DUP, and RB 1 out , . . . , RB k−1 out via its concatenation layer. We initialed the baseline model with three RBs and endeavored in our experiments to find out the optimum number of them. According to our evaluations, our network achieves the top performance with four RBs. We discuss this later in the next sections.

b: DISSECTING THE COMPONENTS
Before finding out how many RBs should exist, we examined the contributions of the components of the network. For this  purpose, we trained our baseline model (with 3 RBs) in the presence and/or absence of FEB, L FUP , L DUP , L CONCAT for a 2x scale. We give training graphs for each scenario in Fig. 2. The MSE loss of the baseline model in the absence of FEB, L DUP , and L CONCAT is shown with orange, green, and red colors, respectively. The blue line shows the baseline in the presence of all components. The loss is minimum when all components are present. The figure clearly shows that their presence positively contributes to the network. Table 1 also shows PSNR scores of each scenario obtained in TestSet-I. According to the results, L DUP slightly improves the network's performance (0.1 dB). PSNR score is 43.5736 (dB) in the absence of the layers L CONCAT and L DUP , whereas 43.8772 (dB) only in the absence of L DUP . These results mean that the concatenation layers considerably contribute to the network's performance (by 0.3036 dB PSNR). Likewise, PSNR score is 43.8147 (dB) in the absence of FEB. Therefore, it contributes to the network by 0.0625 dB PSNR. As a result, we see that each component contributes more or less positively to our network. Based on these observations, we incorporated all components into our network.

c: THE NUMBER OF REPEATING BLOCKS (RBS)
We constructed the baseline network with 3 RBs at the beginning of our experiments. After then, we trained the network by removing existing RBs or plugging new ones. We checked the network's performance after every addition or reduction. We stopped the procedure when the next two additions did not improve performance. In this way, we obtained six different models with a single RB to six RBs, since five and six RBs could not perform better than four RBs. Figure 3 illustrates the training loss in MSE and PSNR scores in TestSet-I by the different number of RBs. Using less than three RBs severely deteriorates the performance.
On the other hand, the network with four RBs converges rapidly in the second epoch and stabilizes the convergence throughout the training. Five RBs also exhibit a very similar learning curve as four RBs, but with slightly lower in the performance. The performance deteriorates noticeably with six RBs. Based on these results, we decided to employ four RBs in our network.

IV. EXPERIMENTS AND RESULTS
In this section, we discuss the details of the network's architecture, datasets, training, and evaluation.

A. DATASETS
We benefited from two different web sites to create training and validation sets and two separate test sets. In total, we downloaded 36149 B-mod ultrasound images from the former web site, http://www.ultrasoundcases.info. We excluded images that are not in good condition. From the remaining 29696 images, we composed the validation and test set 'TestSet-I' by randomly selecting a thousand images for each set, without replacement. Finally, the training set consisted of the remaining 27696 images.
The second test set, 'TestSet-II,' consisted of the images only from the following web site http://www.ultrasoundimages.com. We cropped randomly selected 500 images so that their size is 600 × 450 pixels. In this way, we eliminated and/or minimized unnecessary dark regions at sides, or marks printed by imaging devices. The training and validation sets consisted of the images collected only from the former website.

B. TRAINING
We constructed our model with KERAS. 2 It consists of a total of 39265 parameters. In III-B.2.c, we delineate the entire network architecture in detail. All convolution layers are initialized with Glorot uniform initializer. We employed a rectified linear unit (ReLU) as the activation function in the convolutional layers. Padding is set to 'same' so that the size of the output image does not change.
We obtained LR images dividing each training image by 255.0 after downscaling by bicubic interpolation. The images were cropped to make their size is multiple of the scale. We trained our model on NVIDIA GeForce RTX 2080Ti GPU at most ten epochs with batch size 128, learning rate 1×10.3, minimum learning rate 1 × 10.7, early stopping patience 5, and weight decay 1 × 10.6. The learning rate was halved after the training is not improved for three epochs. All training parameters are summarized in Table 3.
The size of input image patches is 16,11,8, and 4 for the scales 2, 3, 4, and 8, respectively. A similar approach has been applied in determining the stride values. We used stride values 6, 4, 3, and 1 for scales 2, 3, 4, and 8, respectively. In this way, we obtained approximately the same number of training image patches for each scale. The images in the training set were randomly reordered at the beginning of each epoch. Similarly, the image patches were also extracted in random order from each input image.

C. EVALUATION
The quality of HR images obtained from each method was quantitatively measured by the following image quality metrics: peak signal to noise ratio (PSNR), structural similarity index (SSIM) [58], multi-scale structural similarity index (MS-SSIM) [59], feature similarity index (FSIM) [60], mean absolute deviation (MAD), Haar wavelet-based perceptual similarity index (HaarPSi) [61], gradient magnitude similarity deviation (GMSD) [62], and visual fidelity information (VIF) [63]. Since PSNR is one of the most widely used measures for image quality assessment, we present the definition of PSNR as follows: where L is the maximum intensity value in images (255 for this case), and MSE is defined in (5). The performance of the proposed method is compared with Bicubic, Bilinear, Lanczos and Nearest interpolation methods, SRCNN, EDSR (baseline model with approximately 1.5M parameters), and the method proposed in [3]. We shortened the name of this method as RWRUSR to easy reference. The benchmarks were performed for all scales.
We did our best to obtain the best results from RWRUSR. Since the authors did not mention how they upscaled the LR image before constructing the design matrix in the regression procedure, we tried three different ways of upscaling the LR image in each direction (horizontal or vertical) for each pass of regression. In the first approach, we just repeated existing rows and columns in the new rows and columns, respectively. In the second approach, instead of repeating the same values, we fill in the average values of adjacent rows or columns. In the last approach, we used the bicubic interpolation method for upscaling LR images. Since it offered much better results than others, the first technique was employed for the scales 2, 4, and 8. The authors state in their work that magnification by multiples of two can be yielded by repeatedly implementing the algorithm. Namely, the authors used the algorithm only for the magnification factors that are multiples of two. Therefore, for scale 3, we introduced the bicubic interpolation method for upscaling LR images since it yielded better results than the first two approaches.

D. RESULTS
We evaluated the performance of each method in quantitative and qualitative terms. We present in Table 4 the quantitative results of the methods in both test sets. The highest score is marked with red color, whereas the second one with blue. According to most IQMs, our method outperforms other algorithms by a noticeable margin at all scale factors. The secondbest performance is mostly achieved by EDSR in TestSet-I, and by SRCNN in TestSet-II. Except for FSIM measure at scales 3, 4, and 8, our method achieves the highest scores in TestSet-II as well. SRCNN surpassed EDSR in this test by mostly achieving the second-highest scores. In both tests, no method other than DL methods ranked the top two. Experiments show that Lanczos interpolation outperforms other interpolation methods. We observed in our experiments that EDSR yields a very low PSNR score at each scale. To us, this may be due to the use of 1 loss. The authors trained EDSR with 1 loss, instead of 2 loss, which maximizes PSNR, for the reason that it offers fast convergence. Therefore, it yields too low PSNR score compared to others. However, this does not apply to other metrics. It mostly offers the second-best performance in other IQMs in TestSet-I. It even achieves the highest FSIM score in TestSet-II for scales 3, 4, and 8, as we mentioned above.
For visual evaluation, we represent these results graphically in Figs   mance ranking of the algorithms in both test sets can easily be seen in both figures.
To make a qualitative comparison between the methods, we present in Figs. 5 and 6 the same region of the HR images reconstructed by each method from a given LR image in each test set. The full-size original images with their ground-truth patches are also given on the left. We represent the PSNR and SSIM scores of each method below the images. It is very clear from the figures that the best quality images, in terms of HVS, are the ones reconstructed by the proposed method. Our method very successfully preserved high-frequency details. The lowest quality images were yielded by Nearest interpolation, and then RWRUSR. The aliasing artifacts yielded by both algorithms may easily be seen in all figures. Particularly at large scales. The images produced by EDSR have some pixel corruptions, especially in dark areas. In contrast, our method yields very good resolution and sharpness. The larger the scale, the higher the difference in image quality between our method and others.
We also checked the statistical significance of our method's success against other methods by one-way analysis of variance (ANOVA) and Welch's t-test at 0.05 level of significance. We performed 224 measurements (4 scales x 8 IQMs x 7 methods) in total for each test set. We present the full list of p values of the measurements in Table 1 Appendix B.
The p values greater than 0.05, which indicate no significant difference between our method and others, are marked in red. Respectively, 211 and 208 measurements performed in TestSet-I and TestSet-II, confirmed the significance of our method. Most of the remaining 29 analyses, which are showing no significance, are the ones performed against EDSR, and then SRCNN. Almost half of these analyses were VOLUME 8, 2020 FIGURE 5. Visual comparison of an HR image reconstructed by each method. Values below the images, respectively, designate PSNR and SSIM scores. Our method produces much better images in terms of the human visual system. performed in FSIM, and the rest are mostly in VIF and SSIM measures. All measurements in HAARPSI and MS-SSIM confirm our method's significance. Apart from only one, two, and three measurements, respectively, all measurements in GMSD, MAD, and PSNR, also confirm the significance of our method.
In addition to statistical analyses, we present the run-time performances of all methods in Fig. 4. The figure reveals that  our method surpasses other algorithms with moderate runtime overhead.

V. CONCLUSION
We implemented in this study a novel deep learning approach to improve the low resolution and quality of ultrasound images. We achieved SR on the B-mode ultrasound images, which is the last stage of USI, rather than obtaining SR in the pre-processing phase or early stages of the post-processing phase of USI.
The proposed approach utilizes a deep learning algorithm that learns from samples the non-linear relationships between LR and HR ultrasound images. We trained our network with a huge training set of final B-mode ultrasound images to TABLE 5. Statistical significance analysis of our method against others in terms of ANOVA and WELCH'S t-test at 0.05 level of significance. We round of the p-values up to three digits since it is sufficient. p-values indicating no significant difference are marked in red. Respectively, 211 and 208 out of the 224 measurements in TestSet-I and TestSet-II confirmed the significance of our method.
obtain SR for the scale factors 2, 3, 4, and 8. We benchmarked our method with four well-known interpolation methods and two state-of-the-art DL networks. We used eight IQMs for quantitative evaluations. We also evaluated the quality of the reconstructed images by each method in terms of HVS.
We created two separate large test sets composed of images from two separate sources. The experiments showed that the proposed model surpasses other methods by a significant margin. Our algorithm achieved much better scores than others almost in all IQMs at each magnification factor. We also confirmed the statistical significance of our method by two statistical analyses. Besides, we demonstrated the success of our proposed method in producing high-quality images in terms of HVS. It reconstructs very good images in many aspects, such as resolution, edge-preserving, and sharpness.
As a conclusion, we showed with this study that our proposed approach exhibits excellent performance in qualitative and quantitative terms. Our method maintains high-quality ultrasound image reconstruction. Since it is software-based, it can be easily adapted to other image processing algorithms or integrated into imaging devices without the need for any hardware modifications or expansions.

APPENDIXES APPENDIX-A
Visual representations of method's scores in TestSet-I and TestSet-II.

APPENDIX-B
Statistical significance analysis of the proposed method with others.

VI. ACKNOWLEDGMENT
The authors would like to thank Dr. Taco Geertsma, who is the founder of the website www.ultrasoundcases.info, for granting permission to use the images from the website. They would like to thank the owner of the website www.ultrasoundimages.com for sharing sample ultrasound images.