Noise-Robust MRI Upsampling Using Adaptive Local Steering Kernel

Upsampling and denoising of magnetic resonance images are conventionally performed separately, which would introduce unwanted artifacts such as blurring. To address this problem, we propose an innovative adaptive interpolation framework to achieve simultaneous image upsampling, denoising, and detail enhancement. First, local steering kernel (LSK) function is leveraged to adapt the interpolation weights according to geometric structures in the magnetic resonance (MR) images. An adaptive sharpening of the LSK weight matrix and a Rician bias correction are then adopted to remove Rician noise and enhance fine details. In this regard, the adaptive LSK extends the zero-order point estimation framework to higher orders of regression, and therefore facilitates edge preservation and detail reconstruction. The post-processing Rician correction avoids the bias caused by the asymmetry of Rician noise distributions. Experimental results using both real and synthetic clinical MR cranial images (with and without noise) demonstrated that our algorithm produced better reconstruction results than several traditional interpolation-based upsampling methods, including nearest neighbor (NN), non-local means (NLM), self-learning super resolution (SLSR), Gaussian process regression (GPR), and even comparable to four deep-learning-based methods but with less data requirements and computational complexity. The proposed technique resulted in PSNR and SSIM values were ~3%–16% higher than any of the other traditional algorithms tested, and our method recovered more clear textures from noisy images compared with deep-learning-based methods. As such, the presented technique is a viable new approach to MR upsampling, particularly for noisy images.


I. INTRODUCTION
One of the primary objectives of medical imaging is the automated extraction and modeling of 3D anatomical regions of interest (ROIs) within the human body [1]. A variety of imaging modalities have been developed to achieve this goal, including computed tomography and magnetic resonance imaging (MRI). MRI is capable of achieving excellent soft-tissue contrast and is thereby effective for viewing differences between normal and diseased tissue. Higher res- The associate editor coordinating the review of this manuscript and approving it for publication was Nadeem Iqbal. olution images provide a more comprehensive understanding of anatomy at the cost of reduced signal-to-noise ratio (SNR) and increased imaging time [2]. However, clinically, an MRI scan is often fast because long scan time increases costs, leads to patient discomfort, and induces motion artifacts in images [2]. As a result, the resolution of a clinical magnetic resonance (MR) image is limited.
Conventional interpolation methods, such as spline interpolation, have been used to increase MR image resolution. More specifically, a voxel in a high-resolution (HR) image is estimated using a weighted averaging of several sampled voxels. The coefficient for each sampled voxel reflects the similarity between sampled and target voxels. Commonly, sampled voxels are selected within a spatial neighborhood of the target voxel, and thus coefficients are typically fixed as a function of the spatial distance between target and sampled voxels. Due to the simplicity, images reconstructed using these traditional interpolation methods often suffer from blurred edges and stair-casing artifacts.
To solve these two problems, several researchers have proposed methods that performed adaptive interpolation [3]- [10]. In the pioneering work of Manjón et al. [3], sampled voxels were selected from a large cube centered on the target voxel. Interpolation coefficients were then determined by intensity distances between two 3D image patches located around the target and sampled voxels. Rather than using fixed coefficients as conventional interpolation methods do, Manjón's method used adaptive coefficients that were more flexible to image structures and thereby avoid blurring edges in the interpolated results. Several variant techniques were subsequently proposed, which all focused on coefficient refinement. For example, Plenge et al. [7] took advantage of in-plane patches having a higher spatial resolution than those resulting from slice-selection. They calculated coefficients by measuring the 3D patch similarity in the in-plane directions, so as to estimate voxels in slice-selection. Within a similar notion, high-resolution images acquired with other MRI modalities have also been leveraged in several approaches to calculate interpolation weights so as to better reflect image structure [8], [9].
Besides image interpolation, deep learning has witnessed a tremendous amount of attention over the last few years in the field of image processing, because of the high representational capacity. With extensive parameters and a good learning process, deep-learning-based models have the ability of fitting on a large number of training data and exploiting the underlying structures of natural images. Dong et al. [22] introduced CNN into the SR task and proposed the SRCNN model that was composed of a three-layer network to learn the mapping from LR images to HR images. This model achieved much better performance compared with the traditional algorithms. Kim et al. [24] proposed the VDSR model that used a very deep network with 20 layers which produced improved performance compared with SRCNN. A main contribution of this method was to employ residual learning which encouraged a fast convergence rate in the training process. Tai et al. introduced recursive blocks in DRRN [25] and memory block in Memnet [31] for deeper networks. Haris et al. [32] developed a novel architecture which was named as DBPN. This model exploited iterative up and down sampling layers, providing an error feedback mechanism for project errors at each stage. DBPN improved super-resolution performance, yielding superior results and in particular establishing new state-of-the-art performance for large scale factors such as ×8 on multiple datasets. Zhang et al. [33] proposed the channel attention mechanism to build a deep model called RCAN to further improve the performance of SR. An image super-resolution feedback network (SRFBN) [34] was pro-posed to redefine the low-level representation with high-level information, which was realized by the hidden state in the constrained recurrent neural network (RNN).
Although their performance is state-of-the-art, these deeplearning-based methods also have some disadvantages. Firstly, the deep-learning-based methods often require a large number of training data sets, while a large MRI dataset is hard to obtain. Second, the generalization capabilities of deep-learning-based methods are also very limited, and they require a close distribution of training data and testing data. On the contrary, the conventional image processing methods often have less data requirements and wide range of applications. Last but not least, there is a need for pre-denoising before upsampling noisy images, so the edges of final results are blurry or smooth. In this regard, this paper focuses on the conventional method in MRI image upsampling.
For interpolation methods, they cannot consistently reconstruct high-frequency details from low-resolution (LR) images. This is because interpolation-based algorithms all belong to the framework of zero-order regression estimation [10]. To this end, a regression-inspired upsampling method using second-order polynomials was proposed in our previous study [10]. However, both adaptive interpolation methods and our previous work fail to account for the noise present in MR images, mainly produced by echo planar imaging [11]. Image denoising is typically used in these methods to remove noise beforehand. It has been demonstrated that a simultaneous interpolation and denoising eventually outperforms an asynchronous framework [10]. Motivated by this, we propose to develop a new unified MR image upsampling framework that simultaneously performs HR image reconstruction, denoising, and high-frequency detail enhancement, and thereby leads to more accurate treatment contours. Unlike the abovementioned adaptive interpolation methods using intensity distance to calculate interpolation coefficients, the proposed method refined the adaptive interpolation coefficient by comparing patches' structure difference. Moreover, although the proposed method was also a zero-order regression-based method, the incorporated adaptive weight sharpening strategy enabled detail preservation.
The main contributions of the proposed method are the following four aspects: 1) With the assumption that input image is clean, most image upsampling methods have to include an additional step to remove image noise. But this denoising step blurs fine image details as well. To this end, the proposed method presents a unified framework that simultaneously carries out upsampling, denoising, and detail sharpening.
2) Compared with deep-learning-based methods, the proposed method recovers more clear textures from noisy images with less data requirements and computational complexity.
3) Local steering kernel (LSK) is adopted to help interpolation weights be sensitive to local structure of images. 4) An adaptive sharpening technique is proposed to extend the traditional weighted-averaging interpolation framework (zero-order estimation) to a two-order estimation framework. VOLUME 8, 2020 Besides, using adaptive weight sharpening enforces the distribution of interpolation weights to be consistent with local image structure, which helps denoise background and sharpen fine details. 5) Rician noise correction is leveraged to correct the bias caused by the asymmetry of Rician distributions.
Our method is described in the following three subsections. Specifically, details of the LSK and adaptive sharpening methods are elaborated in section II-A and II-B. In section II-C, a Rician noise correction step is introduced.
The remainder of this paper is structured as follows. Section II details the proposed algorithm. Experimental results for several MR phantoms and real data are provided in Section III, together with visual and quantitative comparisons with other methods. Finally, the conclusions are described in Section IV.

II. METHODS
We have noticed that both image denoising and interpolation could be realized by the weighted averaging framework: where y i is the intensity value of voxel i that needs to be estimated, is a restricted search volume surrounding the voxel i, and y k is the sampled voxel within the search zone, w measures the similarity between voxels k and i respectively. Z is a normalization constant. Intuitively, has a determining influence on this intensity estimation.
As for denoising, averaging voxels with a similar structure can reduce noise without compromising details [12]- [14]. More specifically, we have found in our pilot study [14] that when denoising voxels in flat regions, the weights used for averaging should be nearly isotropic since using sampled voxels in all directions facilitates noise removal. Meanwhile, when denoising voxels in textured regions, weights used for averaging should be anisotropic so that details are preserved by using sampled voxels along a particular direction. In other word, weights should be adapted to image structures so that noise removal and detail preservation are balanced in the context of denoising.
On the other hand, coefficient computation is also a fundamental issue for interpolation-based MR image upsampling. Previous studies [8], [10] have advocated that using highorder patch statistics in deriving interpolation weights can be a big help for reconstructing high frequency details. In other words, weights should be adapted to high-order patch features so that details are preserved in the context of interpolation.
Based on the above analysis, we propose a modified weighting strategy to unify MR image upsampling and denoising under the framework of weighted averaging. More specifically, the coefficients should be sensitive to image structure and reflect the similarity of high-order patch features. To this end, a specific kernel based on image structure tensor was adopted in the proposed method to compare highorder patch structure. An adaptive sharpening scheme was also used to ensure the coefficients responded appropriately to image structures. Finally, a Rician correction step, which was conventionally used in several MR image denoising methods [11], was employed. Fig. 1  LSK [26] is an image feature that describes the local structure information of the image and has good robustness to noise. The basic idea of LSK is to measure both geometric and photometric local similarity of a pixel to its neighbors. It analyzes the pixel value differences based on estimated gradients in a local window. It can robustly estimate the intrinsic image structure and address pixel-level image noise and ambiguity [27]. For example, patches containing a flat region, textural clutter, or structural part can behave very differently for LSKbased descriptors [26]. In addition, extensive experiments have shown that feature descriptors using LSKs are robust to brightness variation and noise interference through estimation of the local intrinsic structure. The effectiveness of the LSK has been verified in various applications, e.g., image reconstruction [28], image deblurring [15], object detection [29] and single image super-resolution [30].
By analyzing the intensity differences voxel-wisely on estimated gradients, LSK captures structure information and uses it to determine the shape and size of a canonical Gaussian kernel. Mathematically, this kernel is represented as: where C i denotes a 2 × 2 symmetric covariance matrix centered at voxel x i , which is estimated from a collection of first derivatives along the vertical and horizontal directions. The term x − x i represents the spatial distance between sampled (x i ) and target voxels (x); h is a smoothing parameter that controls the decay of the Gaussian function. It is important to note that the matrix C i is an image structure tensor. The specific formation of this matrix is described as: where N (x i ) represents a search window centered at x i and m k is the voxel located inside N (x i ). The terms z 1 ( * ) and z 2 ( * ) denote first derivatives along the vertical and horizontal directions, respectively. LSKs are able to reflect local structure because they are computed using local statistics. Fig. 2 illustrates LSK shape for neighboring sampled voxels for a variety of image structures (i.e., texture, flat, strong and weak edge) in a T1 image with different noise levels (0%, 5%, and 9% of maximum intensity). The yellow dot inside each red box denotes the target voxel and the rest voxels inside the red box are sampled voxels. The blue box depicts the LSK values of the sampled  voxels. In the blue box, the dark red color indicates that sampled voxels in this location have a comparatively large weight (near 1) in the weighted averaging, while the blue color represents corresponding sampled voxel with a smaller weight (near 0). As is clearly shown in Fig. 2, LSKs reliably capture local data structures for a clean image even in regions of complex texture.
However, it is also important to note that, in the noisy case, the shape and orientation of LSKs are quite deviated from the noiseless case. This is possible due to the degradation of image structures in the presence of noise. Therefore, for noisy images, LSKs must be refined by certain strategies to reflect the latent image structures that have been corrupted by noise.

B. ADAPTIVE WEIGHT SHARPENING
Although LSK is not capable of reflecting latent image structures for noisy images, it, as shown in Fig.2, still appears to be anisotropic in textured regions and isotropic in flat regions among these images. But compared to the distributions of LSK for a clean image, we find that such distribution tendency is actually weakened in noisy images. Motivated by these two findings, we propose reinforcing the tendency of LSK distribution on noisy images by adaptive sharpening in order to mimic the LSK distribution on clean image. In other words, for LSKs in flat regions, the spread of LSKs becomes wider and essentially isotropic; for LSKs in textured regions, the spread of LSKs shrinks to the texture outlines.
In order to achieve this adaptation, adaptive sharpening was applied to LSK values. More specifically, after collecting LSKs into a weighting matrix W i , an adaptive sharpening was carried out on W i by adding or subtracting a fraction of the high-pass filtered weight matrix back to the same matrix again:Ŵ Here the matrix H represents a high-pass filter and λ is a scalar used to modulate sharpness. As indicated in [17], the sign λ of determines whether the input (W i ) is sharpened (positive) or smoothed (negative). Since a smoothed W i is achieved by taking out the high frequency information of W i , the spread of its corresponding LSKs will be more isotropic than the original one; similarly, a sharpened W i implies that the spread of its corresponding LSKs will be biased towards a particular direction. Therefore, according to the previous analysis, for estimating voxels in flat regions, W i ought to be smoothened; while for estimating voxels in textured regions, W i ought to be sharpened. In this way, the adaptively sharpened weight matrix adapts better to the latent image structures than the original one does. At first, before performing adaptive sharpening on weight matrix, each image voxel ought to be classified according to the type of its neighboring patch structure (i.e., flat, weak, and strong texture). In this paper, image structure tensor (C i in Eq. (3)) was used to discriminate textured voxels from flat voxels. The relative discrepancies between two eigenvalues of C i reflect how strongly the distribution of gradients in an image patch is biased towards a particular direction [13]. In other words, smaller eigenvalue differences exist for voxels in smooth regions, while larger differences exist for voxels in VOLUME 8, 2020 textured regions. Voxel classification can thereby be achieved by examining eigenvalue differences for each voxel. More specifically, if s 1 and s 2 (s 1 ≥ s 2 ) denote two eigenvalues of C i, we use T = s 1 − s 2 to reflect the degree of texture for every voxel. Voxels with different texture magnitudes can be classified by analyzing the cumulative histogram of in T an MR image slice: Here T (i, j) is the T value of the voxel at location (i, j); t 1 and t 2 are the bin values corresponding to 80% and 50% in the cumulative T histogram, respectively. The terms c 1 , c 2 , and c 3 represent strongly-textured, weakly-textured, and smooth regions, respectively. Fig. 3 presents the results of voxel classification for a T1 image using Eq. (5). It is evident that voxel classification is consistently the same in all images, regardless of whether they are heavily contaminated. This result indicates that image structure tensors provide a reliable classification system in the presence of noise. Second, λ in Eq. (4) should be carefully designed in our method to adaptively enhance weight matrix. Based on the previous analysis, this parameter is empirically selected in accordance with region type. Besides this, the problem of voxel misclassification, which is inevitable in highly corrupted images (for example, in Fig. 3(c) there are green dots present in the red background, which indicates that the noisy background is misclassified into the texture), is also considered in the process of λ selection. More specifically, we used a positive λ in classes c 1 and c 2 to preserve and sharpen details and a negative λ in class c 3 to remove background noise. Additionally, a sigmoid function was used to refine λ in c 2 to remedy misclassification. With Q i (i = 1, 2, 3) denoting the median value of T in each class type, the specific form of λ is derived as follows: where f ( * ) is a typical sigmoid function that returns a small value when the input is low: In Eq. (6), T * is a normalized T for voxels classified as c 2 . We observed that the misclassified voxels in c 2 all have small T values near t 2 . As such, we used a sigmoid function to assign these voxels to λ values near 0. This was done to prevent improper sharpening or smoothing caused by misclassification.
An illustration of the adaptive sharpened LSK (Eqs. (4)-(7)) for the same regions in Fig. 2 is presented in Fig. 4. These refined LSKs coincide better with local image structures and are more adaptable than those of Fig. 2. For example, in the flat region, the LSKs are wide and essentially isotropic. In the edge regions, the LSK shrinks to accurately depict the edge outline. Using the refined LSKs facilitates noise removal and edge preservation, since samples with similar geometric structure are more likely to be leveraged in a weighted averaging framework.

C. RICIAN CORRECTION
It is worth noting that the weighted averaging framework is designed for additive Gaussian noise. Nevertheless, MRI image is commonly believed to be governed by noise with Rician distribution [18]. Directly applying this framework to MRI datasets would cause biasing because of the asymmetry of Rician distributions [11]- [18]. To this end, a post-processing bias correction introduced in [11]- [18] was carried out in the proposed method. The final voxel intensity estimation was calculated as: where z (x i ) represents the estimated intensity value of a voxel x i in the SR image, y ∈ N (x i ) denotes x i 's neighboring sampled voxels, andŴ i is a matrix comprised of the sampled voxels' corresponding adaptively sharpened LSK values, sum( * ) is an operator denoting the summation of matrix elements, and σ is the standard deviation of MR image noise.

III. EXPERIMENTS AND DISCUSSIONS
The proposed algorithm was compared with other state-ofthe-art upsampling methods using both synthetic and clinical MRI datasets. Two T1 MR phantoms from Brainweb (simulated images, http://brainweb.bic.mni.mcgill.ca/brainweb/ selection _normal.html) and the Human Connectome Project (HCP, real images, https://db.humanconnectome.org/) were utilized to evaluate reconstruction performance for the proposed technique in both cases with and without noise. Moreover, a real clinical MRI dataset was also used to evaluate the robustness of the proposed method for a realistic scenario, where subjects gave informed consent to participate and recordings were used for study purposes. Note that the proposed method made no prior assumption of the image, it could thereby be applied to other anatomical regions such as knees, liver and heart. The LR volumes used in this study were generated by blurring and downsampling. More specifically, the blurred volumes were generated by convolving an HR volume with a 3D Gaussian kernel of standard deviation 0.8 (in voxel space) along each dimension. The blurred volume was subsequently downsampled to a lower voxel resolution at 2 × 2× 2 mm 3 or 3 × 3× 3 mm 3 .

A. ALL COMPETING MODELS
In this study, nearest neighbor (NN) interpolation, non-local means (NLM) based upsampling [3], self-learning super resolution (SLSR) [10], Gaussian process regression (GPR) based upsampling [19], deep-learning-based methods (SCN [22], VDSR [23], DRRN [24], and SRFBN [34]) were employed for performance comparisons. GPR method, NLM method and SLSR method are both interpolation-based methods. The former two are zero-order based and the latter is second-order based. As a pioneer CNN model for SR, super-resolution convolutional neural network (SCN) predicts the nonlinear LR-HR mapping via a fully convolutional network, and significantly outperforms classical non-DL methods. Kim [24] et al. proposed a super-resolution method using very deep networks (VDSR). This work use residual-learning and extremely high learning rates to optimize a very deep network that fast maximizing the convergence speed. In DRRN [25], an enhanced residual unit structure is recursively learned in a recursive block, and several recursive blocks are stacked to learn the residual image between the HR and LR images, and then added to the input LR image from a global identity branch to estimate the HR image. SRFBN [34] introduced a feedback module that shared weights by using recurrent network. Besides, a curricular approach was introduced to handle different tasks. Variable parameters used in the latter nine methods were selected per author suggestions. The only parameters used in the proposed method are the size of 3D patches, in which the sampled voxels are determined, as well as the bandwidth in Eq. (2) that is used to calculate the LSK: the 3D patch size was fixed to be 3 × 3 × 3 and the bandwidth was selected according to noise level.
In general, NLM, GPR, and the proposed method belong to a zero-order estimation framework. SLSR belongs to a second-order estimation framework, which has been shown to be beneficial for detail preservation [10]. In addition, GPR and the proposed method require no additional denoising, while NN, NLM, SLSR, SCN, VDSR, DRRN, and SRFBN require a pre-denoising step. As a result, they were used to perform on the denoised LR volumes. This was achieved by filtering the original noisy LR volumes with the APW-NLM method [14], which employs an adaptive bandwidth and patch size.
The performance of the proposed method was evaluated by comparing reconstructed HR volumes with the original HR volumes, as well as those reconstructed using other methods. Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [23] were used to quantify agreement between images. A high PSNR score indicated that a recovered MR image contained little distortion and low noise. An SSIM value near 1 implied reconstructed MR images were close to the ground truth.
The utility and novelty of the proposed algorithm lie in its ability to process corrupted MR images without additional  denoising. In section III-B, an ablation study was carried out to demonstrate the effectiveness of the combined adaptive sharpening and Rician correction. In section III-C, an experiment was conducted with corrupted MR images to test how well the proposed approach performed given different noise levels. In addition, an experiment on noise-free MR images was also carried out to demonstrate the detail preservation capabilities of the proposed method (i.e. a sharp contour is obtained using the proposed method), and it is described in the following subsection. Last, an additional experiment on real clinical datasets is described section III-E.

B. ABLATION STUDY
In order to investigate the effectiveness of the combined adaptive sharpening and Rician correction, an ablation study is carried out where Rician correction and adaptive sharpening are respectively removed.
Images with noise level at 3% of the maximum intensity is used and the upscaling factor is set to 2. As shown in Table 1, the adaptive sharpening technique is a key factor in the performance improvement. Moreover, it is important to note that post Rician correction (the second column in Table 1) has a lower PSNR value than LSK-based upscaling framework (the first column in Table 1). This is possibly due to the reason that LSK is not capable of reflecting latent image structures for noisy images, and therefore leads to an inaccurate weight matrix. In this regard, the subsequent Rician correction would worsen the weight bias.

C. MR IMAGES WITH DIFFERING NOISE LEVELS
Four different noisy T1 volumes (voxel resolution 1 mm 3 , 180 × 216 × 180 voxels) from a Brainweb phantom were used in this experiment. They were corrupted by Rician noise at levels of 3%, 5%, 7%, and 9% of the maximum intensity respectively. The simulated LR data (voxel resolution 2 mm 3 ) in these four noisy volumes were respectively upsampled to 1 mm isotropic resolution using the proposed technique and various comparison methods. Fig. 5 compares the reconstruction results obtained using these different methods for each noise level. Local regions of interest are also presented to provide a better comparison. Table 2 summarizes the quantitative comparisons. We observe from Fig. 5 that the proposed method reconstructed more vivid details across all noisy images. Although it was directly applied to noisy LR images, the proposed method achieved a denoising effect comparable to that of comparison methods processing denoised LR images. On the other hand, fine details of the cortex and tissue boundaries were more apparent with the proposed method, while comparison methods which relied on pre-denoising tended to produce blurry or smooth edges. Moreover, the proposed technique also produced better interhemispheric fissures than the noise-robust methods like GPR method. It is important to note that although SCN, VDSR, DRRN, and SRFBN were able to reconstruct high-frequency image details and achieved state-of-the-art performance in several image upsampling scenarios, they are not suitable for super-resolving noisy images. For instance, fine details they reconstructed were blurred by the subsequent denoising method, so the overall performance of these deep learning-based methods (denoising+SCN, denoising+VDSR, denoising+DRRN, and denoising+SRFBN) deteriorate and even becomes visually comparable to GPR, a noise-robust zero-order based interpolation method. The degradation of these deeplearning-based methods on noisy images implies that it is important to incorporate image upsampling and denoising, so as to obtain fine details. Table 2 demonstrates that the proposed method also achieved higher quantitative values than the other algorithms we tested. Specifically, its average PSNR gain was 0.92 dB, 1.00dB and 0.48dB higher than that of the second-best method with noise level at 3%, 5%, 7% of the maximum intensity, respectively. And for noise level at 9% of the maximum intensity, our method achieves slightly lower PSNR than the compared deep-learning-based methods (VDSR, DRRN, and SRFBN). Regarding SSIM, the proposed method also achieved the highest SSIM scores comparing with all conventional methods, which indicates that our method faithfully recovered fine details. But the proposed method has lower SSIM score than the compared deep-learning-based methods. It is not surprising that our method, an interpolationbased method, gives slightly inferior performance over the deep-learning-based methods, since their models are deep enough to approximate the non-linear mapping between LR and HR images. However, these deep-learning based methods require much larger dataset e.g., DIV2K contains 1000 RGB images with 2K resolution, more complex training processing and more time-consuming than our methods. On the contrary, our method uses the input LR image only and could be generalized to any image.

D. MR IMAGES WITHOUT NOISE
We have observed in Fig. 5 that the upsampling results from other compared methods are blurred, but this degradation could be possibly caused by the pre-denoising step. Hence, in order to make a fair comparison of detail preservation ability for these upsampling techniques, we devised an additional experiment in which noise-free MR images were used.
These MR images without noise were downloaded from the Brainweb phantom and corresponding LR images (downsampled to 3 × 3 × 3 mm3) were generated using the described workflow in section III-A. As shown in Fig. 6, both GPR and NLM resulted in sever ghost artifacts along edges. This was caused by limitations of the zero-order regression-based framework, which cannot consistently reconstruct high-frequency details [10]. Conversely, the SLSR method, a second-order regression-based framework, and SCN, the model in which was trained from thousands of image patches via neural network, achieved visually superior performance near edges. Although the proposed method includes a zero-order regression-based framework, the resulting contours were comparable to that of SLSR. This confirms the adaptive weight sharpening strategy embedded in the proposed technique not only helps to remove noise, but also facilitates detail preservation.

E. REAL DATA
The proposed method was applied to two real clinical data sets and its performance was evaluated both quantitatively and qualitatively. The first was a T1 volume (260×311×260 voxels, 1×1×1 mm 3 resolution) from the HCP dataset, which contained apparent noise. It was acquired on a Siemens 3T scanner and processed by a structural pre-processing pipeline that included spatial noise removal, surface generation, crossmodal registration, and alignment in standard space [21]. The second was a T1 volume (256 × 256 × 78 voxels, 2 × 2 × 2 mm 3 resolution) acquired on a GE MR750 scanner, which contained little noise.
The first T1 volume was downsampled to a voxel resolution of 2 × 2 × 2 mm 3 and then upsampled to 1 × 1 × 1 mm 3 using the proposed technique and other conventional methods. Fig. 7 compares the reconstructed results for a sample slice processed using these methods. We observe the ground truth still exhibits considerable noise despite the preprocessing step in generation of this dataset, which indicates noise cannot be neglected in MR image upsampling. Although the blurring procedure involved in LR image generation may somewhat mitigate noise effects (i.e., the NN result included less noise than the ground truth), the other interpolation algorithms tended to amplify background noise. This excluded GPR and the proposed method, which reduced noise. Using an extra denoising step, NLM, SLSR and SCN could effectively remove noise, but fine details were also washed out. On the other hand, the proposed method achieved better detail preservation and edge reconstruction, due to higher contrast in these texture regions.
The second T1 image was directly upsampled to 1 × 1 × 1 mm 3 using SCN, SLSR, and the proposed method. It is evident from Fig. 8 that the proposed method produced reconstruction results comparable to the other two methods (in terms of detail preservation) when dealing with MR images containing low noise.

F. COMPUTATIONAL COMPLEXITY
As outlined in Fig. 1, the major computational cost of the proposed method comes from these four parts: learning the local steering kernel, computing the weight matrix, adaptive weight sharpening, and Rician noise correction step. Given an image with N pixels and the local window radius r, it takes about O(Nr 2 ) to learn the local steering kernel (including estimation of the gradient covariance matrix and computation of the weights according to Eq. (3) and (2) respectively). And then, it takes about O(N) to calculate the weight matrix. Regarding the adaptive weight sharpening process, two steps were involved: voxel classification according to the type of its neighboring patch structure, and adding or subtracting a fraction of the high-pass filtered weight matrix back to the same matrix. The computation complexity of these two steps

G. DISCUSSION
From the above experimental results, we can see that the proposed method achieves simultaneous MR image upsampling, denoising, and detail enhancement, and it outperforms other state-of-the-art image upsampling method when dealing images with noise. In general, our method contains several techniques, including LSK, adaptive sharpening on LSK and Rician correction. To discuss whether our noise robustness benefits from the proposed adaptive sharpening technique, the following experiment was carried out: the noisy LR image from HCP dataset (used in Fig. 7) was respectively processed by the proposed method with and without adaptive sharpening on LSK, and the corresponding results were shown at Figs. 9(b)-(c). It can be clearly observed that without LSK sharpening technique, the proposed method still produced noisy upsampling result. Moreover, without LSK sharpening, the resulting image appeared blurry near edges. These two findings suggested that the proposed adaptive sharpening technique contributes to the simultaneous denoising and detail enhancing in our method. As shown in Table 2, when processing noisy images, our proposed method can get better results than the deep-learning-based method. For the deep-learning based methods, we firstly perform a denoising step on a noisy image, and then use a deep-learning-based method to upsample it. We can observe from Fig. 7 that using the two-step framework cause the edges of the reconstructed SR image blurred, and the details is not as good as the proposed method, which is a simultaneous image resolution enhancement and noise removal. However, Fig. 6 illustrates that when processing noise-free images, the deep-learning-based method can obtain higher PSNR. This make sense since deep-learning-based methods use much more high-resolution images to learn the relations between LR-HR image patches.

IV. CONCLUSION
In this paper, a unified interpolation-based SR method was proposed to reduce noise during MR image upsampling. By observing that both image interpolation and denoising essentially belong to a weighted averaging framework, we conducted simultaneous image upsampling, denoising, and detail enhancement by carefully devising the weights used in the averaging process. First, the LSK was employed in weight computation to implicitly capture local geometric structure. Additionally, an adaptive sharpening of the weight matrix was integrated to enhance detailed areas and smooth flat areas. Finally, by removing Rician bias from averaged squared intensities, we adapted the averaging process to data corrupted by Rician noise.
Experiments were conducted on both simulated and real cranial MR datasets and the upsampling results, achieved using various methods, were compared both quantitatively and qualitatively. Experiments with noisy MR images demonstrated that the proposed noise-robust interpolation method produced less blurry edges than a traditional two-step upsampling framework (upsampling and denoising), although the utilized upsampling method in this two-step framework was devised for detail preservation. Additionally, although the proposed method was a zero-order regression-based, it succeeded in recovering fine brain structure, comparable to SLSR, a second-order regression-based, in noise-free MR images. These two findings confirm the proposed method successfully adapted to spatial MR data structures and was highly robust in the presence of noise.
Currently, the proposed method is mainly developed for a single modality MR image upsampling. In the future, we plan to use information from high-resolution images acquired in other modalities to further refine interpolation coefficients, and thereby improve upsampling quality.