SAR Analysis for Real Video Super-Resolution Improvement

This paper presents a simultaneous autoregressive (SAR) analysis method to describe the unknown signal-to-noise ratio (SNR) and the texture features of low-quality real video frames when ground-truth images are not available. Real video images degraded by factors such as electronic noise, oversaturated pixels, motion blur, and compression artifacts often lead to poor motion registration estimation, which makes the performance of existing video super-resolution (VSR) algorithms lower than expected. It is hard to estimate the SNR of low-quality real frames without any prior knowledge. To solve this problem, we establish a connection between the SAR hyperparameters and the SNR of real images, and the corresponding relationship expression is given in this paper. Using the proposed method, well-registered low-quality real video frames can be selected to decrease the root mean squared error (RMSE) of the motion estimation between video frames, improving VSR reconstruction. Anomalous low-quality frames whose SAR hyperparameter values are inconsistent with those of the other frames are considered for removal. Synthetic experiments were designed to illustrate how the SAR hyperparameter values vary with the synthetic parameters. To better illustrate the effectiveness of the proposed method, real low-quality videos captured under different conditions were tested in VSR reconstruction experiments. The VSR reconstruction results show that the results obtained using the SAR prior analysis have sharper edges and fewer ringing artifacts than the original results, which indicates that the proposed method helps to obtain better motion registration estimation for low-quality real video images.


I. INTRODUCTION
Video super-resolution (VSR) processing is often required by medical imaging (such as endoscopic imaging and ultrasonic imaging), surveillance imaging, mobile phone imaging, and other real video applications. VSR methods utilize the non-redundant information contained in the low-resolution (LR) images to reconstruct a high-resolution (HR) image [1]-[17]. Before HR image reconstruction, the LR images usually need to be aligned/registered in a unified coordinate system. This registration operation needs to estimate the motion between the target frame (reference frame) and the neighboring frames (supporting frames). It is very hard to accurately estimate motion information only from the LR observations, especially in the case of high noise disturbance [15]-[17]. This causes unavoidable estimation errors which are propagated and amplified during HR image reconstruction and produce artifacts around image structures.
To prevent error propagation, the Bayesian framework was proposed to model the unknown HR image, the acquisition process, the motion parameters, and the unknown model parameters in a stochastic sense [1], [2]. Many works [1]-[8], including our previous work [8], show that Bayesian-based super-resolution (SR) algorithms are fully automated and do not require parameter tuning. However, when the input image sequence is seriously degraded, as in some surveillance videos, registration estimation problems still exist.
VSR algorithms based on deep learning also rely heavily on the accuracy of motion estimation [9]-[15]. One of the difficulties of learning-based algorithms is obtaining ground-truth images and the necessary training data. Training on simulated data means that their actual performance on real images is often overestimated [18]. The reason is that motion is more difficult to estimate for real images than for simulated data, and more difficult still for low-quality real images than for high-quality real images.
Sufficiently high-quality motion registration estimation is still a vital issue. For low-quality real video frames, it is very hard to obtain a ground-truth image as a reference image. The quality of the supporting images also affects the accuracy of the registration estimation. How to select a suitable reference frame and a series of acceptable supporting frames among the low-quality LR images is a challenging problem. To solve this problem, this paper presents a SAR analysis method that can evaluate the quality of each LR video frame for frame selection to improve motion registration estimation.
The rest of this paper is organized as follows: Section II introduces the SAR prior model and the SNR descriptor. Section III presents the image description performance of the SAR hyperparameters and the SAR analysis method, including frame-interval detection and suitable-frame selection. The experimental results, consisting of synthetic and real experiments, are demonstrated in Section IV. Finally, the conclusions are drawn in Section V.

II. SAR IN VARIATIONAL BAYESIAN
The quality of the HR image estimate and the accuracy of the other unknown estimates depend on the accuracy of the image model, such as the TV prior model [1], the SAR prior model [2], [4], and other prior models [3]-[7]. In this paper, we use the SAR prior to describe the SNR and texture features of frame images. In the following sections, we briefly describe the individual distributions used to model the unknowns under the Bayesian framework. The SNR descriptor is presented in section II-D.
A. SAR MODEL
The SAR statistical model [2], [4] is a quadratic (ℓ2-norm) model that imposes image smoothness, especially on the border structures of the images, and attenuates noise. The SAR model is an autoregressive model itself, which makes it simpler and more suitable for describing the characteristics of noisy video frames and reflecting the differences between them [2], [4]. In this paper, we adopt matrix-vector notation such that the HR image x is arranged as a PN × 1 vector, where P is the resolution-increase factor and N is the number of pixels of an LR image. The SAR model is defined as

  p(x | α_im) ∝ α_im^{PN/2} exp( −(α_im/2) ‖Lx‖² ),   (1)

where α_im is the image model hyperparameter and L is the Laplacian operator. The conditional distribution of the LR image y_k is given as

  p(y_k | x, s_k, β_k) ∝ β_k^{N/2} exp( −(β_k/2) ‖y_k − B_k(s_k)x‖² ),   (2)

where k is the index of the frame in the sequence, B_k(s_k) is the system matrix, consisting of a downsampling matrix, a blurring matrix, and a warping matrix, β_k is the hyperparameter acting as the inverse variance of the LR image y_k, s_k is the motion vector, and the noise term y_k − B_k(s_k)x is assumed to be zero-mean white Gaussian noise.
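As an illustration, the quadratic smoothness term ‖Lx‖² penalized by the SAR prior can be sketched in a few lines of NumPy; the 4-neighbour discrete Laplacian and the function names below are illustrative choices, not part of the paper's implementation:

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def sar_energy(img):
    """Quadratic SAR smoothness term ||L x||^2 used by the prior."""
    lx = laplacian(img.astype(np.float64))
    return float(np.sum(lx ** 2))

# A flat image has zero SAR energy; any texture or noise increases it.
flat = np.full((16, 16), 0.5)
noisy = flat + 0.1 * np.random.default_rng(0).standard_normal((16, 16))
assert sar_energy(flat) == 0.0
assert sar_energy(noisy) > 0.0
```

Under the prior, images with large ‖Lx‖² (rough or noisy content) are assigned low probability, which is how the model imposes smoothness.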

B. BAYESIAN FRAMEWORK
According to Bayesian inference, the distribution of the HR image x is a multivariate Gaussian. The mean of x is expressed as

  x̂ = Σ_x ∑_k β_k B_k^T(s_k) y_k,   (3)

and the inverse covariance is

  Σ_x^{-1} = α_im L^T L + ∑_k β_k B_k^T(s_k) B_k(s_k).   (4)

The distributions of the hyperparameters q(α_im) and q(β_k) are found to be Gamma distributions given by

  q(α_im) = Gamma( α_im | a⁰_{α_im} + PN/2, b⁰_{α_im} + E[‖Lx‖²]/2 )   (5)

and

  q(β_k) = Gamma( β_k | a⁰_{β_k} + N/2, b⁰_{β_k} + E[‖y_k − B_k(s_k)x‖²]/2 ),   (6)

where a⁰_{β_k}, b⁰_{β_k}, a⁰_{α_im}, and b⁰_{α_im} are the parameters of the Gamma distributions; the superscript '0' (zero) indicates their initial estimation values. The details can be found in [2].

C. HYPERPARAMETERS
To analyze the characteristics of video frames, the SAR model is independently applied to each frame under the variational Bayesian framework; that is, every single frame is processed by (3)-(6) alone. Since a single image needs neither motion registration nor downsampling, the matrix B_k(s_k) can be simplified to the blurring matrix H_k. Hence, (3) is rewritten as

  x̂_k = β_k Σ_k H_k^T y_k,   (7)

and the inverse covariance (4) is rewritten as

  Σ_k^{-1} = α_k L^T L + β_k H_k^T H_k.   (8)

The means of the distributions in (5) and (6) can be calculated by

  α_k = N / ( ‖L x̂_k‖² + tr(Σ_k L^T L) )   (9)

and

  β_k = N / ( ‖y_k − H_k x̂_k‖² + tr(Σ_k H_k^T H_k) ).   (10)

The initial estimation values of (9) and (10), denoted α_k⁰ and β_k⁰, are given by (11) and (12). The calculation of α_k and β_k is summarized in Algorithm 1.

Algorithm 1 Hyperparameters Calculation
1: Calculate the initial α_k⁰ and β_k⁰ of the k-th frame using (11) and (12)
2: while the convergence criterion is not met do
3:   Compute the k-th frame distribution using (7) and (8)
4:   Compute the distributions of the hyperparameters α_k, β_k using (9) and (10)
5: end while
Based on empirical study, the convergence criterion is set as a fixed threshold on the relative change of the estimates between successive iterations.
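A minimal sketch of Algorithm 1 for a single frame, under simplifying assumptions not made in the paper: H_k is taken as the identity, the covariance trace terms in (9) and (10) are dropped, and the linear system (7)-(8) is solved in the Fourier domain with a periodic Laplacian. All names and the fixed initialization are illustrative:

```python
import numpy as np

def estimate_hyperparams(y, iters=30, tol=1e-4):
    """Sketch of Algorithm 1 for one frame with H_k = I and the
    covariance trace terms dropped: alternate a Wiener-like solve
    for the image with closed-form updates of alpha_k and beta_k."""
    y = y.astype(np.float64)
    n = y.size
    # Power spectrum of the periodic 4-neighbour Laplacian (gamma in the text).
    lap = np.zeros_like(y)
    lap[0, 0] = -4.0
    lap[0, 1] = lap[1, 0] = lap[0, -1] = lap[-1, 0] = 1.0
    gamma = np.abs(np.fft.fft2(lap)) ** 2
    alpha, beta = 1.0, 1.0
    Y = np.fft.fft2(y)
    for _ in range(iters):
        X = beta * Y / (beta + alpha * gamma)            # (7)-(8) with H = I
        x = np.real(np.fft.ifft2(X))
        lx2 = float(np.sum(gamma * np.abs(X) ** 2)) / n  # ||L x||^2 (Parseval)
        r2 = float(np.sum((y - x) ** 2))                 # ||y - x||^2
        a_new = n / max(lx2, 1e-12)                      # (9), trace dropped
        b_new = n / max(r2, 1e-12)                       # (10), trace dropped
        if abs(a_new - alpha) < tol * alpha and abs(b_new - beta) < tol * beta:
            alpha, beta = a_new, b_new
            break
        alpha, beta = a_new, b_new
    return alpha, beta
```

On a smooth test pattern, a lightly corrupted frame should yield a larger β/α ratio than a heavily corrupted one, consistent with the SNR descriptor of section II-D.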

D. SNR DESCRIPTOR
According to (9) and (10), we can obtain

  β_k/α_k = ( ‖L x̂_k‖² + tr(Σ_k L^T L) ) / ( ‖y_k − H_k x̂_k‖² + tr(Σ_k H_k^T H_k) ).   (13)

To further illustrate the meaning of (13), the derivation is as follows. Using (8) in (7), we obtain

  x̂_k = ( (α_k/β_k) L^T L + H_k^T H_k )^{-1} H_k^T y_k.   (14)

Define F as the Fourier transform matrix, with identity matrix I = F^{-1}F. The Fourier transform of (14) can be expressed as

  F x̂_k = ( F( (α_k/β_k) L^T L + H_k^T H_k ) F^{-1} )^{-1} F H_k^T F^{-1} F y_k.   (15)

Since H_k is a real circulant matrix and the Fourier transform is an orthonormal basis, FH_kF^{-1} = diag(H) and FH_k^T F^{-1} = diag(H*), where H is the Fourier transform of H_k. Let γ = FL^T LF^{-1}; γ is in fact the power spectral density of the Laplacian operator, so γ can be considered a constant matrix once its size is fixed. Then (15) can be rewritten as

  F x̂_k = ( H*H + (α_k/β_k)γ )^{-1} H* F y_k.   (16)

The factor ( H*H + (α_k/β_k)γ )^{-1} H* has exactly the form of the Wiener filter

  W = H* / ( |H|² + S_n/S_y ),   (17)

where S_n and S_y represent the power spectral densities of the noise n and of the LR image y_k, respectively. The Wiener filter W is the optimal linear filter that minimizes the expected reconstruction error to obtain an optimal estimate f̂, and Algorithm 1 also produces an estimated image f̂. Comparing the terms in (16) and (17), we can confirm that

  (α_k/β_k) γ = S_n/S_y.   (18)

As mentioned above, γ is a constant matrix, so (18) can be simplified to

  α_k/β_k ∝ S_n/S_y.   (19)

Therefore, the SNR of the k-th frame, whose ground-truth image is unavailable, can be represented by

  SNR_k ∝ β_k/α_k.   (20)

III. SAR ANALYSIS
A. IMAGE DESCRIPTION
Based on the content correlation between video frames, the change between adjacent frame images should be tiny, and the α_k curve of the video frames should be smooth, without mutation. When the value of α_k changes dramatically and frequently, it indicates that the content of the frame images has been greatly degraded by severe noise, obvious motion, or blur, which is inherently unfavorable to frame image registration.
In this subsection, we present the image description performance of SAR hyperparameters on the real video image sequence. If not specially noted, all the real sequences with unknown motion information used in this paper are from publicly available datasets provided by UCSC [19], and all the SAR hyperparameters are obtained by Algorithm 1.

1) TEXTURE DESCRIPTION
A single value of the hyperparameter α_k may not be able to accurately describe the content of an image, but the difference in the α_k values between adjacent frames can reflect the change between those frames.
Equation (9) indicates that the frame image with more details will obtain a smaller α k value. Fig.1 describes the hyperparameters values of the 26 frames in the sequence named Disk. The 4 frames, which are designated by red color in Fig. 1, are shown in Fig.2. The result proves that when the difference in α k values between two adjacent image frames is large, the corresponding image content usually changes greatly, while the image with smaller α k value has more details.

2) MOTION DESCRIPTION
The motion between adjacent frame images also influences the α_k values. The SAR hyperparameters values corresponding to the 40 frames in the Surveillance sequence are plotted in Fig. 3. Fig. 4 shows 4 adjacent frames marked by circles filled with red in the graph of Fig. 3. The rectangles marked with the same color in Fig. 4 have the same position and size. By comparing the patterns at the same position in the four adjacent frames, we can see that the four adjacent frames have a displacement offset from each other, and the α_k values corresponding to these 4 frames in the graph of Fig. 3 change accordingly.
The hyperparameters values of the 80 frames in the Emily sequence are drawn in Fig. 5. It shows that the average of the α_k values after frame 64 is larger than the average before frame 64. The corresponding frame images are shown in Fig. 6. It can be found that the girl in the sequence keeps still until frame 64, after which she moves and causes local motion.
The experimental results in this subsection show that when the α_k values of adjacent frames change greatly, some visible changes will appear in the frame images. The experiment details will be presented in section IV.

B. SAR PRIOR ANALYSIS
The hyperparameter β_k is the inverse variance of the observation y_k, as shown in (10). Its value reveals the level at which y_k is disturbed by noise and other degradation factors. In a frame-interval where the interference, such as noise, tends to be stable, we consider that the curve formed by the β_k values of the frames in the interval should be smooth and moderate.
To achieve a better image registration from the low-quality video, we need to discriminate LR images according to the β k value of each frame image to keep the interference level of each frame image participating in the registration as consistent as possible. In the following subsections, we will introduce a method to obtain the suitable frame-interval and the reference image.

1) FRAME-INTERVAL DETECTION
The curve of β_k is defined as B(k), where k = 1, 2, . . . , L. Its first difference is given by

  ΔB(k) = B(k + 1) − B(k).

The threshold is expressed as T = µ(Max − Min)/L, where L is the length of the sequence, Max and Min are the maximum and minimum values of B(k), respectively, and µ is a coefficient, usually equal to 1.
We define a sign description function

  Sign(k) = 1 if |ΔB(k)| > T, and Sign(k) = 0 otherwise.

The point function is determined by the following calculation:

  Point(k) = |Sign(k) − Sign(k − 1)|.

The interval boundary points are the points with value Point(k) = 1. Every two boundary points form an interval, and all the points inside the same interval have the value Point(k) = 0.
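The frame-interval detection described above can be sketched as follows; the function name and the exact interval bookkeeping are illustrative assumptions, not the authors' code:

```python
import numpy as np

def detect_intervals(b, mu=1.0):
    """Sketch of the frame-interval detection of Sec. III-B1: threshold
    the first difference of the beta_k curve and return the index ranges
    where it stays below the threshold (the stable intervals)."""
    b = np.asarray(b, dtype=float)
    L = len(b)
    diff = np.abs(np.diff(b))                 # |Delta B(k)|
    t = mu * (b.max() - b.min()) / L          # threshold T
    sign = (diff > t).astype(int)             # 1 where the curve jumps
    intervals, start = [], None
    for k, s in enumerate(sign):
        if s == 0 and start is None:
            start = k                         # a stable run begins
        elif s == 1 and start is not None:
            intervals.append((start, k))      # frames start..k are stable
            start = None
    if start is not None:
        intervals.append((start, L - 1))      # stable run reaches the end
    return intervals
```

For a curve with two flat plateaus separated by a jump, e.g. `[1, 1.05, 1.1, 5, 5.02, 5.05, 1]`, this returns the two stable ranges `[(0, 2), (3, 5)]`.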

2) SUITABLE FRAMES SELECTION
The number of frames in each interval is denoted n(t), where t is the ordinal number of the interval, and n̄ is the average number of frames per interval. Let ∅ = {t | n(t) > n̄}. For every interval satisfying n(t) > n̄, the expectation of |ΔB(k)| over the interval is calculated as

  E_t[|ΔB(k)|] = (1/n(t)) ∑_{k∈t} |ΔB(k)|.

The expected interval t̂ is the one that generates the minimum E_t[|ΔB(k)|], which means the β_k values in the interval t̂ tend to be stable. In other words, the frames in the interval t̂ are the wanted supporting frames, which help to improve the registration accuracy.
The selection of the reference frame from the interval t̂ is another important factor affecting the registration accuracy of low-quality LR images. To ensure the registration accuracy, we select the frame with the best SNR performance in the interval t̂ as the reference frame, according to the SNR analysis in section II-D:

  k̂ = argmax_{k∈t̂} β_k/α_k.

The frame-selection flow diagram is shown in Fig. 7.
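The interval and reference-frame selection of this subsection can be sketched as below, assuming the intervals come from a detector like the one in Sec. III-B1; the names and the fallback when no interval is longer than average are illustrative assumptions:

```python
import numpy as np

def select_frames(alpha, beta, intervals):
    """Sketch of Sec. III-B2: among the intervals longer than average,
    pick the one whose beta_k curve is flattest (smallest mean
    |Delta B(k)|), then pick the frame with the largest beta/alpha
    (best estimated SNR) inside it as the reference frame."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    lengths = [hi - lo + 1 for lo, hi in intervals]
    avg = np.mean(lengths)
    # Keep intervals longer than average; fall back to all if none qualify.
    candidates = [iv for iv, n in zip(intervals, lengths) if n > avg] or intervals

    def flatness(iv):
        lo, hi = iv
        return np.mean(np.abs(np.diff(beta[lo:hi + 1])))

    lo, hi = min(candidates, key=flatness)
    ref = lo + int(np.argmax(beta[lo:hi + 1] / alpha[lo:hi + 1]))
    return (lo, hi), ref
```

Given two equally long intervals, the flatter one is chosen, and the reference frame is the in-interval frame maximizing β/α, matching the selection rule above.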

IV. EXPERIMENTS
Because the ground-truth images are unavailable, it is difficult to evaluate the accuracy of motion estimation between low-quality real images. It is known that VSR results rely heavily on the accuracy of motion estimation. In this section, we use the improvements in the HR reconstruction results to demonstrate the effectiveness of the proposed method. We used the following methods for comparison: 1) Bicubic interpolation, 2) Variational Bayesian SR [1], [20] (denoted by VBSR), and 3) the sub-pixel motion compensation SR method [14] (denoted by SPMC), which is a deep-learning-based SR algorithm. We also experimented with the Shift-and-Add method [21] contained in the MDSP SR software [22] (denoted by S&A). It should be noted that SPMC does not provide training code; we directly used their trained model here, and the performance of SPMC in the experiments does not represent its best performance.
The corresponding improved algorithms equipped with the proposed method are called iBicubic, iVBSR, iS&A, and iSPMC, respectively. VBSR, S&A, and SPMC are all state-of-the-art SR algorithms. Since VBSR has been evaluated as one of the best fully automatic algorithms, we will focus on the performance of iVBSR among the four improved algorithms.

A. SYNTHETIC EXPERIMENTS
In the following subsections, two synthetic experiments are demonstrated to qualitatively and quantitatively evaluate the performance of the proposed method.

1) HYPERPARAMETER ILLUSTRATION
In this subsection, we illustrate how the SAR hyperparameters values vary with the synthetic parameters, such as the SNR level and the blur kernel size. We generated two groups of synthetic images from the ''Lena'' image. The images in group 1 are degraded by additive white Gaussian noise at SNR levels of 10 dB, 15 dB, 25 dB, 35 dB, . . . , 75 dB. The images in group 2 are uniformly blurred, with kernel sizes of 1, 2, 3, . . . , 10. Some example synthetic images are shown in Fig. 8.
The SAR hyperparameters values of the group 1 images are plotted in Fig. 9. As expected, the estimated SNR (β/α) value increases as the simulated SNR level increases. The growth rate of SNR (β/α) is particularly obvious when the SNR level is below 30 dB. According to (9), the more high-frequency information an image contains, the smaller its α_k value. The curve of the α_k value in Fig. 9 describes the high-frequency information of each frame image well. When the image is severely degraded (SNR level below 20 dB), the estimated α_k value drops significantly. When the simulated SNR level is higher than 30 dB, the estimated α_k value tends to be steady.
The estimated results of the group 2 images are plotted in Fig. 10. It should be emphasized that deblurring each frame is beyond the scope of this paper. The blurring matrix H_k of each frame is fixed as an identity matrix during the estimation procedure, which means each estimated image f̂ is treated as a blurred image. As the simulated blur-kernel size increases, the corresponding image is degraded more severely, the estimated SNR (β/α) value decreases, and the estimated α_k value increases. According to (10), the behavior of the estimated β_k value in Fig. 10 and Fig. 9 reveals that an image with less high-frequency information will obtain a larger estimated β_k value. The curve of SNR (β/α) in Fig. 10 implies that when the error of H_k increases, the growth rate of β_k is lower than that of α_k, which indicates that β_k is more stable than α_k. The results of the experiment indicate that the α_k parameter can describe the high-frequency information of the image. However, high-frequency information may be image details or noise; a single α_k value is not sufficient to accurately describe the content of the image.

2) FRAME SELECTION EXPERIMENTS
The purpose of this synthetic experiment is to illustrate the importance of reference-image selection in VSR. In this section, we generated 10 synthetic LR images from the original HR image shown on the left in Fig. 11. Note that this resolution chart image is utilized for better illustration of the performance in resolution enhancement. In each sequence, the first image, with uniform blur, is chosen as the reference image, and the standard Lucas-Kanade method [23] is utilized to iteratively estimate the motion parameters of the synthetic images. The estimated motion parameters of the synthetic images at each SNR level are shown in Tables 1-3. From Tables 1-3, it can be seen that the accuracy of motion estimation is significantly affected by noise and blur. The estimation errors in the 5 dB SNR case and the ±5° rotation angle case are particularly serious.
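A hedged sketch of how one such degraded frame can be synthesized; for simplicity a circular shift stands in for the warping used in the paper (the actual experiment uses rotations), and the noise is scaled to a target SNR in dB. The function name and parameters are illustrative:

```python
import numpy as np

def make_lr_frame(hr, shift, snr_db, rng):
    """Generate one synthetic degraded frame (illustrative parameters):
    circularly shift the ground truth (a stand-in for the global
    translational motion), then add white Gaussian noise scaled to a
    target SNR in dB, as in the synthetic experiments."""
    frame = np.roll(hr.astype(np.float64), shift, axis=(0, 1))
    sig_power = np.mean(frame ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    return frame + rng.normal(0.0, np.sqrt(noise_power), frame.shape)
```

Measuring the SNR of a generated frame against its shifted ground truth recovers the requested level to within a fraction of a dB for reasonably sized images.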
In order to reveal the influence of reference-frame selection on the SR results, we made 3 combinations of the synthetic images. Group 1 consists of the synthetic images with rotation angles (0°, ±1°, ±2°, ±3°). Group 2 consists of the synthetic images with rotation angles (0°, ±2°, ±3°, ±5°). Group 3 consists of the synthetic images with rotation angles (0°, ±3°, ±5°). All of them include SNR levels of 5 dB, 15 dB, 25 dB, 35 dB, and 45 dB. The 5 dB synthetic image with uniform blur, which has the worst quality, is designated as the reference image for all sequences. The synthetic images with ±5° rotation angles act as disturbance images in this experiment. It should be emphasized that all sequences in the experiment include an ideal reference frame image which is degraded only by noise. Theoretically, the SR reconstruction results should be best for Group 1 and worst for Group 3.
The peak signal-to-noise ratio (PSNR) values provided by the algorithms are shown in Table 4. The PSNR values of Bicubic are obtained from the interpolation of the ideal reference frame at each SNR level. Because of poor motion estimation, the S&A and SPMC methods produce worse reconstructions than Bicubic interpolation. VBSR achieves good reconstruction results in Groups 1 and 2. But in the Group 3 case, VBSR also obtains worse reconstructions than Bicubic interpolation because of poor motion estimation (as can be seen in Tables 1-3).
In the ''iVBSR'' column of Table 4, the PSNR values in the ''Ideal'' column correspond to the expected VBSR results whose reference frames are the ideal reference frames. As expected, the SR reconstruction results in Group 1 are better than those in Groups 2 and 3, and Group 3 is the worst. The PSNR values in the ''Real'' column are provided by the reference frames actually selected by the proposed method. The results in Table 4 show that different reference frame selections in the image sequence result in different HR reconstruction results.
PSNR evaluation is a full-reference quality measurement based on pixel-wise comparisons. If the coordinate system of the reference image is not the default one, the estimated HR image needs to be warped back to fit the coordinate system of the original image to obtain meaningful PSNR values. The PSNR values in the ''Revised'' column are the calibration of the values in the ''Real'' column. Comparing the values in the ''Ideal'' and ''Real'' columns, it can be found that our method fails to select the ideal reference image. However, the PSNR values in the ''Revised'' column still indicate that the iVBSR method performs better at all noise levels than all the original methods not equipped with the proposed method. This means that if the first frame in the image sequence is badly degraded, our method is able to find a better reference image in the sequence, although not necessarily the best one.
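The full-reference PSNR used here can be computed as below; the peak value of 1.0 is an assumption (images normalized to [0, 1]), and the measure is only meaningful after the estimate has been warped back into the reference frame's coordinate system:

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    """Full-reference PSNR in dB; assumes both images are already in
    the same coordinate frame (warp back first, as in the 'Revised'
    column of Table 4)."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 0.1 against a zero image gives an MSE of 0.01 and hence a PSNR of exactly 20 dB.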
The hyperparameters values of the synthetic images at each SNR level are shown in Table 5. The first frame of each sequence is the most degraded image; its hyperparameter values are significantly larger than those of the other frames in the same sequence. On the contrary, the ideal reference frame of each sequence is the best image, degraded only by noise in this synthetic experiment. The other frames were slightly degraded by the interpolation introduced by the warping operation in the image synthesis process. As a result, the hyperparameter values of the ideal reference frames are smaller than those of the other frames in the same sequence. For this reason, the proposed method missed the ideal default selection.
The reference-frame selection strategy based on the SNR performance can effectively improve the motion estimation of the sequence frames. Take the 25 dB case in Group 1 as an example. The eventual motion estimation errors of VBSR and iVBSR are shown in Table 6. In the VBSR method, the reference image is the first frame. In the iVBSR method, the frame with −1° rotation is selected as the reference image because its β/α value is the largest in the sequence (refer to the values in Table 5). Hence, the standard values of the warping parameters in Table 6 are different. The comparison results in Table 6 show that the motion estimation errors of iVBSR are smaller than those of VBSR if we ignore the motion estimation errors of the 1st frame, which is badly degraded. This is also confirmed by the RMSE values of the estimation errors in Table 6. The HR reconstruction results in Group 1 are shown in Fig. 12 for the 45 dB case, and example HR reconstructions in Group 3 are shown in Fig. 13 for the 25 dB case. In the Fig. 12 case, VBSR and iVBSR provide better visually enhanced restorations and much sharper edges compared to the other methods. Compared with the original VBSR, iVBSR, which uses reference-frame selection, can significantly reduce the ringing artifacts. In Fig. 13, the performance of VBSR is worse than Bicubic interpolation, but the performance of iVBSR is still better. The HR reconstructions with the default ideal reference frame (Fig. 13 (e)) and the selected reference frame (Fig. 13 (f)) are very similar. After the coordinate calibration, the PSNR value of the latter (Fig. 13 (g)) is higher than that of the former (Fig. 13 (e)).

B. REAL EXPERIMENTS
In order to evaluate the performance of the proposed method more comprehensively, DUF [10] and RBPN [24], whose deep-learning networks allow them to avoid explicit motion estimation and compensation, are added as new comparison algorithms in this subsection. All the input sequences are real videos provided by UCSC [19]. It should be noted that we directly ran the code provided by the authors of DUF and RBPN on the UCSC datasets; the performance of DUF and RBPN in the experiments does not represent their best performance.

1) SIMPLE TEXTURE VSR
The Text sequence shows a simple text pattern with global translational motion. It consists of 30 LR images. The α_k and β_k values of the 30 frame images are drawn in Fig. 14. The 3rd, 4th, and 10th frames, which are designated by red color in Fig. 14, are shown in Fig. 15. It can be found that the frame with the smaller α_k value (such as frame 4) is better than the other frames. The curve of β_k in Fig. 14 shows that the values change drastically until the frame-interval [23, 30], which is detected by the proposed method. The SNR (β/α) values of the 30 frame images in the Text sequence are illustrated in Fig. 16. It can be seen that the β/α value of frame 28 is the largest in the interval [23, 30], and it is selected as the reference frame by the proposed method.
The reconstructed HR images obtained by Bicubic interpolation and the SR algorithms are shown in Fig. 17. All SR algorithms based on explicit motion estimation achieve a better reconstruction effect than Bicubic interpolation. Fig. 17 (e) and Fig. 17 (f), obtained by SPMC and iSPMC, respectively, have obvious artifacts, but Fig. 17 (f) obtained by iSPMC is a little better. The edges in Fig. 17 (d) and Fig. 17 (h) are sharper than those in Fig. 17 (c) and Fig. 17 (g), respectively. This reveals that the improved SR algorithms with the proposed method achieve better motion registration than the original SR algorithms. Fig. 17 (i) and Fig. 17 (j) show that DUF and RBPN are not very good at reconstructing low-quality real video frames. The specific evaluation results are shown in Table 7.

2) RICHER DETAILS VSR
In this subsection, the reconstruction experiment on the Book sequence is presented to better illustrate the visual improvement introduced by the proposed method. The Book sequence has richer details than the Text sequence and is closer to a normal video sequence. It also contains 30 LR images. The SAR hyperparameters values of the Book sequence are plotted in Fig. 18. The frame-interval [6, 11], marked in green, is detected as the stable frame-interval by the proposed method. The SNR (β/α) values are drawn in Fig. 19. Frame 9, with the largest β/α value in the interval [6, 11], is selected as the reference frame.
The HR reconstruction results are shown in Fig. 20. Fig. 20 (g), obtained by VBSR, is fettered by the registration problem, and its edges are wreathed in noise and ringing artifacts. The main difference between VBSR and S&A is the number of motion estimations. VBSR iteratively estimates the motion parameters along with the iterations of HR image reconstruction, whereas the S&A method makes only one motion estimation and reconstructs the HR image very quickly. Therefore, S&A suffers less from the registration problem. SPMC, based on motion compensation, is also less affected by the registration estimation. In this case, the quality of the Book sequence is obviously better than that of the Text sequence, and both DUF and RBPN obtain better results. However, Fig. 20 (h), reconstructed by iVBSR, has sharper edges and fewer artifacts than the others. The specific evaluation results can be found in Table 7. They also show that the results of iS&A and iSPMC are better than those of S&A and SPMC, respectively. The experiment results indicate that the proposed method contributes to the improvement of motion registration estimation.

3) COMPLICATE MOTION ANALYSIS
The Surveillance sequence in Fig. 4 has significant jitter because it was captured by a handheld device. These frames do not only follow the global translational motion model but also include a little pitching motion. By analyzing the β_k values plotted in Fig. 3, we can obtain two obviously stable frame-intervals, [7, 13] and [17, 22], each of which can contribute a good SR reconstruction result. The proposed method considers the frame-interval [7, 13], designated with green circles in Fig. 21, as the better one because of its smaller E_t[|ΔB(k)|] value, and selects frame 7 as the reference frame, since its β/α value is the largest in the interval [7, 13] (see Fig. 22).

FIGURE 21. The α_k and β_k values of the 40 frame images in the Surveillance sequence. The frame-interval [7, 13] marked in green is detected as the stable frame-interval by the proposed method.

We tried to combine the two mentioned frame-intervals and obtained a result like Fig. 23 (g), with heavy ringing artifacts. It is noted that the registration model in VBSR and iVBSR defaults to 4 degree-of-freedom (DOF) motion registration. Obviously, a 4 DOF registration model cannot accurately estimate 6 DOF motion. Hence, we conclude that there may be different pitch motions between the two frame-intervals, which increases the motion DOF of the combined frame-interval from 4 to 6 or more.
The results in Fig. 23 show that iVBSR with the proposed method provides an acceptable visually enhanced restoration with sharper edges and fewer ringing artifacts than the original VBSR. This indicates that our method can obtain local frame-intervals with stable motion in low-quality real video to decompose the motion complexity (for example, reducing complex 6 DOF motion to simple 4 DOF motion, which only includes translation and rotation). The results of iS&A and iSPMC are also better than those of S&A and SPMC, respectively. The results of DUF and RBPN have obvious ringing artifacts. The evaluation results can be found in Table 7.

4) FRAME INTERVALS COMBINATION
This subsection presents the VSR reconstruction results of the Alpaca sequence. The Alpaca sequence, with heavy noise, approximately follows the global translational motion model. The hyperparameters values of Alpaca are plotted in Fig. 24, which reveals the complicated content changes in the sequence. The proposed method detects several stable frame-intervals, such as [15, 20] and [22, 29]. It can be considered that the frame-interval [15, 29] is segmented into two frame-intervals by frame 21. If we assume that the motion DOF in the two intervals are the same, then the two frame-intervals can be combined into one large frame-interval to keep more non-redundant image information. Under that assumption, frame 20 is selected as the reference frame of the merged frame-interval according to the β/α value analysis in Fig. 25. The HR reconstruction results of the merged frame-interval are shown in Fig. 26. The SR result of the Alpaca sequence obtained by VBSR (see Fig. 26 (g)) is still full of ringing artifacts. The results of iS&A and iVBSR are both acceptable and better than those of S&A and VBSR, respectively. Due to the heavy noise, the learning-based algorithms all produce obvious artifacts around image structures in this case. This experiment shows that when two frame-intervals are closely adjacent, they can be combined to obtain a stable large interval. In that situation, the proposed reference-selection method still works well and mitigates the registration problem to obtain a better SR result.

C. OTHER REAL EXPERIMENTS
More HR reconstruction results are illustrated in Fig. 27. It is obvious that the results of the improved methods are sharper and clearer than those of the original methods. Although the motion estimation improvement on the real sequences cannot be accurately measured due to the absence of ground-truth images, the improved HR results with fewer ringing artifacts indirectly indicate that the proposed method can improve the registration accuracy.

D. BRISQUE RESULTS
In order to objectively evaluate the aforementioned resulting images, whose original HR images are not available, the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [25], [26] is selected to quantitatively measure the quality of the restored HR images. For no-reference quality assessment, BRISQUE shows strong correlations with human visual perception [18]. It scores from 0 to 100 (0 represents the best quality, 100 the worst).
The quality scores are reported in Table 7. In most cases, the performance of the improved method is better than the original SR methods. In the case of sequence filled with simple patterns, such as Text sequence and Disk sequence, the improvement of results is particularly noticeable.
No criterion is perfect, and neither is BRISQUE. For example, the quality scores of Alpaca in Table 7 indicate that the VBSR result is the best one, while in fact the image reconstructed by VBSR (see Fig. 26 (g)) is filled with ringing artifacts. Nevertheless, the vast majority of the quality scores still match the image quality.

V. CONCLUSION
It is a challenging problem to estimate SNR information based only on a low-quality real video image itself. A frame with a worse SNR value will often impair the accuracy of motion registration, and the estimation errors will greatly worsen the VSR results. In this paper, a SAR analysis method is proposed to describe the SNR of frame images. Based on this analysis, the frames in the original sequence are filtered to improve the registration accuracy, which makes the HR reconstruction of the original sequence better.
Based on the results of the visual evaluation and the BRISQUE evaluation, we can conclude that the proposed method significantly improves the accuracy of motion registration estimation for low-quality real video images. The state-of-the-art SR algorithms using the proposed method obtain better experimental results when dealing with low-quality real sequences. Our method is used as a preprocessing step for the SR algorithms; a simple frame-selection process gives them a greater advantage in terms of robustness and stability.

YIGANG WANG received the master's and Ph.D. degrees in applied mathematics from Zhejiang University, China, in 1995 and 1998, respectively. He has been a Professor with Hangzhou Dianzi University since 2004. He mainly works on topics such as image processing, computer graphics, photorealistic rendering and realistic real-time rendering, and virtual reality. He is currently developing research in the areas of computer vision and augmented reality.