Image Quality Assessment Based on Quaternion Singular Value Decomposition

We propose an image quality assessment metric based on quaternion singular value decomposition (QSVD) that represents a color image as a quaternion matrix, separates the image noise information via singular value decomposition, and extracts features from both the whole image and its noise information. In the proposed method, the color image and its local variance are jointly represented as a quaternion matrix, on which singular value decomposition is then performed. The latter 75% of the singular values are taken as the image noise information. We extract the luminance comparison, contrast comparison, structure comparison, phase congruency and gradient magnitude from the whole color image, and extract the peak signal-to-noise ratio from the image noise information, as features. Finally, these features are used as the input to a kernel extreme learning machine to predict the quality of the tested images. Extensive experiments performed on four benchmark image quality assessment databases demonstrate that the proposed metric achieves high consistency with subjective evaluations and outperforms state-of-the-art image quality assessment metrics.


I. INTRODUCTION
With the rapid development of information technology, the visual quality of images is becoming increasingly important. Images may be distorted during acquisition, transmission, compression, restoration, and processing [1], [2]. Image quality assessment (IQA) [3] can be divided into subjective image quality assessment and objective image quality assessment. Subjective IQA metrics are expensive and time consuming. Objective IQA research aims to design computational models that can automatically predict image quality; therefore, research on objective IQA metrics is significant. According to the availability of original reference images, objective IQA metrics can be classified into full-reference (FR), reduced-reference (RR) and no-reference (NR) metrics [4].
Here, we aim to develop an efficient FR IQA model. The most widely used FR IQA algorithms are the peak signal-to-noise ratio (PSNR) and mean squared error (MSE) [5], which calculate the distortion between the reference image pixels and the corresponding distorted image pixels. However, they are criticized for their poor prediction of perceived image quality. (The associate editor coordinating the review of this manuscript and approving it for publication was Varuna De Silva.)
Many effective methods have been proposed to address the issue of FR IQA. Sheikh et al. introduced the information fidelity criterion (IFC) [6] and visual information fidelity (VIF) [2], which quantify the information shared between the distorted and the reference images. Wang et al. introduced the structural similarity index (SSIM) [4], which measures distorted image quality by luminance comparison, contrast comparison and structure comparison. The multiscale extension of SSIM (MS-SSIM) [7] produces better results than its single-scale counterpart. Zhang et al. proposed a feature similarity index (FSIM) [8] that applies phase congruency (PC) and gradient magnitude (GM) to color image quality assessment. Recently, Reisenhofer et al. proposed a Haar wavelet-based perceptual similarity index (HaarPSI) [9] that obtains local similarities from high-frequency Haar wavelet coefficients. Although these methods deliver good performance, there remains room for improvement in the field of FR IQA models. In recent years, image quality assessment methods based on deep neural networks have been proposed. Zhang et al. [10] proposed a deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images. He et al. [11] proposed a visual residual perception optimized network (VRPON) that separates the training of blind IQA into a distortion degree identification network and an image quality prediction network. Yan et al. [12] proposed a novel multi-task learning approach based on deep neural networks; they introduced natural scene statistics feature prediction as an auxiliary task and let it aid the quality prediction task. Bosse et al. [13] presented novel network architectures incorporating an optional joint optimization of weighted average patch aggregation, implementing a method for pooling local patch qualities into a global image quality score. Kim and Lee [14] used full-reference image quality assessment metrics as intermediate training targets of convolutional neural networks. The biggest drawback of deep learning-based IQA methods is that a large number of training samples are needed, and a well-performing FR IQA method can compensate for this drawback. The specific approach is as follows: first, a large number of distorted images with different distortion levels and types are synthesized by applying distortion algorithms to reference images; then, the distorted images are labeled by the FR IQA algorithm, so that the synthesized samples can be used to pretrain the deep learning model. This is also a new application field for FR IQA algorithms.
Singular value decomposition (SVD) has been successfully used in the area of IQA. Azadeh and Ahmad [15] proposed a structural SVD-based image quality assessment (SSVD) that evaluates the image quality by reflecting on the left and right original singular vectors. Sang et al. [16] introduced an IQA metric based on a reciprocal singular value curve that obtains image quality indices by the area and curvature of image reciprocal singular value curves. We also deploy this tool to further explore a better method for gauging the perception of distortion.
Furthermore, we use an extreme learning machine in our proposed IQA method. Traditional neural networks must manually set a large number of network training parameters and easily fall into the local optimal solution. However, the extreme learning machine is a single-hidden layer feedforward neural network [17] that obtains a unique optimal solution through a set number of hidden layer nodes. Thus, it has great advantages in learning speed and generalization performance. Specifically, the kernel extreme learning machine is introduced into the kernel function, which can obtain the least square optimal solution. Here, we deploy it to implement quality prediction using various features.
The contributions of our proposed model are summarized as follows: 1) We use quaternion to represent the R, G, B three-channel and local variance of the color image, and conduct quaternion singular value decomposition to separate the noise part. 2) We extract different features from the whole image and noise part, and integrate them into our algorithm to make learning prediction by the kernel extreme learning machine. 3) We show that the proposed metric achieves high consistency with the subjective evaluations and outperforms state-of-the-art image quality assessment metrics.
The remainder of this paper is organized as follows. Section II introduces the theory of local variance and singular value decomposition of quaternion matrices. Section III presents the proposed algorithm. Section IV analyzes the experimental results. Section V summarizes the whole paper.

II. RELATED WORK
A. OVERVIEW OF LOCAL VARIANCE
The local variance in the image can describe the detailed information of an image. The study [18] found that local variance is more sensitive to blurred images than other distorted images. Therefore, the local variance is used to enhance the feature representation in our proposed algorithm.
Let Var(I_{x,y}) be the local variance in image I, where I_{x,y} is an image block. Var(I_{x,y}) is computed within a local square window centered at the pixel (x, y), which moves pixel by pixel over the entire image. We convert color images to YUV space (luminance channel Y, two chrominance channels U and V) and use the luminance layer Y to compute the local variance.
The local variance in an image is defined as

Var(I_{x,y}) = Σ_{i=1}^{N} w_i (I_i − μ_{x,y})²,

where μ_{x,y} is the local mean,

μ_{x,y} = Σ_{i=1}^{N} w_i I_i.

Considering that dividing the image into blocks damages the image structure correlation to some degree, we use a smoothing function to reduce this influence. An 11 × 11 circular-symmetric Gaussian weighting function w = {w_i | i = 1, 2, . . . , N} is employed, with a standard deviation of 1.5 samples, normalized to unit sum, i.e., Σ_{i=1}^{N} w_i = 1, where N is the total number of elements within a window.
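The Gaussian-weighted local variance above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the identity Var = E[I²] − (E[I])² with SciPy's `gaussian_filter` (σ = 1.5, truncated to an 11 × 11 support) standing in for the paper's circular-symmetric window; the function name `local_variance` is our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_variance(y, sigma=1.5, truncate=3.5):
    """Weighted local variance Var(I_{x,y}) = E_w[I^2] - (E_w[I])^2,
    computed with a normalized Gaussian window (sigma=1.5; with
    truncate=3.5 the support radius is 5, i.e. an 11x11 window)."""
    y = y.astype(np.float64)
    mu = gaussian_filter(y, sigma, truncate=truncate)       # local weighted mean
    mu2 = gaussian_filter(y * y, sigma, truncate=truncate)  # local mean of squares
    return np.maximum(mu2 - mu * mu, 0.0)                   # clamp tiny negatives
```

The input here is the luminance layer Y after the YUV conversion described above.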

B. OVERVIEW OF QUATERNION SINGULAR VALUE DECOMPOSITION
The concept of quaternions was first proposed by Hamilton in 1843 [19]. A quaternion q is defined as

q = a + bi + cj + dk,

where a, b, c and d are real numbers, and i, j and k are imaginary units that obey the following rules:

i² = j² = k² = ijk = −1,
ij = −ji = k, jk = −kj = i, ki = −ik = j.

A color image can be decomposed into red (R), green (G), and blue (B) channels. Each pixel of a color image can be represented as a single pure quaternion-valued pixel [20]:

q(x, y) = r(x, y)i + g(x, y)j + b(x, y)k, (7)
where r(x, y), g(x, y) and b(x, y) are the red, green and blue components, respectively, corresponding to the pixel at position (x, y) in the color image. According to (7), the real part of the quaternion is 0. It can be replaced by the local variance a = Var(I_{x,y}) to make the quaternion matrix capture more local details and structural information [21]. Thus, the quaternion representation of a color image can be defined as

q(x, y) = Var(I_{x,y}) + r(x, y)i + g(x, y)j + b(x, y)k.

By applying SVD to grayscale information, we obtain only luminance information. Since SVD can be applied to hypercomplex matrices [22], both luminance and chroma information can be obtained from color images by quaternion singular value decomposition (QSVD) [23], [24], i.e., more image information can be obtained by QSVD. Let X^{(q)} be an m × n quaternion matrix with rank r. Then, X^{(q)} can be decomposed uniquely as

X^{(q)} = U^{(q)} S^{(q)} V^{(q)H},

where U^{(q)} is an m × m quaternion unitary matrix, S^{(q)} is an m × n diagonal matrix (i.e., the diagonal elements are the singular values σ), V^{(q)} is an n × n quaternion unitary matrix, and H denotes the conjugate transpose. Writing u_k^{(q)} for the column vectors of the unitary matrix U^{(q)} and v_k^{(q)} for the column vectors of the unitary matrix V^{(q)}, σ_k (k = 1, 2, . . . , r) is the kth diagonal element of the diagonal matrix S^{(q)}, with r = min(m, n). In addition, the singular values σ_k are in descending order, i.e., σ_1 ≥ σ_2 ≥ · · · ≥ σ_r. The quaternion singular value decomposition of the color image X^{(q)} can also be written as the sum of the products of σ_i and u_i^{(q)} v_i^{(q)H}:

X^{(q)} = Σ_{i=1}^{r} σ_i u_i^{(q)} v_i^{(q)H},

where u_i^{(q)} v_i^{(q)H} is a color feature image of X^{(q)}. The color image X^{(q)} can thus be treated as a linear combination of r color feature images. The ith singular value σ_i denotes the luminance and chroma information of the ith color feature image; specifically, the singular value gives the weight of that color feature image in the color image X^{(q)}.
The advantage of quaternion singular value decomposition is that it treats the color image as a whole, overcoming the shortcoming of traditional singular value decomposition, which must decompose each color channel separately and thus ignores the correlation among channels.
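Standard numerical libraries have no quaternion SVD, but the singular values of a quaternion matrix can be obtained through its complex adjoint (Cayley-Dickson) representation: writing Q = Z1 + Z2·j with Z1 = A + Bi and Z2 = C + Di, the 2m × 2n complex matrix [[Z1, Z2], [−conj(Z2), conj(Z1)]] has each quaternion singular value repeated exactly twice. The sketch below illustrates this workaround; the function name is ours, not from the paper.

```python
import numpy as np

def quat_singular_values(A, B, C, D):
    """Singular values of the quaternion matrix Q = A + Bi + Cj + Dk,
    via the complex adjoint representation. The adjoint's singular
    values come in duplicated pairs; we keep one copy of each."""
    Z1 = A + 1j * B
    Z2 = C + 1j * D
    adj = np.block([[Z1, Z2], [-Z2.conj(), Z1.conj()]])
    s = np.linalg.svd(adj, compute_uv=False)  # descending order
    return s[::2]                             # one copy per pair
```

For a color image with local variance, A would hold Var(I_{x,y}) and B, C, D the R, G, B channels, matching the quaternion representation above.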

III. OVERVIEW OF THE PROPOSED MODEL
The framework of the proposed model is shown in Figure 1. First, the reference image and distorted image are represented by using the quaternion. Specifically, we take the R, G, B components of the color image as the imaginary parts of the quaternion and the local variance as the real part of the quaternion. Second, we obtain image noise information by SVD. Third, we extract the luminance comparison, contrast comparison, structure comparison, PC and GM from the whole color image and extract the PSNR from image noise information. Finally, the six generated features are fed into a kernel extreme learning machine to predict the quality of the tested image.

A. IMAGE NOISE INFORMATION EXTRACTED FROM QSVD
Generally, the sum of the first 10% or even 1% of the singular values accounts for more than 99% of the total, i.e., most of the energy of the image is concentrated in the color feature images corresponding to the larger singular values. Liu and Lin [25] proved that the latter 75% of singular values represent the data in which noise is the dominant factor. Thus, the noise information of a color image can be defined as

N = Σ_{i=p+1}^{r} σ_i u_i^{(q)} v_i^{(q)H},

where r is the rank of the color image and p ∈ (0, r) is a threshold (here we set p = 0.25r). N_o and N_d denote the noise information of the reference image and the distorted image, respectively.

B. FEATURES EXTRACTED FROM THE WHOLE IMAGE
Natural image signals are highly structured, i.e., their pixels exhibit strong dependencies. Moreover, different positions in an image carry different perceptual importance. Therefore, we extract features by referring to SSIM and FSIM.
SSIM combines luminance, contrast and structure information to assess image quality. We take the mean of the image as luminance information, the standard deviation of the image as contrast information, and the covariance of the image as structure information. The luminance comparison l(x, y), contrast comparison c(x, y) and structure comparison s(x, y) are defined as

l(x, y) = (2 μ_x μ_y + C_1) / (μ_x² + μ_y² + C_1),
c(x, y) = (2 σ_x σ_y + C_2) / (σ_x² + σ_y² + C_2),
s(x, y) = (σ_{xy} + C_3) / (σ_x σ_y + C_3), (18)

where μ_x and μ_y are the means of the distorted image block and the corresponding reference image block, respectively, σ_x and σ_y denote their standard deviations, and σ_{xy} denotes the covariance between the two blocks. The constants C_1, C_2 and C_3 are included to avoid instability when the denominators are very close to zero. Specifically, we choose

C_1 = (K_1 L)², C_2 = (K_2 L)², C_3 = C_2 / 2,

where L is the dynamic range of the pixel values, and K_1 ≪ 1 and K_2 ≪ 1 are small constants. We set K_1 = 0.01 and K_2 = 0.03. SSIM has the deficiency that, when pooling a single quality score from the local quality map, all pixel positions are given the same importance. However, the edge areas of an image are more important than the smooth areas [26]. Based on the above analysis, we also refer to FSIM in our proposed model. FSIM combines PC with GM to assess image quality. PC captures highly informative features and can be considered a measure of the significance of a local structure. GM also describes the contrast information of the image.
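The three comparison functions above can be sketched directly from their definitions. A minimal illustration for a pair of image blocks, using the standard SSIM constants (including C_3 = C_2/2); the function name is ours.

```python
import numpy as np

def ssim_comparisons(x, y, L=255, K1=0.01, K2=0.03):
    """Luminance l, contrast c and structure s comparisons between
    two image blocks, per the SSIM definitions (C3 = C2/2)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()  # covariance
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)
    s = (sxy + C3) / (sx * sy + C3)
    return l, c, s
```

For identical blocks all three comparisons equal 1, their maximum value.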
Rather than defining features directly at points with sharp changes in intensity, the PC model postulates that features are perceived at points where the Fourier components are maximal in phase. The method developed by Kovesi [27] was employed in FSIM. We apply the 1D analysis over several orientations and then combine the results using a Gaussian spreading function. The PC of an image is defined as

PC(x) = Σ_o ⌊E_o(x) − T_o⌋ / (Σ_o Σ_n A_{no}(x) + ε),

where E_o(x) is the local energy along orientation o, T_o is a noise compensation factor, A_{no}(x) is the local amplitude on scale n and orientation o, ε is a small constant, and ⌊·⌋ denotes that the enclosed quantity is kept when positive and set to zero otherwise. Gradient magnitude can be computed with convolution masks. Three commonly used gradient operators are the Sobel operator [28], the Prewitt operator [28] and the Scharr operator [29]. Here, we adopt the Scharr operator. The GM of the image is defined as

G(x) = √(G_x(x)² + G_y(x)²),

where G_x(x) and G_y(x) are the partial derivatives of the image along the horizontal and vertical directions, respectively.

C. FEATURES EXTRACTED FROM IMAGE NOISE INFORMATION
Image noise is closely associated with image perceptual quality. Thus, we extract features from the image noise information. Specifically, we calculate the similarity between the reference and distorted image noise by using the PSNR, which calculates the distortion between the reference image pixels and the corresponding distorted image pixels. Although it is not fully consistent with the human visual system (HVS), PSNR is good at assessing noisy images. The PSNR of an m × n image is defined as

MSE = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} (x_{ij} − y_{ij})²,
PSNR = 10 log_{10}(L² / MSE),

where L is the dynamic range of the pixel values, i and j denote the row and column of the image, and x_{ij} and y_{ij} denote the pixel values of the distorted image and reference image, respectively.
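The PSNR definition above is a one-liner in practice; a minimal sketch (function name ours), applied in the proposed method to the extracted noise components N_o and N_d:

```python
import numpy as np

def psnr(x, y, L=255.0):
    """PSNR = 10*log10(L^2 / MSE) between two equally sized images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(L * L / mse)
```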

D. KERNEL EXTREME LEARNING MACHINE
After extracting features, we use an extreme learning machine (ELM) to map the generated features to image quality. The ELM is a single-hidden-layer feedforward neural network (SLFN) that randomly initializes the input weights and biases and obtains a unique optimal solution once the number of hidden layer nodes is set. The extreme learning machine is mathematically modeled as

Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = t_j, j = 1, . . . , N,

where w_i = [w_{i1}, w_{i2}, · · · , w_{in}]^T is the weight vector connecting the ith hidden node and the input nodes, β_i = [β_{i1}, β_{i2}, · · · , β_{im}]^T is the weight vector connecting the ith hidden node and the output nodes, b_i is the threshold of the ith hidden node, g(·) is the activation function, x_j is the jth input sample, and t_j is its target. The structure of the extreme learning machine network is shown in Figure 2. The ELM can be easily implemented and runs extremely fast. However, there is currently no effective method for accurately estimating the number of hidden layer nodes [30], and the ELM suffers from the curse of dimensionality when operating in a high-dimensional feature space [31]. To improve robustness and the capability of nonlinear approximation, Huang et al. [32] added a kernel function to the extreme learning machine, yielding the kernel extreme learning machine (KELM). The kernel matrix must satisfy the Mercer conditions and is defined as

Ω = HH^T, Ω_{ij} = h(x_i) · h(x_j) = K(x_i, x_j),

where K(x_i, x_j) is the kernel function (we choose the Gaussian RBF kernel function) and Ω is the kernel matrix. The kernel extreme learning machine is mathematically modeled as

f(x) = [K(x, x_1), . . . , K(x, x_N)] (I/C + Ω)^{−1} T, (28)

where I is the identity matrix and C is the penalty coefficient; the term I/C is added to the leading diagonal of HH^T. According to (28), the weight of the output nodes is β = (I/C + Ω)^{−1} T.
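A KELM regressor reduces to one linear solve at training time and one kernel evaluation at prediction time. The sketch below is a minimal implementation of β = (I/C + Ω)^{−1} T with a Gaussian RBF kernel; the class and function names, and the `gamma` parameterization of the kernel width, are our own choices, not from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel extreme learning machine regressor:
    beta = (I/C + Omega)^-1 T,  f(x) = [K(x, x_1), ..., K(x, x_N)] beta."""
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, t):
        self.X = np.asarray(X, float)
        omega = rbf_kernel(self.X, self.X, self.gamma)
        n = len(self.X)
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega,
                                    np.asarray(t, float))
        return self

    def predict(self, Xnew):
        return rbf_kernel(np.asarray(Xnew, float), self.X, self.gamma) @ self.beta
```

In the proposed method, each row of X would be the six-dimensional feature vector of one image and t its subjective quality score.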

IV. EXPERIMENTAL RESULTS
A. IMAGE DATABASES
Four publicly available image databases are used to test the performance of the proposed method: CSIQ [33], LIVE [34], TID2008 [35], and TID2013 [36]. The details of the four image databases are summarized in Table 1.

B. PERFORMANCE METRICS
Four commonly used performance metrics are employed to evaluate the IQA metrics. They are the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC), the Pearson linear correlation coefficient (PLCC), and the root mean squared error (RMSE) [37]. A better objective IQA measure is expected to have higher SROCC, KROCC and PLCC and lower RMSE.
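The four criteria can be computed directly with SciPy. A minimal sketch (function name ours); note that in the evaluation protocol PLCC and RMSE are computed after the logistic mapping of (29), which is omitted here for brevity.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, pearsonr

def iqa_metrics(objective, subjective):
    """SROCC, KROCC, PLCC and RMSE between objective scores and
    subjective scores (e.g. MOS)."""
    srocc = spearmanr(objective, subjective)[0]
    krocc = kendalltau(objective, subjective)[0]
    plcc = pearsonr(objective, subjective)[0]
    diff = np.asarray(objective, float) - np.asarray(subjective, float)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return srocc, krocc, plcc, rmse
```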
To evaluate the performance of the IQA metrics, before calculating PLCC and RMSE, a five-parameter logistic mapping between the objective outputs s_o and the subjective scores is applied:

s_m = β_1 (1/2 − 1/(1 + exp(β_2 (s_o − β_3)))) + β_4 s_o + β_5, (29)

where s_o is the objective output, s_m is the mapped score, and β_i (i = 1, 2, 3, 4, 5) are the parameters fitted by minimizing the sum of squared errors between the mapped scores s_m and the subjective scores. We use 80% of the images in each database for training and the remaining 20% for testing. The experiments are repeated one hundred times, and the average of the evaluation results is taken as the final result.
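Fitting the five-parameter logistic of (29) is a standard nonlinear least-squares problem. A sketch using `scipy.optimize.curve_fit`; the initial guess `p0` is a heuristic of ours and may need tuning per database.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(s, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of (29)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (s - b3)))) + b4 * s + b5

def fit_mapping(objective, subjective):
    """Least-squares fit of the beta parameters of (29)."""
    objective = np.asarray(objective, float)
    p0 = [np.ptp(subjective), 1.0, float(np.mean(objective)),
          0.0, float(np.mean(subjective))]
    beta, _ = curve_fit(logistic5, objective, subjective, p0=p0, maxfev=20000)
    return beta
```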

C. PARAMETER SETTING
We set Elm_type = RBF_kernel and Kernel_para = 1. The parameter Regularization_coefficient is set equal to the number of training samples. The numbers of training subsets in the CSIQ, LIVE, TID2008, and TID2013 databases are 866 × 0.8 ≈ 693, 982 × 0.8 ≈ 786, 1700 × 0.8 ≈ 1360, and 3000 × 0.8 ≈ 2400, respectively. The SROCC of our proposed model with different parameters on each database is listed in Tables 2-5, where the parameters Kernel_para and Regularization_coefficient are abbreviated as K and R, respectively. The SROCC value is optimal when K = 1 and R is approximately equal to the number of training subsets. For consistency, the parameter R is set to the number of training samples.

D. THE ADVANTAGE OF KELM
The kernel extreme learning machine has an advantage over other regression tools. We list the SROCC, PLCC, KROCC and RMSE of our proposed model using support vector regression (SVR) [38], ELM, the online sequential extreme learning machine (OSELM) [39], OSELM_VARY [39] and KELM on different image databases across a hundred trials in Tables 6-9. The quantity of initial training data used in the initial phase of OSELM and OSELM_VARY is 56. The size of the block of data learned by OSELM at each step is 1. The range from which the size of the data block is randomly generated at each iteration of the sequential learning phase in OSELM_VARY is [10, 30]. The results show that KELM achieves the best performance.

E. CONTRIBUTION OF LOCAL VARIANCE
To make the quaternion matrix capture more local details and structural information, the real part of the quaternion is set to the local variance of the color image in our proposed model. We implemented a comparative experiment to study the contribution of the local variance. The SROCC, PLCC, KROCC and RMSE values on different image databases are listed in Table 10, with the top results highlighted in boldface. It is obvious that the local variance contributes to the performance of our proposed model.

F. EACH FEATURE/FEATURE VECTOR CONTRIBUTION
We extracted the luminance comparison, contrast comparison, structure comparison, PC and GM from the whole color image and extracted the PSNR from the image noise information in our proposed model. The generated feature vector of the jth image is defined as

F_j = [l_j, c_j, s_j, pc_j, g_j, n_j],

where l_j, c_j, s_j, pc_j, g_j and n_j are the luminance comparison, contrast comparison, structure comparison, PC, GM and PSNR, respectively. We combined the extracted features in different ways. The combination of the luminance comparison, contrast comparison and structure comparison is called F_s, the combination of PC and GM is called F_f, and the PSNR alone is called F_p. The SROCC, PLCC, KROCC and RMSE for different feature combinations on different image databases across a hundred trials are listed in Tables 11-14.
It is obvious that using all six features delivers the best performance. Since both the GM and the contrast comparison represent the contrast information of the image, we also list the SROCC, PLCC, KROCC and RMSE for the corresponding feature combinations on TID2013 in Table 15. For simplicity, the GM is denoted F_GM, the contrast comparison is denoted F_c, and the remaining features form a feature vector F_other. The results again confirm that keeping both the GM and the contrast comparison is the best choice.
G. PERFORMANCE ON INDIVIDUAL DISTORTION TYPES
From Table 16, it is obvious that the proposed method achieves 20 top results on the different distortion types and is superior to the SSIM, FSIM and PSNR metrics.

H. PERFORMANCE ON FOUR DATABASES
In this section, we compare the overall performance of the competing IQA metrics, including SSIM, FSIM, PSNR, SCQI [42], LLM [43], SPSIM [44], RVSIM [45], HaarPSI [9], SSVD [15] and SPCM [46]. The results show that the proposed method performs best on all four databases and is superior to the SSIM, FSIM and PSNR metrics. The proposed model is therefore efficient and feasible: it achieves high consistency with the subjective evaluations and outperforms state-of-the-art image quality assessment metrics. Figure 3 plots the scatter distributions of subjective MOS versus the predicted scores of the proposed method on the different databases. The curves shown in Figure 3 were obtained by nonlinear fitting according to (29). It is clear that the proposed method provides a satisfactory fit and achieves very high consistency with human subjective evaluations.

I. CROSS-DATABASE VALIDATION
To test the feasibility and robustness of the proposed method, a cross-database validation is conducted. The results are listed in Table 18. Our metric outperforms the other three methods.

V. CONCLUSION
We proposed an image quality assessment metric based on the singular value decomposition of quaternion matrices. Because the correlation among the red, green, and blue channels of color images, the characteristics of local variance, the acquisition of image noise information, and the feature combination are all considered, the generated features provide comprehensive information from different perspectives. Furthermore, the choice of a kernel extreme learning machine further improves the performance of the proposed method. The experimental results show that our proposed method achieves very high consistency with human subjective evaluations.