Blind Stereo Image Quality Assessment Based on Binocular Visual Characteristics and Depth Perception

Predicting the quality of stereo images without reference images is challenging. In this paper, we propose a novel no-reference stereo image quality assessment (NR-SIQA) model based on binocular visual characteristics and depth perception, which can effectively evaluate the quality of both symmetrically and asymmetrically distorted images. Specifically, we discriminate between different binocular behaviors by analyzing binocular visual characteristics, and construct a corresponding cyclopean view for each behavior instead of a single fixed cyclopean view. Then, we extract monocular and binocular visual features from the left view, the right view and the synthetic cyclopean view. Furthermore, to evaluate the depth quality of the stereo image accurately, we extract depth perception features from the weighted disparity map and the longitudinal correlation coefficient map. Finally, we construct the mapping from the quality perception feature domain to the quality score domain by training an adaptive enhancement algorithm based on support vector regression (SVR). We evaluate the performance of the proposed algorithm on four stereo image databases. The experimental results show that, compared with state-of-the-art full-reference (FR), reduced-reference (RR) and NR-SIQA algorithms, the proposed algorithm achieves highly competitive performance for both symmetric and asymmetric distortions.


I. INTRODUCTION
With the rapid development of 3D technology, 3D movies and television have come to play an important role in daily life and have attracted attention worldwide. During the acquisition, compression, transmission and storage of 3D images, the left and right views may suffer different degrees and types of distortion, which affects the visual quality of experience. Many image processing methods, such as image denoising [1] and deblurring [2], can improve image quality, and image quality assessment (IQA) plays an important role in image processing because it can evaluate whether such a method actually improves the image quality. Moreover, compared with 2D images, there is a certain disparity between the left and right views of a stereopair, which provides additional depth perception for the human visual system (HVS) and gives viewers a realistic experience. However, excessive disparity also causes an uncomfortable experience for the HVS, which in turn affects the viewer's judgment of image quality. Therefore, to provide a good visual experience, it is necessary to construct a stereo image quality assessment model that is consistent with subjective scores. Similarly to 2D-IQA metrics, 3D-IQA metrics can be classified into three categories according to the availability of the pristine reference image: the FR-SIQA methods [3]-[7], which use the original undistorted image as a reference to evaluate the quality of the stereopair; the RR-SIQA methods [8]-[10], which utilize only part of the pristine image when evaluating the quality of the image; and the NR-SIQA methods [11]-[17], which evaluate the quality of the image without any reference information. Compared with the first two categories, NR-SIQA is more difficult, but it has wider application value in practice.
The associate editor coordinating the review of this manuscript and approving it for publication was Peng Liu.
Different from traditional 2D visual perception, the perceived quality of a stereopair depends on the qualities of both the left and right views. Research in visual psychology shows that the HVS converts the different views acquired by the left and right eyes into a single view, but this conversion is not a simple addition of the left and right eye stimuli [18]. Depending on the degree of similarity/dissimilarity between the binocular stimuli, different binocular phenomena may occur. Binocular fusion occurs when the binocular stimuli are the same or very similar [19]; binocular competition occurs when the binocular stimuli are sufficiently dissimilar [20]. The study in [21] showed that the fluctuation amplitude of primary visual cortex (V1) activity during binocular competition is 45-83% of the fluctuation range of the V1 activity caused by alternating the binocular stimuli. In other words, local interactions between V1 neurons can trigger sensory alternation during competition. Binocular suppression is a special case of binocular competition: when the binocular stimuli are sufficiently dissimilar and the important information contained in one view is sufficient to suppress the view acquired by the other eye, the view that contains more information dominates the perceived content [22].
Most previous studies constructed only a single cyclopean image to simulate the final view fused from the left and right views in the HVS, which ignores the different effects of different binocular behaviors on HVS perception. In [23], for stereoscopic video quality evaluation, the sum channel of binocular vision is constructed by directly averaging the left and right views in order to analyze binocular visual characteristics. In [24]-[27], the Gabor filter responses of the left and right views are used to weight the left and right eye stimuli and synthesize a single cyclopean image that simulates the complex binocular vision mechanism in the HVS. Although these NR stereo image/video quality assessment metrics perform better than earlier methods that directly average the qualities of the left and right views, they assume that one view dominates the fusion process in the HVS, and they ignore the phenomenon that the two views alternately dominate the fusion process in binocular competition. In addition, some SIQA algorithms [28]-[31] obtain the final quality of a stereo image as a weighted average of the qualities or features of the left and right views. For example, in [31], the normalized binocular stimulus intensity is used to weight the features extracted from the left and right views, the features are combined by a weighted sum, and a deep learning model is then used to learn the quality prediction model for stereo images. Although these algorithms can improve the accuracy of stereo image quality prediction, they still cannot fully simulate the complex binocular vision mechanism.
In addition to binocular vision characteristics, another important factor to consider is depth information. Horizontal parallax is the main source of depth information for the human eyes, but excessive parallax can cause dizziness and discomfort, which affects the subjective judgment of image quality [32]. Current SIQA algorithms have paid limited attention to depth quality measurement. In [33], based on the free energy theory and the binocular vision mechanism, Chen et al. construct the suppression map from the phase difference of the luminance channel in the left and right views, and extract a disparity entropy feature based on an autoregressive model to construct the depth quality metric model (DPQM) for stereo video. In [34]-[36], the authors extract statistical features from the disparity map to predict the depth quality of the stereo image, and assume that disparity information contributes equally to depth quality in all regions, which ignores the influence of longitudinal depth information on stereo image quality evaluation. In [37], Karimi et al. extract features from synthesized phase and contrast maps and then use a stacked neural network model to learn the prediction model for stereo images. In [38], a model based on stacked auto-encoders (SAE) predicts stereoscopic image quality accurately, but its complexity is high.
To solve the above problems, we propose a blind stereo image quality evaluation model based on binocular visual characteristics and depth perception. The main contributions of this algorithm are: (1) According to the degree of similarity/dissimilarity between the binocular stimuli and a comparison of the useful information in the left and right views, we construct different binocular combination models, instead of a fixed cyclopean image, to simulate different binocular behaviors such as binocular fusion, binocular competition and binocular suppression. (2) We extract depth perception features from the weighted disparity map and the longitudinal correlation coefficient map between the left and right views, which accounts for the influence of both horizontal and longitudinal depth information on SIQA.

II. PROPOSED METHOD
The framework of our method is shown in Fig. 1. First, we construct different binocular combination models based on binocular vision characteristics to simulate the complex visual mechanism. Then we divide the extraction of quality perception features into two parts: (1) We extract statistical distribution features and texture features from the left and right views as one set of image content quality perception features, and extract statistical distribution features from the binocular combination model as another set. We then combine the same features extracted from the left and right views by averaging.
(2) We extract depth perception features from the weighted disparity map and the longitudinal correlation coefficient map between the left and right views. Finally, we use SVR to train the regression model from feature domain to quality domain.

A. SYNTHETIC CYCLOPEAN MAP
In [39], several subjective experiments show that when the distortion of a stereo pair is an asymmetric information-loss distortion (ILD), such as asymmetric Gaussian blur, the perceived quality of the stereoscopic image pair is dominated by the high-quality component, because a high-quality component containing sufficient information can complement the information lost in the other view. On the other hand, for information-additive distortion (IAD), such as asymmetric Gaussian noise, the perceptual quality of the pair is dominated by the low-quality component, since the IAD in the low-quality component cannot be removed. As mentioned in [40], the perceptual quality of a stereoscopic image pair is dominated by the view with more information, which is consistent with the psychophysical findings described in [22]. Therefore, in this paper, we select the view that contains more information as the dominant view in the binocular combination process. Compared with [41], the proposed method does not need to classify distortion types, which reduces the complexity of the algorithm. The proposed method is also applicable to multiply-distorted images that cannot be classified, and it avoids the errors that classification may introduce.
VOLUME 8, 2020
To discriminate and simulate different binocular behaviors in the HVS, we construct different cyclopean maps to simulate the binocular combination process. In the calculation, we use the structural similarity (SSIM) model [42] together with a stereo matching algorithm to calculate the similarity between the left and right views, and we select the dominant view in the binocular combination process by comparing the amount of visual information in the left and right views. In this paper, we measure the amount of visual information by computing the entropy:
$$EL = -\sum_{k=1}^{N} \sum_{i=0}^{L} p(x_{ki,l}) \log p(x_{ki,l}) \tag{1}$$
where $EL$ is the amount of information contained in the left view, $N$ and $L$ are the number of channels in the color space and the maximum pixel value, respectively, the subscripts $k$ and $i$ denote the $k$th channel in RGB color space and the $i$th gray level, and $p(x_{ki,l})$ is the probability that a pixel value $x$ in the left view equals $i$. The amount of information contained in the right view, $ER$, is calculated in the same way as in (1). $S_{L,R}$ denotes the similarity between the left and right views: we calculate the similarity of each pixel between the two views based on SSIM [42], and then obtain $S_{L,R}$ by averaging the similarity values of all pixels. References [19], [20] and [22] point out that different binocular behaviors can be distinguished by the degree of similarity/dissimilarity between the binocular stimuli, so we set a structural similarity threshold $T_1$. In [39], Ryu et al. demonstrated that the view containing more information dominates the perceived quality of the stereo image, so we select the main view by setting a visual information threshold $T_2$. The specific calculation is as follows.
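As a concrete illustration of the information measure in (1), the following minimal sketch computes the summed per-channel entropy of a view. The function name `view_entropy`, the base-2 logarithm and the 8-bit range (256 gray levels) are assumptions made for this example; they are not specified in the text.

```python
import numpy as np


def view_entropy(img, levels=256):
    """Amount of visual information of a view: sum over channels of the
    Shannon entropy of the gray-level histogram (log base 2 is an assumption)."""
    img = np.asarray(img)
    if img.ndim == 2:  # treat a grayscale image as a single channel
        img = img[..., None]
    total = 0.0
    for k in range(img.shape[-1]):
        counts = np.bincount(img[..., k].ravel(), minlength=levels)
        p = counts / counts.sum()
        p = p[p > 0]  # 0 * log(0) is taken as 0
        total -= float(np.sum(p * np.log2(p)))
    return total


# Toy views: a flat image carries no information, while an image whose
# channels each take two equally likely values carries one bit per channel.
flat = np.zeros((4, 4, 3), dtype=np.uint8)
half = flat.copy()
half[:, :2, :] = 255
```

Comparing `view_entropy` of the left and right views gives the difference $EL - ER$ that is tested against $T_2$.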

1) BINOCULAR FUSION
When $S_{L,R} \ge T_1$, the binocular stimuli are sufficiently similar. We select the left view as the dominant view, as is common in previous studies. The binocular fusion model is calculated as follows.
2) BINOCULAR SUPPRESSION
When $S_{L,R} < T_1$ and $EL - ER \ge T_2$, the binocular stimuli are sufficiently dissimilar and the left view contains more information. The left view therefore dominates the perceptual quality of the stereoscopic image pair, and the binocular suppression model is calculated as follows.
When $S_{L,R} < T_1$ and $ER - EL \ge T_2$, the binocular stimuli are sufficiently dissimilar and the right view contains more information. The right view therefore dominates the binocular combination process, and the binocular suppression model is calculated as follows.

3) BINOCULAR COMPETITION
When $S_{L,R} < T_1$ and $|EL - ER| < T_2$, the binocular stimuli are sufficiently dissimilar, but neither view contains enough information to suppress the other. The left and right views therefore alternately dominate the fusion process, rather than a single view acting as the dominant view. Based on the above analysis, we recombine the two suppression maps whose dominant views are the left and right views, respectively, to simulate this alternating phenomenon. The binocular competition model is calculated as follows.
The weights $W_L$, $W_R$, $W_{SL}$ and $W_{SR}$ are calculated as follows.
where $I_L(x, y)$ and $I_R(x, y)$ are the left and right views, respectively, and $d(x, y)$ is the horizontal disparity at location $(x, y)$. In this paper, the disparity is calculated by the SSIM-based algorithm used in [6]. $E_L(x, y)$ and $E_R(x, y)$ are the 2D Gabor energy responses over all scales and orientations of the left and right views, respectively, and $E_{SL}(x, y)$ and $E_{SR}(x, y)$ are the 2D Gabor energy responses over all scales and orientations of the binocular suppression maps with the left and right view as the dominant view, respectively. The specific calculation is as follows. The 2D Gabor filter is defined as follows.
$$g(x, y, \lambda, \theta) = \frac{1}{2\pi\sigma} \exp\left(-\frac{1}{2} \ldots\right)$$
where $x = x_0 \cos\theta + y_0 \sin\theta$ and $y = -x_0 \sin\theta + y_0 \cos\theta$, $(x_0, y_0)$ is the center of the filter, $\lambda$ is the wavelength, which controls the scale of the Gabor filter, $\theta$ is the orientation, and $\sigma$ is the standard deviation of the elliptical Gaussian envelope along the $x$ and $y$ axes. Since the simple and complex cells in the primary visual cortex have receptive fields at different scales, we use a multi-scale Gabor filter bank with five scales in the frequency domain and four orientations, $\sigma = 0.5\lambda$, λ ∈ 1, where $\otimes$ represents a convolution operation and $I(x, y)$ is the input image.
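The threshold rules of this subsection can be summarized in a few lines. The sketch below is only an illustration of the decision logic: the function name and the returned labels are hypothetical, and the default thresholds are the values $T_1 = 0.5$ and $T_2 = 0.3$ selected in Section III-C. It takes the similarity $S_{L,R}$ and the two information values as inputs and does not build the cyclopean map itself.

```python
def binocular_behavior(s_lr, e_left, e_right, t1=0.5, t2=0.3):
    """Return (behavior, dominant_view) from the similarity S_{L,R} and the
    information values EL and ER, following the rules of Section II-A."""
    if s_lr >= t1:
        # Similar stimuli: fusion, with the left view as the dominant view.
        return "fusion", "left"
    diff = e_left - e_right
    if diff >= t2:
        # Dissimilar stimuli and the left view carries clearly more information.
        return "suppression", "left"
    if -diff >= t2:
        # Dissimilar stimuli and the right view carries clearly more information.
        return "suppression", "right"
    # Dissimilar stimuli, neither view can suppress the other: the two
    # suppression maps are recombined, so there is no single dominant view.
    return "competition", None
```

For example, `binocular_behavior(0.3, 7.1, 7.0)` falls into the competition case because the views are dissimilar and their information values differ by less than $T_2$.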

B. IMAGE CONTENT QUALITY PERCEPTION FEATURES
Although distinctive in content, natural images inherently obey particular statistical regularities [25], which are measurably modified by the presence of distortions. Motivated by this, we extract brightness statistical distribution features from the left and right views and from the cyclopean map calculated according to Section II-A. First, we use the mean subtracted contrast normalized (MSCN) method to preprocess the images. Fig. 2(a)-(c) show the left and right views from the LIVE 3D Phase I database [43] and the corresponding synthetic cyclopean view. The histograms of the MSCN coefficients of these images are consistent with the generalized Gaussian distribution (GGD), as shown in Fig. 2(d)-(f). Therefore, in this paper, we use the GGD model to fit these statistical distributions. Given an image $I$ of size $M \times N$, its MSCN coefficients are calculated by:
$$\hat{I}(i, j) = \frac{I(i, j) - \mu(i, j)}{\sigma(i, j) + C}$$
$$\mu(i, j) = \sum_{k=-K}^{K} \sum_{l=-L}^{L} \omega_{k,l}\, I(i + k, j + l)$$
$$\sigma(i, j) = \sqrt{\sum_{k=-K}^{K} \sum_{l=-L}^{L} \omega_{k,l}\, \left[I(i + k, j + l) - \mu(i, j)\right]^2}$$
where $I(i, j)$ denotes the pixel value at spatial location $(i, j)$, $C$ is a small constant that prevents division by zero, and $\omega = \{\omega_{k,l} \mid k = -K, \ldots, K,\ l = -L, \ldots, L\}$ is a 2D circularly symmetric Gaussian weighting function. $K$ and $L$ determine the size of the 2D Gaussian kernel ($K = L = 3$). A zero-mean GGD is given by:
$$f(x; \alpha, \nu^2) = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)} \exp\left(-\left(\frac{|x|}{\beta}\right)^{\alpha}\right), \qquad \beta = \nu \sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}}$$
with the gamma function $\Gamma(\cdot)$ defined as:
$$\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt, \quad a > 0$$
where $\alpha$ and $\nu^2$, which reflect the image naturalness, control the shape and variance of the distribution, respectively, and $x$ is the MSCN coefficient. In this paper, we use the GGD model to fit the MSCN distributions of the left and right views and the corresponding cyclopean view, and the parameters $\alpha$ and $\nu^2$ are taken as the quality-sensitive features $f_1 = \{\alpha, \nu^2\}$.
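The MSCN normalization and the GGD fit described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the Gaussian window width (`sigma = 7/6`), the constant `c = 1`, the reflective border handling and the moment-matching estimator (the approach popularized by BRISQUE [47]) are all assumptions, and the function names are hypothetical.

```python
import math

import numpy as np


def gaussian_window(half=3, sigma=7.0 / 6.0):
    """Normalized (2*half+1) x (2*half+1) circularly symmetric Gaussian window."""
    ax = np.arange(-half, half + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()


def local_filter(img, w):
    """Weighted local sum of img under window w (reflective borders)."""
    half = w.shape[0] // 2
    padded = np.pad(img, half, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    for dk in range(w.shape[0]):
        for dl in range(w.shape[1]):
            out += w[dk, dl] * padded[dk:dk + img.shape[0], dl:dl + img.shape[1]]
    return out


def mscn(img, c=1.0):
    """Mean subtracted contrast normalized coefficients of a grayscale image."""
    img = np.asarray(img, dtype=float)
    w = gaussian_window()
    mu = local_filter(img, w)
    # Weighted local variance; valid because the window weights sum to one.
    var = local_filter(img * img, w) - mu * mu
    sigma = np.sqrt(np.maximum(var, 0.0))
    return (img - mu) / (sigma + c)


def fit_ggd(x):
    """Moment-matching estimate of the GGD shape alpha and variance nu^2."""
    x = np.asarray(x, dtype=float).ravel()
    var = float(np.mean(x ** 2))
    rho = var / float(np.mean(np.abs(x))) ** 2
    alphas = np.arange(0.2, 10.0, 0.001)
    # For a GGD, E[x^2] / E[|x|]^2 = Gamma(1/a) Gamma(3/a) / Gamma(2/a)^2.
    ratio = np.array([math.gamma(1 / a) * math.gamma(3 / a) / math.gamma(2 / a) ** 2
                      for a in alphas])
    alpha = float(alphas[np.argmin(np.abs(ratio - rho))])
    return alpha, var
```

Calling `fit_ggd(mscn(view))` on a grayscale view returns the pair that corresponds to $f_1 = \{\alpha, \nu^2\}$; for Gaussian-distributed input the estimated shape is close to 2.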
In addition to extracting statistical distribution features in spatial domain, we also extract texture features from left and right views as another set of image content quality perception features.
Local binary pattern (LBP) is an efficient local texture descriptor with the significant advantages of low computational cost, no training requirement and illumination invariance. In this paper, we use the rotation-invariant uniform LBP, $LBP^{riu2}_{r,p}$, to extract texture features from the left and right views, respectively:
$$LBP^{riu2}_{r,p} = \begin{cases} \sum_{n=0}^{p-1} s(x_{r,p,n} - x_c), & U \le 2 \\ p + 1, & \text{otherwise} \end{cases}$$
where $x_c$ is the gray value of the center pixel of the local neighborhood, $x_{r,p,n}$ ($n = 0, \ldots, p-1$) are the gray values of the $p$ equally spaced pixels on a circle of radius $r$ ($r > 0$) that form a circularly symmetric neighbor set, and $s(x)$ is the sign function, which equals 1 for $x \ge 0$ and 0 otherwise. The condition $U \le 2$ means that the binary values of adjacent neighbors on the circularly symmetric neighbor set change between 0 and 1 no more than twice, where
$$U = \left| s(x_{r,p,p-1} - x_c) - s(x_{r,p,0} - x_c) \right| + \sum_{n=1}^{p-1} \left| s(x_{r,p,n} - x_c) - s(x_{r,p,n-1} - x_c) \right|$$
In this paper, we choose $p = 8$ and $r = 1$. We then use the magnitude of the MSCN coefficient to weight the LBP histogram and obtain the 10-dimensional texture features $f_2$.
where $N$ is the number of pixels, $k \in [0, p+1]$ is the LBP pattern, and $w_j$ is the magnitude of the MSCN coefficient.
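The texture descriptor of this subsection can be sketched as below for $p = 8$ and $r = 1$. Two simplifications are assumed and should be read as such: the eight neighbors are taken directly from the 3 x 3 pixel grid instead of being interpolated on a circle, and the weighted histogram is normalized to sum to one. The function names are hypothetical.

```python
import numpy as np


def lbp_riu2(img):
    """Rotation-invariant uniform LBP codes (p = 8, r = 1) of interior pixels.

    Codes 0..8 count the neighbors that are >= the center for uniform
    patterns (U <= 2); code 9 collects all non-uniform patterns.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbors in circular order around the center pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = np.stack([
        (img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= center).astype(int)
        for dy, dx in offsets
    ])
    # U: number of 0/1 transitions along the circular neighbor sequence.
    transitions = np.sum(np.abs(bits - np.roll(bits, 1, axis=0)), axis=0)
    return np.where(transitions <= 2, bits.sum(axis=0), 9)


def weighted_lbp_histogram(codes, weights, n_bins=10):
    """Histogram of LBP codes where each pixel votes with |weight|."""
    hist = np.bincount(np.asarray(codes).ravel(),
                       weights=np.abs(weights).ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In use, `weights` would be the MSCN coefficients cropped to the interior pixels (for example `mscn_map[1:-1, 1:-1]`), which yields a 10-bin vector playing the role of $f_2$ at one scale.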

C. DEPTH PERCEPTION FEATURES
Binocular disparity plays an important role in the depth perception of stereoscopic images. When humans view stereoscopic image or video content, disparity provides depth perception and improves the quality of experience. However, excessive disparity tends to cause eye strain, dizziness, reduced visual sensitivity and similar symptoms. Therefore, it is necessary to statistically analyze the disparity information of stereo image pairs. However, not all regions of the disparity map have the same impact on image quality assessment. As mentioned in [39], the view with more visual information attracts more attention from the HVS. Therefore, based on the characteristics of human visual perception, we use the amount of visual information to weight the disparity map, so as to emphasize the influence of the disparity values in the regions attended to by the HVS on the estimation of depth quality. First, we use the visual information of the dominant view to weight the disparity map and obtain a weighted disparity map $D_e$. Then, since distortion changes the statistical characteristics of the disparity map, we extract simple statistical features such as kurtosis and skewness from the weighted disparity map as depth perception features $f_3$,
where $w_e(i, j)$ is the weight of the disparity at $(i, j)$: we apply (1) to a $7 \times 7$ image block centered on $(i, j)$ and take the amount of visual information of the block as $w_e(i, j)$. $d(i, j)$ is the disparity at $(i, j)$, calculated by the SSIM-based algorithm used in [6]; $S$, $K$, $m$ and $v$ are the skewness, kurtosis, mean and variance of the weighted disparity map, respectively; and $M \times N$ is the size of the weighted disparity map.
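A rough sketch of the weighted disparity statistics is given below. Several details are assumptions for the purpose of illustration: the weighting is taken as the element-wise product $D_e(i, j) = w_e(i, j)\, d(i, j)$, the block entropy is computed on a single 8-bit channel with a base-2 logarithm and reflective borders, and the kurtosis is the non-excess (Pearson) form. All function names are hypothetical.

```python
import numpy as np


def block_entropy(gray, half=3, levels=256):
    """Entropy of the (2*half+1) x (2*half+1) block around every pixel."""
    gray = np.asarray(gray).astype(np.int64)
    padded = np.pad(gray, half, mode="reflect")  # border handling is an assumption
    size = 2 * half + 1
    out = np.zeros(gray.shape, dtype=float)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            block = padded[i:i + size, j:j + size].ravel()
            p = np.bincount(block, minlength=levels) / block.size
            p = p[p > 0]
            out[i, j] = -np.sum(p * np.log2(p))
    return out


def skew_kurtosis(values):
    """Skewness S and (non-excess) kurtosis K from the mean m and variance v."""
    d = np.asarray(values, dtype=float).ravel()
    m = d.mean()
    v = d.var()
    if v == 0:
        return 0.0, 0.0  # degenerate map: moments are undefined, report zeros
    s = np.mean((d - m) ** 3) / v ** 1.5
    k = np.mean((d - m) ** 4) / v ** 2
    return float(s), float(k)


def depth_features(gray, disparity):
    """Assumed weighting: D_e(i, j) = w_e(i, j) * d(i, j)."""
    weighted = block_entropy(gray) * np.asarray(disparity, dtype=float)
    return skew_kurtosis(weighted)
```

Here `gray` stands for the dominant view and `disparity` for a disparity map of the same size, which would come from a separate stereo matching step.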
In addition, although horizontal disparity is the main source of depth information, it only reflects the horizontal position difference between the left and right retinal projections of a given point in space, and thereby ignores the influence of longitudinal differences between the left and right views on depth quality assessment. The longitudinal mismatch between the left and right views caused by asymmetric distortion also affects depth perception and causes binocular visual discomfort, which affects the viewer's judgment of depth quality [44]. Therefore, in this paper, we calculate the degree of linear correlation of the longitudinal changes between the left and right views as a supplement to the horizontal depth information.
Firstly, the disparity compensation maps of the left and right views are calculated as follows.
where $d_r(x, y)$ is the disparity that maps each pixel in the left view to its most similar pixel in the right view, and $d_l(x, y)$ is the disparity that maps each pixel in the right view to its most similar pixel in the left view. Then, the mean subtracted contrast normalized operation is applied to the left and right views and their corresponding disparity compensation maps. Finally, the longitudinal correlation coefficient map between the left view and the right disparity compensation map is calculated as follows.
where $corr(X, Y)$ is the correlation function provided by MATLAB, which calculates the pairwise correlation coefficient between each pair of columns in the matrices $X$ and $Y$, and $N(\cdot)$ is the mean subtracted contrast normalized operation. Fig. 3(a)-(d) show the left and right views from LIVE Phase II and the statistical distributions of the two longitudinal correlation coefficient maps calculated by the above method. As can be seen from Fig. 3(c) and (d), the statistical distributions of the two longitudinal correlation coefficient maps are consistent with the non-zero mean asymmetric generalized Gaussian distribution (AGGD).
Therefore, this paper uses the non-zero mean AGGD model to fit the longitudinal correlation coefficient map between the left view and the right disparity compensation map. The non-zero mean AGGD model is defined as follows.
$$f(x; \alpha, \beta_l, \beta_r, \mu) = \begin{cases} \dfrac{\alpha}{(\beta_l + \beta_r)\,\Gamma(1/\alpha)} \exp\left(-\left(\dfrac{-(x - \mu)}{\beta_l}\right)^{\alpha}\right), & x < \mu \\ \dfrac{\alpha}{(\beta_l + \beta_r)\,\Gamma(1/\alpha)} \exp\left(-\left(\dfrac{x - \mu}{\beta_r}\right)^{\alpha}\right), & x \ge \mu \end{cases}$$
where $\Gamma(\cdot)$ is the gamma function, $\alpha$ is the shape parameter, which controls the shape of the AGGD, $\beta_l$ and $\beta_r$ control the spread of the left and right sides of the distribution, respectively, and $\mu$ is the mean value. When $\mu = 0$, the distribution is the zero-mean AGGD. Fig. 4(a) shows the statistical distributions of the longitudinal correlation coefficient maps between the left views and the right disparity compensation maps for stereo image pairs from LIVE Phase I. Fig. 4(b) shows the statistical distributions of the longitudinal correlation coefficient maps of three asymmetrically distorted images in LIVE Phase II. It can be seen from Fig. 4 that the correlation coefficient distributions of images with different distortion types have different statistical characteristics and are consistent with the AGGD. Therefore, we extract the AGGD parameters of the longitudinal correlation coefficient map between the left view and the right disparity compensation map as depth perception features $f_4 = \{\alpha, \beta_l, \beta_r, \mu\}$.
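Two ingredients of this subsection are easy to check numerically: the column-wise correlation that MATLAB's `corr(X, Y)` computes, and the shifted AGGD density. The sketch below is an illustration only. `column_corr` is a hypothetical NumPy counterpart of `corr`, and `aggd_pdf` assumes the standard two-sided AGGD form shifted by $\mu$, which may differ in parameterization from the authors' equation. Parameter estimation is not shown.

```python
import math

import numpy as np


def column_corr(x, y):
    """Pearson correlation between every column of x and every column of y.

    Returns a matrix of shape (x.shape[1], y.shape[1]); constant columns
    give NaN entries.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    denom = np.outer(np.sqrt((xc ** 2).sum(axis=0)), np.sqrt((yc ** 2).sum(axis=0)))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (xc.T @ yc) / denom


def aggd_pdf(x, alpha, beta_l, beta_r, mu=0.0):
    """Shifted AGGD density: scale beta_l to the left of mu, beta_r to the right."""
    x = np.asarray(x, dtype=float)
    scale = np.where(x < mu, beta_l, beta_r)
    norm = alpha / ((beta_l + beta_r) * math.gamma(1.0 / alpha))
    return norm * np.exp(-(np.abs(x - mu) / scale) ** alpha)
```

With this form the density integrates to one, and the probability mass to the left of $\mu$ equals $\beta_l / (\beta_l + \beta_r)$, which is one way to see how the two scale parameters encode the asymmetry.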

D. QUALITY ESTIMATION
In this paper, we extract 44-dimensional features from each stereo image pair. Specifically, we extract $2 \times 3 = 6$ dimensions of statistical distribution features $f_1$ and $10 \times 3 = 30$ dimensions of texture features $f_2$ from each of the left and right views on three scales. To reduce the feature dimension and simulate the interaction between simple cells in the visual cortex, we combine the same features extracted from the left and right views by averaging, which yields a 36-dimensional feature combination. We then extract 2-dimensional binocular statistical features $f_1$ from the cyclopean map, 2-dimensional depth features $f_3$ and 4-dimensional depth features $f_4$, for a total of $36 + 2 + 2 + 4 = 44$ dimensions.
After obtaining the quality-sensitive features, we use support vector regression (SVR) to construct the mapping from the feature domain to the quality domain. SVR has proved to be an effective tool for prediction and nonlinear fitting problems. During the training stage, we first extract the quality-sensitive features of the training database and then use SVR to learn a prediction function from the training feature vectors to the subjective ratings (e.g., MOS or DMOS). During the testing stage, we first extract the feature vector of each test image and then predict its quality score by feeding the feature vector into the trained prediction function.

III. EXPERIMENTAL RESULTS AND ANALYSIS
In order to test the performance of the proposed algorithm, our method is compared with some state-of-the-art NR-SIQA and FR-SIQA methods on four publicly available S3D IQA databases: LIVE 3D Phase I database [43], LIVE 3D Phase II database [6], Waterloo-IVC 3D Phase I database [45], Waterloo-IVC 3D Phase II database. Table 1 gives the specific parameters of each database.

A. OVERALL PERFORMANCE COMPARISON
To verify the effectiveness of the proposed SIQA model, we compare it with several state-of-the-art SIQA methods, including FR-SIQA metrics (Lin et al. [4], Khan and Channappayya [5], Chen et al. [6], SSIM [42], Jiang et al. [46]), RR-SIQA metrics (Ma et al. [8]) and NR-SIQA metrics (SINQ [15], Yang et al. [31], Zhou et al. [32], Karimi et al. [37], Yang et al. [38], Fezza et al. [41], BRISQUE [47]). We use three common criteria to quantify the performance of the proposed method and all competing metrics: the Spearman rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE). SROCC and PLCC evaluate the monotonicity and the consistency of the predictions, respectively, whereas RMSE reflects the prediction error. For a perfect metric, PLCC and SROCC between the predicted scores and the associated subjective ratings are close to 1, and RMSE is close to 0. Each dataset is randomly split into two non-overlapping subsets: 80% of the dataset is used as the training subset and the rest as the testing subset. After the model is trained on the training subset, the prediction performance is measured on the testing subset. To avoid performance bias, the random training-testing split is repeated 1000 times and the median performance is reported. Tables 2-3 show the overall performance of our method compared with other SIQA methods on the four publicly available S3D IQA databases. The top three results are highlighted in boldface for readers' convenience. The overall performance of our method ranks in the top three on all four S3D IQA databases and is close to the best performance.
The SROCC, PLCC and RMSE values of the proposed algorithm on the four S3D IQA databases are better than those of most of the comparison algorithms, and some values are the best on the LIVE 3D Phase I and Waterloo-IVC 3D Phase II databases. This indicates that the proposed method can effectively evaluate the quality of symmetrically and asymmetrically distorted stereo images. In particular, Table 3 shows that the proposed algorithm can be effectively used to evaluate complex asymmetric distortions of different types.
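The evaluation protocol described in this subsection (random 80/20 splits, median of SROCC, PLCC and RMSE) can be sketched as follows. To keep the example self-contained, a plain least-squares linear regressor stands in for the SVR-based model; it is a swapped-in technique used only to make the sketch runnable, and all function names are hypothetical.

```python
import numpy as np


def rankdata(a):
    """Average ranks (1-based), so that tied values share the same rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a), dtype=float)
    ranks[order] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        tie = a == value
        ranks[tie] = ranks[tie].mean()
    return ranks


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(rankdata(x), rankdata(y))


def rmse(x, y):
    """Root mean square error between predictions and subjective scores."""
    return float(np.sqrt(np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)))


def fit_linear(features, scores):
    """Stand-in regressor (NOT SVR): least squares with an intercept."""
    design = np.c_[features, np.ones(len(features))]
    return np.linalg.lstsq(design, scores, rcond=None)[0]


def predict_linear(weights, features):
    return np.c_[features, np.ones(len(features))] @ weights


def median_performance(features, scores, fit, predict,
                       n_splits=1000, train_ratio=0.8, seed=0):
    """Median (SROCC, PLCC, RMSE) over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    n_train = int(round(train_ratio * n))
    results = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        model = fit(features[train], scores[train])
        pred = predict(model, features[test])
        results.append((srocc(pred, scores[test]),
                        plcc(pred, scores[test]),
                        rmse(pred, scores[test])))
    return tuple(float(v) for v in np.median(np.array(results), axis=0))
```

Passing a different `fit`/`predict` pair, for example wrappers around an SVR implementation, would reuse the same splitting and scoring logic.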

B. PERFORMANCE ON INDIVIDUAL TYPES
An excellent IQA metric should not only perform well on an entire database, but also cope effectively with each individual distortion type. In this section, a further experiment investigates the performance of the proposed method in estimating the perceived quality for individual distortion types. Because the Waterloo-IVC databases are more complex and contain mixed distortion types, they are not suitable for this task. Tables 4-6 list the experimental results on LIVE Phase I and Phase II. As before, the best result among the SIQA comparisons is highlighted in bold. As Tables 4-6 show, although the proposed algorithm does not perform well on every individual distortion, it achieves the best performance on some distortion types and ranks in the top three on others. Moreover, the proposed method is competitive with the FR metrics. These observations verify that the proposed method is competent to tackle the quality assessment problem of S3D images.
C. PARAMETER SETTING
Section II-A analyzes binocular visual characteristics, and discriminates and simulates the binocular combination models of stereo image pairs in the HVS by setting thresholds. To select appropriate thresholds, we analyze the impact of the similarity threshold $T_1$ and the information threshold $T_2$ on the performance of the proposed model, where $T_1 \in \{0.5, 0.6\}$ and $T_2 \in \{0.1, 0.2, 0.3\}$. As Table 7 shows, when $T_1 = 0.5$ and $T_2 = 0.3$, the PLCC and SROCC values of the proposed algorithm on the LIVE Phase I database reach their maximum; the performance on the LIVE Phase II database is not the best, but it is within the top three, and the gap to the best performance is small. In addition, when $T_1$ and $T_2$ are increased, the performance of our model does not change much. Therefore, we set $T_1 = 0.5$ and $T_2 = 0.3$.

D. CONTRIBUTION OF DIFFERENT FEATURE COMBINATIONS
According to the different binocular behaviors, the proposed algorithm constructs the corresponding cyclopean image by threshold judgment, and then extracts quality perception features from the cyclopean image and the weighted disparity map to predict the quality of the stereo image. To analyze the effect of different feature combinations on the performance, Table 8 compares the performance of the model with different feature combinations. In Table 8, "Mon" represents the model that extracts shape parameters of the GGD and texture features from the left and right views as monocular features; "Mon + Bin" represents the model that extracts monocular and binocular features from the left and right views and the corresponding cyclopean map; "Mon + EDM" represents the model that extracts monocular and horizontal depth features; "Bin + EDM" represents the model that extracts binocular and horizontal depth features; "EDM" represents the model that extracts horizontal depth features from the weighted disparity map in addition to monocular and binocular features. In Table 8, the comparison between "Mon" and "Mon + Bin", and the comparison between "Mon + EDM" and "Bin + EDM", show that the proposed synthetic cyclopean image improves the performance of the SIQA model and can be used to simulate the fusion process. The comparison between "Mon" and "Mon + EDM" shows the effectiveness of the depth features. The overall performance of the algorithm is improved by extracting both binocular features and depth features, which confirms the effectiveness of the features extracted by our method.

E. GENERALIZATION PERFORMANCE
To verify the generalization performance of the proposed algorithm, a cross-validation experiment across databases is conducted; the results are shown in Tables 9 and 10. Because the IVC Phase I and Phase II databases contain mixed distortion images with different distortion types, Table 11 does not show cross-database testing for individual distortions. It can be seen that the performance of training on one database (i.e., Phase I) and testing on another database (i.e., Phase II) is not good, because the distorted images in the training set and the test set differ greatly in image content, distortion type and distortion degree. Comparatively speaking, the prediction model trained by the proposed algorithm adapts well to different test databases.

F. PERFORMANCE ON SYMMETRIC AND ASYMMETRIC DISTORTIONS
To further test the performance of our algorithm, we divide the LIVE Phase II database into two separate subsets containing only symmetric and only asymmetric distortions, and test our method on both subsets. The results are listed in Table 12. Because of the complex visual mechanisms involved, it is difficult to simulate the fusion process in the HVS, so most IQA models perform significantly worse on asymmetric distortions than on symmetric ones. Our model performs well on both symmetric and asymmetric distortions.

G. COMPUTATIONAL COMPLEXITY
To evaluate the time complexity, we measure the time our method takes to evaluate a stereopair in LIVE Phase I. The complexity of SSIM is very low because it does not need to synthesize a cyclopean map, but its accuracy is not as high as ours. As Table 13 shows, compared with [6], [37] and [38], the complexity of our algorithm is the lowest, because our method extracts spatial domain features directly, which have lower computational complexity than transform domain features. Moreover, our method evaluates image quality without classifying the distortion type of the images. The models in [37] and [38] use a CNN and stacked auto-encoders to predict image quality; although their overall performance is competitive with ours, the complexity of our method is lower. Compared with [15], our algorithm performs better on LIVE Phase I and IVC Phase II and is competitive on the other image databases, but its complexity is higher. In future work, we will further reduce the complexity of the algorithm.

H. ALGORITHM CONSISTENCY
The scatter plot of the predicted quality values against the subjective scores for the proposed algorithm is shown in Fig. 5. The scatter points are closely clustered, indicating that the predictions of the proposed algorithm are highly consistent with human subjective evaluation.

IV. CONCLUSION
In this paper, we propose a no-reference stereo image quality assessment model based on binocular visual characteristics and depth perception. More precisely, we first discriminate different binocular behaviors based on thresholds and construct the corresponding cyclopean images to simulate the complex binocular visual mechanism. Then we extract image content quality-aware features from the left and right views and the corresponding cyclopean map, and extract depth-aware features from the weighted disparity map and the longitudinal correlation coefficient map. Finally, we use SVR to construct a stereo image quality evaluation model that maps the feature domain to the quality score domain. Experimental results show the effectiveness of the proposed 3D quality assessment technique compared with recent state-of-the-art methods.