Segmented Spherical Projection-Based Blind Omnidirectional Image Quality Assessment

In contrast with traditional images, an omnidirectional image (OI) has a higher resolution and provides the user with an interactive wide field of view. The equirectangular projection (ERP) format, the default for encoding and transmitting omnidirectional visual content, is not well suited to OI quality assessment, and to blind image quality assessment in particular, because of serious geometric distortion in the bipolar regions. In this paper, a segmented spherical projection (SSP) based blind omnidirectional image quality assessment (SSP-BOIQA) method is proposed. The OI in ERP format is first converted into SSP format, which removes the stretching distortion in the bipolar regions of the ERP format while retaining its equatorial region. On the one hand, because the bipolar regions of the SSP format are circular, a local/global perceptual feature extraction scheme with a fan-shaped window is proposed for estimating the distortion in the bipolar regions of the OI. On the other hand, the perceptual features of the equatorial region are extracted with the heat map as a weighting factor to reflect users' visual behavior. The features extracted from the bipolar and equatorial regions are then pooled to predict the quality of distorted OIs. Experiments on two databases, CVIQD2018 and MVAQD, demonstrate that the proposed SSP-BOIQA method outperforms state-of-the-art blind quality assessment methods and is more consistent with human visual perception.


I. INTRODUCTION
With the rapid development of virtual reality (VR) technologies, omnidirectional visual content plays an important role in VR systems and is used in multimedia, industry, medical care and business fields [1]. During omnidirectional image (OI) processing, compression and transmission, the OIs may be degraded [2]. Unlike ordinary images, which are presented on a flat display, OIs are viewed with a head mounted display (HMD). In this viewing mode, users can only view part of the OI at a given time as a viewport in the HMD, and any distortion in the viewport is magnified, which greatly affects the users' experience [3]. Therefore, evaluating the quality of OIs effectively and objectively is an urgent problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Abdel-Hamid Soliman.
In a VR system, OIs are captured and represented in spherical form; however, this form is not easy to store or transmit directly. Thus, the Moving Picture Experts Group (MPEG) has developed the omnidirectional media application format (OMAF) for encoding, storing, transmitting and rendering omnidirectional media [4]. In OMAF, equirectangular projection (ERP) is used as the default projection to map the original spherical signal onto a two-dimensional (2D) plane for compression and transmission with the existing encoding standards [5]. However, the OI in ERP format (hereinafter denoted as ERP-image for convenience) suffers from redundancy and geometric distortion that vary with latitude, especially in the bipolar regions of the ERP-image due to stretching. Hence, it is indispensable to develop objective omnidirectional image quality assessment (OIQA) methods to evaluate and optimize VR systems.
Generally, image quality assessment (IQA) metrics can be categorized into full-reference (FR), reduced-reference (RR) and blind/no-reference methods. FR-IQA metrics need all the information of the original image, RR-IQA metrics need only part of the information of the original image, while blind IQA (BIQA) metrics predict the image quality without any reference information [6]. Among FR-IQA metrics, some classic metrics were proposed for ordinary 2D images, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [7] and visual information fidelity (VIF) [8]. Because OIs differ from ordinary images, these FR-IQA metrics cannot estimate the quality of OIs accurately. Thus, Yu et al. [9] proposed spherical PSNR (S-PSNR), which is computed on the spherical surface, to adapt the traditional PSNR to OIs. Sun et al. [10] noted that the degree of distortion in the ERP-image is related to the latitude of the corresponding pixels, and proposed weighted-to-spherically-uniform PSNR (WS-PSNR). Unlike S-PSNR, WS-PSNR does not require projection transformation; however, it cannot be calculated across formats. Zakharchenko et al. [11] presented Craster parabolic projection PSNR (CPP-PSNR) to remove this cross-format limitation. Zhou et al. [12] proposed weighted-to-spherically-uniform SSIM (WS-SSIM), which addresses the stretching distortion problem in the ERP-image. Although the above methods have improved OI quality prediction to a certain degree, they do not take the user's visual perception into account well. Therefore, Li et al. [13] utilized the characteristics of user behavior and proposed to feed the user's head movement (HM) and eye movement (EM) into a deep learning model to estimate the quality of omnidirectional video. Kim et al. [14] constructed a deep-learning-based omnidirectional IQA method with a quality predictor and a human perception guidance module.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Since a distortion-free reference image is usually difficult to obtain in practical applications, some blind image quality assessment (BIQA) methods were proposed to predict the quality of ordinary 2D images instead of FR-IQA methods. For example, BRISQUE [15] and OG [16] extract natural statistical features in the spatial domain or gradient domain, and pool them with traditional machine learning methods. These BIQA methods are called opinion-aware BIQA methods. In contrast, opinion-unaware BIQA methods, such as NIQE [17], dipIQ [18] and BPRI [19], do not require subjective opinions for training, but they are not universally applicable to all distortion types. Although these BIQA metrics are highly consistent with the human visual system (HVS) in predicting the quality of ordinary 2D images, they cannot achieve satisfactory results in predicting the quality of OIs. In addition, most objective quality evaluation approaches for OIs are based on the ERP format. However, the ERP format has severe distortion in the bipolar regions as a result of stretching [5], which causes errors when no reference image is available for comparison.
Based on the above analysis, a new segmented spherical projection based blind omnidirectional image quality assessment (SSP-BOIQA) method is proposed in this paper. The segmented spherical projection (SSP) representation of the OI, denoted as SSP-image, eliminates the stretching distortion in the bipolar regions of the ERP-image by mapping the bipolar regions into two circles. Meanwhile, it retains the intuitive layout and continuous boundary of the equatorial region of the ERP format. Additionally, by testing the proposed method on two OI databases, we verify its effectiveness in quality estimation and its superiority over state-of-the-art IQA approaches. The main contributions of this paper are listed as follows.
(1) OIs are spherical in nature, and the serious stretching distortion in the bipolar regions of the ERP-image decreases the accuracy of OI quality assessment, especially blind quality assessment. Therefore, even though ERP is the default format for OI encoding and transmission, the assessment is performed on the OI in SSP format instead, because the SSP format is closer to what the user actually perceives through the HMD.
(2) Considering that the bipolar regions of SSP format are circular, a local/global perceptual feature extraction scheme with fan-shaped window is proposed for estimating the distortion in the OI's bipolar regions. Since the OI distortion process will change and produce unnatural local anisotropy, especially in the circular regions, a relative gradient orientation (RGO) is used to explain the structural features of OI. Moreover, in order to adapt to image with different geometric shape, an improved circular local binary pattern (CLBP) operator is designed by replacing the square neighborhood with the circle neighborhood.
(3) Considering that the user's behavior characteristics when viewing OIs through HMD can also affect the subjective rating of the OIs, the perception characteristics represented by a heat map or saliency map are used to further simulate the human visual system (HVS) so as to improve the performance of the proposed SSP-BOIQA method.
The remainder of this paper is organized as follows. Section II discusses the related work. Section III describes the proposed method in detail. Section IV presents the experimental results and analyses. Finally, the conclusion and future works are given in Section V.

II. RELATED WORK
In this section, OI's SSP and ERP formats are briefly described and user's visual behavior characteristics of viewing OIs with HMD are discussed.
In a VR system, the omnidirectional image/video in spherical form is first mapped to a 2D plane, and then compressed and transmitted by using the existing 2D image/video coding standards. The ERP format is the default mode in OMAF. As shown in Figs. 1(a) and 1(b), ERP maps the spherical longitude and latitude lines to vertical and horizontal lines with constant spacing, respectively, so that the bipolar regions of the sphere are stretched dramatically at the top and bottom of the ERP-image. Although the mapping relationship in ERP is simple and the boundary is continuous, ERP wastes bits in encoding and causes serious stretching distortion in the bipolar regions. All of this results in a gap between the omnidirectional image/video in ERP format and the actual spherical content viewed through the head mounted display (HMD).
SSP [20] provides an approximately equal-area mapping with compression-friendly shapes and fewer wasted pixels. As shown in Fig. 1(c), SSP segments the sphere into three tiles: the north polar, equatorial and south polar regions. The north and south poles are mapped into two circles, and the projection of the equatorial segment is the same as in the ERP format. The stretching distortions in the bipolar regions of the ERP format, as shown in Fig. 2(a), may lead to errors in some BOIQA methods. The SSP format not only retains the intuitive and continuous boundary of the equatorial region of the ERP format, but also alleviates the serious stretching distortion of the bipolar regions, which is closer to what the user actually perceives through the HMD, as shown in Fig. 2.
In addition, users can select the viewport in the HMD to focus on the attractive content of OIs through their HM, and the EM determines which region will appear in the viewport. Therefore, analyzing the viewing behavior is helpful for evaluating the objective quality of OIs. Recently, a number of datasets including HM and EM data for omnidirectional image/video visual attention models have emerged to track and analyze users' viewing behavior. Abreu et al. [21] established one of the earliest attention datasets for OIs, including the HM data of 32 subjects. Rai et al. [22] provided a visual attention dataset (VAD) containing 60 OIs. Besides visual attention models based on subjective experiments, some saliency prediction methods for omnidirectional image/video have appeared. Zhang and Chen [23] presented a two-stream neural network for video saliency prediction, which can learn saliency-related spatiotemporal features from human fixations.
In summary, in order to establish effective objective BOIQA metrics, it is necessary to consider the unique perception features of OIs. First of all, even though the ERP format is compatible with the existing image/video coding standards, it results in bit waste in encoding and serious stretching distortion in the bipolar regions, and the latter affects the accuracy of objective BOIQA on the ERP format. Moreover, the process of viewing OIs through HMD is interactive, and the user's degree of interest varies across the content of the OI, which provides a way to evaluate OI quality based on user behavior. Therefore, in this paper, we develop an SSP-BOIQA method to overcome the problems in the bipolar regions of the ERP format. At the same time, we also consider the characteristics of user behavior, and use the subjective heat map or objective visual attention as a weighting factor to further optimize the quality model.

III. PROPOSED SSP-BOIQA METHOD
From the perspective of OMAF compatibility, this paper proposes an SSP-BOIQA method, which focuses on OI quality prediction in both the bipolar regions and the equatorial region. The framework of the proposed method is shown in Fig. 3. Firstly, the ERP-image is converted into the SSP-image, so as to solve the stretching distortion problem of the bipolar regions in the ERP format. Secondly, in the bipolar regions, considering that the regions are circular in the SSP-image, a local/global perceptual feature extraction scheme with a fan-shaped window is proposed for estimating the distortion in the OIs' bipolar regions. Thirdly, in the equatorial region, the perceptual features are extracted with the heat map as a weighting factor to reflect users' visual behavior. Finally, the extracted feature set is used as the input of a random forest to model the nonlinear relationship between the feature space and the human opinion score.
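For the sake of convenience, let $I_{ERP}$ denote the distorted OI in ERP format and $I_{SSP}$ the SSP-image converted from $I_{ERP}$, with $I_N$, $I_S$ and $I_E$ denoting the north polar, south polar and equatorial regions of $I_{SSP}$, respectively. For $I_N$ and $I_S$, local and global features are extracted, denoted as $F_N$ and $F_S$. Similarly, for $I_E$, the structure and contrast features are extracted, denoted as $\{F_{Estr}, F_{Econ}\}$. The method described in this section is illustrated with OI2 in the MVAQD database [30].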
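As a rough illustration of the first step, the sketch below resamples the north polar cap of an ERP-image onto a disc, which is the kind of mapping SSP applies to the bipolar regions. It is not the reference SSP implementation: the 45-degree cap boundary, the linear latitude-to-radius mapping, nearest-neighbour sampling, and the names `erp_north_cap_to_disc`, `cap_deg` and `size` are assumptions made for this example only.

```python
import numpy as np

def erp_north_cap_to_disc(erp, cap_deg=45.0, size=256):
    """Resample the north polar cap of an ERP image onto a disc (assumed mapping)."""
    erp = np.asarray(erp)
    h, w = erp.shape[:2]
    radius = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    dx, dy = xs - radius, ys - radius
    rho = np.sqrt(dx**2 + dy**2) / radius      # 0 at the pole, 1 on the disc edge
    phi = np.arctan2(dy, dx)                   # longitude in (-pi, pi]
    inside = rho <= 1.0

    # Assumption: latitude falls linearly from 90 deg (centre) to cap_deg (edge).
    lat = 90.0 - rho * (90.0 - cap_deg)
    row = np.clip(np.round((90.0 - lat) / 180.0 * (h - 1)).astype(int), 0, h - 1)
    col = np.round((phi + np.pi) / (2 * np.pi) * (w - 1)).astype(int) % w

    # Pixels outside the disc stay zero; pixels inside take the nearest ERP sample.
    disc = np.zeros((size, size) + erp.shape[2:], dtype=erp.dtype)
    disc[inside] = erp[row[inside], col[inside]]
    return disc, inside
```

The returned `inside` mask marks the circular support, which later feature extraction steps can use to ignore the corners of the square array.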

A. THE BIPOLAR REGION FEATURES OF SSP-IMAGE
ERP-images have severe stretching distortion in the bipolar regions. To overcome this shortcoming of the ERP format, the SSP format projects the bipolar regions into two circles, one at the top and one at the bottom, which preserves the original appearance of objects better than the ERP format. However, this also has a drawback: features in the circular regions are difficult to extract with a rectangular window. Therefore, a new feature extraction method designed for circular regions is proposed. The same features are extracted from both bipolar regions, and the proposed method is described in detail by taking the north polar region, $I_N$, as an example.
Detail features are important to visual perception, especially in the flat areas of the bipolar regions, where the loss of detail is more likely to be magnified and noticed. In addition, the blocking artifacts produced by compression in the planar domain are warped by the conversion and turn into radial blocking and radial banding distortions. As an example, Fig. 4 shows the radial blocking and radial banding distortions. When the bipolar region of a distorted OI is viewed through HMD, the observed radial blocking and radial banding distortions are similar to those in the SSP-image, which means that the SSP-image is more suitable for OI quality assessment than the ERP-image.
Considering that the traditional rectangular window does not fit the shape of the bipolar regions and cannot capture the edge of the circular area well enough to represent this distortion type, a fan-shaped window, as shown in Fig. 5, is proposed to replace the traditional rectangular window. Let $W_{fan}$ denote a fan-shaped window with $(i, j) \in W_{fan}$, let $\theta$ be the constant central angle of $W_{fan}$, and let $r$ be its variable ring width, chosen so that the area covered by each fan-shaped window is almost equal. The north polar region, $I_N$, is divided into non-overlapping image blocks by sliding $W_{fan}$. Then, the mean standard deviation, $\sigma$, of all image blocks of $I_N$ is calculated as

$$\sigma = \frac{1}{N} \sum_{k=1}^{N} \sqrt{\frac{1}{n} \sum_{(i,j) \in W_{fan}} \left( I_N(i,j) - I_{N\_m}^{W_{fan}} \right)^2}$$

where $N$ is the number of image blocks, $n$ is the number of pixels in the corresponding $W_{fan}$ (the $k$-th block), and $I_{N\_m}^{W_{fan}}$ is the mean pixel value in that $W_{fan}$. Then, $\sigma$ is used as the local detail feature of the bipolar regions of the OI.
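The fan-shaped partition can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the ring edges are placed at radius × sqrt(k / n_rings) so that rings have equal area, the numbers of rings and sectors are arbitrary, the standard deviation uses the 1/n normalisation, and the function names are hypothetical.

```python
import numpy as np

def fan_window_labels(size, n_rings=4, n_sectors=8):
    """Label each pixel of a size x size disc with the index of its fan-shaped window."""
    radius = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    dx, dy = xs - radius, ys - radius
    rho2 = (dx**2 + dy**2) / radius**2          # squared normalised radius
    phi = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    inside = rho2 <= 1.0

    # Ring edges at radius * sqrt(k / n_rings): every ring has the same area, so with a
    # constant central angle each fan window covers almost the same number of pixels.
    ring = np.minimum((rho2 * n_rings).astype(int), n_rings - 1)
    sector = np.minimum((phi / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    return np.where(inside, ring * n_sectors + sector, -1)   # -1 marks pixels outside the disc

def mean_block_std(region, labels):
    """Average, over all fan windows, of the per-window standard deviation."""
    region = np.asarray(region, dtype=float)
    stds = [region[labels == k].std() for k in np.unique(labels) if k >= 0]
    return float(np.mean(stds))
```

Because the ring width shrinks towards the edge of the disc, the outer windows are thin and long while the inner ones are short and wide, which matches the idea of a constant central angle with a variable ring width.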
Cortical neurons are highly sensitive to anisotropic information in images, and image distortion processes can modify and create unnatural local anisotropies [24]. Hence, studying the orientation information of the image is of great significance for improving BOIQA methods, especially in the circular regions of the image. In this work, relative gradient orientation (RGO) is used to describe the structural features of images. Fig. 6 shows the RGO maps, from which the difference between the original image and its distorted versions can be clearly distinguished. The gradient orientation feature of $I_N$ is extracted by calculating the gradient of each pixel in the horizontal and vertical directions. Let $\{G_N(i, j)\}$ denote the gradient orientation map of $I_N$; then $G_N(i, j)$ can be computed by

$$G_N(i,j) = \arctan\left( \frac{(I_N * P_v)(i,j)}{(I_N * P_h)(i,j)} \right)$$

where $*$ denotes the convolution operator, and $P_h$ and $P_v$ represent the gradient operators in the horizontal and vertical directions. For example, Prewitt operators or other gradient operators can be used to compute $G_N(i, j)$. Orientation is relative: it may be measured absolutely, against the frame of reference of the image coordinate system, or in a relative manner, against the background of local orientations. RGO features can capture departures from the natural distribution of local orientations caused by local degradations of image structure. Therefore, RGO features are used to describe the degradation of the distorted image structure, and the estimated RGO, $\{G_{N\_RGO}(i, j)\}$, can be computed by

$$G_{N\_RGO}(i,j) = G_N(i,j) - G_{N\_m}(i,j)$$

where $G_{N\_m}(i, j)$ is the local mean orientation of the north polar region in the SSP-image, which is calculated by

$$G_{N\_m}(i,j) = \arctan\left( \frac{G_{N\_vm}(i,j)}{G_{N\_hm}(i,j)} \right)$$

where $G_{N\_vm}(i, j)$ and $G_{N\_hm}(i, j)$ represent the mean gradient values in the vertical and horizontal directions, respectively, within a window centered at position $(i, j)$. Fig. 7 shows the RGO distributions of an OI under five kinds of distortion. Distortions cause the RGO distribution to deviate from that of the original OI, and different distortion types lead to different changes. For example, HEVC, JPEG and JP2K distortions generate blocking artifacts due to coding compression, which add pseudo-edge structures and make the peak higher, while white noise (WN) makes the distribution flatter. The distribution conforms to the generalized Gaussian distribution (GGD) model, which can be fitted as

$$f(x; \mu, \alpha, \beta) = \frac{\alpha}{2 \beta \Gamma(1/\alpha)} \exp\left( -\left( \frac{|x - \mu|}{\beta} \right)^{\alpha} \right)$$

where $\Gamma(\cdot)$ is the gamma function, expressed as

$$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt, \quad z > 0$$

and where $\mu$ is the mean value, $\alpha$ is the shape parameter, and $\beta$ is the scale parameter. Accordingly, $\mu$, $\alpha$ and $\beta$, which can be estimated with the moment matching-based method, are added to the quality-aware feature set as the global structure features in this work.
Usually, the scenes in the OI's bipolar regions are relatively simple because of the shooting conditions, but their texture information is changed by distortions, so the texture information is also worth considering. The local binary pattern (LBP) operator [25] can be used to describe the relationship between the center pixel and its surrounding neighbors by computing gray-level differences. The traditional LBP operator is calculated in a square neighborhood, which cannot adequately capture the texture features of images with different geometric shapes, especially the bipolar regions of the SSP format. In order to adapt to the bipolar regions of the OI in SSP format and extract effective texture information at the edges of these regions, the traditional LBP operator is improved by replacing the square neighborhood with a circular neighborhood.
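The RGO map and the GGD fit described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes Prewitt kernels, a 3×3 averaging window for the local mean gradients, a small `eps` guard against division by zero, and a grid search over the shape parameter for the moment matching. The helper names `filter3`, `relative_gradient_orientation` and `fit_ggd` are hypothetical.

```python
import math
import numpy as np

def filter3(img, kernel):
    """3x3 correlation over the valid interior (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for a in range(3):
        for b in range(3):
            out += kernel[a, b] * img[a:a + h - 2, b:b + w - 2]
    return out

def relative_gradient_orientation(img, eps=1e-12):
    """RGO map: pixel gradient orientation minus the local mean orientation."""
    img = np.asarray(img, dtype=float)
    p_h = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]) / 3.0   # Prewitt, horizontal
    p_v = p_h.T                                                   # Prewitt, vertical
    g_h, g_v = filter3(img, p_h), filter3(img, p_v)
    box = np.ones((3, 3)) / 9.0
    g_hm, g_vm = filter3(g_h, box), filter3(g_v, box)             # local mean gradients
    orient = np.arctan(g_v[1:-1, 1:-1] / (g_h[1:-1, 1:-1] + eps))
    orient_mean = np.arctan(g_vm / (g_hm + eps))
    return orient - orient_mean

def fit_ggd(x):
    """Moment-matching GGD fit; returns (mu, alpha, beta)."""
    x = np.asarray(x, dtype=float).ravel()
    mu = x.mean()
    d = x - mu
    sigma2 = np.mean(d**2)
    ratio = sigma2 / (np.mean(np.abs(d))**2 + 1e-12)
    # For a GGD, sigma^2 / (E|x|)^2 = Gamma(1/a) Gamma(3/a) / Gamma(2/a)^2;
    # pick the shape a on a grid whose theoretical ratio is closest to the sample ratio.
    grid = np.arange(0.2, 10.0, 0.001)
    r = np.array([math.gamma(1 / a) * math.gamma(3 / a) / math.gamma(2 / a)**2 for a in grid])
    alpha = float(grid[np.argmin(np.abs(r - ratio))])
    beta = math.sqrt(sigma2 * math.gamma(1 / alpha) / math.gamma(3 / alpha))
    return float(mu), alpha, beta
```

Calling `fit_ggd(relative_gradient_orientation(region))` would then yield the three global structure features for one region; for Gaussian-distributed input the fitted shape parameter is close to 2.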
Specifically, the circular LBP (CLBP) of each pixel in the circular image can be computed by

$$CLBP_{P,R}^{riu2} = \begin{cases} \sum_{i=0}^{P-1} s\left( I_N^{i} - I_{Nc} \right), & u\left( CLBP_{P,R} \right) \le 2 \\ P + 1, & \text{otherwise} \end{cases}$$

$$u\left( CLBP_{P,R} \right) = \left| s\left( I_N^{P-1} - I_{Nc} \right) - s\left( I_N^{0} - I_{Nc} \right) \right| + \sum_{i=1}^{P-1} \left| s\left( I_N^{i} - I_{Nc} \right) - s\left( I_N^{i-1} - I_{Nc} \right) \right|$$

where $u$ is the uniform measure, the superscript $riu2$ denotes the rotation-invariant "uniform" patterns with a $u$ value of at most 2, $P$ is the number of pixels in the neighborhood, and $R$ is the neighborhood radius; $I_{Nc}$ denotes the pixel value of the center pixel, and $I_N^{i}$ is the pixel value of the $i$-th pixel in the neighborhood. $s(\cdot)$ is a threshold function defined by

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

CLBP describes the relationship between the pixels of the circular image in the bipolar regions, which can effectively reflect the complex degradation caused by various distortions. In the bipolar regions, the areas of interest to human eyes contribute more to the subsequent extraction of perceptual texture features. Therefore, the histogram of the CLBP map is weighted by the heat map, and the heat-map-weighted CLBP histogram is obtained by

$$h_N(k) = \sum_{(i,j)} H_N(i,j) \, \delta\left( CLBP_{P,R}^{riu2}(i,j), k \right), \quad \delta(x, y) = \begin{cases} 1, & x = y \\ 0, & \text{otherwise} \end{cases}$$

where $k \in [0, K]$ indexes the possible CLBP patterns (with $K = P + 1$), and $H_N(i, j)$ denotes the heat map of the north polar region. Fig. 8 shows the CLBP maps of the original OI and its distorted OIs, together with the weighted CLBP histograms. It can be seen from the figure that different types of distortion result in different CLBP maps, and the difference in the weighted CLBP histograms is obvious. The CLBP map has $P + 2$ different modes, and different types of distortion result in different distributions of CLBP modes. Among them, mode 0 stands for a bright spot, modes 1-7 represent edge areas, mode 8 denotes a dark or flat area, and mode 9 covers the non-uniform patterns that are rarely encountered in natural OIs. For example, HEVC and JPEG compression increase mode 8 in the map, because these two compression distortions cause blocking artifacts, resulting in a large number of flat image blocks. WN reduces modes 2-6 in the map and increases mode 9, because it reduces the edge information but increases the non-uniform texture that is uncommon in natural images.
According to the above analysis, the extracted histogram of the CLBP map can effectively summarize the effect of image distortion on texture information, so it is taken as the global texture feature of the bipolar regions.
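A heat-map-weighted CLBP histogram of this kind can be sketched as below. The sketch makes several assumptions that are not stated in the text: $P = 8$ and $R = 1$, bilinear interpolation for the samples on the circle, a small numerical tolerance in the threshold function, and an optional `mask` argument for restricting the histogram to the circular region. The function names are hypothetical.

```python
import numpy as np

def clbp_riu2(img, P=8, R=1.0):
    """Rotation-invariant uniform LBP codes on a circular neighbourhood (interior pixels only)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    m = int(np.ceil(R))                          # border margin that is skipped
    yy, xx = np.mgrid[m:h - m, m:w - m]
    center = img[m:h - m, m:w - m]
    bits = []
    for p in range(P):
        ang = 2 * np.pi * p / P
        y, x = yy - R * np.sin(ang), xx + R * np.cos(ang)
        y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
        y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
        fy, fx = y - y0, x - x0
        val = (img[y0, x0] * (1 - fy) * (1 - fx) + img[y0, x1] * (1 - fy) * fx
               + img[y1, x0] * fy * (1 - fx) + img[y1, x1] * fy * fx)   # bilinear sample
        bits.append(val >= center - 1e-9)        # s(x) with a tolerance for rounding error
    bits = np.stack(bits).astype(int)
    u = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)   # number of 0/1 transitions
    return np.where(u <= 2, bits.sum(axis=0), P + 1)

def weighted_clbp_histogram(img, heat, P=8, R=1.0, mask=None):
    """Histogram over the P + 2 CLBP modes, each pixel weighted by its heat-map value."""
    codes = clbp_riu2(img, P, R)
    m = int(np.ceil(R))
    weights = np.asarray(heat, dtype=float)[m:-m, m:-m]
    if mask is not None:                         # e.g. the disc support of a polar region
        weights = weights * np.asarray(mask, dtype=float)[m:-m, m:-m]
    return np.bincount(codes.ravel(), weights=weights.ravel(), minlength=P + 2)
```

With $P = 8$ the histogram has 10 bins, which matches the 10-dimensional global texture feature used for each polar region.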
Finally, the perceptual feature vector extracted from $I_N$ has a total of 14 dimensions, denoted as $F_N = \{F_{Nl}, F_{Ng1}, F_{Ng2}\}$, where $F_{Nl}$ is the 1-dimensional local detail feature, $F_{Ng1}$ is the 3-dimensional global structure feature vector, and $F_{Ng2}$ is the 10-dimensional global CLBP feature vector. Similarly, a 14-dimensional feature vector is also extracted from $I_S$, denoted as $F_S = \{F_{Sl}, F_{Sg1}, F_{Sg2}\}$.

B. THE EQUATORIAL REGION FEATURES OF SSP-IMAGE
The equatorial region in the SSP-image is the same as that in the ERP-image, and its image content is richer than that of the bipolar regions. In view of this, the entropy of the multi-scale phase congruency (PC) map is extracted as the global structure feature. The heat map is used as the weight to guide the selection of image blocks, and the histogram statistics of the selected image blocks are taken as the local contrast feature.
The phase information of images plays an important role in the visual perception system and conforms to the characteristics of human eye perception. In this paper, we adopt the method in [26] to compute the PC map of the OI. Firstly, the equatorial region in the SSP-image is convolved with a 2-D log-Gabor filter denoted by

$$G_{2D}(\omega, \theta_k) = \exp\left( -\frac{\left( \log(\omega / \omega_0) \right)^2}{2 \sigma_r^2} \right) \cdot \exp\left( -\frac{(\theta - \theta_k)^2}{2 \sigma_\theta^2} \right)$$

where $\theta_k = k\pi/K$, $k = \{0, 1, \ldots, K - 1\}$ is the orientation angle of the filter, $K$ is the number of orientations, and $\sigma_\theta$ determines the filter's angular bandwidth. $\omega_0$ is the central frequency of the filter, and $\sigma_r$ controls the filter bandwidth. By modulating $\omega_0$ and $\theta_k$ and convolving $G_{2D}$ with the equatorial region $I_E$, we get a set of responses at each position $(i, j)$, denoted as $[e_{n,\theta_k}(i, j), o_{n,\theta_k}(i, j)]$. The local amplitude on scale $n$ and orientation $\theta_k$ is

$$A_{n,\theta_k}(i,j) = \sqrt{e_{n,\theta_k}(i,j)^2 + o_{n,\theta_k}(i,j)^2}$$

The local energy along orientation $\theta_k$ is

$$E_{\theta_k}(i,j) = \sqrt{F_{\theta_k}(i,j)^2 + H_{\theta_k}(i,j)^2}$$

where $F_{\theta_k}(i, j) = \sum_n e_{n,\theta_k}(i, j)$ and $H_{\theta_k}(i, j) = \sum_n o_{n,\theta_k}(i, j)$.

The 2-D PC at $(i, j)$ is computed as follows

$$PC_{2D}(i,j) = \frac{\sum_k E_{\theta_k}(i,j)}{\varepsilon + \sum_n \sum_k A_{n,\theta_k}(i,j)}$$

where $\varepsilon$ is a small positive constant that prevents the denominator from being 0. A multi-scale representation can better reflect the texture structure characteristics of OIs, so the PC map is downsampled four times to obtain feature maps at five scales. The information entropy of each of the five feature maps is taken as the structure feature of the equatorial region in the SSP-image.
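The multi-scale entropy step can be sketched as follows, taking a precomputed PC map as input (the PC computation itself is not reproduced here). The sketch assumes PC values in [0, 1], a 256-bin histogram, and 2×2 average pooling as the downsampling operator; none of these choices is specified in the text, and the function names are hypothetical.

```python
import numpy as np

def shannon_entropy(feature_map, bins=256):
    """Shannon entropy (in bits) of the value histogram of a map with values in [0, 1]."""
    hist, _ = np.histogram(feature_map, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def multiscale_entropy(pc_map, n_scales=5):
    """Entropy of the PC map at n_scales scales; each scale halves the resolution."""
    current = np.asarray(pc_map, dtype=float)
    feats = []
    for _ in range(n_scales):
        feats.append(shannon_entropy(current))
        # Assumption: 2x2 average pooling is used as the downsampling step.
        h, w = (current.shape[0] // 2) * 2, (current.shape[1] // 2) * 2
        current = current[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return feats
```

The five returned values correspond to the original PC map plus four downsampled versions, giving one structure feature per scale.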
Compared with traditional edge detection methods based on gradient functions, PC can detect the edges of the image well and can detect features over a wide range. However, PC is contrast invariant, whereas the local contrast of an image affects the image quality perceived by the HVS. When viewing the OI through HMD, the user can only observe one part of the OI closely at a given moment. Therefore, the heat map is taken as the weight to guide the selection of image patches, and the histograms of the selected image patches are used as contrast features to complement the phase information, so as to improve the consistency between the objective assessment results and the human visual system. By using a sliding window, the equatorial region of the OI is divided into several non-overlapping image patches. Let $W_E$ denote the 64×64 sliding window. The heat map is also divided into the corresponding image patches. The weight of the $l$-th image patch is calculated as $w(l) = H_E(l)$, where $H_E$ represents the heat map of the equatorial region. A group of image patches with the greatest weights is selected, and their histogram features are calculated. Then we construct the following histogram set, in which each column is the histogram of one selected image patch, denoted by

$$\mathbf{H} = [H(1), H(2), \ldots, H(L)]$$

where $L$ is the number of image patches. As in principal component analysis (PCA), a zero-mean matrix is constructed first, so that the image patches have zero-mean brightness and their contrast features are enhanced:

$$\tilde{\mathbf{H}} = [H(1) - \bar{H}, H(2) - \bar{H}, \ldots, H(L) - \bar{H}], \quad \bar{H} = \frac{1}{L} \sum_{l=1}^{L} H(l)$$

where $H(l)$ represents the histogram of the $l$-th image patch, which is also the $l$-th column vector of $\mathbf{H}$, and $\bar{H}$ is the average over the patches. Then, the covariance matrix, $Cov$, of $\mathbf{H}$ is calculated by

$$Cov = \frac{1}{L} \tilde{\mathbf{H}} \tilde{\mathbf{H}}^{T}$$

Singular value decomposition (SVD) [27] is used to calculate the eigenvalues of the covariance matrix. The eigenvalues are arranged in descending order,

$$\lambda_1 \ge \lambda_2 \ge \cdots$$

and the first 20 eigenvalues are selected as the local contrast features of the equatorial region in the SSP-image.
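The patch selection and eigenvalue extraction can be sketched as follows. Several details are assumptions made for illustration: the patch weight is taken as the mean heat-map value inside the patch, the top half of the patches is kept, the histograms use 32 bins over [0, 256), and the covariance is normalised by $L$. The function name and its parameters are hypothetical.

```python
import numpy as np

def local_contrast_features(region, heat, patch=64, keep_ratio=0.5, bins=32, n_eig=20):
    """Leading eigenvalues of the covariance of histograms of high-attention patches."""
    region = np.asarray(region, dtype=float)
    heat = np.asarray(heat, dtype=float)
    hists, weights = [], []
    for y in range(0, region.shape[0] - patch + 1, patch):
        for x in range(0, region.shape[1] - patch + 1, patch):
            block = region[y:y + patch, x:x + patch]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            hists.append(hist / block.size)                        # normalised histogram H(l)
            weights.append(heat[y:y + patch, x:x + patch].mean())  # w(l), assumed mean heat

    # Keep the patches with the largest weights.
    order = np.argsort(weights)[::-1]
    n_keep = max(1, int(round(keep_ratio * len(order))))
    H = np.array(hists)[order[:n_keep]].T          # bins x L, one histogram per column

    # Zero-mean matrix, covariance, and eigenvalues via SVD (sorted in descending order).
    H_centered = H - H.mean(axis=1, keepdims=True)
    cov = H_centered @ H_centered.T / H.shape[1]
    eigvals = np.linalg.svd(cov, compute_uv=False)
    return eigvals[:n_eig]
```

Because the covariance matrix is symmetric and positive semi-definite, its singular values coincide with its eigenvalues, so the SVD output can be used directly as the sorted eigenvalue list.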
Finally, the perceptual feature vector extracted from $I_E$ has a total of 25 dimensions, denoted as $F_E = \{F_{Estr}, F_{Econ}\}$, where $F_{Estr}$ is the 5-dimensional structure feature vector extracted from $I_E$, and $F_{Econ}$ is the 20-dimensional local contrast feature vector extracted from $I_E$.

C. QUALITY PREDICTION OF OI
The 53-dimensional feature set $F_f = \{F_N, F_S, F_E\}$ (14 + 14 + 25 dimensions), extracted from the bipolar regions and the equatorial region, is used as the input of the random forest to model the nonlinear relationship between the feature space and the human opinion score. Specifically, the feature space is mapped to the predicted objective quality $Q_f$ of OIs by a regression function $f_m(\cdot)$, which can be expressed as

$$Q_f = f_m(F_f)$$

where $f_m(\cdot)$ is obtained by machine learning, and $F_f$ is the extracted feature vector. Since random forest has high prediction accuracy and is not prone to over-fitting in regression and classification tasks [28], the random forest is utilized to learn the mapping function $f_m(\cdot)$ in this work.
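As a minimal sketch of this regression step, the example below fits a random forest on placeholder data using scikit-learn's `RandomForestRegressor`. The library choice, the number of trees, and the synthetic features and scores are assumptions for illustration; the paper does not state which implementation or hyperparameters were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Placeholder data: 200 images, each with a 53-dimensional feature vector F_f,
# and synthetic opinion scores that depend on a few of the features.
X = rng.random((200, 53))
y = 100 * X[:, :5].mean(axis=1) + rng.normal(0, 1, 200)

# Learn the mapping f_m on the first 160 samples (hyperparameters are assumed).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:160], y[:160])

# Q_f = f_m(F_f) for the held-out feature vectors.
q_pred = model.predict(X[160:])
```

In practice the rows of `X` would be the concatenated features $\{F_N, F_S, F_E\}$ and `y` the subjective scores of the training images.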

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. EXPERIMENTAL ENVIRONMENTS
To verify the effectiveness of the proposed method, the method is compared with several classical NR-IQA and FR-IQA methods on the following two OI databases.
(1) CVIQD2018 database [29]: The database contains a total of 544 images, including 16 original scenes and their distorted images generated by HEVC, AVC and JPEG compression. The MOS value ranges from 0 to 100, and the higher the score, the higher the image quality. It should be noted that there is no heat map in this database, so Zhang's omnidirectional saliency map extraction method [23] is used to generate a saliency map from each reference image of this database in place of the heat map.
(2) MVAQD database [30]: We have also designed a diverse subjective OI database to further verify the effectiveness of the proposed SSP-BOIQA method. It contains 15 high-quality uncompressed OIs and their heat maps, all from the 360-degree image database (VAD) provided by Rai et al. [22], as shown in Fig. 9.
For the MVAQD database, each original image is distorted with five distortion types at four distortion levels, as shown in Table 1, so 20 distorted images are obtained for each original image.
In the subjective experiments, 26 subjects voted on the quality of the 315 OIs in the MVAQD database (15 original images plus 15 × 20 = 300 distorted images). The voting uses a five-grade quality scale with the following levels: "5-Excellent", "4-Good", "3-Fair", "2-Poor", and "1-Bad". After the subjective evaluation process, we strictly followed the screening criteria described in [31] to assess subject reliability. Finally, the MOS of the remaining 20 subjects is used in the experiments. Fig. 10 shows the MOS values of the 15 scenes at different degrees of distortion. As the degree of distortion increases, the subjective quality decreases. Fig. 11 further gives example images with specific distortions from the MVAQD database and their subjective scores. The better the image quality, the higher the MOS value, which supports the validity of the subjective ratings of the MVAQD database.
Four commonly used performance criteria, suggested by video quality experts group (VQEG), are employed to evaluate and compare the proposed method with existing IQA methods, namely Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SRCC), Kendall's rank correlation coefficient (KRCC) and Root mean squared error (RMSE) [32]. PLCC, SRCC, KRCC and RMSE measure the accuracy, monotonicity, relevance and error in the process of prediction, respectively. The closer the absolute values of PLCC, SRCC and KRCC are to 1, and the closer RMSE is to 0, the better the performance of an objective quality assessment method is.
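For reference, the four criteria can be computed as in the sketch below, which uses only NumPy. It is an illustrative implementation under stated assumptions: Kendall's coefficient is computed in its tau-a form without tie correction, and no nonlinear mapping is applied to the predictions before computing PLCC and RMSE. The function names are hypothetical.

```python
import numpy as np

def _ranks(a):
    """Average ranks (tied values share the mean of their ranks)."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tie = a == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def srcc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(x), _ranks(y))

def krcc(x, y):
    """Kendall tau-a from pairwise concordance (O(n^2) in the number of samples)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))

def rmse(x, y):
    """Root mean squared error between predictions and subjective scores."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

Each function takes the predicted scores and the subjective scores as two equal-length sequences, for example `srcc(q_pred, mos)`.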

B. OVERALL PERFORMANCE COMPARISON
To investigate the effectiveness of the proposed SSP-BOIQA method, it is compared with some popular FR-IQA and BIQA methods on the above two omnidirectional databases. The selected FR-IQA methods are PSNR, SSIM [7], VIF [8], CPP-PSNR [11], S-PSNR [9] and WS-SSIM [12], of which the last three are objective evaluation metrics designed for omnidirectional visual content. The selected BIQA methods include BRISQUE [15], OG [16], NIQE [17], dipIQ [18] and BPRI [19]. Among them, BRISQUE, OG and the SSP-BOIQA method proposed in this paper are opinion-aware BIQA methods that require a large number of subjective opinions for training, while the other three are opinion-unaware BIQA methods. For opinion-aware BIQA methods, 80% of the reference images and their distorted images in the omnidirectional dataset were randomly selected as the training set, while the remaining 20% were used as the test set; the distorted images corresponding to the same reference image are allocated to the same set to ensure complete independence of the training and test data. To reduce the influence of any particular split on the calculated correlation coefficients, the above training-test process was repeated 1000 times, and the median values of PLCC, SRCC, KRCC and RMSE were taken as the final performance indicators. For FR-IQA and opinion-unaware BIQA methods, we report their mean performance on the test sets of the opinion-aware BIQA methods for a fair comparison with methods that need training. Table 2 gives the experimental results on the CVIQD2018 and MVAQD databases, with the best performance indicators for FR-IQA and BIQA highlighted in bold. From Table 2, the following observations can be obtained.
(1) The performance indexes on the CVIQD2018 database are generally better than those on the MVAQD database. Compared with the CVIQD2018 database, which contains only three encoding compression distortions, the MVAQD database additionally includes blur, JP2K and WN. Thus, the MVAQD database has greater distortion diversity, which makes the distortions in this database more varied and complex, and therefore poses a greater challenge for objective quality assessment. Some of the methods show significant performance differences between the two databases, which reveals their limitations in handling different types of distortions. For example, BPRI performed poorly on the MVAQD database, but well on the CVIQD2018 database.
(2) The performance of the FR-IQA methods was generally higher than that of the classical BIQA methods, and the opinion-aware BIQA methods performed better than the opinion-unaware BIQA methods. The FR-IQA methods take both the reference image and the distorted image as input, which can offset the influence of the geometric distortion of the ERP image to some extent. Additionally, most opinion-aware BIQA methods that need training regress their features through a random forest, which has advantages over simple linear combinations because the relationship between the features and the subjective evaluation is extremely complex.
(3) The performance of the FR-OIQA methods is improved slightly compared with the traditional FR-IQA methods. This indicates that using the traditional FR-IQA methods to calculate the quality of an OI in ERP format cannot truly reflect the distortion on the sphere, whereas calculating the image quality on a plane that is closer to the sphere can better eliminate the influence of the projection transformation. However, the existing FR-OIQA methods do not take user behavior characteristics into account, which results in only an insignificant improvement over the FR-IQA methods. Moreover, SSIM predicts the objective quality score of the distorted image by calculating the structural similarity between the reference image and the distorted image. For the CVIQD2018 database, which contains only three encoding compression distortions, SSIM is good at detecting the changes in image structure information resulting from the compression, so SSIM performs well on this database. On the more complex MVAQD database, however, SSIM is less outstanding.
By contrast, the proposed SSP-BOIQA method performs well on both databases because of its geometric compensation for the ERP image and its incorporation of the user's behavior characteristics in the HMD.
For visualization, we provide scatter plots of the subjective MOS values against the objective values on the MVAQD database in Fig. 12, in which each blue "+" indicates a distorted image and the red curves are obtained through fitting. It can be observed that the blue "+" markers of the SSP-BOIQA method gather much closer to the fitted curve than those of the competitors, which intuitively shows that the scores of the SSP-BOIQA method correlate well with the subjective MOS values.

C. PERFORMANCE COMPARISON OF DIFFERENT FEATURES
The proposed method comprehensively considers geometric compensation and user perception characteristics, divides the OI in SSP format into the equatorial region and the bipolar regions, and extracts different perception features from these two kinds of regions. To compare the contribution of the features extracted from different regions to the final performance of quality assessment, experiments were conducted on the above two databases for different regional features, as listed in Table 3, where "NQ" and "SQ" respectively represent the extracted perception features of the north and south polar regions, while "EQ" represents the extracted perception features of the equatorial region. Although the same perceptual features are extracted from the north and south polar regions, the performance indexes of the south polar region are better than those of the north polar region. This is because the north polar regions of the OIs in the two databases are mostly sky, indoor ceiling, and so on, which are flatter and simpler than the south polar regions. This phenomenon is more obvious in the CVIQD2018 database, so the performance indexes of SQ in this database are much better than those of NQ. In addition, EQ makes a great contribution to the quality assessment, because the equatorial region has rich structure and complex texture and is the area of greatest interest to human eyes. Finally, if all the features are combined, the performance of quality assessment can be improved significantly, as shown by "All" in Table 3.
Considering that the geometric shapes of different regions in the SSP image are different, and that human eyes have different perception degrees for distortions in different regions, we extract different perceptual features from different regions in the SSP image. In order to analyze the performance of different types of features on the two databases, as well as the complementary effect between some features, the experiments shown in Table 4 are carried out. In the proposed SSP-BOIQA method, $\{F_{Nl}, F_{Ng1}, F_{Ng2}, F_{Sl}, F_{Sg1}, F_{Sg2}\}$ constitute the feature set of the bipolar regions, where $F_{Nl}$ and $F_{Sl}$ are the local detail features, $F_{Ng1}$ and $F_{Sg1}$ are the global structure features, while $F_{Ng2}$ and $F_{Sg2}$ are the global CLBP features extracted from the bipolar regions. $\{F_{Estr}, F_{Econ}\}$ constitute the feature set of the equatorial region, where $F_{Estr}$ is the PC structure feature, and $F_{Econ}$ is the local contrast feature extracted from the equatorial region.
From Table 4, the following observations can be derived. Firstly, among all the features, the global CLBP features $\{F_{Ng2}, F_{Sg2}\}$ produce the best performance on the two OI databases. This is because the scene contents of the bipolar regions of an OI are relatively simple, which makes the changes in texture information caused by distortions more obvious. Secondly, the performance of the feature set of the bipolar regions is improved after feature fusion. In the bipolar regions, local features and global features complement each other, as do structural features and texture features. Thirdly, the feature set of the equatorial region performs better after feature fusion. Phase consistency is contrast invariant, while the image's local contrast does affect perception, so the fusion of these two types of features achieves better performance.

D. EFFECT OF HEAT MAP
Users can select the viewport of HMD and focus on the attractive contents of OIs through head movement and eye movement. Therefore, the heat map or significance map is used as the weighting factor in the selection of image patches and the global CLBP feature extraction. Two comparison experiments are implemented, and the results are given in Table 5. In the first group, the directly counted histogram of CLBP is compared with the histogram weighted with heat map or significance map. In the second group, the scheme which randomly selects image patches in the equatorial region is compared with the scheme which selects image patches with heat map or significance map. From Table 5, it is seen that the use of heat map or significance map helps to improve the performance of the proposed SSP-BOIQA method.
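The two uses of the heat map described above can be illustrated with a short sketch. It is written under our own assumptions and is not the exact rule of the proposed method: the pattern codes are assumed to be integers already computed per pixel, `weighted_histogram` and `top_patches_by_heat` are hypothetical names, and selecting the patches with the largest mean heat is only one plausible selection rule.

```python
def weighted_histogram(codes, weights, n_bins):
    """Histogram of integer pattern codes in which each pixel contributes its
    heat-map weight instead of a count of one; normalized to sum to 1."""
    hist = [0.0] * n_bins
    for code_row, weight_row in zip(codes, weights):
        for code, w in zip(code_row, weight_row):
            hist[code] += w
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def top_patches_by_heat(patch_heat, k):
    """Indices of the k patches with the largest mean heat value
    (an assumed selection rule, used here only for illustration)."""
    order = sorted(range(len(patch_heat)),
                   key=lambda i: patch_heat[i], reverse=True)
    return order[:k]
```

With uniform weights, `weighted_histogram` reduces to the directly counted (normalized) histogram, which corresponds to the baseline of the first comparison group; random patch selection corresponds to the baseline of the second group.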

E. PARAMETER SENSITIVITY
The proposed SSP-BOIQA method involves several parameters: the radius R and the number of sampling points P in the improved CLBP, the number of image blocks N produced by the sliding fan-shaped window in the bipolar regions, the dimension C of the local contrast feature vector, and the scale dimension S of the structure features in the equatorial region. Since the choice of these parameters affects the proposed SSP-BOIQA method, we design comparative experiments to select the best values.
Firstly, in order to discuss the effects of the radius R and the number of sampling points P of the improved CLBP for the bipolar regions, we varied R and P for experimental comparison. Three groups of parameters are compared in terms of PLCC and SRCC: R = 1, P = 8; R = 2, P = 8; and R = 2, P = 16. The results are shown in Fig. 13(a). It can be seen from the figure that there is no significant difference in performance as R and P change. When R = 1 and P = 8, the overall performance is slightly higher. Considering also the computational complexity, R is set to 1 and P is set to 8 in the proposed SSP-BOIQA method.
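To make the roles of R and P concrete, the sketch below computes the sampling offsets on a circle of radius R with P points and a basic sign-component LBP code. It is only an illustration under our own assumptions: it is plain LBP rather than the improved CLBP of the proposed method, neighbors are read at the nearest pixel instead of being interpolated, and the function names are hypothetical.

```python
import math

def circular_offsets(radius, points):
    """(dx, dy) offsets of `points` samples evenly spaced on a circle."""
    return [(radius * math.cos(2.0 * math.pi * p / points),
             -radius * math.sin(2.0 * math.pi * p / points))
            for p in range(points)]

def lbp_sign_code(image, x, y, radius=1, points=8):
    """Sign-component LBP code at pixel (x, y) of a 2-D list `image`.
    Neighbors are taken at the nearest pixel to keep the sketch short;
    (x, y) must lie at least `radius` pixels away from the image border."""
    center = image[y][x]
    code = 0
    for p, (dx, dy) in enumerate(circular_offsets(radius, points)):
        nx, ny = int(round(x + dx)), int(round(y + dy))
        if image[ny][nx] >= center:
            code |= 1 << p  # bit p is set when neighbor p is not darker
    return code
```

In this basic form, P = 8 yields 256 possible codes per pixel while P = 16 yields 65536, which illustrates why a larger P increases the cost of building and comparing pattern histograms.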
Secondly, the number of image blocks N determines the degree of fineness with which the local detail perception features are extracted in the bipolar regions. If N is too large, each image block will contain too little content to accurately reflect the local details. However, if N is too small, the image blocks will be too large to reflect the advantages of the improved fan-shaped window. Fig. 13(b) shows the performance comparison with respect to different N. The parameters r and θ in Fig. 5 can be changed to control the size of the fan-shaped window and divide the bipolar regions into image blocks. $N_1$ and $N_2$ in Fig. 13(b) are the number of rings and the number of image blocks per ring, respectively, and three groups of parameters are compared in terms of the corresponding PLCC and SRCC of the quality assessment: 8×18, 8×36, and 16×36. As can be seen from Fig. 13(b), changing N has little effect on the overall performance. When N = 16×36, the PLCC and SRCC indexes are slightly higher than the others. Therefore, N is set to 16×36 in this work.
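A simple way to picture the division of a circular polar region into rings and fan-shaped blocks is sketched below. It assumes a non-overlapping partition with rings of equal radial width and sectors of equal angle, which may differ from the sliding window of the proposed method; `fan_block_index` and its parameter names are our own.

```python
import math

def fan_block_index(x, y, cx, cy, radius, n_rings, n_sectors):
    """Return (ring, sector) of pixel (x, y) inside a circular polar region
    centered at (cx, cy), or None if the pixel lies outside the circle."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r >= radius:
        return None
    # Rings have equal radial width; clamp guards against rounding at the rim.
    ring = min(int(r / radius * n_rings), n_rings - 1)
    # Angle measured in [0, 2*pi); sectors have equal angular width.
    theta = math.atan2(dy, dx) % (2.0 * math.pi)
    sector = min(int(theta / (2.0 * math.pi) * n_sectors), n_sectors - 1)
    return ring, sector
```

With `n_rings=16` and `n_sectors=36`, every pixel of the circular region is assigned to one of 16×36 blocks, from which local features could then be computed block by block.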
Finally, in order to discuss the dimension C of the local contrast feature vector and the scale dimension S of the complementary multi-scale structure features in the equatorial region, the performance of the features is measured on the two databases for C = {10, 15, 20, 25} and S = {1, 3, 5}, as shown in Table 6. It is seen that when C = 20 and S = 5, the features of the equatorial region achieve good performance on both databases, and that the size of C does not significantly affect the performance of the features.

F. DISCUSSION
In this paper, the SSP-BOIQA method is proposed to compensate the OI geometrically, taking into account the fact that the bipolar regions of the ERP image have obvious stretching distortion and waste a lot of bits for encoding. Moreover, considering the user's behavior characteristics with HMD, the feature extraction is improved and the perception factor is added. Although the proposed method achieves good results on both MVAQD and CVIQD2018 databases, it may not provide satisfactory results in some challenging situations.
In this work, most of the features selected from the equatorial region and the bipolar regions are based on improvements of natural statistical features. How to dig out deeper features related to human visual perception is important for blind quality assessment of omnidirectional images. An omnidirectional video system involves near-eye viewing, and its visual perception characteristics are more complex than, and different from, those of the common video viewing mode. Therefore, how to obtain effective user behavior data through subjective experiments and use them for feature extraction and pooling is worth studying to further improve the performance of quality assessment in the future.
With the rapid development of VR technology, omnidirectional video (OV) has been widely used instead of OI, because it provides users with more immersive experience. Therefore, the objective quality assessment of OV has become a new challenge. Different from OI, OV focuses on the fluency of presentation and playback. Therefore, how to describe the feature of temporal correlation is a new problem.

V. CONCLUSION
In this paper, a segmented spherical projection based blind omnidirectional image quality assessment (SSP-BOIQA) method is proposed for virtual reality systems. Specifically, the omnidirectional image (OI) is first converted to segmented spherical projection (SSP) format to solve the problem of stretching distortion of the bipolar regions in the equirectangular projection (ERP) format. Considering the different geometric shapes of different regions in the SSP image, and the different perception degrees of human eyes for distortions in different regions of the SSP image, different features are extracted from different regions of the SSP image. For feature extraction in the bipolar regions, the fact that the bipolar regions are circular images is taken into account. In addition, the heat map or significance map is used to improve the performance of the quality assessment. The experimental results on the MVAQD and CVIQD2018 databases show that the proposed SSP-BOIQA method is superior to the existing image quality assessment methods. However, even though the heat map or significance map can simply simulate the user's behavior of viewing an OI through an HMD, it is not sufficient for quality assessment of OIs. Therefore, there is still great room for improving the performance of IQA methods, and how to dig out deeper features related to human visual perception is the focus of future work.