Unifying Structural and Semantic Similarities for Quality Assessment of DIBR-synthesized Views

Multi-view 3D content is subject to distortions during depth image-based rendering (DIBR). Studies have shown the unreliable performance of well-established image quality assessment (IQA) models in evaluating DIBR-synthesized views, which heightens the need for more effective IQA methods. Existing objective methods generally rely on pixel-wise correspondences between the reference and distorted images, while view synthesis can introduce pixel shifts. Moreover, local DIBR distortions in disoccluded regions have different visual impacts from those of conventional distortions, challenging the available IQA models. Here, we develop a Full-Reference (FR) objective IQA metric for synthesized views that performs significantly better than 2D IQA and state-of-the-art DIBR IQA approaches. Because pixel misalignment between the reference and synthesized views is a major challenge for quality assessment, we deploy a Convolutional Neural Network (CNN) model to acquire a feature representation that is inherently resilient to imperceptible pixel shifts between the compared images. Therefore, our model does not need accurate shift compensation. We deploy a set of quality-aware CNN features representing high-order statistics to measure structural similarity, which is combined with a semantic similarity measure for accurate quality assessment. Moreover, prediction accuracy is improved by incorporating a visual saliency model derived from the activations of the higher CNN layers. Experimental results indicate a significant performance gain (14.6% in terms of Spearman's Rank-order Correlation) over the top existing IQA model. The source code of the proposed IQA metric will be publicly available.


I. INTRODUCTION
With the rapid advances in virtual reality and 3D applications, new multimedia modalities such as free-viewpoint video (FVV) [1], light fields, point clouds, and holography [2] [3] have attracted significant attention in recent years. These emerging multimedia formats promise to enable an immersive experience for end-users by delivering richer visualization with full-parallax properties. Multi-view representation requires handling a tremendous volume of information captured from different viewpoints; therefore, effective data representation, storage, and transmission methods are key factors in promoting the application of immersive multimedia. Multi-View texture plus Depth (MVD) [4] [5] is a widespread immersive media format that aims to represent a full 3D scene using a subset of texture views accompanied by the corresponding disparity information. A technique called Depth Image-Based Rendering (DIBR) [6] can be deployed to synthesize the missing virtual views using the texture and depth information captured from adjacent camera locations.
Despite the remarkable advantages of DIBR-based approaches, synthesized views often suffer from multiple distortion types caused by imperfect synthesis, occlusion, and depth data errors. Therefore, reliable Image Quality Assessment (IQA) methods are essential to evaluate and monitor the quality of the reconstructed views. DIBR distortions that occur in disoccluded regions have different visual effects from the conventional blur, noise, and blocking artifacts. Moreover, view synthesis errors can introduce geometric distortion and stretching effects, which lead to misalignment between the reference and the synthesized view. These distinct characteristics of DIBR distortions challenge the well-established objective quality assessment algorithms (such as the Structural SIMilarity (SSIM) [7] and Visual Information Fidelity (VIF) [8] indices). The 2D IQA models rely heavily on pixel-wise alignment of the compared images and are principally designed to evaluate the visual impacts of conventional distortions on 2D images. Hence, they fail at assessing the quality of DIBR-synthesized views, and specific IQA models are required for MVD content evaluation.
The existing objective quality assessment methods for synthesized views can be classified into two main categories, Full-Reference (FR) and No-Reference (NR), based on the availability of the reference image. NR methods aim to assess the quality of synthesized views independently of a reference image. Tian et al. [9] proposed an NR algorithm that deploys morphological operations to compute quality scores in the luminance and chrominance components. The individual quality scores are then combined, and a generated edge image is used to weight the pixel-wise quality values. The authors further expanded their work in [10] by adding black-hole and stretching detection strategies to improve the quality prediction task. Gu et al. [11] deployed a multi-scale scheme to design two DIBR-specific Natural Scene Statistics (NSS) models based on self-similarity and structural consistency characteristics. The quality scores obtained from the two NSS models were then combined to pool the final score. Jakhetiya et al. [12] developed a computationally efficient NR method that uses simple median filtering to detect geometric and structural distortions for quality assessment.
Since accurate NR quality assessment of DIBR-synthesized views is very challenging due to the lack of a reference view, several FR methods have been developed to provide more reliable quality predictions. Sandic-Stankovic et al. [13] computed the Peak Signal-to-Noise Ratio (PSNR) on multi-scale images obtained from Morphological Wavelet (MW) decomposition. The authors further improved the IQA accuracy in [14] by using Morphological Pyramid (MP) filters for image decomposition. In [15], reference and synthesized views were used to detect error-prone disoccluded regions. The size and strength of these regions were then considered to compute a quality score, which is later combined with a global sharpness assessment score. More recently, some works showed the effectiveness of shift compensation to align reference and synthesized views for FR quality assessment. In [16], a two-step shift compensation approach was proposed based on SURF features and multi-scale block matching; finally, a quality score is obtained by computing the pixel-wise mean squared error between the reference and shift-compensated distorted views. In our recent work [17], we proposed a quality index called SSPD that uses feature matching and superpixel differences to pool a quality score. The corresponding interest-point features are compared in the reference and synthesized views for local quality assessment. In addition, a global quality score is pooled by computing the gradient magnitude difference of superpixels in the reference and shift-compensated synthesized images. SSPD outperformed the competing approaches on both conventional and new DIBR data sets.
Current DIBR IQA models are generally designed to perform well on conventional DIBR distortions. Data sets such as IRCCyN/IVC [18] include DIBR algorithms that exhibit only old-fashioned DIBR distortions, including blurring, black holes, and geometric distortions. However, recent DIBR techniques have improved significantly and can better address DIBR errors. The black-hole errors are almost resolved, geometric distortions are better handled, and sophisticated inpainting methods have been proposed to better compensate for the errors in the disoccluded regions. In 2019, a new public DIBR data set named IETR [19] was released that covers the new DIBR techniques. Fig. 1 visualizes an original image and the synthesized versions obtained using different DIBR algorithms from the IETR data set. As the figure illustrates, local distortions on object boundaries, induced by different DIBR algorithms, present diverse visual impacts on image structures. Moreover, techniques such as LDI [20] can deform important visual cues such as faces, which affects image semantics and visual aesthetic attributes. Performance evaluation on the IETR data set has revealed that DIBR IQA models fail to deliver high accuracy for new DIBR algorithms [19]. The highest performance on the IETR data set is achieved using our SSPD model, with a Spearman's Rank-order Correlation Coefficient (SRCC) of 0.685, which suggests noticeable room for improvement [17]. In this paper, we propose a new FR objective quality assessment metric based on SEmantic- and QUality-aware feature Similarity measures plus Salient-region detection (SEQUSS). The proposed metric achieves a large performance gain over the existing IQA approaches.
The state-of-the-art DIBR IQA methods often follow a shift-compensation strategy as a pre-processing step to compensate for the misalignment between the reference and synthesized views; the proposed algorithm, however, circumvents the need for shift compensation by using the deep features of a Convolutional Neural Network (CNN) model. Although aligning the compared images can lead to more accurate quality assessment, the shift compensation process is not always flawless and often comes with warping errors that can influence the quality evaluation task. Using the pre-trained ResNet50 CNN [21], we transform images into a multi-scale representation with perceptual features that are more tolerant to shifts. Though a pair of reference and test images that differ in their precise pixel locations might look rather similar to the Human Visual System (HVS) [22], they are often judged to be different by FR objective quality assessment methods. The deep perceptual features obtained from the CNN model are better aligned with perceptual preferences and show good tolerance to imperceptible pixel shifts. Here we developed two measures by extracting both quality- and semantic-aware features from the ResNet50 layers. Moreover, the feature activation maps from the last CNN block were used to highlight the visually-salient regions for more effective perceptual evaluation. In particular, the main contributions of the proposed method are summarized as follows:
• The proposed method utilizes perceptual features of deep CNN layers for quality assessment. These features better adhere to HVS behavior and are less sensitive to small pixel-wise shifts between the compared images. This allows more reliable quality assessment free of error-prone shift-compensation methods. The proposed metric achieves a substantial accuracy gain over the state-of-the-art approaches.
• We propose to incorporate semantic content features in quality assessment, since DIBR local distortions and stretching artifacts can influence image semantics and aesthetic properties.
• The visual attention processing behavior of the HVS is considered in the design of our IQA model by generating visual saliency masks using the last CNN block. The saliency map suitably highlights the regions of interest that are visually important for quality assessment.
The rest of the paper is organized as follows: Section II elaborates the proposed quality assessment algorithm. The experimental results are summarized and discussed in Section III. Finally, Section IV concludes the paper.

II. PROPOSED OBJECTIVE QUALITY ASSESSMENT METHOD
Unlike traditional distortion types that affect the entire image rather uniformly, DIBR distortions include several local and global artifacts that alter the structural and semantic characteristics of the scene and degrade the overall visual Quality of Experience (QoE). This calls for sophisticated models that can better comply with the complex properties of the HVS. Data processing in the HVS follows a hierarchical mechanism in which the sensitivity to complex stimulus characteristics increases along the ventral visual pathway. Early visual processing areas encode low-level frequency components of the scene, while higher visual areas are more sensitive to complex textures and semantic shapes [23] [24] [25]. CNN architectures, initially designed for computer vision tasks, also follow a hierarchical multi-scale data processing mechanism and can roughly approximate the data processing of the HVS [26] [27]. Recent studies reveal the effectiveness of pre-trained deep CNNs for the task of quality assessment [28].
Here we developed a new DIBR IQA metric using the features extracted from different layers of the ResNet-50 CNN. This CNN model is trained on more than a million images of the ImageNet data set [29] and consists of five main convolutional stages (L1 to L5) followed by a Fully Connected (FC) layer at the end of the network. The number of filters in the five CNN stages is L1: 64, L2: 256, L3: 512, L4: 1024, and L5: 2048. Moving toward deeper convolutional layers, the spatial size of the feature maps shrinks while the number of filters increases. Fig. 2 presents the framework of the proposed quality assessment method. Deep features of the ResNet-50 are deployed to effectively quantify both the structural and semantic degradations in quality assessment. In addition, we incorporated saliency maps, obtained using the features of the fifth CNN block, into quality prediction to account for the visual attention processing of the HVS. The proposed method consists of two computational streams for quality pooling. The first stream uses the intact reference and synthesized views as inputs to the network, while the second stream applies saliency masks to the input images as a pre-processing step to extract features only from perceptually-salient regions. For each computational stream, a perceptual and a semantic score are acquired using deep quality-aware and semantic-aware feature similarity measures, respectively. Finally, the scores from the two streams are combined to obtain the final DIBR quality score.

A. QUALITY-AWARE FEATURE COMPARISON
The proposed quality assessment model is based on a nonlinear transformation of the reference and synthesized images into a new over-complete feature space representation. We used the features of the ReLU layers available at the end of each of the five CNN blocks. Similar to data processing in the visual cortex, early CNN layers have smaller receptive fields and capture low-level features using a larger number of neurons in each feature map, while higher CNN layers are more sensitive to high-order statistics and complex edge features. Fig. 3 presents some feature maps selected from the five CNN layers of the ResNet50 model. As shown in the figure, structural information has been encoded in the five layers of the CNN model at different levels of frequency detail.
Due to the misalignment between the reference and synthesized views, it can be expected that early CNN layers deliver lower IQA performance, since precise frequency components are compared; deeper layers, however, might perform better since the comparison is performed at a higher level of visual appearance and the shift is better tolerated. This assumption was confirmed by examining all five CNN blocks for quality assessment, in which the best performance was achieved using the fourth layer; thus, we used this layer for quality-aware feature extraction. Using the features of the fourth layer, we ensure high sensitivity to structural degradation while preserving good tolerance against imperceptible spatial misalignments. Please note that although quality assessment in higher CNN layers can suitably mitigate the impact of misalignment without the need for shift compensation, severe geometric distortions can still affect the algorithm's accuracy. However, such large displacements do not appear in the reconstructions of the new view synthesis algorithms. Let ψr and ψd be the resized N × 1 feature vectors extracted from the feature maps of the reference and distorted views, respectively. The structural similarity of the features in the l-th layer of the ResNet50 is computed as:

Q_p = (2σ_{ψr,ψd} + c_1) / (σ²_{ψr} + σ²_{ψd} + c_1)    (1)

where σ²_{ψr} and σ²_{ψd} are the global variances of the features in the reference and distorted views, σ_{ψr,ψd} denotes the global covariance of the features, and c_1 is a small positive constant that ensures the numerical stability of the measurement.
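As a concrete illustration, the similarity in (1) can be sketched in a few lines of Python (a minimal sketch assuming the feature vectors have already been extracted from the fourth ResNet-50 block and flattened; the function name and the exact value of c_1 are our own choices):

```python
import numpy as np

def quality_aware_similarity(psi_r, psi_d, c1=1e-6):
    """Covariance-based structural similarity between two flattened CNN
    feature vectors, following Eq. (1). The value of c1 is an assumed
    small stability constant, not taken from the paper."""
    psi_r = np.asarray(psi_r, dtype=np.float64).ravel()
    psi_d = np.asarray(psi_d, dtype=np.float64).ravel()
    var_r = psi_r.var()   # global variance of reference features
    var_d = psi_d.var()   # global variance of distorted features
    cov = np.mean((psi_r - psi_r.mean()) * (psi_d - psi_d.mean()))
    return (2.0 * cov + c1) / (var_r + var_d + c1)
```

Identical feature vectors yield a score of 1, while uncorrelated or inverted features drive the score toward 0 or below.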

B. SEMANTIC-AWARE FEATURE COMPARISON
Image semantic information describes the appearance of the content in the image, and distortions can alter these semantic characteristics. Researchers have shown that the semantic image category has a noticeable impact on subjective quality ratings [30], and modifications in image semantics can affect the impression of the overall perceptual quality [31] [32].
Since DIBR impairments, induced by stretching and faulty reconstructions on shape borders, can degrade the semantic understanding and aesthetic quality of images, integrating quality- and semantic-aware features can help quantify the impact of visual artifacts on content recognition and the final QoE. Here, we propose to use semantic features to further improve the accuracy of DIBR quality assessment. We used the features of the FC layer for semantic comparison. The FC layer comes at the end of the network, after the CNN blocks and before the softmax layer used for the classification task; thus, the FC layer is expected to contain coarse features that represent scene semantics. Using the 1000 features of the FC layer in the ResNet-50, a semantic measure is acquired by computing the degree of consistency between the two feature vectors. We computed the Spearman Rank-order Correlation Coefficient (SRCC) between the features of the FC layer in the reference and synthesized views, ψ_FC_r and ψ_FC_d:

Q_s = 1 − (6 Σ_i δ_i²) / (n(n² − 1))    (2)

where δ_i is the difference between the ranks of the i-th pair of FC features in the reference and synthesized views, and n = 1000 is the number of FC features.
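The SRCC-based semantic measure in (2) can be sketched as follows (a minimal NumPy version assuming no tied feature values; `semantic_similarity` is an illustrative name):

```python
import numpy as np

def semantic_similarity(fc_r, fc_d):
    """Spearman rank-order correlation between two FC-layer feature
    vectors, following Eq. (2). Assumes no tied values; with ties,
    ranks would need to be averaged."""
    fc_r = np.asarray(fc_r).ravel()
    fc_d = np.asarray(fc_d).ravel()
    n = fc_r.size
    # double argsort turns raw values into 0-based ranks
    rank_r = np.argsort(np.argsort(fc_r))
    rank_d = np.argsort(np.argsort(fc_d))
    delta = (rank_r - rank_d).astype(np.float64)
    return 1.0 - 6.0 * np.sum(delta ** 2) / (n * (n ** 2 - 1))
```

Two monotonically related vectors score 1, and fully reversed rankings score −1, so the measure captures rank consistency rather than exact feature values.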
Using the quality and semantic measures Q_p and Q_s, the overall quality score of the first computational stream is computed as:

Q_1 = (Q_p + Q_s) / 2    (3)

The two measures in (3) are combined with equal weights. According to our experiments, a small increase in the weight of the quality measure Q_p can yield a minor gain, while larger weights (i.e., a significant decrease in the influence of the semantic measure) have a negative impact on performance. Thus, the equally weighted measures provide high performance while avoiding extra parameter tuning.

C. SALIENT-REGION QUALITY MEASURE
The HVS is more attracted to visually-salient regions in the scene, and thus quality degradation in such regions of interest (RoI) is more critical. Various visual saliency models inspired by human visual attention behavior have been developed in the literature, and the generated saliency maps have been incorporated into the design of objective quality assessment algorithms to better replicate the HVS characteristics in quality assessment. In the FR scenario, saliency values are often used to weight the pixel-wise quality difference between the compared reference and distorted images. Zhang et al. [33] studied the added value of 20 different saliency models for the task of 2D IQA through a comprehensive statistical analysis. The outcomes revealed a statistically significant gain in the performance of objective IQA models when incorporating the saliency models. Here, we benefit from saliency detection to improve the quality assessment of 3D synthesized views.
Saliency models aim to capture biologically-inspired features by considering image intensity, color, edge, and texture. As mentioned earlier, higher visual areas (such as V3 and V4) are characterized by sensitivity to natural textures, and the neurons in these areas are more selective for complex textures and shapes of the stimuli [34] [35]. Assuming that the higher layers of multi-scale CNN architectures can roughly replicate the complex responses of the higher visual areas of the human visual cortex, we utilized the activation maps of deep CNN layers to generate visual saliency maps for quality assessment. Instead of using an off-the-shelf saliency model, we take advantage of the feature maps of the last CNN layer (L5) of ResNet50 to highlight the RoI.
We constructed feature maps of the reference image from the 2048 feature activation maps in the fifth CNN layer. Thus, the set of n 2D feature maps in the l-th layer is defined as:

F^l = {f^l_1, f^l_2, ..., f^l_n}    (4)

The set of feature maps is then upsampled (F̃^l) to the input image size (224 × 224) using bicubic interpolation. Finally, the pixel values of the 2048 upsampled maps are aggregated to obtain the visual importance probability map (Sal):

Sal = Σ_{i=1..n} F̃^l_i    (5)

Due to the misalignment between the reference and synthesized views, it is not straightforward to directly use saliency maps for pixel-wise weighting. Instead, saliency-masked images are generated to compare RoI features in the reference and test images. Using the obtained saliency maps, new masked inputs are constructed by discarding the non-salient regions as follows:

I_m(x, y) = I(x, y) if Sal(x, y) ≥ μ, and I_m(x, y) = 0 otherwise    (6)

where μ is the mean pixel value of the saliency map Sal, and I_m is the saliency-masked input image. Fig. 4 depicts the procedure of saliency map generation and masking. The saliency maps obtained from different reference images of the IETR data set are presented in Fig. 5, which shows that the deployed method can effectively highlight the visually important objects and regions of the scene. The saliency-masked reference and synthesized views are fed forward through the network to obtain the features of the salient regions. Finally, the quality-aware and semantic-aware similarity measures are computed as in (1) and (2), and the overall quality score for the saliency stream, Q_2, is obtained by averaging the two scores as described in (3). To compute the quality-aware measure of the salient region (Q_Sal_p), the third CNN layer was used for feature extraction, as it showed the highest performance among the five convolutional layers.
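The saliency map construction and masking steps of (4)-(6) can be sketched as follows (a simplified NumPy version; for a dependency-free example we substitute nearest-neighbour upsampling for the bicubic interpolation used above, and the function name is our own):

```python
import numpy as np

def saliency_mask(feature_maps, image):
    """Build a saliency map from CNN activation maps (Eqs. (4)-(6)) and
    mask the input image. feature_maps: (n, h, w) activations of the last
    CNN block; image: (H, W) or (H, W, 3) array. Nearest-neighbour
    upsampling replaces bicubic interpolation for simplicity."""
    n, h, w = feature_maps.shape
    H, W = image.shape[:2]
    assert H % h == 0 and W % w == 0, "sketch assumes integer scale factors"
    scale_h, scale_w = H // h, W // w
    # upsample each activation map and aggregate into the saliency map (Eq. (5))
    sal = np.zeros((H, W), dtype=np.float64)
    for fmap in feature_maps:
        sal += np.kron(fmap, np.ones((scale_h, scale_w)))
    # threshold at the mean saliency value (Eq. (6))
    mask = sal >= sal.mean()
    if image.ndim == 3:
        mask = mask[:, :, None]
    return np.where(mask, image, 0), sal
```

Regions whose aggregated activation falls below the mean are zeroed out, so only the salient regions contribute features in the second computational stream.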

D. FINAL SEQUSS SCORE
The objective quality scores from the two computational streams are combined to obtain the final quality score. Although the saliency-based IQA is important, it does not account for severe distortions that might appear in non-salient regions. Therefore, we integrated the saliency-based score Q_2 with the global score from the first computational stream, Q_1, by computing the weighted sum of the two scores as follows:

Q = βQ_1 + (1 − β)Q_2    (7)

where β is a constant weighting parameter, set to 0.6 to slightly increase the weight of the global score.
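Putting the pooling steps together, the final score of (3) and (7) can be sketched as follows (argument names are illustrative; the four inputs are the quality- and semantic-aware measures of the global and saliency streams):

```python
def sequss_score(q_p, q_s, q_p_sal, q_s_sal, beta=0.6):
    """Combine the four similarity measures into the final SEQUSS score.
    q_p, q_s: global-stream measures; q_p_sal, q_s_sal: saliency-stream
    measures. beta weights the global stream, as in Eq. (7)."""
    q1 = (q_p + q_s) / 2.0           # global stream, Eq. (3)
    q2 = (q_p_sal + q_s_sal) / 2.0   # saliency stream, pooled the same way
    return beta * q1 + (1.0 - beta) * q2   # Eq. (7)
```

With β = 0.6, a change in the global stream moves the final score slightly more than the same change in the saliency stream.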

III. EXPERIMENTAL RESULTS
This section describes the performance evaluation results of the proposed SEQUSS metric. The IETR [19] is a new data set of 140 DIBR-synthesized views that is used to benchmark our quality model; it is the only subjectively-annotated public data set that includes both the conventional and new DIBR approaches. Conventional DIBR distortions such as severe black holes and large geometric shifts are not considered in this data set, since such artifacts are not the main visual impairments of the new DIBR algorithms. Subjective scores were gathered from 42 naive participants, and a Differential Mean Opinion Score (DMOS) was acquired for each test stimulus. Besides geometric and stretching distortions, the performance of the inpainting approach used in the DIBR plays an important role in the quality of the synthesized views. Therefore, recent efforts have mostly focused on proposing more accurate inpainting approaches to improve the hole-filling in the disoccluded regions. The IETR encompasses 10 MVD sequences processed by 7 DIBR algorithms, including: Criminisi [41], the View Synthesis Reference Software (VSRS) from the MPEG 3D video group [42], the Layered Depth Image (LDI) DIBR approach [20], the Hierarchical Hole-Filling (HHF) method [43], and Ahn's [44], Luo's [45], and Zhu's [46] hole-filling methods. The VSRS algorithm is deployed in two scenarios, for single-view (VSRS1) and multi-view (VSRS2) synthesis applications. The performance of the proposed method is compared against 17 objective IQA methods, including five FR methods (PSNR, SSIM [7], VIF [8], GMSD [36], and FSIM [37]), five NR DIBR methods (NIQSV [9], NIQSV+ [10], MNSS [11], NR-MWT [38], and Jakhetiya's [12]), and seven FR DIBR models (MW-PSNR [13], MP-PSNR [14], LOGS [15], SC-IQA [16], Peng et al. [39], Sui et al. [40], and SSPD [17]).
Spearman's Rank-order Correlation Coefficient (SRCC), Pearson Linear Correlation Coefficient (PLCC), Kendall's Rank Correlation Coefficient (KRCC), and Root Mean Square Error (RMSE) are the four evaluation criteria deployed to compare the performance of the IQA models against the subjective scores. Higher values of SRCC, PLCC, and KRCC indicate higher consistency of the objective scores with human opinions and better performance. We applied the following nonlinear fitting function to the objective scores x prior to the computation of the evaluation indices:

f(x) = λ_1 (1/2 − 1/(1 + e^{λ_2(x − λ_3)})) + λ_4 x + λ_5    (8)

where λ_1 to λ_5 denote the fitting parameters. Table 1 compares the efficacy of the proposed method with the competing approaches. Our IQA model achieves a substantial improvement in prediction accuracy in terms of all four evaluation indices; compared to SSPD, the second-best model, it delivers a 14.6% SRCC gain. The outcomes also reveal the failure of the existing DIBR IQA models on the new IETR data set. The proposed SEQUSS model consists of several computational units that deliver the quality-aware and semantic-aware measures using the full synthesized view as well as the salient regions. Table 2 presents the performance of each individual quality measure as well as the overall performance on the IETR data set. The quality-aware measures (Q_p, Q_Sal_p) have a higher correlation with the DMOS than the semantic measures (Q_s, Q_Sal_s), while the integration of these two measures further improves the performance of the quality assessment. The second computational stream measures the quality by focusing on the visually-salient regions, and the integration of the quality scores of this stream (Q_2) with the global score of the first stream (Q_1) leads to a better overall estimation accuracy.
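The five-parameter mapping in (8) can be written directly in Python (a sketch of the functional form only; the λ values are fitted per metric, e.g. by least squares, which is not shown here):

```python
import math

def logistic_fit(x, lam1, lam2, lam3, lam4, lam5):
    """Five-parameter logistic regression applied to objective scores
    before computing PLCC/RMSE, following Eq. (8). The functional form
    is the commonly used VQEG-style mapping; parameter values must be
    fitted against the subjective scores."""
    return lam1 * (0.5 - 1.0 / (1.0 + math.exp(lam2 * (x - lam3)))) + lam4 * x + lam5
```

Setting λ_1 = 0 reduces the mapping to the linear part λ_4 x + λ_5, and at x = λ_3 the logistic term vanishes, which is useful for sanity-checking a fit.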

A. PERFORMANCE EVALUATION ON THE IETR DATA SET
As mentioned in Section II, we used the fourth CNN layer for feature extraction in the first computational stream (global), while the second stream (salient region) deploys the features of the third CNN layer. In Table 3, we report the performance of all five CNN layers. As shown in the table, Layers 3 and 4 consistently achieve the highest performance among the layers. Comparing the SRCC of the two streams, the accuracy shifts slightly toward lower layers when deploying salient regions. Lower layers allow a more detailed comparison of the frequency components, although the tolerance to geometric distortions and misalignment diminishes when moving toward the lower layers. Fig. 6 illustrates the scatter plots of the objective scores versus the DMOS on the IETR data set. The plots present a better convergence of the data points when using the SEQUSS, which implies higher agreement of the proposed metric with the subjective opinions compared to the other competing approaches. Fig. 7 visualizes an example of the SEQUSS scores assigned to the synthesized views of a reference image ('Shark') in the IETR data set. The synthesized views (b)-(h) are arranged from highest to lowest DMOS values. As the figure shows, SEQUSS performs quite well, and the objective scores (higher is better) are highly consistent with the human subjective scores (lower is better); the other competing method (MP-PSNR), however, shows some disagreements with the subjective DMOS.

B. STATISTICAL SIGNIFICANCE TEST
We performed a statistical significance test according to the ITU-T Recommendation P.1401 [47]. We applied a two-sample t-test on the SRCC values of all metric pairs, under the null hypothesis that there is no significant difference between the two metrics; the null hypothesis is rejected at the 5% significance level. The outcomes of the significance test are presented in Table 4, in which the symbol '1' indicates that the IQA model on the row axis is superior to the metric on the column axis, and '-1' indicates the inferior performance of the row metric compared to the column metric. The symbol '0' denotes that the difference is not significant. The table confirms that the proposed method performs significantly better than all other approaches, including SSPD, the second-best method.
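The core of such a comparison, the two-sample t statistic, can be sketched as follows (a minimal Welch-style version; in practice the statistic is compared against the Student-t critical value for the corresponding degrees of freedom at the 5% level, and P.1401 prescribes additional details that we omit here):

```python
import numpy as np

def welch_t_statistic(a, b):
    """Two-sample (Welch) t statistic between two sets of per-trial
    SRCC values a and b. A positive value means a's mean exceeds b's;
    the significance decision additionally requires a Student-t
    critical value, which this sketch does not compute."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)  # unbiased sample variances
    return (a.mean() - b.mean()) / np.sqrt(var_a / a.size + var_b / b.size)
```

Equal samples yield a statistic of exactly zero, and shifting one sample upward produces a positive statistic, matching the row-superior convention used in Table 4.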

C. PERFORMANCE EVALUATION ON THE IRCCYN/IVC DATA SET
The performance evaluation experiments confirm the superiority of the proposed method over other DIBR metrics, which are mainly devised to quantify conventional DIBR distortions. The existing state-of-the-art DIBR techniques often suffer from moderate geometric distortions and a flawed inpainting process for filling the holes in the disoccluded regions. However, most available objective metrics try to address conventional artifacts by deploying shift compensation strategies or methods that quantify the visual impact of black holes. Although the proposed method is devised to address the new DIBR quality assessment requirements, in this section we report the performance of the proposed SEQUSS on the traditional IRCCyN/IVC DIBR data set [18]. This data set contains 12 reference views and 84 test views synthesized using 7 traditional DIBR algorithms developed between 2003 and 2010. We excluded 12 test images synthesized by Fehn's algorithm due to the severe shifts applied to the synthesized views. The quality comparison was performed on the remaining 72 synthesized views.

D. SENSITIVITY TO THE WEIGHTING PARAMETER
The proposed model has only one parameter to adjust for optimal performance. The weighting parameter β in (7) specifies the weights of the quality scores from the two computational streams (i.e., the overall score computed using the entire image, Q_1, versus the overall score of the salient region, Q_2). Although the quality assessment of the salient region is important, it does not consider the quality loss in other, non-salient areas, which might attract attention especially when severe distortion occurs. Therefore, the global quality score and the saliency score are combined using a weighting function. Fig. 8 shows how the SRCC varies for different values of β. The highest performance is achieved around β = 0.6, and there is no abrupt change around the selected value, which indicates the performance stability of the metric. SEQUSS is also more computationally efficient than SSPD. In a test on an image of size 1024 × 768, run on a Windows laptop with 16 GB of RAM and a 2.7 GHz Core i7 CPU, SEQUSS achieved an average processing time of 2.11 seconds over 10 repetitions, more than 10x faster than SSPD's execution time of 21.67 seconds.

IV. CONCLUSION
In this paper, we proposed SEQUSS, a full-reference IQA metric to predict the perceptual quality of DIBR-synthesized views. The method takes advantage of ResNet50, a pre-trained deep CNN model, to transform the reference and synthesized views into a perceptual representation that better complies with the HVS characteristics. The features of the CNN model were used to compute the structural and semantic similarities between the reference and synthesized images. The two similarity measures were then unified to compute an overall quality score. The selected feature space can effectively quantify the visual impact of challenging DIBR distortion types while remaining robust to geometric shifts. To incorporate the human visual attention properties into quality assessment, we produced saliency maps using the feature activations of the CNN layers. Quality assessment on the selected regions of interest further improved the prediction accuracy of the proposed model. While none of the competing IQA models performed well on the new IETR data set, our SEQUSS model improved the IQA of DIBR images by a large margin.