Convex Optimization Method for Quantifying Image Quality Induced Saliency Variation

Visual saliency plays a significant role in image quality assessment. Image distortions cause saliency to shift from its original locations. Being able to measure such distortion-induced saliency variation (DSV) contributes towards the optimal use of saliency in automated image quality assessment. In our previous study, a benchmark for the measurement of DSV was built through subjective testing. However, existing saliency similarity measures are unhelpful for the quantification of DSV because DSV depends strongly on the dispersion degree of a saliency map. In this paper, we propose a novel similarity metric for the measurement of DSV, namely MDSV, based on a convex optimization method. The proposed MDSV metric integrates a local saliency similarity measure and a global saliency similarity measure, using a function of saliency dispersion as a modulator. We detail the parameter selection of the proposed metric and the interactions of sub-models for the convex optimization strategy. Statistical analyses show that the proposed MDSV outperforms existing metrics in quantifying image quality induced saliency variation.


I. INTRODUCTION
SALIENCY, the scene-driven, bottom-up selective visual attention mechanism of the human visual system, has been widely studied in the area of image quality assessment [1]-[6]. The application of saliency in image quality metrics potentially improves their reliability in predicting image quality as perceived by humans [7]-[14]. However, the optimal use of saliency requires a deeper understanding of how saliency plays a role in image quality assessment and how perception relating to saliency can be effectively quantified.
Eye-tracking studies [15], [16] have been undertaken to better understand the interaction between saliency and image distortions. The SIQ288 database [16] reveals the correspondences between the changes in image quality and the changes in saliency; an example is illustrated in Figure 1. Note that all saliency maps contained in the SIQ288 database were rendered from ground truth eye-tracking data. It can be seen from Figure 1 that the difference between the saliency of the "high quality" image and that of the "original image" is marginal; the difference in saliency is modest for the "medium quality" image; whereas the saliency of the "low quality" image differs significantly from that of the "original image". To measure this so-called distortion-induced saliency variation (DSV), i.e., the similarity between the original saliency (e.g., (e) in Figure 1) and the deviated saliency (e.g., (f), (g) or (h) in Figure 1), a benchmark was established in [17]. In this benchmark, a difference mean saliency variation score (DMSS) is obtained for each saliency map in the SIQ288 database via a subjective study. As shown in Figure 1, the higher the DMSS, the less similar the deviated saliency is to the original saliency. The DMSS scores are reliable; however, they are expensive to obtain and impractical in many circumstances. A more realistic way to measure saliency variation (i.e., DSV) is to use computational algorithms.
In the literature, many algorithms have been established and used as similarity measures in various applications. In particular, these similarity metrics are used in the area of saliency modelling to evaluate a computational saliency model's ability to predict human fixations. The plausibility of using these metrics for quantifying DSV, however, has not been verified. In this paper, we first analyse the performance of eight state-of-the-art similarity metrics in measuring DSV, and then propose a new metric for assessing the distortion-induced saliency variation (DSV). The proposed metric is inspired by the characteristics of the human visual system (HVS), and uses a saliency dispersion measurement as a key component in a convex optimization method to integrate the local saliency similarity and the global saliency similarity.
The rest of this paper is organized as follows. Section II provides an analysis of the state-of-the-art similarity metrics and a description of the DSV benchmark. Section III details the proposed algorithm. The overall performance of the proposed algorithm is presented in Section IV. Section V concludes the paper.

II. ANALYSIS OF THE STATE-OF-THE-ART SIMILARITY METRICS
A. THE DSV BENCHMARK
The benchmark of the distortion-induced saliency variation (DSV) [17] is based on the SIQ288 database [16], which contains images of varying quality (i.e., 18 pristine images, each distorted by 5 distortion types at 3 distortion levels) and their corresponding saliency maps rendered from ground truth human fixations. The DSV benchmark therefore contains 18 reference saliency maps and 270 deviated saliency maps. Sixteen experts in computer vision assessed the deviations of the saliency maps of the distorted images. The resulting difference mean saliency variation score (DMSS) quantifies how far each deviated saliency map departs from its original saliency. The DMSS ranges from 0 to 100, with 0 representing the smallest degree of difference and 100 representing the largest degree of difference. Table 1 lists the details of the distortion-induced saliency variation benchmark. Figure 2 shows the histogram of the DMSS scores contained in the benchmark.

B. PERFORMANCE OF THE STATE-OF-THE-ART SIMILARITY METRICS
There are many mathematical algorithms that can be used to measure the similarity between two images. In terms of evaluating saliency maps, research has focused on measuring the difference between the predicted saliency (generated by machines) and the ground truth saliency (obtained by eye-tracking) [18]. The popular metrics are: the area under the receiver operating characteristic curve (AUC) [19], [20], Normalized Scanpath Saliency (NSS) [21], Information Gain (IG) [22], SIMilarity (SIM) [23], Pearson's Correlation Coefficient (CC) [24], Kullback-Leibler Divergence (KL) [18], and Earth Mover's Distance (EMD) [25]. It should be noted that each metric has its own advantages and disadvantages depending on the specific application; for example, some metrics (i.e., AUC, NSS, IG) emphasise local saliency similarity and some metrics (i.e., SIM, CC, KL, EMD) focus on the global similarity of saliency. However, it is not yet known whether these metrics are helpful in measuring the distortion-induced saliency variation (DSV).
We now analyse the performance of these similarity metrics in the DSV context. We calculated the metrics for all test/deviated saliency maps contained in the DSV benchmark. The performance of these metrics in measuring the saliency variation is quantified by comparing the ground truth DMSS scores and the metrics' outputs. Figure 3 shows the scatter plot of the DMSS and each of the similarity metrics. Table 2 lists the quantitative results of the performance in terms of the Pearson linear correlation coefficient (PLCC), Spearman rank order correlation coefficient (SROCC), Kendall's rank order correlation coefficient (KROCC), and root mean square error (RMSE), which are formulated as follows:

$$PLCC = \frac{\sum_{i=1}^{L} (m_i - \bar{m})(n_i - \bar{n})}{\sqrt{\sum_{i=1}^{L} (m_i - \bar{m})^2 \sum_{i=1}^{L} (n_i - \bar{n})^2}} \tag{1}$$

$$SROCC = \frac{\sum_{i=1}^{L} \left(R(m_i) - \bar{R}\right)\left(R(n_i) - \bar{R}\right)}{\sqrt{\sum_{i=1}^{L} \left(R(m_i) - \bar{R}\right)^2 \sum_{i=1}^{L} \left(R(n_i) - \bar{R}\right)^2}} \tag{2}$$

$$KROCC = \frac{N(C) - N(D)}{\frac{1}{2} L (L - 1)} \tag{3}$$

$$RMSE = \sqrt{\frac{1}{L} \sum_{i=1}^{L} (m_i - n_i)^2} \tag{4}$$

where $L$ is the number of test saliency maps, $m_i$ and $n_i$ are the metric score and subjective DMSS of the $i$-th saliency map respectively, $\bar{m}$ and $\bar{n}$ are the means of the metric scores and subjective DMSSs respectively, $R(\cdot)$ denotes the rank, $\bar{R}$ is the mean of the ranks, and $N(\cdot)$ represents the number of corresponding pairs, with $C$ and $D$ denoting the concordant and discordant pairs of $(m_i, n_i)$.
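As a minimal sketch of how the four performance criteria can be computed, the following NumPy code implements the textbook definitions of PLCC, SROCC, KROCC and RMSE. The function names are illustrative and not taken from the paper; tie handling (average ranks) and the use of Kendall's tau-a for KROCC are assumptions.

```python
import numpy as np

def ranks(x):
    """Average ranks (1-based); tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tie = x == v
        r[tie] = r[tie].mean()
    return r

def plcc(m, n):
    """Pearson linear correlation between metric scores m and DMSS n."""
    m, n = np.asarray(m, float), np.asarray(n, float)
    dm, dn = m - m.mean(), n - n.mean()
    return float((dm * dn).sum() / np.sqrt((dm ** 2).sum() * (dn ** 2).sum()))

def srocc(m, n):
    """Spearman correlation: Pearson correlation of the ranks."""
    return plcc(ranks(m), ranks(n))

def krocc(m, n):
    """Kendall's tau-a: (concordant - discordant) / (L (L - 1) / 2)."""
    m, n = np.asarray(m, float), np.asarray(n, float)
    s = np.sign(m[:, None] - m[None, :]) * np.sign(n[:, None] - n[None, :])
    L = len(m)
    return float(np.triu(s, k=1).sum() / (L * (L - 1) / 2))

def rmse(m, n):
    """Root mean square error between metric scores and DMSS."""
    m, n = np.asarray(m, float), np.asarray(n, float)
    return float(np.sqrt(np.mean((m - n) ** 2)))
```

Note that RMSE is only meaningful when the metric scores and the DMSS are on the same scale, so in practice the metric outputs would first be mapped to the DMSS range.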
In general, some metrics (i.e., AUC-Borji, IG, KL) exhibit a poor correlation with the DMSS, meaning they cannot accurately measure the distortion-induced saliency variation. The AUC-Judd, NSS, SIM, CC, and EMD metrics show better performance, but there is still room for improvement. Some interesting observations are revealed in our study. For different reference saliency maps from different source images, the responses of the similarity metrics differ. We give two examples here using the correlation matrix visualization. Figure 4 illustrates the correlation matrix of the eight similarity metrics and the DMSS for all deviated saliency maps from the reference "Paint-house". Figure 5 illustrates the correlation matrix for a different reference, "Man-fishing". The reference saliency of Figure 4 is a dispersed map, and the reference saliency of Figure 5 is a concentrated map. The similarity metrics show consistent performance in the case of Figure 4, meaning they may capture similar properties of DSV, whereas they show inconsistent performance in the case of Figure 5, meaning they may focus on different properties of DSV measurement. These examples indicate that the measurement of DSV tends to depend on the dispersion degree of the reference saliency map. Based on these observations, we propose a new similarity metric for assessing the distortion-induced saliency variation (DSV) based on saliency dispersion, which is described in detail below.

III. THE PROPOSED ALGORITHM
A. BASIC ALGORITHM FRAMEWORK
We propose a new algorithm for quantifying distortion-induced saliency variation (DSV) that combines local and global saliency similarity measures. The principle of the algorithm is to integrate a local measure and a global measure, as motivated in the section above, and to use a modulator based on saliency dispersion to control the adaptive measure of saliency similarity. The basic idea of the algorithm is described below.
Modulator: The saliency dispersion measure [10] gives a quantitative gauge of the spatial distribution of saliency, i.e., the extent to which a saliency pattern is spread throughout the spatial domain. Based on our observations of the DSV benchmark (see the examples illustrated above), the distortion-induced saliency variation tends to depend on the degree of saliency dispersion: when saliency is concentrated in fewer areas of the spatial domain, the global (i.e., structural characteristics) saliency variation contributes significantly to the DSV assessment; when saliency is dispersed throughout the spatial domain, the local (i.e., positional characteristics) saliency variation makes the predominant contribution to the DSV assessment. Hence, the saliency dispersion measure developed in [10] is used here. The method is based on Shannon entropy (note that $H$ represents the entropy of a saliency map), using a multilevel approach. More specifically, the saliency map (i.e., $S$) is partitioned at level $P$ into $P \times P$ non-overlapping blocks of equal size. The multilevel entropy of the saliency map, $H_\Sigma(S)$, is defined in equation (5) as the sum of the entropy computed at each level of partition, where $P_{\max}$ is the finest level of division, $N = P_{\max}^2$, and $B$ runs over each block; the procedure is illustrated in Figure 6. The lower the multilevel entropy, the more compact the saliency map is; conversely, the higher the multilevel entropy, the more spread-out the saliency map is, as illustrated in Figure 6.
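The multilevel partitioning described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact measure of [10]: the saliency map is assumed to be normalised to [0, 1], the entropy of a block is taken from a 256-bin intensity histogram, and each level contributes the mean of its block entropies. The normalisation used in the original definition may differ.

```python
import numpy as np

def shannon_entropy(block, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of one block."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def multilevel_entropy(saliency, p_max=4, bins=256):
    """Sum over levels P = 1..p_max of the mean entropy of the P x P blocks.

    Assumption: per-level averaging and histogram binning are illustrative
    choices; the map must have at least p_max rows and columns.
    """
    s = np.asarray(saliency, dtype=float)
    total = 0.0
    for p in range(1, p_max + 1):
        # Split row and column indices into p nearly equal groups.
        rows = np.array_split(np.arange(s.shape[0]), p)
        cols = np.array_split(np.arange(s.shape[1]), p)
        block_h = [shannon_entropy(s[np.ix_(r, c)], bins)
                   for r in rows for c in cols]
        total += float(np.mean(block_h))
    return total
```

Under these assumptions a map with a single small salient blob yields a much lower value than a map whose saliency is spread over the whole image, which matches the qualitative behaviour described for Figure 6.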
TABLE 3. Performance of state-of-the-art similarity metrics for the measurement of distortion-induced saliency variation (DSV) for "low dispersion" and "high dispersion" saliency maps of the DSV benchmark [17].

Algorithm formulation: We now consider how to use the above components for the quantification of the DSV. For the input reference saliency map and the deviated saliency map, local and global saliency similarity metrics are calculated, and the degree of dispersion is calculated for the reference saliency map. The key idea is to reinforce the local similarity measure for a spread-out saliency map and the global similarity measure for a compact saliency map. We formulate the metric for DSV (MDSV) as a convex optimization problem and define the metric through equations (6) and (7), where $T$ is the selected threshold and $\tau$ controls the steepness of the sigmoid function. In this paper, $T$ and $\tau$ were determined from the saliency maps of the DSV benchmark, using statistical methods and learning-based parameter calibration, which are detailed below.
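To make the idea of a dispersion-modulated convex combination concrete, the sketch below weights a local and a global similarity score with a sigmoid of the reference map's dispersion. The exact form of the published equations is not reproduced here: the function names, the argument order, and the assumption that both scores are already on a comparable scale are illustrative. The values T = 4.38 and τ = 20.62 are the ones reported later in the paper for the AUC-Judd and CC pairing.

```python
import math

def dispersion_weight(h_ref, T, tau):
    """Sigmoid weight in [0, 1]; close to 1 for dispersed reference maps.

    Written with tanh, which equals 1 / (1 + exp(-x)) with x = tau * (h_ref - T)
    but avoids overflow for large negative x.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * tau * (h_ref - T)))

def mdsv_sketch(local_sim, global_sim, h_ref, T=4.38, tau=20.62):
    """Convex combination: local measure dominates for high dispersion,
    global measure dominates for low dispersion (assumed form)."""
    w = dispersion_weight(h_ref, T, tau)
    return w * local_sim + (1.0 - w) * global_sim
```

Because the weight and its complement are non-negative and sum to one, the output always lies between the two sub-model scores, which is the defining property of a convex combination.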

B. STATISTICAL PROPERTIES OF SALIENCY SIMILARITY METRICS BASED ON SALIENCY DISPERSION
In order to find suitable components for our proposed convex optimization based DSV metric, we analyse the statistical properties of the state-of-the-art similarity metrics based on saliency dispersion using the DSV benchmark database. We computed the multilevel entropy of all reference saliency maps contained in the DSV benchmark database [17] using equation (5). Our previous work showed that the multilevel entropy has the ability to distinguish the dispersion degrees of saliency maps. Figure 6 illustrates the multilevel entropy of two saliency maps of different dispersion degrees: one represents dispersed saliency and the other represents concentrated saliency. In order to reveal how different similarity metrics respond to different degrees of saliency dispersion, we divided the reference saliency maps in the DSV benchmark into two sets: one set contains saliency maps of high dispersion degree (i.e., large multilevel entropy) and the other contains saliency maps of low dispersion degree (i.e., small multilevel entropy). The division is made using a threshold (i.e., 7.0126 in our experiment) on the calculated multilevel entropy values of all reference saliency maps. Note that rigid thresholding is impossible as dispersion is a relative term, but we have considered the subjective content classification in [26] to ensure the division reflects two distinct degrees of saliency dispersion. This leads to a split of the entire DSV benchmark into "low dispersion" and "high dispersion" saliency maps. To analyse the impact of saliency dispersion on existing similarity metrics, we calculate the performance of these metrics (i.e., based on the PLCC, SROCC, and KROCC between a metric and the ground truth DSV) once on the subset of "low dispersion" saliency maps and once on the subset of "high dispersion" saliency maps. Table 3 lists the performance of the similarity metrics for "low dispersion" and "high dispersion" saliency maps of the DSV benchmark. As can be seen from Table 3, the metrics' performance differs for different degrees of saliency dispersion. More specifically, for the "low dispersion" saliency maps, the global metrics SIM and CC give the best performance amongst all metrics, whereas for the "high dispersion" saliency maps, the local metrics AUC-Judd and NSS produce the best performance amongst all metrics. This suggests that the way local and global metrics contribute to DSV measurement depends on the degree of saliency dispersion, and therefore their usage should be explicitly considered. Based on the above analysis, saliency dispersion measured by multilevel entropy could be used as the modulator to combine a local similarity metric and a global similarity metric for a more sophisticated measurement of the distortion-induced saliency variation.
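The subset analysis described above can be sketched as follows: test maps are assigned to the "low dispersion" or "high dispersion" subset according to the multilevel entropy of their reference map, and the correlation with the DMSS is computed per subset. The function and argument names are hypothetical, only PLCC is shown, and treating values strictly above the threshold as "high dispersion" is an assumption.

```python
import numpy as np

def plcc_by_dispersion(metric, dmss, ref_entropy, ref_index, threshold=7.0126):
    """PLCC between a metric and the DMSS on each dispersion subset.

    metric, dmss : one value per deviated saliency map
    ref_entropy  : multilevel entropy of each reference saliency map
    ref_index    : index of the reference map each deviated map derives from
    """
    metric = np.asarray(metric, dtype=float)
    dmss = np.asarray(dmss, dtype=float)
    # A deviated map inherits the dispersion class of its reference map.
    high = np.asarray(ref_entropy, dtype=float)[np.asarray(ref_index)] > threshold
    result = {}
    for name, mask in (("low", ~high), ("high", high)):
        result[name] = float(np.corrcoef(metric[mask], dmss[mask])[0, 1])
    return result
```

A metric whose correlation differs strongly between the two subsets is one whose usefulness depends on dispersion, which is the behaviour reported in Table 3.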

C. PLAUSIBILITY OF USING IMAGE FIDELITY METRICS FOR DSV MEASUREMENT
Since saliency maps are essentially images, we naturally wonder whether the well-established image fidelity metric, the Structural SIMilarity (SSIM) index [27], and its variants [28]-[30] could be used to evaluate the distortion-induced saliency variation. In Table 4, we list the performance of the Structural Similarity (SSIM) index, Multiscale Structural Similarity (MS-SSIM) index, Complex Wavelet Structural Similarity (CW-SSIM) index, and Information Content Weighted Structural Similarity (IW-SSIM) index for the measurement of DSV on the DSV benchmark. As can be seen from the table, SSIM and its variants perform well compared with the overall performance of the state-of-the-art saliency similarity metrics listed in Table 2. The PLCC values of SSIM and its variants are comparable to those of the best-performing SIM and CC metrics in Table 2. Since SSIM-based methods measure the structural information of images, they seem to capture the saliency patterns. In addition, MS-SSIM extracts hierarchical information via a multiscale approach and the IW-SSIM index obtains sophisticated features based on information theory; therefore, both MS-SSIM and IW-SSIM produce slightly higher performance than SSIM. The CW-SSIM index focuses on linear perturbations around edges and textures, which are less important for saliency patterns. This might explain the relatively lower performance of CW-SSIM. In summary, these SSIM-based metrics have demonstrated potential for DSV measurement and can therefore be further investigated for use in our proposed algorithm.

D. SELECTION OF SUB-MODELS FOR THE PROPOSED DSV METRIC
Our proposed convex optimization framework contains two essential components, i.e., local and global similarity measures. Based on the above analysis of existing similarity metrics (including the image fidelity metrics), we now focus on determining which metrics could be included in our proposed algorithm. The details of the selection procedure, as well as the reasons for selecting specific metrics, are described below.
Let $S_r(i, j)$ and $S_d(i, j)$ denote the intensity values of the saliency maps of the reference image and its distorted version at position $(i, j)$, respectively, and let $F_r(i, j)$ and $F_d(i, j)$ denote the (binary) values of the fixation maps of the reference image and its distorted version at position $(i, j)$.
Local saliency similarity: As already discussed in Section II.B, AUC-Judd, AUC-Borji, NSS, and IG are classified as local saliency similarity metrics. As per the overall performance listed in Table 2, AUC-Judd and NSS significantly outperform AUC-Borji and IG; therefore, we consider AUC-Judd and NSS as candidates and describe both metrics.
AUC-Judd [19]: The ROC (Receiver Operating Characteristic) curve is given discretely by the threshold set $L = \{l_1, \cdots, l_n\}$. For each $l_i$, $i \in \{1, \cdots, n\}$, the corresponding coordinates are defined as $(TPR_{l_i}, FPR_{l_i})$, where $TPR_{l_i}$ and $FPR_{l_i}$ are the true-positive rate and false-positive rate, formulated as follows:

$$TPR_{l_i} = \frac{tp_{l_i}}{tp_{l_i} + fn_{l_i}}, \qquad FPR_{l_i} = \frac{fp_{l_i}}{fp_{l_i} + tn_{l_i}}$$

where $tp_{l_i}$, $fp_{l_i}$, $tn_{l_i}$, $fn_{l_i}$ represent the numbers of true positives, false positives, true negatives, and false negatives for the corresponding $l_i$-level set classifier; they can be obtained from the level sets of the saliency map of the distorted image $S_d$ and the fixation map of the reference image $F_r$. The AUC-Judd is equal to the area under the ROC curve.
NSS [21]: For each fixation location $(i, j)$ of the reference $F_r(i, j)$, the corresponding Normalized Scanpath Saliency (NSS) value for the distorted saliency map (i.e., $S_d(i, j)$) is given by:

$$NSS(i, j) = \frac{S_d(i, j) - \mu_{S_d}}{\sigma_{S_d}}$$

where $\mu_{S_d}$ and $\sigma_{S_d}$ are the average value and standard deviation of the test saliency map $S_d$. The NSS metric for the whole test saliency map $S_d$ is then:

$$NSS = \frac{1}{|F_{r,1}|} \sum_{(i, j) \in F_{r,1}} NSS(i, j)$$

where $F_{r,1} = \{(i, j) \mid F_r(i, j) = 1\}$.
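The two local measures can be sketched directly from the definitions above. This is a minimal illustration, not the benchmark implementation: thresholds for AUC-Judd are taken at the distinct saliency values found at fixated pixels, the curve is closed at (0, 0) and (1, 1), and the population standard deviation is used for NSS. These choices, as well as the function names, are assumptions.

```python
import numpy as np

def auc_judd(sal_d, fix_r):
    """Area under the ROC curve of sal_d used as a classifier of fix_r.

    Assumes at least one fixated and one non-fixated pixel.
    """
    s = np.asarray(sal_d, dtype=float).ravel()
    f = np.asarray(fix_r).ravel() > 0
    pos, neg = s[f], s[~f]
    tpr, fpr = [0.0], [0.0]
    # Sweep thresholds from the highest fixated saliency value downwards.
    for level in np.sort(np.unique(pos))[::-1]:
        tp = np.sum(pos >= level)
        fn = pos.size - tp
        fp = np.sum(neg >= level)
        tn = neg.size - fp
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    tpr.append(1.0)
    fpr.append(1.0)
    tpr, fpr = np.array(tpr), np.array(fpr)
    # Trapezoidal integration of TPR over FPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def nss(sal_d, fix_r):
    """Mean z-scored saliency of sal_d at the fixated pixels of fix_r.

    Assumes sal_d is not constant, so its standard deviation is non-zero.
    """
    s = np.asarray(sal_d, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(z[np.asarray(fix_r) > 0].mean())
```

Both functions take the deviated saliency map and the reference fixation map, mirroring the roles of $S_d$ and $F_r$ in the text.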
Global saliency similarity: As already discussed in Section II.B, SIM, CC, KL, and EMD are global metrics. Based on the overall performance listed in Table 2, CC gives the best performance amongst all global metrics; therefore, we consider CC as the first candidate. SIM's performance is comparable to (slightly lower than) that of CC, but since the image fidelity metrics discussed in Section III.C are applicable and show good potential for DSV measurement, we consider SSIM-based metrics rather than SIM in our algorithm. We also consider a third candidate, KL, because it has been widely used as part of the loss function in deep learning-based saliency models, and we wish to investigate whether KL has a place in our proposed framework. The details of the three candidates are described below.
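CC [24]: The Pearson's Correlation Coefficient (CC) metric of the test saliency map $S_d$ and the reference saliency map $S_r$ is defined as:

$$CC(S_d, S_r) = \frac{\sigma(S_d, S_r)}{\sigma(S_d)\,\sigma(S_r)}$$

where $\sigma(S_d)$ and $\sigma(S_r)$ are the standard deviations of the saliency maps $S_d$ and $S_r$, respectively, and $\sigma(S_d, S_r)$ is the covariance between the saliency maps $S_d$ and $S_r$.
KL [18]: The Kullback-Leibler divergence (KL) metric of the test saliency map $S_d$ and the reference saliency map $S_r$ is defined as:

$$KL(S_d, S_r) = \sum_{i, j} S_r(i, j) \log\left(\epsilon + \frac{S_r(i, j)}{\epsilon + S_d(i, j)}\right)$$

where $\epsilon$ is a regularization constant.
SSIM [27]: The structural similarity (SSIM) index of the test saliency map $S_d$ and the reference saliency map $S_r$ is defined as the following mean structural similarity (MSSIM):

$$MSSIM(S_d, S_r) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} SSIM[S_d(i, j), S_r(i, j)]$$

where $MN$ is the size of $S_d$ and $S_r$, and $SSIM[S_d(i, j), S_r(i, j)]$ is formulated as:

$$SSIM[S_d(i, j), S_r(i, j)] = \frac{\left(2 \mu_{S_d(i,j)} \mu_{S_r(i,j)} + C_1\right)\left(2 \sigma_{S_d(i,j) S_r(i,j)} + C_2\right)}{\left(\mu_{S_d(i,j)}^2 + \mu_{S_r(i,j)}^2 + C_1\right)\left(\sigma_{S_d(i,j)}^2 + \sigma_{S_r(i,j)}^2 + C_2\right)}$$

where $C_1$ and $C_2$ are the small stabilizing constants of [27], and the local statistics $\mu_{S_d(i,j)}$, $\mu_{S_r(i,j)}$, $\sigma_{S_d(i,j)}$, $\sigma_{S_r(i,j)}$, and $\sigma_{S_d(i,j) S_r(i,j)}$ are estimated over a local window defined by a normalized $11 \times 11$ circularly-symmetric Gaussian function $w = \{w_p \mid p = 1, \cdots, l, \ \sum_{p=1}^{l} w_p = 1\}$ with a standard deviation of $3/2$. Specifically, writing $S^{(p)}(i, j)$ for the value of map $S$ at the $p$-th position of the window centred at $(i, j)$, these local statistics are estimated as:

$$\mu_{S_d(i,j)} = \sum_{p=1}^{l} w_p S_d^{(p)}(i, j), \qquad \mu_{S_r(i,j)} = \sum_{p=1}^{l} w_p S_r^{(p)}(i, j)$$

$$\sigma_{S_d(i,j)} = \left(\sum_{p=1}^{l} w_p \left(S_d^{(p)}(i, j) - \mu_{S_d(i,j)}\right)^2\right)^{1/2}, \qquad \sigma_{S_r(i,j)} = \left(\sum_{p=1}^{l} w_p \left(S_r^{(p)}(i, j) - \mu_{S_r(i,j)}\right)^2\right)^{1/2}$$

$$\sigma_{S_d(i,j) S_r(i,j)} = \sum_{p=1}^{l} w_p \left(S_d^{(p)}(i, j) - \mu_{S_d(i,j)}\right)\left(S_r^{(p)}(i, j) - \mu_{S_r(i,j)}\right)$$

where $p$ is the $p$-th position of the local window.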
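As a brief illustration of two of the global candidates, the sketch below computes CC and KL for a pair of saliency maps. It assumes non-negative maps with a non-zero sum; normalising each map to sum to one before computing KL and the default value of the regularization constant are assumptions that follow common practice in saliency benchmarking rather than details stated in the paper.

```python
import numpy as np

def cc(sal_d, sal_r):
    """Pearson correlation between two saliency maps (covariance over the
    product of standard deviations)."""
    d = np.asarray(sal_d, dtype=float).ravel()
    r = np.asarray(sal_r, dtype=float).ravel()
    d, r = d - d.mean(), r - r.mean()
    return float((d * r).sum() / np.sqrt((d ** 2).sum() * (r ** 2).sum()))

def kl_div(sal_d, sal_r, eps=2.2204e-16):
    """KL divergence of the test map from the reference map.

    Both maps are first rescaled to sum to one so they can be read as
    probability distributions (assumed preprocessing).
    """
    d = np.asarray(sal_d, dtype=float)
    r = np.asarray(sal_r, dtype=float)
    d, r = d / d.sum(), r / r.sum()
    return float(np.sum(r * np.log(eps + r / (eps + d))))
```

CC is a similarity score (higher means more similar) whereas KL is a dissimilarity score (lower means more similar), so their signs must be reconciled before they are compared with the DMSS or fused with other measures.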
In summary, we have now selected two local saliency similarity measures (i.e., AUC-Judd and NSS) and three global saliency similarity measures (i.e., CC, KL and SSIM) as the candidate sub-models for our proposed convex optimization framework for quantifying the distortion-induced saliency variation (DSV). We will now investigate the construction of a final DSV metric based on these basic components.

E. SELECTION OF ESSENTIAL PARAMETERS FOR THE PROPOSED DSV METRIC
In the proposed algorithm framework, as detailed in Section III.A and equations (5)-(7), there are three essential parameters that are worth discussing. We now give more details on how these parameters are determined using empirical methods.
Maximum level of the multilevel entropy for saliency dispersion measure: The maximum level of the multilevel entropy, as illustrated in Figure 6, is determined by calculating the correlation between the estimated saliency dispersion degree and its ground truth, the inter-observer agreement (IOA), as detailed in [10]. By varying $P_{\max}$ in equation (5), the correlation (i.e., PLCC) is calculated over the DSV benchmark database as a function of $P_{\max}$. It was found that the PLCC value saturated at the maximum level $P_{\max} = 4$. In order to determine whether there is a significant difference between $P_{\max} = 4$ and higher maximum levels, a Wilcoxon signed rank test, a non-parametric alternative to the t-test for non-normal data, is applied to the residuals between the estimated multilevel entropy $H_\Sigma$ and its ground truth IOA. The test results show that there is no statistically significant difference between $P_{\max} = 4$ and $P_{\max} = 5$, or between $P_{\max} = 4$ and $P_{\max} = 6$ (i.e., in both cases the p-values are larger than 0.05). Therefore, we choose the maximum level $P_{\max} = 4$ for the calculation of the multilevel entropy to measure the saliency dispersion of the saliency maps contained in the DSV benchmark.
Steepness of the sigmoid function: In our proposed algorithm framework for distortion-induced saliency variation, the parameter $\tau$ is used to control the steepness of the sigmoid function. Inspired by the parameters of the Gaussian probability distribution, we estimate the steepness $\tau$ as the inverse of the unbiased deviation of the dispersion degrees of the reference saliency maps in the DSV benchmark. The estimated value of $\tau$ is given by equation (20), where $R$ is the number of reference saliency maps in the DSV benchmark, $S_{r,t}$ represents the $t$-th reference saliency map, and the mean of the dispersion degrees is taken over all $R$ reference saliency maps. Based on this statistical method, the steepness of the proposed DSV metric is computed and is equal to 20.62 for the DSV benchmark in this paper.
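The verbal definition of the steepness can be sketched in a few lines. The sketch assumes that "unbiased deviation" refers to the sample standard deviation with an R - 1 denominator; if the unbiased variance was intended instead, the result would differ. The function name and the example values are hypothetical.

```python
import numpy as np

def estimate_tau(ref_dispersions):
    """Inverse of the sample standard deviation (ddof=1) of the dispersion
    degrees of the reference saliency maps (assumed reading of the text)."""
    h = np.asarray(ref_dispersions, dtype=float)
    return float(1.0 / h.std(ddof=1))

# Hypothetical dispersion degrees of three reference maps.
tau = estimate_tau([6.9, 7.0, 7.1])
```

With this reading, a benchmark whose reference maps have very similar dispersion degrees yields a large τ, i.e., a sigmoid that switches sharply between the local and the global sub-model around the threshold.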
Threshold of the sigmoid function: In order to control the convergence and the robustness of our proposed MDSV framework, we need to determine a suitable threshold $T$ for our MDSV algorithm. Since the modulator of our proposed metric framework is related to the dispersion degree of the reference saliency maps, we estimate the threshold from per-reference estimates $T[S_{r,t}]$, where $T[S_{r,t}]$ is the estimated value for the $t$-th reference saliency map in the DSV benchmark. Each $T[S_{r,t}]$ is computed from $\tau$, given by equation (20), and $dist(S_{r,t})$, which is in turn computed from $M_1(S_{r,t})$, $M_2(S_{r,t})$ and $DMSS(S_{r,t})$. Here $M_1(S_{r,t})$ is the metric vector of metric $M_1$ for the test saliency maps originating from the $t$-th reference saliency map in the DSV benchmark, $M_2(S_{r,t})$ is the metric vector of metric $M_2$ for the test saliency maps originating from the $t$-th reference saliency map in the SIQ288 database, $DMSS(S_{r,t})$ is the ground truth DMSS vector for the test saliency maps originating from the $t$-th reference saliency map, and $\| \cdot \|_2$ is the $L_2$ norm of the vector space. For example, if we use the modulator to fuse the local similarity measure AUC-Judd and the global similarity measure CC, the estimated threshold is 4.38.

A. INTERACTIONS OF SUB-MODELS FOR THE CONVEX OPTIMIZATION METHOD
In Section III.D, we selected potential candidate sub-models (for both local and global saliency similarity measures) to construct our final MDSV metric. Now, we investigate (1) how different combinations of local and global measures and (2) how different fusion methods affect the final DSV measurement.
Firstly, based on the proposed convex optimization method of equation (6), we define a metric for distortion-induced saliency variation, namely MDSV1, combining the local saliency similarity measure AUC-Judd and the global saliency similarity measure CC. We compare our combination strategy to alternatives including linear regression (LR), multi-layer perceptron (MLP) [31], support vector machine with polynomial kernel (SVM-P) [32], SVM with radial basis function (RBF) kernel (SVM-RBF) [32], model tree rules (M5Rules) [33], random tree (RT) [34], and random forest (RF) [35]. Each of the above fusion methods was used to combine AUC-Judd and CC to form a DSV measurement (note that since they are machine learning-based fusion methods, 10-fold cross-validation was used to generate results for a fair comparison between methods). The performance (i.e., PLCC, SROCC and KROCC) of these DSV measures on the DSV benchmark is listed in Table 5. It can be seen that our MDSV1 significantly outperforms the other DSV measures. It is also worth noting that MDSV1 performs better than either of its individual sub-models (see their performance in Table 2), which indicates the efficacy of the proposed convex optimization method.
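The cross-validated baseline comparison can be illustrated with the simplest of the listed fusion methods, linear regression. The sketch below fits an ordinary least-squares model on nine folds and predicts the held-out fold, so that every prediction is out-of-sample. The function name, the random fold assignment and the seed are assumptions; the toolkit and fold protocol used in the paper are not specified in this section.

```python
import numpy as np

def kfold_lr_fusion(local_sim, global_sim, dmss, k=10, seed=0):
    """Out-of-fold DMSS predictions from a linear fusion of two sub-models."""
    y = np.asarray(dmss, dtype=float)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(y)), local_sim, global_sim]).astype(float)
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty_like(y)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred[fold] = X[fold] @ w
    return pred
```

The returned predictions can then be compared with the ground truth DMSS using PLCC, SROCC and KROCC in the same way as for any other DSV measure.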
Secondly, in the literature on computational saliency models, deep learning-based methods (e.g., SAM-VGG and SAM-ResNet [36]) often benefit from a loss function that combines the sub-models NSS, CC and KL. Using the above-mentioned fusion methods, i.e., LR, MLP, SVM-P, SVM-RBF, M5Rules, RT, and RF, DSV measures can be produced by combining NSS, CC and KL (note that KL shows little impact on DSV measurement (see Table 2), but it can be included in these measures without compromising the performance because they learn the weights of the sub-models). Since they are machine learning-based fusion methods, 10-fold cross-validation was used to generate results for a fair comparison between methods. Meanwhile, we can use our proposed convex optimization method to define a new DSV metric, namely MDSV2, by fusing the local saliency similarity measure NSS and the global saliency similarity measure CC. Note that KL is not considered in our DSV metric as it shows little impact on DSV measurement (see Table 2) and our method is designed to combine only two sub-models. The performance (i.e., PLCC, SROCC and KROCC) of these DSV measures is listed in Table 6, which shows that our proposed MDSV2 is superior to the other DSV measures. This also suggests that our proposed algorithm could potentially be used to improve the loss function of deep learning-based saliency models. Moreover, the performance comparison between MDSV1 and MDSV2 demonstrates the contribution of the modulator (i.e., representing the saliency dispersion degree) introduced in our algorithm framework. More specifically, the only difference between MDSV1 and MDSV2 is that the former uses AUC-Judd and the latter uses NSS as the local saliency similarity measure. According to Table 7, since AUC-Judd is better than NSS, we would expect MDSV1 to be better than MDSV2 in terms of DSV measurement. However, as can be seen in Table 7, MDSV2 is statistically significantly better than MDSV1 (i.e., p < 0.05 via a t-test). This means that the modulator has adaptively combined the sub-models towards optimal performance for DSV measurement.
Thirdly, based on our convex optimization method, we could use SSIM or one of its variants as the sub-model for the global saliency similarity measure. This produces a new family of DSV metrics; for example, we can define an MDSV3 that combines the local saliency similarity measure NSS and the global saliency similarity measure SSIM. The performance of MDSV3 is discussed in more detail below.

B. OVERALL PERFORMANCE
Based on our convex optimization method, we produce three new DSV metrics, MDSV1, MDSV2, and MDSV3, which fuse a local saliency similarity measure and a global saliency similarity measure. Now, we give a comprehensive evaluation of these new metrics in comparison to existing saliency similarity metrics. We calculate each new DSV metric for all test (deviated) saliency maps of the DSV benchmark, and quantify the ability of the metric outputs to predict the ground truth DMSS scores using PLCC, SROCC, KROCC and RMSE. The overall performance is listed in Table 7, and Figures 7-9 each show the scatter plot of the DMSS against one of the new DSV metrics. As can be seen, the proposed metrics MDSV1, MDSV2, and MDSV3 outperform existing alternatives for the measurement of DSV. Hypothesis testing (i.e., t-test) was conducted, and the statistical results show that each of our MDSV variants is significantly (p < 0.05) better than any other state-of-the-art metric.
The superior performance of the proposed metrics demonstrates the importance of taking into account both local and global saliency similarity measures as well as saliency dispersion. It is worth noting that our proposed convex optimization framework is highly adaptable, as shown above by the construction of MDSV1, MDSV2, and MDSV3. Different combinations of local and global saliency similarity measures can be easily implemented based on the proposed framework. Based on the currently available candidate sub-models, we can continue to construct more variants of MDSV, such as one combining AUC-Judd and SSIM. When better sub-models are created in future, they can easily replace the current sub-models to produce new MDSV metrics. In addition, we analyse an important property of the DSV measurement. One of the significant findings of the previous study [9] is that DSV potentially forms a good basis for image quality prediction. In terms of the association between distorted images and their saliency maps, it is found that the wider the saliency variation relative to the reference (i.e., the larger the difference between the deviated saliency and the original saliency), the higher the distortion in the image (i.e., the lower the image quality). Statistical analysis [17] also revealed that the quality levels are significantly (p < 0.05) distinguished by the subjective (ground truth) DMSS. Now, we analyse whether this property also holds for the state-of-the-art similarity metrics and our proposed metrics MDSV1, MDSV2, and MDSV3. Table 8 lists the statistical analysis for the similarity metrics. It can be seen that all metrics can distinguish the medium and low quality levels, but most of them (except for SIM, EMD and our proposed MDSV1, MDSV2, and MDSV3) fail to distinguish the high and medium quality levels. Overall, the results demonstrate that our proposed metrics give the best performance for the measurement of the distortion-induced saliency variation.

V. CONCLUSION
Following up on our previous work of building a benchmark for distortion-induced saliency variation (DSV), in this paper we proposed a novel convex optimization framework for quantifying DSV. We investigated eight state-of-the-art saliency similarity metrics and their predictive performance for DSV. From the statistical analyses, we found that the performance of these metrics for DSV depends strongly on the dispersion degree of the saliency maps. Based on these findings, we proposed an algorithm framework based on the convex optimization method that combines local and global saliency similarity measures and uses a saliency dispersion measure as the modulator. We produced three new DSV metrics, MDSV1, MDSV2, and MDSV3, based on our proposed framework. Experimental results show that these new DSV metrics significantly outperform existing metrics in quantifying distortion-induced saliency variation (DSV). Our proposed framework is highly adaptable and can be easily updated when better-performing local and global sub-models are created. Going forward, the proposed DSV metric will be used to facilitate the development of advanced saliency models that can predict the saliency of distorted images in various image quality related applications.

FIGURE 1. Illustration of the distortion-induced saliency variation (DSV). DMSS (difference mean saliency variation score) represents the degree of similarity between the deviated saliency map and the reference saliency map.

FIGURE 2. Illustration of the histogram of the difference mean saliency variation scores (DMSS) of the DSV benchmark [17].

Figure 5 ,
meaning they may focus on different properties of DSV measurement.These examples indicate that the measurement of DSV tends to depend on the dispersion degree of the reference saliency map.Based on these observations, we propose a new similarity

FIGURE 4. Correlation matrix visualization of eight saliency similarity metrics and DMSS of all deviated saliency maps from the reference, i.e., "Paint-house".

FIGURE 5. Correlation matrix visualization of eight saliency similarity metrics and DMSS of all deviated saliency maps from the reference, i.e., "Man-fishing".

FIGURE 6. Calculation of multilevel entropy $H_\Sigma$ for distinguished saliency maps. At each level the saliency map is divided into blocks of equal size. $H_\Sigma$ is found by the sum of the entropy computed at each level of partition. $P_{\max}$ is the level with the finest partitioning.

TABLE 4. Performance of the Structural Similarity (SSIM) Index, Multiscale Structural Similarity (MS-SSIM) Index, Complex Wavelet Structural Similarity (CW-SSIM) Index, and Information Content Weighted Structural Similarity (IW-SSIM) Index for the measurement of distortion-induced saliency variation (DSV) for saliency maps of the DSV benchmark [17].

FIGURE 7. Scatter plot of the DMSS and our proposed MDSV1. The curve shows the regression line of nonlinear logistic fitting. The x-axis shows the predicted score MDSV1 and the y-axis shows the observers' DMSS.

FIGURE 8. Scatter plot of the DMSS and our proposed MDSV2. The curve shows the regression line of nonlinear logistic fitting. The x-axis shows the predicted score MDSV2 and the y-axis shows the observers' DMSS.

FIGURE 9. Scatter plot of the DMSS and our proposed MDSV3. The curve shows the regression line of nonlinear logistic fitting. The x-axis shows the predicted score MDSV3 and the y-axis shows the observers' DMSS.

Table 2
lists the quantitative results of the performance in terms of the Pearson's correlation coefficient (PLCC), Spearman rank order correlation coefficient (SROCC), Kendall's rank order

TABLE 2. Performance of state-of-the-art similarity metrics for the measurement of distortion-induced saliency variation (DSV) on all saliency maps of the DSV benchmark [17].

TABLE 6. Performance of different fusion methods LR, MLP, SVM-P, SVM-RBF, M5Rules, RT, RF for combining the sub-models NSS, CC and KL and our proposed MDSV2 for combining NSS and CC (note KL is not considered as it has little impact on DSV) for DSV measurement on the DSV benchmark [17].