Perceptual Hashing With Complementary Color Wavelet Transform and Compressed Sensing for Reduced-Reference Image Quality Assessment

Image quality assessment (IQA) is an important task of image processing and has diverse applications, such as image super-resolution reconstruction, image transmission and monitoring systems. This paper proposes a perceptual hashing algorithm with complementary color wavelet transform (CCWT) and compressed sensing (CS) for reduced-reference (RR) IQA. The CCWT is exploited to decompose the input color image into different sub-bands. Since the calculation of CCWT uses all color channels without discarding any information, the distortions introduced by digital operations on color channels are preserved in the CCWT sub-bands. The block-based CS is used to extract features from the CCWT sub-bands. As the Euclidean distance between the block-based CS features is only slightly influenced by content-preserving operations, perceptual features constructed from Euclidean distances are robust, discriminative and compact. The hash sequence is finally determined by quantizing the perceptual features. The effectiveness of the proposed hashing is verified by various experiments on four open image databases. Experimental results demonstrate that the proposed hashing is superior to some state-of-the-art algorithms in terms of classification and RR IQA application.


I. INTRODUCTION
WITH the advent of the big data era, the number of images is rapidly increasing and the demand for image quality assessment (IQA) has soared in many applications, such as image super-resolution reconstruction, image transmission and monitoring systems. For example, many image acquisition systems require a useful IQA scheme to adjust system parameters for obtaining good image quality. Therefore, it is highly desirable to grade image quality in real time to maintain the required quality of service. Consequently, it is an important task to develop efficient IQA schemes for diverse image applications [1]-[3]. hashing [4], [5]. This paper investigates a novel perceptual hashing algorithm for RR IQA.
Image hashing [6], [7] is a useful technology of image processing. It can effectively extract a compact hash sequence based on visual content from the input image, which can be used to represent the input image itself. Since a compact hash sequence has the advantages of low storage and fast similarity calculation, image hashing has been widely applied to image copy detection, image retrieval, image tampering detection and image content authentication [8]-[10]. Robustness and discrimination are two major properties of image hashing. Specifically, robustness requires that the hash sequence of the original image and that of its distorted version are similar. Discrimination requires that hash sequences of different images are entirely distinct. The two properties are interrelated and mutually constrained. An effective image hashing algorithm should strike a desirable balance between them. In addition, for the application of RR IQA, perceptual image hashing must satisfy distortion sensitivity, which means that the hashing algorithm needs to perceive the level of distortion on similar images. Note that there is a mutual constraint between robustness and distortion sensitivity. Strong robustness makes the hash insensitive to the distortion of similar images, while high distortion sensitivity decreases the robustness. Therefore, it is a challenging task to develop novel perceptual hashing algorithms with a good trade-off between robustness and distortion sensitivity for the application of RR IQA.
Currently, most image hashing algorithms only consider sufficient robustness. They often ignore distortion sensitivity, and thus do not reach a desirable performance in the application of RR IQA. To tackle this problem, the paper proposes a novel perceptual hashing algorithm with complementary color wavelet transform (CCWT) and compressed sensing (CS). Compared with the existing hashing algorithms, there are three contributions as follows:
(1) The CCWT is exploited to decompose the input color image into different sub-bands for extracting perceptual features. Since the calculation of CCWT uses all color channels without discarding any information, the distortions introduced by digital operations on color channels are preserved in the CCWT sub-bands. Since the low-frequency sub-bands contain basic image information, the features extracted from these sub-bands can distinguish images with different contents. As the high-frequency sub-bands reflect multi-directional color information, the color features extracted from these sub-bands can describe image distortion on different color channels.
(2) Perceptual features are extracted from the CCWT sub-bands via block-based CS. Since CS can directly achieve compression coding during the sampling process, the block-based CS features retain original feature distortion and provide good distortion sensitivity of our hashing algorithm. As the Euclidean distance between the block-based CS features is slightly influenced by content-preserving operations, perceptual features constructed by Euclidean distances are robust, discriminative and compact.
(3) Numerous experiments with 140990 images (100350 for robustness analysis and 40640 for discrimination evaluation) are done to evaluate the performances of the proposed hashing algorithm. The results demonstrate that our hashing algorithm not only has good robustness and strong discrimination, but also keeps a good balance between them. In addition, the LIVE and TID2013 databases are used to validate our application in RR IQA. The experimental results show that the proposed algorithm outperforms some well-known IQA schemes.
The rest of this paper is organized as follows. Section II gives a review of the related work. Section III provides a specific description of the proposed hashing algorithm. Section IV describes the experimental results and performance comparisons. Our application in RR IQA is demonstrated in Section V. Finally, Section VI gives the conclusions.

II. RELATED WORK
Various image hashing algorithms have been developed by researchers. According to their feature extraction techniques, the existing hashing algorithms can be categorized into the following four types.

A. Transform Domain Based Hashing Algorithms
Robust features of these hashing algorithms are extracted from the transform domain via transform techniques. The discrete wavelet transform (DWT), discrete cosine transform (DCT), Fourier-Mellin transform (FMT), Log-Polar transform (LPT) and Radon transform (RT) are popular transform techniques. For example, Venkatesan et al. [11] first utilized a wavelet representation of DWT to derive a hash. This algorithm is robust against normal operations, but it is insensitive to malicious tampering. Wang et al. [12] jointly used DWT and DCT to extract features and compressed the features by Karhunen-Loeve transformation. This algorithm is robust to normal digital operations, except rotation. Ahmed et al. [13] used DWT coefficients to generate the intermediate hash and compressed it via the SHA-1 function. This algorithm can find local tampering areas, but it is fragile to contrast and brightness adjustment. Qin et al. [14] used non-uniform sampling to extract frequency features in the discrete Fourier domain. This algorithm only makes use of the luminance component of images. Swaminathan et al. [15] encrypted features by a key and thus designed the FMT-based hashing algorithm for improving hash security. In [16], image features with rotation invariance were extracted by LPT. The two algorithms [15], [16] are both robust against rotation within 10°. Liu et al. [17] exploited RT and invariant features to design a hashing algorithm. Since this algorithm contains multiple transformations, its computational complexity is not satisfactory.

B. Visual Saliency Detection Based Hashing Algorithms
Many visual saliency detection methods have been incorporated into the image hashing research for improving the performance of robustness. For instance, Monga et al. [18] developed a hashing algorithm with visual salient feature points. This algorithm can resist JPEG compression, but it is not resilient to geometric distortions. Wang et al. [19] extracted Gabor features for constructing visual system-based hashing. This hashing has a good classification performance, but it is insensitive to malicious changes in small blocks. In [20], a global Zernike moment and visual attention detection were utilized to design a hashing algorithm for content authentication. Wang et al. [21] detected visually important features by Watson's attention model and combined features with key points for image content authentication. This algorithm has good robustness, but it does not consider contrast adjustment and watermark embedding. In [22], the Phase spectrum of Fourier Transform (PFT) model of saliency detection and the ring partition were both utilized to develop a hashing algorithm. This hashing is resilient against large-angle rotation.

C. Matrix Factorization Based Hashing Algorithms
Some commonly-used dimension reduction techniques are introduced to the research of image hashing. For example, Kozat et al. [23] employed singular value decomposition (SVD) to derive hash for ensuring robustness. But the robustness improvement tends to increase misclassification. Motivated by the SVD-based hashing [23], non-negative matrix factorization (NMF) was applied to the design of image hashing [24]. This hashing algorithm can resist JPEG compression and rotation. But its robustness against watermark embedding must be increased. The hashing algorithm with CS and visual information fidelity was designed by Kang et al. [25]. This algorithm demonstrates good robustness against some operations. However, its discrimination performance is not good. In [4], Lv and Wang extracted features via fast Johnson-Lindenstrauss transform to generate hash for RR IQA. This algorithm only considers the compression distortions of JPEG and JPEG2000. In another work, tensor decomposition is first used for designing image hashing by Tang et al. [8]. Their hashing uses image blocks to construct a tensor and exploits Tucker decomposition to generate hash. However, the hashing only resists rotation with a small angle.

D. Statistical Features Based Hashing Algorithms
Different statistical approaches are adopted to extract features in image hashing. For example, Tang et al. [26] selected statistical features of an image ring as a feature vector and calculated the invariant distance between feature vectors for hash generation. The ring-based algorithm is robust to any-angle rotation. Qin et al. [27] used three techniques to extract features with structural information, including SVD, Canny operator and color vector angle. This algorithm is resistant to filtering and compression. Tang et al. [5] combined the Canny operator with Weighted DWT (WDWT) statistical features for hash construction. This hashing provides a good measure of perceptual image distortion. But the dataset used for validating IQA performance is small. Huang et al. [1] developed a hashing algorithm for RR IQA of screen content images. This algorithm utilized local features of the gradient magnitude map and the normalized histogram for hash construction. Recently, Singh et al. [28] combined KAZE features and statistical features from the reference image to derive a hash. This algorithm reaches good robustness and discriminative capabilities.
In addition to the algorithms mentioned above, other techniques are also applied to image hashing. For instance, random Gabor filtering (GF) and dithered lattice vector quantization (LVQ) were jointly exploited to develop an algorithm [29]. This hashing algorithm is robust against JPEG compression and rotation, but its discrimination is unsatisfactory. The random walk (RW) hashing [30] was designed by adopting zigzag blocks with RW. This hashing enhances security via RW, but it is sensitive to rotation and translation. In [31], sparse coding was used to find features for hash construction. A fast discrete collaborative multimodal hashing algorithm was developed for image retrieval [32]. This hashing preserves high-level semantics while keeping low-level data features.
The above review shows that the current hashing algorithms have made a considerable progress. However, most hashing algorithms only consider sufficient robustness. They often ignore distortion sensitivity, and thus do not reach a desirable performance in the application of RR IQA. To tackle this problem, we design a perceptual image hashing algorithm which can reach good classification performance and is effective in RR IQA.

III. PROPOSED HASHING ALGORITHM
The block diagram of the proposed hashing algorithm is depicted in Figure 2. Our hashing algorithm is composed of the following six steps: (i) Pre-processing operations are applied to the input image, where bi-cubic interpolation is exploited to convert the size of the input image to U × U, and Gaussian low-pass filtering is used to eliminate noise. (ii) Two-level 2D CCWT is applied to the preprocessed image, and a low-frequency sub-band and four complementary color sub-bands are then generated for feature extraction. (iii) Low-frequency features are extracted from the low-frequency sub-band via block-based CS. (iv) Multi-directional edge detection is utilized to extract image edges from the complementary color sub-bands. (v) Color features are extracted from the maximum edge gradient maps via block-based CS. (vi) The hash sequence is constructed by concatenating the quantized low-frequency features and color features. Details of our hashing algorithm are described in the subsections below. Besides, Table I summarizes the relevant notations used in the paper.

A. 2D CCWT
The classical wavelet transform has been widely applied to many applications, such as IQA [5], image fusion [33] and image watermarking [34]. To improve the effectiveness of color image processing with wavelet transform, Chen et al. [35] first introduced color relations based on complementary visual theory into wavelet transform, and thus developed an innovative tool called CCWT. When the CCWT is used to process a color image, it retains the advantages of traditional wavelets without discarding color-related information.
Complementary colors play key roles in human vision. Generally, they are represented as a pair of colors whose mixture is a white color. For example, white and black, red and cyan, green and magenta, and blue and yellow are four important pairs of complementary colors. It is known that complementary colors can be represented in the RGB hue ring [35], where the R, G and B axes are positioned at angles of 0, 2π/3 and 4π/3 along the hue ring, respectively. In order to extend the R, G and B axes to the wavelet domain, the 1D CCWT with relative phase differences of 2π/3 is designed, i.e., $\varphi^{(0)}$, $\varphi^{(2\pi/3)}$ and $\varphi^{(4\pi/3)}$. Then, referring to the classical DWT, the 2D single-channel CCWT is extended from the 1D CCWT. Specifically, the 2D single-channel CCWT can be seen as the calculation of the 1D CCWT along the horizontal and vertical directions. Since the 1D CCWT has three different low-pass and/or high-pass filters, the 2D single-channel CCWT is achieved by 9 DWT decompositions with varying phase combinations.
After decomposition, high-frequency components are available and then two-dimensional components are further created by calculating permutations and combinations. Then the non-zero components are filtered out to obtain the two-dimensional wavelet groups, which have approximately eight directions n = mπ/8 (m = 1, 2, ..., 8) and three relative phase differences, i.e., 0, 2π/3 and 4π/3. Therefore, the 2D single-channel CCWT inherits the directional filtering characteristics from the traditional wavelet and has richer directional selectivity. More details of the 2D single-channel CCWT can be found in [35].
When the 2D single-channel CCWT is applied to color images, the R, G and B channels are mapped to the corresponding 0, 2π/3 and 4π/3 wavelet bases, respectively. To do this, the relative phase difference wavelets are denoted as $\varphi^{(R)}_{l,n}$, $\varphi^{(G)}_{l,n}$ and $\varphi^{(B)}_{l,n}$. Next, wavelet decomposition of the RGB channels is performed with them, and the obtained wavelet coefficient vectors can be expressed as $c^{(R)}_{l,n} = r * \varphi^{(R)}_{l,n}$, $c^{(G)}_{l,n} = g * \varphi^{(G)}_{l,n}$ and $c^{(B)}_{l,n} = b * \varphi^{(B)}_{l,n}$, respectively, in which r, g and b represent the R, G and B channel vectors and * is the convolution operation. The complementary color operators are determined by equations (1)-(4) as follows:
1) White-Black Operator
2) Red-Cyan Operator
3) Green-Magenta Operator
4) Blue-Yellow Operator
Note that P can be considered as the chroma operator, and it changes correspondingly when the chroma value changes.
In this paper, two-level 2D CCWT is applied to the preprocessed image. Here, we take the mean of the low-frequency components as the low-frequency sub-band $P^{(L)}$ and the sum of the complementary color operators with l = 2 as the final complementary color sub-bands, i.e., the White-Black, Red-Cyan, Green-Magenta and Blue-Yellow complementary color sub-bands. The sizes of these sub-bands are all $U_l \times U_l$, in which $U_l = U/2^l$. The CCWT coefficients in the low-frequency sub-band contain approximation information of the image, while the CCWT coefficients in the complementary color sub-bands contain multi-directional color information of the image. Therefore, image features extracted from these sub-bands can not only distinguish different images but also indicate image distortion. A visual example of two-level 2D CCWT is presented in Figure 3.

B. Low-Frequency Feature Extraction Via Block-Based CS
The block-based CS is utilized to extract the low-frequency feature from the low-frequency sub-band. The CS theory was proposed by Donoho [36]. CS breaks through the limitations of the Nyquist sampling theorem and enables compression coding during sampling. It has been successfully used in various fields, such as audio, holography and MRI [37]-[39]. Specifically, CS theory shows that as long as the signal is sparse in the transform domain, it can be measured by an observation matrix unrelated to the transform bases, and the original signal can be reconstructed from a small amount of sampling data. Let $X \in \mathbb{R}^{N \times N}$ be the test image and $\Psi_w$ be a wavelet orthogonal transform matrix sized N × N. Thus, the sparse representation $\tilde{X}$ of X is calculated by the formula below [40].
$$\tilde{X} = \Psi_w X \Psi_w'$$
where $'$ is the transposition-conjugate operator. Next, $\tilde{X}$ is measured by an observation matrix $\Phi \in \mathbb{R}^{M \times N}$ (M < N). Then, a measurement matrix Y with size M × N can be represented as follows:
$$Y = \Phi \tilde{X}$$
Here, we utilize DWT to generate the sparse representation $\tilde{X}$ and use a Gaussian matrix as the observation matrix. Moreover, the compression rate M/N is selected as 0.5. This is based on the consideration that a high compression rate cannot show the advantages of the low sampling rate of CS, while a low compression rate increases the possibility of losing important information.
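As a rough illustration of the sampling step described above, the following Python sketch computes a sparse representation with an orthonormal Haar matrix and measures it with a Gaussian observation matrix at a compression rate of 0.5. The Haar basis is only a simple stand-in for the DWT, and the helper names (`haar_matrix`, `cs_measure`), the 1/sqrt(M) scaling and the fixed seed are assumptions made for the example rather than details given in the paper.

```python
import math
import random


def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n; n must be a power of two."""
    if n == 1:
        return [[1.0]]
    half = haar_matrix(n // 2)
    s = 1.0 / math.sqrt(2.0)
    # Averaging rows: each entry of the smaller matrix is repeated twice.
    top = [[s * v for v in row for _ in (0, 1)] for row in half]
    # Differencing rows: (+s, -s) on each adjacent pair of columns.
    bottom = []
    for i in range(n // 2):
        row = [0.0] * n
        row[2 * i], row[2 * i + 1] = s, -s
        bottom.append(row)
    return top + bottom


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cs_measure(x, rate=0.5, seed=0):
    """Return (sparse representation, measurement matrix) for a square block x."""
    n = len(x)
    psi = haar_matrix(n)
    x_sparse = matmul(matmul(psi, x), transpose(psi))  # Psi * X * Psi^T
    m = int(n * rate)
    rng = random.Random(seed)
    # Gaussian observation matrix of size M x N (the scaling is an assumption).
    phi = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
    return x_sparse, matmul(phi, x_sparse)  # Y = Phi * X_sparse


block = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
x_sparse, y = cs_measure(block)
print(len(y), len(y[0]))  # 4 8
```

With an 8 × 8 block and rate 0.5, the measurement matrix has 4 rows and 8 columns, matching the M × N size stated in the text.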
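To extract the low-frequency feature via block-based CS, we divide the low-frequency sub-band $P^{(L)}$ into non-overlapping blocks of size b × b. Let $U_l$ be an integral multiple of b. Thus, there are a total of $F = (U_l/b)^2$ blocks. Suppose that $B_i$ (1 ≤ i ≤ F) is the i-th block. To further calculate the relevance of information in the block, we divide every block into two parts sized b × b/2, i.e., a left sub-block and a right sub-block. To conduct CS, the two sub-blocks are then converted to a square size b × b by bi-cubic interpolation. Let the resized left part and right part of $B_i$ be $B_i^{(1)}$ and $B_i^{(2)}$, respectively. We apply CS to the two parts, and then obtain two measurement matrices $Y_i^{(1)}$ and $Y_i^{(2)}$ as follows:
$$Y_i^{(1)} = \Phi \tilde{B}_i^{(1)}, \qquad Y_i^{(2)} = \Phi \tilde{B}_i^{(2)}$$
where $\tilde{B}_i^{(1)}$ and $\tilde{B}_i^{(2)}$ are the sparse representations of $B_i^{(1)}$ and $B_i^{(2)}$ obtained by DWT, $\Phi$ is the observation matrix sized b/2 × b, and the sizes of $Y_i^{(1)}$ and $Y_i^{(2)}$ are both b/2 × b.
To make the extracted features reflect the fluctuation of measurement matrix elements, we select variance as the CS feature. The variances of the two measurement matrices can be computed by the following formulas.
$$e_i^{(1)} = \frac{2}{b^2} \sum_{u=1}^{b/2} \sum_{v=1}^{b} \left( Y_i^{(1)}(u, v) - \mu_{Y_i^{(1)}} \right)^2, \qquad e_i^{(2)} = \frac{2}{b^2} \sum_{u=1}^{b/2} \sum_{v=1}^{b} \left( Y_i^{(2)}(u, v) - \mu_{Y_i^{(2)}} \right)^2$$
where $Y_i^{(1)}(u, v)$ and $Y_i^{(2)}(u, v)$ are the elements in the u-th row and the v-th column of $Y_i^{(1)}$ and $Y_i^{(2)}$, and $\mu_{Y_i^{(1)}}$ and $\mu_{Y_i^{(2)}}$ are the means of $Y_i^{(1)}$ and $Y_i^{(2)}$, respectively. They are defined as follows:
$$\mu_{Y_i^{(1)}} = \frac{2}{b^2} \sum_{u=1}^{b/2} \sum_{v=1}^{b} Y_i^{(1)}(u, v), \qquad \mu_{Y_i^{(2)}} = \frac{2}{b^2} \sum_{u=1}^{b/2} \sum_{v=1}^{b} Y_i^{(2)}(u, v)$$
Next, we concatenate the CS features of all left and right sub-blocks, respectively. The block-based CS feature vectors are available by the formulas below.
$$e^{(1)} = [e_1^{(1)}, e_2^{(1)}, \ldots, e_F^{(1)}], \qquad e^{(2)} = [e_1^{(2)}, e_2^{(2)}, \ldots, e_F^{(2)}]$$
Considering that two CS features are extracted from every block, we take them as a point, i.e., $(e_i^{(1)}, e_i^{(2)})$, and compress the features by computing the Euclidean distance from the point to a reference point. To do so, a reference point $(\mu_{e^{(1)}}, \mu_{e^{(2)}})$ is constructed, where $\mu_{e^{(1)}}$ and $\mu_{e^{(2)}}$ are the means of $e^{(1)}$ and $e^{(2)}$, respectively. Thus, the reference point is computed by the equations below:
$$\mu_{e^{(1)}} = \frac{1}{F} \sum_{i=1}^{F} e_i^{(1)}, \qquad \mu_{e^{(2)}} = \frac{1}{F} \sum_{i=1}^{F} e_i^{(2)}$$
The Euclidean distance from the point $(e_i^{(1)}, e_i^{(2)})$ to the reference point $(\mu_{e^{(1)}}, \mu_{e^{(2)}})$ is calculated as follows:
$$d_i = \sqrt{\left( e_i^{(1)} - \mu_{e^{(1)}} \right)^2 + \left( e_i^{(2)} - \mu_{e^{(2)}} \right)^2}$$
Finally, the low-frequency feature sequence $D_1$ is available by concatenating $d_i$ as follows:
$$D_1 = [d_1, d_2, \ldots, d_F]$$
The low-frequency feature sequence $D_1$ has F floating-point numbers.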
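The feature compression in this subsection can be sketched in a few lines of Python. The sketch assumes the population variance (division by the number of elements) as the CS feature, since the exact normalization is an assumption here, and the function names and toy matrices are illustrative only.

```python
import math


def matrix_variance(y):
    """Population variance of all elements of a measurement matrix (assumed normalization)."""
    values = [v for row in y for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def distance_features(e1, e2):
    """Distance of each point (e1[i], e2[i]) to the reference point (mean(e1), mean(e2))."""
    mu1 = sum(e1) / len(e1)
    mu2 = sum(e2) / len(e2)
    return [math.sqrt((a - mu1) ** 2 + (b - mu2) ** 2) for a, b in zip(e1, e2)]


# Toy measurement matrices for two blocks (one left and one right sub-block each).
left = [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 4.0], [0.0, 4.0]]]
right = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]]]
e1 = [matrix_variance(y) for y in left]    # variances of the left sub-blocks
e2 = [matrix_variance(y) for y in right]   # variances of the right sub-blocks
d1 = distance_features(e1, e2)             # one distance per block
print(e1, e2, d1)
```

Each block contributes a single distance, so the two variance values per block are compressed into one floating-point feature.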

C. Multi-Directional Edge Detection
As an important visual feature, the image edge is the boundary of an area with significant local variations in brightness. It has been widely used in many applications of computer vision and image processing [41]-[43]. Since the image edge can represent textures of an image, numerous studies use edge detection methods to improve sensitivity to image distortion. Generally, image edge detection methods find edge information via gradient algorithms, and thus the extracted edges contain clear image contours and rich textures. As the complementary color sub-bands reflect multi-directional color information, the color features extracted from the edges of these sub-bands can describe image distortion on different color channels.
Here, we use the multi-directional Sobel operator [44] to extract image edges from the complementary color sub-bands. Our considerations for choosing the multi-directional Sobel operator for edge detection are as follows. The multi-directional Sobel operator improves the well-known Sobel operator by using eight directions. Edge detection with the multi-directional Sobel operator is more accurate than edge detection with the classic Sobel operator. In fact, the multi-directional Sobel operator reaches a good balance between computational cost and accuracy of edge detection. Moreover, the eight directions of the multi-directional Sobel operator, i.e., n = mπ/8 (m = 1, 2, ..., 8), coincide with the directions of our complementary color sub-bands, which contributes to detecting rich edge information. Specifically, through eight 5 × 5 convolution templates, the gradients in eight directions of each complementary color sub-band can be calculated by equations (20)-(27), in which * is the convolution operation and f represents the complementary color sub-band.
For every complementary color sub-band, we compare the absolute values of the gradients in the eight convolution results, and then output the maximum value to generate the maximum gradient map G. Let G(p, q) be the element of G at the position (p, q). It can be denoted as follows:
$$G(p, q) = \max_{1 \le m \le 8} \left| G_m(p, q) \right|$$
where $G_m$ denotes the gradient in the m-th direction. Thus, the maximum gradient maps of the four complementary color sub-bands are obtained.
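The maximum-gradient selection described above can be sketched as follows. The sketch takes the directional gradient maps as given inputs, because the eight 5 × 5 convolution templates are not listed in this section; the function name and the two toy maps are hypothetical.

```python
def max_gradient_map(gradients):
    """Element-wise maximum of absolute values over a list of equally sized 2D maps."""
    rows, cols = len(gradients[0]), len(gradients[0][0])
    return [[max(abs(g[p][q]) for g in gradients) for q in range(cols)]
            for p in range(rows)]


# Two toy directional gradient maps (the algorithm in the text uses eight).
g1 = [[1.0, -5.0], [0.5, 2.0]]
g2 = [[-3.0, 4.0], [0.0, -2.5]]
g_max = max_gradient_map([g1, g2])
print(g_max)  # [[3.0, 5.0], [0.5, 2.5]]
```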

D. Color Feature Extraction Via Block-Based CS
To extract color features from the maximum gradient maps, the block-based CS described in Section III-B is adopted again. The four complementary color features are extracted from the corresponding maximum gradient maps via the block-based CS, and then they are concatenated to generate the final color features. Here, take the White-Black feature extraction as an example. Firstly, we divide the maximum gradient map of $P^{(W\text{-}B)}_{total}$ into non-overlapping blocks of size k × k. Let $U_l$ be an integral multiple of k. Thus, there are a total of $E = (U_l/k)^2$ blocks. Suppose that $K_j$ (1 ≤ j ≤ E) is the j-th block. Then, every block is divided into two sub-blocks, the two sub-blocks are resized, and CS is applied to them. Next, the block-based CS features are taken as a point and the Euclidean distance from the point to a reference point is computed. Finally, the White-Black feature sequence $D^{(W\text{-}B)}$ is available by concatenating all distances. Likewise, the Red-Cyan feature sequence $D^{(R\text{-}C)}$, the Green-Magenta feature sequence $D^{(G\text{-}M)}$ and the Blue-Yellow feature sequence $D^{(B\text{-}Y)}$ can be extracted from the corresponding maximum gradient maps, respectively. Therefore, the final color features are available by concatenating the four complementary color feature sequences as follows:
$$D_2 = [D^{(W\text{-}B)}, D^{(R\text{-}C)}, D^{(G\text{-}M)}, D^{(B\text{-}Y)}]$$
The color features $D_2$ consist of 4E floating-point numbers.

E. Hash Construction
The extracted perceptual features from the CCWT sub-bands, i.e., the low-frequency feature sequence $D_1$ and the color feature sequence $D_2$, are combined to construct the initial hash sequence as follows:
$$D = [\alpha D_1, \beta D_2]$$
where the parameters α and β are used to adjust the influence of $D_1$ and $D_2$. Clearly, the length of D is Z = F + 4E. Let d(t) be the t-th element of the sequence D (1 ≤ t ≤ Z). Since d(t) is a floating-point number, d(t) is quantized to an integer for reducing storage by the equation below.
$$h(t) = \mathrm{round}[d(t)]$$
where round[·] is the rounding operation. The final hash is obtained as follows:
$$h = [h(1), h(2), \ldots, h(Z)]$$
Consequently, the final hash h is a sequence of Z integers.
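A minimal sketch of the hash construction step is given below. It assumes that the initial hash is the weighted concatenation of the two feature sequences and that quantization is plain rounding to the nearest integer; both are assumptions for illustration, and `build_hash` is a hypothetical helper name. The default weights follow the values α = 0.5 and β = 2.5 reported in the experimental settings.

```python
import math


def build_hash(d1, d2, alpha=0.5, beta=2.5):
    """Weight, concatenate and round the two feature sequences (assumed form)."""
    d = [alpha * v for v in d1] + [beta * v for v in d2]
    # Round half up; Python's built-in round() uses banker's rounding instead.
    return [int(math.floor(v + 0.5)) for v in d]


h = build_hash([10.2, 7.0], [3.3, 0.4, 8.0])
print(h)  # [5, 4, 8, 1, 20]
```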

F. Hash Similarity Evaluation
We take the $L_2$ norm to measure the similarity of two image hashes. Suppose that $h_1 = [h_1(1), h_1(2), \ldots, h_1(Z)]$ and $h_2 = [h_2(1), h_2(2), \ldots, h_2(Z)]$ are the hash sequences of two images. The $L_2$ norm of the two hash sequences is defined by the following equation:
$$d_{norm} = \sqrt{\sum_{t=1}^{Z} \left( h_1(t) - h_2(t) \right)^2}$$
where $h_1(t)$ and $h_2(t)$ are the t-th elements of $h_1$ and $h_2$, respectively. Generally, a smaller $d_{norm}$ means that the images of the input hash sequences are more similar. Therefore, for different images, the corresponding $d_{norm}$ should be large.
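The similarity measure can be written directly in Python; the function name `l2_distance` and the short example hashes are illustrative only.

```python
import math


def l2_distance(h1, h2):
    """L2 norm of the element-wise difference between two equal-length hashes."""
    if len(h1) != len(h2):
        raise ValueError("hash sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


print(l2_distance([5, 4, 8, 1], [5, 1, 4, 1]))  # 5.0
```

Two images would then be judged similar when this distance does not exceed a chosen threshold.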

IV. EXPERIMENTAL RESULTS
A. Experimental Settings
The parameters of the proposed hashing algorithm are set as follows. Pre-processing operations are exploited to convert the size of the input image to 512 × 512, the low-frequency sub-band is divided into blocks of size 32 × 32, the complementary color sub-bands are divided into blocks of size 32 × 32, and the weight parameters are 0.5 and 2.5, i.e., U = 512, b = 32, k = 32, α = 0.5 and β = 2.5. Therefore, Z = 80. Section IV-E and Section IV-F discuss the effect of block size selection and the effect of weight parameter selection, respectively. Besides, two databases are selected for comprehensive experiments. Experiments are carried out on a computer with a 2.90 GHz Intel Core i5-10400 CPU and 16.0 GB RAM. The coding platform is MATLAB R2019b and the adopted OS is Windows 10. Each of the used databases is detailed below, where some sample images from the databases are shown in Figure 5.
2) Test Set of COCO 2017 Database: This dataset [47] comprises 40640 color images for discrimination evaluation. It can be observed that the color images in this database contain a variety of contents, such as human beings, animals, buildings, sports, natural scenes, and man-made objects. This database can generate 40640 hash sequences of different images. Thus, hash similarity is computed and then $C_{40640}^{2} = 40640 \times (40640 - 1)/2 = 825784480$ $L_2$ norms in total are obtained.
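The hash length Z = 80 quoted in the parameter settings can be checked with a short worked calculation. It assumes that the number of low-frequency blocks is $F = (U_l/b)^2$, by analogy with the stated $E = (U_l/k)^2$ for the color sub-bands.

```latex
\begin{aligned}
U_l &= U / 2^{l} = 512 / 2^{2} = 128, \\
F &= (U_l / b)^2 = (128 / 32)^2 = 16, \\
E &= (U_l / k)^2 = (128 / 32)^2 = 16, \\
Z &= F + 4E = 16 + 4 \times 16 = 80 .
\end{aligned}
```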
3) Evaluation Criterion: To effectively evaluate the classification performance of our hashing algorithm in terms of robustness and discrimination, we adopt the well-known ROC graph [48] as the evaluation criterion. In the ROC graph, the false positive rate and the true positive rate are used as the coordinates of the points that form the ROC curve. For hashing algorithms, the false positive rate indicates discrimination and the true positive rate represents robustness. The equations of the true positive rate $R_t$ and the false positive rate $R_f$ are given as follows:
$$R_t = \frac{n_{correct}}{N_{similar}} \tag{34}$$
$$R_f = \frac{n_{wrong}}{N_{different}} \tag{35}$$
in which $n_{correct}$ is the number of correctly identified similar images, $n_{wrong}$ is the number of different images wrongly identified as similar images, and $N_{similar}$ and $N_{different}$ are the numbers of similar images and different images, respectively. Clearly, good performance of a hashing algorithm means a high $R_t$ and a low $R_f$. This implies that in the ROC graph, a curve closer to the top-left corner indicates better performance. The value range of the Area Under the Curve (AUC) is [0, 1], and the larger the AUC, the higher the classification accuracy.
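The evaluation criterion can be illustrated with a small Python sketch that sweeps a threshold over toy distances. It assumes that a pair is judged similar when its distance is at most the threshold, and it uses a simple trapezoidal AUC; the function names and sample values are hypothetical and are not results from the paper.

```python
def roc_points(similar, different, thresholds):
    """Return (R_f, R_t) pairs; a pair is judged similar if its distance <= threshold."""
    points = []
    for t in thresholds:
        r_t = sum(d <= t for d in similar) / len(similar)      # true positive rate
        r_f = sum(d <= t for d in different) / len(different)  # false positive rate
        points.append((r_f, r_t))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


similar = [10, 20, 30, 60]        # toy distances of similar image pairs
different = [50, 120, 200, 300]   # toy distances of different image pairs
pts = roc_points(similar, different, thresholds=[25, 55, 150])
print(pts)       # [(0.0, 0.5), (0.25, 0.75), (0.5, 1.0)]
print(auc(pts))  # 0.875
```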

B. Robustness Analysis
To analyse the robustness of our hashing algorithm, the experiment is carried out on the UCID-based database. In detail, we evaluate hash similarity between 99012 pairs of similar images with the $L_2$ norm. Generally, small $L_2$ norms of similar image pairs demonstrate that the hashing algorithm can correctly identify similar images. The mean $L_2$ norms under different parameters of content-preserving operations are displayed in Figure 6. The x-axis of the figure represents the used parameter of the specific operation, and the y-axis of the figure indicates the mean $L_2$ norm. It can be noticed that the mean $L_2$ norms under different operations are smaller than 60, except the CO as shown in Figure 6 (j). In addition, it is found that the mean $L_2$ norms are small and change slightly in Figures 6 (c), (d), (g) and (i). These results demonstrate that our hashing algorithm is robust enough against SN, SPN, JC and IS. The mean $L_2$ norm of CO is larger than those of other operations because the CO introduces multiple distortions. All averages of the mean $L_2$ norms are smaller than 60, indicating that our hashing algorithm can resist content-preserving operations.
Furthermore, we tabulate the maximum, minimum, mean and standard deviation of the $L_2$ norm under all operations in Table II. It is clearly seen that the means of all operations are smaller than 40, except the CO. In addition, most statistical results of the $L_2$ norm are small, implying that the proposed hashing algorithm can effectively identify similar images. To further understand the identification accuracy, we calculate the correct detection rate of similar images under different thresholds. Specifically, when the threshold T = 50, 91.37% of similar images are accurately identified. Our hashing algorithm can successfully identify 98.77% or 99.96% of similar images when T = 90 or T = 150, respectively. These results illustrate that our hashing algorithm maintains a high correct detection rate, which satisfies the need for excellent robustness.

C. Discrimination Evaluation
The test set of the COCO 2017 database is adopted to evaluate the discrimination performance of our hashing algorithm. Similarly, we evaluate hash similarity between 825784480 pairs of different images with the $L_2$ norm. The distribution of all $L_2$ norms is depicted in Figure 7, where the x-axis of this figure is the $L_2$ norm and the y-axis of this figure is the corresponding frequency. The statistical results show that the minimum and maximum of all $L_2$ norms are 22.83 and 879.83, respectively. The standard deviation and mean of all $L_2$ norms are 50.68 and 214.37, respectively. It can be seen that there is a great difference between the mean $L_2$ norm of different images and the mean $L_2$ norm of similar images, which demonstrates that our hashing algorithm has good discrimination. However, the minimum $L_2$ norm of different images is smaller than the maximum $L_2$ norm of similar images. This indicates that there is a certain classification error between different images and similar images. The correct detection rate (CDR) and false detection rate (FDR) of our hashing algorithm under different thresholds are tabulated in Table III, where CDR is the indicator of robustness, and FDR is the indicator of discrimination. Obviously, FDR decreases as the threshold decreases. A small threshold means a low probability of classifying different images as similar ones, indicating good discrimination. However, since robustness and discrimination are mutually constrained, CDR also decreases. Therefore, a proper threshold should be selected to balance the discrimination and robustness according to the practical application.

D. Hash Storage
To calculate the cost of storing a hash sequence, we use the test set of the COCO 2017 database in the discrimination experiment as the data source. As our hash sequence consists of 80 integers, a total of 80 × 40640 = 3251200 elements is generated from 40640 color images by our hashing algorithm. Statistical results of the 3251200 hash elements are as follows. The minimum and maximum element values are 1 and 481, respectively. Obviously, 9 bits can represent an integer in the range of 0 to $2^9 - 1 = 511$. Consequently, storing one hash element requires 9 bits, and the storage cost of our hash is 80 × 9 = 720 bits in binary form.
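The storage arithmetic above can be checked with a few lines of Python. This is only a worked example of the calculation; the helper name `bits_per_element` is hypothetical, and the hash length and maximum element value are taken from the text.

```python
def bits_per_element(max_value):
    """Smallest number of bits able to represent every integer in 0..max_value."""
    return max(1, int(max_value).bit_length())


hash_length = 80    # integers per hash sequence (from the text)
max_element = 481   # largest observed hash element (from the text)

bits = bits_per_element(max_element)
total_bits = hash_length * bits
print(bits, total_bits)  # prints "9 720"
```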

E. Block Size Selection
Considering the importance of the block-based CS in our hashing algorithm, we discuss the effect of different block sizes on classification performance. Note that b represents the block size on the low-frequency sub-band and k represents the block size on the maximum gradient maps. Here, we only change b and k and keep other parameters constant. The block sizes are selected from b ∈ {16, 32, 64} and k ∈ {32, 64}. We do not select k = 16 because the hash length is too long under this parameter value. Therefore, there are six combinations of the block sizes as follows: (b, k) = (16, 32), (16, 64), (32, 32), (32, 64), (64, 32) and (64, 64). Among them, the results of two combinations are almost overlapping, i.e., b = 16, k = 32 and b = 32, k = 32, meaning that they have similarly good classification performance. Nevertheless, in terms of storage cost, the hash length of b = 32 and k = 32 is shorter than that of b = 16 and k = 32. Taken together, the best performance of the proposed algorithm in terms of classification and hash storage is achieved when b = 32 and k = 32.

F. Selection of Weight Parameters
Apart from the block size, the weight parameters are other important factors influencing the performance of our hashing algorithm. Note that the final hash sequence consists of two main components, where $D_1$ represents low-frequency features and $D_2$ indicates color features. The weight parameters α and β are used to adjust the contributions of $D_1$ and $D_2$ to classification performance. Again, we only change α and β and keep other parameters constant.
Considering that the influences of content-preserving operations mainly concentrate on high-frequency sub-bands and the changes of low-frequency sub-bands are slight, we reduce the importance of $D_1$ and enhance the importance of $D_2$ to balance their contributions. Specifically, the values adopted for α are 0, 0.3, 0.5, 0.7 and 1, and the values adopted for β are 1, 1.5, 2, 2.5 and 3. The AUCs of our hashing algorithm with different values of α and β are calculated for evaluating classification performance. The results are displayed in Table IV. For ease of comparison, the AUC results under different weight parameters are plotted in Figure 9. Notice that the parameter selection of α = 0 and β = 1 means that the final hash is only composed of color features. For α = 1, as the β value increases, the AUC also increases and reaches the maximum value when β = 2.0 or β = 2.5. This indicates that color features have a positive effect on classification performance. Overall, all AUCs are larger than 0.99900. Therefore, we can select suitable weight parameters according to the specific performance requirement of the application. In this paper, the target application of our hashing algorithm is RR IQA. Since $D_2$ helps to improve perceptual sensitivity, the best result is generated with α = 0.5 and β = 2.5 for the application of RR IQA.

G. Performance Comparison Among Different Algorithms
The superior performance of our hashing algorithm is demonstrated by making comparisons with some advanced hashing algorithms, such as the PFT-RP algorithm [22], WDWT algorithm [5], SVD-CVA algorithm [27], RW algorithm [30] and GF-LVQ algorithm [29]. The UCID-based database and the test set of the COCO 2017 database are deployed to validate performances. All images are converted to 512 × 512 before they are input to the compared hashing algorithms. Here the parameter settings and similarity metrics of these compared algorithms are consistent with those of their original papers. More specifically, for the similarity metric, the PFT-RP algorithm adopts the $L_1$ norm, the WDWT algorithm and SVD-CVA algorithm adopt the $L_2$ norm, and the RW algorithm and GF-LVQ algorithm adopt the normalized Hamming distance. The experimental results of our hashing algorithm with b = 32, k = 32, α = 0.5 and β = 2.5 are chosen for comparison.
Classification performances of different hashing algorithms are demonstrated in an ROC graph. Specifically, the ROC curves of the compared algorithms are shown in Figure 10. According to this figure, we can observe that the curves of all algorithms are near the top-left corner of the ROC graph, and our curve is closer to it than those of other algorithms. Table V lists the AUC values of these algorithms. Obviously, the AUC value of the proposed algorithm is the largest one, demonstrating that our hashing algorithm has better classification performance than other algorithms. This is mainly attributed to the use of CCWT and CS. The calculation of CCWT exploits all color channels without discarding any information. Since the low-frequency sub-bands of CCWT contain basic image information, the features extracted from these sub-bands can distinguish images with different contents. In addition, CS can directly achieve compression coding during the sampling process. As the Euclidean distance between the block-based CS features is slightly influenced by content-preserving operations, perceptual features constructed by Euclidean distances are robust, discriminative and compact.
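As an illustration of how an ROC curve and its AUC can be obtained from hash distances, the following Python sketch sweeps a decision threshold over two sets of distances and integrates the curve with the trapezoidal rule. It is a generic sketch, not the evaluation code of the paper: the function names `roc_points` and `auc` are hypothetical, the rule "distance ≤ threshold means similar" is an assumption, and the distances are invented toy values.

```python
def roc_points(similar_dists, different_dists):
    """Sweep thresholds in ascending order and return (FPR, TPR) points."""
    thresholds = sorted(set(similar_dists) | set(different_dists))
    points = [(0.0, 0.0)]
    for t in thresholds:
        # TPR: fraction of similar pairs accepted at this threshold.
        tpr = sum(d <= t for d in similar_dists) / len(similar_dists)
        # FPR: fraction of different pairs wrongly accepted at this threshold.
        fpr = sum(d <= t for d in different_dists) / len(different_dists)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy hash distances, invented for illustration only.
similar = [5.0, 12.0, 30.0, 48.0, 95.0]
different = [40.0, 120.0, 200.0, 260.0, 400.0]

curve = roc_points(similar, different)
print(f"AUC = {auc(curve):.2f}")
```

An AUC close to 1 corresponds to a curve near the top-left corner of the ROC graph, which is the criterion used in the comparison above.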
Moreover, the computational time and hash length of these hashing algorithms are reported in Table V, where the computational time is measured by the mean time of producing a hash sequence. The computational times of the proposed algorithm, PFT-RP algorithm, WDWT algorithm, SVD-CVA algorithm, RW algorithm and GF-LVQ algorithm are 0.22, 0.07, 0.35, 8.70, 0.05 and 0.58 seconds, respectively. The proposed algorithm is faster than the WDWT algorithm, SVD-CVA algorithm and GF-LVQ algorithm, but slower than the PFT-RP algorithm and RW algorithm. The hash lengths of the proposed algorithm, PFT-RP algorithm, WDWT algorithm, SVD-CVA algorithm, RW algorithm and GF-LVQ algorithm are 720, 459, 640, 3328, 144 and 120 bits, respectively. Our length is much shorter than that of the SVD-CVA algorithm, but it is longer than those of the other algorithms.

V. APPLICATION IN REDUCED-REFERENCE IMAGE QUALITY ASSESSMENT
In many practical applications, once similar versions of the original image are correctly identified, the assessment of perceptual image quality is then needed. IQA is an increasingly significant issue [49], [50]. RR IQA is a branch of IQA and it assesses the distortion level of an image by extracting its content representation. Therefore, RR IQA requires three conditions as follows: (i) Provide a content-based image representation; (ii) Perceive the level of distortion on similar images; (iii) Predict a score relevant to the visual quality of the image. Perceptual image hashing satisfies these conditions. It can produce a visual content-based sequence from an image and is sensitive to various image distortions. Thus, perceptual image hashing can be used for RR IQA. When our hashing algorithm is applied to RR IQA, the specific process is as follows. At the sender's side, the hash of the reference image is generated by our hashing algorithm and then it is sent to the receiver via an auxiliary channel. Meanwhile, the reference image is also sent to the receiver through the transmission channel. At the receiver's side, the distorted version of the reference image and the hash of the reference image are both received. Next, the hash of the distorted image is extracted by our hashing algorithm. Finally, the objective score is obtained by calculating the $L_2$ norm between the received hash and the extracted hash. To validate the effectiveness of our hashing algorithm in RR IQA, two databases are adopted to compare the IQA performance with different schemes in terms of some well-known evaluation criteria.
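The sender/receiver process described above can be sketched in Python as follows. The sketch is an illustration under strong assumptions: `toy_hash` is a hypothetical placeholder (rounded block means of a grayscale array) and is NOT the CCWT/CS hash proposed in this paper; it only stands in for a content-based hash so that the flow can be executed. The images and the simulated distortion are invented toy data.

```python
import math


def toy_hash(image, block=2):
    """Placeholder hash: rounded mean of each block x block tile.

    NOT the CCWT/CS hash of the paper; a stand-in so the flow can run.
    """
    h = []
    for r in range(0, len(image), block):
        for c in range(0, len(image[0]), block):
            tile = [image[i][j]
                    for i in range(r, min(r + block, len(image)))
                    for j in range(c, min(c + block, len(image[0])))]
            h.append(round(sum(tile) / len(tile)))
    return h


def l2_distance(h1, h2):
    """L2 norm between two hash sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


# Sender side: hash the reference image and send the hash
# over the auxiliary channel.
reference = [[10, 10, 200, 200],
             [10, 10, 200, 200],
             [50, 50, 90, 90],
             [50, 50, 90, 90]]
reference_hash = toy_hash(reference)

# Receiver side: the image arrives distorted (here, a simulated
# brightness shift), its hash is extracted, and the objective score
# is the L2 norm between the two hashes.
received = [[pixel + 8 for pixel in row] for row in reference]
received_hash = toy_hash(received)
score = l2_distance(reference_hash, received_hash)
print(score)
```

In this flow only the short hash of the reference image travels over the auxiliary channel, so the receiver does not need the undistorted reference image to compute the score.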

A. Experimental Databases and Evaluation Criteria
1) Experimental Databases: Two image databases are used to verify our application in RR IQA. The first database, called LIVE [51], consists of 779 distorted versions of 29 reference images with 5 distortion types. These distorted versions contain 145 white noisy images, 175 JPEG compressed images, 169 JPEG2000 compressed images, 145 Gaussian blurred images and 145 fast-fading Rayleigh channel noisy images. The LIVE database provides the differential mean opinion score (DMOS) as the subjective score. The second database, called TID2013 [52], consists of 25 reference images and their 3000 distorted versions with 24 distortion types. The TID2013 database provides the mean opinion score (MOS) as the subjective score.
2) Evaluation Criteria: Four objective performance evaluation criteria are adopted to evaluate the correlation between objective predicted scores and subjective scores, i.e., Spearman Rank-order Correlation Coefficient (SRCC), Pearson Linear Correlation Coefficient (PLCC), Kendall Rank-order Correlation Coefficient (KRCC) and Root Mean Squared Error (RMSE), where PLCC and RMSE indicate accuracy and consistency of predictions, respectively, and SRCC and KRCC reflect monotonicity. Let $W$ be the number of distorted images, $x_i$ be the converted objective predicted score of the $i$-th image after nonlinear regression, and $y_i$ be the subjective score of the $i$-th image. Then, these performance criteria are defined as follows:

$$\mathrm{SRCC} = 1 - \frac{6\sum_{i=1}^{W} s_i^2}{W(W^2-1)}$$

$$\mathrm{PLCC} = \frac{\sum_{i=1}^{W}(x_i-\mu_x)(y_i-\mu_y)}{\sqrt{\sum_{i=1}^{W}(x_i-\mu_x)^2}\,\sqrt{\sum_{i=1}^{W}(y_i-\mu_y)^2}}$$

$$\mathrm{KRCC} = \frac{2\,(W_{con}-W_{dis})}{W(W-1)}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{W}\sum_{i=1}^{W}(x_i-y_i)^2}$$

where $s_i$ denotes the difference between the ranks of the $i$-th image in subjective and objective predicted evaluations, $\mu_x$ and $\mu_y$ are the means of all $x_i$ and $y_i$, respectively, and $W_{con}$ and $W_{dis}$ are the numbers of concordant and discordant pairs in the database, respectively. Clearly, good performance of an IQA scheme means values near 1 for SRCC, PLCC and KRCC, and an RMSE value close to 0. Moreover, two average measures, i.e., the directed average and the weighted average, can both be determined by equation (40):

$$\bar{v} = \frac{\sum_{j=1}^{O}\omega_j v_j}{\sum_{j=1}^{O}\omega_j} \tag{40}$$

where $O$ represents the number of databases, and $v_j$ indicates one of the SRCC and PLCC values in the $j$-th database. For the directed average, $\omega_j = 1$. For the weighted average, $\omega_j$ represents the number of images in the $j$-th database, i.e., $\omega_1 = 779$ and $\omega_2 = 3000$.
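The four criteria and the two averages can be computed with a short Python sketch using their standard textbook definitions. Several points are assumptions of this sketch: the scores contain no tied values (so the simple rank-difference form of SRCC applies), the nonlinear regression step that converts objective scores is omitted, and all function names and toy scores are hypothetical.

```python
import math


def ranks(values):
    """Rank positions 1..W; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r


def srcc(x, y):
    w = len(x)
    s2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * s2 / (w * (w * w - 1))


def plcc(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den


def krcc(x, y):
    w = len(x)
    con = dis = 0
    for i in range(w):
        for j in range(i + 1, w):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                con += 1      # concordant pair
            elif s < 0:
                dis += 1      # discordant pair
    return 2 * (con - dis) / (w * (w - 1))


def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def average(values, weights):
    """Weights all 1 give the directed average; image counts give the weighted one."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Toy scores, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # objective predicted scores
y = [1.5, 1.0, 3.5, 4.0, 6.0]   # subjective scores
print(srcc(x, y), plcc(x, y), krcc(x, y), rmse(x, y))

# Directed and weighted averages of two per-database values.
print(average([0.9, 0.8], [1, 1]), average([0.9, 0.8], [779, 3000]))
```

With weights 779 and 3000, the larger TID2013 database dominates the weighted average, whereas the directed average treats both databases equally.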

1) Whole Performance Comparison:
In order to illustrate the advantage of the proposed hashing algorithm in RR IQA, we compare it with some popular IQA schemes, including two FR IQA schemes, i.e., PSNR [49] and SSIM [53], and four RR IQA schemes, i.e., WDWT [5], FSI [50], OSVP [54] and WNISM [55]. We summarize the whole performance of different IQA schemes on the LIVE and TID2013 databases. [...] values of our hashing algorithm achieve the best performance among all RR IQA schemes and they are larger than those of PSNR. Note that WDWT is a useful image hashing designed for RR IQA, which exploits the traditional 2D DWT to extract features from the luminance component of a color image (some channels are discarded). The proposed hashing algorithm is better than WDWT on the LIVE database for all criteria. As to the TID2013 database, the PLCC and RMSE values of the proposed hashing algorithm are slightly worse than those of WDWT, but they are better than those of FSI and WNISM, illustrating good IQA performance. Therefore, the whole IQA performance of the proposed hashing algorithm is better than that of WDWT. This is due to the fact that the proposed hashing algorithm uses CCWT to exploit all information for perceptual feature extraction, whereas WDWT discards some channels during feature extraction. SSIM has better IQA performance than our hashing algorithm on all databases. This is because SSIM is an FR IQA scheme which can exploit all information of the reference image to assess image quality, whereas our hashing algorithm is an RR IQA scheme which can only access partial information of the reference image.
In some applications, full information of the reference image is unavailable. In this case, an RR IQA scheme instead of a FR IQA scheme can be used for assessing image quality.
To further examine the whole performance of the proposed hashing algorithm, the directed average and weighted average of SRCC and PLCC are illustrated in Table VI. It is obvious that the proposed hashing algorithm has bigger SRCC values in both directed average and weighted average than other RR IQA schemes. For PLCC, our weighted average is bigger than those of all compared RR IQA schemes, except that of WDWT, and it is also bigger than that of PSNR. In summary, our hashing algorithm achieves promising results in terms of the directed average and weighted average over the two databases.
2) Scatter Plots: The scatter plots of subjective scores versus objective prediction scores for different IQA schemes on the LIVE database are displayed in Figure 11, where a blue "+" indicates one test image, and red curves are the fitted curves estimated by the logistic regression analysis [56]. The LIVE database provides subjective scores (i.e., DMOS) and the IQA schemes generate objective prediction scores. The representative IQA schemes include PSNR [49], WDWT [5], FSI [50], OSVP [54], WNISM [55] and the proposed hashing algorithm. As can be seen from Figure 11, the points of the proposed hashing algorithm concentrate around the fitted curve, indicating that the objective prediction scores of our hashing algorithm have better consistency with subjective scores than those of the compared IQA schemes.
3) Individual Distortion Comparison: To further verify RR IQA performance, the prediction ability of the proposed hashing algorithm for individual distortion types is demonstrated. Here, we report only the SRCC among all evaluation criteria, because the SRCC indicates the convergence and monotonicity between subjective and objective predicted evaluations. Table VII lists SRCC values under individual distortion types on the two databases. For simplicity, the best results among FR IQA schemes and among RR IQA schemes are highlighted in bold, respectively. Among the FR IQA schemes, SSIM and PSNR are marked 14 and 15 times, respectively. The proposed algorithm is marked 17 times, ranking first among the RR IQA schemes. More concretely, there are 11 kinds of noise distortions on the two databases. Compared with other RR IQA schemes, our hashing algorithm reaches the best results in these noise distortions, except for three kinds of noise, i.e., fast-fading noise, impulse noise and masked noise. Furthermore, our hashing algorithm outperforms the classical FR IQA schemes, i.e., PSNR and SSIM, in many distortion types, such as lossy compression of noisy images, quantization noise, comfort noise and change of color saturation. Consequently, it is further confirmed that the proposed hashing algorithm achieves better IQA performance for individual distortion types than the compared RR IQA schemes. This can be understood as follows. The calculation of CCWT uses all color channels without discarding any information and thus the distortions on color channels are well preserved in the CCWT sub-bands. Therefore, our color features extracted from these sub-bands can effectively describe image distortion on color channels.

VI. CONCLUSION
An effective image hashing algorithm with CCWT and CS has been presented in this paper, which achieves good performance in classification and RR IQA application. A key contribution is the use of CCWT to decompose the input color image into different sub-bands. Since the calculation of CCWT uses all color channels without discarding any information, the distortions of color channels are preserved in the CCWT sub-bands. Another major contribution is the perceptual feature extraction from the CCWT sub-bands by the block-based CS. As the Euclidean distance between the block-based CS features is slightly influenced by content-preserving operations, feature construction by Euclidean distances can guarantee robust, discriminative and compact features. Numerous experiments have been conducted to verify the efficiency of our hashing algorithm. The results have demonstrated that our hashing algorithm can achieve a desirable classification performance. Besides, the results of experiments on two open databases have validated the superiority of our hashing algorithm in RR IQA application.