Exploring More Capacity for Grayscale-Invariance Reversible Data Hiding

Grayscale-invariant reversible data hiding (RDH) in color images was developed recently. It requires the grayscales of the marked color image to be the same as those of the host color image. The grayscale-invariant RDH technique is useful for many color-image applications in which the image processing algorithms operate on the gray version of the input color image. Several state-of-the-art grayscale-invariant RDH schemes have been proposed recently. However, their embedding capacity and distortion are not satisfactory. For these existing works, the main bottleneck is the large number of correction bits generated for losslessly recovering the host green scales. In this work, we propose a novel grayscale-invariant RDH method that reduces the need for correction bits. Our main contribution is a self-correcting mechanism that modifies the green scales to keep the grayscales unchanged while retaining the information needed to recover the host green scales. To achieve this, we first observe that all previous methods generate correction bits in a functional approach, which is the main source of the large number of correction bits. Instead of the functional approach, we use a relational approach to preserve the grayscales when embedding message bits. With this relational approach, the proposed algorithm only has to generate a small number of correction bits for losslessly recovering the green scales. Moreover, the relational approach allows the proposed algorithm to embed additional message bits into some selected green scales. The experimental results show that the proposed method significantly improves the embedding capacity and the rate-distortion performance compared to previous works.


I. INTRODUCTION
Reversible data hiding (RDH) is a technique that embeds a secret message into an image such that, from the marked image, one can not only extract the embedded secret message but also recover the original image without any distortion. This technique has found many applications in medical image processing, military, and forensics, where the original image must be restored without any distortion. Many reversible data hiding techniques have been developed in the past decade. The popular methods are built on two major strategies: the difference-expansion-based (DE-based) approach [16] and the histogram-shifting-based (HS-based) approach [11]. To generate marked images with small distortion, RDH methods usually apply these strategies to residuals of the host image, such as prediction errors [3], [7], [13], [15].
Generally speaking, most RDH algorithms are designed for gray images. Some RDH algorithms for color images have been proposed [1], [6], [12], [17]; they aim to exploit the correlations among the three color channels in order to generate a sharper histogram of prediction errors and minimize the distortion of the marked images. Because these RDH algorithms modify the host image, they may not preserve some important properties of the host color image, such as its grayscale. The gray version of a color image has many applications, including black-and-white printing and black-and-white e-book readers. Thus, it is meaningful to develop RDH schemes for color images that keep their gray version unchanged. For this purpose, Hou et al. [4] recently proposed the first grayscale-invariant RDH scheme for color images, which embeds a secret message into a color image without changing the grayscale of the host image. Their result inspired several further grayscale-invariant RDH works, such as grayscale-invariant RDH in the domain of AMBTC compression [5]. RDH schemes for color images that keep other subjective visual perception properties invariant have also been proposed [18].
In the grayscale-invariant RDH framework suggested in [4], modifying a pixel for data embedding usually yields a correction bit. For lossless recovery, the generated correction bits must also be stored in the host image. In the RDH scheme of Hou et al. [4], the secret message is embedded in the red channel, the correction bits are embedded in the blue channel, and the green scale is adjusted to keep the grayscale unchanged. Thus each pixel carries at most one message bit. To enlarge the embedding capacity, Gao et al. [2] used adaptive embedding patterns to find many pixels whose red scales can accommodate more secret bits. However, the embedding distortion of their scheme is still unsatisfactory. Recently, Zhou et al. [21] proposed a grayscale-invariant RDH scheme based on recursive code construction (RCC). They first approximated the modification-independent rate-distortion model for grayscale-invariant RDH and estimated the optimal transition probability matrices. Based on these matrices, they used the recursive histogram modification method developed in [19], [20] to achieve lossless data hiding. The RCC-based scheme of Zhou et al. [21] performs better than previous works. However, its embedding distortion is still not satisfactory when a large embedding rate is required. Generally speaking, the main bottleneck is the large number of generated correction bits, which must be accommodated in the blue scales for image recovery.
In this paper, we propose a novel grayscale-invariant RDH algorithm that significantly reduces the need for correction bits in the framework of Hou et al. [4]. Our key contribution is a self-correcting mechanism that modifies the green scales to keep the corresponding grayscales unchanged while retaining the information about the original green scales. To achieve this, we first observe that the previous methods [2], [4], [21] generate correction bits in a functional approach, which results in a large number of correction bits. Instead of the functional approach, we adopt a relational approach to keep the grayscales unchanged when embedding message bits. With the relational approach, the proposed algorithm only needs to generate a small number of correction bits for lossless recovery. In addition, the relational approach lets the proposed algorithm embed message bits into some selected green scales. Experimental results show that, owing to these two advantages, the proposed grayscale-invariant RDH scheme obtains a larger embedding capacity and much better rate-distortion performance than previous works.
The rest of this paper is organized as follows. Section II defines grayscale-invariant RDH schemes and introduces some related works. Section III presents the proposed self-correcting grayscale-invariant RDH scheme. Experimental results are given in Section IV. Finally, Section V concludes the paper and discusses future work.

II. GRAYSCALE-INVARIANT REVERSIBLE DATA HIDING SCHEMES AND RELATED WORKS
In this section, we briefly introduce reversible data hiding in color images with grayscale invariance. Let $X = (R, B, G)$ be a color image with red, blue, and green channels $R = \{R_{i,j}\}$, $B = \{B_{i,j}\}$, and $G = \{G_{i,j}\}$. Among the many transformations from a color image into a gray image $V = \{V_{i,j}\}$, the most widely used is the well-known formula from [22]:
$$V_{i,j} = f_v(R_{i,j}, B_{i,j}, G_{i,j})$$
for each pair $i, j$, where $f_v$ is defined by
$$f_v(r, b, g) = \lfloor 0.299\,r + 0.587\,g + 0.114\,b \rceil$$
and $\lfloor x \rceil$ denotes the nearest integer to $x$. For convenience, we denote the generated grayscale image by $V = f_v(R, B, G)$. Based on these notations, in the reversible data hiding framework for color images with grayscale invariance, the data hider losslessly embeds a secret message into a given color image $X = (R, B, G)$ and generates the marked image $X' = (R', B', G')$ such that $f_v(R, B, G) = f_v(R', B', G')$ and the distortion between $X$ and $X'$ is minimized. Hou et al. [4] proposed the first RDH scheme for this problem. The flowchart of their scheme is shown in Fig. 1. Their scheme consists of two parts: a polynomial predictor that generates prediction errors, and a data hiding scheme that embeds message bits into the red scales and correction bits into the blue scales.
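The grayscale mapping $f_v$ can be sketched in Python as follows. This is a minimal illustration, assuming the widely used BT.601 luma weights 0.299, 0.587, and 0.114 and integer arithmetic in thousandths so that rounding is exact; ties are rounded upward, which is our assumption because "nearest integer" leaves ties unspecified. The helper names `gray` and `is_grayscale_invariant` are hypothetical.

```python
def gray(r: int, b: int, g: int) -> int:
    """f_v(r, b, g): nearest integer to 0.299 r + 0.587 g + 0.114 b.

    Integer weights (in thousandths) avoid floating-point rounding surprises;
    ties (x.5) are rounded up, which is an assumption.
    """
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def is_grayscale_invariant(host, marked) -> bool:
    """True if every marked pixel (R', B', G') keeps the host pixel's grayscale.

    Both images are equal-length sequences of (R, B, G) tuples.
    """
    return len(host) == len(marked) and all(
        gray(*p) == gray(*q) for p, q in zip(host, marked)
    )


# Because the green weight is below 1, an attainable grayscale v admits one or
# two green scales for fixed (r, b); this many-to-one behaviour is why
# grayscale-invariant schemes must handle correction bits.
print([g for g in range(256) if gray(100, 50, g) == 120])  # prints [143, 144]
```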

A. THE POLYNOMIAL PREDICTOR BASED ON GRAY-SCALES
Let an image $X = (R, B, G)$ be given. The prediction errors of $R$ and $B$ are generated in raster order. The predicted value of $R_{i,j}$, denoted by $\hat{R}_{i,j}$, is generated as follows. The predictor computes a coefficient vector $[a, b, c]^T$ of a polynomial in the grayscales [4] and uses it to obtain $\hat{R}_{i,j}$. The prediction error of $R_{i,j}$ is defined by $e^R_{i,j} = R_{i,j} - \hat{R}_{i,j}$. The predicted value $\hat{B}_{i,j}$ and the prediction error $e^B_{i,j}$ of $B_{i,j}$ are obtained in the same way.
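As a rough illustration of a grayscale-based polynomial predictor, the sketch below fits one color channel as a polynomial in the grayscale by least squares over a set of context pixels and evaluates it at the current pixel's grayscale. The quadratic form $aV^2 + bV + c$, the choice of context pixels, and the name `predict_from_gray` are our assumptions for illustration, not the exact predictor of [4]. Using the grayscale as the regressor is convenient because it is unchanged by embedding, so a decoder with already-restored context pixels can recompute the same fit.

```python
import numpy as np


def predict_from_gray(v_ctx, x_ctx, v_target, degree=2):
    """Least-squares polynomial prediction of one color channel from grayscale.

    v_ctx, x_ctx: grayscales and channel values (e.g., red scales) of context pixels.
    v_target: grayscale of the pixel being predicted.
    degree=2 yields a coefficient vector [a, b, c] for a*v**2 + b*v + c.
    Returns the prediction rounded to the nearest integer.
    """
    v_ctx = np.asarray(v_ctx, dtype=float)
    x_ctx = np.asarray(x_ctx, dtype=float)
    # Design matrix with columns [v^degree, ..., v, 1].
    A = np.vander(v_ctx, N=degree + 1)
    coef, *_ = np.linalg.lstsq(A, x_ctx, rcond=None)
    return int(np.floor(np.polyval(coef, v_target) + 0.5))


# Usage: context data that exactly follow 0.01 v^2 + 0.5 v + 3.
r_hat = predict_from_gray([10, 20, 30, 40], [9, 17, 27, 39], 50)
print(r_hat, 55 - r_hat)  # prints 53 2 (prediction and error e^R for R = 55)
```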

B. REVERSIBLY ACCOMMODATING MESSAGES BY DIFFERENCE EXPANSION
One of the well-known RDH techniques is difference expansion, developed by Tian [16]. The method reversibly accommodates message bits by expanding the prediction errors and embedding each message bit into the least significant bit of the expanded error. Specifically, the method embeds a message bit $b \in \{0, 1\}$ into the prediction error $e$ by modifying it to $e'$ as
$$e' = 2e + b. \qquad (5)$$
From $e'$, it is easy to recover $b$ and $e$ by computing $b = e' \bmod 2$ and $e = \lfloor e'/2 \rfloor$.
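A minimal Python sketch of the expansion and its inverse on a single prediction error follows; the function names are hypothetical. The floor in $e = \lfloor e'/2 \rfloor$ matters for negative errors, and Python's floor division and modulo provide exactly that behaviour.

```python
def de_embed(e: int, bit: int) -> int:
    """Difference expansion (Tian): e' = 2e + b."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return 2 * e + bit


def de_extract(e_marked: int) -> tuple[int, int]:
    """Invert e' = 2e + b: e = floor(e' / 2), b = e' mod 2.

    Python's // and % round toward negative infinity, so the same formulas
    hold for negative prediction errors.
    """
    return e_marked // 2, e_marked % 2


# Round trip over positive and negative prediction errors.
for e in range(-5, 6):
    for bit in (0, 1):
        assert de_extract(de_embed(e, bit)) == (e, bit)
print(de_embed(-3, 1), de_extract(-5))  # prints -5 (-3, 1)
```

The marked red scale would then be $\hat{R}_{i,j} + e'$, and the recipient recovers $R_{i,j} = \hat{R}_{i,j} + \lfloor e'/2 \rfloor$.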

C. A REVERSIBLE EMBEDDING ALGORITHM FOR MESSAGE AND CORRECTION BITS
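Given $v$, $r$, and $b$, Hou et al. [4] define a function that returns a green scale for which the grayscale equals $v$ when the red and blue scales are $r$ and $b$. The algorithm keeps the grayscale $V_{i,j}$ unchanged in a functional approach: the green scale $G_{i,j}$ is replaced by the value $G'_{i,j}$ of this function at $(V_{i,j}, R'_{i,j}, B'_{i,j})$, and the marked pixel is $X'_{i,j} = (R'_{i,j}, B'_{i,j}, G'_{i,j})$. To recover $G_{i,j}$, the recipient computes a candidate green scale from $V_{i,j}$ and the recovered $R_{i,j}$ and $B_{i,j}$ [4]. Since this candidate need not equal $G_{i,j}$, the algorithm records a correction bit $c_{i,j}$ for losslessly restoring $G_{i,j}$ and recursively embeds $c_{i,j}$ into the subsequent host pixels.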
In the scheme of Hou et al. [4], the correction bits $\{c_{i,j}\}$ are embedded into the prediction errors $\{e^B_{i,j}\}$, as shown in Fig. 1, while the secret message bits are embedded into $\{e^R_{i,j}\}$.

D. THE RHOMBUS PREDICTOR BASED ON GRAYSCALES
To generate a sharper host histogram of prediction errors, we use the rhombus prediction method of Zhou et al. [21] instead of the method of Hou et al. [4]. The rhombus prediction method was originally developed by Sachnev et al. [15]; however, their method is not suitable for the grayscale-invariant setting. Zhou et al. [21] extended the rhombus method to the grayscale-invariant setting as follows. The cover image is divided into two groups, denoted "shadow" and "blank", as illustrated in Fig. 2. The first (second) half of the secret message is embedded into the shadow (blank) pixels. Since the embedding procedures for the two layers are the same, we only explain the embedding procedure for the shadow layer. The predicted value $\hat{R}_{i,j}$ of the red scale $R_{i,j}$ is computed by linear regression. Precisely, the predictor computes a coefficient vector $[a, b]^T$, obtained as a least-squares solution of an overdetermined linear system, and uses it to compute $\hat{R}_{i,j}$. The prediction error of $R_{i,j}$ is defined by $e^R_{i,j} = R_{i,j} - \hat{R}_{i,j}$. The predicted value $\hat{B}_{i,j}$ of the blue scale $B_{i,j}$ is computed in the same way with a few minor modifications, and its prediction error is defined by $e^B_{i,j} = B_{i,j} - \hat{B}_{i,j}$.

VOLUME 4, 2016

Next, we determine the order of pixels for message embedding. There are several ways to determine the order, such as [8]-[10]. In this paper, we determine the order by the local variance $LV_{i,j}$ of the $(i, j)$-th pixel, computed over the neighborhood shown in Fig. 3. The message is then embedded following the sorted order of $LV_{i,j}$.
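The layer split and the variance-based ordering can be sketched as follows. Two details are our assumptions rather than taken from Fig. 2 or Fig. 3: pixels with even $i + j$ are treated as the shadow layer, and $LV_{i,j}$ is taken as the variance of the grayscales of the four rhombus neighbors. Computing $LV_{i,j}$ on grayscales is convenient here because grayscales are invariant, so a decoder could reproduce the same order. Smoother pixels (smaller $LV_{i,j}$) come first; the function names are hypothetical.

```python
import numpy as np


def local_variance(V: np.ndarray, i: int, j: int) -> float:
    """Variance of the four rhombus neighbours (up, down, left, right) of (i, j)."""
    nb = np.array([V[i - 1, j], V[i + 1, j], V[i, j - 1], V[i, j + 1]], dtype=float)
    return float(nb.var())


def embedding_order(V: np.ndarray, layer: int = 0):
    """Interior pixels of one layer ((i + j) % 2 == layer), sorted by ascending LV.

    Every rhombus neighbour of a pixel lies in the other layer, so the
    neighbours are untouched while the current layer is being embedded.
    """
    H, W = V.shape
    cells = [
        (local_variance(V, i, j), i, j)
        for i in range(1, H - 1)
        for j in range(1, W - 1)
        if (i + j) % 2 == layer
    ]
    cells.sort()  # ties are broken by raster position
    return [(i, j) for _, i, j in cells]


rng = np.random.default_rng(0)
V = rng.integers(0, 256, size=(6, 6))
print(embedding_order(V, layer=0)[:3])
```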

III. PROPOSED METHODS
In this section, we propose a novel embedding algorithm for reversible data hiding in color images with grayscale invariance. The proposed RDH scheme greatly reduces the number of correction bits generated in the framework of Hou et al. [4].
To achieve this, the main technique is a self-correcting mechanism that modifies the green scales to preserve the grayscales while retaining the information about the original green scales. This can be viewed as a procedure that embeds many correction bits into selected green scales, as shown in Fig. 4. We introduce the proposed RDH algorithm in the following subsections.

A. A RELATIONAL APPROACH TO KEEP GRAYSCALES INVARIANT
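Suppose that $X_{i,j} = (R_{i,j}, B_{i,j}, G_{i,j})$ is the currently selected pixel. We compute the prediction errors $\{e^R_{i,j}, e^B_{i,j}\}$ and expand them to $\{e'^R_{i,j}, e'^B_{i,j}\}$ by Eq. 5 to accommodate message bits, which yields the marked red and blue scales $R'_{i,j} = \hat{R}_{i,j} + e'^R_{i,j}$ and $B'_{i,j} = \hat{B}_{i,j} + e'^B_{i,j}$.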
To keep the grayscale $V_{i,j}$ unchanged, we adopt a relational approach to modifying the green scale instead of the functional approach used in previous works [2], [4], [21]. Precisely, we look for a suitable green scale $g'$ through the pairs $(g, g')$ of green scales that satisfy the relation
$$f_v(R_{i,j}, B_{i,j}, g) = V_{i,j} = f_v(R'_{i,j}, B'_{i,j}, g').$$
According to this relational approach, we use the following procedure to determine $G'_{i,j}$.
1) (1-to-1 case) There is only one pair $(g, g')$ satisfying the relation. Clearly $g = G_{i,j}$, and we set $G'_{i,j} = g'$.
2) (1-to-2 case) There are exactly two pairs $(g, g'_0)$ and $(g, g'_1)$ with $g'_0 < g'_1$ satisfying the relation. Clearly $g = G_{i,j}$, so $G'_{i,j}$ can be set to either $g'_0$ or $g'_1$. Hence we can accommodate an additional secret bit $b$ by setting $G'_{i,j} = g'_b$.
3) (2-to-1 case) There are exactly two pairs $(g_0, g')$ and $(g_1, g')$ with $g_0 < g_1$ satisfying the relation. We immediately set $G'_{i,j} = g'$. For lossless recovery, we need a correction bit $c_{i,j}$ that records the original $G_{i,j}$, defined by $G_{i,j} = g_{c_{i,j}}$.
4) (2-to-2 case) There are four pairs $(g_0, g'_0)$, $(g_0, g'_1)$, $(g_1, g'_0)$, and $(g_1, g'_1)$ with $g_0 < g_1$ and $g'_0 < g'_1$, that is, $(g_s, g'_t)$ satisfies the relation for any $s, t \in \{0, 1\}$. For lossless recovery, we set $G'_{i,j} = g'_s$, where $s$ is the index with $G_{i,j} = g_s$.
To illustrate the above procedure for determining $G'_{i,j}$, some examples are shown in Fig. 5.
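The four cases can be made concrete with a small Python sketch of the green-scale selection for one pixel and its inverse. It assumes an integer form of $f_v$ with BT.601 weights and ties rounded up, computes the candidate sets $\{g : f_v(R_{i,j}, B_{i,j}, g) = V_{i,j}\}$ and $\{g' : f_v(R'_{i,j}, B'_{i,j}, g') = V_{i,j}\}$ by brute force over $[0, 255]$, and uses the index-preserving choice $G_{i,j} = g_s \Rightarrow G'_{i,j} = g'_s$ as our reading of the 2-to-2 rule. All function names are hypothetical. The case is identified by the sizes of the two candidate sets, which a decoder can recompute once $R_{i,j}$ and $B_{i,j}$ are restored, because $V_{i,j}$ is unchanged.

```python
import random


def gray(r, b, g):
    """f_v(r, b, g) with integer weights (x1000); ties are rounded up (assumption)."""
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def greens(v, r, b):
    """All green scales in [0, 255] giving grayscale v with red r and blue b."""
    return [g for g in range(256) if gray(r, b, g) == v]


def choose_green(R, B, G, R2, B2, bit=0):
    """Relational choice of G' after the red/blue scales change to (R2, B2).

    Returns (case, G2, bit_used, correction_bit); G2 is None on over/underflow.
    """
    V = gray(R, B, G)
    A = greens(V, R, B)                          # candidates for G (contains G)
    in_range = 0 <= R2 <= 255 and 0 <= B2 <= 255
    A2 = greens(V, R2, B2) if in_range else []   # candidates for G'
    if not A2:
        return "overflow", None, False, None
    s = A.index(G)
    case = f"{len(A)}-to-{len(A2)}"
    if case == "1-to-1":
        return case, A2[0], False, None
    if case == "1-to-2":
        return case, A2[bit], True, None         # green scale carries one more bit
    if case == "2-to-1":
        return case, A2[0], False, s             # correction bit c with G = g_c
    return case, A2[s], False, None              # 2-to-2: same index, no correction bit


def recover_green(R, B, R2, B2, G2, correction_bit=None):
    """Inverse of choose_green once R and B are restored; returns (G, extracted_bit)."""
    V = gray(R2, B2, G2)                         # the grayscale is invariant
    A, A2 = greens(V, R, B), greens(V, R2, B2)
    t = A2.index(G2)
    if len(A) == 1:                              # 1-to-1 or 1-to-2
        return A[0], (t if len(A2) == 2 else None)
    if len(A2) == 1:                             # 2-to-1
        return A[correction_bit], None
    return A[t], None                            # 2-to-2


random.seed(1)
seen = set()
for _ in range(3000):
    R, B, G = (random.randrange(256) for _ in range(3))
    R2, B2 = R + random.randint(-4, 4), B + random.randint(-4, 4)
    bit = random.randint(0, 1)
    case, G2, used, c = choose_green(R, B, G, R2, B2, bit)
    if G2 is None:
        continue
    seen.add(case)
    assert gray(R2, B2, G2) == gray(R, B, G)     # grayscale preserved
    G_rec, b_rec = recover_green(R, B, R2, B2, G2, c)
    assert G_rec == G and (not used or b_rec == bit)
print(sorted(seen))
```

Only the 2-to-1 branch returns a correction bit, which is the source of the reduction in correction bits.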
Note that some pixels may yield overflow/underflow scales when message bits are embedded. We skip such pixels and record them in the location map tag. Precisely, if a pixel $X_i$ would cause an overflow/underflow scale after modification, it is labelled with $tag_i = 0$; otherwise, it is labelled with $tag_i = 1$.

B. RESERVING SPACE FOR AUXILIARY PARAMETERS AND THE FINAL EMBEDDING ALGORITHM
First of all, we give an overview of how we store the auxiliary parameters. For lossless recovery, some particular pixels are reserved for recording auxiliary parameters. As in [4], we adopt the so-called invariant pixels to record the information.
The auxiliary parameters will be embedded into the LSBs (least significant bits) of the blue scales of selected invariant pixels. Note that the grayscales of the selected invariant pixels remain unchanged after LSB substitution of their blue scales. Next, we describe how the auxiliary parameters are embedded. For convenience, we introduce some notation. Let $S_i$ be a fixed region and $B^{in}_i$ be the ordered set of blue scales of the invariant pixels in $S_i$. Let $S_0$ be the initial fixed border region, which provides enough space for storing some necessary information defined later. The auxiliary parameters are embedded as follows. First, we embed the pure message $\{m_1, m_2, \ldots, m_p\}$ in the non-border region (that is, the complement of $S_0$) according to the method described in Section III-A. Then we collect the pixels modified during message embedding into a subset $E$. Let $\rho$ be a fixed positive integer. Next, for $i = 0, 1, \ldots, \lceil |E|/\rho \rceil$, we do the following:
1) We embed the LSBs of the blue scales in $B^{in}_i$ into the pixels outside of $S_i \cup E$.
2) We collect the pixels modified while embedding these LSBs into a subset $L_i$. Then we compress the bit string tag for the pixels in $E \cup L_i$ and concatenate it with the necessary side information, including $|S_i|$, $|E \cup L_i|$, and the ending bits, to form the auxiliary information bits.
3) If the length of the auxiliary information bits is less than or equal to the number of LSBs of $B^{in}_i$, we put them into the LSBs of the blue scales in $B^{in}_i$, and the embedding procedure is done. Otherwise, we extend $S_i$ to a set $S_{i+1}$ with $S_0 \subset S_{i+1} \subset S_0 \cup E$ and $|S_{i+1} \setminus S_i| = \rho$, and repeat Steps 1 to 3 for $S_{i+1}$.
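The control flow of Steps 1 to 3 can be sketched as a loop that grows the reserved region until the auxiliary bits fit. This is a minimal sketch under stated assumptions: the work of Steps 1 and 2 (embedding the LSBs, forming $L_i$, compressing tag) is folded into a hypothetical callback `aux_bits_needed`, the number of available LSBs is given by a hypothetical callback `lsb_capacity`, and the $\rho$ new pixels are assumed to be the next ones of $E$ in embedding order.

```python
import math


def reserve_region(S0, E, rho, aux_bits_needed, lsb_capacity):
    """Grow the reserved region S_i from S_0 until the auxiliary bits fit.

    S0: pixel indices of the fixed border region S_0.
    E: pixels modified during message embedding, in embedding order.
    rho: number of pixels added per iteration.
    aux_bits_needed(S): hypothetical callback giving the length of the
        auxiliary information bits when S is the reserved region.
    lsb_capacity(S): hypothetical callback giving the number of blue LSBs
        of invariant pixels in S (the size of B^in).
    Returns (S_i, i).
    """
    S = list(S0)
    for i in range(math.ceil(len(E) / rho) + 1):
        if aux_bits_needed(S) <= lsb_capacity(S):
            return S, i
        S.extend(E[i * rho:(i + 1) * rho])  # assumed: next rho pixels of E
    raise ValueError("auxiliary information does not fit")


# Toy usage: 20 auxiliary bits, and half of the reserved pixels are invariant.
S, i = reserve_region(range(10), list(range(100, 200)), rho=5,
                      aux_bits_needed=lambda S: 20,
                      lsb_capacity=lambda S: len(S) // 2)
print(len(S), i)  # prints 40 6
```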
As a requirement, the border region $S_0$ should be large enough to accommodate the auxiliary parameters $|S_i|$ and $N = |E \cup L_i|$ for later extraction. Now we give the details of the final embedding algorithm. We assume that there are $N_0$ pixels $\{X_1, X_2, \ldots, X_{N_0}\}$ in the non-border region, selected in the sorted order of their local variances, for accommodating the message bits $\{m_1, m_2, \ldots, m_p\}$. The embedding process is illustrated in Fig. 6. For the current pixel $X_i = (R_i, B_i, G_i)$, we embed the message bits according to the four cases described in Section III-A. First, we embed two message bits, denoted $m_j$ and $m_{j+1}$, into $R_i$ and $B_i$, respectively, by Eq. 5. Then the embedding procedure proceeds in one of the following four ways:
1) (1-to-1 case) There is a unique pair $(G_i, G'_i)$ that keeps the grayscale $V_i$ unchanged, so no correction bit is required for recovering $G_i$.
2) (1-to-2 case) There are exactly two pairs $(G_i, g'_0)$ and $(G_i, g'_1)$ with $g'_0 < g'_1$ that keep the grayscale $V_i$ unchanged. Thus no correction bit is required for recovering $G_i$. In addition, we can accommodate another message bit $m_{j+2}$ by setting $G'_i = g'_{m_{j+2}}$.
3) (2-to-1 case) There are exactly two pairs $(g_0, g')$ and $(g_1, g')$ with $g_0 < g_1$ that keep the grayscale $V_i$ unchanged, and $G_i$ is one of $g_0$ and $g_1$. We set $G'_i = g'$. For lossless recovery, we need a correction bit $c_k$ that indicates the original $G_i$, defined by $G_i = g_{c_k}$. The correction bit $c_k$ is embedded into another pixel $X_{i+1}$, as shown in Fig. 6.
4) (2-to-2 case) There are four pairs $(g_0, g'_0)$, $(g_0, g'_1)$, $(g_1, g'_0)$, and $(g_1, g'_1)$ with $g_0 < g_1$ and $g'_0 < g'_1$ that keep the grayscale $V_i$ unchanged, and $G_i$ is one of $g_0$ and $g_1$. For lossless recovery without correction bits, we set $G'_i = g'_s$, where $G_i = g_s$.
Next, we check whether the modified pixel $X'_i = (R'_i, B'_i, G'_i)$ overflows or underflows. If it does, $X_i$ is labelled by $tag_i = 0$ and skipped; otherwise, it is labelled by $tag_i = 1$. After embedding the message bits $\{m_1, m_2, \ldots, m_p\}$ into the non-border region, we collect the modified pixels into the set $E$. Note that $N_0 = |E|$. Next, we find a set $S_i$ with $S_0 \subset S_i \subset S_0 \cup E$ for some $i$, as described at the beginning of this subsection. Then we use the same embedding procedure to accommodate the LSBs of the blue scales in $B^{in}_i$. Recall that $L_i$ is the set of pixels used for embedding these LSBs. Finally, we obtain the location map tag for the pixels in $E \cup L_i$ and compress it as part of the auxiliary parameters. The auxiliary parameters consist of the compressed tag string, the cardinality $|S_i|$, the number $N = |E \cup L_i|$ of modified pixels outside the border region $S_0$, and the two ending bits. The auxiliary parameters are embedded into the LSBs of the blue scales in $B^{in}_i$. This completes the embedding procedure.

C. THE EXTRACTION ALGORITHM
The data extraction and image restoration are described as follows. On receiving the marked image $X'$, the extraction algorithm locates the invariant pixels in the fixed region $S_0$, reads the LSBs of their blue scales, and extracts part of the auxiliary parameters, including $|S_i|$, $N = |E \cup L_i|$, and the ending bits. Since $|S_i|$ is known, we can locate the pixels in $S_i$ and obtain the string tag. Next, since $N$ is known, we list the first $N$ pixels $\{X'_1, X'_2, \ldots, X'_N\}$ in the sorted order of their local variances. Then we extract the embedded LSBs of the blue scales in $B^{in}_i$ in reverse order. Starting from the pixel $X'_N = (R'_N, B'_N, G'_N)$, we compute $\{e'^R_N, e'^B_N\}$, extract the message bits, and restore $\{R_N, B_N\}$. In addition, we restore $G_N$ directly, since $(R_N, B_N, R'_N, B'_N, G'_N)$ does not belong to the 2-to-1 case. As a result, we restore the pixel $X_N$. In general, the location map entry $tag_i$ tells whether $X_i$ was modified in the embedding phase. If $tag_i = 0$, we skip $X'_i$, since $X'_i = X_i$. If $tag_i = 1$, we compute $\{e'^R_i, e'^B_i\}$, extract the embedded LSBs of the blue scales in $B^{in}_i$, and recover $\{R_i, B_i\}$. In addition, we recover $G_i$ using the previously extracted bits according to the following conditions:
1) (1-to-1 case) There is a unique pair $(G_i, G'_i)$ that preserves the grayscale $V_i$. We restore $G_i$ directly.
2) (1-to-2 case) There are exactly two pairs $(G_i, g'_0)$ and $(G_i, g'_1)$ with $g'_0 < g'_1$ that preserve the grayscale $V_i$. We restore $G_i$ directly. In addition, we extract the embedded bit $b$ from $g'_b = G'_i$.
3) (2-to-1 case) There are exactly two pairs $(g_0, G'_i)$ and $(g_1, G'_i)$ with $g_0 < g_1$ that preserve the grayscale $V_i$. Using the previously extracted correction bit $c$, we restore $G_i = g_c$.
4) (2-to-2 case) There are four pairs $(g_0, g'_0)$, $(g_0, g'_1)$, $(g_1, g'_0)$, and $(g_1, g'_1)$ with $g_0 < g_1$ and $g'_0 < g'_1$ that preserve the grayscale $V_i$, and $G_i$ is one of $g_0$ and $g_1$. We restore $G_i = g_s$, where $G'_i = g'_s$.
Based on the above extraction step and the knowledge of $S_i$, we retrieve the LSBs of the blue scales in $B^{in}_i$ and thus restore the original marked pixels in the set $E$. We then execute the above extraction procedure on the marked pixels in $E$ in reverse order to obtain the message bits $\{m_1, m_2, \ldots, m_p\}$ and restore the host image $X$.
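Putting the per-pixel steps together, the following self-contained Python sketch embeds a bit string pixel by pixel and then extracts it in reverse order, as described above. It is a simplification under stated assumptions: the predictions $(\hat{R}_i, \hat{B}_i)$ are supplied as fixed inputs (standing in for a predictor whose context is not modified, as in the two-layer rhombus scheme), $f_v$ uses integer BT.601 weights with ties rounded up, a correction bit is pushed to the front of the bit queue so that it becomes the first bit carried by the next embeddable pixel (our reading of Fig. 6), and the message is zero-padded at the end. The border region, invariant-pixel LSBs, and tag compression are omitted, and all names are hypothetical.

```python
from collections import deque
import random


def gray(r, b, g):
    """f_v with integer weights; ties rounded up (assumption)."""
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def greens(v, r, b):
    return [g for g in range(256) if gray(r, b, g) == v]


def embed(pixels, preds, message):
    """Embed bits into (R, B, G) pixels in order; returns (marked, tags)."""
    queue, marked, tags = deque(message), list(pixels), []

    def take():
        return queue.popleft() if queue else 0      # zero padding at the end

    for i, ((R, B, G), (pR, pB)) in enumerate(zip(pixels, preds)):
        if not queue:
            return marked, tags
        bR, bB = queue[0], (queue[1] if len(queue) > 1 else 0)
        R2, B2 = 2 * R - pR + bR, 2 * B - pB + bB   # DE: R' = R_hat + 2e + b
        V = gray(R, B, G)
        A = greens(V, R, B)
        A2 = greens(V, R2, B2) if 0 <= R2 <= 255 and 0 <= B2 <= 255 else []
        if not A2:
            tags.append(0)                          # overflow/underflow: skip
            continue
        tags.append(1)
        take()                                      # consume b_R
        take()                                      # consume b_B
        s = A.index(G)
        if len(A) == 1:
            G2 = A2[take()] if len(A2) == 2 else A2[0]  # 1-to-2 carries a bit
        elif len(A2) == 1:
            G2 = A2[0]
            queue.appendleft(s)                     # 2-to-1: correction bit
        else:
            G2 = A2[s]                              # 2-to-2: self-correcting
        marked[i] = (R2, B2, G2)
    if queue:
        raise ValueError("not enough embeddable pixels")
    return marked, tags


def extract(marked, preds, tags, n_bits):
    """Restore pixels and the first n_bits message bits, walking backwards."""
    stream, restored = deque(), list(marked)
    for i in reversed(range(len(tags))):
        if tags[i] == 0:
            continue
        (R2, B2, G2), (pR, pB) = marked[i], preds[i]
        eR, eB = R2 - pR, B2 - pB
        R, B = pR + eR // 2, pB + eB // 2
        bits = [eR % 2, eB % 2]
        V = gray(R2, B2, G2)
        A, A2 = greens(V, R, B), greens(V, R2, B2)
        t = A2.index(G2)
        if len(A) == 1:
            G = A[0]
            if len(A2) == 2:
                bits.append(t)
        elif len(A2) == 1:
            G = A[stream.popleft()]     # correction bit: first bit of next pixel
        else:
            G = A[t]
        restored[i] = (R, B, G)
        stream.extendleft(reversed(bits))
    return restored, list(stream)[:n_bits]


random.seed(7)
pixels = [tuple(random.randrange(256) for _ in range(3)) for _ in range(400)]
# Hypothetical predictions: host value plus small noise, clipped to [0, 255].
preds = [(min(255, max(0, R + random.randint(-3, 3))),
          min(255, max(0, B + random.randint(-3, 3)))) for R, B, _ in pixels]
message = [random.randint(0, 1) for _ in range(300)]
marked, tags = embed(pixels, preds, message)
restored, bits = extract(marked, preds, tags, len(message))
print(restored == pixels, bits == message, sum(tags))
```

Because the decoder walks backward, the correction bit of a 2-to-1 pixel is always available when that pixel is restored: it was extracted from the next embeddable pixel, which the decoder has already processed.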

IV. EXPERIMENTAL RESULTS

A. SELF-CORRECTING ABILITY AND ADDITIONAL EMBEDDING CAPACITY OF THE PROPOSED RDH SCHEME
First, we demonstrate the self-correcting ability of the proposed method with the following experimental results. Six standard 512 × 512 color images, Lena, Baboon, Airplane, Barbara, Peppers, and House, selected from [23], are used in our experiments. The self-correcting ability of an algorithm can be quantified by the number of correction bits it generates. The proposed algorithm adopts a relational approach to reduce the number of correction bits; in fact, it generates correction bits only for the pixels in the 2-to-1 case described in Section III-A.
Table 1 gives the rate of each case with respect to the total number of modified pixels for embedding capacities of 50000 bits and 100000 bits. Each rate is obtained by executing the embedding procedure ten times and averaging the results. On average, 23 percent of the modified pixels belong to the 2-to-1 case. Thus, relative to one correction bit per modified pixel, the proposed method reduces the number of correction bits by 77 percent. In addition, the proposed method increases the embedding capacity by accommodating message bits in the green scales of the pixels in the 1-to-2 case. On average, 14 percent of the modified pixels belong to the 1-to-2 case, so about 14 percent of the modified pixels carry one additional message bit in their green scales.

Fig. 7. Rate-distortion performance comparison between the method of Hou et al. [4], the method of Zhou et al. [21], and the proposed method.
We also evaluate the self-correcting ability statistically using 300 color images from the dataset in [14]. The averaged results are shown in Table 2, which gives the average rate and the variance of each case with respect to the total number of modified pixels for embedding capacities of 50000 bits and 100000 bits. On average, 24.2 percent of the modified pixels belong to the 2-to-1 case; thus, the proposed algorithm reduces the number of correction bits by 75.8 percent. Moreover, on average, 12.5 percent of the modified pixels belong to the 1-to-2 case and carry one additional message bit in their green scales. Since the variance of each case is small, the proposed algorithm shows a consistent self-correcting ability for most images from the dataset in [14].
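The arithmetic behind these percentages can be written as a small Python helper that tallies per-pixel case labels. It assumes, as the percentages above implicitly do, that the baseline functional approach spends one correction bit per modified pixel; the helper name and the toy labels are hypothetical and are not measured data.

```python
from collections import Counter

CASES = ("1-to-1", "1-to-2", "2-to-1", "2-to-2")


def case_statistics(cases):
    """Summarise the case labels of the modified pixels.

    Returns (rates, correction_bit_reduction, extra_bits_per_pixel):
    - rates: fraction of modified pixels in each case;
    - correction_bit_reduction: saving relative to an assumed baseline of one
      correction bit per modified pixel (only 2-to-1 pixels need one here);
    - extra_bits_per_pixel: green-scale message bits per modified pixel,
      one for each 1-to-2 pixel.
    """
    n = len(cases)
    counts = Counter(cases)
    rates = {c: counts[c] / n for c in CASES}
    return rates, 1.0 - rates["2-to-1"], rates["1-to-2"]


# Toy labels for illustration only.
rates, reduction, extra = case_statistics(["2-to-1", "1-to-2", "2-to-2", "2-to-2"])
print(f"{reduction:.0%} fewer correction bits, {extra:.2f} extra bits per modified pixel")
# prints: 75% fewer correction bits, 0.25 extra bits per modified pixel
```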
In short, the proposed algorithm not only decreases the number of correction bits generated for the modified green scales but also provides additional space in the green scales for further message embedding. Note that reducing the number of correction bits implies that one can accommodate more message bits.

Table 2. Average rate and variance of each case for the images from the dataset in [14] with capacities of 50000 bits and 100000 bits, respectively.

The method of Zhou et al. [21] uses the rhombus predictor to sharpen the host histogram of prediction errors and minimizes the total distortion by a rate-distortion optimization method. However, their scheme still generates a large number of correction bits, since it also uses the functional approach to keep the grayscales invariant. By using a relational approach to keep the grayscales invariant, the proposed method eliminates many correction bits and finds additional space in the green scales for embedding message bits.
As shown in Fig. 7, the rate-distortion performance of the proposed method is much better than that of Hou et al.'s method [4] and Zhou et al.'s method [21]. In particular, the proposed method maintains high image quality even when the embedding capacity is large. Therefore, the proposed algorithm not only has a good self-correcting ability but also achieves good rate-distortion performance.

C. EMBEDDING TIME COMPARISON
For embedding time, we compare the proposed algorithm with the method of Hou et al. [4] to understand the difference between the functional and relational approaches for grayscale-invariant RDH. Both methods are implemented in Visual Studio 2019, and the test machine is a Lenovo personal computer with an Intel Core i5-6400 CPU @ 2.70 GHz and 16 GB of RAM. Both methods embed 50000 bits into each of the six images, and their embedding times (in seconds) are shown in Table 3. From Table 3, we observe that the proposed algorithm takes longer than the method of Hou et al. [4] for most images, since the proposed method requires more time to determine the modified green scale. Moreover, for images with rich texture such as Baboon and Peppers, both algorithms are time-consuming, since the prediction errors they generate are so large that the algorithms spend much time handling pixel-value overflow and underflow.

V. CONCLUSION AND FUTURE WORKS
In this paper, we study grayscale-invariant RDH schemes for color images. Previous methods use a functional approach to keep grayscales unchanged, and their embedding algorithms generate a large number of correction bits. To address this problem, the proposed grayscale-invariant RDH algorithm adopts a relational approach to keep grayscales unchanged. The proposed RDH scheme not only significantly reduces the number of correction bits but also provides additional space for embedding message bits. The experimental results show that the proposed RDH scheme significantly outperforms previous methods. We plan two directions for future work. Following Zhou et al.'s approach [21], the first is to apply the rate-distortion optimization method to further minimize the total embedding distortion of grayscale-invariant RDH schemes that adopt the relational approach to preserve grayscales. The second is to study whether the histogram-shifting technique can be used for grayscale-invariant RDH.