Reversible data hiding based on structural similarity block selection



I. INTRODUCTION
With the fast development and wide application of Internet technology, the amount of information is increasing dramatically. Leakage of information and violation of image ownership are becoming more common. To guarantee the security of information, research on data hiding, cryptography, and access control has become a hot topic. Among them, reversible data hiding (RDH), which enables the receiver to recover the original image losslessly and extract the hidden secret data completely, has found wide applications in the military, secret data transmission, image authentication [1], medical image processing [2], and so on.
In early work, most RDH methods were based on lossless compression [3], [4]. The typical principle is to losslessly compress certain features of the original image to save space for reversible data embedding. These initial methods usually have the disadvantage of low embedding capacity and may lead to severe degradation of image quality. In this light, more efficient RDH methods have been proposed to achieve better performance, such as difference expansion (DE) [5]-[10] and histogram shifting (HS) [11]-[15].

The associate editor coordinating the review of this manuscript and approving it for publication was Aniello Castiglione.
The DE-based method was first proposed by Tian [5], who calculated the difference of two adjacent pixels and expanded it to carry one bit of embedded data. The main idea of DE-based methods is that there exists strong correlation between adjacent pixels. To further exploit this correlation, DE was developed into prediction-error expansion (PEE) by incorporating prediction methods. PEE-based methods [16]-[20] utilize the correlation within a larger neighborhood of pixels instead of the correlation of only two adjacent pixels as in DE.
In addition to DE-based methods, an efficient RDH method based on histogram shifting (HS) was initially proposed by Ni et al. [11], in which a pair of peak (highest-frequency) and zero (zero-frequency) bins is chosen such that the pixels at the peak bin are used for embedding one bit of data, and the bins between the two are shifted one step towards the zero bin to create vacancies. Accordingly, the performance of HS-based schemes heavily depends on the determination of the peak and zero bins as well as the sharpness of the generated histogram.
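The peak/zero-bin principle can be sketched in a few lines of Python. This is a toy illustration rather than Ni et al.'s full scheme: pixel lists stand in for images, and we assume peak < zero with an empty zero bin.

```python
def hs_embed(pixels, bits, peak, zero):
    """Embed bits at the peak bin; bins strictly between peak and zero
    are shifted one step toward the (empty) zero bin to make room.
    Assumes peak < zero and h(zero) == 0."""
    out = list(pixels)
    it = iter(bits)
    for idx, p in enumerate(pixels):
        if peak < p < zero:
            out[idx] = p + 1              # shift toward the zero bin
        elif p == peak:
            out[idx] = p + next(it, 0)    # carry one bit (0 keeps, 1 moves)
    return out

def hs_extract(marked, peak, zero):
    """Inverse operation: read the bits back and restore the histogram."""
    bits, rec = [], list(marked)
    for idx, p in enumerate(marked):
        if p == peak:
            bits.append(0)
        elif p == peak + 1:
            bits.append(1)
            rec[idx] = peak
        elif peak + 1 < p <= zero:
            rec[idx] = p - 1              # undo the shift
    return bits, rec
```

Running embedding and then extraction on the same pixel sequence returns the payload and the original pixels exactly, which is the reversibility property the text describes.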
To obtain a sharper histogram and enhance performance, Thodi and Rodriguez [21] proposed an algorithm combining PEE with HS; the resulting prediction-error histogram (PEH) usually follows a Laplacian-like distribution centered at 0, which is suitable for expansion embedding. In this way, a large increase in embedding capacity can be realized without causing more distortion. In light of the high efficiency of PEH-based methods, many RDH technologies based on the PEH have been proposed in recent years, for example, double-layer embedding [22], adaptive embedding [23], [24], context modification [25], multi-histogram modification [26]-[29], and optimal expansion bins selection [30]-[33]. Furthermore, based on the principle of PEH methods, other RDH methods have been proposed, such as pixel-value-ordering-based methods [34]-[36]. In general, RDH methods strive to achieve better capacity-distortion performance, i.e., maintaining a certain embedding capacity while minimizing the distortion.
The capacity-distortion performance of the RDH methods mentioned above is usually evaluated only by the PSNR of the marked image versus the original one, because PSNR is the simplest and most widely used quality measurement index based on error sensitivity. However, with the wide application of processed images, such as images used in medicine and machine learning, PSNR is not well suited to detecting fluctuations in visual quality. Recently, the Structural Similarity (SSIM) index [37]-[41], which is highly sensitive to structural distortion and visual quality, has therefore attracted a great deal of attention as a structure distortion measure in RDH. In [39], the authors hid information in the original image while taking the Human Visual System (HVS) into account. In [40], the author proposed an optimal RDH algorithm under a structural similarity constraint. However, these methods mainly focused on reducing structure distortion or improving visual quality without considering PSNR and embedding capacity.
Notice that most former RDH methods use only SSIM or only PSNR as the quality measurement index. In this paper, we concentrate on enhancing SSIM and PSNR performance simultaneously, and further employ double embedding to increase embedding capacity. The mechanism of our proposed method can be described as follows. We integrate SSIM-based block selection into the procedure of optimal expansion bins selection. Specifically, based on a block selection threshold, the original image is divided into non-overlapping blocks. For each block, if the structural similarity index is greater than the given threshold, it is marked as a smooth block; otherwise, it is marked as a rough block. In this way, we exclude the blocks with low structural similarity and keep their pixels unchanged in order to avoid distortion. For the smooth blocks, we use optimal expansion bins to carry data and double embedding to increase the embedding capacity. The experimental results show that our method performs better than the former methods in both SSIM and PSNR.
The rest of the paper is organized as follows. The fundamentals of the Structural Similarity index and PEH-based methods are briefly introduced in Section II. Section III describes the proposed SSIM-based block selection and double embedding scheme in detail. In Section IV, the experimental results as well as the comparison with the former block selection method and prior arts are presented. Finally, Section V concludes this paper.

II. RELATED WORK
In this section, we explain the concept of structural similarity, which was used as a measurement index in former methods, and then review the conventional prediction-error histogram expansion embedding.

A. STRUCTURAL SIMILARITY
The SSIM index was first put forward by the Laboratory for Image and Video Engineering at the University of Texas at Austin [37]. It compares the structures of the reference and the distorted signals, and has gained widespread popularity for its reliable evaluation of image structure [38]-[41].
Suppose that x stands for the set of all pixels of a block in the reference image, and y stands for the set of all pixels of the block at the same position in the distorted image. SSIM is based on three similarity comparisons between x and y, namely luminance, contrast, and structure:

l(x, y) = (2µ_x µ_y + c_1) / (µ_x^2 + µ_y^2 + c_1)
c(x, y) = (2σ_x σ_y + c_2) / (σ_x^2 + σ_y^2 + c_2)
s(x, y) = (σ_xy + c_3) / (σ_x σ_y + c_3)    (1)

where µ_x, µ_y, σ_x, and σ_y represent the means and standard deviations of x and y, respectively, and σ_xy represents the covariance between x and y. c_1, c_2, and c_3 are constants to avoid instability when µ_x^2 + µ_y^2, σ_x^2 + σ_y^2, and σ_x σ_y are close to zero.
Then, combining the three comparisons of luminance, contrast, and structure, the SSIM index function µ(·) is denoted as:

µ(x, y) = [l(x, y)]^α [c(x, y)]^β [s(x, y)]^γ    (2)

To calculate the SSIM between two images, both images are divided into non-overlapping blocks of size n × n. Then a window of size n × n is slid block by block to calculate the SSIM value µ of each block. Finally, taking the average as the SSIM of the whole image, the M-SSIM index function µ̄(X, Y) is denoted as:

µ̄(X, Y) = (1/M) Σ_{k=1}^{M} µ(x_k, y_k)    (3)

where M is the total number of blocks.
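As a concrete illustration, the simplified per-block SSIM (with α = β = γ = 1 and c_3 = c_2/2) and its blockwise average can be computed as follows. The default constants are the usual (0.01·255)^2 and (0.03·255)^2 from the SSIM paper; this is a minimal sketch, not a full SSIM implementation.

```python
import numpy as np

def block_ssim(x, y, c1=6.5025, c2=58.5225):
    """Simplified SSIM of two same-sized blocks (alpha=beta=gamma=1,
    c3=c2/2), which collapses to a single closed-form expression."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()        # covariance sigma_xy
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def mean_ssim(X, Y, n=8):
    """M-SSIM: average SSIM over non-overlapping n-by-n blocks."""
    H, W = X.shape
    vals = [block_ssim(X[i:i + n, j:j + n], Y[i:i + n, j:j + n])
            for i in range(0, H - n + 1, n)
            for j in range(0, W - n + 1, n)]
    return float(np.mean(vals))
```

For identical blocks the index is exactly 1, and any luminance or structure change pulls it below 1, which is the behavior the block selection in Section III relies on.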

B. PREDICTION-ERROR HISTOGRAM SHIFTING
The procedure of data embedding in the conventional PEH-based RDH scheme is as follows.
An image is scanned in the raster-scan order from top to bottom and left to right. The pixels are collected into a one-dimensional sequence (x_1, ..., x_N), where N is the total number of collected pixels.
Each pixel x_i is predicted by some prediction algorithm, among which the two most popular techniques are the median edge detector (MED) [7] and the gradient-adjusted predictor (GAP) [16].
Then a predicted value x̂_i is generated, where x̂_i is rounded to an integer. Thus we obtain the prediction-error of pixel x_i as:

e_i = x_i − x̂_i    (4)

and then a one-dimensional prediction-error sequence (e_1, ..., e_N).
The corresponding PEH, denoted by h(·), can be obtained as:

h(e) = #{1 ≤ i ≤ N : e_i = e}    (5)

where # denotes the cardinality of a set.
One bit of secret data m ∈ {0, 1} is embedded into e_i through expanding and shifting as:

e'_i = e_i + m, if e_i = b
e'_i = e_i − m, if e_i = b − 1
e'_i = e_i + 1, if e_i > b
e'_i = e_i − 1, if e_i < b − 1    (6)

where b is the chosen expansion bin.
After data embedding, the marked pixel is x'_i = x̂_i + e'_i. In the early work on PEH-based methods, the bins (b − 1, b) are usually set to (−1, 0), since (−1, 0) are usually the two highest-frequency bins. But with the development of PEH-based schemes, it became possible to improve this modification by properly selecting the expansion bins to minimize the distortion for a given embedding capacity.
After selecting the optimal bins, the modification to the prediction-error is adjusted to:

e'_i = e_i, if a < e_i < b
e'_i = e_i + m, if e_i = b
e'_i = e_i − m, if e_i = a
e'_i = e_i + 1, if e_i > b
e'_i = e_i − 1, if e_i < a    (7)

where a < 0 ≤ b, a, b ∈ Z.
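The adjusted modification can be written directly as a small function. A sketch, assuming integer prediction-errors and bins a < 0 ≤ b:

```python
def modify_error(e, m, a, b):
    """Expansion embedding with bin pair (a, b): errors at the bins
    are expanded by the payload bit m, errors outside (a, b) are
    shifted outward by one so extraction stays unambiguous."""
    if a < e < b:
        return e          # inner errors are untouched
    if e == b:
        return e + m      # expand the right bin
    if e == a:
        return e - m      # expand the left bin
    if e > b:
        return e + 1      # shift right
    return e - 1          # shift left (e < a)
```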
We use an example introduced in [30] to illustrate the optimal bins selection. First, the prediction-error histogram is created. Assume that h(−1) = h(0) = h(1) = H and EC = 2H. According to the modification shown in (6) with bins (−1, 0), the expected value of the embedding distortion (the l2-norm) is calculated as:

E(‖I − I'‖^2) = (1/2)h(−1) + (1/2)h(0) + (N − h(−1) − h(0)) = N − H    (8)

where E(·) is the expectation operator, I is the original image, I' is the marked image, and N stands for the total number of image pixels.
Improving on (8), the bins (−1, 0) are changed into (−1, 1). In this case, bin 0 remains unchanged and only the bins larger than 1 or smaller than −1 are shifted, which means a prediction-error equal to 0 does not cause any distortion since its value remains unchanged in the embedding process. The expected value of the embedding distortion is thus reduced from N − H to N − 2H.
In order to obtain the optimal expansion bins, Wang et al. [30] pre-calculated the embedding distortion for each possible choice and selected the one with minimum distortion. In this way, the embedding distortion can be reduced compared with former methods.
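Wang et al.'s pre-calculation idea can be sketched as a brute-force search over candidate bin pairs. The distortion model here is our own simplification: random payload bits move an expanded error by 1 with probability 1/2, shifted errors move by exactly 1, and `bmax` bounds the search range.

```python
def optimal_bins(hist, capacity, bmax=20):
    """Return the bin pair (a, b) with a < 0 <= b whose expected l2
    distortion is minimal among pairs offering at least `capacity`
    expandable prediction-errors; None if no pair has enough capacity.

    hist: dict mapping prediction-error value -> frequency."""
    best = None
    for a in range(-bmax, 0):
        for b in range(0, bmax + 1):
            ec = hist.get(a, 0) + hist.get(b, 0)
            if ec < capacity:
                continue
            shifted = sum(f for e, f in hist.items() if e > b or e < a)
            dist = 0.5 * ec + shifted       # expected squared error
            if best is None or dist < best[0]:
                best = (dist, (a, b))
    return best[1] if best else None
```

On a histogram with h(−1) = h(0) = h(1) and capacity 2H, this search prefers (−1, 1) over (−1, 0), reproducing the example discussed above.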

III. THE PROPOSED METHOD
In this section, our proposed method, outlined in Fig. 1, is described. First, we introduce the embedding process and its mechanism. Second, we give an algorithm to determine the block selection threshold and the optimal expansion bins. Third, we explain the block selection, double embedding, and extracting in detail. Finally, we briefly point out the auxiliary information that is necessary in our scheme.

A. EMBEDDING PROCESS 1) PREDICTION
Assume an original image I of size N × N. By scanning the original image in the raster-scan order, we obtain a two-dimensional original matrix X, where x_{i,j} represents the pixel value at location (i, j). MED [7] is employed to predict each original pixel. The MED algorithm is low-complexity, with an inherent edge-detection mechanism operating on a three-neighbor context. The pixels u, v, w at the right, bottom, and bottom-right of the current pixel x_{i,j}, shown in Fig. 2, are defined as the context of x_{i,j}.
Therefore, each predicted pixel x̂_{i,j} is calculated as:

x̂_{i,j} = min(u, v), if w ≥ max(u, v)
x̂_{i,j} = max(u, v), if w ≤ min(u, v)
x̂_{i,j} = u + v − w, otherwise    (11)

We collect the predicted pixels to generate the corresponding two-dimensional predicted matrix X̂. Then we calculate the two-dimensional prediction-error matrix E = X − X̂, where each entry represents the prediction-error at location (i, j):

e_{i,j} = x_{i,j} − x̂_{i,j}    (14)

In order to calculate the SSIM values, X and X̂ are both divided into ⌊N/n⌋ × ⌊N/n⌋ non-overlapping blocks, where ⌊·⌋ is the floor function and each block is of size n × n. Correspondingly, the prediction-error matrix E is also divided into ⌊N/n⌋ × ⌊N/n⌋ non-overlapping blocks. The block representation of E is denoted as Ē.
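The MED rule in (11) translates directly into code. A sketch, with u and v the two edge-adjacent neighbors and w the diagonal one as in this paper's scan order:

```python
def med_predict(u, v, w):
    """Median edge detector: picks min/max of the two edge neighbors
    when the diagonal neighbor w suggests a horizontal or vertical
    edge, and falls back to the planar estimate u + v - w otherwise."""
    if w >= max(u, v):
        return min(u, v)
    if w <= min(u, v):
        return max(u, v)
    return u + v - w
```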
In (1) and (2), α > 0, β > 0, and γ > 0 are parameters used to adjust the relative importance of the three components. In practice, to simplify the expression and calculation, we set α = β = γ = 1 and c_3 = c_2/2 for each block. Then the SSIM value of each block is calculated as:

µ_{i,j} = (2µ_x µ_x̂ + c_1)(2σ_{x x̂} + c_2) / ((µ_x^2 + µ_x̂^2 + c_1)(σ_x^2 + σ_x̂^2 + c_2))    (15)

Each SSIM value is collected into a two-dimensional matrix µ (16). Therefore, each block Ē_{i,j} has its corresponding SSIM value µ_{i,j}.

2) BLOCK SELECTION
In this part, an algorithm to determine the block selection threshold and optimal expansion bins (a, b) is proposed.
The pixels in the original image need to be processed before data embedding, but such an operation, together with the subsequent data embedding, may severely damage the structure and visual quality of some regions of the image. To reduce the embedding distortion as much as possible, we propose an SSIM-based block selection method.
Referring to (16), we use the same method to obtain the matrix µ, because the SSIM value indicates the change in structure and visual quality before and after prediction.
Given a block selection threshold τ and the matrix µ, each block (i, j) is classified as a smooth block if µ_{i,j} ≥ τ, and as a rough block otherwise. Due to this selection, we are able to use the pixels of smooth blocks for data embedding and keep the pixels of rough blocks unchanged. This SSIM-based block selection ensures that the local structural similarity of the image remains stable after prediction, and also prevents further damage to structure and visual quality caused by data embedding.
To guarantee reversibility, we establish a binary block map matrix M to record the locations of smooth and rough blocks as a part of the auxiliary information, where M_{i,j} = 1 if block (i, j) is smooth, and M_{i,j} = 0 otherwise.
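Building the block map M from the SSIM matrix is then a one-liner. A sketch, using µ ≥ τ for "smooth" to match the selection rule above:

```python
import numpy as np

def classify_blocks(ssim_map, tau):
    """Binary block map M: 1 marks a smooth block (SSIM >= tau),
    0 a rough block whose pixels stay untouched during embedding."""
    return (np.asarray(ssim_map) >= tau).astype(np.uint8)
```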

3) DETERMINATION OF OPTIMAL BLOCK SELECTION THRESHOLD
According to Ē, µ, and the SSIM-based block selection threshold τ, we collect the prediction-errors of the smooth blocks of Ē as:

B(Ē, τ) = {e_{p,q} : e_{p,q} ∈ Ē_{i,j}, µ_{i,j} ≥ τ}    (18)

From (18), we know that the higher τ is, the stricter the selection, which leads to better SSIM performance.
According to B(Ē, τ), we obtain the prediction-error histogram h(·):

h(e, τ) = #{(p, q) : e_{p,q} = e, e_{p,q} ∈ B(Ē, τ)}    (19)

According to this prediction-error histogram, the embedding distortion of our proposed method can be estimated as:

D(a, b, τ) = (1/2)h(a, τ) + (1/2)h(b, τ) + Σ_{e>b} h(e, τ) + Σ_{e<a} h(e, τ)    (20)

From (19) and (20), we can see that the block selection result directly affects the number of pixels used for embedding, so the generation of the prediction-error histogram heavily depends on the block selection result.
Different h(e, τ) lead to different values of D(a, b, τ). Therefore, D(a, b, τ) is highly related to the determination of τ.
According to the aforementioned analysis, for a given embedding capacity, we need to maximize τ and determine the expansion bins (a, b) that yield the minimum embedding distortion.
When the embedding capacity C is given, the problem of determining the optimal block selection threshold is formulated as:

maximize τ
subject to C1: (a, b) = arg min_{a,b} D(a, b, τ)
           C2: h(a, τ) + h(b, τ) ≥ C    (21)

where C is the sum of the embedding capacity (EC) and the size of the auxiliary information. C1 implies that, under a determined τ, we calculate the embedding distortion for each possible choice of (a, b) and select the one with minimum distortion. C2 implies that we need to ensure there is enough space for data embedding.
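The search in (21) can be sketched as a scan over candidate thresholds from the largest down. Here `hist_for_tau` is a hypothetical callback standing in for (19), building the smooth-block histogram for a given τ; the inner loop applies the same simplified distortion model described for the bin search earlier (expanded errors cost 1/2 on average, shifted errors cost 1).

```python
def choose_threshold(taus, hist_for_tau, capacity, bmax=20):
    """Scan candidate thresholds from largest to smallest and return
    the first (tau, (a, b)) whose smooth-block histogram still offers
    `capacity` expandable errors, with minimum-distortion bins."""
    for tau in sorted(taus, reverse=True):
        hist = hist_for_tau(tau)
        best = None
        for a in range(-bmax, 0):
            for b in range(0, bmax + 1):
                ec = hist.get(a, 0) + hist.get(b, 0)
                if ec < capacity:
                    continue                      # violates C2
                shifted = sum(f for e, f in hist.items()
                              if e > b or e < a)
                dist = 0.5 * ec + shifted         # distortion estimate
                if best is None or dist < best[0]:
                    best = (dist, (a, b))
        if best is not None:
            return tau, best[1]
    return None
```

The strictest threshold whose histogram still meets the capacity wins, which matches the "maximize τ subject to C2" reading of (21).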

4) DOUBLE EMBEDDING
After obtaining the block selection threshold, we divide the original image into smooth and rough blocks, and then embed data into the pixels of the smooth blocks. The block representation of X is rewritten as B (22), where each block B_{l,j} contains n × n original pixels and has its corresponding classification index M_{l,j}. The pixels in smooth blocks are preprocessed to avoid overflow/underflow. The maximum modification to each pixel in our proposed method is 1, so pixels equal to 0 or 255 may lead to overflow/underflow. The pixels with value 0 are changed into 1, while the pixels with value 255 are changed into 254. A binary location map L_1 is generated to record the locations of all these pixels.
After the preprocessing, we scan the block classification matrix M in the raster-scan order; rough blocks are skipped and kept unchanged. For smooth blocks, beginning from the pixel x_{p_1,q_1}, we scan each pixel of each block in the raster-scan order. Each pixel x_{p,q} is predicted to obtain the predicted value x̂_{p,q} as in (11). Each prediction-error e_{p,q} is generated as in (14) and then modified as:

e'_{p,q} = e_{p,q}, if a < e_{p,q} < b
e'_{p,q} = e_{p,q} + m, if e_{p,q} = b
e'_{p,q} = e_{p,q} − m, if e_{p,q} = a
e'_{p,q} = e_{p,q} + 1, if e_{p,q} > b
e'_{p,q} = e_{p,q} − 1, if e_{p,q} < a    (23)

where m ∈ {0, 1} is one bit of to-be-embedded data, and a and b are the determined optimal expansion bins.
Remark 1: The prediction-errors equal to a or b are expanded by one step to carry one bit of data. To ensure blind extraction and reversibility, prediction-errors e_{p,q} > b and e_{p,q} < a are shifted one step right and one step left, respectively. Doing so creates vacancies for the expanded prediction-errors and ensures that, during blind extraction, the prediction-errors equal to b, b + 1, a − 1, and a are all and only those used for data embedding.
The shifted pixels do not carry any data and are only used to ensure reversibility and blind extraction. In the prediction-error histogram, there usually exist zero-frequency bins, as shown in Fig. 3; assume that the corresponding prediction-errors are ē (ē > b) and e̲ (e̲ < a), respectively. Because the zero-frequency bins ē and e̲ provide extra vacancies to store the shifted prediction-errors, we only need to shift the prediction-errors b < e_{p,q} < ē one step right and e̲ < e_{p,q} < a one step left. The prediction-errors e_{p,q} > ē and e_{p,q} < e̲ are kept unchanged, since they do not affect reversibility or blind extraction.
In this way, the image distortion from shifting prediction-errors can be reduced. Therefore, we have the following embedding and shifting approach:

e'_{p,q} = e_{p,q}, if e_{p,q} < e̲, a < e_{p,q} < b, or e_{p,q} > ē
e'_{p,q} = e_{p,q} + m, if e_{p,q} = b
e'_{p,q} = e_{p,q} − m, if e_{p,q} = a
e'_{p,q} = e_{p,q} + 1, if b < e_{p,q} < ē
e'_{p,q} = e_{p,q} − 1, if e̲ < e_{p,q} < a    (24)

After this modification to the prediction-errors, the marked pixel x'_{p,q} is obtained as:

x'_{p,q} = x̂_{p,q} + e'_{p,q}    (25)

So far, we have completed the first round of embedding. To increase the embedding capacity, we employ a double embedding method similar to [23] rather than approaches based on block selection.
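The improved rule differs from the plain expansion rule only in that shifting stops at the zero-frequency bins. A sketch, with `e_lo` and `e_hi` playing the roles of e̲ and ē:

```python
def modify_error_zf(e, m, a, b, e_lo, e_hi):
    """Embedding with zero-frequency stopping bins e_hi (> b) and
    e_lo (< a): errors at or beyond those bins are left unchanged,
    so the shift never propagates further than necessary."""
    if e <= e_lo or a < e < b or e >= e_hi:
        return e              # outside the affected range
    if e == b:
        return e + m          # expand right bin
    if e == a:
        return e - m          # expand left bin
    if b < e < e_hi:
        return e + 1          # shift right, stopping at e_hi
    return e - 1              # e_lo < e < a: shift left
```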
To implement double embedding, we need a mechanism that informs the decoder unambiguously which pixels are double-embedded and that also reflects the local complexity of the marked pixel x'_{p,q} after the first embedding. Referring to Fig. 4, we use the mean square error (MSE) of the context of x'_{p,q} as the parameter to decide whether the double embedding operation is applied.
The MSE index function σ for x'_{p,q} is denoted as:

σ_{p,q} = (1/3)[(u − x̄_{p,q})^2 + (v − x̄_{p,q})^2 + (w − x̄_{p,q})^2]    (26)

where x̄_{p,q} = (u + v + w)/3. We can then introduce an MSE threshold T_σ to control which pixels can be utilized to embed a second data bit. The smaller T_σ is, the fewer pixels are available.
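Under the reading that σ_{p,q} is the spread of the three-neighbor context about its mean x̄_{p,q} (our assumption for this sketch), the eligibility test is:

```python
def local_mse(u, v, w):
    """Local-complexity measure: mean squared deviation of the
    three-neighbor context (u, v, w) from its mean."""
    mean = (u + v + w) / 3.0
    return ((u - mean) ** 2 + (v - mean) ** 2 + (w - mean) ** 2) / 3.0

def double_embed_eligible(u, v, w, t_sigma):
    """A pixel is double-embedded iff its context MSE is <= T_sigma;
    flat neighborhoods (low MSE) are the safest to modify twice."""
    return local_mse(u, v, w) <= t_sigma
```

Because the context of each pixel consists of already-processed neighbors, the decoder can recompute exactly the same σ_{p,q} and reach the same decision, which is what makes blind extraction possible.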
For each marked pixel x'_{p,q}, we have • if σ_{p,q} ≤ T_σ, the pixel x'_{p,q} is used for double embedding;
• otherwise, the pixel x'_{p,q} is kept unchanged. The method in the second embedding is similar to that in the first embedding, i.e., employing (23). Notice that we need to make sure the marked pixel value x'_{p,q} ∈ [1, 254] after the first embedding to avoid overflow/underflow; therefore, we need another location map L_2 to record the locations of all overflow/underflow pixels.
After embedding the required data into the pixels from x_{p_1,q_1} to x_{p',q'} in smooth blocks, where x_{p_1,q_1} is the first available embedding pixel and x_{p',q'} is the last pixel used for embedding data, we need to embed the auxiliary information as a part of the payload. The size of the auxiliary information is denoted by S_aux.
• First, scanning each pixel of the embedded image in the raster-scan order, we record the LSBs of the first S_aux pixels to obtain a binary sequence s.
• Next, using the proposed embedding method, s is embedded into the pixels from x_{p_n,q_n} to x_{p_end,q_end} in smooth blocks, where x_{p_n,q_n} is the next available pixel after x_{p',q'} and x_{p_end,q_end} is the last embedding pixel in smooth blocks.
• Finally, by using LSB replacement, we embed the auxiliary information into the first S_aux pixels to generate the final marked image. After embedding the required data and the auxiliary information into the pixels of the original image matrix X, the marked image matrix X' is generated.

B. EXTRACTING PROCESS
The extracting process is the exact inverse of embedding. By collecting the LSBs of the first S_aux pixels, we obtain the auxiliary information, from which we first extract the sequence s. The extraction procedure is explained as follows.
Referring to (22), the marked image matrix X' is correspondingly divided into ⌊N/n⌋ × ⌊N/n⌋ non-overlapping blocks, and its block representation is rewritten as B'.
According to M and the matrix B', we scan each pixel from x'_{p_end,q_end} to x'_{p_n,q_n} in smooth blocks in the reverse scanning order to obtain each marked pixel x'_{p,q}. For each x'_{p,q}, we first calculate σ_{p,q} to determine whether the pixel is double-embedded:
• if σ_{p,q} ≤ T_σ, we extract two bits and recover the original pixel;
• otherwise, we extract one bit and recover the original pixel. For each extraction, we obtain the predicted value x̂_{p,q} as in (11) for each x'_{p,q}. The expanded or shifted prediction-error is obtained as:

ẽ_{p,q} = x'_{p,q} − x̂_{p,q}    (27)

Then the original prediction-error e_{p,q} is recovered as follows:
• If ẽ_{p,q} ∈ {b, b + 1}, the data is extracted as m = ẽ_{p,q} − b and the original prediction-error is recovered as e_{p,q} = ẽ_{p,q} − m.
• If ẽ_{p,q} ∈ {a − 1, a}, the data is extracted as m = a − ẽ_{p,q} and the original prediction-error is recovered as e_{p,q} = ẽ_{p,q} + m.
• If b + 1 < ẽ_{p,q} < ē + 1, no data needs to be extracted and the original prediction-error is recovered as e_{p,q} = ẽ_{p,q} − 1.
• If e̲ − 1 < ẽ_{p,q} < a − 1, no data needs to be extracted and the original prediction-error is recovered as e_{p,q} = ẽ_{p,q} + 1.
• Otherwise, no data needs to be extracted and e_{p,q} = ẽ_{p,q} itself.
Then the original pixel value x_{p,q} is recovered as:

x_{p,q} = x̂_{p,q} + e_{p,q}    (28)

For double-embedded pixels, the two extractions follow exactly the same method. Notice that, after the first extraction, according to L_2, we need to update the overflow/underflow pixel values: a value of 254 is updated to 255, and a value of 1 is updated to 0. After the sequence s is extracted, we use s to replace the LSBs of the first S_aux pixels to obtain the embedded image.
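The extraction rules above invert the embedding rule case by case. A sketch, with `e_lo` and `e_hi` playing the roles of e̲ and ē (both recovered from the auxiliary information); the function returns the extracted bit (or None for shifted/unchanged errors) together with the recovered original error:

```python
def extract_error(e_marked, a, b, e_lo, e_hi):
    """Invert the embedding rule: return (bit_or_None, original_error)
    for a marked prediction-error, given bins a < 0 <= b and the
    zero-frequency stopping bins e_lo < a, e_hi > b."""
    if e_marked == b:
        return 0, b                   # expanded right bin, bit 0
    if e_marked == b + 1:
        return 1, b                   # expanded right bin, bit 1
    if e_marked == a:
        return 0, a                   # expanded left bin, bit 0
    if e_marked == a - 1:
        return 1, a                   # expanded left bin, bit 1
    if b + 1 < e_marked < e_hi + 1:
        return None, e_marked - 1     # undo the right shift
    if e_lo - 1 < e_marked < a - 1:
        return None, e_marked + 1     # undo the left shift
    return None, e_marked             # untouched error
```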
Then, scanning each pixel from x'_{p',q'} to x'_{p_1,q_1} in smooth blocks in the reverse scanning order, we extract the embedded data according to the above procedure.
Finally, according to L_1, we update the overflow/underflow pixel values, i.e., a value of 254 is updated to 255 and a value of 1 is updated to 0.

C. AUXILIARY INFORMATION
In our proposed method, we introduce two binary flags ϕ_1 and ϕ_2 to record whether a location map exists in the first and second embedding, respectively. In addition, we utilize M, L_1, and L_2 to record the block locations, the overflow locations in the first embedding, and the overflow locations in the second embedding. Moreover, L_1, L_2, and M are all losslessly compressed into CL_1, CL_2, and CM. Their compressed sizes are also part of the auxiliary information, denoted as S_CL1, S_CL2, and S_CM, respectively.
Then, for our method, the necessary auxiliary information includes ϕ_1, ϕ_2, the compressed maps CM, CL_1, and CL_2, and their sizes S_CM, S_CL1, and S_CL2.

D. EMBEDDING AND EXTRACTING ALGORITHM
In this part, the embedding and extracting algorithms are given.

IV. EXPERIMENTAL RESULT
In this section, experimental simulations are conducted to demonstrate the performance of the proposed method against three benchmark methods. Specifically, we use six 512 × 512 standard gray-scale images, 'Lena', 'Barbara', 'Airplane', 'Peppers', 'Boat', and 'Elaine', as test images, shown in Fig. 5. In particular, our proposed embedding algorithm is compared with Cai and Gui [36], Sachnev et al. [22], and Wang et al. [30]. Among them, Cai and Gui [36] proposed a method based on block selection, assigning the median of each block as the reference pixel and using the pixels sharing the reference value as the smoothness level for selecting the block. Both Sachnev et al.'s method [22] and Wang et al.'s method [30] are typical PEH-based schemes: Sachnev et al. proposed a sorting technique to select prediction-errors based on local variance, while Wang et al. pre-calculated the minimum embedding distortion to obtain the optimal expansion bins. For the test image 'Lena', the performance for different sub-block sizes is shown in Fig. 6, and we can observe that the best performance occurs at n = 8. Similarly, for the other test images, we obtain similar results through a large number of experiments. Hence, the sub-block size is set to 8, i.e., n = 8, in our experiments.
To show the intuitive effect of smooth and rough blocks, we provide the block structure of 'Lena' in Fig. 7 with embedding capacities of 10000, 30000, and 50000 bits. The white part is the set of smooth blocks, while the black part is the set of rough blocks. We can observe that as the embedding capacity increases, the number of smooth blocks (the white part, corresponding to the available embedding pixels) increases.
From Fig. 8, we can observe that as T_σ increases, the number of available embedding pixels in the second embedding process increases. From Fig. 9, we can observe that as τ increases, the prediction-error histogram becomes sharper, which indicates that the embedding distortion is smaller for a larger block selection threshold τ.
As shown in Fig. 10, for the four test images 'Lena', 'Peppers', 'Boat', and 'Elaine', the performance of our proposed method is much better than that of the other three methods; our curve is on average 2 dB higher in PSNR. Moreover, the SSIM of 'Lena', 'Peppers', and 'Boat' is relatively high, which means the structure of the predicted image is highly similar to the original one, and we are thus able to eliminate the blocks that are not suitable for further modification during block selection. Notice that, for 'Elaine', although the overall SSIM is low, it has enough sub-blocks with high SSIM to guarantee the embedding capacity and performance.

FIGURE 10. PSNR comparison between our method and the three methods in [22], [30], [36].
For the test images 'Airplane' and 'Barbara', the performance of our proposed method is less stable. The SSIM values of their sub-blocks are close to each other, and only a small number of sub-blocks have relatively low SSIM. So when the embedding capacity is not high enough (about 17500 bits for 'Airplane' and about 11000 bits for 'Barbara'), the embedding algorithm cannot show the advantage of the block selection mechanism, while the influence of the auxiliary information in 'Airplane' and 'Barbara' is more apparent. For 'Airplane', [36] achieves better PSNR when the embedding capacity is less than 17500 bits. For 'Barbara', the proposed method is very close to [36] and [22] when the embedding capacity is less than 11000 bits. As the embedding capacity increases, more pixels are utilized to embed data, and our proposed embedding algorithm outperforms the other methods.
For the six test images, when the embedding capacity is relatively small, the SSIM difference between the proposed method and the others is not apparent. Based on the PSNR experimental results, we select a relatively high embedding capacity for the six test images to make the SSIM comparison more intuitive, as shown in Tab. 1.

FIGURE 2. MED predictor for x_{i,j}.

FIGURE 3. An example of a prediction-error histogram that contains zero-frequency points.

FIGURE 4. Marked pixel x_{i,j} after the first round of embedding.
