Efficient Reversible Data Hiding Simultaneously Exploiting Adjacent Pixels

As an effective means of content authentication and privacy protection, reversible data hiding (RDH) permits us to hide a payload, such as authentication data, in a media file without introducing noticeable artifacts into the marked content. To achieve superior payload-distortion performance, conventional RDH algorithms often exploit smooth content for data embedding. Since RDH requires both the embedded payload and the raw content to be perfectly reconstructed, the altered smooth regions within the cover must be identified without error by the data receiver. Therefore, a core task in RDH is to design an efficient content-aware algorithm that enables a data hider to take advantage of as many smooth cover elements as possible, while keeping the detection of marked elements invertible for the receiver. This has motivated the authors to present a novel patch-level selection and breadth-first prediction strategy for efficient RDH in this paper. Different from many conventional RDH works, the proposed approach allows a data hider to preferentially and simultaneously use as many adjacent smooth elements as possible, which substantially benefits the data embedding procedure. Experiments show that our work significantly outperforms several advanced RDH algorithms in terms of payload-distortion performance, demonstrating its superiority and applicability.


I. INTRODUCTION
Reversible data hiding (RDH) [1]-[3] allows us to modify an image to carry a secret payload without introducing artifacts. Different from steganography [4]-[7], RDH ensures that both the hidden data and the original media content can be perfectly reconstructed. RDH is a fragile technique: when a media file containing extra data is manipulated, one will find that it is no longer authentic, and the original media content may not be fully reconstructed. RDH has been applied to sensitive applications that tolerate no degradation of the cover, such as medical and military imagery.
Many RDH algorithms are designed for images. A straightforward idea [3], [8] is to losslessly compress pixel values to reserve space for data embedding. To avoid obvious artifacts, the compression is often performed on the LSBs of pixels, where the compression ratio is relatively low and may not provide a sufficient payload. More efficient techniques such as histogram shifting (HS) [1] and difference expansion (DE) [2] have been proposed to achieve a high payload or a low distortion. Most existing RDH systems [9]-[14] use HS or its variants, e.g., prediction error (PE) [9] and prediction error expansion (PEE) [13], [14]. The information-theoretic analysis of RDH [15]-[18] has also been deeply studied.
RDH in encrypted images [19]-[22] has also been studied. The motivation is that a content owner may wish to share images only with an authorized receiver. Thus, before uploading an image to the cloud, the owner may encrypt the original image; after receiving the encrypted image, the cloud manager appends extra data to it. RDH in encrypted images is an intuitive and effective way to realize this. In addition, RDH has also been combined with multimedia signal processing (MSP) applications such as image contrast enhancement [23]. Compared with conventional MSP methods, a significant advantage of RDH-based MSP methods is that they protect the original content from being permanently distorted. An RDH system can be evaluated by its payload-distortion performance [24], [25]. Namely, for a given payload, the distortion should be kept as low as possible; in other words, for a fixed distortion level, it is desirable to embed as many message bits as possible. Strong spatial correlations exist between neighboring pixels in an image, which inspires researchers to design prediction-based RDH algorithms that use prediction errors (PEs) to carry a payload. A common operation [11], [24] in RDH is therefore to generate a prediction error histogram (PEH) for data embedding. On the one hand, the PEs are noise-like components of the cover, meaning that slight modifications to PEs do not result in obvious artifacts. On the other hand, as a pooled vector, the PEH allows us to embed secret data in an effective way. This indicates that, to provide superior performance, the design of a system may focus on PEH generation and its modification. Many works [26]-[32] have been developed along this line.
In this paper, we present a new pixel selection and prediction strategy for PEH generation in efficient RDH. Different from many previous works, our method allows a hider to preferentially and simultaneously use as many adjacent smooth elements as possible, which substantially benefits the data embedding procedure. Our experiments show that the proposed work ensures reversibility and significantly outperforms related works in terms of payload-distortion performance.
The rest of this paper is organized as follows. We revisit prior arts and clarify our motivation in Section II. We present the framework of the proposed work in Section III. The details of the proposed work are given in Section IV, followed by experiments and analysis in Section V. Finally, we conclude this paper in Section VI. This work is an extended version of the paper [33].

II. PRIOR ARTS REVISITED AND MOTIVATION
Many RDH works generate a Laplacian-like PEH [34], [35], which substantially benefits the payload-distortion performance. This requires a well-designed predictor. Regardless of the predictor design, it is desirable to choose smooth content for RDH, since the prediction accuracy there is very high; namely, many small PEs can be collected. Accordingly, a necessary task for advanced RDH systems is to identify smooth content. However, to reconstruct the embedded payload and the original content, the receiver must also be able to identify the smooth content. Thus, an RDH system should guarantee that the detection of marked locations is invertible.
There are two classical methods to deal with the aforementioned problem. One [14] is to divide the cover pixels into two sets, denoted by S_1 and S_2, where |S_1| ≈ |S_2|. We take Fig. 1 (a) as an example. The pixels in S_1 are used for predicting the pixels in S_2. The PEs of the pixels in S_2 constitute a PEH used for data embedding. Thereafter, one may use the marked S_2 to predict S_1 and generate another PEH to carry the rest of the payload. It is seen that the prediction accuracy may decline when using the marked S_2 to predict S_1, since the pixels in S_2 have been altered, which may harm the payload-distortion performance. The other method [26] is to set |S_1| ≪ |S_2|, for which the pixels in S_1 are usually unchanged (in some cases, one may use some pixels in S_1 to store system parameters, e.g., the secret key). We take Fig. 1 (b) as an example. One may use the pixels in the first row and first column to constitute S_1, with the rest belonging to S_2. Thus, the pixels in S_2 can be predicted and embedded in order, e.g., from left to right and from top to bottom. However, in some cases, this may not benefit the performance since the smoothness of pixels may not match the embedding order: assuming that ''Region I'' in Fig. 1 (b) is more complex than ''Region II'', if we process the cover pixels in order from left to right and from top to bottom, we alter ''Region I'' before ''Region II'', which is not desirable from the performance optimization view.
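The first division strategy can be sketched in a few lines; this is a minimal illustration assuming a standard checkerboard-style split (the exact pattern of Fig. 1 (a) may differ):

```python
def checkerboard_split(h, w):
    """Divide the pixel positions of an h x w image into two interleaved
    sets S1 and S2 of near-equal size, as in double-embedding schemes
    like [14]: S1 predicts S2 in the first pass, then the marked S2
    predicts S1 in the second pass."""
    s1 = [(i, j) for i in range(h) for j in range(w) if (i + j) % 2 == 0]
    s2 = [(i, j) for i in range(h) for j in range(w) if (i + j) % 2 == 1]
    return s1, s2
```

Since each pixel of one set is surrounded only by pixels of the other set, the full four-neighbor context is available for prediction, at the cost of the second pass predicting from already-marked pixels.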
One can use a local-complexity function [14] to evaluate the smoothness of a pixel so that smooth pixels can be embedded first. However, in many existing works, the prediction context for a pixel heavily relies on its neighbors. This means that, although a smooth pixel may be embedded, its neighbors, which may also be smooth pixels, must remain unchanged during embedding; e.g., in Fig. 1 (a), {u, v, w, x} are unchanged before predicting A. Namely, many adjacent smooth pixels may not be simultaneously used for data embedding, e.g., {u, v, A} in Fig. 1 (a) may not be embedded at the same time, which leaves room for improvement.
This motivates us to introduce a novel patch-level selection and breadth-first prediction strategy to take advantage of smooth pixels as much as possible. We expect to preferentially and simultaneously use many adjacent smooth pixels for data embedding, while a data receiver can identify the marked smooth pixels without error, ensuring that both the secret message and the original image content can be perfectly recovered. In the following sections, we introduce the proposed method in detail.

III. GENERAL FRAMEWORK
Given an image x = {x_{i,j}}_{h×w}, x_{i,j} ∈ I = {0, 1, ..., 2^d − 1}, d > 0, e.g., d = 8 for an 8-bit grayscale image, the proposed work embeds a payload m = {m_i}_{i=1}^{l} ∈ {0, 1}^l into x, resulting in a marked image y = {y_{i,j}}_{h×w}, y_{i,j} ∈ I. For compactness, we sometimes consider x as a set including all pixels and x_{i,j} as the pixel at position (i, j) whose value is x_{i,j}.
A patch is defined as a small matrix sized r × c, where 1 ≤ r ≤ h and 1 ≤ c ≤ w. The proposed work first collects a number of disjoint patches from x, each of which is then classified as either ''a smooth patch'' or ''a complex patch''. Only smooth patches are used for pixel prediction and data embedding; the complex patches are left unchanged. Then, by using the proposed breadth-first prediction procedure, the pixels to be potentially embedded in smooth patches are predicted, which allows a data hider to collect all required PEs. By computing the local complexity of each pixel to be potentially embedded, an ordered PE sequence can be generated. Thereafter, with optimized HS parameters, m can be successfully embedded into the corresponding PEH, finally resulting in a marked image y. Notice that, to avoid the underflow and overflow problem, a losslessly compressed location map is also self-embedded into the PEH together with m.
At the data receiver side, according to the necessary parameters extracted from y, one can successfully identify the marked pixels and determine the marked PE sequence. Thus, with the HS parameters extracted from y, m can be perfectly reconstructed. Meanwhile, the original image x can also be recovered according to the extracted location map and other necessary information. Fig. 2 shows a sketch of the proposed work. In Section IV, we detail each important part.

IV. DETAILS OF PROPOSED METHOD

A. PATCH-LEVEL SELECTION
The proposed patch-level selection procedure divides x into two disjoint subsets, denoted by S_A and S_B. Both S_A and S_B are further divided into two disjoint subsets, i.e., S_A = S_A^(0) ∪ S_A^(1) and S_B = S_B^(0) ∪ S_B^(1), where S_A^(0) will be unchanged and used for predicting the pixels in S_A^(1). With the patch size, by processing x from left to right and from top to bottom, we can collect N_p disjoint patches. Let p_k = {p_{i,j}^(k)}_{r×c} be the k-th patch, indexed in a row-by-row manner. For each p_k, we randomly generate a binary matrix b_k = {b_{i,j}^(k)}_{r×c} containing exactly α^(k) ones, where α^(k) ≥ 1 is a predetermined integer parameter. For each p_k, we determine a set s_k including the elements corresponding to 1 in b_k, namely,

s_k = {p_{i,j}^(k) | b_{i,j}^(k) = 1}.

For s_k, the difference between the maximum element and the minimum element can then be computed as

d^(k) = max(s_k) − min(s_k).

We classify p_k as a smooth patch if d^(k) ≤ β^(k); otherwise, p_k is considered a complex patch. Here, β^(k) ≥ 0 is also a predetermined integer parameter. A smooth patch is suitable for data embedding; a complex patch will not be embedded. Accordingly, S_A collects the pixels of smooth patches and S_B collects the remaining pixels; within S_A, the pixels corresponding to 1 in the binary matrices constitute S_A^(0), and the rest constitute S_A^(1). The patch-level selection procedure finally generates four disjoint pixel-sets depending on a sequence of parameters. In our implementation, all patches share the same α^(k) = α and β^(k) = β; thus, we only set two parameters.
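The patch classification above can be sketched as follows; a minimal sketch assuming α^(k) = α and β^(k) = β for all patches, with a seeded NumPy generator standing in for the key-controlled generation of the binary matrices:

```python
import numpy as np

def select_patches(x, r, c, alpha, beta, seed=0):
    """Classify disjoint r x c patches of image x as smooth or complex.

    For each patch, a keyed random binary matrix with `alpha` ones selects
    a sample set s_k; the patch is smooth if max(s_k) - min(s_k) <= beta.
    Returns a boolean map: True where the containing patch is smooth.
    """
    h, w = x.shape
    rng = np.random.default_rng(seed)          # stands in for the secret key
    smooth = np.zeros((h, w), dtype=bool)
    for i in range(0, h - r + 1, r):
        for j in range(0, w - c + 1, c):
            patch = x[i:i + r, j:j + c]
            # binary matrix b_k: exactly alpha entries set to 1
            idx = rng.choice(r * c, size=alpha, replace=False)
            s_k = patch.reshape(-1)[idx]
            if int(s_k.max()) - int(s_k.min()) <= beta:
                smooth[i:i + r, j:j + c] = True
    return smooth
```

A flat region always passes the test (d^(k) = 0), while a patch whose sampled values spread more than β is rejected regardless of which α positions the key selects.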

B. BREADTH-FIRST PREDICTION
Once the four pixel-sets are collected, we predict the pixels in S_A^(1) and collect PEs for data embedding. Since we expect to simultaneously use adjacent smooth pixels for data embedding, we cannot directly use conventional predictors. We propose a breadth-first prediction algorithm to determine the prediction values of the pixels in S_A^(1). The proposed algorithm corresponds to an iterative process. During the prediction process, we select a specific pixel from S_A and predict all of its adjacent non-predicted pixels belonging to S_A^(1) simultaneously. The ''breadth'' prediction process terminates when all required pixels are predicted. This ''breadth-first'' perspective requires a low computational cost, i.e., O(4|S_A|), since each pixel is selected and processed at most once.
Let x̂_{i,j} denote the prediction value of x_{i,j}. We first append all pixels in S_A^(0) to an empty queue Q, and mark the pixels in S_A^(0) as processed. Then, iteratively, we select a pixel x_{i,j} from Q and perform the following steps until Q is empty.
Step 1) Collect all the adjacent pixels of x_{i,j}. The collected pixels should be in S_A^(1) and must not have been marked as processed.
Step 2) For each collected pixel x_{u,v}, set its prediction value to x̂_{u,v} = x_{i,j} if x_{i,j} ∈ S_A^(0), or x̂_{u,v} = x̂_{i,j} otherwise.
Step 3) Append all collected pixels (if any) to Q, and remove x_{i,j} from Q. Mark the collected pixels as processed.
Algorithm 1 shows the pseudocode.

Algorithm 1 Breadth-First Prediction
1: append all pixels in S_A^(0) to an empty queue Q and mark them as processed
2: while Q is not empty do
3:    select a pixel x_{i,j} from Q, collect all its adjacent pixels (if any) in S_A^(1) that have not been processed
4:    for each collected pixel x_{u,v} (if any) do
5:       set x̂_{u,v} to the (original or prediction) value of x_{i,j}
6:    end for
7:    append all collected pixels (if any) to Q, and remove x_{i,j} from Q
8:    mark all collected pixels (if any) as processed
9: end while
10: return prediction values of pixels in S_A^(1)

We take Fig. 3 for explanation. Fig. 3 (a) shows the original pixel values. The patch size is 3 × 3, and α = 2. The pixels of S_A^(0) are first appended to Q. Then, we select x_{1,3} from Q and compute x̂_{1,2} = x̂_{1,4} = x̂_{2,3} = x_{1,3} = 162. We further update Q as shown in Fig. 3 (b). We continue by selecting x_{2,6} from Q and computing x̂_{1,6} = x̂_{2,5} = x̂_{3,6} = x_{2,6} = 157, which is described in Fig. 3 (c). Fig. 3 (c) also shows the updated Q after processing. Fig. 3 (d, e, f, g) show the subsequent prediction operations, e.g., in Fig. 3 (f), x_{1,1} is predicted from x_{1,2} by x̂_{1,1} = x̂_{1,2}. As shown in Fig. 3, we thereafter select x_{2,3} from Q. However, we will not use x_{2,3} to predict its neighbors since all its neighbors have been processed. Moreover, no pixel is appended to Q, and x_{2,3} is simply removed from Q. The same operation is applied in order to x_{1,6}, x_{2,5} and x_{3,6}. In this way, we finally have Q = {x_{3,2}, x_{2,4}, x_{3,5}, x_{1,1}, x_{2,2}, x_{1,5}}.
We continue by selecting x_{3,2} from Q and computing x̂_{3,1} = x_{3,2} = 162. Fig. 3 (h) shows the corresponding result. Then, x_{2,4} and x_{3,5} are not used to predict their neighbors, and Q is updated to {x_{1,1}, x_{2,2}, x_{1,5}, x_{3,1}}. The subsequent operations are straightforward. Fig. 3 (f) shows all the required prediction values, which are close to the raw values.
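The breadth-first procedure above can be sketched as follows; a minimal sketch over pixel coordinates, where the seed set plays the role of S_A^(0) and the coordinates in the usage below are hypothetical (not the exact layout of Fig. 3):

```python
from collections import deque

def breadth_first_predict(values, seeds, embeddable):
    """Breadth-first prediction: each embeddable pixel inherits the value
    (or prediction value) of the pixel from which it is first reached.

    values:     dict {(i, j): pixel value} for seed pixels in S_A^(0)
    seeds:      iterable of (i, j) positions forming S_A^(0)
    embeddable: set of (i, j) positions forming S_A^(1)
    Returns a dict {(i, j): prediction value} covering the reachable
    embeddable pixels.
    """
    pred = {}
    processed = set(seeds)
    q = deque(seeds)
    while q:
        i, j = q.popleft()
        v = pred.get((i, j), values.get((i, j)))   # seed value or prediction
        for u, w in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (u, w) in embeddable and (u, w) not in processed:
                pred[(u, w)] = v           # Step 2: propagate the value
                processed.add((u, w))      # Step 3: mark as processed
                q.append((u, w))
    return pred
```

With a single seed at (0, 2) of value 162 and embeddable neighbors {(0, 1), (0, 3), (1, 2), (0, 0)}, the three adjacent pixels are predicted first, and (0, 0) then inherits the same value through (0, 1), mirroring the chained prediction of Fig. 3.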
The prediction algorithm ensures that a pixel in S_A^(1) is predicted from its nearest pixel belonging to S_A^(0). Obviously, the Manhattan distance between the two is at least 1, and its upper bound is r + c. When r + c is kept low, the prediction accuracy will be satisfactory. This relies on the assumption that strong spatial correlations exist between smooth pixels separated by a small distance.

C. DATA EMBEDDING
After predicting all the pixels in S_A^(1), we can determine their PEs. For each x_{i,j} ∈ S_A^(1), the PE is defined as:

e_{i,j} = x_{i,j} − x̂_{i,j}.

We need to generate an ordered PE sequence for embedding m. It is quite desirable to further sort the PEs according to their local complexities so that small PEs can be used first. This perspective has been utilized in many existing works [11], [14]. We use the prediction values of neighboring pixels to compute the local complexity of each pixel to be embedded. Notice that, though the breadth-first search order may not be unique, it can be controlled by a secret key. That means, by using the identical procedure, the data receiver is able to produce the identical predictions.
A smaller local complexity implies better prediction accuracy. Accordingly, we can sort all PEs in increasing order of local complexity so that small PEs are embedded first. We use e = {e_i}_{i=1}^{n_e} to represent the sorted PE sequence. m will be embedded into e by using two pairs of peak-zero points. Let (l_p, l_z) and (r_p, r_z) denote the two pairs of peak-zero points, satisfying l_z < l_p < r_p < r_z. For a secret bit b ∈ {0, 1} and a PE e_i ∈ e, the marked PE ê_i is produced by:

ê_i = e_i − b, if e_i = l_p;
ê_i = e_i + b, if e_i = r_p;
ê_i = e_i − 1, if l_z < e_i < l_p;
ê_i = e_i + 1, if r_p < e_i < r_z;
ê_i = e_i, otherwise.

ê_i is then added to the prediction value of the corresponding pixel so as to generate the marked pixel. Note that we terminate the data embedding procedure once m is fully embedded. This means that there is a PE position t ≤ n_e such that all PEs {e_i | i > t} are unchanged. It is necessary to optimize (l_p, l_z) and (r_p, r_z) so that the distortion caused by embedding m into {e_i}_{i=1}^{t} is kept low. For a specific t, we can easily determine the corresponding PEH, i.e.,

h(v) = |{e_i | e_i = v, 1 ≤ i ≤ t}|.

It is inferred that, to carry m, we have h(l_p) + h(r_p) = l, where l is the length of m. Moreover, e_t ∈ {l_p, r_p}. It is always assumed that h(l_z) + h(r_z) = 0 for natural images. Since m looks like a random bit-stream, we can estimate the data embedding distortion as

D = (h(l_p) + h(r_p))/2 + Σ_{l_z < v < l_p} h(v) + Σ_{r_p < v < r_z} h(v).     (13)

We call t usable if there exist (l_p, l_z) and (r_p, r_z) satisfying h(l_p) + h(r_p) = l and e_t ∈ {l_p, r_p}. Accordingly, we can collect all usable t, which can be done while orderly collecting the PEs [11]. For each usable t, we can further determine the suboptimal (l_p, l_z) and (r_p, r_z) by minimizing Eq. (13) with a complexity of O(4|I|). Then, we can obtain the globally optimal (l_p, l_z) and (r_p, r_z) over the different usable t with a complexity of O(n_e). In this way, m can be embedded. Note that, when the payload is large, one may use multiple PEH bins to carry the secret payload, and the chosen bins can be optimized as well.
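The two-sided shifting rule, together with its inverse used at the receiver, can be sketched as follows; a minimal sketch assuming the payload spans every peak occurrence in the sequence and that the bins at l_z and r_z are empty:

```python
def hs_embed(pes, bits, lp, lz, rp, rz):
    """Embed bits into a PE sequence with two peak-zero pairs,
    assuming lz < lp < rp < rz and empty bins at lz and rz."""
    assert lz < lp < rp < rz
    out, k = [], 0
    for e in pes:
        if e == lp and k < len(bits):
            out.append(e - bits[k]); k += 1      # embed at the left peak
        elif e == rp and k < len(bits):
            out.append(e + bits[k]); k += 1      # embed at the right peak
        elif lz < e < lp:
            out.append(e - 1)                    # shift toward the left zero
        elif rp < e < rz:
            out.append(e + 1)                    # shift toward the right zero
        else:
            out.append(e)
    return out, k

def hs_extract(marked, lp, lz, rp, rz):
    """Invert hs_embed: recover the embedded bits and the original PEs."""
    bits, pes = [], []
    for e in marked:
        if e == lp or e == rp:
            bits.append(0); pes.append(e)
        elif e == lp - 1:
            bits.append(1); pes.append(lp)
        elif e == rp + 1:
            bits.append(1); pes.append(rp)
        elif lz <= e < lp:
            pes.append(e + 1)                    # undo the left shift
        elif rp < e <= rz:
            pes.append(e - 1)                    # undo the right shift
        else:
            pes.append(e)
    return bits, pes
```

For instance, with (l_p, l_z) = (−1, −4) and (r_p, r_z) = (0, 3), embedding [1, 0, 1, 0, 1] into [0, 1, −1, 2, −2, 0, 1, −1, 0] yields a marked sequence from which the bits and the original PEs are recovered exactly.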

D. PARAMETER EMBEDDING AND PARAMETER EXTRACTION
For reversibility, the system parameters, including the secret key, r, c, α, β, (l_p, l_z) and (r_p, r_z), are embedded into the LSBs of some pixels belonging to S_A. We use 16 bits to store the secret key; r (4 bits), c (4 bits), α (6 bits) and β (4 bits) are stored with 18 bits; and 4d + 4 bits are used to store l_p, l_z, r_p and r_z, each of which requires d + 1 bits. In addition, before data embedding, the boundary pixels with a value of 0 or 2^d − 1 in S_A^(1) are adjusted into the range [1, 2^d − 2] and recorded in a location map, which is losslessly compressed by arithmetic coding. The compressed location map is also considered part of m. Thus, the size of the pure payload is L = l − 38 − 4d − C_LM, where C_LM is the size of the losslessly compressed location map. In most cases, C_LM for a natural image is quite small; e.g., the standard Lena image has no boundary pixels in our experiments, meaning that its impact on L can be roughly ignored. In our experiments, we use l as the payload measurement. After obtaining the marked image y, a receiver should first extract the embedded parameters for the subsequent operations. The parameter extraction operation is straightforward.
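The overhead accounting above can be checked directly; a small sketch:

```python
def pure_payload(l, d, c_lm):
    """Pure payload L = l - 38 - 4d - C_LM: the total embedded l minus
    16 bits (secret key) + 18 bits (r, c, alpha, beta) + 4(d+1) bits
    (two peak-zero pairs), plus the compressed location map of c_lm bits."""
    overhead = 16 + 18 + 4 * (d + 1)   # = 38 + 4d
    return l - overhead - c_lm
```

For an 8-bit image (d = 8) without boundary pixels, the fixed overhead is 70 bits.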

E. DATA EXTRACTION AND IMAGE RECOVERY
By extracting (r, c) from the LSBs of some pixels, a receiver can collect all N_p patches. The N_p binary matrices are computed from the secret key and α. With β and the binary matrices, the receiver can identify the smooth patches and accordingly determine the four pixel-sets. By repeating the breadth-first prediction procedure, the receiver obtains the identical prediction values, and, with the extracted peak-zero points, each marked PE can be mapped back to its original value e_i while the embedded bits are extracted. Each recovered e_i is added to the prediction value of the corresponding pixel to reconstruct the preprocessed pixel value at the sender side. With m, the receiver can further obtain the original LSBs of specific pixels in S_B^(1) as well as the losslessly compressed location map, which allows x to be finally recovered without error. Obviously, the secret message can be parsed as well.

V. EXPERIMENTS AND ANALYSIS
In this section, we conduct experiments for evaluation. We first take the standard images Airplane, Lena, Tiffany, Peppers, Baboon, and Boat shown in Fig. 4, varying from smooth to complex. We apply the proposed patch-level selection procedure to these images. Fig. 5 shows the resulting distribution maps. It is seen that the proposed work selects the smooth content of an image, and the selected pixels to be potentially embedded can be adjacent to each other. This implies that many adjacent smooth pixels can be simultaneously embedded, which has the potential to provide superior performance. It is also observed that, due to content diversity, different images have different numbers of smooth pixels; e.g., Baboon has fewer smooth pixels than the other images. This indicates that, in general, different images result in different payload-distortion performance. From the performance optimization view, to provide satisfactory performance, it is necessary to choose suitable parameters so that the payload can be fully carried and extracted without error, while the distortion is kept as low as possible.

A. PARAMETER SELECTION
We should guarantee that the preselected parameters, i.e., (r, c, α, β), enable a hider to generate a sharp PEH that can fully carry the payload. This can be done by tuning (r, c, α, β) within a small range. On the other hand, the aforementioned HS optimization algorithm can be adopted to keep the distortion low. Actually, the setting of (r, c, α, β) affects not only the size of the embeddable payload but also the performance, since different (r, c, α, β) often result in different S_A with different statistical characteristics. Accordingly, it is desirable to analyze the statistical characteristics of S_A under different (r, c, α, β), which can guide us in selecting suitable parameters.
For an image, we collect the four disjoint pixel-sets. Since only S_A^(1) carries m, we define the overall smoothness as

ρ = (1/|S_A^(1)|) Σ_{x_{i,j} ∈ S_A^(1)} |x_{i,j} − μ_{i,j}|,     (15)

where

μ_{i,j} = (x_{i−1,j} + x_{i+1,j} + x_{i,j−1} + x_{i,j+1})/4     (16)

is the average value of the neighbors. It is easy to slightly modify Eq. (16) when a pixel position falls outside the image. We apply the pixel selection algorithm to Airplane and Baboon, and compute the corresponding overall smoothness. Fig. 6 shows the results for different (α, β) in the cases r = c = 4 and r = c = 8. For the smooth image Airplane, for a fixed α, ρ converges to a certain level as β increases. However, for the complex image Baboon, ρ becomes relatively larger as β increases, especially for r = c = 8. A small ρ can be achieved by using small α and β; a small ρ also implies that the selected pixels are smoother. Therefore, for RDH, we could use small α and β. Since different images may call for different α and β, from the performance optimization view, it is desirable to optimize α and β within a small range. Based on this perspective, in our experiments, we vary α from 1 to rc/2, and β from 0 to 8 for all images. The (α, β) resulting in the best performance is chosen for RDH. Notice that the parameter optimization is only performed by the data hider; the receiver only extracts the parameters.
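The overall smoothness can be sketched as follows; a minimal sketch assuming ρ averages, over the selected pixels, the absolute deviation from the four-neighbor mean of Eq. (16), with neighbors outside the image simply dropped:

```python
import numpy as np

def overall_smoothness(x, mask):
    """Mean absolute deviation of the masked pixels (standing in for
    S_A^(1)) from the average of their four neighbors; smaller values
    indicate smoother selected content."""
    h, w = x.shape
    total, n = 0.0, 0
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            nb = [x[u, v] for u, v in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                  if 0 <= u < h and 0 <= v < w]   # drop out-of-image neighbors
            total += abs(float(x[i, j]) - sum(nb) / len(nb))
            n += 1
    return total / n if n else 0.0
```

A flat region gives ρ = 0, while a maximally oscillating pattern gives the full pixel difference, consistent with small ρ indicating good candidates for embedding.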
It is also seen that the smoothness values in the cases r = c = 4 and r = c = 8 are close to each other (though Baboon has a relatively larger fluctuation of ρ than Airplane). This indicates that small r and c correspond to similar smoothness. Moreover, as mentioned in Section IV-B, small r and c ensure that a pixel can be predicted from a pixel (in S_A^(0)) at a small Manhattan distance from the present pixel, which benefits the data embedding operation. Based on this perspective, in our experiments, we vary both r and c from 1 to 8, requiring r × c ≥ 2. We also use 2α ≤ r × c, which ensures that |S_A^(1)| is large enough to provide a sufficient payload. After applying the patch-level selection procedure, we predict the pixels in S_A^(1). Since the pixels in S_A^(1) are relatively smooth, the prediction accuracy will be high. We take Airplane for example. Fig. 7 shows the PEHs of Airplane for different β in the case r = c = α = 4. It is seen that the resulting PEHs are quite sharp. We further identify the pixel positions in S_A^(1) having a PE value of ''0''. Fig. 8 shows the distribution map with r = c = α = β = 4, where the white pixels have a PE value of ''0''. It is observed that many adjacent pixels have a PE of ''0'', implying that the proposed work has a good ability to simultaneously use adjacent smooth pixels.
In addition, we use a secret key to randomly generate the binary matrices. Different binary matrices correspond to different divisions of the cover. We take Airplane and Baboon for statistical analysis. For specific (α, β, r, c), we randomly generate a key and produce the binary matrices 100 times; each time, we determine the overall smoothness. Fig. 9 shows the statistical results. It is seen that, for fixed (α, β, r, c), the overall smoothness values obtained from different divisions are relatively stable, implying that the impact of randomly generating the binary matrices can be roughly ignored. It is also observed that Baboon has a relatively larger fluctuation of the overall smoothness compared with Airplane, which is due to the complex content of Baboon and the working mechanism of the proposed work. This indicates that, due to between-image differences, we need to optimize the parameters (α, β, r, c), as analyzed previously.

B. PERFORMANCE EVALUATION
The above analysis shows that our work has the potential to accurately predict smooth pixels and exploit them for RDH. To evaluate the payload-distortion performance, we compare the proposed work with several recently reported HS-based RDH works introduced by Dragoi et al. [27], Hsu et al. [35], Luo et al. [29], Sachnev et al. [14], Tsai et al. [9] and Hong et al. [26]. In our experiments, we use a random bit-string as the secret message. Fig. 10 shows the comparison results. It is observed that the PSNR gain is significant at relatively low embedding rates for the test images, which indicates that the proposed work significantly outperforms the related works in terms of payload-distortion performance. When the size of the payload increases, the performance declines, resulting in a lower PSNR gain.
The performance for Baboon declines significantly, resulting in relatively poor performance at high embedding rates. This is due to the working mechanism of the proposed work. In detail, the proposed work aims to use smooth content for data embedding. However, due to between-image differences, different images have different smoothness, meaning that different images result in different performance (which is always true for content-dependent works). Moreover, the proposed pixel prediction algorithm is effective for smooth images since, in such images, the values of adjacent pixels (or pixels at a small distance from each other) are quite close to each other. However, for a complex image, e.g., Baboon, the difference between two adjacent pixels can be very large, meaning that the number of smooth pixels is limited, which does not benefit the proposed prediction procedure. From a real-world viewpoint, most natural images are likely to have relatively sufficient smooth pixels, indicating that the proposed work has the potential to provide superior performance.

C. SYSTEM SECURITY
Many conventional RDH methods use a prespecified content-aware procedure, which, however, according to Kerckhoffs's principle, may allow an unauthorized receiver to identify the marked pixels and further reconstruct the embedded message. E.g., the division patterns shown in Fig. 1 may not be secure, especially when the content-dependent operations are fixed in advance. In the proposed work, a key is used to generate a random binary matrix for each patch. This ensures that one cannot fully distinguish between marked pixels and non-marked pixels without the required parameters. It is therefore almost impossible for an unauthorized receiver to reconstruct the embedded message within a short time, which demonstrates the security of the proposed work. In short, a goal of generating random binary matrices is to secure the scheme.

VI. CONCLUSION AND FUTURE WORK
Improving the utilization of smooth content has been a core topic in RDH. This motivated us to introduce a novel patch-level selection and breadth-first prediction strategy to take advantage of smooth pixels as much as possible. The proposed system divides a cover image into a number of disjoint patches, which are classified as smooth patches and complex patches. The pixels in smooth patches have strong spatial correlations and are therefore used for data embedding, since the PEs can be kept low. The prediction value of each cover pixel is determined from the pixel in a specific pixel-set that has the minimum Manhattan distance to it, which relies on the reasonable assumption that the values of adjacent smooth pixels are rather close to each other. Unlike many traditional works, the proposed work enables a data hider to preferentially and simultaneously use many adjacent smooth pixels to carry a payload, which substantially benefits the payload-distortion performance. Experiments have shown that the proposed work provides superior performance at low embedding rates. A future direction is to improve the performance on complex images at high embedding rates.