A High-Capacity Reversible Data Hiding Scheme Using Dual-Channel Audio

In recent years, reversible data hiding (RDH) based on dual stego covers has developed rapidly because of its high capacity and low distortion. In the image case, however, transmitting two stego versions of the same image consecutively tends to draw the attention of adversaries. In this article, we propose a high-capacity RDH scheme using dual-channel audio that exploits the natural dual-channel property of stereo audio. Specifically, we first convert the secret message into novenary digits, which increases the embedding capacity. Then, a magic matrix is used to embed the secret digits into a single-channel audio to generate two single-channel stego-audios. Finally, the two single-channel stego-audios are combined into a conventional dual-channel audio. Extensive experiments demonstrate that the proposed method significantly improves the stego quality (the SNR is improved by 16% on average) compared with state-of-the-art methods.


I. INTRODUCTION
Data hiding is a technique for embedding secret messages in digital media; it exploits the redundancy of human perception to achieve covert communication. Common traditional steganographic schemes include LSB [1], F5 [2] and STC [3]. However, these schemes cause permanent distortion of the original cover, which is not acceptable in some critical scenarios [4]. To protect such signals, reversible data hiding (RDH) [5] was developed. In addition to covert communication, RDH can make the cover erasable so that the storage space can be used repeatedly. Furthermore, there are other applications [6]: reversible adversarial examples, reversible visual transformation and reversible image processing.
The existing RDH schemes can be roughly divided into the following four categories: difference expansion (DE), histogram shifting (HS), pixel value ordering (PVO) and dual-image.
Difference expansion (DE)-based: difference expansion (DE) [7], [8] was first proposed by Tian et al. The later prediction error expansion (PEE) methods [9], [10] are based on DE. The core of these methods is to first generate a small prediction error from the correlation between adjacent pixels, then convert this prediction error into binary format, and finally embed the secret message bit in the least significant bit of the binary prediction error by expansion. The accuracy of the prediction mechanism directly affects the performance of such algorithms: when the prediction error is large, its expansion incurs large distortion.
The associate editor coordinating the review of this manuscript and approving it for publication was Jingchang Huang.
Histogram shifting (HS)-based: this category includes histogram shifting (HS) [11], difference histogram shifting (DHS) [12] and prediction error histogram shifting (PEHS) [13]. This type of scheme first generates a histogram of pixels, pixel differences or prediction errors, and then finds the peak and zero points of the histogram. The values between the peak point and the zero point are shifted by one to make room, and the secret bits are then embedded at the peak point. Because embedding relies on the values between the peak point and the zero point, the embedding capacity is often limited.
Pixel value ordering (PVO)-based: Li et al. [14] presented a block-based RDH method that combines PEE with pixel value ordering (PVO). The pixel value order within a block is kept unaltered because the maximum in a block is increased by at most one while the minimum is decreased by at most one, so the distortion is lowered. However, the method only uses pixels whose prediction error is equal to one, so the embedding capacity is also lower. Hence, Peng et al. [15] proposed an improved PVO (IPVO) method that uses pixels whose prediction error is equal to zero or one for embedding. This increases the embedding capacity to some extent, and much subsequent research [16], [17] builds on these two methods. Low embedding capacity nevertheless remains a weakness of these methods.
RDH based on dual images: the above traditional methods suffer from either low embedding capacity or high distortion. Chang et al. [18] therefore proposed an RDH method with dual images that achieves large capacity while guaranteeing good image quality. In their scheme, two base-5 secret digits are embedded individually into two stego-images by using a specially designed magic matrix. The scheme achieves a considerable embedding rate (ER) of about 1.0 bit per pixel (bpp) and provides good stego-image quality, with an average PSNR of 45 dB. Because one quinary digit can represent two secret bits whereas one novenary digit can represent three, the quinary representation of the data bits was replaced in [19] with a novenary representation in 2013. This increased the embedding capacity to about 1.5 bpp, while the quality of the two stego-images remained around 39 dB. In 2015, Qin et al. [20] proposed an RDH scheme based on exploiting modification direction (EMD) to generate two stego-images. The algorithm achieved an embedding rate slightly above 1 bpp; however, the quality of the two stego-images was asymmetric. In addition, there are some works [21], [22] on further increasing the embedding capacity.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, most of the aforementioned methods are designed for images or for single-channel audio [10], [23]; only a few works consider dual-channel audio as the stego cover. In this article, we propose a novel RDH scheme for dual-channel audio. We first convert the binary secret message into novenary digits and then create a two-dimensional magic matrix made up of the numbers 0 to 8. We take a single-channel audio as the cover and use one sample pair to conceal one secret digit, generating a dual-channel stego-audio. Because a secret digit can represent three secret bits and the secret message is hidden entirely in the left channel, the embedding rate can reach 1.5 bits per sample (bps). Compared with existing schemes, our method achieves lower distortion.
The rest of the paper is organized as follows: Section II introduces the related work; Section III presents the proposed method, including the embedding algorithm and the extraction algorithm; Section IV reports the experimental results and analysis; Section V concludes this work.

II. RELATED WORKS
In this section, we introduce one typical scheme for reversible data hiding based on dual images [19]. Chang et al. [19] developed a dual-image reversible data hiding method based on a magic matrix. The magic matrix is generated by $M(x_1, x_2) = (x_1 + 3 x_2) \bmod 9$, where the parameters $x_1$ and $x_2$ denote two pixels in the range [0, 255], and $M(x_1, x_2)$ represents the secret data; the matrix is shown in Fig. 1. The embedding algorithm is described below.
First, extract $n$ secret bits from the secret data $S$ and transform them into a decimal value $d$, where the initial value of $n$ is 4. If $d$ is not equal to 8, decrease $n$ by one, so that only the first three bits are used and $d$ lies in [0, 7]. Then, extract one pixel $x_1$ from the cover image $I$ and copy it as $x_2 = x_1$. Finally, embed the novenary digit $d$ into the pixel pair $(x_1, x_2)$ according to Table 1, where $M = (x_1 + 3 x_2) \bmod 9$; after embedding, the first element of the pair becomes the pixel of the first stego-image and the second element becomes the pixel of the second stego-image. The extraction and recovery algorithm is described below.
First, extract two pixels $x_1$ and $x_2$ from the two stego-images. Next, obtain the novenary digit $d$ by $d = (x_1 + 3 x_2) \bmod 9$. Then, if $d$ is greater than 7, it is transformed into four secret bits; otherwise, it is transformed into three secret bits. Finally, calculate the integer average of the two stego-pixels $x_1$ and $x_2$ to derive the original pixel. As one can see, when the pixel value is in {0, 1, 2, 3, 252, 253, 254, 255}, the pixel may fall outside the valid range [0, 255] after the embedding operation. Therefore, the embedding capacity is affected. In addition, the current pixel pair is replaced by one of its four upper-left or bottom-right pixel pairs whose corresponding value equals the secret data, so the distortion caused by this method is relatively large. Therefore, there is still room for improvement.
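As an illustration, the variable-length mapping between secret bits and novenary digits described above can be sketched in Python. The sketch rests on one reading of the rule, namely that four bits are consumed only when they read 1000 (d = 8) and three bits are consumed otherwise. The function names and the zero-padding of a short tail are assumptions made for illustration and are not taken from [19].

```python
def bits_to_digits(bits: str) -> list[int]:
    """Parse a bit string into novenary digits (4 bits for d = 8, else 3 bits)."""
    digits = []
    i = 0
    while i < len(bits):
        if bits[i:i + 4] == "1000":
            # The four bits 1000 equal 8, so all four are consumed.
            digits.append(8)
            i += 4
        else:
            # Otherwise only three bits are consumed, giving d in [0, 7].
            # A tail shorter than three bits is zero-padded (assumption).
            chunk = bits[i:i + 3].ljust(3, "0")
            digits.append(int(chunk, 2))
            i += 3
    return digits


def digits_to_bits(digits: list[int]) -> str:
    """Inverse mapping: d = 8 yields four bits, any other digit yields three."""
    return "".join("1000" if d == 8 else format(d, "03b") for d in digits)


print(bits_to_digits("1000101"))   # [8, 5]
print(digits_to_bits([8, 5]))      # 1000101
```

Under this reading, a digit carries three bits in most cases and four bits only when it equals 8.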

III. RDH OF DUAL-CHANNEL AUDIO
In this section, we propose a new reversible data hiding mechanism for audio based on a magic matrix. The proposed algorithm takes a single-channel audio as the cover and generates a dual-channel audio after embedding. The complete flow chart of the algorithm is shown in Fig. 2. Section III-A introduces the embedding algorithm, and Section III-B introduces the extraction and recovery algorithm.

A. EMBEDDING ALGORITHM
To guarantee the embedding capacity, we convert the binary secret data into novenary digits. A novenary digit can represent at most $\log_2 9 \approx 3.17$ binary bits, which significantly increases the embedding capacity.
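A minimal sketch of this conversion is given below. It assumes that the whole bit string is read as one integer and rewritten in base 9; how long messages are segmented is not specified here, and the function names are hypothetical.

```python
def binary_to_novenary(bits: str) -> list[int]:
    """Rewrite a binary string as a list of base-9 digits, most significant first."""
    value = int(bits, 2)
    digits = []
    while value > 0:
        value, remainder = divmod(value, 9)
        digits.append(remainder)
    # An all-zero input maps to the single digit 0.
    return digits[::-1] or [0]


def novenary_to_binary(digits: list[int], n_bits: int) -> str:
    """Inverse conversion; n_bits restores any leading zeros of the message."""
    value = 0
    for digit in digits:
        value = value * 9 + digit
    return format(value, f"0{n_bits}b")


print(binary_to_novenary("101101"))     # [5, 0]
print(novenary_to_binary([5, 0], 6))    # 101101
```

The call reproduces the example used in Section III-B, where $(101101)_2$ corresponds to $(50)_9$. The receiver needs the message length `n_bits` to restore leading zeros, which is a design choice of this sketch.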
For ease of calculation, we shift the samples into the range [0, 65535] by adding 32768 to each sample. Before embedding, we create a magic matrix $M$ of size 65536 × 65536, where $x_1, x_2 \in [0, 65535]$ represent audio samples; the magic matrix $M$ is shown in Fig. 3. From Fig. 3, we can see that any grid of size 3 × 3 includes all the digits 0 to 8. We use a pair of samples to conceal one novenary secret digit. When the value $M(x_1, x_2)$ of the current sample pair $(x_1, x_2)$ is not equal to the secret digit $d$, the current sample pair is replaced by the one of its eight surrounding sample pairs $(t_1, t_2)$ whose value $M(t_1, t_2)$ equals $d$. After this operation, the left-channel audio is obtained as $X_l = (t_1, t_2)$. When $M(x_1, x_2)$ is equal to $d$, the sample pair is left unchanged and the left-channel audio is obtained as $X_l = (x_1, x_2)$. To ensure invertibility, we calculate $c = (M(x_1, x_2) + d) \bmod 9$. Clearly, $c$ is between 0 and 8. We find a sample pair $(h_1, h_2)$ around the sample pair $(x_1, x_2)$ whose value $M(h_1, h_2)$ is equal to $c$. Then, the right-channel audio is obtained as $X_r = (h_1, h_2)$.
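The embedding of one digit can be sketched as follows. The sketch assumes that the magic matrix has the form $(x_1 + 3 x_2) \bmod 9$ used in [19], which agrees with the values $M(5, 5) = 2$, $M(5, 6) = 5$ and $M(4, 4) = 7$ of the example in Section III-B. It also assumes $c = (M(x_1, x_2) + d) \bmod 9$ as in that example. Samples at the boundary of [0, 65535] are not handled, and all function names are hypothetical.

```python
def magic(x1: int, x2: int) -> int:
    # Assumed form of the magic matrix; it matches the worked example values.
    return (x1 + 3 * x2) % 9


def pair_with_value(x1: int, x2: int, target: int) -> tuple[int, int]:
    """Return the pair in the 3 x 3 window centred on (x1, x2) whose value is target."""
    for t1 in (x1 - 1, x1, x1 + 1):
        for t2 in (x2 - 1, x2, x2 + 1):
            if magic(t1, t2) == target:
                return t1, t2
    raise ValueError("every 3 x 3 window should contain the digits 0 to 8")


def embed_digit(x1: int, x2: int, d: int):
    """Embed one novenary digit d into the cover pair (x1, x2).

    Returns the left-channel pair and the right-channel pair.
    Boundary samples (0 or 65535) are not handled in this sketch.
    """
    # Left channel: the pair whose value is d (the cover pair itself if it already matches).
    left = pair_with_value(x1, x2, d)
    # Right channel: the pair whose value is c, which makes recovery possible.
    c = (magic(x1, x2) + d) % 9
    right = pair_with_value(x1, x2, c)
    return left, right


print(embed_digit(5, 5, 5))   # ((5, 6), (4, 4))
```

Because each digit occurs exactly once in any 3 × 3 window under the assumed matrix, the search returns a unique pair and each sample changes by at most one.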

B. EXTRACTION AND RECOVERY ALGORITHM
After receiving the dual-channel stego-audio, we use the left channel to extract the secret digits. First, extract a sample pair $(x^l_1, x^l_2)$ from the left-channel audio $X_l$. The corresponding value $M(x^l_1, x^l_2)$ is the secret digit $d$. To restore the original audio, we extract a sample pair $(x^r_1, x^r_2)$ from the right-channel audio $X_r$ and obtain $c = M(x^r_1, x^r_2)$. If $c$ is greater than or equal to $d$, the original sample pair $(x_1, x_2)$ is the neighbouring pair whose value satisfies $M(x_1, x_2) = c - d$; otherwise, it satisfies $M(x_1, x_2) = c - d + 9$, which follows from $c = (M(x_1, x_2) + d) \bmod 9$.
In our method, each audio sample is modified by at most one. For example, suppose the binary secret data is $(101101)_2$, whose corresponding novenary secret digits are $(50)_9$. The digit $d = 5$ is embedded first. Suppose the original sample pair of the audio $X$ is (5, 5). Since $M(5, 6) = 5$ among the pairs surrounding $(x_1, x_2)$, the left-channel stego-audio $X_l$ is (5, 6). We then calculate $c = (M(5, 5) + 5) \bmod 9 = 7$; because $M(4, 4) = 7$, the right-channel stego-audio $X_r$ is (4, 4). When extracting the secret data, we extract the sample pair (5, 6) from the left-channel audio $X_l$, and the secret digit is $d = M(5, 6) = 5$. Then we extract the sample pair (4, 4) from the right-channel audio $X_r$ and obtain $c = M(4, 4) = 7$. Because $c > d$ and $M(5, 5) = 7 - 5 = 2$, the original audio $X$ is (5, 5). The second digit $d = 0$ is handled in the same way.
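The extraction and recovery of one sample pair can be sketched in the same spirit. The sketch assumes the magic matrix $(x_1 + 3 x_2) \bmod 9$, which matches the values in the example above, and it assumes that the original pair is searched in the 3 × 3 window centred on the right-channel pair. Both points, as well as the function names, are assumptions for illustration.

```python
def magic(x1: int, x2: int) -> int:
    # Assumed form of the magic matrix; it matches the worked example values.
    return (x1 + 3 * x2) % 9


def extract_and_recover(left: tuple[int, int], right: tuple[int, int]):
    """Return the secret digit d and the original sample pair.

    left and right are the stego sample pairs of the two channels.
    """
    d = magic(*left)            # the left channel carries the secret digit
    c = magic(*right)           # the right channel carries c = (M(x1, x2) + d) mod 9
    target = c - d if c >= d else c - d + 9   # value of the original pair
    r1, r2 = right
    # Assumption: the original pair lies in the 3 x 3 window around the right pair,
    # where each digit 0 to 8 occurs exactly once.
    for x1 in (r1 - 1, r1, r1 + 1):
        for x2 in (r2 - 1, r2, r2 + 1):
            if magic(x1, x2) == target:
                return d, (x1, x2)
    raise ValueError("no pair with the required value was found")


print(extract_and_recover((5, 6), (4, 4)))   # (5, (5, 5))
```

The call reproduces the example above: the digit 5 is extracted and the original pair (5, 5) is restored.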

IV. EXPERIMENTAL RESULTS
In this section, the proposed method is compared with the methods of Chang et al. [19] and Xiang et al. [10]. By comparing Fig. 4 and Fig. 5, it is not difficult to see that the audio waveform produced by the proposed method is similar to that of the original audio. This is because the proposed method modifies a single sample value by at most one, as Fig. 6 illustrates. Similarly, Fig. 7 shows the difference between the original audio and the stego-audio of Chang's method; the maximum modification of a single sample by Chang's method reaches four. In contrast, Xiang's method is based on prediction error expansion. Because the size of the prediction error is not controllable, the distortion can be severe after the expansion operation. It can be seen from Fig. 8 that the distortion caused by Xiang's method is clearly larger. In fact, there is no theoretical upper limit to this distortion.
To better describe the distortion of the algorithms, we use the mathematical expectation (ME) of the modification to quantify it. As shown in Table 2, when one secret digit is embedded, the probabilities that Chang's method modifies one sample by 0, 1, 2, 3 and 4 are 1/9, 2/9, 2/9, 2/9 and 2/9, respectively, so the mathematical expectation is 20/9. For our method, the maximum modification of one sample is one, so the probability of a modification of two to four is zero. When the amount of modification is 0 or 1, the corresponding probabilities are 1/9 and 8/9, respectively, so the mathematical expectation is 8/9. Clearly, 8/9 is less than 20/9, which means that our method has lower distortion than Chang's method.
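The two expectations can be checked with exact rational arithmetic. The short sketch below uses the probabilities stated above; the variable and function names are chosen for illustration only.

```python
from fractions import Fraction


def expected_modification(distribution: dict[int, Fraction]) -> Fraction:
    """Mathematical expectation of the per-sample modification amount."""
    return sum(amount * prob for amount, prob in distribution.items())


# Modification amount -> probability, as stated in the text.
chang = {0: Fraction(1, 9), 1: Fraction(2, 9), 2: Fraction(2, 9),
         3: Fraction(2, 9), 4: Fraction(2, 9)}
proposed = {0: Fraction(1, 9), 1: Fraction(8, 9)}

print(expected_modification(chang))      # 20/9
print(expected_modification(proposed))   # 8/9
```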
We use two common metrics, signal-to-noise ratio (SNR) and objective difference grade (ODG), to evaluate the distortion of the stego-audio. Fig. 9 shows the SNR of the proposed method, Chang et al.'s method [19] and Xiang et al.'s method [10] at different embedding rates. The proposed method shows a significant improvement in SNR compared with the other two methods, because it modifies the cover only slightly. To evaluate the auditory quality of the stego-audio more objectively, Fig. 10 shows the ODG of the proposed method, Chang et al.'s method [19] and Xiang et al.'s method [10] at different embedding rates. It is observed that our method performs better than the other two methods on ODG for different audio. In addition, the quality of the left-channel audio is similar to that of the right-channel audio, which conforms to the characteristics of dual-channel audio. Finally, Table 3 and Table 4 show the SNR and ODG results of the proposed method, Xiang [10] and Chang [19] for nine kinds of audio (ER = 1 bps), respectively. Overall, compared with the other methods, the proposed approach generalizes across a variety of audio and its performance is superior.
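For reference, the SNR between a cover and a stego signal can be computed as sketched below. The sketch assumes the commonly used definition $10 \log_{10}(\sum x^2 / \sum (x - y)^2)$, since the exact formula is not stated here; the function name is hypothetical. ODG requires a perceptual model and is not sketched.

```python
import math


def snr_db(cover: list[int], stego: list[int]) -> float:
    """SNR in dB: signal energy of the cover over the energy of the embedding noise."""
    signal = sum(x * x for x in cover)
    noise = sum((x - y) ** 2 for x, y in zip(cover, stego))
    if noise == 0:
        # Identical signals: no embedding noise at all.
        return math.inf
    return 10 * math.log10(signal / noise)


# Toy example: one of two samples is changed by one.
print(round(snr_db([100, -100], [101, -100]), 2))   # 43.01
```

A modification of at most one per sample keeps the noise energy small, which is consistent with the higher SNR reported for the proposed method.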

V. CONCLUSION
This article presents a novel reversible data hiding scheme for audio. The proposed method can embed three secret bits into one sample pair. First, we convert the binary secret data into novenary digits. Then, we embed the digits into the original single-channel audio using a magic matrix, generating two single-channel stego-audios that can be combined into a dual-channel audio. The human ear cannot distinguish the original single-channel audio from the dual-channel audio. In addition, compared with other digital media, dual-channel audio covers are prevalent in cyberspace and therefore appear more natural than two consecutive images. Experimental results show that the proposed method has lower distortion and better audio quality than the existing algorithms under the same embedding rate.