Steganalysis of Compressed Speech Based on Association Rule Mining

Currently, steganography based on compressed speech streams is attracting more and more attention. Meanwhile, it poses a huge threat to cyber security. As a counter technique, steganalysis can detect whether an illegal secret message is embedded in compressed speech. To further improve the detection performance of current methods, a novel steganalysis method based on codeword association rule mining (CARM) is proposed in this paper. Firstly, we analyzed the spatiotemporal relationships between codewords in compressed speech. Secondly, the steganography-sensitive codeword association rule base was built on the training set, based on the confidence change of codeword association rules before and after steganography. Thirdly, the steganography characteristic index and the corresponding dynamic partition threshold were computed on the validation set to determine whether a compressed speech segment contains covert communication. Finally, comprehensive experiments were conducted to evaluate the performance of the proposed CARM steganalysis method under various conditions, including different association rule patterns, with and without the dynamic partition threshold, different embedding rates, different speech lengths, etc. The experimental results verify that CARM can achieve better performance than the comparison methods. In addition, the detection accuracy of the CARM method can be improved significantly by using the dynamic partition threshold at low embedding rates.

[...] is one of the most popular carriers of covert communication [12]. Hence, steganography and steganalysis based on VoIP have gradually become a research hotspot.

The associate editor coordinating the review of this manuscript and approving it for publication was Mohamed Elhoseny.

Early QIM steganography was proposed for digital watermarking and information embedding by Chen et al. [34]. [...]ing speeches, such as images [35], audios [36], videos [37], etc. In view of QIM-based speech steganography, Xiao et al. [29] proposed the complementary neighbor vertices (CNV) [...] network and classification network. However, there is still much room for improvement in detection accuracy, especially in the cases of low embedding rates and short speech lengths.

In this paper, we propose a steganalysis method for compressed speech based on codeword association rule mining (CARM). The main contributions of this paper can be summarized as follows:

1) Analyzing the spatiotemporal correlations between VQ codewords based on the confidence of codeword association rules, we build the steganography-sensitive codeword association rule base. The base can reflect the codeword association rule changes before and after steganography as much as possible.

2) We introduce the steganography characteristic index and the corresponding dynamic partition threshold to implement steganalysis. By using the dynamic partition threshold, the detection accuracies are improved significantly at low embedding rates. Besides, the experimental results verify that the proposed CARM can achieve superior performance compared with other steganalysis methods.

LPC is one of the most effective speech signal analysis methods in the process of speech encoding. It uses a linear prediction model to achieve a compressed representation of the spectral envelope of the speech signal. LPC synthesis can provide very accurate prediction results of speech parameters, and the LPC synthesis filter is defined as [...], where $a_1, a_2, \ldots, a_p$ are the $p$-order LPC prediction coefficients of the speech signal. During the encoding process, the filter coefficients are computed based on the minimum mean square error criterion. Then, the LPC synthesis filter is constructed and the residual signal is obtained. After that, the LPC synthesis coefficients are converted to line spectral frequency coefficients. Finally, by split vector quantization, the line spectral frequency coefficients are encoded into three 8-bit VQ codewords: $VQ_1$, $VQ_2$, $VQ_3$.
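The coefficient-estimation step described above can be illustrated with a minimal sketch. It assumes the autocorrelation method solved by the Levinson-Durbin recursion and the sign convention in which a sample is predicted as a weighted sum of the previous $p$ samples; the function names and the toy signal are hypothetical, and this is not the G.723.1 reference procedure.

```python
def autocorrelation(signal, order):
    """Return the autocorrelation lags r[0..order] of a finite signal."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(order + 1)]


def lpc_coefficients(signal, order):
    """Estimate LPC coefficients under the minimum mean square error criterion.

    Assumed convention (not taken from the paper): x[t] is predicted as
    sum over i = 1..order of a[i - 1] * x[t - i].
    Returns (coefficients, residual_energy).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)   # a[0] is unused, a[1..order] are the coefficients
    err = r[0]                # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Toy signal: a decaying exponential, which a first-order predictor fits well.
signal = [0.9 ** t for t in range(200)]
coeffs, residual_energy = lpc_coefficients(signal, order=2)
```

For this toy signal the first coefficient comes out close to 0.9 and the second close to 0, which is what a first-order decaying sequence should give. A real codec would go on to convert such coefficients to line spectral frequencies and quantize them.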

The codewords in speech streams are compressed from the original speech signal through a series of encoding processes. Therefore, these codewords reflect speech characteristics to a certain extent. Since the speech characteristics are affected by the speaker's emotion, content and other factors, the generation process of compressed speech streams $F$ can be described as [...], where $F_{encoder}$ denotes the speech codec, such as G.723.1 or G.729. Formally, a speech stream segment $F$ which contains $n$ frames can be defined as

$$F = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,k} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,k} \end{bmatrix}$$

where $w_{i,j}$, $j \in \{1, 2, \ldots, k\}$, denotes the value of the $j$-th codeword of the $i$-th frame.

To facilitate the discussion and analysis of codeword associations, we define $S_i^l$ as the set of codewords from the $i$-th frame to the $(i + l)$-th frame:

$$S_i^l = \{ w_{a,b} \mid i \le a \le i + l,\ 1 \le b \le k \}$$

A codeword association pattern can be illustrated as:

$$R(t_1, t_2, r):\ P^{t_1}_m \Rightarrow Q^{t_2}_n$$

It indicates the associations between the $r$ codewords of frame $t_1$ and frame $t_2$. If $t_1 = t_2$, it indicates the intra-frame association; if $t_1 \neq t_2$, it indicates the inter-frame association.
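As a minimal sketch of these definitions, the following snippet builds the codeword set for a window of frames and separates its 2-item combinations into intra-frame and inter-frame pairs. The tuple layout `(frame, position, value)`, the 0-based indexing, the helper names and the toy codeword values are assumptions made for illustration only.

```python
from itertools import combinations


def window(frames, i, l):
    """Return the codewords of frames i..i+l as (frame, position, value) tuples.

    Indices are 0-based here, unlike the 1-based notation in the text.
    """
    last = min(i + l, len(frames) - 1)
    return [(t, j, value)
            for t in range(i, last + 1)
            for j, value in enumerate(frames[t])]


def split_pairs(items):
    """Split all 2-item combinations into intra-frame and inter-frame pairs."""
    intra, inter = [], []
    for a, b in combinations(items, 2):
        if a[0] == b[0]:          # same frame index: intra-frame association
            intra.append((a, b))
        else:                     # different frame indices: inter-frame association
            inter.append((a, b))
    return intra, inter


# Toy stream: three frames with k = 3 codewords each (values are made up).
frames = [(12, 200, 7), (12, 31, 7), (45, 31, 90)]
s = window(frames, 0, 1)
intra, inter = split_pairs(s)
```

With two frames of three codewords each, the window holds 6 codewords, giving 6 intra-frame pairs (3 per frame) and 9 inter-frame pairs.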

In addition, $P^{t_1}_m$ and $Q^{t_2}_n$ are satisfied as [...]

Theoretically, compressed speech streams in network transmission have infinite length. In this paper, to facilitate the mathematical modeling, we define $t$ as the maximum delay span. That is: [...]

Since the target of this paper is to use these associations for steganalysis, $t$ is set as 1 to simplify the following description. In addition, we focus on LPC related codewords, [...]

The set of items is called an itemset, and an itemset con[...] expressed as the frequency of $X(r)$ in $F$, and the confidence of an association rule generated by $X(r)$ is the frequency of $X(r)$ when the left key of the association rule appears.

The confidence can be used to characterize the correlation between codeword itemsets. In this paper, we record the [...]

When the length of the codeword association rule is large, it is difficult to allocate the memory. For instance, $S \ge 10^{10}$ in the case of $r \ge 4$. Therefore, in order to mine the sensitive association rules effectively, this paper analyzes and experiments with the case of $r \le 3$. Since the intra-frame patterns [...] belong to the same 3-itemset, only one of them needs to be considered while generating the itemsets. The combinations of codeword itemset values after eliminating duplication are shown in Table 1.

After the association rules are determined, the corresponding codeword itemsets can be counted by scanning each set $S_i^l$. The codeword itemset counting algorithm flow is as follows:

Algorithm 1: Codeword Itemset Counting [...]

The confidence of the association rule $P^{t_1}_m \Rightarrow Q^{t_2}_n$ is the ratio of the number of occurrences of $P \cup Q$ to that of $P$, which is calculated as

$$conf(P \Rightarrow Q) = \frac{count(P \cup Q)}{count(P)}$$

This paper constructs a steganography-sensitive association rule base through the confidence change of $R(t_1, t_2, r)$ before and after steganography, and then realizes steganography classification.
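The counting and confidence computation can be sketched as follows for a 2-item rule between two codeword positions. This is an illustrative reading of the description above, not the paper's Algorithm 1: the function names, the `delay` parameter (0 for intra-frame, 1 for inter-frame) and the toy frames are all assumptions.

```python
from collections import Counter


def count_itemsets(frames, left_pos, right_pos, delay=0):
    """Count occurrences of P and of P ∪ Q for rules of the form
    {codeword left_pos of frame t} => {codeword right_pos of frame t + delay}.
    """
    left_counts, pair_counts = Counter(), Counter()
    for t in range(len(frames) - delay):
        p = frames[t][left_pos]
        q = frames[t + delay][right_pos]
        left_counts[p] += 1
        pair_counts[(p, q)] += 1
    return left_counts, pair_counts


def confidences(left_counts, pair_counts):
    """Confidence of each rule: occurrences of P ∪ Q divided by occurrences of P."""
    return {(p, q): c / left_counts[p] for (p, q), c in pair_counts.items()}


# Toy stream of four frames with three codewords each (values are made up).
frames = [(12, 200, 7), (12, 31, 7), (45, 31, 90), (12, 200, 3)]

left, pairs = count_itemsets(frames, 0, 1)            # intra-frame rule
conf_intra = confidences(left, pairs)

left_d, pairs_d = count_itemsets(frames, 0, 1, delay=1)  # inter-frame rule
conf_inter = confidences(left_d, pairs_d)
```

In the toy stream the first codeword takes the value 12 three times and is paired with 200 in the same frame twice, so the intra-frame rule `12 => 200` has confidence 2/3.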

VOLUME 10, 2022

The correlations between the intra-frame and inter-frame codeword values lead to a non-uniform distribution of the confidence of association rules. In this paper, we will use association rule pattern [...]

Similarly, the confidence distribution of inter-frame asso[...] association rules will also change. In this section, we take $R(t_1, t_2, 2)_{t_1 = t_2}: \{VQ_1\} \Rightarrow \{VQ_2\}$ as an example to analyze the confidence change before and after steganography. The CNV steganography method [29] is adopted to generate the steganographic association rules. The change of confidence distribution is shown in Fig. 3.

As can be seen from Fig. 3, the confidence of the intra-frame association rules $R(t_1, t_2, 2)_{t_1 = t_2}: \{VQ_1\} \Rightarrow \{VQ_2\}$ changes significantly after steganography. In particular, the confidence of some association rules changes from nonzero in Fig. 3(a) to zero in Fig. 3(b). At the same time, some new association rules emerge in Fig. 3(b). Therefore, steganalysis can be realized by constructing a steganography-sensitive association rule base based on the rules with obvious changes in confidence. Let the confidence of association rule $R$ before and after steganography be $conf_c(R)$ and $conf_s(R)$, respectively. We define $L$ to quantify the degree of confidence change before and after steganography. $L$ is defined as [...]

It can be obtained from Eq. (10) that $L$ is $+\infty$ when exactly one of $conf_c(R)$ and $conf_s(R)$ is 0, which means that the correlation of the codeword values corresponding to association rule $R$ varies infinitely. In this case, the association rules are most affected by steganography. In this paper, we use the association rules whose $L$ is $+\infty$ to build the steganography-sensitive association rule base. [...] and $D_s^R$. In order to visualize the steganography-sensitive association rule base, Fig. 4 shows $D_c^R$ and $D_s^R$ of [...] is embedded with secret information; otherwise, it is not embedded. However, the steganalysis effect is not satisfactory when the sizes of $n_c$ and $n_s$ are compared directly, which will be discussed and analyzed below. Therefore, we introduce the steganography characteristic index $J$ to describe the degree of change of association rules after steganography. The calculation flow of $J$ is given in Algorithm 2.

We randomly select 50 cover samples from the validation dataset and the corresponding stego samples with 4 embedding rates of 20%, 40%, 60%, 80%. Then, we compute their $J$ values. The results are shown in Fig. 5.
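The selection of rules whose confidence is zero on exactly one side can be sketched as below. The split into a cover-only set and a stego-only set is an assumption about how $D_c^R$ and $D_s^R$ are populated, and the function name and toy confidences are hypothetical.

```python
def build_rule_base(conf_cover, conf_stego):
    """Keep the rules whose confidence is zero on exactly one side.

    Assumption of this sketch: d_cover holds rules seen only before
    steganography, d_stego holds rules seen only after it.
    A rule missing from a dictionary is treated as having confidence 0.
    """
    d_cover, d_stego = set(), set()
    for rule in set(conf_cover) | set(conf_stego):
        c = conf_cover.get(rule, 0.0)
        s = conf_stego.get(rule, 0.0)
        if c > 0 and s == 0:
            d_cover.add(rule)      # rule disappears after embedding
        elif c == 0 and s > 0:
            d_stego.add(rule)      # rule newly emerges after embedding
    return d_cover, d_stego


# Toy confidences keyed by (left codeword, right codeword); values are made up.
conf_cover = {(12, 200): 0.67, (12, 31): 0.33, (45, 31): 1.0}
conf_stego = {(12, 200): 0.50, (12, 30): 0.50, (45, 31): 1.0}
d_cover, d_stego = build_rule_base(conf_cover, conf_stego)
```

Rules whose confidence merely shifts, such as `(12, 200)` here, stay out of both sets; only the rule that vanishes and the rule that appears are retained.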

As shown in Fig. 5, the discrimination of the steganography characteristic index between cover samples and stego samples increases as the embedding rate increases. This is because the higher the embedding rate is, the larger the number of steganography-sensitive association rules is. As mentioned above, we can judge whether secret information is hidden in the speech sample by directly comparing the size of $n_c$ and $n_s$. In this case, it is equivalent to using 0.5 as the threshold for judgment. That is, if $J \ge 0.5$, the speech sample is embedded; otherwise, it is not embedded. Nevertheless, it is obvious that 0.5 is not the best threshold when the embedding rate is 40% in Fig. 5(b). At the same time, multiple thresholds can be used for judgment in Fig. 5(c) and in Fig. 5(d).

Algorithm 2 (surviving fragment): [...] if hash($R_t(r)$) $\in D_s^R$ then $n_s \leftarrow n_s + 1$; end if; end for; end for; $J \leftarrow \frac{n_s}{n_c + n_s}$; return $J$
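A minimal sketch of the index computation follows. It assumes that $n_c$ and $n_s$ count the occurrences of a sample's rules in the cover-only and stego-only sets of the rule base; the function name, the handling of a sample with no hits, and the toy data are assumptions rather than details from the paper.

```python
def characteristic_index(sample_rules, d_cover, d_stego):
    """Compute J = n_s / (n_c + n_s) for one speech sample.

    sample_rules lists the rules observed in the sample, one entry per
    occurrence. Returning 0.0 when no rule hits the base is an assumption
    of this sketch; the source does not say how that case is handled.
    """
    n_c = sum(1 for rule in sample_rules if rule in d_cover)
    n_s = sum(1 for rule in sample_rules if rule in d_stego)
    total = n_c + n_s
    return n_s / total if total else 0.0


# Toy rule base and sample (values are made up).
d_cover = {(12, 31)}
d_stego = {(12, 30)}
sample_rules = [(12, 31), (12, 30), (12, 30), (9, 9)]
j = characteristic_index(sample_rules, d_cover, d_stego)
```

Here one occurrence hits the cover-only set and two hit the stego-only set, so $J = 2/3$, which a fixed threshold of 0.5 would label as embedded.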

In order to cope with these two situations and obtain better [...] where the CNT function represents the number of correctly classified speech segments. After obtaining $J_{thr}$, we can achieve the classification of an unknown-type speech segment. The classification process can be expressed as: [...]

The classification results contain two classes: cover and stego. The proposed CARM steganalysis framework is shown in Fig. 6.

FIGURE 6. The overall framework of our proposed CARM steganalysis method. Firstly, we mine codeword association rules from the training set and construct a steganography-sensitive association rule base. Secondly, we mine codeword association rules from the validation set and calculate the steganography characteristic index and the corresponding dynamic partition threshold. Finally, we use the threshold to classify the unknown-type sample.

The compressed cover samples are generated by using the speech codec G.723.1 (6.3 kbit/s), and three categories of stego samples are generated by using the CNV [29] [...]

Since the steganography-sensitive rules generated by different patterns are variable, the detection accuracies of different [...] higher than 90% when the embedding rate is 30% or above.
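One way to read the threshold selection is as a search over candidate thresholds for the one that maximizes the number of correctly classified validation segments. The sketch below follows that reading; the candidate set (the observed $J$ values), the tie-breaking rule (smallest maximizer), the use of $J \ge J_{thr}$ for the stego class, the function names and the toy scores are all assumptions.

```python
def correct_count(threshold, cover_scores, stego_scores):
    """CNT: number of validation segments classified correctly at this threshold."""
    cover_ok = sum(1 for j in cover_scores if j < threshold)
    stego_ok = sum(1 for j in stego_scores if j >= threshold)
    return cover_ok + stego_ok


def dynamic_threshold(cover_scores, stego_scores):
    """Pick the candidate threshold that maximizes CNT.

    Candidates are the observed J values; max() keeps the first (smallest)
    maximizer, which is a tie-breaking choice made for this sketch.
    """
    candidates = sorted(set(cover_scores) | set(stego_scores))
    return max(candidates,
               key=lambda thr: correct_count(thr, cover_scores, stego_scores))


def classify(j, threshold):
    """Label a segment as stego when its index reaches the threshold."""
    return "stego" if j >= threshold else "cover"


# Toy validation scores at a low embedding rate (values are made up).
cover_scores = [0.10, 0.20, 0.25, 0.30]
stego_scores = [0.28, 0.35, 0.40, 0.45]
j_thr = dynamic_threshold(cover_scores, stego_scores)
```

On these toy scores the selected threshold classifies 7 of 8 segments correctly, whereas a fixed threshold of 0.5 classifies only the 4 cover segments correctly, which mirrors the motivation given for the dynamic partition threshold.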

The union pattern can still achieve satisfactory steganalysis 386 performance under low embedding rates.

For different category patterns, the three inter-frame patterns perform better than the two intra-frame patterns when they have the same number of itemsets. The reason might be that when a codeword is modified due to steganography, the number of association rules of the inter-frame pattern is affected more than that of the intra-frame pattern. For instance, the number of affected association rules of the inter-frame pattern $R(t_1, t_2, 2)_{t_1 \neq t_2}$ is 3, and that of the intra-frame pattern $R(t_1, t_2, 2)_{t_1 = t_2}$ is 2. In other words, inter-frame association rules can better reflect the steganography change than intra-frame association rules. In the following experiments, we will use the union pattern for analysis and discussion.

The steganography characteristic index $J$ can be used to quantify the degree of change of association rules after steganography. The larger $J$ is, the more secret information is hidden.

In this paper, the threshold $J_{thr}$ is used as the classifier for steganalysis. As mentioned in Section III-B, when the dynamic partition threshold is not used, it is equivalent to setting $J_{thr}$ to 0.5. Nevertheless, in practice, 0.5 is not the optimal partition threshold. Therefore, we experiment with dynamic [...]

Theoretically, compressed speech streams in network transmission have infinite length. Therefore, when the amount of secret information to be embedded is determined, reducing the embedding rate is the most effective and convenient way to improve security. In order to evaluate the steganalysis performance of the proposed CARM method, we test the detection accuracies with 10 different embedding rates at the speech length of 10 s. The experimental results are shown in Table 3 and Table 4.

For the Chinese dataset, as shown in Table 3, the detection accuracy of each method increases as the embedding rate increases. The performance of CARM is better than that of the other steganalysis methods at all embedding rates, especially at low embedding rates. For the CNV steganography method, CARM achieves an accuracy of 74.74% when the embedding rate is 10%, which is 17.23%, 16.63% and 15.46% higher than that of CBN, RNN-SM and SFFN, respectively. When the embedding rate is 30%, the detection accuracy of the CARM method is 94.33%, and that of the other methods is lower than 80%. For the NPPR steganography method, all steganalysis methods except CBN can achieve satisfactory accuracies when the embedding rate is 20% or above, and the accuracies [...] against three steganography methods with ten different speech lengths at the embedding rate of 100%. The results are shown in Table 5 and Table 6.
As can be seen from Table 5 and Table 6, the proposed CARM steganalysis method has stable and effective performance for the three steganography methods in both the Chinese dataset and the English dataset. As the speech length becomes shorter, the accuracy of CARM decreases more slowly than that of the comparison methods.

For the Chinese dataset, CARM performs better than the other steganalysis methods at all speech lengths, especially at short speech lengths. When detecting the CNV steganography method, CARM achieves an accuracy of 98.59% at the speech length of 1 s, which is over 10% higher than that of the other methods. For the NPPR steganography method, all the steganalysis methods can achieve satisfactory accuracies. The accuracies of CARM, RNN-SM and SFFN reach more than 98%, and that of CBN is 82.31%. When detecting the SEC steganography method at the speech length of 2 s, the accuracies of CBN, RNN-SM and SFFN are lower than 76%, while that of CARM remains higher than 92%.

For the English dataset, we can draw conclusions similar to those for the Chinese dataset. CARM can achieve a detection accuracy of more than 92% in all cases except when detecting the SEC steganography at the speech length of 1 s.

The proposed CARM steganalysis method performs better than the comparison methods, especially in the case of short speech lengths.

[...] Table 7, RNN-SM consumes the least time due to its simple structure. The proposed CARM method only utilizes the intra-frame and inter-frame codeword association rules to calculate the steganography characteristic index $J$, and realizes steganography classification by comparing $J$ with the threshold $J_{thr}$. The average detection time of CARM is 3.47 ms, which is shorter than that of CBN and SFFN. This indicates that CARM can meet the real-time detection demand of compressed speech streams.

The steganography characteristic index and the corresponding dynamic partition threshold are introduced to implement steganalysis.