Quantized Gaussian JPEG Steganography and Pool Steganalysis

Current algorithms for compressed-image steganography mainly embed hidden messages by minimizing a heuristically defined distortion or a measure of statistical detectability. However, because the distortion definitions are purely heuristic and the statistical models lead to equations that can only be solved numerically, there are no closed-form solutions for JPEG steganography. The absence of a closed-form expression modeling JPEG steganography is the main obstacle to understanding the behavior of single-image and pool steganalysis. In this study, building upon our previously proposed framework for spatial steganography, we develop a statistical framework for JPEG steganography in which the cover and the hidden message are modeled by multivariate Gaussian distributions. Based on this statistical model, we propose a novel quantized Gaussian JPEG steganography method that can perform embedding using any costs defined in the spatial or discrete cosine transform (DCT) domain, as well as residual variances. We conduct our experiments on a popular database with different compression qualities to determine the effectiveness of the proposed model. The experimental results show that the proposed model improves the security of previous works and outperforms state-of-the-art JPEG steganography algorithms. Furthermore, we extend the closed-form expression for the single-image steganalysis error to pool steganalysis with an omniscient optimal detector. We employ the derived expression to approximate the empirical results of pool steganalysis based on the empirical detection error of single-image steganalysis. The practical advantage of this approximation is that, even though it is derived from the adopted statistical model, it is accurate regardless of payload, embedding domain, embedding method, and steganalysis feature, as long as the pooling strategy is optimal. In addition to approximating the error, we employ the proposed model to make predictions about the variance of the pool steganalysis error.
We mathematically show that, for small payloads, this variance increases as the pool size increases. The same behavior is observed in the experimental results, which re-validates our analytical model. We conclude that although pooling improves the detector's performance, it makes the detector less stable at low payloads and large pool sizes.


I. INTRODUCTION
Steganography is the art of embedding a hidden message in a cover medium without being detected by the warden [1]. The most common medium for steganography is digital image data, owing to its high redundancy, which yields a high embedding capacity. Early works in digital image steganography, both in the spatial and compressed domains, proposed non-adaptive methods that treated all pixels or DCT coefficients in the same manner. Examples of such methods in the spatial domain are [2], [3], and in JPEG steganography Jsteg [4], F5 [5], and nsF5 [6]. As a result of not taking pixel-to-pixel or coefficient-to-coefficient dependencies into consideration, all non-adaptive approaches have low security [6], [7]. Thus, to attain higher security, adaptive methods have been developed.

(The associate editor coordinating the review of this manuscript and approving it for publication was Yassine Maleh.)
Content-adaptive steganography methods embed more in textured regions than in smooth regions of an image in order to minimize the produced distortion [8], [9]. Distortion-minimizing embedding is formulated as source coding with a fidelity criterion [10], and it is solved for the general case by syndrome-trellis codes [11], [12]. This coding scheme employs a distortion measure, or embedding cost, for each cover element and executes embedding accordingly, e.g., with a higher embedding rate in low-cost elements. Many methods are available for computing embedding costs in both the spatial and JPEG domains. HILL [13] and SUNIWARD [14] are well-known examples for spatial-domain steganography. CHAT-GAN [15] is another example, which uses generative adversarial networks to embed hidden messages in the spatial domain. For JPEG-domain steganography, UED [16], UERD [17], IUERD [18], JUNIWARD [14], and GUED [1] are among the most frequently used approaches. Even though some of these methods, such as HILL, SUNIWARD, and JUNIWARD, achieve the highest security, they are all based on heuristically defined distortions, and therefore there is no theoretical or statistical measure of their performance. This issue has been addressed by another class of image steganography, called statistical or model-based. Statistical or model-based image steganography methods mathematically model the cover image and perform embedding while minimizing a distance measure between the cover and the stego image. Examples of such approaches in the spatial domain are HUGO [19], MG [20], MVGG [21], and MiPOD [22].

(Volume 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.)
Denemark and Fridrich introduced the only statistics-based method in the JPEG domain, called J-MiPOD, built on the MiPOD statistical model, and also proposed algorithms for steganography with a pre-cover for both the spatial and JPEG domains (SI-MiPOD and SI-J-MiPOD) [23]. In all of the mentioned statistics-based approaches, the optimization problem, defined as minimizing the distance between the cover and stego images while embedding, results in equations that can only be solved numerically. Thus, there are no closed-form expressions for the embedding probabilities and detection error. Having such expressions, especially for the detection error, would be beneficial for understanding and estimating the behavior of image steganography as well as batch steganography and pool steganalysis.
In our previous work, we developed a statistical framework for spatial steganography which resulted in closed-form expressions for embedding probabilities and detection error while achieving state-of-the-art empirical performance [24]. In this work, we extend our model to JPEG domain and propose a statistical framework for JPEG steganography which results in closed-form expressions for detection error and embedding probabilities. Our proposed framework is able to employ any embedding costs defined in the spatial or JPEG domain, and also any residual variance estimator for JPEG steganography. In addition, it can be utilized to model single image and pool steganalysis.
Pool steganalysis is the extension of the steganalysis problem in which the warden knows that multiple objects share the same source and therefore pools evidence from all of the objects to achieve higher detection performance. This problem was introduced by Ker, and it is the dual of the batch steganography problem, in which the steganographer embeds a payload in multiple cover objects [25]. Both problems are major research problems in steganography [26]. Previous studies have proposed methods for ranking multiple sources according to their ''guiltiness'' [27], [28]. However, a more general question still remains: which source is guilty? This question was studied under the assumption of an omniscient detector, and it was shown that for finding the guilty source, the average pooling strategy performs close to optimally for a very wide range of hidden-message distribution strategies [29]. In another study, the problem of sequential steganalysis is discussed and a method is proposed for finding the first stego message in a sequence of objects [30]. Cogranne et al. formulated the problem in the spatial domain and demonstrated that knowledge of the steganographer's strategy increases the performance of pool steganalysis [31]. In contrast to these studies, Zakaria et al. assumed that the steganalyzer does not know the payload-spreading strategy and proposed a pooling method that performs close to an omniscient steganalyzer for all the state-of-the-art payload-spreading strategies [32]. In all the mentioned works, there is no statistical analysis modeling pool steganalysis of steganography with state-of-the-art payload-spreading strategies in real images.
In this study, we mathematically derive the detection error for single-image steganalysis based on the adopted statistical model. We show that the detection-error formula is valid for embedding in the spatial domain or any linear transformation domain. This allows us to derive a unified closed-form formulation for the optimal pool steganalysis strategy and its error for steganography in any domain. Here, we assume the steganalyzer is omniscient and the payload is spread among all of the images uniformly or using the state-of-the-art batch steganography method [24]. To show the relevance of the results, we employ the derived closed-form expression for the pool steganalysis error to approximate the empirical detection error of JPEG steganography and its variance for various pool sizes. We demonstrate that our proposed approximation is precise, considering the error of empirical steganalysis. As a result, one can approximate pool steganalysis results instead of running time-consuming and cumbersome experiments.
Our contribution in this work is threefold: 1) We develop a statistical model for JPEG cover and stego images. Based on that, we extend our previous embedding model for spatial steganography to JPEG steganography and derive the closed-form detection error of such an embedder against an optimal hypothesis-testing detector [24]. The embedding model is general in the sense that it can utilize any embedding cost or variance estimator defined in the spatial or JPEG domain, and it results in superior security compared with state-of-the-art approaches. 2) We extend the closed-form expression for the single-image steganalysis detection error to pool steganalysis with an omniscient optimal warden. We employ the derived expression to approximate empirical pool steganalysis results, computed by an ensemble-classifier steganalyzer, from the empirical detection error of single-image steganalysis [33]. Although the approximation is derived from our proposed embedding model, it is precise for all payloads, embedding domains, embedding methods, and steganalysis features as long as the pooling strategy is optimal. It also holds for single-image steganography and for batch steganography using the state-of-the-art batching strategy, i.e., AdaBIM [24]. 3) Employing the proposed detection-error approximation, we approximate the variance of such a pool steganalyzer and show that, for small payloads, it increases as the pool size increases. Small payloads are the more interesting case, as they are more practical than high payloads, which are easily detectable. Therefore, we conclude that although pooling decreases the detection error, it increases the variance of that error; in other words, pooling makes the steganalyzer less stable. We observed the exact same behavior in the empirical results, which confirms the correctness of the approximation.
This paper is organized as follows. The statistical models for cover and stego message are presented in Sec. II. Based on the proposed Gaussian model, a framework for quantized Gaussian JPEG steganography is introduced in Sec. III. The results are then extended to pool steganalysis in Sec. IV. In Sec. V, we provide the empirical results. Sec. VI summarizes and concludes this work.

II. STATISTICAL MODELS
In this section, we describe the statistical model for the cover image in the spatial domain and subsequently derive the probability distribution of the residuals of the cover's DCT coefficients. We also derive the statistical model of the stego image in the DCT domain by embedding a Gaussian message in each coefficient.

A. COVER MODEL
We denote an 8-bit gray-scale image in the spatial domain by P = [P_1, ..., P_n], where n is the number of blocks and P_b is the b-th block of 8 × 8 pixels, P_b = [p_bij]_{8×8}. The total number of pixels, denoted n', is n' = n × 64. All pixels p_bij are quantized to {0, 1, ..., 255}. Let p̂_bij be an unbiased estimate of the pixel based on its neighbors. We model the estimation errors, defined as e_bij = p_bij − p̂_bij, as independent Gaussian random variables, e_bij ∼ N(0, ω²_bij). This model rests on the fine-quantization assumption, ω_bij ≫ 1, since the quantization step is 1. For a detailed explanation of this model, refer to [22]. The scaled DCT coefficients of the cover image are similarly arranged in 8 × 8 blocks, and each coefficient f_bkl is given by

f_bkl = (1/q_kl) Σ_{i,j=0}^{7} w(k,l,i,j) p_bij, ∀k,l ∈ {0, 1, ..., 7}, (1)

where q_kl is the kl-th element of the JPEG quantization matrix and w(k,l,i,j) is defined as

w(k,l,i,j) = (c(k)c(l)/4) cos(πk(2i+1)/16) cos(πl(2j+1)/16), (2)

where c(x) is given by

c(x) = 1/√2 if x = 0, and c(x) = 1 otherwise. (3)

By using Eq. (1) and the spatial-domain estimates p̂_bij, we can estimate the scaled DCT coefficients as well. The estimate f̂_bkl is

f̂_bkl = (1/q_kl) Σ_{i,j=0}^{7} w(k,l,i,j) p̂_bij, (4)

and the residual of the estimate, x_bkl = f_bkl − f̂_bkl, is

x_bkl = (1/q_kl) Σ_{i,j=0}^{7} w(k,l,i,j) e_bij, (5)

which is a linear combination of zero-mean Gaussian random variables. Therefore, the distribution of the scaled DCT coefficient residual is

x_bkl ∼ N(0, σ²_bkl), (6)

where σ²_bkl, based on all e_bij being independent, is given by

σ²_bkl = (1/q²_kl) Σ_{i,j=0}^{7} w²(k,l,i,j) ω²_bij, (7)

where ω²_bij is the residual variance of the ij-th pixel of the b-th block in the raw image. The conclusion that the DCT residuals are Gaussian, shown in Eq. (6), follows from the fact that the DCT is a linear transformation; thus, the conclusion is valid for any linear transformation of the image. Note that Eq. (6) is the probability distribution of the scaled DCT coefficient residual, or estimation error, not the coefficient's distribution. It is well known in the literature that the probability distribution of the scaled DCT coefficients of an image is Laplacian [34], [35].
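As a numerical sanity check of Eqs. (1)–(7), the sketch below (assuming NumPy; the block variances and quantization matrix are hypothetical toy values) builds the DCT weights w(k,l,i,j) of Eqs. (2)–(3) and propagates spatial residual variances through Eq. (7):

```python
import numpy as np

def c(x):
    """Normalization term of Eq. (3)."""
    return 1 / np.sqrt(2) if x == 0 else 1.0

def w(k, l, i, j):
    """2-D DCT basis weight of Eq. (2)."""
    return (c(k) * c(l) / 4
            * np.cos(np.pi * k * (2 * i + 1) / 16)
            * np.cos(np.pi * l * (2 * j + 1) / 16))

# Arrange the weights as a 64x64 matrix: rows indexed by (k, l), columns by (i, j).
W = np.array([[w(k, l, i, j) for i in range(8) for j in range(8)]
              for k in range(8) for l in range(8)])

def dct_residual_variances(omega2, q):
    """Eq. (7): propagate 8x8 spatial residual variances omega2 through
    the scaled DCT with quantization matrix q."""
    return ((W ** 2) @ omega2.reshape(64)) / q.reshape(64) ** 2

# Toy example: constant per-block variance and a flat quantization matrix,
# so every scaled DCT residual variance is omega^2 / q^2.
omega2 = np.full((8, 8), 4.0)
q = np.full((8, 8), 2.0)
sigma2 = dct_residual_variances(omega2, q)
```

Because the rows of W form an orthonormal basis, a constant block variance passes through unchanged up to the 1/q² scaling, which is the property the per-block assumption of the next paragraphs relies on.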
The Gaussian distribution of the residuals, in other words of the noise in the JPEG domain, can alternatively be derived from previous studies on DCT coefficients of JPEG images. By analysing JPEG errors, it has been shown that the sum of all the quantization, rounding, and truncation errors has a Gaussian distribution [36]. In a later work on uncovering JPEG compression history, Li et al. showed that the distribution of the error in the JPEG domain depends on the number of compression cycles and the quantization matrix elements, and that it is Gaussian or quantized Gaussian [37]. Now, we prove that given the independence of the estimation errors in the spatial domain, the errors are independent in the DCT domain as well. Based on Eq. (5), and E[e_bij e_bi'j'] = 0 for two distinct pixels, the covariance of the errors in the same block is

Cov(x_bkl, x_bk'l') = (1/(q_kl q_k'l')) Σ_{i,j=0}^{7} w(k,l,i,j) w(k',l',i,j) ω²_bij. (8)

We can assume that ω²_bij is constant in each block, which is reasonable because in real images ω²_bij is highly correlated with the energy of the b-th block and varies little within a block of 8 × 8 pixels. At the end of this paragraph, we show that this assumption makes the covariance matrix, whose elements are shown in Eq. (8), diagonal. In general, however, the covariance matrix is not necessarily diagonal. It can be diagonalized (whitened) using eigen-decomposition because it is a real symmetric matrix. Suppose the eigen-decomposition of the error covariance matrix of the b-th block is U_b Λ_b U_b^T, where U_b is the orthogonal matrix of eigenvectors and Λ_b is the diagonal matrix of eigenvalues. Then, the hidden message can be computed using the method explained in Sec. III based on the whitened error covariance, Λ_b. The computed message is then multiplied by U_b, quantized, and embedded into the DCT coefficients. This method is explained thoroughly in Sec. V-C, where we show that it results in slightly better performance only at high payloads compared with skipping the whitening step. It also drastically increases the time complexity, which is discussed in Sec. V-D.
Note that, in practice, dependent hidden-message elements cannot be embedded in dependent cover elements by syndrome-trellis codes, because this violates the additive-distortion assumption of such coding methods, although there have been some studies on using STCs with non-additive distortion in special cases, such as [38]. As a result, for the rest of this study, we assume that ω²_bij is constant in each block, unless mentioned otherwise. Therefore, we can move ω²_bij out of the summation in Eq. (8). Given that Σ_{i,j=0}^{7} w(k,l,i,j) w(k',l',i,j) ≈ 0 unless k = k' and l = l', the covariance of the errors is

Cov(x_bkl, x_bk'l') ≈ σ²_bkl δ_kk' δ_ll', (9)

where δ is the Kronecker delta. Thus all x_bkl are independent zero-mean Gaussian random variables with the variances shown in Eq. (7). Note that the residual variances ω²_bij can be calculated using any variance estimator, such as the ones proposed in [21], [22]. In the following two subsections, we discuss the cases where the cost of embedding is given in the spatial domain and in the DCT domain.
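The diagonal-vs-whitened distinction of Eq. (8) and the eigen-decomposition step can be illustrated numerically. The following sketch (assuming NumPy, with q_kl set to 1 for simplicity and hypothetical per-pixel variances) shows that a constant block variance yields a diagonal DCT-error covariance, while a varying one does not until it is whitened:

```python
import numpy as np

# Orthonormal 2-D DCT basis as a 64x64 matrix (rows: (k,l), columns: (i,j)).
c = np.array([1 / np.sqrt(2)] + [1.0] * 7)
cos = np.array([[np.cos(np.pi * k * (2 * i + 1) / 16) for i in range(8)]
                for k in range(8)])
W = (np.einsum('ki,lj->klij', cos, cos).reshape(64, 64)
     * np.outer(c, c).reshape(64, 1) / 4)

# Eq. (8) with q_kl = 1: Cov = W diag(omega^2) W^T.
omega2_const = np.full(64, 4.0)
C_const = W @ np.diag(omega2_const) @ W.T      # diagonal: independent residuals

rng = np.random.default_rng(0)
omega2_var = rng.uniform(1.0, 9.0, 64)         # non-constant pixel variances
C_var = W @ np.diag(omega2_var) @ W.T          # generally not diagonal

# Whitening via eigen-decomposition: C = U diag(evals) U^T.
evals, U = np.linalg.eigh(C_var)
C_white = U.T @ C_var @ U                      # diagonal in the eigenvector basis
```

The constant-variance case reproduces Eq. (9) exactly, while the whitened covariance is diagonal with the eigenvalues of Λ_b on its diagonal, which is the quantity used by the whitened variant in Sec. V-C.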

1) EMBEDDING COST IN SPATIAL DOMAIN
For the proposed Gaussian embedding model, any embedding cost in the spatial domain, e.g., the costs defined in [13], [14], can also be used as a proxy to calculate ω²_bij. As we showed in our previous work, ω²_bij ≈ 1/η²_bij, where η_bij is the cost of changing the ij-th pixel of the b-th block by 1 in the raw image.
Therefore, based on Eq. (7), when spatial-domain embedding costs η_bij are given instead of residual variances ω²_bij, the DCT residual variances are derived as

σ²_bkl = (1/q²_kl) Σ_{i,j=0}^{7} w²(k,l,i,j) / η²_bij. (10)
2) EMBEDDING COST IN DCT DOMAIN
In the case where the cost of embedding in each DCT coefficient is given as η_bij, i.e., the cost of changing the scaled ij-th DCT coefficient of the b-th block, the DCT residual variance is given by

σ²_bij = 1/η²_bij, (11)

based on our previous work, where we showed that the reciprocal of the squared embedding cost can be used as a proxy for the residual variance [24]. In Eq. (11), η_bij can be computed by any of the methods proposed in [16], [17]. As a result of Eqs. (7), (10), and (11), the embedding model is universal: it works with embedding costs or residual variances calculated in the spatial domain, or with embedding costs calculated in the DCT domain. This statistical cover model is violated in practice in smooth or saturated blocks because of the assumptions of unbounded DCT coefficients and σ_bkl ≫ 1. However, our proposed method avoids embedding in those regions anyway, which is covered thoroughly in Sec. III.
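The two cost-to-variance routes of Eqs. (10) and (11) can be sketched as follows (assuming NumPy; the constant costs are hypothetical stand-ins for HILL- or UERD-style cost maps):

```python
import numpy as np

# Orthonormal 2-D DCT basis as a 64x64 matrix (rows: (k,l), columns: (i,j)).
c = np.array([1 / np.sqrt(2)] + [1.0] * 7)
cos = np.array([[np.cos(np.pi * k * (2 * i + 1) / 16) for i in range(8)]
                for k in range(8)])
W = (np.einsum('ki,lj->klij', cos, cos).reshape(64, 64)
     * np.outer(c, c).reshape(64, 1) / 4)

def variances_from_spatial_costs(eta, q):
    """Eq. (10): use omega^2 = 1/eta^2 as a variance proxy in the pixel
    domain, then propagate it through the scaled DCT as in Eq. (7)."""
    omega2 = 1.0 / eta.reshape(64) ** 2
    return ((W ** 2) @ omega2) / q.reshape(64) ** 2

def variances_from_dct_costs(eta_dct):
    """Eq. (11): the reciprocal squared DCT-domain cost is used directly."""
    return 1.0 / eta_dct.reshape(64) ** 2

# Hypothetical costs: cheap (texture-like) elements get large variances.
sigma2_spatial = variances_from_spatial_costs(np.full((8, 8), 0.5),
                                              np.ones((8, 8)))
sigma2_dct = variances_from_dct_costs(np.full((8, 8), 2.0))
```

Either route produces one variance per scaled DCT coefficient, which is all the embedder of Sec. III needs, so spatial and DCT cost functions become interchangeable inputs.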

B. STEGO MODEL
We denote the hidden message by M = [M_1, ..., M_n], where M_b = [m_bij]_{8×8} is the message embedded in the b-th block of scaled DCT coefficients. In contrast to all previous works, in which hidden-message elements are modeled as discrete random variables, we model them as Gaussian random variables with variances β_bij, distributed according to

m_bij ∼ N(0, β_bij). (12)

The scaled DCT coefficients of the stego image are the summation of the cover coefficients and the hidden-message elements, i.e., S = F + M. Hence, the kl-th scaled DCT coefficient residual of the b-th block is y_bkl = x_bkl + m_bkl, and based on Eq. (6) and Eq. (12), its probability distribution is derived as

y_bkl ∼ N(0, σ²_bkl + β_bkl), (13)

in which we assume unbounded quantization levels.
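The additivity of variances in Eq. (13) is easy to confirm with a small Monte Carlo sketch (assuming NumPy; σ² and β are arbitrary hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, beta, n = 2.5, 0.8, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), n)   # cover DCT residuals, Eq. (6)
m = rng.normal(0.0, np.sqrt(beta), n)     # Gaussian hidden message, Eq. (12)
y = x + m                                 # stego residuals, S = F + M

# Independent Gaussians add in variance, so y ~ N(0, sigma2 + beta) per Eq. (13).
```

Since the mean of the residuals stays at zero, embedding manifests only as a variance increase, which is exactly what the hypothesis test of Sec. III exploits.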

III. METHODOLOGY
In this section, we describe our proposed model, illustrated in Fig. 1. We first discuss the problem of JPEG steganography in a single image, which is formulated as the following constrained optimization:

max_{β} P_E subject to Σ_{b=1}^{n} Σ_{i,j=0}^{7} H(p_m_bij) = pν, (14)

where ν is the number of non-zero AC DCT coefficients, P_E is the detection error of the steganalyzer derived in the following subsection, H(p_m_bij) is the entropy of a random variable with probability distribution p_m_bij in natural units of information (nats), and p is the relative payload in nats per non-zero AC coefficient.
Assume the worst-case scenario, in which the steganalyzer is omniscient and knows all the cover and hidden-message probability distributions, i.e., p_x_bij and p_m_bij. To compute the detection error of this steganalyzer, i.e., P_E(B), suppose that it employs a likelihood ratio test (LRT) to decide whether the received image is a cover or conveys a hidden message, denoted by the null hypothesis (H_0) and the alternative hypothesis (H_1), respectively.
Suppose R = [R_1, ..., R_n] are the residuals of the received image's DCT coefficients, where R_b = [r_bij]_{8×8} is the b-th block of 8 × 8 residuals, and they are statistically independent. Therefore, the likelihood ratio for all the DCT coefficients can be factored as Λ = Π_{b=1}^{n} Π_{i,j=0}^{7} Λ_bij, where Λ_bij is the likelihood ratio for the ij-th residual of the b-th block. Given Eq. (6) and Eq. (13),

Λ_bij = p_y_bij(r_bij) / p_x_bij(r_bij) = √(σ²_bij/(σ²_bij + β_bij)) exp( r²_bij β_bij / (2σ²_bij(σ²_bij + β_bij)) ). (15)

Thus the natural logarithm of Λ_bij is given by

ln Λ_bij = (1/2) ln( σ²_bij/(σ²_bij + β_bij) ) + r²_bij β_bij / (2σ²_bij(σ²_bij + β_bij)). (16)

In Eq. (16), r_bij is a Gaussian random variable. Thus, ln Λ_bij is a constant term plus a Gamma-distributed term with shape (k_bij) and scale (θ_bij) parameters, i.e., Γ(k_bij, θ_bij). Under both H_0 and H_1, the shape parameter equals 0.5, i.e., k_bij = 0.5. However, the scale parameter θ_bij depends on the variance of r_bij, and it is given by

θ_bij = β_bij/(σ²_bij + β_bij) under H_0, θ_bij = β_bij/σ²_bij under H_1. (17)

Based on our previous work [24], for a large enough number of DCT coefficients (or pixels), the following approximation for the probability distribution of Σ_{b=1}^{n} Σ_{i,j=0}^{7} ln Λ_bij holds:

Σ_{b=1}^{n} Σ_{i,j=0}^{7} ln Λ_bij ∼ N(−α/2, α) under H_0 and N(+α/2, α) under H_1, where α = Σ_{b=1}^{n} Σ_{i,j=0}^{7} β²_bij/(2σ⁴_bij). (18)
Eq. (18) shows that embedding a hidden message in the scaled DCT coefficients shifts the distribution of the detector's output, in a manner similar to that explained by the shift hypothesis for embedding in the spatial domain [25]. A steganalyzer utilizing an LRT compares the likelihood ratio with a decision threshold to decide whether an image carries a hidden message. The natural logarithm of the LRT is given by

Σ_{b=1}^{n} Σ_{i,j=0}^{7} ln Λ_bij ≷ τ, (19)

deciding H_1 if the sum exceeds the threshold τ and H_0 otherwise. It has been previously shown that for the given LRT, using minimax, one of the most common optimal decision criteria, a decision threshold equal to 0 results in the lowest expected risk over all possible priors [24]. As a result, based on Eq. (18) and Eq. (19), the detection error of the optimal detector, defined as the mean of its false-alarm and missed-detection probabilities,

P̂_E = (P_FA + P_MD)/2, (20)

is given by

P̂_E = Φ(−√α/2), (21)

where Φ is the cumulative distribution function of the standard normal distribution. P̂_E, shown in Eq. (21), is monotonically decreasing in α. Thus, to achieve a more secure steganography method, we can minimize α instead of maximizing the error of the steganalyzer. The same conclusion can be reached employing other common optimal decision rules such as Bayes and Neyman-Pearson. Consequently, the problem shown in Eq. (14) can be simplified to

min_{β} α subject to Σ_{b=1}^{n} Σ_{i,j=0}^{7} H(p_m_bij) = pν. (22)

The solution of Eq. (22) using the Lagrangian multiplier method is given by

β*_bij = λ(p) σ²_bij, (23)

where λ(p) is the Lagrangian multiplier determined by the payload constraint in Eq. (22) as a function of the relative payload p. Using the entropy of a Gaussian message, the constraint yields

λ(p) = (1/(2πe)) exp( (2pν − Σ_{b=1}^{n} Σ_{i,j=0}^{7} ln σ²_bij) / n' ), (24)

where n' = n × 64 is the total number of pixels or DCT coefficients. Substituting Eq. (23) into the definition of α in Eq. (18) gives

α = (n'/2) λ(p)², (25)

and hence the closed-form detection error

P̂_E(p) = Φ( −(λ(p)/2) √(n'/2) ). (26)

The closed-form expression for the detection error of steganalysis shown in Eq. (26) is derived from the Gaussian distribution of the cover-element residuals and the hidden-message elements. In addition, the Gaussian distribution follows from the fact that the DCT is a linear transformation. As a result, the closed-form expression shown in Eq. (26) is valid for embedding using the adopted model in the raw image, or in any linear transformation of the image such as the DCT. Based on this generalized error formulation, in the next section we develop a statistical model for pool steganalysis that is valid for steganography in raw image data or any linear transformation of image data.
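The normal approximation behind Eq. (18) and the resulting error formula of Eq. (21) can be checked with a small Monte Carlo sketch. The sketch below (assuming NumPy; for brevity all coefficients share one hypothetical variance σ² and message variance β) simulates the log-likelihood ratio of Eq. (16) under both hypotheses and compares the empirical error at threshold 0 against Φ(−√α/2):

```python
import numpy as np
from math import erf, sqrt, log

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

rng = np.random.default_rng(7)
N, sigma2, beta, trials = 800, 1.0, 0.1, 5_000

# alpha as in Eq. (18); one shared variance keeps the sum a simple product.
alpha = N * beta ** 2 / (2 * sigma2 ** 2)
const = 0.5 * log(sigma2 / (sigma2 + beta))
coef = beta / (2 * sigma2 * (sigma2 + beta))

# Log-likelihood ratio of Eq. (16), summed over N coefficients per image.
r0 = rng.normal(0.0, sqrt(sigma2), (trials, N))          # H0: cover
r1 = rng.normal(0.0, sqrt(sigma2 + beta), (trials, N))   # H1: stego
llr0 = N * const + coef * (r0 ** 2).sum(axis=1)
llr1 = N * const + coef * (r1 ** 2).sum(axis=1)

# Minimax threshold 0 as in Eq. (19); mean error vs. the prediction of Eq. (21).
p_fa = (llr0 > 0).mean()
p_md = (llr1 <= 0).mean()
empirical = (p_fa + p_md) / 2
predicted = Phi(-sqrt(alpha) / 2)
```

With these parameters α = 4, so the predicted error is Φ(−1) ≈ 0.159, and the simulated error lands close to it; the small residual gap reflects the large-N, small-β/σ² nature of the approximation.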
Eq. (23) shows that the message variance is proportional to the DCT coefficient residual variance. As a result, we embed more nats in noisy coefficients, by adding a Gaussian with higher variance, than in coefficients with small variance. Now that the problem is solved in the continuous domain, we translate the problem shown in Eq. (22) to the discrete domain by quantizing the hidden message to Q = {−q, ..., −1, 0, 1, ..., +q} as follows:

min_{β} α subject to Σ_{b=1}^{n} Σ_{i,j=0}^{7} H(p_m_bij) = pν, with p_m_bij supported on Q, (27)

where

p_m_bij(k) = exp(−k²/(2β_bij)) / Σ_{t=−q}^{+q} exp(−t²/(2β_bij)), ∀k ∈ Q. (28)

Eq. (28) is a truncated Gaussian distribution giving the probability of changing the ij-th coefficient of the b-th block by k. We utilize the Newton-Raphson method to find the Lagrangian multiplier λ(p), which determines all hidden-message variances β_bij and distributions p_m_bij. To take advantage of practical embedding methods such as syndrome-trellis codes (STCs) [12] for real-world implementation of the proposed embedding model, the cost of changing each coefficient is required. We denote the cost of changing the ij-th coefficient of the b-th block by k as ρ_bij(k). Assuming symmetric costs, i.e., ρ_bij(k) = ρ_bij(−k), there are 64 × n × q variables and equations having the Gibbs form

p_m_bij(k) = exp(−ρ_bij(k)) / Σ_{t∈Q} exp(−ρ_bij(t)), (29)

∀b ∈ {1, ..., n}, ∀i,j ∈ {0, ..., 7}, ∀k ∈ {1, ..., q}.
Computing these costs allows us to utilize STCs for the actual embedding when q = 1 and multi-layered STCs when q > 1 [12]. However, in this manuscript, similar to conceptual studies in steganography, we disregard the coding process and change the coefficients according to the change rates shown in Eq. (28). A summary of our proposed method is given in Algorithm 1.
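The quantization step above can be sketched end to end: compute the truncated-Gaussian change rates of Eq. (28), solve for the multiplier so the entropy constraint of Eq. (27) is met, and convert the rates to Gibbs-form costs. The sketch below assumes NumPy, uses a hypothetical set of 64 residual variances and a payload of 0.2 nats per coefficient, and substitutes plain bisection for the Newton-Raphson step used in the paper:

```python
import numpy as np

def change_probs(beta, q=1):
    """Eq. (28): truncated-Gaussian change probabilities for k in {-q..+q};
    beta is a vector of message variances."""
    k = np.arange(-q, q + 1)
    wgt = np.exp(-k[None, :] ** 2 / (2 * beta[:, None]))
    return wgt / wgt.sum(axis=1, keepdims=True)

def total_entropy(p):
    """Total message entropy in nats (clip avoids log(0) for tiny rates)."""
    return float(-(p * np.log(np.clip(p, 1e-300, None))).sum())

def solve_lambda(sigma2, payload_nats, q=1, iters=80):
    """Scalar search for lambda in beta = lambda * sigma^2 (Eq. (23)) so that
    the entropy constraint of Eq. (27) holds.  The paper uses Newton-Raphson;
    geometric bisection is used here for simplicity."""
    lo, hi = 1e-8, 1e4
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if total_entropy(change_probs(mid * sigma2, q)) < payload_nats:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

sigma2 = np.linspace(0.5, 4.0, 64)               # hypothetical residual variances
lam = solve_lambda(sigma2, payload_nats=0.2 * 64)
probs = change_probs(lam * sigma2)               # columns: k = -1, 0, +1
rho = np.log(probs[:, [1]] / probs)              # Gibbs-form costs, rho(0) = 0
```

As expected, higher-variance coefficients receive lower p(0), i.e., a higher change rate, and inverting Eq. (29) gives the per-coefficient costs an STC encoder would consume.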

IV. POOL STEGANALYSIS
In the previous section, we derived the closed-form solution for JPEG steganography against optimal single-image steganalysis and its error. In this section, we discuss the case where the steganalyzer also knows the source of a pool of images. The detection error is then derived for an arbitrarily sized pool of images in which the images are all stego or all cover. The notation is the same as before, except that we indicate the image number with a superscript in parentheses, e.g., λ^(i) is the Lagrangian multiplier for the i-th image. In addition, we denote the detection error for pool size l by P̂_E(l) when it is theoretically estimated and by P_E(l) when it is empirically computed. The following theorem explains how to derive P̂_E(l) and what the behavior of its error would be.

Theorem 1: Statistical Model for Pool Steganalysis Detector's Error and Variance
Suppose that l images are sent from the same source and, in the case of being stego images, they carry the same amount of hidden message or the embedding has been done using the state-of-the-art batch steganography method [24]. An omniscient optimal detector should examine the images together and decide based on the summation of all the images' detection statistics. The error of such an optimal detector can be approximated by

P̂_E(l) = Φ( √l · Φ^(−1)(P̂_E(1)) ). (30)

The standard deviation of P̂_E(l), i.e., σ̂_l, as a function of the standard deviation of P̂_E(1), i.e., σ̂_1, is given by

σ̂_l = √l · exp( −(l − 1) [Φ^(−1)(P̂_E(1))]² / 2 ) · σ̂_1, (31)

which is an increasing function of the pool size l until l = l_0 and a decreasing function afterwards, where l_0 is written as

l_0 = 1 / [Φ^(−1)(P̂_E(1))]². (32)

See Appendix A for the proof. Given that Theorem 1 holds for steganography in raw image data or any linear transformation of image data, it holds for JPEG steganography as well. The beauty of this approximation is that, utilizing it, one can run only one experiment employing an ensemble-classifier steganalyzer [33] to find P̂_E(1), and plug the result into Eq. (30) to find P̂_E(l) for any l. In Sec. V-E, we show that although this approximation is based on the Gaussian embedding model and optimal pool steganalysis, it works for any embedding method, as long as the same steganalyzer is employed for all pool sizes using the explained optimal pooling strategy.
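The mapping from a measured single-image error to pooled predictions can be sketched with stdlib Python only (the inverse normal CDF is implemented by bisection to avoid a SciPy dependency; the single-image error 0.4 used below is a hypothetical input):

```python
from math import erf, exp, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def Phi_inv(p):
    """Inverse standard normal CDF by bisection (precise enough here)."""
    lo, hi = -12.0, 12.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_error(pe1, l):
    """Eq. (30): pooled detection error from the single-image error pe1."""
    return Phi(sqrt(l) * Phi_inv(pe1))

def pooled_std(pe1, std1, l):
    """Eq. (31): standard deviation of the pooled error estimate."""
    z = Phi_inv(pe1)
    return sqrt(l) * exp(-(l - 1) * z * z / 2) * std1

def peak_pool_size(pe1):
    """Eq. (32): pool size l0 at which the variance peaks."""
    return 1.0 / Phi_inv(pe1) ** 2
```

For example, a single-image error of 0.4 pools down to roughly 0.31 at l = 4, while its variance keeps growing until l_0 ≈ 15.6, illustrating the stability trade-off discussed next.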
Another conclusion that can be drawn from Theorem 1 is that although pool steganalysis gives better results than single-image steganalysis, it suffers from an increasing variance as the pool size grows, for some payloads. To the best of the authors' knowledge, this phenomenon has never been discussed or formalized in the literature. The variance increases until the pool size reaches l_0, shown in Eq. (32) and Fig. 6, and decreases afterwards. In Sec. V-E, we observe that this statistical model and its results are aligned with the empirical results.

V. EXPERIMENTS AND DISCUSSION
Throughout this paper, we use the BOSSbase 1.01 database, containing 10,000 gray-scale 512 × 512 images [39]. All the images are compressed to JPEG with two quality factors, 75 and 95. Performance evaluations are done using an ensemble of classifiers with 10-fold cross-validation, trained on steganalysis features extracted from 5,000 images chosen randomly as the training/validation set and tested on features extracted from the remaining 5,000 images [33]. We utilize two different state-of-the-art JPEG steganalysis feature vectors, DCTR [40] and GFR [41], with 8,000 and 17,000 elements, respectively. Performances are reported by the classifiers' average detection error, defined as the mean of the false-alarm and missed-detection rates, P_E = (P_FA + P_MD)/2, for payloads ranging from 0.05 to 1 bpnzac. To determine whether a performance improvement is statistically significant, we employ a significance level of 0.05. For all the performance evaluations in this article, the sample sizes are 10, and the standard deviations of the samples are in the range of 0.001 to 0.005. In the worst-case scenario of comparing two performances that both have a standard deviation of 0.005, a difference greater than 0.0047 is statistically significant.
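The 0.0047 significance threshold quoted above can be reproduced with a standard two-sample t-test computation. The sketch below hardcodes the two-tailed 5% critical value of Student's t with 2n − 2 = 18 degrees of freedom (about 2.101) to avoid a SciPy dependency, and assumes equal sample sizes and standard deviations as stated in the text:

```python
from math import sqrt

# Two samples of detection errors, each of size n = 10, each with the
# worst-case standard deviation s = 0.005.  A two-sample t-test at the 0.05
# level declares a difference significant when it exceeds
# t_crit * sqrt(2 * s^2 / n).
n, s, t_crit = 10, 0.005, 2.101

threshold = t_crit * sqrt(2 * s ** 2 / n)
```

The threshold evaluates to about 0.0047, matching the figure used for all significance claims in the experiments.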
For all the experiments, we employ six different JPEG steganography methods. The first three are state-of-the-art JPEG steganography methods, i.e., UERD [17], JUNIWARD [14], and GUED [1], with their optimal parameters for best security. The next two methods are based on UERD and JUNIWARD but utilize our proposed quantized Gaussian embedding; we denote them by G-UERD and G-JUNIWARD, respectively. In addition to these five methods, we also evaluate G-JHILL, which employs the proposed quantized Gaussian embedding model using spatial-domain embedding costs computed by the HILL algorithm, as shown in Sec. II-A1. The HILL algorithm is used with a 3 × 3 Ker-Bohme high-pass filter and a 3 × 3 and a 15 × 15 averaging low-pass filter [13].

A. DETERMINING MAXIMUM DCT COEFFICIENT CHANGE (Q)
The parameter q of the proposed quantized Gaussian embedding model, summarized in Algorithm 1, controls the maximum amount by which DCT coefficients may be changed during embedding. In other words, our embedding model is a (2q + 1)-ary embedding. To determine the optimal q value for the highest security, we evaluate all the JPEG steganography methods with q ∈ {1, 2, 3}. The results are presented in Table 1. It can be seen that for our proposed Gaussian embedding model, reported in the top three sections of the table (G-UERD, G-JUNIWARD, and G-JHILL), higher q values result in higher performance; however, the improvement is not significant at lower payloads. The security is significantly improved only for a JPEG quality factor of 95 and at higher payloads (p ≥ 0.5), which are less important than lower payloads due to their high detection probability. Note that higher q values also require a more complex encoding algorithm [12]. As a result, for the rest of the experiments, we only consider q = 1, which performs similarly to q = 2 and q = 3 for most payloads and requires a less complex encoder.
We have also reported the results of different (2q + 1)-ary embedding scenarios for the non-Gaussian embedding algorithms, UERD and JUNIWARD, in the bottom two sections of Table 1. It can be concluded that higher q values result in lower security for almost all payloads.

These observations suggest that the proposed quantized Gaussian embedding model is more accurate than the widely used Gibbs form [11] for calculating embedding probabilities.

B. COMPARISON OF QUANTIZED GAUSSIAN EMBEDDING WITH PRIOR ARTS
In this section, we compare the security of the proposed steganography method with state-of-the-art JPEG steganography methods against steganalysis using DCTR and GFR features. We find that using the proposed embedding model improves performance for all the algorithms at most payloads. We also show that the proposed G-JHILL method outperforms all the previously developed methods at all payloads.
We compare the detection errors of UERD, G-UERD, JUNIWARD, G-JUNIWARD, G-JHILL, and GUED using GFR features in Table 2. For the UERD algorithm, the proposed Gaussian version (G-UERD) outperforms UERD significantly at payloads below 0.3 bpnzac for images with JPEG quality 75, and its detection probabilities at these payloads are similar to those of JUNIWARD, a more time-consuming algorithm. For JPEG quality 95, G-UERD performs statistically similarly to UERD. For JUNIWARD, the proposed Gaussian version (G-JUNIWARD) performs better than or similarly to the original algorithm, and the improvement is statistically significant for JPEG quality 95 and payloads greater than 0.3 bpnzac. GUED performs better than JUNIWARD, but its security is lower than that of the proposed Gaussian version (G-JUNIWARD) at both JPEG qualities. The proposed G-JHILL outperforms all the mentioned algorithms at all payloads and JPEG quality factors (or performs similarly to the most secure one); for images with JPEG quality factor 75, it improves upon the most secure of the other methods (G-JUNIWARD for p ≤ 0.3 and UERD for p > 0.3). In addition to running experiments using GFR features, we utilize DCTR features as well, and the results are reported in Table 3. Behaviors similar to those seen with GFR can be observed there; however, the performance gaps are greater than in Table 2.
We also provide examples of stego images for the six embedding algorithms in Fig. 2. These images are generated using a payload of 0.3 bpnzac and a JPEG quality factor of 75.
We believe the proposed quantized Gaussian embedding model improves performance because, compared with the Gibbs measure used by all the spatial and JPEG steganography methods, it embeds more bits in low-cost or high-variance DCT coefficients and fewer bits in high-cost or low-variance ones.

C. WHITENING
In this section, we conduct experiments on the G-JHILL algorithm to check the empirical results of applying the whitening explained in Sec. II-A. Applying whitening to all the blocks adds two extra steps to Algorithm 1. First, instead of the residual variances computed in the ''if'' clause in lines 1 through 7, we use the variances of the whitened residuals obtained by eigen-decomposition. In other words, we first decompose each block's residual covariance matrix as U_b Λ_b U_b^T, where U_b is the orthogonal 64 × 64 matrix of eigenvectors and Λ_b is the diagonal matrix of eigenvalues. The diagonal elements of Λ_b are then used instead of the residual variances σ²_bij. The second extra step is that, in each Newton-Raphson iteration for solving Eq. (27), after computing B*_b = [β*_bij]_{8×8}, the hidden message elements are transformed back by U_b · vec(B*_b), where vec is the vectorization function. This process increases the time complexity of the embedding method, but it improves performance. In Table 4, the performance of the G-JHILL algorithm is reported with and without whitening. There is no statistically significant change in the detection error for payloads up to 0.5 bpnzac; however, at 0.75 and 1 bpnzac, the G-JHILL version that employs whitening performs significantly better. In the next section, we discuss the increase in computation time caused by whitening.
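The two extra whitening steps above can be sketched in a few lines of numpy. This is a minimal illustration under our own naming (the function names and the toy covariance are ours, not the paper's implementation): `numpy.linalg.eigh` performs the eigen-decomposition of the symmetric block covariance, and the inverse transform mirrors U_b · vec(B*_b).

```python
import numpy as np

def whiten_block_variances(cov_b):
    """Eigen-decompose an 8x8 block's 64x64 residual covariance matrix and
    return the whitened variances (eigenvalues of Lambda_b) and the basis U_b."""
    # cov_b is symmetric, so eigh returns real eigenvalues and an orthogonal U_b
    eigvals, U_b = np.linalg.eigh(cov_b)
    # clip tiny negative eigenvalues caused by numerical noise
    eigvals = np.clip(eigvals, 1e-12, None)
    return eigvals, U_b

def unwhiten_changes(U_b, beta_star):
    """Map the 8x8 change-rate matrix solved in the whitened domain back to
    the DCT domain, mirroring U_b @ vec(B*_b)."""
    return (U_b @ beta_star.reshape(64)).reshape(8, 8)

# toy example: a random positive semi-definite covariance stands in for the
# residual covariance of one block
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
cov = A @ A.T / 64.0
variances, U = whiten_block_variances(cov)
back = unwhiten_changes(U, rng.random((8, 8)))
print(variances.shape, back.shape)  # (64,) (8, 8)
```

The extra eigen-decomposition per block is what drives the computation-time increase discussed in Sec. V-D.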

D. COMPUTATIONAL TIME
In this section, we compare the computation time of all the steganography algorithms studied in this paper. The computation times are reported in seconds per image in Table 5 for two JPEG quality factors, 75 and 95, and two payloads, 0.1 and 0.2 bpnzac. The proposed Gaussian embedding versions of UERD and JUNIWARD are 2 to 5 times slower than the original algorithms, which is still reasonable given their higher performance. The computation time of GUED is slightly less than that of JUNIWARD. G-JHILL (Wh.) is the G-JHILL algorithm with whitening, which is 2 to 3 times slower than G-JHILL. A higher payload increases the embedding time, but the JPEG quality factor does not affect the computation time significantly.

E. POOL STEGANALYSIS DETECTION ERROR
In this section, we conduct experiments regarding Sec. IV and Theorem 1, where we showed that instead of running cumbersome pool steganalysis experiments, one can estimate the detection error for pool sizes greater than 1 from Eq. (30) and the empirically computed detection error for pool size 1. We use various pool sizes, l ∈ {1, 3, . . . , 99}, for both the empirical and the estimated results. The pooling strategy here is summing the detection statistics of all images in a pool; Theorem 1 shows this strategy to be optimal when the same payload is embedded in all images or when the state-of-the-art batch steganographer [24] is used.
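The summation pooling strategy is simple to state in code. The sketch below is our illustration, not the experimental pipeline: the per-image log-likelihood ratios and the zero threshold (equal priors) are hypothetical.

```python
import numpy as np

def pooled_statistic(llrs):
    """Optimal pooling per Theorem 1: the pool's detection statistic is the
    sum of the per-image log-likelihood ratios."""
    return float(np.sum(llrs))

def pooled_decision(llrs, threshold=0.0):
    """Flag the whole pool as stego when the summed statistic exceeds the
    detector threshold (0 for equal priors)."""
    return pooled_statistic(llrs) > threshold

# hypothetical per-image detector outputs for a pool of l = 5 images
llrs = np.array([-0.2, 0.4, 0.1, -0.1, 0.3])
print(pooled_statistic(llrs))  # ~0.5
print(pooled_decision(llrs))   # True
```

A single slightly suspicious image can thus tip an otherwise innocuous pool, which is exactly why pooling lowers the detection error when the images share a source.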
Results for the G-UERD embedding algorithm are shown in Fig. 3. In each plot, the pink error bars are the empirical results and the solid blue lines with ''*'' markers are the estimated results. To show that the proposed estimation is precise for other quality factors and other steganalysis features as well, the right column shows similar plots for JPEG quality factor 75, using DCTR features, at payloads of 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 bpnzac. Based on Fig. 3, the proposed estimation is valid at all payloads, JPEG quality factors, and steganalysis features for the G-UERD algorithm. To show that it is valid for all embedding methods, whether or not they use the proposed Gaussian embedding model, we provide similar plots for G-JUNIWARD, G-JHILL, UERD, and JUNIWARD in Fig. 4. In this figure, due to space and computation limitations, we cover all the experimented embedding algorithms and JPEG quality factors at different payloads with the fewest possible plots.
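The estimated curves in these plots require only the pool-size-1 error. A minimal sketch, assuming Eq. (41) of Appendix A takes the form P̂_E(l) = Φ(√l · Φ⁻¹(P̂_E(1))), consistent with the surrounding derivation; the single-image error value below is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def estimate_pooled_error(pe_single, pool_size):
    """Closed-form estimate of the pool-steganalysis detection error from the
    empirical single-image error: Phi(sqrt(l) * Phi^{-1}(P_E(1)))."""
    return norm.cdf(np.sqrt(pool_size) * norm.ppf(pe_single))

# hypothetical empirical single-image detection error for one payload
pe1 = 0.45
for l in (1, 3, 9, 99):
    print(l, estimate_pooled_error(pe1, l))
```

Note that the estimate collapses to P̂_E(1) at l = 1 and decreases monotonically in l whenever P̂_E(1) < 0.5, matching the qualitative shape of the curves in Fig. 3.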

F. POOL STEGANALYSIS DETECTION ERROR VARIANCE
In this section, we discuss the behavior of the variance of the pool steganalysis detector. In Sec. IV and Theorem 1, we showed that although pooling improves the detection error, it increases the variance of the detector for some payloads, depending on the value of the single-image steganalysis detection error. In other words, according to Theorem 1, the variance of the detection error is an increasing function of pool size for pool sizes smaller than l_0, defined in Eq. (32), and a decreasing function for larger pool sizes.
To examine this finding, in all the plots in Fig. 3 and Fig. 4, in addition to the empirical and estimated pool steganalysis results, shown by pink error bars and solid blue lines with ''*'' markers respectively, we show the estimated standard deviation derived in Appendix A. For the fourth plot from the top, where l_0 ≈ 6.6, the size of the pink error bar increases with l until around l = 9, where it starts to decrease. For the second plot from the bottom in the left column of Fig. 3, where l_0 ≈ 2.9, the error bars start to shrink after approximately l = 5. And for the last plot, where l_0 ≈ 1.5, the error bar size is a decreasing function of l.
As a result of this behavior, which we also proved mathematically in Theorem 1, pool steganalysis suffers from instability, i.e. high variance, at small payloads, when the single-image steganalysis detection error is near 0.5. This instability is a serious disadvantage of pool steganalysis, especially at low payloads and large pool sizes, as the standard deviation can grow from a small value such as 0.004 at pool size 1 to a value as large as 0.04 at pool size 99.
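The turning points quoted above can be reproduced by propagating the uncertainty of P̂_E(1) through the closed-form estimate of Appendix A. The formulas below are our reconstruction, not copies of Eq. (32) or Eq. (46): γ is the resulting amplification factor of the standard deviation, and l_0 = 1/Φ⁻¹(P̂_E(1))² is where γ peaks; this reconstruction reproduces the 0.1587 threshold (Φ(−1)), since l_0 = 1 exactly when Φ⁻¹(P̂_E(1)) = −1.

```python
import numpy as np
from scipy.stats import norm

def gamma_factor(pe_single, pool_size):
    """Amplification of the single-image error's standard deviation when
    propagated to pool size l through Phi(sqrt(l) * Phi^{-1}(P_E(1)));
    equals the derivative of the pooled estimate w.r.t. P_E(1)."""
    z = norm.ppf(pe_single)
    return np.sqrt(pool_size) * norm.pdf(np.sqrt(pool_size) * z) / norm.pdf(z)

def l0(pe_single):
    """Pool size at which the amplification factor gamma is maximized."""
    return 1.0 / norm.ppf(pe_single) ** 2

print(round(l0(0.1587), 2))  # ≈ 1.0: variance shrinks for every l >= 1
print(round(l0(0.35), 1))    # larger P_E(1) pushes the peak to larger l
```

Under this reconstruction, γ = 1 at l = 1 for every P̂_E(1), γ > 1 for intermediate pool sizes when P̂_E(1) > 0.1587, and γ decays toward 0 for large l, matching the error-bar behavior described above.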

VI. CONCLUSION
In this study, we extend our previously proposed statistical framework to JPEG steganography, in which we employ a Gaussian model for both the cover coefficients and the hidden message elements. Based on that, we propose a quantized Gaussian embedding model that is able to work with any embedding cost or residual variance computed in the spatial or DCT domain. We show that using this embedding model improves the performance of existing JPEG steganography algorithms at most payloads, and achieves superior performance at all payloads using costs calculated by HILL. Subsequently, the proposed statistical model allows us to derive the closed-form expression of the error of an optimal omniscient single-image steganalyzer and extend it to pool steganalysis. We use the closed-form expression of the pool steganalysis error to accurately approximate the empirical results of pool steganalysis. The main benefit of this approximation is that it is accurate whenever the pooling method is optimal, regardless of payload, steganalysis feature, and embedding method and domain. In addition to approximating the error, we correctly predict the empirical behavior of the error variance with respect to pool size, and thereby reveal a deficiency of pool steganalysis.
As part of future work, we plan to investigate side-informed steganography as an immediate extension of this study. In addition, the derived closed-form expressions can be used to calculate embedding costs and residual variances. Another future path is using the proposed statistical model for video steganography, provided frame-to-frame dependencies are taken into account when computing the residual variances.

APPENDIX A STATISTICAL MODEL FOR POOL STEGANALYSIS DETECTOR'S ERROR AND VARIANCE
In this section, we discuss the pool steganalysis problem for steganography in raw images or any linear transformation of an image. The discussion is based on the Gaussian statistical model, which is valid for any linear transformation of an image; the model for spatial-domain steganography is presented in [42], and the model for JPEG steganography in Sec. II-A and II-B. Within the adopted statistical model, the detection error of an optimal single-image steganalyzer has a closed form in terms of λ(p), the Lagrangian multiplier for relative payload p. Now, we discuss the case in which the detector knows that l images are sent by the same source. We prove that in such cases, an optimal pool steganalyzer should examine the images together. To show this, we compare the detection errors for inspecting the l images together and separately. Inspecting the images together yields a detection error of the same closed form with the Lagrangian multipliers of all l images summed, because the logarithm of the pool's likelihood ratio equals the sum of the logarithms of the likelihood ratios of the l images. Given that the steganographer embeds in each image separately, the Lagrangian multiplier differs for every image; λ^(a)(p) denotes the Lagrangian multiplier for the a-th image. The resulting detection error, Eq. (34), shows that the optimal detector developed here uses the pooling strategy of summing the detection statistics of all the images in the pool. If the l images, known to have the same source, are inspected separately, the average detection error is given by Eq. (35), which is greater than or equal to Eq. (36) by Jensen's inequality; and Eq. (36) is greater than the detection error in Eq. (34) because φ(−√x) is a decreasing function of x for x > 0. This proves that the steganalyzer should inspect all the images from the same source together to achieve a lower detection error.
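The Jensen step can be made explicit. Writing the single-image error generically as Φ(−√(λ^{(a)}(p))), with the paper's exact scaling absorbed into the argument (an assumption of this sketch, since Eq. (34)–(36) are not reproduced in this excerpt), the chain of inequalities reads:

```latex
\frac{1}{l}\sum_{a=1}^{l}\Phi\!\left(-\sqrt{\lambda^{(a)}(p)}\right)
\;\ge\;
\Phi\!\left(-\sqrt{\tfrac{1}{l}\sum_{a=1}^{l}\lambda^{(a)}(p)}\right)
\;\ge\;
\Phi\!\left(-\sqrt{\sum_{a=1}^{l}\lambda^{(a)}(p)}\right).
```

The first inequality is Jensen's inequality applied to the convex function Φ(−√x), x > 0; the second holds because Φ(−√x) is decreasing and the mean of the nonnegative multipliers is at most their sum.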
However, this approach results in a detector with higher variance, which is covered later in this section. Now that we have derived the optimal pool steganalysis strategy and its detection error, we show that instead of running time-consuming pool steganalysis experiments, one can utilize Eq. (34) to approximate the results.
Assume that a database of N images (JPEG or raw) is used to embed a relative payload of p nats (p nats per nonzero AC DCT coefficient for JPEG images and p nats per pixel for raw images) using the proposed Gaussian embedding model. The average detection error of an optimal single-image steganalyzer over the whole database can be approximated, as in Eq. (38), by assuming that all λ^(a)(p) values are the same and equal to a common value λ(p). This assumption holds for the state-of-the-art batch steganography method, which embeds in each image according to its steganographic capacity and uses an image merging sender, resulting in equal values of λ [24], [31].
The assumption is only an approximation for a steganographer that embeds the same payload in all images, but it still yields a precise estimate, as shown in Sec. V-E. If the images are received in pools of l images, the detection error of an optimal pool steganalyzer can likewise be approximated, as in Eq. (40), under the same assumption of equal Lagrangian multipliers. Therefore, based on Eq. (38) and Eq. (40), Eq. (41) approximates P̂_E(l) from the value of P̂_E(1), where φ^{−1} is the inverse of the standard normal cumulative distribution function φ.
In the rest of this section, we discuss the error of this approximation when P̂_E(1) itself has an error with standard deviation σ_1. We denote the standard deviation of the error of P̂_E(l) by σ̂_l. Suppose that all the errors are small, i.e. σ̂_l ≪ 1 for all l. Then the approximation in Eq. (41) has an error with standard deviation σ̂_l given by Eq. (42), which can be further simplified using a Taylor series expansion. Plugging this Taylor series into Eq. (42) yields the approximation error in terms of a factor γ. The behavior of γ with respect to l depends on the value of P̂_E(1). In Fig. 5, γ is shown for different l and P̂_E(1); for all P̂_E(1), γ = 1 when l = 1, and γ has one global maximum. It can be seen that for larger P̂_E(1), pool steganalysis yields greater variances than single-image steganalysis at some pool sizes, because γ exceeds 1. To find out exactly when this happens, we derive the derivative of γ with respect to l, given in Eq. (46). Since l takes only natural numbers in practice, γ is a decreasing function of l whenever the derivative in Eq. (46) vanishes at some l ≤ 1. The derivative of γ with respect to the pool size l is zero at l = l_0, defined in Eq. (32); Fig. 6 depicts l_0 versus P̂_E(1). Therefore, for any P̂_E(1) > 0.1587, our approximation shows that the variance σ̂_l increases with l until l = l_0 and then decreases. The same behavior is observed in practice in Sec. V-F for the empirical detection error, which confirms the accuracy of the proposed approximation and of the mathematical model for pool steganalysis.