Digital Image Steganalysis: Current Methodologies and Future Challenges

With the growing use of the internet and social media, data security has become a major issue. Thus, researchers are focusing on data security techniques such as steganography and steganalysis. Steganography is the approach of concealing the existence of secret messages in digital media for secure transmission. Steganalysis techniques aim to detect the existence of concealed messages and extract them. Digital image steganography and steganalysis techniques are classified into the spatial and transform domains. In this paper, we provide a detailed survey of the state-of-the-art works that have been performed in two-dimensional and three-dimensional image steganalysis. We present the most popular datasets and explain some steganographic methods for embedding hidden data. Steganalysis is a very difficult task due to the lack of information about the characteristics of the cover media that can be exploited to detect hidden messages. Therefore, we review studies performed on image steganalysis in the spatial and transform domains using classical machine learning and deep learning approaches. Additionally, we present open challenges and discuss some directions for future research.

of two Greek terms, ''steganos'', which means ''covered'', and ''graphein'', which means ''writing'' [2]. Therefore, steganography is a method that hides specific information inside digital data [3]. The input data are called the cover object, and the output is the stego object, which contains the hidden message.

There are major distinctions between steganography and other information-embedding methods. The main difference between steganography and cryptography, for example, is that cryptography hides the content of the message, while steganography hides the presence of the message. Watermarking is used to protect owners' property rights; its aim is to add additional information to the source (cover data). Currently, many steganography techniques exist in the spatial and transform domains [4], [5], [6]. With the increased development and use of steganography techniques, there is a need to detect hidden messages. Steganalysis is an approach that determines whether a message is hidden by steganography inside a certain medium [7]; it is categorized into passive and active types [7]. In the spatial domain, hidden data are embedded directly by adjusting the values of the pixels in the cover image. In contrast, in the transform domain, hidden data are embedded in the coefficients of the cover image. Passive steganalysis detects the existence of hidden messages, while active steganalysis retrieves the hidden messages, and both are further classified into the spatial and transform domains. Image steganalysis methods utilize feature-based approaches to extract discriminative attributes from images, such as local binary patterns (LBP) [8] and the subtractive pixel adjacency model (SPAM) […]. Many steganalysis methods have been developed using classical machine learning and deep learning technologies.
We mainly focus on algorithms for images in the spatial and transform domains. Developing and adapting steganalysis techniques begins with a good understanding of steganography. Therefore, in Section II, we begin the survey by providing an overview of the steganography algorithms used on 2D and 3D images. In Section III, we outline the most commonly used datasets in steganalysis and categorize them into 2D and 3D datasets. In Section IV, we analyze various steganalysis methods for 2D and 3D images that have been performed using machine learning and deep learning. In Section V, we highlight some open research challenges in the steganalysis field. Finally, we conclude the paper in Section VI.

Any steganography technique can be defeated once its steganalysis technique is determined [22]. This section provides an introduction to digital image steganography and some steganography schemes in the spatial and transform domains. Steganography is the science of communicating secretly by hiding multimedia data inside an appropriate multimedia carrier, such as an image, text, file, or video [23]. These multimedia carriers are called cover objects. The first steganography techniques were developed in ancient Greece, and the importance of steganography has increased recently due to the increase in data exchange on social media networks. Image steganography techniques have been developed for information concealed exclusively in images. The secret message is hidden in a cover image and sent to a receiver in such a way that only the sender and the receiver are aware of its existence. Both the secret message and the cover image constitute the input of the steganographic encoder. The stego image is obtained by embedding the secret message in the cover image. In the end, the stego and cover images are very similar and show no visible changes.
The receiver must input the stego image into a steganographic decoder to read the secret message. A stego key is used for encoding and decoding the secret message.

Many steganographic techniques have been presented in the literature. All these techniques must satisfy at least three requirements to be applied correctly: the maximum amount of information that can be concealed inside the cover image (embedding capacity) must be considered; the visual quality of the stego image must remain unchanged (imperceptibility); and the method must be robust against noise [3]. There are a number of methods to hide information inside a 2D image. These embedding methods operate in either the spatial or the transform domain [24]. The idea of spatial domain embedding techniques is to use the actual physical location of a pixel of information in the image. These techniques are considered easy to implement because of the simplicity of their algorithms and mathematical analysis. Spatial domain techniques provide high embedding capacity; however, their robustness is weaker than that of their transform domain counterparts [25]. The most commonly used technique is least significant bit (LSB) substitution.
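As a concrete illustration of spatial domain embedding, classic LSB substitution can be sketched in a few lines of Python. This is a minimal toy on a list of 8-bit pixel values; the function names are ours, not taken from any cited work:

```python
def embed_lsb(pixels, bits):
    """Replace the least significant bit of each leading pixel with a message bit."""
    stego = list(pixels)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | b
    return stego

def extract_lsb(pixels, n_bits):
    """Read back the LSBs of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [130, 57, 200, 91, 14, 255, 33, 128]   # toy 8-bit pixel values
message = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_lsb(cover, message)
assert extract_lsb(stego, len(message)) == message
# each pixel changes by at most 1, which is why LSB embedding is imperceptible
assert all(abs(c - s) <= 1 for c, s in zip(cover, stego))
```

The final assertion makes the imperceptibility requirement concrete: every pixel value moves by at most one gray level, which is visually undetectable but still alters the image statistics that steganalysis features exploit.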

Transform domain embedding is a method for representing the signal in another form; however, the information content present in the image is not changed. The wavelet transform (WT) is a mathematical procedure used to convert a spatial domain representation into a frequency domain representation [33]. The main idea of using the WT in image steganographic techniques is to separate the high-frequency and low-frequency information on a pixel-by-pixel basis. Transformation techniques use JPEG compression due to the significant increase in available steganalysis tools. The discrete cosine transform (DCT), discrete wavelet transform (DWT), and discrete Fourier transform (DFT) are the transforms utilized in the embedding process of transform domain steganography techniques. The DCT domain embedding technique is very popular because the DCT is the core of the lossy image compression algorithm known as JPEG, which is the format used by digital cameras [34]. In comparison to the DCT, the DWT shows high robustness, and the embedded secret image can be extracted with high visual quality [35].
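The DCT-domain idea above can be sketched with a naive one-dimensional DCT-II in pure Python: transform a row of pixels, quantize the coefficients to integers, and hide a bit in the LSB of a mid-frequency coefficient. This is an illustrative toy (unnormalized transform, arbitrary coefficient index), not any specific published scheme:

```python
import math

def dct_1d(x):
    """Naive unnormalized 1-D DCT-II (the transform underlying JPEG)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

row = [52, 55, 61, 66, 70, 61, 64, 73]       # one row of toy pixel values
coeffs = [round(c) for c in dct_1d(row)]     # integer (quantized) coefficients
# hide one bit in the LSB of a mid-frequency coefficient
bit = 1
coeffs[4] = (coeffs[4] & ~1) | bit
assert coeffs[4] & 1 == bit                  # the bit survives in the coefficient
```

Because the message lives in the transform coefficients rather than the raw pixels, the resulting spatial distortion is spread over the whole block, which is why transform domain methods tend to be more robust than plain spatial LSB.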

Another classification of steganographic methods is based on the coded formats of images. Image steganography can be applied to cover images in different formats, such as BMP, JPEG and GIF. High-color-quality JPEG images are the most mainstream images in modern communications. The most efficient JPEG steganographic techniques are based on syndrome trellis coding (STC) [36] and uniform embedding distortion (UED) [37], which uses only nonzero DCT coefficients of different magnitudes with equal probability. This scheme possibly leads to minimal artifacts in the statistics of all the DCT coefficients, which makes these techniques naturally content adaptive [38].
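The key constraint described above, that only nonzero DCT coefficients are candidate embedding sites, can be sketched as follows. This is a hypothetical toy of the selection step only (the coefficient values and seed are ours), not the actual UED cost function:

```python
import random

coeffs = [0, 3, -1, 0, 2, -3, 1, 0, -2, 1]      # toy quantized DCT coefficients
nonzero = [i for i, c in enumerate(coeffs) if c != 0]
rng = random.Random(42)                          # fixed seed for reproducibility
positions = rng.sample(nonzero, k=4)             # candidate embedding sites
assert all(coeffs[i] != 0 for i in positions)    # zero coefficients are never touched
```

Skipping zero coefficients matters because JPEG quantization produces long runs of zeros whose modification would be statistically conspicuous; drawing sites across coefficient magnitudes with equal probability is what spreads the embedding changes uniformly.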

Early steganography algorithms focused mainly on 2D images, videos, and audio. However, due to the rapid growth in digital media, the use of 3D images as input media in steganography algorithms has been consistently established in the past decade. A 3D image is a geometric setting that requires three coordinate axes to represent the position of a point. Steganography algorithms hide the secret data bits inside the points of a 3D mesh. Fig. 2 shows the differences between 2D and 3D images.

VOLUME 10, 2022

The advantage of using 3D images is that they have a data […] significantly, which may lead to malicious attacks. Tsai [43] proposed an adaptive steganography algorithm that achieved high accuracy in estimating the complexity of each embedded vertex and a high embedding capacity. A steganography method that combines both the spatial and representation domains is presented in [44]. A number of 3D data hiding schemes have been investigated in steganalysis, for example, adaptive steganography [45], 3D wavelet-based high-capacity and 3D wavelet-based fragile steganography [46], shifting and truncated steganography [47], distortion-free steganography [48], permutation steganography [49], the maximum expected level tree data hiding approach [50], and a data hiding approach for polygon meshes [51]. Some researchers have tested their steganalysis techniques on other data hiding techniques, such as watermarking. The Laplacian coordinate-based watermarking method [42], two variants of robust watermarking [52], frequency-based watermarking [53] and steganalysis-resistant 3D watermarking [54] are examples of watermarking algorithms. These techniques are detected by some of the steganalysis methods discussed later in this paper.
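To make the idea of hiding bits in mesh geometry concrete, here is a hypothetical toy in Python: one bit is stored in the parity of a quantized vertex coordinate. The quantization step and function names are illustrative assumptions of ours, not any specific scheme from the cited works:

```python
STEP = 1e-4  # hypothetical quantization step for vertex coordinates

def embed_bit(coord, bit):
    """Force the parity of the quantized coordinate to match the message bit."""
    q = round(coord / STEP)
    if q % 2 != bit:
        q += 1
    return q * STEP

def extract_bit(coord):
    return round(coord / STEP) % 2

x = 0.123456                      # one vertex coordinate of a 3D mesh
stego_x = embed_bit(x, 1)
assert extract_bit(stego_x) == 1
assert abs(stego_x - x) <= STEP   # the geometric distortion is bounded
```

Because a mesh has three coordinates per vertex and typically thousands of vertices, even this one-bit-per-coordinate toy hints at why 3D meshes offer a higher embedding capacity than 2D images.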

Steganography can be used on various types of media, such as images, videos, and audio. Therefore, researchers need to evaluate their steganalysis techniques on large datasets. We classify datasets into two categories, 2D and 3D datasets, and 2D datasets can be further categorized into grayscale and color images. This section explains the commonly used datasets in the field of steganalysis. […] of the challenge (BOWS-2) was presented in 2008 [56]. This […] Currently, with advancements in 3D technology and hardware, it has become easy to obtain 3D models of natural objects. These models have become commonly used in different fields, such as medical imaging, virtual reality, augmented reality, games, movies, and many more areas. 3D steganography has become one of these fields due to the high embedding capacity of 3D meshes, which can be excellent data carriers. Table 4 provides more details on the 3D datasets.

Most steganalysis techniques have been formulated as a binary classification problem. Rich model-based steganalysis is one of these methods and achieves better detection accuracy than most other steganalysis algorithms. The method first extracts various handcrafted features from the filtered digital images in the training phase. Then, an ensemble classifier is trained to distinguish cover images from stego images. The trained classifier is used in the testing phase to determine whether a new input image includes concealed data. In steganalysis using classical machine learning, the features are extracted by handcrafted methods and are separated from the […] These ADC methods massively decreased the computational cost and will help significantly in steganalysis problems.
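The two-phase pipeline just described (handcrafted residual features, then a trained classifier) can be illustrated with a deliberately tiny sketch. This is not the actual SRM, which uses many quantized residual filters, co-occurrence matrices, and an FLD ensemble; here a single first-order residual histogram and a nearest-centroid rule stand in for them:

```python
def residual_hist(pixels, n_bins=5):
    """First-order horizontal residual, clipped to [-2, 2] and histogrammed."""
    res = [max(-2, min(2, b - a)) for a, b in zip(pixels, pixels[1:])]
    hist = [0] * n_bins
    for r in res:
        hist[r + 2] += 1
    return hist

def nearest_centroid(feat, centroids):
    """Assign feat to the label of the closest class centroid (toy classifier)."""
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda label: dist(feat, centroids[label]))

cover_feat = residual_hist([10, 10, 11, 11, 10, 10, 11, 11])  # smooth row
stego_feat = residual_hist([10, 11, 10, 12, 9, 11, 10, 12])   # noisier row
centroids = {"cover": cover_feat, "stego": stego_feat}
assert nearest_centroid(cover_feat, centroids) == "cover"
```

The point of the sketch is the shape of the pipeline: embedding adds weak noise, the high-pass residual amplifies that noise relative to image content, and a classifier trained on residual statistics separates the two classes.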

In the following sections, we present the main contribution of this survey, which is to highlight the works that have been performed using classical machine learning and deep learning techniques in the spatial and transform domains in the field of image steganalysis.

Different methods for 2D image steganalysis using machine learning techniques have been proposed. These methods use two phases to solve the steganalysis problem. The first phase is handcrafted feature extraction, which is capable of modeling the embedding distortions introduced in the image by any steganographic algorithm. The second phase is the classification process, which uses an integrated classifier for feature training. Different classifiers can be used for image steganalysis, such as SVM and ensemble classifiers. The following sections discuss the steganalysis methods in the spatial and transform domains. […] [85], spatial rich models (SRM) [86], and WOW [30].

A method for detecting least significant bit matching (LSBM) steganography was presented by Pevny et al. [9]. […]ize local structure changes, and they seem to be promising. LBP can effectively summarize the local structures of an image by comparing pixels with their neighbors. Inspired by this idea, Gui et al. [93] proposed extracting multiscale rotation-invariant LBPs from smooth pixels as unique textural features, which are then fed into a linear SVM. The experimental results showed that the method performed well in detecting stego images and had high accuracy.

Liu et al. [94] presented a blind image steganalysis method based on a nature-inspired feature selection method. The features are extracted for image steganalysis using SPAM. Then, the ideal feature subset is chosen from the original features using the binary bat method (BBM) [95]. The classifiers used to verify the proposed method are KNN, RF, AdaBoost, DCA, NB and SVM. The proposed method was tested on the BOSSBase v1.01 dataset, and the accuracy was 68.08% with the SVM classifier. […] These steganography methods leave minimal traces of hidden data, so it is necessary to extract independent features from the image before proceeding to the next phase. Efficient features for the steganalysis process include the Markov transition probabilities of pixels, histograms of residuals, cooccurrence matrices, LBP operators, etc. The next phase is the classification process, in which integrated classifiers are used for feature training. Classifiers that can be used for image steganalysis include SVM and ensemble classifiers.

Liu et al. [98] presented a new method based on feature mining, the DCT domain and SVM for JPEG image steganalysis. They extracted features using both the intra-block and inter-block neighboring joint density of the DCT coefficients; then, they fed these features into an SVM for detection. To predict the amount of hidden data in JPEG steganography, the authors applied a neuro-fuzzy inference system. Their experimental results showed that their method performed better than the well-known Markov process-based method.

Holub and Fridrich proposed a novel feature set for JPEG steganalysis called the discrete cosine transform residual (DCTR) [99]. These features are low in complexity and small in dimension, and they are created as histograms of the residuals obtained using 64 DCT bases.
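The basic LBP operator mentioned above can be sketched as follows; this is the minimal single-scale, 8-neighbor version, whereas the cited steganalysis work uses multiscale, rotation-invariant variants:

```python
def lbp_code(center, neighbors):
    """8-neighbor local binary pattern: set bit i when neighbor i >= center."""
    code = 0
    for i, n in enumerate(neighbors):
        if n >= center:
            code |= 1 << i
    return code

# 3x3 patch with center value 5; neighbors listed clockwise from the top-left
assert lbp_code(5, [6, 7, 9, 8, 7, 1, 2, 5]) == 0b10011111  # decimal 159
```

The 8-bit code describes the local texture around each pixel; histogramming the codes over an image yields a 256-bin feature vector whose distribution is disturbed by embedding, which is what the SVM learns to detect.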
The authors used the Fisher linear discriminant (FLD) [100] ensemble as a binary classifier. The results show that DCTR achieved competitive detection performance against many JPEG steganography methods.

Song et al. [101] proposed a steganalysis method that applies 2D Gabor filters in the feature extraction phase to detect […]

Yang and Ivrissimtzis [115] presented the first 3D steganalysis features (YANG208) for detecting hidden messages in triangle meshes. For each mesh, they calculated a characteristic feature vector that captured the geometric information from its Cartesian and Laplacian coordinates. They then applied a calibration technique to the extracted feature vector by computing the difference between the mesh and a reference mesh to extract the discriminative features. The extracted features were then fed into a supervised learning method based on quadratic discriminant analysis (QDA). The method was tested on six well-known steganographic frameworks and showed satisfactory accuracy rates. […] The authors used the PSB dataset, which contains 354 3D mesh cover objects. Stego objects were created using three different steganographic methods for information hiding. The experimental results showed that the FLD ensemble provided the best results for the steganalysis process when the mean-based watermarking steganographic method [52] was used to identify the information embedded in 3D objects.
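The FLD ensembles used in these works average many base Fisher linear discriminant learners over random feature subspaces; a single base FLD on two-dimensional features can be sketched in pure Python (illustrative toy with made-up feature values, using the standard direction w ∝ S_w⁻¹(m₁ − m₀)):

```python
def mean(vs):
    n = len(vs)
    return [sum(v[i] for v in vs) / n for i in range(2)]

def fld_direction(class0, class1):
    """Fisher direction w = S_w^{-1}(m1 - m0) for 2-D features (explicit 2x2 inverse)."""
    m0, m1 = mean(class0), mean(class1)
    S = [[0.0, 0.0], [0.0, 0.0]]          # pooled within-class scatter matrix
    for vs, m in ((class0, m0), (class1, m1)):
        for v in vs:
            d = [v[0] - m[0], v[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

cover = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]   # toy cover-image features
stego = [[2.0, 3.0], [2.2, 2.9], [1.9, 3.2]]   # toy stego-image features
w = fld_direction(cover, stego)
proj = lambda v: w[0] * v[0] + w[1] * v[1]
# the projections of the two classes separate cleanly on this toy data
assert max(proj(v) for v in cover) < min(proj(v) for v in stego)
```

The ensemble variant trains many such discriminants on random subsets of a high-dimensional feature space and takes a majority vote, which keeps training tractable for rich-model features with tens of thousands of dimensions.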

These features were fed into an FLD ensemble. The proposed method was tested on 354 cover 3D mesh objects from the PSB dataset. The 3D stego meshes were produced by six 3D information hiding techniques. The experiments showed that the proposed method is efficient to implement and concluded that the edge vector plays a significant role in steganalysis.

Zhou et al. [45] proposed a specific steganalysis method using a PCA transform-targeted feature to differentiate between stego and cover 3D mesh objects. The transformation matrix of a stego mesh is close to the identity matrix after a PCA transform, while the transformation matrix of a cover mesh is far from the identity matrix on most occasions. A one-dimensional feature is defined by the norm of the difference between the two transformation matrices. This method was tested on the PMS and PMN datasets. The proposed steganalysis method is only effective against steganographic methods based on the PCA transform.
[…] shape context to distinguish a stego 3D mesh object from […] section, we present deep learning models that aim to reduce […] They used an RBF neural network to update the bin centers and widths of the model, and eigenvalues were used to find the minimum and maximum values of the RBF. The method showed significant improvement in texture classification.

Abazar et al. [127] presented a novel framework that reduces the learning cost by using a divide-and-conquer technique. The dataset is split into five disjoint clusters by employing k-means. Each cluster is fed into a distinct CNN. The networks are combined by leveraging a fast weighting process. The proposed model is able to reduce the size of the training data for each model. The experimental results showed that the proposed framework reduces the time complexity while maintaining accuracy.

The following sections provide a summary of the state-of-the-art works that have been performed in 2D image steganalysis using deep learning techniques in the spatial and transform domains.

As mentioned previously, in spatial domain steganography, the payload bits are hidden in a cover image by changing the pixel intensity values directly. Knowing this, researchers have begun to take advantage of deep learning for spatial domain steganalysis. The first attempt to use an unsupervised deep learning method for steganalysis was carried out by Tan and Li [128]. The authors used stacked convolutional autoencoders (SCAEs) [129]. The weights of the kernels and filters in the CNN were randomly initialized. The authors believed that a well-trained CNN should perform comparably to the well-known and successful SRM. They used a nine-layer, three-stage CNN as a blind steganalyzer.

Qian et al. [21] were the first to propose using supervised learning with CNNs for steganalysis. Their network consists of three stages: a high-pass filter used as a preprocessing layer, convolutional layers for feature extraction, and a fully connected layer for classification. The high-pass filter layer is used because the stego signal is weaker than the content of the image. This model achieved reasonable results compared to traditional models using handcrafted features. Wu et al. [130] proposed a new feature extraction framework that can learn joint features from input images and their corresponding residual images. Their feature fusion process in the CNN is completely unsupervised. To minimize the data dimensions, the method chooses feature maps from the middle three hidden layers and concatenates them into a 1D vector that is passed into the fully connected layers to obtain the classification result. The aim is to decrease the negative impact of the high-pass filter and guarantee that the network remains convergent.
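The high-pass preprocessing step used by these CNN steganalyzers is commonly the 5×5 "KV" kernel from the SRM family of filters; a minimal pure-Python sketch of that step (our own convolution code, valid-mode, no padding) is:

```python
KV = [[-1,  2,  -2,  2, -1],
      [ 2, -6,   8, -6,  2],
      [-2,  8, -12,  8, -2],
      [ 2, -6,   8, -6,  2],
      [-1,  2,  -2,  2, -1]]   # conventionally scaled by 1/12

def highpass(img):
    """Valid 2-D convolution of img (list of lists) with the KV kernel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - 4):
        row = []
        for x in range(w - 4):
            s = sum(KV[i][j] * img[y + i][x + j] for i in range(5) for j in range(5))
            row.append(s / 12.0)
        out.append(row)
    return out

flat = [[100] * 6 for _ in range(6)]   # constant image: the residual is zero
res = highpass(flat)
assert all(abs(v) < 1e-9 for row in res for v in row)
```

Because every row of the kernel sums to zero, smooth image content is suppressed while the weak, noise-like embedding changes are passed through, which is exactly why such a layer is placed in front of the first convolution.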
The authors of [67] tested more than 40 CNN architectures and found that the best shape consists of two convolutional layers followed by three fully connected layers. The input image of the CNN is first filtered by a high-pass filter, as in the work of Qian et al. [21]. The CNN was evaluated in two scenarios, the first of which is a clairvoyant scenario in which it is assumed that the same embedding key is applied to […]

Ye et al. [132] introduced YeNet, which incorporates a new truncated linear unit (TLU), in a CNN steganalysis model. The network contains 10 convolutional layers, and 30 high-pass kernels initialized using SRM are used as a preprocessing layer. In the first convolutional layer, the authors used the TLU, and in the remainder of the layers, they employed the ReLU activation function. The output 144-dimensional feature vector was fed into one fully connected layer, followed by a softmax layer. YeNet achieved lower detection error rates in comparison with the SRM and maxSRMd2 steganalyzers.

Yedroudj et al. [133] presented a CNN model incorporating one preprocessing layer consisting of 30 high-pass filters from SRM kernels, followed by five convolutional layers and, finally, one softmax layer in the spatial domain. Their CNN model is similar to Xu's net [131] […] layers. The CNN model was trained using stochastic gradient descent (SGD).

Yousfi et al. [137] won the ALASKA steganalysis challenge in 2019 by using SRNet [138] to train different combinations of three input channels: the luminance Y and the chrominances Cr and Cb. SRNet uses residual skip connections, and the filter size is 3 × 3. All the convolutional layers are followed by batch normalization and the ReLU activation function.
The first eight convolutional layers do not incorporate pooling layers, since average pooling acts as a low-pass filter, while steganalysis is concerned with the high-pass content where the stego signal resides. The output of these convolutions was fed to a fully connected layer that produced two outputs and was fed to a binary classifier.
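The truncated linear unit used in YeNet's first layer, as described above, is simply the identity clipped symmetrically at a threshold T (the default value here is illustrative):

```python
def tlu(x, T=3.0):
    """Truncated linear unit: identity inside [-T, T], clipped outside."""
    return max(-T, min(T, x))

assert tlu(5.0) == 3.0 and tlu(-5.0) == -3.0 and tlu(1.5) == 1.5
```

Clipping the first-layer responses limits the dynamic range of the residuals, preventing large content-driven activations from swamping the small embedding signal, which is the same motivation as the truncation used in handcrafted rich-model features.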

Inspired by the idea of using transfer learning, pretraining neural networks on unrelated tasks and refining them for steganalysis, Yousfi et al. [139] […]

In 2D image steganalysis using classical machine learning, SVM appears to be the most popular binary classifier, typically trained on rich-model (SRM) features, while the FLD ensemble is the most popular classifier for 3D image steganalysis. In deep learning, 2D CNN architectures are commonly used by researchers to implement steganalysis models. It is known that 3D meshes have a higher embedding capacity than 2D images; however, most steganalysis studies target 2D images. Therefore, it is important to investigate the possibility of detecting 3D mesh steganography using deep learning techniques.

While steganalysis has received considerable attention in the past decade, some challenges remain unsolved. First, the different CNN models presented in this survey are designed to suit specific datasets. To date, there is no generalized CNN model that can detect hidden messages in unseen data. Second, none of the currently available deep learning models take into account the use of generative adversarial networks (GANs). It is worth investigating whether the generator of a GAN model can learn from stego and cover images and generate reasonable outputs to distinguish between the two. This would help to simplify the task of detecting steganography. Third, as discussed in Section III, many datasets are available with different specifications, such as the data domain. However, the current steganalysis deep learning models use specific datasets. Therefore, there is a […]

CNNs have achieved prominent performance compared to classical machine learning methods in the field of steganalysis. Detecting stego images with CNN models is still in its early stages, and deep learning models need to become robust against new steganographic algorithms. Further research needs to explore how well generative adversarial network architectures can help develop steganalysis algorithms for images in the wild.