Automated Deep Learning BLACK-BOX Attack for Multimedia P-BOX Security Assessment

Resistance to differential cryptanalysis is a fundamental security requirement for symmetric block ciphers, and deep learning has recently attracted the interest of cryptography experts, particularly in block cipher cryptanalysis, where the bulk of these studies are differential-distinguisher-based black-box attacks. This paper provides a deep learning-based decryptor for investigating the permutation primitives used in multimedia block cipher encryption algorithms. We aim to investigate how deep learning can improve on previous classical works by exploiting ciphertext-pair features to maximize information extraction under low-data constraints, using convolutional neural network features to discover the correlation among permutable atoms and extract the plaintext from the ciphertext without any P-box expertise. The evaluation is conceptualized as a regression task in which neural networks are supervised using a variety of parameters, such as variations between input and output, number of iterations, and P-box generation patterns. The transfer learning results of this study further indicate that suitable testing models can be discovered from the ground up with our model given optimal prior cryptographic expertise, contributing to the development of deep learning-based differential cryptanalysis. Various experiments were performed on discrete and continuous, chaotic and non-chaotic permutation patterns, and the best-performing model had an MSE of $1.8217\times10^{-4}$ and an $R^{2}$ of 1, demonstrating the practicality of the suggested technique.

minimizing computation time and storage space. This uniform distribution of elements does not occur in natural images, when using images without a unified distribution, or when deploying permutation algorithms in the operational mode, which reflects the work's weakness. Despite the higher recovery performance, it is insufficient and cannot properly determine the appropriate permutation aspects. In the frequent case of a non-uniform color distribution, where the calculated size of the key search space rises accordingly, the outcome does not appear to be satisfactory. Furthermore, those linked studies appear to be infeasible against ciphers with few colors, particularly black-and-white images, where reconstructing the real shape of the encrypted picture is impossible. However, these studies do not address another critical issue in the field of cryptography, because all prior research is based on a black-box attack that uses a predetermined number of (cipher/plain) pairs encrypted by permutation, with no specification of the number of permutation rounds or the type of key generator pattern.

This paper extends prior research [43], [44], [46], [54] to overcome its drawbacks by using deep learning to assess P-box permutation methods and technologies widely used in multimedia encryption. Most studies that fit this condition in the literature attempt to recover the whole plain form of a particular cipher using classical research approaches and various optimization methods to find the used key, or most parts of it, through black-box attacks. At the same time, those methods are impractical in many situations because they depend on important details and constraints of the optimization algorithms and the parameters of the methods used, whereas cryptanalysis's real purpose is limited to the acquisition of just the cipher's sense and its basic concept, which can be reached by black-box deep learning attacks without any complicated cryptanalysis task or hard algorithmic details. Furthermore, these approaches appear to be hard to reuse. On the other hand, our technique provides a relatively basic process that can easily be reused in testing processes. Image files, unlike text files, have unique characteristics such as large data capacity, redundancy, and strong adjacent-pixel correlation that necessitate specialized strategies in the encryption process to break the correlation of adjacent pixels. Among these are permutation algorithms, which are based on non-linear systems and chaos theory. This technique appears to be beneficial for transferring media files and high-resolution pictures across insecure channels.

In [16], deep learning models were trained, verified, and validated on data that included plaintext, ciphertext, and intermediate round data created with the same encryption key.
In [10], the authors developed a learning algorithm to recover the secret keys of the Caesar and Vigenere poly-alphabetic and substitution ciphers. In [11], generative adversarial networks have also been employed to break these traditional cryptosystems. Machine learning algorithms and classification skills have been used to detect cryptographic algorithms from ciphertexts in the works of [12] and [13]: classifiers were trained using known ciphertexts produced by a collection of six widely used cryptographic methods. Benamira et al. [8] conducted a more detailed investigation of the operation of ML-based distinguishers, focusing on what information they employ. Their results demonstrate that these machines not only learn the differential distribution on ciphertext combinations but that the distinguisher is also influenced by the penultimate or ante-penultimate round. Based on their findings, they suggest a new pure cryptanalysis distinguisher with the same accuracy as Gohr's neural distinguisher. The authors of [1] investigated the influence of block cipher characteristics on prediction accuracy by training deep learning algorithms to estimate the number of active S-boxes for GFS cryptosystems. Deep learning has been used in both of these strategies, rather than just simpler, conventional machine learning techniques.

Further machine learning algorithm distinguishers and cryptanalysis against Simon, Speck, and non-Markov ciphers have also been introduced [14], [15]. The findings in [64] add another contribution by investigating the capability of linear and nonlinear machine learning classifiers in evaluating block cipher security using machine learning. According to their findings, machine learning models identify
VOLUME 10, 2022
• It just says: the $k$-th item of $P(b)$ represents the $i_k$-th item of $b$.

• A P-BOX is a special type of S-BOX.

• P-BOXES permute, repeat, or discard the elements of the input.

• This notion assumes that those cryptographic algorithms based on permutation can be deciphered using the same technique.

• As a result, the starting location of pixel $p$ must be determined using the inverse function $T_k^{-1}$.

• The $T_k$ function, as well as its inverse $T_k^{-1}$, is formed by the encryption private key $k$ and has the same dimension as the considered plaintext, which is the block cipher size (P-box size).

• As a result, this framework reveals that image encryption reflects a symmetric block cipher, with an input size of $(MN)$ as well as a key size of $(MN)$.

• We conclude that all permutation approaches are included within the $(MN)!$ possible scenarios, which also reflect the greatest number of chosen plaintexts that lead to a conclusion, and that permutation cryptanalysis should thus concentrate on those dimensions as a problem space [54].
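The framework in the points above, a key-derived permutation $T_k$ of size $MN$ with inverse $T_k^{-1}$, can be sketched in a few lines of NumPy. Note that the key derivation below uses a seeded PRNG as a stand-in for the paper's chaotic generators, so the helper names and the seed are illustrative assumptions only:

```python
import numpy as np

def permutation_from_key(key: int, n: int) -> np.ndarray:
    # Derive a permutation T_k of {0..n-1} from the secret key k.
    # (A seeded PRNG stands in for the chaotic generators used in the paper.)
    rng = np.random.default_rng(key)
    return rng.permutation(n)

def invert(t: np.ndarray) -> np.ndarray:
    # Build T^{-1} so that inv[t[i]] = i for every position i.
    inv = np.empty_like(t)
    inv[t] = np.arange(t.size)
    return inv

def encrypt(img: np.ndarray, t: np.ndarray) -> np.ndarray:
    # Block size equals the image size MN: flatten, permute, reshape.
    flat = img.reshape(-1)
    return flat[t].reshape(img.shape)

key = 42
img = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 "plainimage"
t = permutation_from_key(key, img.size)
cipher = encrypt(img, t)
plain = encrypt(cipher, invert(t))                  # decryption = apply T^{-1}
assert np.array_equal(plain, img)
```

Decryption is simply encryption with the inverse permutation, which is why the key space is bounded by the $(MN)!$ possible bijections.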

Permutation methods are frequently included as multimedia content encryption techniques, and they are strongly advised as an attractive option in the composition of mar-

The above approach has been further expanded in [42] to break permutation-only picture encryption of pixel bits.

However, if the number of atoms in each element $L$ is not low and the entropy contained in each element is significant, it is evident that searching all $(L!)$ possibilities is very complex, and a ciphertext-only attack (COA) is hence practically infeasible.

As a result, several research strategies propose building more complicated ways to generate secret permutations in order to provide higher security while also satisfying various additional application-dependent needs [25], [26], [30], [32]. Despite efforts to improve the resilience of permutation-only ciphers to ciphertext-only attacks, most cryptographic algorithms of this type are vulnerable to plaintext attacks.

The impact of a chosen-plaintext attack (CPA), in which the adversary obtains the ciphertext of a chosen plaintext, is increased by making all elements of the plaintext distinct from one another (input difference). [44] demonstrated that the minimum number of chosen plaintexts needed to completely extract the underlying permutation pattern is $\lceil\log_{r} L\rceil$, where $r$ is the number of possible intensities.
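The bound above is easy to evaluate numerically. A minimal sketch, assuming $L$ is the number of permuted positions and $r$ the number of distinguishable intensities (e.g., an 8-bit grayscale $512\times512$ image):

```python
import math

def min_chosen_plaintexts(num_positions: int, num_intensities: int) -> int:
    # ceil(log_r L): chosen plainimages needed to pin down a permutation
    # over L positions when pixels take r distinguishable values.
    return math.ceil(math.log(num_positions, num_intensities))

L = 512 * 512   # positions in a 512x512 image
r = 256         # 8-bit grayscale intensities
print(min_chosen_plaintexts(L, r))
```

For these values $\log_{256}(512\cdot512) = 18/8 = 2.25$, so three chosen plainimages suffice, which illustrates how weak permutation-only ciphers are against CPA.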

It is hard to calculate how many known plaintexts are required to properly break the underlying permutation pattern in the case of a known-plaintext attack (KPA), an attack model that differs from CPA only in that the adversary cannot choose the plaintext arbitrarily.

In general, $L$ is much larger than $r$ in multimedia data. According to the pigeonhole principle, certain values in $\{0, 1, \ldots, r-1\}$ must occur more than once. The same pixel value of 0 should occur approximately 512 times within the permutation-encrypted ciphered image when a known plainimage of size $(512\times512)$ has a uniform distribution. As a result of observing this clear picture and the accompanying cipher one, there must be $(512)!$ possibilities for one item in the permutation pattern whose pixel value corresponds to zero [54].

The model computes the difference between various inputs and outputs based on the dataset, using numerous parameters such as batch size and pixel values; the present key characteristic is the correlation between adjacent pixels in two-dimensional space. The decryptor's objective is to extract the visual difference between the inputs and the outputs, which can formally be defined as follows:
$$\Delta = \text{Input}_1 \otimes \text{Input}_2$$
where $\text{Input}_1$ and $\text{Input}_2$ are two distinct plainimages and $\otimes$ denotes the dissimilarity function.

Because we work in a two-dimensional space, the dissimilarity function represents the distance between two images. In more detail, if we place two images of the same size on top of each other, the function counts the pixels in the same position in the two images that have different colors, which can be noted as follows: $\text{Input}_1 = Img_1$ and $\text{Input}_2 = Img_2$.

The dissimilarity function counts the pixels $P_{ij}$ satisfying $P1_{ij} \neq P2_{ij}$, with $P1_{ij} \in Img_1$ and $P2_{ij} \in Img_2$.

In the case of a non-unified distribution, every differential direction must hold with a specific probability. We show in this paper that, even in the absence of a uniform data distribution, our neural decryptor successfully employs aspects of ciphertext pairs that are not addressed by the previous differential works.
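The dissimilarity function just described, a count of same-position pixels whose values differ, can be sketched directly in NumPy:

```python
import numpy as np

def dissimilarity(img1: np.ndarray, img2: np.ndarray) -> int:
    # Number of positions (i, j) where the two equal-sized images
    # carry different pixel values: |{(i, j) : P1_ij != P2_ij}|.
    assert img1.shape == img2.shape
    return int(np.count_nonzero(img1 != img2))

a = np.zeros((4, 4), dtype=np.uint8)
b = a.copy()
b[0, 0] = 255   # change two pixels
b[3, 3] = 7
print(dissimilarity(a, b))  # -> 2
```

Because a permutation only moves pixels, this count is invariant neither to the key nor to the data distribution, which is exactly the signal the decryptor exploits.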

Test evaluation is conceptualized as a regression problem for a supervised model in which the layers of the model are trained on many characteristics, such as variations between input and output, number of iterations, and P-box generation patterns.

Deep learning algorithms are used to find a decryptor because they can detect hidden structures in digital information without the need for explicit, intentional feature extraction engineering.

To get the highest accuracy and learning speed, we investigated the width (number of neurons in every layer) and depth (number of hidden layers) of the latter, used to mimic such complex inverse characteristics. In Fig. 5, the system is split into convolutional and deconvolutional groups.

• The input is encrypted images, denoted $X$, in the convolutional groups, and we generate six convolutional layers to quantify the input image composition and obtain low-dimensional characteristic features, with the operating condition described as $Y = O(X)$.

• All these characteristic features will be used to define the dense layer parameters for a profound understanding, to detect hidden features in data sets without the need for intentional feature selection.

When deploying machine learning algorithms, the hyper-parameters that make the biggest difference for a particular task must be chosen. These parameters are often determined experimentally by analyzing multiple network topologies and adhering to best practices. There are automated ways of tuning the hyper-parameters [60], but they demand significant resources that can be difficult to replicate. In what follows, we provide the results of the manual architectural search.

The remaining hyper-parameters applied in our experiments are listed below:

• Weight initialization: the weights $W$ are initialized from $Ds$, where $Ds$ is a uniform distribution and $Prv$ is the dimension of the preceding layer (the number of columns in $W$).

• Optimizer: as an optimizer, we used the Adam algorithm [58]. Since it slightly differs from the classical gradient descent presented before, we give a brief explanation here. We denote two sequences: $x_t$ and $y_t$ are respectively the 1st-order (mean) and 2nd-order (variance) gradient estimates, where $\theta(t)$ represents, as before, our trainable parameters, $E$ is our loss function, and $\gamma_1$, $\gamma_2$ are constants.
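The Adam update just outlined can be sketched with the paper's symbols: $x_t$ and $y_t$ as the 1st- and 2nd-order gradient estimates and $\gamma_1$, $\gamma_2$ as their decay constants. The toy loss $E(\theta)=\theta^2$ and the learning rate below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def adam_step(theta, grad, x, y, t, lr=1e-3, g1=0.9, g2=0.999, eps=1e-8):
    # x: 1st-order (mean) gradient estimate, y: 2nd-order (variance) estimate.
    x = g1 * x + (1 - g1) * grad
    y = g2 * y + (1 - g2) * grad ** 2
    x_hat = x / (1 - g1 ** t)          # bias correction for early steps
    y_hat = y / (1 - g2 ** t)
    theta = theta - lr * x_hat / (np.sqrt(y_hat) + eps)
    return theta, x, y

theta, x, y = 1.0, 0.0, 0.0
for t in range(1, 101):                # minimize E(theta) = theta^2
    grad = 2 * theta                   # dE/dtheta
    theta, x, y = adam_step(theta, grad, x, y, t, lr=0.05)
print(theta)                           # converges toward the minimum at 0
```

Unlike plain gradient descent, the effective step size is normalized by the variance estimate, which is why it tends to behave well across layers with very different gradient scales.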
A negative residual indicates that the predicted value was too great, whereas a positive residual indicates that the predicted value was too low. A regression line's goal is to minimize the sum of residuals.

For calculating residuals, recognize that $r_i = y_i - \hat{y}_i$, where $\hat{y}_i$ is the value given by the regression equation; the residual of an observation is thus the observed value minus the predicted value.

• Activation function: the linear activation function was chosen for any situation in which activation is roughly proportional to the input. It is also known as ''no activation'' or the ''identity function''.
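The residual computation $r_i = y_i - \hat{y}_i$ in one line of NumPy, with toy observed and predicted values chosen purely for illustration:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0])        # observed values y_i
y_hat = np.array([2.5, 5.5, 7.0])    # values predicted by the regression
residuals = y - y_hat                # r_i = y_i - y_hat_i
print(residuals.tolist())            # -> [0.5, -0.5, 0.0]
```

The first residual is positive (the prediction was too low), the second negative (too great), matching the sign convention described above.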

The function makes no change to the weighted combination of the parameters; it simply returns the value that was provided. In our situation, using this function preserves the parameters generated by the Adam optimizer and strengthens the effectiveness of the convolution features.

• The first important Conv-2D argument is the number of filters of the convolutional layer.

• The kernel size, a 2-tuple indicating the size of the 2D convolution window, is the next essential argument that must be supplied to the Conv-2D class. The kernel size may also be given as a single integer value.

• The strides configuration is a pair of integers that describes the movement of the convolution along the input volume's x and y dimensions.

• The padding argument of the Conv-2D class can take one of two possible parameters: 'valid' or 'same'. With the 'valid' setting, the input is not zero-padded, so the spatial dimensions are naturally reduced through the use of convolution.
Figure 5 illustrates the model architecture, and Tables 1, 2, and 3 present the convolutional group parameters, the dense layer, and the de-convolutional group parameters, respectively.
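The four Conv-2D arguments discussed above assemble into a Keras layer as follows. The specific values (32 filters, 3x3 kernel, a 28x28 input) are illustrative placeholders, not the architecture of Tables 1-3:

```python
import tensorflow as tf
from tensorflow.keras import layers

# One convolutional layer exercising the four Conv-2D arguments.
conv = layers.Conv2D(
    filters=32,           # number of filters the layer learns
    kernel_size=(3, 3),   # 2-tuple: spatial size of the convolution window
    strides=(1, 1),       # step of the window along x and y
    padding="valid",      # no zero-padding: spatial size shrinks
    activation="linear",  # identity activation, as chosen in the paper
)
x = tf.zeros((1, 28, 28, 1))  # batch of one 28x28 grayscale image
y = conv(x)
print(y.shape)  # 'valid' padding: 28 - 3 + 1 = 26 -> (1, 26, 26, 32)
```

With `padding="same"` the spatial size would instead stay at 28x28, which is the usual choice when the deconvolutional group must reconstruct a full-size image.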

One of the main goals of image encryption algorithms is to break down the correlation between adjacent pixels.

The logistic map is utilized in this study case to produce a series of numbers; however, any discrete chaotic map can also be employed in the same manner.

After sorting these values in ascending order, the rank of every value in the sorted series is used to fill the permutation P-box.

The standard logistic map with parameter $\lambda$ is:
$$x_{n+1} = \lambda x_n (1 - x_n)$$
The discrete chaotic system was iterated $\lceil MN/spc \rceil$ times for a P-box of size $MN$, where $spc$ is the number of values produced per iteration of the algorithm and $\lceil MN/spc \rceil$ denotes the lowest integer higher than or equal to $MN/spc$.
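The logistic-map P-box construction described above (iterate the map, then rank the chaotic values) can be sketched as follows; the initial condition and $\lambda$ are illustrative, and for simplicity one value is drawn per iteration:

```python
import numpy as np

def logistic_pbox(x0: float, lam: float, n: int) -> np.ndarray:
    # Iterate x_{k+1} = lam * x_k * (1 - x_k), then rank the sequence:
    # the ascending sort order of the chaotic values defines the P-box.
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = lam * x * (1 - x)
        xs[k] = x
    return np.argsort(xs)  # position of each value in ascending order

p = logistic_pbox(x0=0.3141, lam=3.9999, n=16)
print(p)  # a permutation of 0..15, determined entirely by (x0, lam)
```

Because the map is deterministic, the same key $(x_0, \lambda)$ always yields the same P-box, which is what makes decryption possible.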

2) CONTINUOUS CHAOS
The Lorenz system is utilized in this study case to produce a series of numbers; however, any continuous chaotic system can also be employed in the same manner. To begin, the three output frames are modified to remove

RAM, employing Python 3.7.13, TensorFlow 2.8, and the Keras API. The source code is available from GitHub.1 We used Keras checkpoint callbacks to preserve the best results during every iteration, as well as the weights and biases of the CNN model. To demonstrate the scope of training a machine learning-based decryptor by exploiting significant differences between plainimages and cipherimages, we set up an experiment in which DL-decryptors are trained for a single round, eight rounds, and sixteen rounds with the following parameters. First, we trained the model on data from the Mnist data set, and the results are presented as follows.

difficult ciphertext. The specifications designed to automatically generate the first round of permutation keys are the same as for the first experiments, but it should be highlighted that in the case of multiple rounds, the key generation stage is conducted in accordance with the number of rounds. In the case of eight rounds, the first key is created from the baseline parameters specified above, the second key from the first one, the third key from the second one, and so on until the last round.
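The round-key chaining described above (first key from the baseline parameters, each subsequent key derived from its predecessor) can be sketched as follows. A seeded PRNG stands in for the paper's chaotic derivation, so the helper names and seeds are assumptions for illustration:

```python
import numpy as np

def first_key(seed: int, n: int) -> np.ndarray:
    # Round-1 key: a permutation derived from the baseline parameters.
    return np.random.default_rng(seed).permutation(n)

def next_key(prev: np.ndarray) -> np.ndarray:
    # Each subsequent round key is derived deterministically from the
    # previous one (here: the previous permutation seeds the generator).
    return np.random.default_rng(prev).permutation(prev.size)

def round_keys(seed: int, n: int, rounds: int) -> list:
    keys = [first_key(seed, n)]
    for _ in range(rounds - 1):
        keys.append(next_key(keys[-1]))
    return keys

ks = round_keys(seed=7, n=16, rounds=8)
print(len(ks))  # -> 8 chained round keys, one per permutation round
```

Since each key depends only on the one before it, both sides of the channel can regenerate the whole chain from the baseline parameters alone.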

This experimentation is divided into the following main phases:

-After employing the encryption approach given in Table 5 969 we get:

-The generation of the ciphered Mnist training dataset based on the generated keys: 60,000 samples for training and 10,000 samples for model validation.

-Generation of ciphered Fashion Mnist training datasets based on the same produced keys for the four permutation patterns: 60,000 samples for a transfer learning test and 10,000 samples for each model's prediction.

-Training the four models with the ciphered Mnist data sets and saving the best results for each one.

-Reusing the four models trained on encrypted Mnist images as deployment models for predicting images from the ciphered Fashion Mnist data set.
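The dataset-generation phases above amount to permuting every flattened image with the same round key and keeping the (cipherimage, plainimage) pairs. A minimal sketch, with random arrays standing in for the Mnist images:

```python
import numpy as np

def make_cipher_dataset(images: np.ndarray, pbox: np.ndarray):
    # Permute every flattened image with the same P-box, producing
    # (cipherimage, plainimage) supervision pairs for the decryptor.
    n, h, w = images.shape
    flat = images.reshape(n, h * w)
    ciphers = flat[:, pbox].reshape(n, h, w)
    return ciphers, images

rng = np.random.default_rng(0)
plain = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)  # Mnist stand-in
pbox = rng.permutation(28 * 28)
x_train, y_train = make_cipher_dataset(plain, pbox)
print(x_train.shape)  # -> (100, 28, 28)
```

For multi-round encryption the same call is simply repeated with each chained round key before pairing the result with the original plainimages.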

To conduct a more in-depth investigation and better test our

way that they seem random (pseudo-random generators), but these patterns simultaneously allow for the inverse operation, which is decryption without loss of data. The model is not a distinguisher in itself, but it improves the identification of decryptors. In other terms, a model trained on data encrypted with one-round CML is distinguishable from a P-box-based CML one-round encryption algorithm. All of the models have the same architecture, layers, and hyper-parameters; the key difference between them is the parameters acquired during the training process (weights and biases).

After training the four models with the Mnist data set, we attempted to employ transfer learning by using the weights and biases of the first model trained on the Mnist data set for one, eight, and sixteen rounds as deployment models for the Fashion Mnist models for one, eight, and sixteen rounds, with the same permutation patterns and the same algorithm parameters, respectively.
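The weight transfer just described reduces, in Keras, to copying one model's trained parameters into an identically structured model. The two-layer architecture below is a deliberately tiny placeholder, not the paper's decryptor:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_decryptor(shape=(28, 28, 1)):
    # Every model shares the same architecture; only trained weights differ.
    return models.Sequential([
        layers.Input(shape),
        layers.Conv2D(8, (3, 3), padding="same", activation="linear"),
        layers.Conv2D(1, (3, 3), padding="same", activation="linear"),
    ])

mnist_model = build_decryptor()    # assume: trained on ciphered Mnist
fashion_model = build_decryptor()  # target: ciphered Fashion Mnist

# Transfer: reuse the Mnist weights and biases without any retraining.
fashion_model.set_weights(mnist_model.get_weights())
```

Because the two data sets are encrypted with the same permutation patterns, the transferred parameters already encode the inverse mapping, which is why evaluation converges without further training.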

The most remarkable conclusion is that, without any training, the assessment process converges toward desirable findings and the error function is reduced.

It should be highlighted that, by combining transfer learning with optimal prior cryptographic competence, it is also possible to develop acceptable decryptors from the ground up utilizing the transfer learning techniques described in this paper. The results obtained are presented in Tables 9, 10, and 11, respectively. Furthermore, the reusability of transfer learning to improve model performance is the best proof of the concept of distinguishability emphasized in this research.

sets. It has a precision of 98.05% for Fashion Mnist and 99.00% for Mnist, and its design is fairly simple. It is also suitable for implementation as an experimental investigation. Figure 16 represents the architecture of this pre-trained model. We examined the model's prediction performance on the original MNIST and Fashion MNIST test sets first and then used it to monitor the effectiveness of our predicted encrypted images. Figures 17 and 18 demonstrate the visual results of the quantitative prediction analysis.

The first factor we observed was that as the number of rounds increased, the effectiveness of the deep learning attack decreased. But this degradation is relative to several parameters, and it differs from one permutation alternative to another. Discrete chaos permutation patterns, for example, are more resistant to attacks than continuous chaotic ones, and the coupled map lattice is more secure and robust than the Gray-code-based permutation technique.

As a result, we find that discrete chaos is more resistant to our attack when the number of rounds increases, followed by some slight resistance from continuous chaos; the scientific interpretation of this resistance is the discrete generation of permutation patterns, which makes the attack more difficult by more efficiently destroying the correlation between the swappable atoms.

It should also be highlighted that our work is sensitive to the kind of data. For example, if we use an image in which all pixels are equal to zero, or to any other single value between zero and 255, the model cannot learn anything and the loss function cannot improve, which reflects the limitation of differential cryptanalysis in the case of zero difference.

weakness: the uniform distribution of colors in KPA pairs. Among the restrictions, we may mention that all of these studies are based on the black-box attack with classical research approaches and various optimization methods, and none of these works discussed the scenario of applying many rounds of permutation. Because of the unavailability of a uniform distribution in real data, their reuse in a process of evaluating the permutation approach remains a concern in real-world practical scenarios. Table 12 provides a comparison between all of these works. In our contribution, we examined the problem from a different perspective than that taken by the previous research. As the main viewpoint, we considered the absence of uniform color distribution as a focus point. We also discussed this topic in terms of the number of rounds and key generation strategies. We consider that our technique has the first advantage of being easily reusable.
This technique can 1167 be used to test the strength of image encryption algorithms 1168 during the deployment phase or to select the best permutation 1169 strategy during the development phase.   which are used to build deep learning models with different 1192 more suitable layers. to the needs of the designer, as long as 1193 the hyper-parameters are preserved.

Despite that, our approach necessitates dataset preprocessing procedures and a large amount of computational resources, as well as computation time and a vast number of experiments in the search for the desired model, highlighting the technique's limitations.

In this research, our findings provide an innovative methodology for leveraging deep learning to identify decryptors for symmetric permutation primitives. Our approaches are applicable to any number of (non-zero) input variations. At its heart, our method adopts frequent dissimilarities to solve the challenge of discriminating in two-dimensional space.

The presented research is intended to be used separately from the operational mode of cryptography implementations.
cipher architecture, it can be used to examine the strongest permutation mechanism to be used.

Otherwise, it can be implemented to assess and compare different permutation pattern algorithms under a scientific hypothesis. However, the time required to compute those assessments is a significant factor influencing their utility. We do not claim that deep learning tools will eventually replace classical cryptanalysis. However, we believe that our findings demonstrate that deep learning models can be trained to perform cryptanalysis at a level that is attractive to cryptographers, and that deep learning approaches can be a helpful addition to the arsenal of cryptographic assessors.