Bayesian Modeling for Differential Cryptanalysis of Block Ciphers: A DES Instance

Encryption algorithms based on block ciphers are among the most widely adopted solutions for providing information security. Over the years, a variety of methods have been proposed to evaluate the robustness of these algorithms to different types of security attacks. One of the most effective analysis techniques is differential cryptanalysis, whose aim is to study how variations in the input propagate to the output. In this work we address the modeling of differential attacks on block cipher algorithms by defining a Bayesian framework that allows a probabilistic estimation of the secret key. In order to prove the validity of the proposed approach, we present as a case study a differential attack on the Data Encryption Standard (DES) which, despite being one of the most thoroughly analyzed ciphers, is still of great interest to the scientific community since its vulnerabilities may have implications for other ciphers.


I. INTRODUCTION
Among the many different encryption methods adopted by modern systems, algorithms operating on fixed-length blocks of bits are still among the most popular. The strength of these methods is constantly being studied by means of approaches that aim to assess their robustness to specific attacks, or the presence of vulnerabilities to generic threats. In this context, differential cryptanalysis is one of the most effective and relevant approaches. The idea at the basis of differential cryptanalysis is to evaluate how any change in the plaintext impacts the ciphertext. Then, the results of the analysis can be used to estimate the set of the most probable keys.
In this paper we present a Bayesian framework for modelling differential attacks on block cipher algorithms; in particular, given the importance of the Data Encryption Standard (DES) in the design of many block cipher algorithms, a case study focused on the cryptanalysis of DES is addressed.
The Data Encryption Standard (DES) [1] was the first symmetric cipher heavily adopted all over the world, and it was the most used cipher up to the beginning of the 2000s. Deep analyses of DES led to the definition of several cryptanalysis techniques, and many results achieved for DES are also valid for the wider class of block ciphers.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiali Hei.
Today, the limited size of the secret key adopted by DES (56 bits) and the computational power of modern computers entail that DES is not considered secure for ciphering sensitive data. Nevertheless, DES is still widely adopted in various scenarios, such as those characterized by low security requirements, those in which resource-constrained devices are required to implement security mechanisms, or those in which huge amounts of data have to be protected. The authors of [2], for instance, propose the adoption of DES to ensure privacy in a graduate project management system. Similarly, the need to protect a large amount of data while keeping the computational costs low moved the authors of [3] to choose DES for data encryption in an ERP. DES is often exploited to protect data exchanged between Internet of Things devices, which are characterized by severe resource constraints [4], [5], [6]. DES could also be employed as a tool for providing companies with proper data protection policies that represent a fair trade-off between security goals and computational costs [7]. Moreover, DES is a building block of Triple DES [8], [9], a solution adopted to overcome the limitations imposed by the DES key size. Thus, it is interesting to investigate the vulnerabilities of DES also for possible implications on other block ciphers. Several works in the scientific literature identified and analyzed some of the main vulnerabilities of DES, through the definition of new cryptanalysis techniques. One of these approaches is differential cryptanalysis [10], a chosen plaintext attack designed for iterated cryptosystems, which analyzes how the difference between two plaintexts propagates to the resulting ciphertexts when using the same key.
VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Differential cryptanalysis focuses on the S-Box, the only non-linear component of DES, and makes it possible to reduce the computational cost in comparison with an exhaustive key search. Differential cryptanalysis has also been adopted to attack other symmetric ciphers, such as AES [11]. Moreover, several machine learning approaches have been adopted in recent years to improve differential cryptanalysis, or to provide a new perspective on it. The authors of [12], for instance, proposed the adoption of neural networks to attack DES, and evaluated the performance by using different network structures. In [13], several metaheuristics, such as genetic algorithms and simulated annealing, are exploited to formulate a differential attack on DES. The experiments performed on DES reduced to six rounds demonstrate the suitability of the approach. The authors of [14] and [15] relied on deep neural networks to design a differential distinguisher to attack different block ciphers based on the Feistel network.
We propose an original formalization of differential cryptanalysis based on the adoption of Bayesian Networks (BN), a probabilistic graphical model framework that uses Bayesian inference to perform probability computations. We aim to describe the statistical behavior of S-Boxes when a pair of plaintexts, with a given difference, is provided for ciphering. The diagnostic inference enabled by BNs allows a probabilistic estimation of the secret key, by considering the difference between plaintexts and the difference between the corresponding ciphertexts. Such formalization, preliminarily described in [16], eases the definition of an algorithm for attacking DES based on differential cryptanalysis.
The paper is organized as follows. In Section II, a brief description of DES is provided, in order to introduce the adopted notation. Section III describes some related works presented in the literature. In Section IV, the original formulation of differential cryptanalysis is introduced. Section V describes the proposed Bayesian model of the DES differential cryptanalysis. Section VI reports the performance evaluation. Finally, Section VII states our conclusions.

II. THE ADOPTED DES NOTATION
DES is a symmetric cipher which transforms a 64-bit plaintext $P$ into a 64-bit ciphertext $T$. Such mapping is parameterized by a 64-bit key, reduced to a 56-bit key because of the use of 8 parity bits. It is an iterated cipher based on the Feistel scheme, which processes the plaintext through a series of transformations named rounds, as shown in Fig. 1. The encryption process consists of 16 rounds, which are preceded by an initial permutation and followed by the corresponding inverse permutation. Each round is parameterized by a 48-bit subkey $S_{KX} \in \mathbb{Z}_2^{48}$, depending on the round $X$ and the initial key $K$.
At each round $X$, the 64-bit input is divided into two parts, left and right, which are processed separately. The right part becomes the left one of the next round without any further processing. Both halves are processed according to the Feistel scheme, in order to produce the right part of the next round, as shown in Fig. 2. In particular, for each round $X = 2, \ldots, 16$, the following equations hold:

$$L_X = R_{X-1}, \qquad R_X = L_{X-1} \oplus F(R_{X-1}, S_{K(X-1)}) \tag{1}$$

where $L_X$ and $R_X$ denote the left and right halves of the input to round $X$, and $F$, named Feistel function, determines the non-linear behavior of DES. The Feistel function implemented by DES (see Fig. 3) is defined as follows:

$$Y_X = F(Z_X, S_{KX}) = P(S(E(Z_X) \oplus S_{KX})) \tag{2}$$

where $Z_X$ and $Y_X$ are the 32-bit input and output of the function at round $X$, and $E$, $S$ and $P$ represent respectively the expansion function, the substitution performed by the S-box and the permutation function.
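The round structure described above can be illustrated with a minimal Python sketch. The round function `toy_f`, the 8-bit block size and the subkey values are illustrative assumptions and are not the DES components; the sketch only shows that a Feistel network is inverted by replaying the rounds with the subkeys in reverse order, without inverting the round function itself.

```python
# Toy Feistel network on 8-bit blocks split into two 4-bit halves.
# toy_f is a hypothetical stand-in for the DES Feistel function F.

def toy_f(z, subkey):
    """Arbitrary round function on 4-bit values (assumption, not DES)."""
    return ((z ^ subkey) * 7 + 3) % 16

def feistel_round(left, right, subkey):
    """One round: the right half moves left, the new right half is L xor F(R, K)."""
    return right, left ^ toy_f(right, subkey)

def encrypt(left, right, subkeys):
    """Apply one Feistel round per subkey."""
    for k in subkeys:
        left, right = feistel_round(left, right, k)
    return left, right

def decrypt(left, right, subkeys):
    """Undo the rounds in reverse order; toy_f is only ever evaluated forwards."""
    for k in reversed(subkeys):
        left, right = right ^ toy_f(left, k), left
    return left, right

if __name__ == "__main__":
    keys = [3, 9, 12, 5]
    cipher = encrypt(0x5, 0xA, keys)
    print(cipher, decrypt(*cipher, keys))
```

The update performed in `decrypt` is the same step used later to peel off one round once its subkey has been estimated.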
Since the only non-linear component of the DES F-function is the S-box, it constitutes the main contributor to DES security. One of the properties of the S-Box is the uniform distribution of the probability of producing a given output. Nevertheless, the authors of [10] have shown that, given two different inputs to an S-Box with some known difference, the probability distribution of the difference between the corresponding outputs is not uniform. Differential cryptanalysis [10] exploits such a vulnerability in order to reduce the computational effort for determining the secret key, and the same idea underlies the approach described in the present work.
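Both properties can be checked directly by enumerating a single S-box. The following Python sketch uses the standard DES table S1; the function names are illustrative choices and not part of the original work. It counts how often each output value occurs and builds the table of output differences for every input difference.

```python
from collections import Counter

# DES S-box S1: the row is selected by the first and last input bits,
# the column by the four middle bits.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """Apply S1 to a 6-bit input and return the 4-bit output."""
    row = ((x >> 4) & 0b10) | (x & 1)
    col = (x >> 1) & 0b1111
    return S1[row][col]

def difference_table():
    """ddt[dx][dy] = number of inputs x with S1(x) xor S1(x xor dx) == dy."""
    ddt = [[0] * 16 for _ in range(64)]
    for dx in range(64):
        for x in range(64):
            ddt[dx][sbox1(x) ^ sbox1(x ^ dx)] += 1
    return ddt

# Each 4-bit output value is produced by the same number of inputs.
output_counts = Counter(sbox1(x) for x in range(64))

# Output differences, in contrast, are far from equally likely.
ddt = difference_table()
dx = 0x34
for dy, count in enumerate(ddt[dx]):
    if count:
        print(f"input difference {dx:#04x} -> output difference {dy:#03x}: {count}/64")
```

A uniform distribution would place 4 of the 64 inputs on each of the 16 output differences; the printed row shows that some differences never occur while others are much more frequent.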

III. RELATED WORK
As previously mentioned, a deeper comprehension of S-Box behavior could make the whole cipher more vulnerable. Many works in the literature analyze properties of S-Boxes in order to find DES vulnerabilities and to define design criteria for strong block ciphers. The authors of [17] analyze properties of S-Boxes with respect to the statistical distributions of the produced output and the statistical dependence of output bits given the knowledge of one or more input bits. In [18] some general criteria to design S-Boxes are discussed. The authors analyzed both static and dynamic properties. Static properties impose that partial information about input and output does not reduce the uncertainty of unknown input or output, and guarantee the maximum output uncertainty. Dynamic properties impose that partial information about changes in input and output does not reduce the uncertainty of unknown inputs or outputs. The authors stated that the uncertainty should not be reduced when the attacker has information about the past history of S-Box processing. Other studies indicate that the latest approaches [19], [20], [21], also known as strong S-Boxes, are vulnerable due to the presence of fixed points or reverse fixed points, which can be an exploitable weakness in cryptography. The authors of [22], for example, address the exploitable weakness of fixed points and reverse fixed points contained in many S-Boxes. Then, they design an S-Box construction algorithm based on ICQM that eliminates the weakness through backtracking.
On the basis of the properties discussed so far, many cryptanalysis methods were proposed in the literature to attack S-Boxes. An algebraic approach is proposed in [23], which defines the criteria to determine the set of non-linear algebraic constraints describing the I/O relationship of S-Boxes. Exploiting this set of constraints, the whole cipher is described as a system of multivariate non-linear equations that can be solved through the algorithm proposed in [24]. It should be noted that the equations representing S-Boxes are exact, i.e., not approximated. On the contrary, the author of [25] proposed a linear approximation of S-boxes and DES, which is valid with some probability. This method is an example of a stochastic attack.
Instead of focusing on the behavior of a single S-Box, the authors of [26] focus on the probabilistic behavior of pairs of adjacent S-Boxes. They found that the input bits of two adjacent S-Boxes are strictly related by some bits of the key, due to the expansion phase. Thus, the probability distribution of the output of these two adjacent S-Boxes, conditioned on key bits, is not uniform. On the basis of such vulnerability, the authors proposed an attack with computational complexity comparable to the exhaustive key search.
The authors of [10], who proposed differential cryptanalysis, studied how input differences affect the resulting output difference. Their attack traces differences through the transformations, discovers where the cipher exhibits nonrandom behavior, and exploits such properties to recover the secret key. Another interesting work discussing differential cryptanalysis is presented in [27]. Here, the authors study the propagation of differences from round to round to find specific differences which propagate with relatively high probability. The cryptanalysis technique is applied to DES reduced to $i$ rounds, with $i \in [3, 8]$; for each variant, wrong and right pairs are distinguished in order to obtain the relevant key bits and retrieve the secret key.

IV. ORIGINAL FORMULATION OF DIFFERENTIAL CRYPTANALYSIS
The vulnerability at the basis of differential cryptanalysis [10] originates from the non-uniform distribution of the difference between two outputs, given the difference between two inputs, for different keys. Nowadays, it is a technique adopted to breach many reduced-round block ciphers, such as SPECK [28], LEA [29], GIFT [30], and Midori64 [31].
FIGURE 4. Notation adopted for describing differential cryptanalysis.
This section summarizes the original formulation of the differential attack, with the notation shown in Fig. 4.
Let $S_{EX}$ and $S^*_{EX}$ be two outputs from the expansion function at round $X$, $S_{IX}$ and $S^*_{IX}$ the corresponding two inputs to the S-Box $S(\cdot)$, and $S_{OX} = S(S_{IX})$ and $S^*_{OX} = S(S^*_{IX})$ the resulting outputs from the S-Box. The differences between S-Box inputs and outputs are obtained through the bitwise XOR and are indicated as follows:

$$S'_{IX} = S_{IX} \oplus S^*_{IX}, \qquad S'_{OX} = S_{OX} \oplus S^*_{OX} \tag{3}$$

The vulnerability exploited by the differential attack is that the probability distribution of the difference between two outputs, conditioned by the difference of the two corresponding inputs, i.e., $p(S'_{OX} \mid S'_{IX})$, is not uniform. This characteristic makes S-Boxes weak from a dynamic point of view, according to the analysis proposed in [18].
Let us consider $N$ pairs of outputs from the expansion function characterized by the same difference. As shown in Fig. 4, the relationship between each pair of outputs from the expansion function, the subkey, and the S-Box inputs is expressed by the following equations:

$$S_{IX} = S_{EX} \oplus S_{KX}, \qquad S^*_{IX} = S^*_{EX} \oplus S_{KX} \tag{4}$$

Consequently, since the subkey cancels out in the XOR, the difference between $S_{IX}$ and $S^*_{IX}$ is equal to the difference between $S_{EX}$ and $S^*_{EX}$:

$$S'_{IX} = (S_{EX} \oplus S_{KX}) \oplus (S^*_{EX} \oplus S_{KX}) = S_{EX} \oplus S^*_{EX} \tag{5}$$

Thus, given the knowledge of the expanded pairs $(S_{EX}, S^*_{EX})$, the difference between the S-Box inputs, i.e., $S'_{IX}$, is also known, without knowing the separate values. This knowledge does not allow one to predict the difference between the S-Box outputs. Indeed, due to the non-linear behavior of S-Boxes, it is not guaranteed that two input pairs with the same difference produce the same output difference; on the contrary, many values for the output difference are possible.
The critical point is that only some output differences are possible starting from a given input difference, and the probability distribution of these values is not uniform. For each pair $(S_{EX}, S^*_{EX})$, it is possible to observe the corresponding output pair $(S_{OX}, S^*_{OX})$, and to compute the differences between inputs and between outputs, i.e., $S'_{IX}$ and $S'_{OX}$, according to Eq. 3 and 4. Moreover, it is possible to select the set of possible keys which can produce the observed differences, by exploiting the equation $S_{KX} = S_{EX} \oplus S_{IX}$. Thus, each pair $(S_{EX}, S^*_{EX})$ produces a set of candidate keys, and the true secret key belongs to the intersection of these sets. Consequently, it is necessary to repeat this evaluation until such intersection is a singleton.
The logic behind the differential cryptanalysis attack can be described through the following simplified pseudocode:
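A minimal Python sketch of this logic, restricted to one 6-bit key block and the DES table S1, is given below. It is an illustration under stated assumptions rather than the original listing: `observe` is a hypothetical oracle returning the output difference for a chosen pair, and the pairs are drawn with varying differences, because for a fixed pair the key blocks k and k xor (input difference) are always both consistent.

```python
import random

S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """Apply S1 to a 6-bit input and return the 4-bit output."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0b1111]

def candidate_keys(s_ex, s_ex_star, s_ox_diff):
    """Key blocks that reproduce the observed output difference for one pair."""
    return {
        k for k in range(64)
        if sbox1(s_ex ^ k) ^ sbox1(s_ex_star ^ k) == s_ox_diff
    }

def recover_key_block(pairs, observe):
    """Intersect the candidate sets until a single key block survives."""
    candidates = set(range(64))
    used = 0
    for s_ex, s_ex_star in pairs:
        used += 1
        candidates &= candidate_keys(s_ex, s_ex_star, observe(s_ex, s_ex_star))
        if len(candidates) == 1:
            break
    return candidates, used

if __name__ == "__main__":
    secret = 0b101101  # hypothetical key block, unknown to the attacker

    def observe(a, b):
        # Stand-in for the chosen plaintext oracle.
        return sbox1(a ^ secret) ^ sbox1(b ^ secret)

    all_pairs = [(a, b) for a in range(64) for b in range(64) if a != b]
    random.Random(0).shuffle(all_pairs)
    print(recover_key_block(all_pairs, observe))
```

Since the true key block is consistent with every observation, it is never removed from the intersection; the loop stops as soon as all other candidates have been discarded.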

V. BAYESIAN NETWORKS MODELS
Bayesian networks (BN) [32] are a graph-based formalism capable of expressing probabilistic cause/effect relationships between random variables. Such a framework is adopted in machine learning for performing probabilistic inference.
In this work, we model through BNs the statistical dependence, driven by the secret key, between input differences and output differences, as found in [10], and we exploit it to determine the secret key. In the graphical model adopted by BNs, nodes represent random variables and directed links represent the cause/effect dependence between two nodes. BNs make it possible to represent the joint probability distribution of several variables through a set of conditioned probability distributions, each associated with a node and its parents, and a set of a priori probability distributions, for nodes without antecedents.
In this section, we present the BNs which model a single S-Box, the Feistel function and the whole DES; then we present the algorithms for attacking such elements through exact inference, and analyze their computational complexity. We show that the exact inference for attacking the whole DES has a high computational cost, and consequently we propose an algorithm based on approximate inference.

A. SINGLE S-BOX ATTACK
For the construction of the BN for attacking the S-Box, the original notation reported in [10] is adopted.
It is useful to recall that the S-Box consists of a set of eight S-Boxes, indicated as $Si(\cdot)$ with $i = 1, \ldots, 8$, each of which accepts 6 bits as input and produces 4 bits as output. So, the 48-bit input to the S-Box can be considered as divided into eight 6-bit blocks. According to the adopted notation, $(Si_{EX}, Si^*_{EX})$ indicate the two $i$-th 6-bit blocks of two different outputs from the expansion function, $Si_{KX}$ indicates the $i$-th 6-bit block of the subkey, $(Si_{IX}, Si^*_{IX})$ represent the two inputs to the $i$-th S-Box $Si(\cdot)$, $Si'_{IX}$ represents the difference between the two inputs to the $i$-th S-Box, and finally $Si'_{OX}$ indicates the difference between the two 4-bit outputs from the $i$-th S-Box.
Probabilistic inference exploits the known value of some random variables, named evidence, and infers the probability distribution of a set of unknown random variables, named target nodes.
Since differential cryptanalysis exploits a chosen plaintext attack, i.e., a circumstance where the adversary is able to trigger the encryption of arbitrary messages and to observe the corresponding plaintext-ciphertext pairs, the set $(Si_{EX}, Si^*_{EX}, Si'_{IX}, Si'_{OX})$ constitutes the evidence and the key blocks $Si_{KX}$ represent the target nodes.
In order to build the BN we complied with the following assumptions:
• The $Si_{KX}$, $Si_{EX}$ and $Si^*_{EX}$ variables are not influenced by other random variables, thus they are represented as nodes without antecedents; their a priori probability distribution is considered as uniform.
• The input to the $i$-th S-Box, $Si_{IX}$, depends only on $Si_{EX}$ and $Si_{KX}$, according to Eq. 4, which are the sole parents of the $Si_{IX}$ node (analogously for $Si^*_{IX}$).
• The output difference $Si'_{OX}$ depends only on the two S-Box inputs, according to Eq. 3; thus the $Si_{IX}$ and $Si^*_{IX}$ nodes are the only parents of the $Si'_{OX}$ node (analogously for the input difference $Si'_{IX}$).
The resulting BN, named SBox-BN, is shown in Fig. 5. The full definition of the BN requires the formalization of (i) the a priori probability distributions for nodes without parents and (ii) the conditioned probability distributions for the other nodes.
Let us represent as $\delta_n(X)$ the Kronecker delta applied to an $n$-bit string, taking value one if and only if all bits of its argument are equal to zero. Then, the probability distributions of the SBox-BN are expressed as follows:
• $p(Si_{EX} = si_{ex}) = p(Si^*_{EX} = si^*_{ex}) = p(Si_{KX} = si_{kx}) = \frac{1}{2^6}$, $\forall si_{ex}, si^*_{ex}, si_{kx} \in \mathbb{Z}_2^6$, because of the hypothesis of uniform distribution;
• $p(Si_{IX} = si_{ix} \mid Si_{EX} = si_{ex}, Si_{KX} = si_{kx}) = \delta_6(si_{ix} \oplus si_{ex} \oplus si_{kx})$, $\forall si_{ix}, si_{ex}, si_{kx} \in \mathbb{Z}_2^6$, because of Eq. 4 (analogously for $Si^*_{IX}$);
• $p(Si'_{OX} = si'_{ox} \mid Si_{IX} = si_{ix}, Si^*_{IX} = si^*_{ix}) = \delta_4(si'_{ox} \oplus Si(si_{ix}) \oplus Si(si^*_{ix}))$, $\forall si'_{ox} \in \mathbb{Z}_2^4$ and $si_{ix}, si^*_{ix} \in \mathbb{Z}_2^6$, because of Eq. 3.
The flow of the probability distributions through the Bayesian Network depicted in Fig. 5 is summarized in Fig. 6, where the three plots show the most significant distributions within the SBox-BN. The probabilities of all nodes at level 0, e.g., $Si_{EX}$, are uniformly distributed (see the plot in the upper left corner); that is, all outcomes are equally likely with a probability of $1/2^6$. The two nodes at level 1, as well as their child $Si'_{IX}$, are characterized by a distribution in which the probability of most configurations is zero, while the remaining possible hypotheses have constant probability values. The 3D plots in Fig. 6 represent this probability; the axes refer to the variables involved in the probability distribution equation, while the points indicate where the probability assumes nonzero values.
By observing the plot for $Si'_{IX}$, it can be noticed that only a subset of keys, characterized by an extremely regular pattern, is retained over all possible combinations. The bottom right plot shows the probability distribution of the node $Si'_{OX}$, which is characterized by the lack of a regular pattern because of the non-linearity introduced by the S-Box.
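Under such BN model, given the two outputs $si_{ex}$ and $si^*_{ex}$ from the expansion function and the corresponding output difference $si'_{ox}$ from the S-Box at round $X$, the most probable secret key corresponds to the greatest conditioned probability among keys that produce $si'_{ix} = si_{ex} \oplus si^*_{ex}$ as input difference and $si'_{ox}$ as output difference, as follows:

$$\hat{si}_{kx} = \arg\max_{si_{kx}} \; p(Si_{KX} = si_{kx} \mid si_{ex}, si^*_{ex}, si'_{ix}, si'_{ox}) \tag{7}$$

By applying the rules for manipulating probability expressions in BNs, it is possible to obtain the explicit formulation of such conditioned probability:

$$p(Si_{KX} = si_{kx} \mid si_{ex}, si^*_{ex}, si'_{ix}, si'_{ox}) = \eta_1 \sum_{(si_{ix}, si^*_{ix}) \in \sigma_1} p(si'_{ox} \mid si_{ix}, si^*_{ix}) \tag{8}$$

where $\eta_1$ is a normalization factor which makes the sum of all terms of the probability distribution equal to 1, and $\sigma_1$ is the set of all $(Si_{IX}, Si^*_{IX})$ pairs obtained through the XOR of the possible secret key with the given input evidence:

$$\sigma_1 = \{ (si_{ix}, si^*_{ix}) : si_{ix} = si_{ex} \oplus si_{kx}, \; si^*_{ix} = si^*_{ex} \oplus si_{kx} \} \tag{9}$$

It is worth noting that, since $Si_{IX}$ and $Si^*_{IX}$ are restricted to a single value, the sum in Eq. 8 corresponds to a single value, as expressed by the following equation:

$$p(Si_{KX} = si_{kx} \mid si_{ex}, si^*_{ex}, si'_{ix}, si'_{ox}) = \eta_1 \, \delta_4(si'_{ox} \oplus Si(si_{ex} \oplus si_{kx}) \oplus Si(si^*_{ex} \oplus si_{kx})) \tag{10}$$

For the sake of brevity, we omit the detailed proof, which can nevertheless be found in [33].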
In order to narrow down the set of possible keys, it is possible to evaluate a non-normalized version of Eq. 10, by ignoring the normalizing factor $\eta_1$. Indeed, the probability distributions describing the SBox-BN are expressed through the Kronecker delta; thus, Eq. 10 can provide only two values: 0 for all keys that have been excluded, and a constant value $\eta_1$ for all keys that are still possible. Such a value can be determined by imposing that the sum of all the residual probabilities is equal to 1. However, since the purpose of Eq. 10 is merely to identify the residual set of keys, computing a specific value for $\eta_1$ is irrelevant.

Algorithm 1 - prob_key_SBox_attack - Algorithm for computing the probability that a key block is correct by attacking the i-th S-Box
Data: i: the index of the selected S-Box; $\Psi$: a set of multiple evidences $\psi = (si_{ex}, si^*_{ex}, si'_{ix}, si'_{ox})$;
Result: p: the array of $2^6$ values, representing the non-normalized probability distribution over the set of possible key blocks.
begin
    p ← new array[$2^6$];
    for $k_{ix}$ = 0 : ...
The sets of possible keys obtained by attacking an S-Box with two different evidence sets may be different. Since the true secret key belongs to each of these sets, their intersection is never empty. With a sufficient quantity of data, by performing multiple attacks with different evidences, the repeated intersection of the obtained key sets produces the singleton containing only the secret key.
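The assumption of independence of the evidence sets makes it possible to express the probability distribution of the secret key conditioned by all the evidence sets as the product of the probabilities conditioned by each single evidence set:

$$p(Si_{KX} = si_{kx} \mid \Psi) = \eta_2 \prod_{j=1}^{|\Psi|} p(Si_{KX} = si_{kx} \mid \psi_j) \tag{11}$$

where $\eta_2$ is a normalization factor and $\Psi$ is the set of multiple evidences:

$$\Psi = \{ \psi_j = (si_{ex}^{j}, si_{ex}^{*j}, si_{ix}^{\prime j}, si_{ox}^{\prime j}), \; j = 1, \ldots, |\Psi| \} \tag{12}$$

In order to find the most probable key, it is possible to evaluate the non-normalized version of Eq. 11 and Eq. 8, as described by Algorithm 1.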
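The product over evidence sets can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function names are chosen here, only the DES table S1 is used, and each evidence is assumed to be a tuple (si_ex, si*_ex, si'_ix, si'_ox) of integers.

```python
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """Apply S1 to a 6-bit input and return the 4-bit output."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0b1111]

def delta(value):
    """Kronecker delta on a bit string: 1 if every bit is zero, 0 otherwise."""
    return 1 if value == 0 else 0

def prob_key_sbox_attack(evidences):
    """Non-normalized distribution over the 64 key blocks.

    For each candidate block the contributions of all evidences are multiplied,
    so a single inconsistent evidence drives the entry to zero.
    """
    p = [1] * 64
    for k in range(64):
        for s_ex, s_ex_star, s_ix_diff, s_ox_diff in evidences:
            s_ix, s_ix_star = s_ex ^ k, s_ex_star ^ k  # S-box inputs under key k
            p[k] *= delta(s_ix_diff ^ s_ix ^ s_ix_star)
            p[k] *= delta(s_ox_diff ^ sbox1(s_ix) ^ sbox1(s_ix_star))
    return p

def normalize(p):
    """Rescale so that the entries sum to one (the role of the eta factors)."""
    total = sum(p)
    return [v / total for v in p]

if __name__ == "__main__":
    secret = 0b010011  # hypothetical key block used to generate the evidence
    evidence = [
        (a, b, a ^ b, sbox1(a ^ secret) ^ sbox1(b ^ secret))
        for a, b in [(7, 42), (13, 50), (1, 62)]
    ]
    posterior = normalize(prob_key_sbox_attack(evidence))
    print([k for k, v in enumerate(posterior) if v > 0])
```

Because every factor is a Kronecker delta, the normalized result is uniform over the surviving key blocks, which is why the normalization can be skipped when only the set of surviving keys is of interest.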

B. FEISTEL FUNCTION ATTACK
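The same approach of the previous section can be generalized in order to attack a single instance of the Feistel function, by analyzing its output (named $Y_X$ in Fig. 3). Under the hypothesis of chosen plaintext attack, it is possible to select a pair of inputs to the Feistel function, $Z_X, Z^*_X \in \mathbb{Z}_2^{32}$, and then observe the difference between the corresponding outputs, $Y'_X$, obtained according to the following equation:

$$Y'_X = P(S(E(Z_X) \oplus S_{KX})) \oplus P(S(E(Z^*_X) \oplus S_{KX})) \tag{13}$$

where $P$, $S$ and $E$ are respectively the permutation function, the substitution performed by the S-box, and the expansion function. By observing that $S_{IX} = E(Z_X) \oplus S_{KX}$ and $S^*_{IX} = E(Z^*_X) \oplus S_{KX}$, and by exploiting the linearity property of the permutation function, it is possible to obtain the following system of equations:

$$\begin{cases} Z'_X = Z_X \oplus Z^*_X \\ S_{IX} = E(Z_X) \oplus S_{KX} \\ S^*_{IX} = E(Z^*_X) \oplus S_{KX} \\ Y'_X = P(S(S_{IX}) \oplus S(S^*_{IX})) \end{cases} \tag{14}$$

Each variable in such system can be considered as a random variable, and their relationships can be represented through the BN shown in Fig. 7, named Feistel-BN. Its probability distributions are expressed as follows:
• $p(Z_X = z_x) = p(Z^*_X = z^*_x) = \frac{1}{2^{32}}$, $\forall z_x, z^*_x \in \mathbb{Z}_2^{32}$, because of the hypothesis of uniform distribution;
• $p(S_{KX} = s_{kx}) = \frac{1}{2^{48}}$, $\forall s_{kx} \in \mathbb{Z}_2^{48}$, because of the hypothesis of uniform distribution;
• $p(Z'_X = z'_x \mid Z_X = z_x, Z^*_X = z^*_x) = \delta_{32}(z'_x \oplus z_x \oplus z^*_x)$, $\forall z_x, z^*_x, z'_x \in \mathbb{Z}_2^{32}$, because of the first part of Eq. 14;
• $p(S_{IX} = s_{ix} \mid Z_X = z_x, S_{KX} = s_{kx}) = \delta_{48}(s_{ix} \oplus E(z_x) \oplus s_{kx})$, $\forall z_x \in \mathbb{Z}_2^{32}$ and $\forall s_{ix}, s_{kx} \in \mathbb{Z}_2^{48}$, because of the second part of Eq. 14;
• $p(S^*_{IX} = s^*_{ix} \mid Z^*_X = z^*_x, S_{KX} = s_{kx}) = \delta_{48}(s^*_{ix} \oplus E(z^*_x) \oplus s_{kx})$, $\forall z^*_x \in \mathbb{Z}_2^{32}$ and $\forall s^*_{ix}, s_{kx} \in \mathbb{Z}_2^{48}$, because of the third part of Eq. 14;
• $p(Y'_X = y'_x \mid S_{IX} = s_{ix}, S^*_{IX} = s^*_{ix}) = \delta_{32}(y'_x \oplus P(S(s_{ix}) \oplus S(s^*_{ix})))$, $\forall s_{ix}, s^*_{ix} \in \mathbb{Z}_2^{48}$ and $\forall y'_x \in \mathbb{Z}_2^{32}$, because of the fourth part of Eq. 14.
The goal of the attack on the Feistel function is to find the most probable set of keys, given the known evidence, obtained by maximizing the following likelihood:

$$p(S_{KX} = s_{kx} \mid Z_X = z_x, Z^*_X = z^*_x, Y'_X = y'_x) \tag{15}$$

Although the construction of the probability distribution over a 48-bit key, by expanding Eq. 15, requires $2^{48}$ steps, it is possible to reduce the computational complexity by exploiting the linearity of the $P(\cdot)$ and $E(\cdot)$ functions.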
Let us recall that the XOR between the output of the expansion function, $E(\cdot)$, and the secret key is the concatenation of the inputs to the eight S-boxes, and that the input to the permutation function $P(\cdot)$ is the concatenation of the outputs from the eight S-boxes, as expressed by the following equations:

$$E(Z_X) \oplus S_{KX} = S1_{IX} \,\|\, S2_{IX} \,\|\, \cdots \,\|\, S8_{IX}, \qquad P^{-1}(Y_X) = S1_{OX} \,\|\, S2_{OX} \,\|\, \cdots \,\|\, S8_{OX} \tag{16}$$

Then, the Feistel function can be broken by attacking each single S-Box and then obtaining the full 48-bit key by concatenating the partial results:

$$S_{KX} = S1_{KX} \,\|\, S2_{KX} \,\|\, \cdots \,\|\, S8_{KX} \tag{17}$$

Thus, the actual computational cost for attacking the whole Feistel function is eight times the cost for attacking a single S-Box, since the following equation holds:

$$p(S_{KX} = s_{kx} \mid z_x, z^*_x, y'_x) = \prod_{i=1}^{8} p(Si_{KX} = si_{kx} \mid si_{ex}, si^*_{ex}, si'_{ix}, si'_{ox}) \tag{18}$$

Algorithm 2 describes how to perform the attack. Its computational cost is dominated by the evaluation of the probability distribution for key blocks. Namely, the other components, i.e., the separation of the evidences in blocks and the composition of the whole probability distribution, may be easily optimized, although in the pseudocode they are described in an extended form for the sake of readability.
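The block-wise decomposition can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not of the original algorithm: the table S1 is reused in all eight positions (real DES uses eight distinct tables), the evidence is assumed to be already expressed as the expanded inputs and as the output difference taken before the permutation, and the helper names are illustrative.

```python
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """Apply S1 to a 6-bit input and return the 4-bit output."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0b1111]

def split_blocks(value, width, count=8):
    """Split an integer into `count` blocks of `width` bits, most significant first."""
    mask = (1 << width) - 1
    return [(value >> (width * (count - 1 - i))) & mask for i in range(count)]

def join_blocks(blocks, width):
    """Concatenate blocks of `width` bits into a single integer."""
    value = 0
    for block in blocks:
        value = (value << width) | block
    return value

def attack_feistel(evidences, sboxes):
    """Attack each S-box independently and concatenate the best key blocks.

    evidences: triples (expanded input, expanded starred input, output difference
    before the permutation), given as 48-, 48- and 32-bit integers.
    """
    key_blocks, distributions = [], []
    for i, sbox in enumerate(sboxes):
        block_evidence = [
            (split_blocks(e, 6)[i], split_blocks(e_star, 6)[i], split_blocks(dy, 4)[i])
            for e, e_star, dy in evidences
        ]
        # Non-normalized probability of each 6-bit key block for S-box i.
        p = [
            int(all(sbox(a ^ k) ^ sbox(b ^ k) == d for a, b, d in block_evidence))
            for k in range(64)
        ]
        distributions.append(p)
        key_blocks.append(p.index(max(p)))
    return join_blocks(key_blocks, 6), distributions
```

The work amounts to eight independent scans over 64 key blocks rather than one scan over all 48-bit subkeys, which is the reason for the factor of eight in the cost of the Feistel attack.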

C. DES ATTACK
In the following we describe the BN which models the attack on the whole DES. We show that, unlike the attack on the Feistel function, attacking the complete DES through exact inference is not affordable, since the computational cost grows exponentially. Hence, we propose an algorithm for attacking DES through approximate inference.
In the following description we neglect the initial and final permutations, since they do not affect the probabilistic analysis. Let $P$ and $P^*$ be two plaintexts input to DES, $P'$ be their difference, and $(L', R')$ the left and right parts of $P'$, each consisting of 32 bits. Let us indicate the difference between the two outputs from DES as $T'$, and $(l', r')$ its left and right parts. Moreover, let us assume that the two plaintexts are independently chosen.
The relationships among variables involved in the first round of DES are described by the following equations:

$$Z'_1 = R', \qquad Y'_1 = P(S(E(Z_1) \oplus S_{K1}) \oplus S(E(Z^*_1) \oplus S_{K1})) \tag{19}$$

The difference between the inputs to the Feistel function of the second round can be obtained by considering the variables involved in the first round, as follows:

$$Z'_2 = L' \oplus Y'_1 \tag{20}$$

The iteration of such procedure leads to the formulation of the following system of equations, that expresses the relationships among the variables involved in all the rounds of DES:

$$\begin{cases} Z'_1 = R' \\ Z'_2 = L' \oplus Y'_1 \\ Z'_X = Z'_{X-2} \oplus Y'_{X-1}, & X = 3, \ldots, N \\ l' = Z'_N \\ r' = Z'_{N-1} \oplus Y'_N \end{cases} \tag{21}$$

These relationships are graphically represented by the Bayesian Network, named DES-BN, shown in Fig. 8. It is worth noting that the structure of the DES-BN is based on the simplifying assumption that subkeys are mutually independent, as also proposed in [10], since such assumption simplifies the evaluation of the BN conditioned probabilities. The goal of the attack on the whole DES, given a single evidence set $\psi = (P, P^*, T, T^*)$, is to find the set of keys that maximizes the following likelihood:

$$p(S_{K1}, \ldots, S_{KN} \mid P, P^*, T, T^*) \tag{22}$$

where $N = 16$ is the number of rounds. It is possible to prove that such likelihood can be expressed explicitly in terms of the conditioned probabilities of the DES-BN (Eq. 23). Searching for the optimal key through Eq. 23, by means of a backward exact inference process, has a high computational cost that makes such an approach infeasible.

Algorithm 3 (final steps of the listing)
    ...
    // Attack the Feistel function:
    p = prob_key_attack_Feistel($\Psi$);
    $S_K$[X] = argmax(p);
    // Update the evidence sets:
    for all $\psi_i = (P, P^*, T, T^*) \in \Upsilon$ do
        temp_l = l; temp_l* = l*;
        l = r ⊕ F(l, $S_K$[X]); l* = r* ⊕ F(l*, $S_K$[X]);
        r = temp_l; r* = temp_l*;
        update $\psi_i$ ← (P, P*, (l, r), (l*, r*));
    Break 3-round DES through exact inference;
Instead, it is possible to exploit the forward inference in order to estimate the most probable difference propagation through different rounds, and then exploit a statistical sampling technique, as described in [32], to estimate the subkey for each round.
In particular, for the last round $N$, the following relationships among variables hold:

$$Z'_N = l', \qquad Y'_N = r' \oplus Z'_{N-1} \tag{24}$$

If $Z'_{N-1}$ were known, the best way to obtain the subkey $S_{KN}$ would be to compute the value of $Y'_N$ through Eq. 24, and then use the attack on the Feistel function of the last round, by using $Z'_N$ and $Y'_N$ as input. Unfortunately, such piece of information is not available, and its exact inference through the BN would be computationally too expensive. We propose to sample the DES-BN in order to estimate the most probable value of $Z'_{N-1}$ by exploiting the structure of the DES-BN and the only exact information available, i.e., the given evidence. Given the estimated value of $Z'_{N-1}$, it is possible to iterate the same procedure backwards for the remaining $N - 1$ rounds, until a probability distribution has been built for all subkeys. Such attack is described by Algorithm 3. At each round, multiple evidences are exploited to attack the Feistel function, in order to find the most probable subkey.

Algorithm 4 - sample_Z′X - Algorithm for drawing a set of samples for the $Z'_X$ variables of all DES rounds, given a single evidence
The algorithm to sample an objective node consists in sorting all nodes of the Bayesian Network according to the network topology, then sampling the probability distribution of all nodes that precede the objective node, and finally sampling the objective node. In order to estimate the most probable value of $Z'_X$ in a given round $X$, our algorithm starts from the exact knowledge of $P$ and $P^*$ and follows all causal links in the path to $Z'_X$, drawing a random value for each unknown parent node. This procedure yields a possible value for $Z'_X$. The iteration of such procedure produces a set of samples of $Z'_X$, and by analyzing the resulting histogram it is possible to select the most frequent sample. With an adequate number of samples the histogram approximates the probability distribution of $Z'_X$, thus the most frequent sample can be considered an approximation of the most probable value. This sampling strategy is described in Algorithm 4.
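The sampling strategy can be illustrated on a toy Feistel cipher. The following Python sketch is not Algorithm 4 itself: the 4-bit halves, the round function `toy_f` and the 4-bit substitution table are assumptions made to keep the example small, and the only unknown parents that are sampled are the round subkeys, drawn from a uniform prior.

```python
import random
from collections import Counter

# Hypothetical 4-bit substitution table (the first row of DES S1, reused as a toy).
TOY_SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]

def toy_f(z, subkey):
    """Toy round function on 4-bit values: key mixing followed by substitution."""
    return TOY_SBOX[z ^ subkey]

def sample_z_diff(plain, plain_star, target_round, rng):
    """Draw one sample of the input difference of the round function at `target_round`.

    The two known plaintexts are propagated together; every unknown subkey on the
    path is replaced by a random draw.
    """
    (l, r), (l_star, r_star) = plain, plain_star
    for _ in range(target_round - 1):
        k = rng.randrange(16)
        l, r = r, l ^ toy_f(r, k)
        l_star, r_star = r_star, l_star ^ toy_f(r_star, k)
    return r ^ r_star

def most_probable_z_diff(plain, plain_star, target_round, n_samples, rng):
    """Return the most frequent sampled difference together with the histogram."""
    hist = Counter(
        sample_z_diff(plain, plain_star, target_round, rng) for _ in range(n_samples)
    )
    return hist.most_common(1)[0][0], hist

if __name__ == "__main__":
    rng = random.Random(2023)
    best, hist = most_probable_z_diff((0x3, 0x9), (0x5, 0xC), 3, 2000, rng)
    print(best, hist.most_common(3))
```

With few samples the histogram is noisy and the most frequent value may change from run to run; this is the trade-off between the number of samples and the reliability of the estimate.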

VI. PERFORMANCE EVALUATION
A first assessment of the performance of the proposed approach concerned the evaluation of the complexity of the four algorithms it consists of.
The computational complexity of the algorithm to attack a single S-Box (Algorithm 1) is $O_{SBox} = O(2^b \, |\Psi|)$, where $|\Psi|$ is the number of exploited evidences, and $b$ is the number of bits composing the key block accepted as input by one of the eight S-Boxes, i.e., $b = 6$.
The evaluation of the probability distribution during the attack on the Feistel function, according to Algorithm 2, has a complexity $O_{Feistel} = O(n_s \cdot 2^b \cdot |\Psi|)$, where $|\Psi|$ is the number of exploited evidences in the attack on a single S-Box, $n_s$ is the number of S-Boxes, and $b$ is the number of bits of the key block used by one of the eight S-Boxes, i.e., $n_s = 8$ and $b = 6$.
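As a worked example of these two expressions, the following Python lines count the key-block evaluations for an assumed number of three exploited evidences (an example value, in line with the average reported in Table 1) and compare them with an exhaustive search over a 48-bit subkey. The variable names are illustrative.

```python
b = 6             # bits of the key block entering one S-box
n_s = 8           # number of S-boxes
n_evidences = 3   # assumed number of exploited evidences (example value)

sbox_cost = 2 ** b * n_evidences   # key-block evaluations for one S-box
feistel_cost = n_s * sbox_cost     # eight independent S-box attacks
exhaustive = 2 ** (n_s * b)        # all 48-bit subkeys

print(f"single S-box: {sbox_cost} evaluations")
print(f"Feistel function: {feistel_cost} evaluations")
print(f"exhaustive 48-bit search: {exhaustive} candidates")
```

The counts are in units of candidate key-block checks and ignore constant factors, so they indicate orders of magnitude rather than running times.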
The complexity of the sampling procedure (Algorithm 4) depends on the number of samples required to obtain the convergence of the probability distribution, i.e., $M$, and on the round to be sampled, i.e., $X$. The upper bound of such complexity is determined by considering the last round, i.e., $X = N$. It is worth noting that the samples generated during the graph descent can be reused during the backtracking, thus obtaining a more efficient procedure than the expanded Algorithm 4.
The computational cost of attacking the whole DES (Algorithm 3) is expressed by the following equation:
where $|\Upsilon|$ is the number of elements constituting the evidence set, $M$ is the number of samples required by the sampling algorithm, $N$ is the number of rounds of DES, and $b$ is the number of input bits to a single S-Box. Since $b = 6$ and $N = 16$, it follows that $2^b$ and $N^2$ can be considered constant. Consequently, the computational complexity can be expressed as follows:
This result is consistent with the expected complexity of a chosen-plaintext attack, which depends directly on the number of plaintext-ciphertext pairs.

Another set of experiments was run to find the number of plaintext-ciphertext pairs needed to attack each of the 8 S-boxes by means of Algorithm 1. Tests were executed on a multi-core server equipped with 4 Intel Xeon 2.00 GHz, reporting the time-to-succeed (TTS, in milliseconds) for 10 different random keys. Results, shown in Table 1, indicate that on average 3 plaintext-ciphertext pairs are needed to accomplish the attack on every S-box. The TTS values are in general very low, and no noticeable variation is evident as the random keys and the S-boxes vary. This aspect was further inspected by evaluating the average TTS (see Fig. 9) and the corresponding variance values, which are about $10^{-3}$ ms for each experiment.
Finally, we extended the experimental evaluation to four distinct versions of DES reduced to three, four, five, and six rounds, respectively. The results we obtained (see Table 2) are comparable with the performance of the original differential cryptanalysis [10], [34]. In particular, for each variation of DES, we considered the average execution time (Time), the number of chosen plaintext-ciphertext pairs (Texts), and the number of required samples obtained by the sample_$Z'_X$ algorithm (Samples). It is worth noting that changing the number of available plaintext-ciphertext pairs significantly affects the number of samples required to accomplish the attack. The values reported in Table 2 are those that minimize the computational complexity of the whole attack (Eq. 25).
This preliminary assessment leads us to conclude that the number of plaintext-ciphertext pairs required to attack a full 16-round DES is not lower than the threshold of $2^{47}$ that exists for the standard differential attack approach.

VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a new formulation of differential cryptanalysis through Bayesian networks, a framework for performing probabilistic inference that is widely adopted in the field of machine learning. Exploiting this model, we designed an algorithm for attacking DES through approximate inference on the Bayesian Network. Our preliminary experimental evaluation, performed on versions of DES with a reduced number of rounds, showed that the proposed method is equivalent to the original differential cryptanalysis with respect to required input data and convergence time. Beyond its effectiveness, the computational aspect represents the main limitation of the approach. Indeed, the Bayesian framework, in its current form, does not perform significantly better than other traditional cryptanalysis approaches. However, the formulation of the attack using Bayesian Networks suggests several directions for improvement. To be more specific, we plan to evaluate more advanced forward sampling techniques, such as importance sampling, in order to verify whether the convergence time can be reduced and the sample inputs minimized. Furthermore, since multiple evidences are mutually independent, the convergence time can be reduced by exploiting a massively parallel architecture. Finally, although the hypothesis of mutual independence of subkeys reduces the computational cost, it introduces many contradictory hypotheses about subkeys of different rounds. In future work we will investigate the introduction of a new BN model that captures the linear relationship among subkeys.

Open Access funding provided by 'Università degli Studi di Palermo' within the CRUI CARE Agreement.