Efficient Image Encryption Based on New Substitution Box Using DNA Coding and Bent Function

This study contributes to creating a robust S-Box based on a strong bent function expanded by DNA sequences, and analyzes the strength of the proposed S-Box against major standard criteria and benchmarks, such as interpolation attacks, algebraic attacks, the avalanche effect, nonlinearity, and period. The test results show that the proposed S-box has good security and passes all the randomness tests. On average, the tests yield SAC = 0.50122, NL = 112, BIC = 103.40625, and an iterative period with a maximum value of 256. The complexity of the proposed S-Box is increased to an algebraic expression of 255 terms, which implies an algebraic attack resistance of $2^{160}$. Based on the proposed S-Box, a candidate image-enciphering scheme is suggested to demonstrate the strength of the S-Box. Experiments on two image modes, grey and RGB, support the scheme's robustness against differential and statistical attacks under standard criteria such as correlation coefficient analysis, information entropy, histogram analysis, unified average change intensity (UACI), number of pixels change rate (NPCR), and others. This supports its suitability for modern cryptosystems used in multimedia data exchange.


I. INTRODUCTION
A. BACKGROUND
Modern information technologies are in acute need of protection against security threats. The significant development of these technologies has always been accompanied by complex security issues. Private data must be kept secret, which can be achieved by converting it into an unreadable form [1]. Cryptography is the well-known science responsible for this process. It aims to protect data from exploitation, alteration, or loss, and to make sure that only the intended receiver can comprehend the message [2].
For this purpose, different symmetric and asymmetric ciphers have been designed. The widely used symmetric ciphers fall into two primary categories: stream ciphers and block ciphers. In the former, the plaintext is encrypted bit by bit, whereas in the latter, a plaintext block of a fixed number of bits is encrypted at once [3]. For any cryptographic algorithm, it is important that the ciphertext exhibit the confusion property, which concerns the relationship between the ciphertext and the plaintext. One of the known techniques used to provide this is the Substitution Box (S-Box) [2]. The S-Box, known as the nonlinear transformation, is of the utmost importance in all types of symmetric encryption algorithms [4]. There is a direct link between security and confusion, as the confusion level in the ciphertext indicates its robustness [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Jun Wang.
The National Institute of Standards and Technology (NIST) has adopted several criteria to judge the strength of an S-Box, such as the strict avalanche criterion, nonlinearity, and the bit independence criterion [6]. Most of these properties depend on the component functions of n variables, called Boolean functions, which can be expressed in several forms, such as the Univariate Polynomial Form (UPF), minterms, and the Algebraic Normal Form (ANF) [7].
Because S-box designs are vulnerable to newly invented attacks, the main challenge researchers have concentrated on is exploring new techniques that achieve better performance. This has prompted researchers to use the concept of DNA computing. DNA cryptography, an emerging direction in information security, is considered a promising technology for unbreakable algorithms. It draws on biology and offers great potential for storing data based on DNA, which carries information about living organisms. DNA stands for deoxyribonucleic acid, the genetic material of an organism that passes genetic traits from parents to offspring [8]. Each organism possesses its own DNA information. DNA is a polymer composed of monomer units called nucleotides. Each nucleotide is made up of three components: a phosphate group, a deoxyribose sugar, and a nitrogenous base [9], [10].

B. DEOXYRIBO-NUCLEIC ACID (DNA)
DNA is considered the genetic blueprint of living creatures. All somatic cells contain a complete set of DNA that is unique to every creature. Small units, called monomers, combine to form a DNA polymer. These units are deoxyribose nucleotides. The nitrogenous bases, one of the basic components of nucleotides, are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T) [3]. The binary numbers 00, 01, 10, and 11 are used to encode binary data with the four bases (A, C, G, and T). With this coding, every eight binary bits can be replaced with only four DNA characters. Therefore, DNA components and properties must be studied closely in order to analyze computations on them [3]. DNA is the cell's memory, as it retains all the information formed by the coding of the four characters. Watson and Crick proposed a complementary DNA structure, which is used in DNA calculations to obtain the base pairs: T and A complement each other, as do G and C. Each base combines with one sugar molecule and one phosphate molecule. The arrangement of these bases creates the uniqueness of the DNA, which determines the characteristics of the creature.
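As a minimal illustration of the coding described above, the following Python sketch maps one byte to four DNA characters and back, and applies the Watson-Crick complement. It assumes the mapping 00 to A, 01 to C, 10 to G, and 11 to T stated in the text; the function names are hypothetical and are not taken from the authors' implementation.

```python
# Assumed coding rule (the mapping stated in the text):
# 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Watson-Crick complement pairs: A <-> T and C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def byte_to_dna(value):
    """Encode one byte (0..255) as four DNA characters, two bits per base."""
    bits = format(value, "08b")  # left-pad with zeros to eight bits
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2))


def dna_to_byte(seq):
    """Decode four DNA characters back into one byte."""
    return int("".join(BASE_TO_BITS[base] for base in seq), 2)


def complement(seq):
    """Return the complementary strand, base by base."""
    return "".join(COMPLEMENT[base] for base in seq)


if __name__ == "__main__":
    encoded = byte_to_dna(0b00011011)
    print(encoded, dna_to_byte(encoded), complement(encoded))
```

Any of the eight coding rules in Table 1 could be substituted by changing the `BITS_TO_BASE` dictionary only.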
The eight conventional rules are shown in Table 1.
The addition and subtraction rules for DNA nucleotides are listed in Table 2 and Table 3, respectively.
In this research, these rules are used while expanding the S-box process.
The remainder of this paper is structured as follows: Section II explains the steps followed to obtain the proposed S-Box, and Section III analyzes its performance using NIST tests. Section IV presents the proposed scheme, based on the proposed S-Box, for protecting multimedia data, and its subsections analyze the scheme against various known types of attacks.

II. PROPOSED NEW S-BOX
In this section, a new highly nonlinear S-Box is generated based on highly nonlinear bent functions. The S-Box is a one-to-one function that substitutes each byte with a corresponding one. It is an invertible function that can be obtained using a few transformations.
1. An affine transformation is applied, which is defined by the matrix:
$$
\begin{pmatrix}
a_4 & a_3 & a_2 & a_1 & a_0 & a_7 & a_6 & a_5 \\
a_5 & a_4 & a_3 & a_2 & a_1 & a_0 & a_7 & a_6 \\
a_6 & a_5 & a_4 & a_3 & a_2 & a_1 & a_0 & a_7 \\
a_7 & a_6 & a_5 & a_4 & a_3 & a_2 & a_1 & a_0 \\
a_0 & a_7 & a_6 & a_5 & a_4 & a_3 & a_2 & a_1 \\
a_1 & a_0 & a_7 & a_6 & a_5 & a_4 & a_3 & a_2 \\
a_2 & a_1 & a_0 & a_7 & a_6 & a_5 & a_4 & a_3 \\
a_3 & a_2 & a_1 & a_0 & a_7 & a_6 & a_5 & a_4
\end{pmatrix}
$$
2. The multiplicative inverse of the result $Y$ is computed in $GF(2^8)$: $Y = Y^{-1}$.
3. The affine transformation of step 1, with the same matrix, is applied a second time. The generated S-box is presented in Table 4.
4. Now, the values are converted into binary form, whose length must be a multiple of 8. Otherwise, zeros are added to the left to adjust the number.
5. The next step is to replace each pair of bits with one DNA code; i.e., in code 8, 00 is substituted with T, 01 with G, 10 with C, and 11 with A.
6. Using the eight aforementioned codes, we obtain eight different S-boxes, given in Tables 25 to 32 of Appendix VI.

Input: Read a, b, c, and IP
Output: S-Box of size (8 × 8)
1 For each input value i
2 Apply affine to number i
3 Substitute affine of i in Eq. 1
6 Repeat step 3 to get a new Y value using the same values of a, b, c
7 S-Box[i] = Y
8 End For
9 Return S-Box
In this step, the DNA addition operation is applied according to the addition rules in Table 2. The addition is performed between each pair of corresponding characters in DNA sequence 1 and DNA sequence 2 resulting from the previous step. This step reduces the DNA sequence size so that the S-Box can be analyzed.
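The DNA addition step can be sketched as follows. Because Table 2 is not reproduced here, the sketch assumes the commonly used convention in which addition is performed modulo 4 on the base values A = 0, C = 1, G = 2, T = 3; the actual rules in Table 2 may differ, and the function names are hypothetical.

```python
# Assumed numeric values of the bases: A=0, C=1, G=2, T=3.
# The position of each base in this string is its assumed value.
BASES = "ACGT"


def dna_add(seq1, seq2):
    """Add two equal-length DNA sequences character by character (mod 4)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return "".join(
        BASES[(BASES.index(x) + BASES.index(y)) % 4] for x, y in zip(seq1, seq2)
    )


def dna_sub(seq1, seq2):
    """Subtract seq2 from seq1 character by character (mod 4); inverse of dna_add."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return "".join(
        BASES[(BASES.index(x) - BASES.index(y)) % 4] for x, y in zip(seq1, seq2)
    )


if __name__ == "__main__":
    total = dna_add("ACGT", "TTCA")
    print(total, dna_sub(total, "TTCA"))
```

Under this assumption, subtraction undoes addition, which is the property a decryption step would rely on.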

III. THE PROPOSED S-BOX PERFORMANCE ANALYSIS
The S-box is analyzed using well-known tests such as NL, SAC, and BIC. These tests are dynamic properties that address the relationship between changes in the plaintext and changes in the ciphertext. The ANF, which is used to obtain the Boolean function, represents it as a polynomial in n variables (the input bits), whose terms are products of input bits that are then summed bitwise. Each of the aforementioned tests is performed on the Boolean function and is briefly described below.
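To make the ANF representation concrete, the sketch below computes the ANF coefficients of a Boolean function from its truth table using the standard binary Möbius transform. The indexing convention (bit i of the table index is input variable x_i) and the function names are assumptions made for this illustration.

```python
def anf(truth_table):
    """Return the ANF coefficients of a Boolean function.

    truth_table[x] is the output for the input whose bit i is variable x_i.
    The result has a 1 at index m when the monomial formed by the variables
    whose bits are set in m appears in the ANF (index 0 is the constant 1).
    """
    coeffs = list(truth_table)
    size = len(coeffs)
    step = 1
    while step < size:
        # Butterfly of the binary Moebius transform.
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                coeffs[i + step] ^= coeffs[i]
        step *= 2
    return coeffs


def algebraic_degree(truth_table):
    """Largest number of variables in any monomial of the ANF."""
    return max(
        (bin(m).count("1") for m, c in enumerate(anf(truth_table)) if c),
        default=0,
    )


if __name__ == "__main__":
    # x0 OR x1 has ANF x0 + x1 + x0*x1.
    print(anf([0, 1, 1, 1]), algebraic_degree([0, 1, 1, 1]))
```

Applying `anf` to each of the eight output bits of an S-box gives the Boolean functions on which the tests in this section operate.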

A. THE ALGEBRAIC EXPRESSION
The security of the standard AES S-Box is questionable owing to its low algebraic complexity. To eliminate the weakness of such simple algebraic expressions, the reason for which was illustrated in [11], the proposed S-box was improved by applying multiple transformation steps rather than only one. In the proposed S-box, by using the irreducible polynomial $P(x) = x^9 + x^4 + x^3 + x + 1$, the affine transformation matrices, and the affine constants, we notice that the complexity of the algebraic expression increases from 9 to 255 terms, while retaining the same ability to resist differential cryptanalysis. The workload of degree 255 is considered to be very large. The simplest and most common method is to substitute the 256 S-Box values in Table 5 into the Lagrange interpolation formula and then substitute the intermediate values into the equation.
All coefficients of the algebraic expression of the improved S-box can be resolved.
The coefficients of the algebraic expression of the proposed S-box are linked to the data E shown in Table 5. The algebraic expression of the proposed S-Box has up to 255 terms, which reinforces its security and complexity.
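The term count of an S-box's algebraic expression can be checked with a short interpolation sketch. The code below is illustrative only: it assumes the standard AES field polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) and rebuilds the standard AES S-box as test data, since the proposed S-box of Table 4 is not reproduced here. All function names are hypothetical. For the standard AES S-box it recovers the 9-term expression mentioned above.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """a^254, which equals the inverse for a != 0 and maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(v, s):
    """Rotate a byte left by s bits."""
    return ((v << s) | (v >> (8 - s))) & 0xFF


def aes_sbox():
    """Standard AES S-box: field inversion followed by the AES affine map."""
    box = []
    for x in range(256):
        y = gf_inv(x)
        box.append(y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63)
    return box


def interpolate(sbox):
    """Coefficients c[0..255] with sbox[x] = sum of c[k] * x^k over GF(2^8)."""
    coeffs = [0] * 256
    coeffs[0] = sbox[0]
    for a in range(256):
        coeffs[255] ^= sbox[a]            # coefficient of x^255
    for a in range(1, 256):
        p = 1
        for e in range(1, 255):
            p = gf_mul(p, a)              # p = a^e
            coeffs[255 - e] ^= gf_mul(sbox[a], p)
    return coeffs


def eval_poly(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule in GF(2^8)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


if __name__ == "__main__":
    coeffs = interpolate(aes_sbox())
    print("non-zero terms:", sum(1 for c in coeffs if c))
```

Replacing `aes_sbox()` with the 256 values of another S-box gives its term count in the same way.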

B. THE ALGEBRAIC CRITERION OF THE BOOLEAN FUNCTION
A good S-box meets a number of criteria, as its nonlinear properties determine the performance of the entire block cipher [12], [13]. The S-box is therefore considered the core of the block cipher, and it is worth checking whether the improved algorithm meets the required performance [14]. An S-box with good cryptographic characteristics supports the resistance of the cipher against different cryptanalysis methods; conversely, any shortcoming in the S-box can impair the security of the cipher. The S-Box consists of eight 8-input logic functions that interact with and influence each other. Although these functions may each have certain properties, the S-box as a whole does not necessarily have the same properties. Therefore, it is necessary to analyze the algebraic properties of the S-box function.

1) THE ALGEBRAIC ATTACKS RESISTANCE
This quantity reflects the resistance of the proposed S-Box against various algebraic attacks.
Theorem 1 [15], [16]: Given $l$ equations of $k$ terms in $GF(2^8)$, the algebraic attack resistance (AAR) is denoted by $\Gamma$ and is defined as follows:
$$\Gamma = \left(\frac{k - l}{n}\right)^{\left\lceil \frac{k - l}{n} \right\rceil}$$
It was claimed in [17] that $\Gamma$ should be greater than $2^{32}$ to avoid shortcomings in the S-box. For the proposed S-box, with $l = 255$, $k = 510$ terms, and $n = 8$, we obtain $\Gamma \approx 2^{160}$, which indicates the strength of the S-Box against algebraic attacks.
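The stated resistance can be checked numerically. The sketch below assumes that the AAR has the form $((k - l)/n)^{\lceil (k - l)/n \rceil}$, which reproduces the reported value for $l = 255$, $k = 510$, and $n = 8$; the function name is hypothetical.

```python
import math


def aar_log2(num_equations, num_terms, n):
    """Return log2 of the algebraic attack resistance.

    Assumes resistance = ((k - l) / n) ** ceil((k - l) / n), with l equations
    and k terms. Working in log2 avoids very large floating-point values.
    """
    ratio = (num_terms - num_equations) / n
    return math.ceil(ratio) * math.log2(ratio)


if __name__ == "__main__":
    # Values quoted in the text for the proposed S-box.
    print(f"log2(AAR) = {aar_log2(255, 510, 8):.2f}")
```

With the values from the text, the exponent comes out just under 160, consistent with the reported $2^{160}$ and well above the $2^{32}$ threshold.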

2) ITERATIVE PERIOD OF S-BOX
The iterative period of the S-Box can be defined as follows:
Theorem 2 [18], [19]: Assume that the S-box bent function is denoted by $B(n)$. $B(n)$ fulfills the periodicity if $B^m(n) = n$, where $m$ is a positive integer.
For every $n \in GF(2^8)$, solving $B^m(n) = n$ for the standard AES S-box gives the iterative periods shown in Table 6. The iterative periods obtained are 2, 27, 59, 81, and 87. These periods satisfy 2 + 27 + 59 + 81 + 87 = 256, so the period orbits do not intersect. The standard S-box clearly has short periods and an uneven distribution, which can lead to weaknesses.
For the proposed S-box, the iterative period is increased to its maximum value, reaching 255 for any nonzero element of $GF(2^8)$.
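The iterative periods of an S-box are the cycle lengths of the permutation it defines, and they can be computed by following each orbit until it closes. The sketch below is an illustration: it rebuilds the standard AES S-box (field polynomial 0x11B) as test data and uses hypothetical function names. It reproduces the periods 2, 27, 59, 81, and 87 quoted above.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(v, s):
    """Rotate a byte left by s bits."""
    return ((v << s) | (v >> (8 - s))) & 0xFF


def aes_sbox():
    """Standard AES S-box: field inversion followed by the AES affine map."""
    box = []
    for x in range(256):
        y = 1
        for _ in range(254):      # y = x^254 (inverse of x, with 0 -> 0)
            y = gf_mul(y, x)
        box.append(y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63)
    return box


def iterative_periods(sbox):
    """Sorted cycle lengths of the permutation x -> sbox[x]."""
    seen = [False] * len(sbox)
    lengths = []
    for start in range(len(sbox)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:        # walk the orbit until it returns to start
            seen[x] = True
            x = sbox[x]
            length += 1
        lengths.append(length)
    return sorted(lengths)


if __name__ == "__main__":
    print(iterative_periods(aes_sbox()))
```

An S-box whose permutation is a single cycle through all 256 values would return one period of 256 from `iterative_periods`.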

3) STRICT AVALANCHE CRITERION
The SAC concept was introduced by Webster and Tavares. It reflects the variation in the output bits when one input bit is changed: approximately half of the output bits should change when only one input bit is complemented.
Theorem 4 [20]: Let $F(x) = (f_1(x), \ldots, f_m(x))$ from $GF(2)^m$ to $GF(2)^m$ be a multi-output Boolean function. The distance to SAC is denoted by DSAC(F) and is defined as follows:
$$\mathrm{DSAC}(F) = \sum_{i=1}^{m} \; \sum_{\beta \in GF(2)^m,\ wt(\beta) = 1} \left| wt\big(f_i(x \oplus \beta) \oplus f_i(x)\big) - 2^{m-1} \right|$$
where $wt(\cdot)$ denotes the Hamming weight. When DSAC(F) = 0, F(x) fulfills the SAC. The existing S-Boxes do not satisfy the SAC. The SAC of the proposed S-box function is given in Table 7, from which DSAC(proposed S-Box) = 316.
According to these results, the SAC is not strictly satisfied, but the rate of change in the output bits is acceptable, as it stays near $0.5 \times 2^m = 128$ bits.
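The SAC measurement can be sketched in a few lines. The code below assumes that DSAC is the sum, over all input-bit and output-bit pairs, of the absolute deviation of the flip count from $2^{m-1}$, which is consistent with DSAC = 0 meaning that the SAC holds; the function names are hypothetical.

```python
def sac_counts(sbox, m=8):
    """counts[j][i] = number of inputs for which flipping input bit j flips output bit i."""
    counts = [[0] * m for _ in range(m)]
    for x in range(1 << m):
        for j in range(m):
            diff = sbox[x] ^ sbox[x ^ (1 << j)]
            for i in range(m):
                counts[j][i] += (diff >> i) & 1
    return counts


def dsac(sbox, m=8):
    """Assumed distance to SAC: total deviation of the flip counts from 2^(m-1)."""
    half = 1 << (m - 1)
    return sum(abs(c - half) for row in sac_counts(sbox, m) for c in row)


def sac_mean(sbox, m=8):
    """Average probability that an output bit flips when one input bit flips."""
    total = sum(c for row in sac_counts(sbox, m) for c in row)
    return total / (m * m * (1 << m))


if __name__ == "__main__":
    identity = list(range(256))   # worst case: each output bit copies one input bit
    print(dsac(identity), sac_mean(identity))
```

An S-box satisfying the SAC exactly would give `dsac(...) == 0` and `sac_mean(...) == 0.5`; the identity map used in the demonstration is the opposite extreme.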

4) BIT INDEPENDENCE CRITERION
The BIC parameter was introduced by Webster and Tavares. It is used as a standard to check the level of security of S-Boxes against different attacks [4], [21], [22].
Theorem 5 [18]: Let $F(x) = (f_1(x), \ldots, f_m(x))$ from $GF(2)^m$ to $GF(2)^m$ be a multi-output Boolean function. The BIC is computed by forming the $m \times m$ matrix $BIC(F) = (b_{lk})$, where, for each pair $l, k$, the entry $b_{lk}$ is defined as follows:

5) NON-LINEARITY
Nonlinearity (NL) is one of the most important criteria in a cryptosystem; it was introduced in the 1980s by Meier and Staffelbach and later, in the early 1990s, by Nyberg. The S-Box is the nonlinear part of the cryptographic algorithm that gives it the ability to withstand differential and linear cryptanalysis. A higher nonlinearity value indicates greater resistance against differential and linear attacks. Mathematically, nonlinearity is calculated using the Walsh spectrum [3].
Theorem 6 [18]: Let $F(x) = (f_1(x), \ldots, f_m(x))$ from $GF(2)^m$ to $GF(2)^m$ be a multi-output Boolean function. The nonlinearity of each m-bit Boolean function $f_i$ is expressed as follows:
$$NL(f_i) = 2^{m-1} - \frac{1}{2} \max_{u} \left| \sum_{x \in GF(2)^m} (-1)^{f_i(x) \oplus u \cdot x} \right|$$
where $u \in \mathbb{F}_2^m$. Equivalently, $NL(f_i)$ measures the distance from $f_i$ to the nearest linear function, where the set of linear functions from $GF(2)^m$ to $GF(2)^m$ is denoted by $L_n[x]$.
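A hedged sketch of the nonlinearity computation follows. It uses the fast Walsh-Hadamard transform and the standard relation NL = 2^(m-1) - max|W|/2. The helper `bic_nonlinearity` reflects one common way of reporting BIC, namely the average nonlinearity of the XOR of each pair of output bits; this is an assumption, since the exact BIC definition is not reproduced here. All names are hypothetical.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f for a truth table of length 2^m."""
    w = [1 - 2 * bit for bit in truth_table]
    size = len(w)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                a, b = w[i], w[i + step]
                w[i], w[i + step] = a + b, a - b
        step *= 2
    return w


def nonlinearity(truth_table):
    """NL = 2^(m-1) - max|W| / 2."""
    return len(truth_table) // 2 - max(abs(v) for v in walsh_spectrum(truth_table)) // 2


def component_nonlinearities(sbox, m=8):
    """Nonlinearity of each of the m output-bit functions of an S-box."""
    return [nonlinearity([(y >> i) & 1 for y in sbox]) for i in range(m)]


def bic_nonlinearity(sbox, m=8):
    """Assumed BIC measure: mean nonlinearity of f_l XOR f_k over all pairs l < k."""
    values = [
        nonlinearity([((y >> l) ^ (y >> k)) & 1 for y in sbox])
        for l in range(m)
        for k in range(l + 1, m)
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # A 4-variable bent function: x0*x1 XOR x2*x3.
    bent = [((x & 1) & (x >> 1 & 1)) ^ ((x >> 2 & 1) & (x >> 3 & 1)) for x in range(16)]
    print(nonlinearity(bent))
```

For a bent function on an even number of variables m, the Walsh spectrum is flat, so the nonlinearity reaches the upper bound $2^{m-1} - 2^{m/2-1}$.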

IV. PROPOSED IMAGE SECURE SCHEME USING S-BOX
In this section, the proposed encryption scheme based on the aforementioned S-Box, presented in Table 4, is illustrated. It is used to encrypt images in two modes: grayscale and RGB. We employed our S-box to execute permutation-substitution operations based purely on the S-box.
The proposed encryption scheme based on the generated S-Box is illustrated below.
Input: The plain-image P of size 3 × α × β in RGB mode
Output: The enciphered image
Proposed Scheme
1 Read the generated S-Box (S) mentioned above in Table 4 as LUT.
2 Split the RGB image into three α × β components.
3 For each frame in P
4 Temp = Key of component (K_r, K_g, K_b)
5 For i = 0 : α − 1
6 For ...
9 End For
10 End For
11 End For
12 Combine the three components again to get the enciphered image C

A. STATISTICAL ATTACK ANALYSIS
1) CORRELATION COEFFICIENT ANALYSIS
A pixel is the base unit of any image. Each pixel is represented by a value that depends on its resolution. The pixel resolution is the number of bits used to define its value; here, the pixel resolution is 8. Correlation mirrors how meaningful an image is: a high correlation indicates a visually meaningful image. It expresses the relationship between neighboring pixels, whether horizontal, vertical, or diagonal [39]. In meaningful images, neighboring pixels are almost the same. For enciphered images, on the other hand, a low correlation is desirable, and that is our target [40].
The correlation coefficient can be computed using the following expression:
$$r = \frac{\sum_{i,j} (P_{ij} - \bar{P})(C_{ij} - \bar{C})}{\sqrt{\sum_{i,j} (P_{ij} - \bar{P})^2 \, \sum_{i,j} (C_{ij} - \bar{C})^2}}$$
where the sums run over all $\alpha \times \beta$ pixel positions, and $\alpha$ and $\beta$ represent the width and height of the image, respectively. $C_{ij}$ and $P_{ij}$ are the pixel values in the cipher-image and the corresponding plain-image at the $i$-th column and $j$-th row, respectively. $\bar{P}$ and $\bar{C}$ are the mean values of P and C.
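The correlation coefficient between a plain-image and a cipher-image can be sketched in plain Python as below. The sketch assumes the standard Pearson form over all pixel positions and takes images as nested lists of 8-bit values; the function name is hypothetical.

```python
import math


def correlation(plain, cipher):
    """Pearson correlation between two equally sized images (lists of rows).

    Both images must contain at least two distinct pixel values; otherwise
    the variance is zero and the coefficient is undefined.
    """
    p = [v for row in plain for v in row]
    c = [v for row in cipher for v in row]
    mean_p = sum(p) / len(p)
    mean_c = sum(c) / len(c)
    cov = sum((a - mean_p) * (b - mean_c) for a, b in zip(p, c))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_c = sum((b - mean_c) ** 2 for b in c)
    return cov / math.sqrt(var_p * var_c)


if __name__ == "__main__":
    image = [[0, 50], [100, 200]]
    negative = [[255 - v for v in row] for row in image]
    print(correlation(image, image), correlation(image, negative))
```

The same function applies to neighboring-pixel analysis if the two arguments are an image and a copy of it shifted by one pixel horizontally, vertically, or diagonally.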

2) INFORMATION ENTROPY
The information entropy was introduced by Shannon in 1948. It is considered a basic concept in statistics [41], and it measures the randomness of the information in the encrypted image. The ideal value of this criterion equals the pixel resolution, so in our case the optimal entropy value is 8 [42]. It is calculated as follows:
$$H = \sum_{j=0}^{L-1} P(x_j) \log_2 \frac{1}{P(x_j)}, \qquad L = 2^m$$
where $P(x_j)$ is the relative frequency of each possible color level (pixel value), $L$ is the number of color levels in each frame (color), and $m$ is the pixel resolution. From the previous results, it is deduced that the information entropy of the encrypted image is very close to 8, the ideal value.
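A minimal sketch of the entropy measurement, assuming Shannon entropy over the pixel-value histogram of one frame; the function name is hypothetical.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy, in bits per pixel, of an image given as a list of rows."""
    pixels = [v for row in image for v in row]
    total = len(pixels)
    # Pixel values that never occur contribute nothing to the sum.
    return -sum((n / total) * math.log2(n / total) for n in Counter(pixels).values())


if __name__ == "__main__":
    uniform = [list(range(256))]   # every 8-bit value exactly once
    constant = [[7] * 256]         # a single value repeated
    print(entropy(uniform), entropy(constant))
```

An image whose 256 levels are equally frequent reaches the ideal value of 8, whereas a constant image has zero entropy.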

3) HISTOGRAM ANALYSIS
The histogram shows the distribution of the color levels, that is, the pixel values, across the image plane. It reflects the resistance of an image, especially an enciphered one, against statistical attacks [43].
The histograms of both the plain images and their corresponding enciphered ones are shown below. The histograms of the enciphered images are clearly flat in all frames, implying that the pixel values occur with nearly equal frequency. Table 7 and Table 8 illustrate the histograms for images in gray and RGB modes, respectively.

B. DIFFERENTIAL ATTACKS
One known approach attackers use to uncover an enciphering scheme is to make changes in the plain message and observe the corresponding ciphered message. The attack succeeds after analyzing the resulting data pairs [38]. Therefore, it is important to guarantee that this method is not applicable. This is achieved when the scheme is sensitive to tiny changes in the image data, which ensures that the system resists differential attacks. To decide whether our scheme has this sensitivity to tiny changes, a number of tests were carried out.
These techniques check the scheme's behavior against a one-bit difference in the plain-images.

1) UACI AND NPCR
Two highly recommended tests are the Unified Average Change Intensity (UACI) and the Number of Pixel Change Rate (NPCR). UACI calculates the average difference in intensity between two ciphered images [38]. The higher the value, the better the scheme. The expected theoretical value of the UACI is 33.4635%. It is computed as follows:
$$\mathrm{UACI} = \frac{1}{\alpha \beta} \sum_{i,j} \frac{|C_1(i,j) - C_2(i,j)|}{255} \times 100\%$$
where $C_1(i,j)$ and $C_2(i,j)$ are two enciphered images whose plain-images are identical except for a one-bit change in one of them. The NPCR is the percentage of different pixels between the two encrypted images [44]. The higher the value, the better the scheme. The expected theoretical value is 99.6094%. It is computed as follows:
$$\mathrm{NPCR} = \frac{1}{\alpha \beta} \sum_{i,j} D(i,j) \times 100\%, \qquad D(i,j) = \begin{cases} 0, & C_1(i,j) = C_2(i,j) \\ 1, & \text{otherwise} \end{cases}$$
C. DATA LOSS
During data transmission in a noisy medium, data corruption naturally occurs in the cipher-image. It is also essential that the enciphered image differ from the plain image.
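Returning to the UACI and NPCR measures of the differential-attack analysis, the sketch below computes both for two cipher-images given as nested lists of 8-bit values. It assumes the standard definitions (mean absolute difference normalized by 255, and fraction of differing pixels); the function names are hypothetical.

```python
def npcr(c1, c2):
    """Percentage of pixel positions at which the two cipher-images differ."""
    diffs = [a != b for row1, row2 in zip(c1, c2) for a, b in zip(row1, row2)]
    return 100.0 * sum(diffs) / len(diffs)


def uaci(c1, c2, max_value=255):
    """Average absolute intensity difference, as a percentage of max_value."""
    diffs = [abs(a - b) for row1, row2 in zip(c1, c2) for a, b in zip(row1, row2)]
    return 100.0 * sum(diffs) / (max_value * len(diffs))


if __name__ == "__main__":
    # Enumerate every pair of 8-bit values once: this models two independent,
    # uniformly random cipher-images.
    c1 = [[a] * 256 for a in range(256)]
    c2 = [list(range(256)) for _ in range(256)]
    print(f"NPCR = {npcr(c1, c2):.4f}%  UACI = {uaci(c1, c2):.4f}%")
```

Enumerating all pairs of 8-bit values reproduces the theoretical expectations quoted in the text, which correspond to 255/256 for NPCR and 257/768 for UACI.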

1) MSE AND PSNR
Mean Square Error (MSE) compares the plain-image with the cipher-image to determine the encryption level [40]. The larger the MSE, the higher the distortion (error) between the plain image and its enciphered version. MSE is defined as:
$$\mathrm{MSE} = \frac{1}{\alpha \beta} \sum_{i,j} (P_{ij} - C_{ij})^2$$
Peak Signal to Noise Ratio (PSNR) is a robustness measure of the enciphering scheme in a noisy medium:
$$\mathrm{PSNR} = 10 \log_{10} \frac{P_{max}^2}{\mathrm{MSE}}$$
where $P_{max}$ is the maximum possible pixel value. It follows that the smaller the PSNR value, the greater the difference between the images.
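Both measures can be sketched directly from their definitions. The code assumes the standard forms MSE = mean squared pixel difference and PSNR = 10 log10(P_max^2 / MSE), with P_max = 255 for 8-bit images; the function names are hypothetical.

```python
import math


def mse(plain, cipher):
    """Mean squared difference between two equally sized images (lists of rows)."""
    diffs = [(a - b) ** 2 for row_p, row_c in zip(plain, cipher) for a, b in zip(row_p, row_c)]
    return sum(diffs) / len(diffs)


def psnr(plain, cipher, p_max=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(plain, cipher)
    if error == 0:
        return math.inf
    return 10 * math.log10(p_max ** 2 / error)


if __name__ == "__main__":
    plain = [[0, 0], [0, 0]]
    cipher = [[255, 255], [255, 255]]
    print(mse(plain, cipher), psnr(plain, cipher))
```

For encryption quality, a large MSE and a correspondingly small PSNR between the plain-image and the cipher-image are the desirable outcome.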

2) MEAN ABSOLUTE ERROR
The mean absolute error (MAE) is the average difference in color intensity between the cipher-image and the plain-image. The higher this value, the higher the security of the proposed scheme. MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{\alpha \beta} \sum_{i,j} |C_{ij} - P_{ij}|$$
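A short sketch of the MAE computation, assuming the standard mean of absolute pixel differences over all positions; the function name is hypothetical.

```python
def mae(plain, cipher):
    """Mean absolute difference between two equally sized images (lists of rows)."""
    diffs = [abs(a - b) for row_p, row_c in zip(plain, cipher) for a, b in zip(row_p, row_c)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(mae([[10, 20], [30, 40]], [[12, 18], [30, 50]]))
```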

D. OCCLUSION ATTACK
During digital transmission of data through public channels, the data are exposed to loss. In a data-loss (occlusion) attack, the attackers seek to remove parts of the data [45]. Therefore, various data-loss sizes were applied to test how well the enciphered images can be recovered. A raccoon face image was selected as the plain image, and the results after applying the attack are shown in Figure 9.

E. SPEED ANALYSIS
With the current development in data transfer, it has become important to find enciphering schemes that can generate encrypted data in low computational time, which benefits real-time applications. In this study, the proposed algorithm and its analytical criteria were implemented in the Python programming language on Windows 10 with an Intel(R) Core(TM) i5 CPU @

V. CONCLUSION
A methodology to generate a robust S-Box on a strong algebraic basis was introduced in this study. The quality of the S-Box is augmented to the optimum level by the action of a powerful permutation of $S_{256}$. The features of the proposed S-boxes are compared against a number of recent S-boxes. The proposed S-box shows excellent strength on almost all parameters, especially DSAC, whose value of 316 is not matched by any other S-Box. In our upcoming research, we aim to use another bent function, as a large number of such functions (886 different ones) is available, in order to minimize the DSAC value, because no S-Box so far attains the optimal DSAC value of zero. The proposed S-box is expanded by a DNA sequence, and we plan to use an RNA sequence in future work in an attempt to improve it. Based on the aforementioned S-Box, a proposed encryption scheme was used to encrypt some standard plain-images to evaluate its encryption performance. The results show that it is suitable for secure multimedia applications and has a low computational time. Its performance makes it a good candidate for live-stream secure applications that require high security, for example in the military field, such as unmanned aerial vehicles.

ACKNOWLEDGMENT
The authors would like to express their gratitude to Prof. Dr. Alaa Kadhim Farhan for his valuable comments that helped enhance the presentation of this work.