An Efficient Audio Encryption Scheme Based on Finite Fields

Finite fields are well-studied algebraic structures whose efficient arithmetic has found applications in cryptology and coding theory. In this study, we propose a lossless, efficient algorithm for digital audio encryption based on binary Galois field extensions. The diffusion module uses a special type of curve and relies on efficient elliptic curve arithmetic. It generates good-quality pseudo-random numbers (PRNs) and, with little computational effort, produces strong diffusion in the encrypted audio files. For the confusion module, a novel block cipher construction is employed that relies on inversion and multiplication in a binary Galois field. The scheme generates multiple substitution boxes (S-boxes) over a higher-order Galois field; substitution with multiple S-boxes produces effective confusion in the data and provides additional security to the ciphered audio. Experimental results from several security analyses and a time-complexity evaluation demonstrate the ability of the technique to counter various attacks. Furthermore, because binary finite field arithmetic is fast and simple to implement in both hardware and software, the proposed scheme is well suited to data security applications.


I. INTRODUCTION
In recent decades, rapid developments in science and digital technology have increased the role of multimedia data in social life. Multimedia data are used in fields such as education, engineering, mathematics, art, advertising, the military, medicine, and scientific research. This growth raises the importance of multimedia data processing tools and digital documentation. At the same time, access to multimedia data over the internet has created opportunities for misuse that threaten the confidentiality and integrity of the data. To counter these threats, multimedia data security has gained broad attention, and a considerable number of algorithms have been developed to protect personal information over open networks. The most prominent field for providing security is cryptography, which can be divided into symmetric-key and asymmetric-key methods. Prominent algorithms such as the Data Encryption Standard (DES) [1], the International Data Encryption Algorithm (IDEA) [2], the Triple Data Encryption Standard (TDES) [3], the Advanced Encryption Standard (AES) [4], and RSA are widely used for security purposes and are considered well protected and reliable. However, because multimedia data are large and highly correlated, relying solely on algorithms such as AES, RSA, and DES is not sufficient for multimedia data security. The literature contains a considerable number of encryption methods designed for multimedia data. For instance, encryption algorithms for digital images based on chaotic systems and the finite algebra of Galois fields are given in [5][6][7][8][9][10][11][12][13][14].
In addition, more complex algebraic structures such as elliptic curves are widely used for digital image security [16,17]. Recently, Hua et al. introduced novel color image encryption schemes based on orthogonal Latin squares and on parallel compressive sensing with adaptive thresholding sparsification [17,18]. Audio files carry a massive amount of data and differ in nature from other multimedia data. Therefore, digital audio calls for a dedicated protection algorithm.

Literature Review
In the literature, numerous digital audio encryption algorithms have been presented. In 2002, De Martin and Servetti [19] proposed a perception-based algorithm for the encryption of telephone speech. The authors recommended two techniques for partial speech encryption. The first was designed for a high bit rate and offers low security; consequently, cryptanalysis can easily reveal the ciphered speech. The second encrypts a larger portion of the bitstream and thus provides more security to the ciphered audio. Thorwirth et al. [20] gave a selective encryption technique for perceptual audio coding based on standard compression, focusing mainly on the encryption of encoded MP3 files. Subsequently, Servetti et al. [21] proposed a selective partial encryption algorithm for MP3 audio; it has considerably low time complexity and preserves the content of the audio information, but it compromises the quality of the original audio sequences in order to preserve the perceptual information. In 2004, Bhargava et al. [22] proposed four fast encryption algorithms for MPEG video, in which a key is used to randomly change the sign bits of the Discrete Cosine Transform (DCT) coefficients and/or the sign bits of the motion vectors. These schemes add only a small overhead to the MPEG codec. Grange et al. [23] introduced a framework that relies on randomized arithmetic coding for multimedia data security, achieving security by introducing randomness into the arithmetic coding process. In 2008, Yan et al. [24] introduced progressive multimedia data security by scrambling audio data in the compressed domain; in their scheme, the secret MP3 audio is scrambled with a shared secret key before transmission.
However, Au and Zhou [25] showed that the scheme of Yan et al. is vulnerable to key search attacks. In [26], Neto and Lima presented a digital audio encryption scheme that relies on the cosine number transform. The encryption procedure is applied recursively to blocks of uncompressed audio data and uses simple overlapping to select the blocks and produce diffusion in the encrypted data.

Motivation
Most of these audio encryption techniques lack cryptanalysis, and insufficient security evaluations were carried out to confirm that the cryptosystems withstand malicious attacks. For this reason, a strong algorithm is required to enhance the security of audio data against different attacks. Moreover, hardware implementations of cryptographic applications take advantage of the ease of implementing Galois field arithmetic to boost performance and reduce cost. These properties of finite fields motivated us to develop a new algorithm for digital audio data security based on a Galois field.

Our Contribution
In this manuscript, we design a novel lossless audio encryption scheme based on the arithmetic of an elliptic curve over a finite field $\mathbb{Z}_p$ and of a binary Galois field $GF(2^n)$. The basic aim of the scheme is to provide a strong algorithm that ensures authentication and integrity. Because elliptic curve arithmetic can be performed efficiently, the scheme begins by using a special type of curve, together with elliptic curve operations, to generate a good-quality sequence of random numbers. The generated sequence is subsequently used to diffuse the matrix of audio data. The confusion module of the scheme is executed through multiple substitution boxes with high nonlinearity. The experimental results demonstrate the robustness of the proposed scheme against various attacks. The rest of this paper is structured as follows. Section II introduces the basic notions of elliptic curves and finite extension fields. Section III presents the methodology of the proposed encryption technique. Section IV reports the simulation and performance results of the proposed scheme. The last section concludes the findings.

A. CONSTRUCTION OF GALOIS FIELD
Let $(R, +, \cdot)$ be a commutative ring with identity. An ideal $I$ is a subring of the ring $R$ satisfying the condition $rI \subseteq I$ for every element $r$ in the ring $R$. An ideal $M$ is said to be a maximal ideal if it is not properly contained in any other proper ideal of $R$. $F[x]$ is the polynomial ring in one indeterminate $x$ with coefficients from the field $F$. The ring $F[x]$ is, in fact, a Euclidean domain and hence a principal ideal domain (PID). A polynomial $h(x)$ in $F[x]$ is said to be irreducible if it cannot be written as a product of non-unit polynomials in $F[x]$. Accordingly, for a finite field $\mathbb{F}_p$ and the maximal ideal $\langle h(x) \rangle$ generated by a primitive irreducible polynomial $h(x) \in \mathbb{F}_p[x]$ of degree $n$, the quotient $\mathbb{F}_p[x] / \langle h(x) \rangle$ is a field known as the Galois field, an extension of the field $\mathbb{F}_p$, and it is denoted by $GF(p^n)$. The nonzero elements of $GF(p^n)$ form a multiplicative group known as the Galois cyclic group.
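As an illustration of the quotient construction above, the following minimal Python sketch implements multiplication and inversion in a binary field $GF(2^n)$, with elements stored as integers whose bits are polynomial coefficients. The function names and the choice of the AES polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) are assumptions made for this example; the text does not state which irreducible polynomial the scheme uses.

```python
def gf_mul(a: int, b: int, poly: int, n: int) -> int:
    """Multiply a and b in GF(2^n) = GF(2)[x] / <poly>, with a, b < 2**n."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition of polynomials over GF(2) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> n:           # degree reached n: reduce modulo poly
            a ^= poly
    return result


def gf_pow(a: int, e: int, poly: int, n: int) -> int:
    """Square-and-multiply exponentiation in GF(2^n)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a, poly, n)
        a = gf_mul(a, a, poly, n)
        e >>= 1
    return result


def gf_inv(a: int, poly: int, n: int) -> int:
    """Inverse of a nonzero element: a^(2^n - 2), since a^(2^n - 1) = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return gf_pow(a, (1 << n) - 2, poly, n)


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 (assumed here for illustration)
product = gf_mul(0x57, 0x83, AES_POLY, 8)
inverse = gf_inv(0x53, AES_POLY, 8)
```

The same functions work for other degrees once a suitable irreducible polynomial is supplied, for example the trinomial $x^{15} + x + 1$ (0x8003) for a field of order $2^{15}$.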

B. ELLIPTIC CURVE
An elliptic curve $E$ over a finite field $\mathbb{F}_p$ is the set of solutions of the equation $E: y^2 \equiv x^3 + ax + b \pmod{p}$, where $a, b \in \mathbb{F}_p$ satisfy $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$. All these points (solutions), together with the point at infinity $\mathcal{O}$ (the neutral element), form an abelian group, which is denoted by $E(\mathbb{F}_p)$. The formation process of the group is as under.
Furthermore, define $P + \mathcal{O} = P$ for all $P$ on $E$. (5) Following the above steps, one can easily show that $E(\mathbb{F}_p)$ is an abelian group with identity element $\mathcal{O}$. … is an isomorphism.
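The group operation can be sketched in Python as follows. This is the standard chord-and-tangent addition law for curves of the form $y^2 = x^3 + ax + b$ over a prime field, offered as an illustration rather than as a transcription of the equations of this section; the function names and the toy curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ are assumptions of the example.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity O


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over F_p (b is not needed)."""
    if P is None:
        return Q                      # O + Q = Q
    if Q is None:
        return P                      # P + O = P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                   # P + (-P) = O
    if P == Q:
        # Tangent slope; inverse taken with Fermat's little theorem
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, p - 2, p) % p
    else:
        # Chord slope
        lam = (y2 - y1) * pow((x2 - x1) % p, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(k: int, P: Point, a: int, p: int) -> Point:
    """Scalar multiplication k*P by double-and-add."""
    R: Point = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R


# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1)
doubled = ec_add((5, 1), (5, 1), 2, 17)
tripled = ec_mul(3, (5, 1), 2, 17)
```

On this toy curve the base point generates a group of order 19, so multiplying it by 19 returns the point at infinity.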

III. AUDIO ENCRYPTION SCHEME
Audio technology is used to store, manipulate, reproduce, and generate sound using arrays of audio signals encoded in digital format. Digital audio consists of a sequence of discrete samples taken from the audio waveform; each sample indicates the amplitude of the wave at a given instant. In this study, we manipulate these discrete samples to encrypt the content of the original audio. The proposed encryption technique is designed to protect uncompressed digital audio in 16-bit integer (int16) format. We represent the plain audio by a matrix $A$ of dimension $m \times n$ with $n \in \{1, 2\}$. In the next subsection, we discuss the encryption scheme step by step.

A. PROPOSED RANDOM NUMBER GENERATOR
The generation of random numbers plays a significant role in various multimedia data security applications. Extensive research has produced numerous random number generation schemes, and elliptic curves are widely used for this purpose. In general, an elliptic curve-based random number generator relies on the group law and the arithmetic of the elliptic curve. In this section, we give an efficient scheme for generating random numbers based on elliptic curve operations. The proposed scheme generates distinct random numbers with a sufficiently long period. At the beginning of the encryption procedure, a sequence of distinct pseudo-random numbers is generated whose period exceeds the length of the audio data. To this end, select a large prime $p$. Then generate the points of the curve $E: y^2 = x^3 + b$ by brute force. Subsequently, use the following map to carry the points $(x, y)$ of the curve into the field $\mathbb{F}_p$.

Defined by
where the element of $\mathbb{F}_p^*$ appearing in the map is a square, and $(x, y)$ is a point of the curve $E$. By Theorem 2.2.1, the map is an isomorphism between the curve group and the field. The resulting image set is a sequence of random numbers in the field $\mathbb{F}_p$. Afterward, the sequence of random numbers is used to shuffle the audio matrix and obtain a new data set. In this study, we fixed $b = 2$ and $p = 99991$ and generated a sequence of random numbers using the above procedure. The generated sequence was then analyzed with the NIST test suite; the results are tabulated in Table 4.
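The brute-force generation of curve points mentioned above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; it only enumerates the affine points of $y^2 = x^3 + b$ over $\mathbb{F}_p$ and does not implement the map from the curve into the field, whose definition is not reproduced here.

```python
from typing import Dict, List, Tuple


def curve_points(b: int, p: int) -> List[Tuple[int, int]]:
    """All affine points (x, y) with y^2 = x^3 + b (mod p), by exhaustive search."""
    # Precompute the square roots of every quadratic residue modulo p
    roots: Dict[int, List[int]] = {}
    for y in range(p):
        roots.setdefault(y * y % p, []).append(y)
    points = []
    for x in range(p):
        rhs = (x * x * x + b) % p
        for y in roots.get(rhs, []):   # empty when rhs is a non-residue
            points.append((x, y))
    return points


# Small example: y^2 = x^3 + 2 over F_11
small_curve = curve_points(2, 11)
```

The search takes a number of steps proportional to $p$, so it remains inexpensive for a five-digit prime.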

I. MULTIPLE S-BOXES CONSTRUCTION SCHEME
The S-box plays a significant role in symmetric-key cryptography. In general, an S-box is used in the substitution module of a cryptosystem to produce confusion in the cipher data, so the confusion capability of the cryptosystem depends on the quality of the S-box. Because audio contains a large amount of data, the proposed cryptosystem uses multiple S-boxes to produce more randomness in the encrypted data. To construct them, we introduce a novel S-box construction scheme based on the Galois field $GF(2^n)$. Traditional S-box construction schemes are based on the finite field of order 256, whereas the proposed construction is based on a Galois field of order greater than 256. The general idea is as follows. First, define a bijective map from the Galois field $GF(2^n)$ onto $GF(2^n)$; in its defining equation (8), the dotted parameters are elements of the Galois field $GF(2^n)$. Next, define an inclusion map $GF(2^n) \longrightarrow GF(2^m)$, where $m \geq 2$ and $n$ is strictly greater than $m$. The composition of the two maps generates an $m \times m$ S-box. With this process, one can generate $n - m$ S-boxes.
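Since the quality of an S-box is usually judged by its bijectivity and nonlinearity, the following Python sketch shows how these two properties can be checked. It does not implement the construction proposed above; as a baseline it builds the traditional inversion S-box over the field of order 256, assuming the AES polynomial 0x11B and mapping 0 to 0, and all function names are hypothetical.

```python
from typing import List


def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiplication in GF(2^8) modulo the assumed polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def inversion_sbox() -> List[int]:
    """Baseline S-box x -> x^254 (the field inverse, with 0 mapped to 0)."""
    sbox = []
    for x in range(256):
        r = 1
        for _ in range(254):
            r = gf256_mul(r, x)
        sbox.append(r)
    return sbox


def nonlinearity(sbox: List[int], n: int = 8) -> int:
    """Minimum distance of all component functions to the affine functions."""
    size = 1 << n
    worst = 0
    for mask in range(1, size):
        # Component function mask . S(x), written as +1/-1 values
        w = [1 - 2 * (bin(mask & sbox[x]).count("1") & 1) for x in range(size)]
        h = 1
        while h < size:               # in-place fast Walsh-Hadamard transform
            for i in range(0, size, 2 * h):
                for j in range(i, i + h):
                    u, v = w[j], w[j + h]
                    w[j], w[j + h] = u + v, u - v
            h *= 2
        worst = max(worst, max(abs(v) for v in w))
    return (size - worst) // 2


baseline = inversion_sbox()
is_bijective = sorted(baseline) == list(range(256))
```

A candidate S-box produced by any other construction can be passed to `nonlinearity` in the same way, provided it is given as a list of $2^n$ integers.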

II. PROPOSED ALGORITHM
Step 1. Generate a binary matrix $B$ of dimension $m \times n$ to identify the locations of the negative integers in the matrix $A$ of the original audio.
Here $a_{i,j}$ denotes the sample of the audio data at position $(i, j)$. The aim of generating the binary matrix is to record the positions of the negative samples.
Step 2. Select a prime number $p > m \times n$ and generate a sequence of random numbers with the proposed random number generator discussed in section 3.2.1. Afterward, reduce the length of the sequence and use the new sequence to shuffle the matrix of the original audio. The equation is given as follows.
Here the indices denote the position of each integer value in the newly shuffled matrix. The waveform and the spectrogram of the shuffled audio are shown in Fig. 1(b) and Fig. 2(b), respectively. From the figures, one can observe that the permutation step strongly disrupts the plain audio.
Step 3. Next, convert the entries of the shuffled matrix, using the absolute value function, from the integer range $\{-2^{15}, \ldots, 2^{15} - 1\}$ to elements of the Galois field $GF(2^{15})$. This yields a new matrix.
Step 4. Subsequently, convert the elements of the Galois field $GF(2^{15})$ to elements of the Galois fields $GF(2^8)$ and $GF(2^7)$ by using the following map, whose coefficients lie in $\{0, 1\}$. With this mapping, split the data in the matrix into two matrices containing elements of the Galois fields $GF(2^8)$ and $GF(2^7)$, respectively, to reduce the time complexity.
Step 5. Divide the first block, over $GF(2^8)$, into four subblocks. Then generate four $8 \times 8$ S-boxes using the proposed S-box construction method discussed in subsection 3.2.2. Afterward, substitute each subblock with a different S-box and combine all the subblocks. Similarly, divide the second block, over $GF(2^7)$, into four subblocks and generate four $7 \times 7$ S-boxes using the proposed S-box construction method. Then substitute each subblock with a different S-box and combine all the substituted subblocks. This yields two new substituted blocks.
Step 6. Combine the two substituted matrices using the inverse of the map discussed in Step 4. The inverse map is given as follows. As a result of this map, we get a new matrix $C_1$ containing elements of the Galois field $GF(2^{15})$.
Step 7. Then mask each element of the matrix $C_1$ to produce more diffusion in the encrypted audio. First, generate a sequence of random numbers of length $m \times n$. Subsequently, use the modulo operation to convert the elements of the sequence into elements $K(i, j)$ of the Galois field $GF(2^{15})$. $C_2(i, j) = C_1(i, j) + K(i, j)^{-1}$ (18), where $C_1(i, j)$ and $K(i, j)$ are elements of the Galois field $GF(2^{15})$ and $(i, j)$ signifies the integer position in the matrix. Equation (18) yields a new matrix $C_2$.
Step 8. Finally, use the binary matrix to convert the entries of the masked matrix from the Galois field $GF(2^{15})$ back into the set of 16-bit integers $\{-2^{15}, \ldots, 2^{15} - 1\}$. The mathematical representation is given as follows. The resultant matrix is then converted into an audio file, which is the ciphered audio file. The proposed encryption technique is applicable to audio files of various sizes and characteristics. The waveform of the encrypted audio is given in Fig. 1. From the figure, it is evident that the waveform of the encrypted audio is uniform. Accordingly, the proposed technique is capable of protecting the actual content of the original audio. The decryption process applies the same steps in reverse order.
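The data-format conversions in Steps 1, 3, 4, 6, and 8 can be sketched in Python as below. This is an illustrative round trip under stated assumptions, not the paper's exact maps: the split is assumed to take the upper 8 bits and the lower 7 bits of each 15-bit value, a mask bit of 1 is assumed to mark a negative sample, and the sample $-2^{15}$ is rejected because its absolute value does not fit in 15 bits. Shuffling, substitution, and masking are omitted, and all function names are hypothetical.

```python
from typing import List, Tuple


def sign_mask(samples: List[int]) -> List[int]:
    """Step 1 (assumed convention): 1 marks a negative sample, 0 otherwise."""
    return [1 if s < 0 else 0 for s in samples]


def to_magnitude(samples: List[int]) -> List[int]:
    """Step 3: absolute values, which must fit in 15 bits."""
    mags = [abs(s) for s in samples]
    if any(m >= 1 << 15 for m in mags):
        raise ValueError("abs(sample) does not fit in 15 bits")
    return mags


def split15(v: int) -> Tuple[int, int]:
    """Step 4 (assumed layout): upper 8 bits and lower 7 bits of a 15-bit value."""
    return v >> 7, v & 0x7F


def merge15(hi: int, lo: int) -> int:
    """Step 6: inverse of split15."""
    return (hi << 7) | lo


def restore_sign(mags: List[int], mask: List[int]) -> List[int]:
    """Step 8: reapply the recorded signs."""
    return [-m if neg else m for m, neg in zip(mags, mask)]


samples = [0, -1, 12345, -32767, 32767]
mask = sign_mask(samples)
parts = [split15(m) for m in to_magnitude(samples)]
recovered = restore_sign([merge15(hi, lo) for hi, lo in parts], mask)
```

Because every conversion here is invertible, the round trip reproduces the input exactly, which is the property a lossless scheme needs from these steps.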

IV. SECURITY ANALYSIS
A sound encryption scheme must withstand the different kinds of attacks that target the confidentiality, integrity, non-repudiation, and authentication of the data. Here, we evaluate the strength and robustness of the proposed technique against different malicious attacks. All analyses were performed using MATLAB R2019b on a personal computer. To scrutinize the scheme, we collected different audio samples containing music, speech, and other content and encrypted them with the proposed technique using multiple keys. The waveforms of the plain, encrypted, and decrypted files are shown in Fig. 1. From Fig. 1 it is clear that the amplitudes of the original and encrypted audio bear no resemblance to each other, as the encrypted audio is uniform in nature. This indicates that the audio is properly encrypted. In addition, the waveform of the decrypted audio matches that of the original audio, as shown in Fig. 1(d). In the following sections, the proposed scheme undergoes several analyses, including histogram analysis, key space analysis, key sensitivity analysis, and correlation analysis.

FIGURE 1 Waveform of the (a) Original Audio (b) Permuted Audio (c) Encrypted Audio (d) Decrypted Audio

A. SPECTROGRAM ANALYSIS
Spectrogram analysis is the recommended tool for the spectral analysis of sound. A spectrogram is a two-dimensional graph in which color represents a third dimension. It is a pictorial illustration of how the frequency spectrum varies with time. The color dimension identifies the amplitude or loudness of the sound at a given time. Low amplitude is indicated by red and blue colors, whereas bright colors indicate stronger amplitude. The results of the spectrogram analysis of our encryption scheme are given in Fig. 2. The spectrograms of the original and encrypted audio files are shown in Fig. 2(a) and Fig. 2(c), respectively. The uniformity of the spectrogram of the encrypted audio shows that the audio file is effectively encrypted. The encrypted audio file has a strong amplitude and an altogether different spectrogram from the original audio.

B. HISTOGRAM ANALYSIS
Histogram analysis is recommended for assessing the quality of an encryption scheme against statistical attacks. A cryptosystem should turn the original information into noise and generate randomness in the data. In an efficient cryptosystem, the encrypted data offers no information that helps to decipher it without the secret key; the original data is encrypted to values that are all nearly equally likely. Figure 3 presents the outcomes of the histogram analysis of our encryption scheme. The histograms of the original audio are shown in Fig. 3(a) and Fig. 3(c), and the histograms of the encrypted audio are shown in Fig. 3(b) and Fig. 3(d). One can see that the histogram of the original audio signal is irregular and concentrated around a single point, whereas the histogram of the encrypted audio file is uniform. We conclude that our technique is strong enough to counter statistical attacks and that it is extremely hard to extract information from the encrypted data.

C. CORRELATION ANALYSIS
The correlation coefficient is one of the analyses used to evaluate the ability of a cryptosystem to withstand statistical attacks. Because data in multimedia applications are strongly correlated, a robust cryptosystem must break the correlation among segments of the data. In this analysis, the focus is on the correlation between adjacent samples of the data. The correlation coefficient is given by:

$$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sqrt{D(x)}\,\sqrt{D(y)}}, \qquad \mathrm{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} \big(x_i - \mathcal{E}(x)\big)\big(y_i - \mathcal{E}(y)\big), \qquad D(x) = \frac{1}{N} \sum_{i=1}^{N} \big(x_i - \mathcal{E}(x)\big)^2$$

and

$$\mathcal{E}(x) = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (24)$$

where $x_i$ is the sample at the $i$-th position and $y_i$ is the corresponding adjacent sample. Commonly, correlation analysis is performed in the horizontal, vertical, and diagonal directions, but because our scheme deals with audio data, which is a single string of samples, only the horizontal direction is considered. The outcomes of the correlation analysis are shown in Table 1. They indicate that the correlation of the original audio is close to 1, which means that adjacent segments of the audio data are strongly correlated. In contrast, the correlation of the ciphered audio is close to 0, i.e., the proposed technique breaks the correlation of the audio segments. The correlation of the original and the encrypted audio is illustrated in Fig. 4. It shows that our scheme substantially reduces the inter-sample correlation of the audio file. For this reason, our proposed technique is robust against statistical attacks.
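The adjacent-sample correlation described above can be computed with the short Python sketch below. The function name is hypothetical, and pairing each sample with its immediate successor is an assumption about how the adjacent sample is chosen.

```python
import math
from typing import Sequence


def adjacent_correlation(samples: Sequence[float]) -> float:
    """Pearson correlation between each sample and its successor."""
    x = samples[:-1]
    y = samples[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


ramp = adjacent_correlation(list(range(100)))      # smooth, highly correlated
flip = adjacent_correlation([1, -1] * 50)          # perfectly anti-correlated
```

A smooth signal such as a ramp gives a coefficient near 1, whereas well-encrypted audio is expected to give a value near 0.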

D. INFORMATION ENTROPY
For coded information, the amount of uncertainty is measured using information entropy. Entropy grows with uncertainty, i.e., higher uncertainty in an encrypted audio file means higher entropy. The entropy can be written as

$$H = -\sum_{i} p(\mathcal{L}_i) \log_2 p(\mathcal{L}_i)$$

where $\mathcal{L}_i$ denotes a sample value of the audio file and $p(\mathcal{L}_i)$ is the probability of occurrence of the value $\mathcal{L}_i$. For 16-bit audio, the theoretical maximum of $H$ is 16. The cryptosystem is therefore considered well secured if the information entropy of the ciphered file is close to 16. We examined the proposed scheme using information entropy analysis, and the outcomes are given in Table 2. It is clear from the table that the entropy of the proposed technique is almost equal to 16 for all ciphered audio files, which indicates near-ideal uncertainty in the audio file. So, our scheme is able to resist entropy attacks.

[Table fragment: Ref. [27] 0.001699; Ref. [28] 0.0119; Ref. [29] 0.0263]
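As a small worked example of the entropy measure, the following Python sketch estimates the Shannon entropy of a sample sequence from its empirical value frequencies. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(samples: Iterable[int]) -> float:
    """Entropy in bits per sample, from empirical value frequencies."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform4 = shannon_entropy([0, 1, 2, 3])   # four equally likely values
constant = shannon_entropy([7] * 10)       # no uncertainty at all
```

For 16-bit samples the estimate can approach 16 only when the file contains far more than $2^{16}$ samples, so short files yield lower values regardless of the cipher.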

E. DIFFERENTIAL ANALYSIS
For differential attacks, two measures are commonly considered: the Number of Pixels Change Rate (NPCR) and the Unified Average Changing Intensity (UACI). Both assess the sensitivity of the cryptosystem. A good cryptographic algorithm must be sensitive, so that a minor alteration in the original data produces a massive variation in the cipher data. NPCR and UACI can be given as

$$\mathrm{NPCR} = \frac{\sum_{i,j} \mathcal{B}(i, j)}{N} \times 100\%$$

In the above equation, $N$ represents the cardinality of the audio data set and $\mathcal{B}(i, j)$ is given by

$$\mathcal{B}(i, j) = \begin{cases} 0, & C_1(i, j) = C_2(i, j) \\ 1, & C_1(i, j) \neq C_2(i, j) \end{cases}$$

where $C_1$ and $C_2$ are the two ciphered audio data sets being compared. UACI can be represented as

$$\mathrm{UACI} = \frac{1}{N} \sum_{i,j} \frac{|C_1(i, j) - C_2(i, j)|}{2^k - 1} \times 100\%$$

where $k$ designates the number of bits per sample in the audio data set.
The ideal values of NPCR and UACI are close to 100 and 33.3333, respectively. We evaluated the proposed audio encryption technique using NPCR and UACI analysis, and the outcomes are shown in Table 3. Table 3 indicates that the proposed technique resists differential attacks.
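The two sensitivity measures can be computed as in the Python sketch below, which follows the commonly used definitions of NPCR and UACI. The function names are hypothetical, and the samples are assumed to be stored as unsigned integers with a default depth of 16 bits.

```python
from typing import Sequence


def npcr(c1: Sequence[int], c2: Sequence[int]) -> float:
    """Percentage of positions at which the two cipher streams differ."""
    changed = sum(a != b for a, b in zip(c1, c2))
    return 100.0 * changed / len(c1)


def uaci(c1: Sequence[int], c2: Sequence[int], bits: int = 16) -> float:
    """Mean absolute difference, as a percentage of the full sample range."""
    full_scale = (1 << bits) - 1
    total = sum(abs(a - b) for a, b in zip(c1, c2))
    return 100.0 * total / (len(c1) * full_scale)


half_changed = npcr([1, 2, 3, 4], [1, 0, 3, 0])
max_intensity = uaci([0, 65535], [65535, 0])
```

In practice `c1` and `c2` would be the encryptions of two plain audio files that differ in a single sample.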

F. SIGNAL TO NOISE RATIO (SNR)
In [58][59], signal quality is measured with the help of the Signal to Noise Ratio (SNR). The signal is stronger than the noise when the value is greater than 0 dB. The SNR can easily be calculated provided that the host and encrypted audio files are available. The formula for the SNR is given as:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} (x_i - y_i)^2}$$

where $N$ represents the number of samples, and $x_i$ and $y_i$ are the host and encrypted audio samples, respectively. Table 4 lists the SNR outcomes of the tested audio files. A negative SNR value indicates the strength of the scheme. Table 4 indicates that our scheme achieves more strongly negative SNR values and hence offers better resistance against malicious attacks.
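A minimal Python sketch of the SNR computation is given below, using the usual definition in which the noise is the sample-wise difference between the host and encrypted audio. The function name is hypothetical.

```python
import math
from typing import Sequence


def snr_db(host: Sequence[float], encrypted: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB; the noise is host minus encrypted."""
    signal_power = sum(x * x for x in host)
    noise_power = sum((x - y) ** 2 for x, y in zip(host, encrypted))
    return 10.0 * math.log10(signal_power / noise_power)


mild_noise = snr_db([10, 10], [9, 9])           # noise well below the signal
heavy_noise = snr_db([1, 1], [11, -9])          # noise well above the signal
```

The value is negative whenever the difference between the two files carries more energy than the host signal itself.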

G. PEAK SIGNAL TO NOISE RATIO (PSNR)
The mean squared error (MSE) of two vectors $x$ and $y$ of length $N$ can be computed using the formula:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$

If $x$ is the host audio file and $y$ is its encrypted audio file, the PSNR can be obtained as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}$$

where $\mathrm{MAX}$ is the maximum value of the stream. The PSNR values for the encrypted test audio files are computed and listed in Table 4. The values are small. Lower PSNR values are desired for encrypted audio files because they reflect a high level of noise in the encrypted audio and hence strong resistance against attacks.
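The MSE and PSNR can be computed as in the Python sketch below, which uses the usual definitions. The function names are hypothetical, and the default peak value of 32767 assumes signed 16-bit samples.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equally long sample vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def psnr_db(x: Sequence[float], y: Sequence[float], max_value: float = 32767.0) -> float:
    """Peak signal-to-noise ratio in dB for a given peak sample value."""
    return 10.0 * math.log10(max_value ** 2 / mse(x, y))


error = mse([0, 0], [3, 4])
ratio = psnr_db([0, 0], [1, 1], max_value=10.0)
```

Identical inputs give an MSE of zero, for which the PSNR is undefined, so the function should only be applied to files that actually differ.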

H. ROOT MEAN SQUARE (RMS) & CREST FACTOR (CF) VALUE
For an audio signal, the average amplitude is measured by the root mean square (RMS) value. The RMS equals the standard deviation when the mean of the input signal is zero, and it is calculated as follows:

$$\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}$$

The ratio of the peak value to the effective (RMS) value is called the crest factor (CF), a waveform parameter that indicates how extreme the peaks in a waveform are. Its minimum possible value is 0 dB, which indicates no peaks, as for a DC signal. A higher CF indicates more pronounced peaks. It is given as follows:

$$\mathrm{CF} = 20 \log_{10} \frac{\max_i |x_i|}{\mathrm{RMS}}$$

For the proposed algorithm,

A. NIST STATISTICAL TEST
To assess the proposed random number generator for cryptographic applications, we studied the sequence it produces. To examine the randomness of the generated sequence, we first convert it into binary form, as the NIST tests operate on binary data. The NIST statistical test suite involves sixteen different tests, as presented in Table 4. The generated sequence passed all the randomness tests, which shows that the proposed technique produces quality random sequences suitable for different audio encryption applications.
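To make the procedure concrete, the Python sketch below converts an integer sequence to bits and applies the frequency (monobit) test, the simplest test of the NIST suite. It is an illustration of one test only, not of the full battery reported in Table 4; the function names and the fixed bit width are assumptions.

```python
import math
from typing import List, Sequence


def to_bits(values: Sequence[int], width: int) -> List[int]:
    """Concatenate the fixed-width binary expansions of the values."""
    return [(v >> k) & 1 for v in values for k in range(width - 1, -1, -1)]


def monobit_p_value(bits: Sequence[int]) -> float:
    """Frequency (monobit) test: P-value for the balance of ones and zeros."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)     # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


example_bits = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
p_value = monobit_p_value(example_bits)
```

A sequence is usually regarded as passing this test when the P-value is at least 0.01; the ten-bit example above is far too short for a meaningful verdict and serves only to show the arithmetic.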

V. CONCLUSION
In this manuscript, we presented a lossless audio encryption technique that relies on the arithmetic of elliptic curves and Galois fields. First, we introduced a novel random number generator, which produces quality random numbers and passed all the NIST tests. The generated random sequence is then used to shuffle the original audio data set. In the confusion phase, a new S-box construction scheme is deployed, which generates multiple S-boxes without much computational effort. The S-boxes are then used to substitute the shuffled audio. Substitution with multiple S-boxes produces strong confusion in the encrypted audio and makes the scheme robust against differential attacks. The scheme was thoroughly scrutinized through various simulation analyses. The results of the simulation experiments show that the proposed scheme is secure against various cryptanalysis methods. Accordingly, the proposed scheme is secure and suitable for audio encryption applications.