HMNT: Hash Function Based on New Mersenne Number Transform

In the field of information security, hash functions are considered important as they are used to ensure message integrity and authentication. Despite the various methods available for designing hash functions, many of them have been shown to be time-inefficient and to have security flaws (such as a lack of collision resistance or susceptibility to birthday attacks). In the current study, we propose a novel hash function scheme based on a new Mersenne number transform. The suggested hash function, called Hash Mersenne Number Transform (HMNT), takes a message of arbitrary length as input and generates a hash value of variable length (128, 256 and 512 bits or longer). The proposed scheme is evaluated in terms of the sensitivity of the hash value to the message, secret key and image, the distribution of hashes, confusion and diffusion, robustness against collision and birthday attacks, and flexibility. Based on the simulation outcomes, the suggested scheme possesses high sensitivity to the original message, the secret key and images, along with strong collision resistance. In conclusion, the proposed hash scheme is simple and efficient compared with the existing hash functions, making it viable for practical implementation.


I. INTRODUCTION
Cryptographic hash functions, one of the most important cryptographic primitives, can be used to ensure the security of many cryptographic applications and protocols, including message authentication codes, integrity, digital signatures and random number generation [1], [2]. A hash function takes a message of arbitrary length and produces a fixed-length code (or hash value) [3].
To be secure, a hash function must satisfy three security properties, namely: (i) collision resistance (i.e. it is computationally infeasible to find any two different input messages m and m' with the same output hash value, h(m) = h(m')); (ii) preimage resistance (i.e. it is computationally infeasible to find any input message which is hashed to the given output hash value); and (iii) second preimage resistance (i.e. it is computationally infeasible to find any second input that has the same output as any specified input) [4], [5]. Among the many algorithms designed for the implementation of the hash function, the RACE Integrity Primitives Evaluation Message Digest (RIPEMD), Message Digest 5 (MD5), Secure Hash Algorithm 1 (SHA-1) and Secure Hash Algorithm 2 (SHA-2) are the most preferred. The strength of these algorithms is based on the use of block ciphers, logical operations and the number of rounds [6], [7]. However, the abovementioned algorithms have been found to be vulnerable to collision and other types of attacks. For example, Wang et al. in 2004 discovered that MD4, MD5, RIPEMD and HAVAL-128 were weak against collision attacks [8]. In addition, successful collision attacks against SHA-1 were also reported [9], after which large companies such as Google and Microsoft announced plans to abandon SHA-1 [10]. Since neither MD5 nor SHA-1 is robust against collision attacks, the National Institute of Standards and Technology (NIST) announced the gradual elimination of SHA-1 [10] and its replacement with the SHA-2 family, a collection of several different hash functions (i.e. SHA-224, SHA-256, SHA-384 and SHA-512) [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Chien-Ming Chen.
Contrary to the presumption, collisions and some partial attacks were identified in SHA-2 [11], [12]. In addition, these algorithms were not preferred for ensuring integrity since they are not as time-efficient as SHA-1 [13]. Following several demonstrations of successful collision attacks, NIST released a standard secure hash function called Secure Hash Algorithm 3 (SHA-3) in 2015. This function is based on the KECCAK algorithm, which was selected as the winner of the SHA-3 cryptographic hash algorithm competition by NIST [14]. Unfortunately, the first potential collision attack on SHA-3 has recently been presented [15], and the time required for hashing was greater than that of SHA-2 [3].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Since the traditional hash functions are no longer safe, the need to design a novel hash function has become a priority in information security research. Many hash function algorithms have been proposed by researchers. Some are based on chaotic maps, and most of these use a floating-point representation for their digital chaotic maps. These algorithms have high computational complexity and have also been found to suffer from interoperability problems [16]. Unfortunately, despite the effort put into designing an efficient hash function, some of the current chaos-based hashing schemes have been proven to be insecure and less efficient than the classical hash functions [17], [18].
Therefore, this study aims to design a hash function scheme based on a new Mersenne number transform (NMNT) that is both statistically secure and reasonably efficient.
This paper is organized as follows: Section II provides a brief introduction to the NMNT. Section III explains the calculation of the transform parameters. Section IV describes the steps of the proposed hash function scheme. Section V presents the security and performance evaluation of the proposed hash function. Section VI compares the proposed scheme with other hash functions. Finally, Section VII presents our conclusion.

II. THE NEW MERSENNE NUMBER TRANSFORM
The NMNT is part of the number-theoretic transform (NTT) family of algorithms and was introduced by Boussakta and Holt in 1995 [19]. It is defined modulo the Mersenne numbers, where arithmetic operations are simple and equivalent to ones' complement arithmetic. The NMNT also possesses the cyclic convolution property, which is used for the fast calculation of error-free convolutions [20]. Furthermore, it has a long transform length that is a power of two, which facilitates fast algorithms, and it can be used in one or several dimensions. Moreover, the NMNT has several inherent advantages, such as its sensitivity to slight input variation, the long transform length and a variable block size [21]. These properties can be exploited to design a hash function that is more secure and efficient.
The forward one-dimensional NMNT, $X(k)$, of an integer sequence $x(l)$ of length $L$ is defined as in (1), where $M_p$ is a Mersenne prime of the form $M_p = 2^p - 1$ (for $p = 2, 3, 5, 7, 13, 17, 19, 31, \ldots$). The values of $\beta_1(lk)$ and $\beta_2(lk)$ in (3) and (4) are calculated for the maximum transform length $L = 2^{p+1}$. For shorter transform lengths, their values can be obtained using (7) and (8), where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ are the real and imaginary parts of the enclosed term, respectively, and $d$ is an integer power of two.
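For orientation, the NMNT is usually stated in the literature [19] in the following form. This is a sketch under assumptions rather than a reproduction of (1)-(8): the grouping of the formulas and the relation between $d$ and the transform length are assumed, although the expressions for $\alpha_1$, $\alpha_2$ and $q$ agree with the values worked out in Section III for $p = 5$.

```latex
\begin{align*}
X(k) &= \Big\langle \sum_{l=0}^{L-1} x(l)\,\beta(lk) \Big\rangle_{M_p},
        \qquad k = 0, 1, \ldots, L-1, \\
\beta(lk) &= \beta_1(lk) + \beta_2(lk), \\
\beta_1(lk) &= \big\langle \mathrm{Re}\big[(\alpha_1 + j\alpha_2)^{lk}\big] \big\rangle_{M_p}, \\
\beta_2(lk) &= \big\langle \mathrm{Im}\big[(\alpha_1 + j\alpha_2)^{lk}\big] \big\rangle_{M_p}, \\
\alpha_1 &= \pm\big\langle 2^{q} \big\rangle_{M_p}, \qquad
\alpha_2 = \pm\big\langle (-3)^{q} \big\rangle_{M_p}, \qquad q = 2^{p-2}, \\
\beta_1(lk) &= \big\langle \mathrm{Re}\big[(\alpha_1 + j\alpha_2)^{d\,lk}\big] \big\rangle_{M_p}, \qquad
\beta_2(lk) = \big\langle \mathrm{Im}\big[(\alpha_1 + j\alpha_2)^{d\,lk}\big] \big\rangle_{M_p}
\quad \text{for } L = 2^{p+1}/d .
\end{align*}
```

Here $\langle \cdot \rangle_{M_p}$ denotes reduction modulo $M_p$ and $j^2 = -1$.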

III. CALCULATION OF THE TRANSFORM PARAMETERS
The calculation of the transform parameters starts by choosing a prime number $p$. The value of the prime depends on the desired transform length and dynamic range. For example, let us choose for simplicity $p = 5$. The modulus for the chosen prime is $M_p = 2^5 - 1 = 31$, the maximum transform length is $L_{max} = 2^{p+1} = 64$, and $q = 2^{p-2} = 2^3 = 8$. Using (5), $\alpha_1$ and $\alpha_2$ are calculated as in (9) and (10), respectively. The initial values of $(\alpha_1, \alpha_2)$ can be $(8, 20)$, $(-8, 20)$, $(-8, -20)$ or $(8, -20)$. These values are for the transform length $L = 2^{p+1} = 64$, and they vary according to the transform length. Selecting the pair $(8, 20)$, the corresponding $\alpha_1$ and $\alpha_2$ for the transform length $2^p = 32$ can be calculated as in (11) and (12). Hence, $\beta_1(lk)$ and $\beta_2(lk)$ can be computed by (13) and (14). The same procedure can be repeated to obtain the transform parameters for other transform sizes and moduli.
The following numerical example illustrates all the calculations required to transform a string of numbers from one form to another. Let us assume an input array X containing four elements, each expressed in decimal, X = [72 105 32 32]. This array can then be transformed to another form using the NMNT.
The first task is to choose the modulus, which should be a Mersenne number of the form $M_p = 2^p - 1$. The modulus must be higher than any element in the input array X. Thus, the smallest Mersenne prime exponent that can be selected in this example is $p = 7$, which gives the modulus $M_p = 127$. The input array has four elements, hence the transform length is chosen to be $L = 4$, for which $\alpha_1 = 0$ and $\alpha_2 = 1$. The next step is to compute $\beta(l)$ using (3) and (4).
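The parameter calculation above can be checked numerically. The following is a minimal Python sketch, assuming the commonly stated NMNT relations $\alpha_1 = \langle 2^q \rangle_{M_p}$ and $\alpha_2 = \langle (-3)^q \rangle_{M_p}$, and assuming that the pair for a shorter length $L$ is obtained by raising $\alpha_1 + j\alpha_2$ to the power $d = 2^{p+1}/L$. The function names are hypothetical and only the positive sign choice is shown.

```python
def gauss_mul(u, v, m):
    """Multiply two Gaussian integers u = (re, im) and v = (re, im) modulo m."""
    return ((u[0] * v[0] - u[1] * v[1]) % m,
            (u[0] * v[1] + u[1] * v[0]) % m)


def gauss_pow(z, n, m):
    """Raise the Gaussian integer z to the non-negative power n modulo m."""
    result = (1, 0)
    while n:
        if n & 1:
            result = gauss_mul(result, z, m)
        z = gauss_mul(z, z, m)
        n >>= 1
    return result


def nmnt_alpha(p, length):
    """Return (alpha1, alpha2) for modulus 2**p - 1 and a power-of-two length."""
    m = (1 << p) - 1
    q = 1 << (p - 2)
    base = (pow(2, q, m), pow(-3, q, m))   # pair for the maximum length 2**(p+1)
    d = (1 << (p + 1)) // length           # assumed scaling exponent
    return gauss_pow(base, d, m)


print(nmnt_alpha(5, 64))  # pair for L = 64 with p = 5
print(nmnt_alpha(5, 32))  # pair for L = 32 with p = 5
```

With the positive sign choice this gives (8, 20) for $L = 64$, matching the pair selected in the text, and (5, 10) for $L = 32$ under the assumed scaling.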
Then, applying (1) to the input (X = [72 105 32 32]), the array elements will be transformed in the following way: The first element is: The second element is: The third element is: The fourth element is: Finally, the transform output of the input array is given by:
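The worked example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual NMNT kernel $\beta(n) = \langle \mathrm{Re}[(\alpha_1 + j\alpha_2)^n] + \mathrm{Im}[(\alpha_1 + j\alpha_2)^n] \rangle_{M_p}$ and that $\alpha_1 + j\alpha_2$ has order $L$, so that $\beta$ is periodic with period $L$. The function name `nmnt` is hypothetical.

```python
def nmnt(x, p, alpha):
    """Forward NMNT of x modulo 2**p - 1, assuming alpha has order len(x)."""
    m = (1 << p) - 1
    length = len(x)
    a1, a2 = alpha
    re, im = 1, 0                      # (alpha1 + j*alpha2)**0
    beta = []
    for _ in range(length):
        beta.append((re + im) % m)     # beta(n) = beta1(n) + beta2(n)
        # multiply the running power by (alpha1 + j*alpha2) modulo m
        re, im = (re * a1 - im * a2) % m, (re * a2 + im * a1) % m
    return [sum(x[l] * beta[(l * k) % length] for l in range(length)) % m
            for k in range(length)]


X = nmnt([72, 105, 32, 32], p=7, alpha=(0, 1))
print(X)
```

Under these assumptions the kernel is $\beta = [1, 1, 126, 126]$ (since $-1 \equiv 126 \pmod{127}$) and the sketch yields [114, 113, 94, 94].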

IV. DESCRIPTION OF THE PROPOSED HASH FUNCTION SCHEME
In this section, the proposed hash function scheme, called HMNT, is described in detail. It takes an input message M of arbitrary length and generates a hash value H of variable length. HMNT typically supports three hash lengths, i.e. H = 128, 256 and 512 bits, but longer values are also possible. The HMNT process consists of the following steps:
Step 1: Convert the input message M into the corresponding ASCII code values.
Step 2: Divide the ASCII code values into blocks. The shortage in the last block is padded with the equivalent number of space characters, whose ASCII code is 32.
Step 3: The secret key K is a series of characters that modifies the input message M. These characters are also converted into the corresponding ASCII code values. If the key length is less than the length of the hash value (n), the key is padded with the equivalent number of space characters. The key elements are then added one by one to each block of the input message M.
Step 4: After the input message has been modified using the secret key K, the NMNT (which maps each block of the message to the transform domain) is applied to each block of the input message as explained in Section III.
Step 5: The final hash value H of message M is obtained by the element-by-element summation of the NMNT outputs of all blocks. The structure of the proposed HMNT hash function is illustrated in Fig. 1.
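The five steps can be sketched in Python as follows. This is an illustrative toy under several assumptions that are not stated in the text: the exponent `p = 13`, the block length of 8 elements, a key no longer than one block, reduction of every addition modulo $M_p$, the standard NMNT parameter relations, and a digest returned as a list of integers rather than a formatted bit string. All function names are hypothetical.

```python
def gauss_mul(u, v, m):
    """Multiply Gaussian integers u and v, reducing both parts modulo m."""
    return ((u[0] * v[0] - u[1] * v[1]) % m,
            (u[0] * v[1] + u[1] * v[0]) % m)


def gauss_pow(z, n, m):
    """Raise the Gaussian integer z to the non-negative power n modulo m."""
    result = (1, 0)
    while n:
        if n & 1:
            result = gauss_mul(result, z, m)
        z = gauss_mul(z, z, m)
        n >>= 1
    return result


def nmnt(block, p):
    """Forward NMNT of one block modulo 2**p - 1 (assumed standard definition)."""
    m = (1 << p) - 1
    length = len(block)
    q = 1 << (p - 2)
    base = (pow(2, q, m), pow(-3, q, m))
    # Assumed: alpha for this length is base**(2**(p+1) / length), of order `length`.
    alpha = gauss_pow(base, (1 << (p + 1)) // length, m)
    beta = [sum(gauss_pow(alpha, n, m)) % m for n in range(length)]
    return [sum(x * beta[(l * k) % length] for l, x in enumerate(block)) % m
            for k in range(length)]


def hmnt_sketch(message, key, p=13, block_len=8):
    """Toy version of Steps 1-5; p and block_len are illustrative assumptions."""
    if len(key) > block_len:
        raise ValueError("this sketch expects a key no longer than one block")
    m = (1 << p) - 1
    codes = [ord(c) for c in message]                        # Step 1: ASCII codes
    codes += [32] * (-len(codes) % block_len)                # Step 2: pad with spaces
    blocks = [codes[i:i + block_len] for i in range(0, len(codes), block_len)]
    key_codes = [ord(c) for c in key]
    key_codes += [32] * (block_len - len(key_codes))         # Step 3: pad the key
    digest = [0] * block_len
    for block in blocks:
        mixed = [(x + k) % m for x, k in zip(block, key_codes)]      # Step 3: add key
        transformed = nmnt(mixed, p)                                 # Step 4: NMNT
        digest = [(d + t) % m for d, t in zip(digest, transformed)]  # Step 5: sum
    return digest


print(hmnt_sketch("abcdefghijk", "abc123"))
print(hmnt_sketch("Abcdefghijk", "abc123"))
```

The two calls differ in a single character of the message and produce different digests. The sketch is not expected to reproduce the hash values reported in this paper, since the actual block size, modulus and output encoding may differ.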

V. SECURITY AND PERFORMANCE EVALUATION
In this section, the proposed hash function HMNT is evaluated in terms of the sensitivity of the hash value to the message, secret key and image, confusion and diffusion, the distribution of hash values, its collision resistance, its resistance to birthday, exhaustive key search and meet-in-the-middle attacks, and its flexibility. The results obtained from the simulation are then compared with some of the existing hash functions.
In order to evaluate the performance of our proposed scheme, we adopted methods that were used in previous literature [4], [6], [22]-[24]. The randomly chosen input message M and secret key K used in this evaluation are as follows:
M: "Cryptographic hash functions is one of the most useful primitives for data security, which offers message authentication, data integrity, and digital signature."
K: "abcdefghigkl12345".

A. SENSITIVITY OF HASH VALUE TO THE MESSAGE
Theoretically, a good hash function must be sensitive to slight changes in the input message. In particular, any small change in the input message should lead to a 50% difference in the hash value, i.e. a Hamming distance of approximately n/2 (where n is the length of the hash value) between the two hash values. Therefore, to demonstrate the sensitivity of the proposed hash function scheme, we constructed several different messages by modifying the input message M given in Section V. We calculated the hash values of all resulting messages and compared them under the following five conditions:
C1: The original message M is the same as that given in Section V;
C2: The first character "C" in the original message is changed to "c";
C3: The full stop "." at the end of the original message is changed to a comma ",";
C4: The word "hash" in the original message is changed to "Hash";
C5: A space is added to the beginning of the original message.
The secret key K given in Section V is fixed for all conditions. Table 1, Table 2 and Table 3 summarize the results obtained for the hash lengths of 128, 256 and 512 bits, respectively, in hexadecimal format for each condition. The tables also give the number of bits that differ from the hash value obtained for C1, as well as the percentage of bits changed.
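The number of differing bits and the percentage of bits changed can be computed from two hexadecimal hash strings as in the minimal Python sketch below. The function name is hypothetical and the two 16-bit inputs are toy values chosen only to show the calculation, not results from the tables.

```python
def bit_difference(hash_a, hash_b):
    """Return (changed_bits, percentage) for two equal-length hex strings."""
    a = hash_a.replace(" ", "")
    b = hash_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("hash values must have the same length")
    # XOR leaves a 1 in every bit position where the two values differ.
    changed = bin(int(a, 16) ^ int(b, 16)).count("1")
    total_bits = 4 * len(a)            # each hex digit encodes 4 bits
    return changed, 100.0 * changed / total_bits


changed, percent = bit_difference("19 0f", "0e c4")   # toy 16-bit values
print(changed, percent)
```

For an n-bit hash, a well-behaved scheme is expected to give a count close to n/2, i.e. a percentage close to 50%.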
Based on the hexadecimal representation of the obtained hash values, a slight difference in the original message M changes about 50% of the bits of the hash value. This shows that the sensitivity of the proposed scheme to the message is very high.

B. SENSITIVITY OF HASH VALUE TO THE SECRET KEY
The original input message M given in Section V is fixed for all conditions. Table 4, Table 5 and Table 6 present the results obtained for the hash lengths of 128, 256 and 512 bits, respectively, in hexadecimal format for each condition.
The tables also include the number of bits that differ from the hash value obtained for C1 and the percentage of bits changed.
The results summarized in Table 4, Table 5 and Table 6 clearly demonstrate that a small change in the secret key causes a significant change in the hash value. This indicates that the proposed hash function is very sensitive to the secret key.

C. SENSITIVITY OF HASH VALUE TO THE IMAGES
In order to evaluate the sensitivity of the hash value to the image, a gray-scale Lena image with 256 × 256 image size (Fig. 2) is applied under three different conditions as follows:  The secret key, K given in Section V is fixed for all conditions.
Table 7, Table 8 and Table 9 list, for each condition, the corresponding hash values of length 128, 256 and 512 bits in hexadecimal format, the number of bits that differ from the hash value obtained for C1, and the percentage of bits changed. Based on the hexadecimal and binary representations of the obtained hash values, any modification of the original image leads to significant differences in the hash value. This observation again demonstrates the sensitivity of the proposed hash function, HMNT.

D. DISTRIBUTION OF HASH VALUE
A uniform distribution of hash values, regarded as one of the most important properties of a hash function, is directly related to the security of the hash function. In this section, we used two-dimensional graphs to present the distribution of the message and of the corresponding hash values. The extreme case of an "all zero" message with the same length is also selected for comparison in this study.
The distribution of the all-zero message is shown in Fig. 5(a), and the distributions of the corresponding hash values in Fig. 5(b), 5(c) and 5(d) are also uniform.
In short, based on the simulation results, no information about the message remained following the confusion and diffusion.

E. STATISTIC ANALYSIS OF CONFUSION AND DIFFUSION
In cryptography, confusion and diffusion carry their own distinct definitions. Confusion means that the relationship between a message and its corresponding hash code is complex and difficult to predict. Diffusion, on the other hand, means that the hash value depends strongly on the message [23]. Therefore, for good diffusion, a one-bit modification of the original input message should change each output bit with a probability of 50%. In order to analyze the confusion and diffusion capabilities of the proposed hash function, we performed the following experiment. First, a random message M was selected and its hash value was calculated. Secondly, a single bit in the message M was randomly chosen and toggled, and a new hash value was computed. Finally, the two hash values were compared bit by bit, and the number of changed bits was recorded as $B_i$. This experiment was repeated N times for each hash length (128, 256 and 512 bits). The evaluation of diffusion and confusion capabilities usually requires six statistics, which are defined as follows:
Minimum number of bits changed: $B_{min} = \min(\{B_i\}_{i=1}^{N})$ (15)
Maximum number of bits changed: $B_{max} = \max(\{B_i\}_{i=1}^{N})$ (16)
Mean number of bits changed: $\bar{B} = \frac{1}{N}\sum_{i=1}^{N} B_i$ (17)
Mean changed probability: $P = \frac{\bar{B}}{h} \times 100\%$ (18)
Standard deviation of the changed bit number: $\Delta B = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(B_i - \bar{B}\right)^2}$ (19)
Standard deviation of the changed probability: $\Delta P = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{B_i}{h} - \frac{\bar{B}}{h}\right)^2} \times 100\%$ (20)
where $B_i$ denotes the number of changed bits in the $i$-th test, N is the total number of tests and h is the length of the hash value in bits. Table 10, Table 11 and Table 12 were obtained by changing a single bit in the original message M (given in Section V) and by executing the proposed hash function N times (for N = 256, 512, 1024 and 2048) in order to generate hash values of different sizes (128, 256 and 512 bits). The number of changed bits between the original hash value and the new hash value is computed every time.
According to the data presented in Table 10, Table 11 and Table 12, the mean number of bits changed $\bar{B}$ and the mean changed probability P obtained with the proposed hash function are very close to the ideal values, i.e. half of the hash length (64, 128 and 256 bits) and 50%, respectively. Furthermore, the standard deviations $\Delta B$ and $\Delta P$ are very small, indicating strong confusion and diffusion capabilities of the proposed hash function.
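The six statistics can be computed directly from the list of changed-bit counts $B_i$. The following is a minimal Python sketch, assuming the sample standard deviation with an $N - 1$ denominator, as is common in the hash-function literature. The function name and the toy counts are hypothetical and are not taken from the tables.

```python
import math


def diffusion_statistics(changed_bits, hash_len):
    """Return the six confusion/diffusion statistics for a list of B_i values."""
    n = len(changed_bits)
    b_mean = sum(changed_bits) / n
    # Sample standard deviation of the changed-bit counts (N - 1 denominator).
    delta_b = math.sqrt(sum((b - b_mean) ** 2 for b in changed_bits) / (n - 1))
    return {
        "B_min": min(changed_bits),
        "B_max": max(changed_bits),
        "B_mean": b_mean,
        "P": b_mean / hash_len * 100.0,          # mean changed probability in %
        "delta_B": delta_b,
        "delta_P": delta_b / hash_len * 100.0,   # same spread, expressed in %
    }


stats = diffusion_statistics([60, 64, 68], hash_len=128)   # toy counts
print(stats)
```

Because $B_i / h$ is a rescaling of $B_i$, the spread of the changed probability is the spread of the bit counts divided by the hash length, which is what the last entry uses.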

F. RESISTANCE TOWARDS COLLISION ATTACK
A collision occurs when two different input messages produce the same hash value. Collision resistance is a property that a good hash function must possess. Hence, to analyze the collision resistance of the proposed scheme, the following experiment was carried out. First, a random message was generated and its hash value was calculated. Then, one bit of the original message was randomly modified and the hash value of the modified message was calculated. Both hash values were then written in hexadecimal format and compared, and the number of equal two-hexadecimal characters at the same location was counted. This count is referred to as the number of hits, $N_h$.
For example, the hexadecimal hash value of "abcdefghijk" is "19 0f 17 0b 14 62 18 40 05 4e 07 d4 14 5a 00 1d". If 'a' (01100001) is changed to 'A' (01000001), the hexadecimal hash value of the new message "Abcdefghijk" becomes "0e c4 1b ac 00 90 01 62 19 9e 1a 75 19 e8 06 ed". Since there are no equal two-hexadecimal characters at the same location, $N_h = 0$. This test was repeated N = 2048 times for hash lengths of 128, 256 and 512 bits. Table 13 tabulates the numbers of hits, $N_h$, where the two-hexadecimal characters are equal at the same location. The maximum number of equal two-hexadecimal characters for the proposed hash function at 128 and 256 bits is only two, while at 512 bits it is three. These numbers indicate a very low collision probability. In addition, the absolute difference, d, between the original and modified hash values of 128, 256 and 512 bits was also calculated using (21):
$d = \sum_{i} \left| t(a_i) - t(b_i) \right|$ (21)
where $a_i$ and $b_i$ are the ASCII characters of the original and the new hash values at position i, respectively, the sum runs over all character positions, and the function $t(\cdot)$ converts each character to its equivalent decimal value. The resulting values of d are listed in Table 14. The mean values of d are very close to the theoretical values [25], suggesting a strong collision resistance capability of the proposed hash function, HMNT.
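The hit count and the absolute difference for the example pair above can be computed with the minimal Python sketch below. It assumes that each two-hexadecimal character corresponds to one byte and that the conversion to decimal is simply the byte value. The function name is hypothetical.

```python
def collision_metrics(hash_a, hash_b):
    """Return (hits, d) for two hex strings of equal byte length."""
    a = bytes.fromhex(hash_a)          # spaces between byte pairs are ignored
    b = bytes.fromhex(hash_b)
    if len(a) != len(b):
        raise ValueError("hash values must have the same length")
    hits = sum(x == y for x, y in zip(a, b))       # equal bytes at the same position
    d = sum(abs(x - y) for x, y in zip(a, b))      # absolute difference of byte values
    return hits, d


original = "19 0f 17 0b 14 62 18 40 05 4e 07 d4 14 5a 00 1d"
modified = "0e c4 1b ac 00 90 01 62 19 9e 1a 75 19 e8 06 ed"
hits, d = collision_metrics(original, modified)
print(hits, d)
```

For this pair the hit count is 0, in agreement with the value of $N_h$ stated in the text.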

G. RESISTANCE TOWARDS BIRTHDAY ATTACK
A birthday attack is a type of attack that is independent of the construction and can be applied to any hash function algorithm [26]. In this attack, the attacker seeks to find two distinct messages (M, M') that have the same hash value h within fewer than $2^{n/2}$ trials (where n is the length of the hash value) [4]. Thus, for n = 128, 256 and 512-bits, the proposed
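For reference, the birthday bound $2^{n/2}$ evaluates as follows for the three hash lengths considered here. This is plain arithmetic on the bound quoted above, not a measured result.

```latex
\begin{align*}
n = 128 &: \quad 2^{n/2} = 2^{64} \approx 1.8 \times 10^{19}, \\
n = 256 &: \quad 2^{n/2} = 2^{128} \approx 3.4 \times 10^{38}, \\
n = 512 &: \quad 2^{n/2} = 2^{256} \approx 1.2 \times 10^{77}.
\end{align*}
```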

I. RESISTANCE TOWARDS EXHAUSTIVE KEY SEARCH ATTACKS
An exhaustive key search attack can be applied to any hash function that employs a secret key as an input. In a keyed hash function, if the attacker has access to a message/hash value pair, then the key can be found through exhaustive searching. On average, the attacker needs $2^{k-1}$ tries, where k is the size of the key in bits. The proposed scheme is flexible, allowing the size of the secret key to be tuned. If the size of the key is set to 512 bits, the difficulty of the attack is of the order of $2^{512}$. With k = 512 bits, such a search is computationally infeasible, so the proposed scheme resists this kind of attack.
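As a worked instance of the key-search estimate, for a key of $k = 512$ bits the average and worst-case numbers of trials are as follows (plain arithmetic, assuming every key is equally likely):

```latex
\begin{align*}
\text{average:} \quad & 2^{k-1} = 2^{511} \approx 6.7 \times 10^{153}, \\
\text{worst case:} \quad & 2^{k} = 2^{512} \approx 1.3 \times 10^{154}.
\end{align*}
```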

J. FLEXIBILITY
The proposed scheme is also designed to address issues such as the length of the hash value and resistance against common attacks such as collision attacks. Since the proposed scheme is highly flexible, it can be used to produce hashes with a length of 128, 256 and 512 bits or longer, unlike the traditional fixed-length hash functions such as MD5 and SHA-1.

VI. COMPARISON WITH OTHER HASH FUNCTIONS
This section compares the proposed hash function with existing and standard hash functions such as SHA-2 and SHA-3 in terms of statistical performance, collision resistance and speed. Table 15, Table 16 and Table 17 present the comparison of statistical performance between the suggested hash function and recent hash functions. The results in Table 15 are based on N = 2048 random tests and a 128-bit hash value, those in Table 16 on N = 2048 random tests and a 256-bit hash value, and those in Table 17 on N = 2048 random tests and a 512-bit hash value. The analyses reveal that the statistical performance of the proposed hash function is very close to that of an ideal hash function algorithm. Furthermore, the proposed scheme performs statistically better than most of the existing hash algorithms presented in the tables discussed in this section.

B. COMPARISON OF SPEED ANALYSIS
In order to evaluate the speed of the proposed hash function with varying hash lengths (128, 256 and 512 bits) for different sizes of input message, we implemented it in the C# programming language on a device with an Intel Core i5-3110M CPU at 2.4 GHz, 4 GB RAM and Windows 7 OS, and measured the hashing time (HT) in milliseconds and the hashing throughput (HTH) in MBytes/s. In addition, the number of cycles needed to hash one byte, NCpB (cycles/Byte), is estimated using the formulas in [29] as follows:
$\mathrm{HTH\ (MBytes/s)} = \frac{\text{Message size (MBytes)}}{\text{Average hashing time (s)}}$
$\mathrm{NCpB\ (cycles/Byte)} = \frac{\text{CPU speed (Hz)}}{\mathrm{HTH\ (Bytes/s)}}$
The outcomes obtained are summarized in Table 18. Furthermore, the speed of the proposed hash function is compared with that of some recent hash functions in terms of the NCpB, along with their specified platforms, as shown in Table 19. Based on the results, the NCpB of the proposed HMNT is much lower than that obtained by Abdoun's scheme [29] and by the standard hash function SHA-2, i.e. the proposed scheme is faster.

C. COMPARISON OF COLLISION RESISTANCE
Table 20 and Table 21 present the comparison between the proposed hash function and existing hash functions in terms of the total number of positions at which the characters are identical in the 128 and 256-bit hash values when N = 2048.
As described in these tables, the values yielded by the proposed hash function are in agreement with some of those presented by existing hash schemes. Hence, the proposed hash function has a very low collision probability.
Next, Table 22 summarizes the comparison between the proposed hash function and some of the recent hash functions available in the literature. The comparison is made in terms of the mean value of d of the two hash values for 128 bits where N = 2048. The findings indicate that the mean absolute difference for the proposed hash function is closer to the ideal value given in [25] than that of the other hash functions. Hence, the proposed hash function yields a stronger collision resistance than most of the schemes used in this comparison.
In addition, Table 23 presents a comparison between the proposed hash function and the standard hash functions, namely SHA-2 and SHA-3, in terms of hash size, block size and collision occurrence.
Based on the values recorded in Table 23, the properties of the proposed hash function are comparable with those of other modern hash functions namely SHA-2 and SHA-3. Hence, the proposed HMNT is a viable candidate for a new hash function.

VII. CONCLUSION
This paper proposed a novel hash function scheme based on the NMNT, called HMNT. It takes an input message of arbitrary length and generates hash values of 128, 256 or 512 bits. The proposed hash function was evaluated in terms of the sensitivity of the hash value to the message, image and secret key, the distribution of hash values, statistical performance, and the resistance of the scheme to birthday and collision attacks, and it was compared with available hash functions. The results indicated that the suggested hash function scheme has high sensitivity to the original message, image and secret key, and strong collision resistance. Moreover, the results also demonstrated that the proposed HMNT is flexible and efficient and can therefore be applied for authentication and to ensure data integrity.