Synchronization of Tree Parity Machines using non-binary input vectors

Neural cryptography is the application of artificial neural networks to cryptography. Its key agreement functionality is based on the tree parity machine (TPM), which uses artificial neural networks to perform a secure key exchange between network entities. This article proposes improvements to the synchronization of two tree parity machines. The improvement consists in training the artificial neural networks with input vectors whose elements have a wider range of values than binary ones. As a result, the duration of the synchronization process is reduced: tree parity machines reach common weights in a shorter time because fewer bit exchanges are necessary. This approach improves the security of neural cryptography.


I. INTRODUCTION
Secure key agreement is one of the basic steps in establishing a secure channel. The algorithms responsible for the key exchange must ensure that no eavesdropper is able to reproduce the secret key. Applied key agreement protocols are based on mathematical operations that have no computationally efficient inversion, e.g. the factorization of large numbers or other related problems.
Quantum computing poses a real threat to applied cryptographic systems. Currently used algorithms, based on the public-key cryptography approach, offer only conditional security: efficient derivation of a secret key from the exchanged fragmentary information would break the security of the key agreement protocol. Currently, there is one known algorithm, Shor's algorithm, capable of efficiently factorizing large numbers. Hence, it could be used to extract exchanged keys and break all applied asymmetric cryptography [1]. However, a successful implementation of this algorithm requires a quantum computer with a sufficient number of qubits. Some modern cryptographic techniques, such as quantum cryptography and neural cryptography, are able to overcome this problem and provide a variety of quantum-proof algorithms. The tree parity machine (TPM) is one such solution. It achieves key agreement through the mutual learning of two artificial neural networks. This paper introduces an accelerated key exchange process for two TPMs that uses non-binary vectors at the input.
The paper is structured as follows. Section II presents the architecture of TPMs, the process of mutual learning, the secure key agreement protocol and the security of TPMs. Section III describes entropy and its application to assessing the quality of the exchanged key. Section IV introduces non-binary input vectors and the length of the agreed key. Section V describes the methodology of the performed simulations and analyzes the gathered results.

II. TREE PARITY MACHINE
Artificial neural networks (ANNs) are increasingly popular and find applications in many fields, including security. In [2] the authors introduce a novel approach in which the key agreement functionality is implemented with neural networks. Such an approach can also be used for error correction in quantum cryptography systems [3].

A. Tree parity machine architecture
A tree parity machine is a two-layer, perceptron-structured artificial neural network with discrete weights, binary input and binary output [4]. The input vector $X = [x_{11}, x_{12}, \ldots, x_{1N}, \ldots, x_{K1}, \ldots, x_{KN}]$, with $K, N, k, n \in \mathbb{N}$, $k \le K$ and $n \le N$, has $KN$ elements, where $K$ denotes the number of neurons in the first (hidden) layer and $N$ the number of inputs of each of these neurons. Every element $x_{kn}$ of the input vector $X$ takes one of two possible values, either −1 or 1.
The first layer consists of neurons similar to the McCulloch-Pitts model [5]. Every input $x_{kn}$ is connected to the $k$th neuron and has a corresponding weight $w_{kn}$. The values of the weights are the only difference from the McCulloch-Pitts model: every weight $w_{kn}$ takes an integer value between $-L$ and $L$, where $L \in \mathbb{N}$ is a parameter of the TPM denoting the maximum absolute weight value.
The output of each of these neurons is given by a slightly modified signum function $\sigma$, defined in (1). It differs from the regular signum function in that it never returns zero: the value 0 is mapped either to 1 or to −1, depending on whether the side is the sender or the recipient of the communication [6]. The recipient and sender sides are denoted by $r$ and $s$, respectively. The parties decide beforehand which side is the sender and which is the recipient.
The argument of the neuron's activation function is the sum of the products of the input vector's elements and the corresponding weights, so the output of the $k$th hidden neuron is
$y_k = \sigma\left(\sum_{n=1}^{N} w_{kn} x_{kn}\right)$ (2)
The final output $O$ of the TPM is the product of the outputs of all hidden neurons in the first layer:
$O = \prod_{k=1}^{K} y_k$ (3)
The overall architecture of the TPM is shown in Figure 1.
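As a minimal sketch of the forward pass in (1)-(3), the Python code below computes the hidden outputs y_k and the TPM output O. The function names are hypothetical, and since the text does not state which side maps 0 to which value, the sketch assumes that the sender s maps 0 to 1 and the recipient r maps 0 to −1.

```python
from math import prod


def sigma(h: int, side: str) -> int:
    """Modified signum (1): never returns 0.

    ASSUMPTION: the sender ("s") maps 0 to +1 and the recipient ("r") maps 0 to -1.
    """
    if h > 0:
        return 1
    if h < 0:
        return -1
    return 1 if side == "s" else -1


def hidden_outputs(w, x, side):
    """Equation (2): y_k = sigma(sum_n w_kn * x_kn) for every hidden neuron k."""
    return [sigma(sum(wkn * xkn for wkn, xkn in zip(w_row, x_row)), side)
            for w_row, x_row in zip(w, x)]


def tpm_output(w, x, side):
    """Equation (3): the TPM output O is the product of all hidden outputs."""
    y = hidden_outputs(w, x, side)
    return prod(y), y


# K = 3 hidden neurons with N = 2 inputs each; w and x are K x N nested lists.
w = [[1, 2], [0, 0], [-1, 1]]
x = [[1, 1], [1, -1], [1, 1]]
print(tpm_output(w, x, "s"))  # (1, [1, 1, 1])
print(tpm_output(w, x, "r"))  # (1, [1, -1, -1])
```

In this example the second and third neurons have a zero local field, so the two sides agree on O but not on the hidden outputs; this is exactly the case in which the tie rule of σ matters for the learning step.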

B. Key agreement protocol
The parties performing the key agreement execute a protocol that results in a shared secret key known only to them. This is usually achieved by exchanging some information over an insecure channel and performing mathematical operations whose results are known only to the authorized parties [7]. The first and most popular key agreement protocol was proposed by Diffie and Hellman [8].
The TPM offers functionality that can be adopted for key exchange purposes. The protocol for two parties consists of the following steps [6]:
1) both participants agree on the TPM parameters (K, L, N) and initialize their TPMs with random weights;
2) the participants publicly exchange a randomly chosen input vector X;
3) each party computes the output of its TPM and publishes the result;
4) if the outputs match, both participants apply the chosen learning rule, which updates the weights of their TPMs;
5) steps 2-4 are repeated until both TPMs are fully synchronized.
Full synchronization means that every weight of one TPM equals the corresponding weight of the other, at which point both TPMs are identical.
The learning rules update the weights of each TPM in such a way that the synchronization process finishes in finite time [9]. Three different learning rules can be used to update the weights [10]:
• Hebbian learning rule
$w_{kn}(t+1) = w_{kn}(t) + O(t)\, x_{kn}(t)\, \Theta(y_k(t), O(t))$ (4)
• Anti-Hebbian learning rule
$w_{kn}(t+1) = w_{kn}(t) - O(t)\, x_{kn}(t)\, \Theta(y_k(t), O(t))$ (5)
• Random walk learning rule
$w_{kn}(t+1) = w_{kn}(t) + x_{kn}(t)\, \Theta(y_k(t), O(t))$ (6)
where $\Theta(a, b)$ denotes the function returning 1 if $a = b$ and 0 otherwise, and $t$ denotes the iteration of the key agreement algorithm.
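The three learning rules differ only in the sign of the update term. The sketch below applies one step of (4), (5) or (6) to a K x N list of weights; the function names and rule labels are hypothetical, and clipping the result to [−L, L] is an assumption based on the weight range given in Section II-A rather than part of the formulas above.

```python
def theta(a: int, b: int) -> int:
    """Theta(a, b) from Section II-B: 1 if a == b, otherwise 0."""
    return 1 if a == b else 0


def update_weights(w, x, y, O, L, rule="hebbian"):
    """Apply one step of rule (4), (5) or (6) to the K x N weight list w in place.

    Called only when both parties' outputs O match (protocol step 4).
    ASSUMPTION: results are clipped to [-L, L] to keep the weights in their range.
    """
    for k, (w_row, x_row) in enumerate(zip(w, x)):
        t = theta(y[k], O)  # only neurons whose output agrees with O are updated
        for n, xkn in enumerate(x_row):
            if rule == "hebbian":          # (4)
                delta = O * xkn * t
            elif rule == "anti_hebbian":   # (5)
                delta = -O * xkn * t
            elif rule == "random_walk":    # (6)
                delta = xkn * t
            else:
                raise ValueError(f"unknown learning rule: {rule}")
            w_row[n] = max(-L, min(L, w_row[n] + delta))


w = [[0, 3], [1, 1]]
update_weights(w, x=[[1, 1], [-1, 1]], y=[1, -1], O=1, L=3)
print(w)  # [[1, 3], [1, 1]]: neuron 2 disagrees with O, and 3 + 1 is clipped to 3
```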
The synchronization of two tree parity machines is not a deterministic process. The number of iterations is not fixed and depends on the size and parameters of the TPMs. However, it has been shown that the synchronization time is finite and can be easily estimated [11]. The process takes longer for larger TPMs (K and N) and larger maximum weight values (L). Other factors that affect the number of iterations required for two TPMs to finish mutual learning include the distribution of the initial weights and the learning rule [12].
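Protocol steps 1)-5), combined with the Hebbian rule (4), can be simulated in a single process to observe that the number of iterations varies from run to run. This is a sketch under stated assumptions, not a reference implementation: the TPM class and synchronize function are hypothetical, weights are clipped to [−L, L], σ maps 0 to 1 on the sender side and to −1 on the recipient side (an assumed direction), and small parameters (K = 3, N = 10, L = 3) keep the run short.

```python
import random
from math import prod


def sigma(h, side):
    # Modified signum (1). ASSUMPTION: sender "s" maps 0 to +1, recipient "r" to -1.
    if h != 0:
        return 1 if h > 0 else -1
    return 1 if side == "s" else -1


class TPM:
    """Hypothetical minimal tree parity machine with K hidden neurons of N inputs."""

    def __init__(self, K, N, L, side, rng):
        self.K, self.N, self.L, self.side = K, N, L, side
        # Step 1: random private weights in {-L, ..., L}.
        self.w = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]

    def output(self, x):
        # Equations (2) and (3); hidden outputs are kept for the learning step.
        self.y = [sigma(sum(a * b for a, b in zip(w_row, x_row)), self.side)
                  for w_row, x_row in zip(self.w, x)]
        self.O = prod(self.y)
        return self.O

    def hebbian(self, x):
        # Rule (4), applied to neurons with y_k == O, clipped to [-L, L].
        for k in range(self.K):
            if self.y[k] == self.O:
                self.w[k] = [max(-self.L, min(self.L, wkn + self.O * xkn))
                             for wkn, xkn in zip(self.w[k], x[k])]


def synchronize(K=3, N=10, L=3, seed=None, max_iter=100_000):
    """Run steps 2)-5) and return (iterations, shared weights)."""
    rng = random.Random(seed)
    a, b = TPM(K, N, L, "s", rng), TPM(K, N, L, "r", rng)
    for t in range(max_iter):
        if a.w == b.w:  # full synchronization reached
            return t, a.w
        # Step 2: a public random binary input vector.
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        # Steps 3 and 4: compare the published outputs, learn only if they match.
        if a.output(x) == b.output(x):
            a.hebbian(x)
            b.hebbian(x)
    raise RuntimeError("TPMs did not synchronize within max_iter iterations")


for seed in range(3):
    print(synchronize(seed=seed)[0])  # the iteration count differs between runs
```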

C. Security of Tree Parity Machines
The security of the key agreement protocol is crucial for communication. An eavesdropper able to reproduce the key from the messages exchanged between the parties, or from any other source, breaks the security of the channel and thereby defeats the purpose of the key exchange protocol. Hence, it is crucial to assess the security of any novel algorithm or protocol.
TPMs have been studied extensively. In [13] and [14] the authors identify four distinct types of attacks to which TPMs may be vulnerable:
• brute force attack: research shows that it is impossible to find the exact key by a brute force attack against TPMs in polynomial time;
• genetic algorithm for weight prediction: it has been shown that only TPMs with a single neuron in the second layer are vulnerable to this type of attack;
• man-in-the-middle interception attack: studies show that on average 60% of the weights were synchronized in the eavesdropper's TPM;
• sign-of-weight classification using neural networks: in [14] the authors demonstrate that classification with artificial neural networks determines the sign of a TPM weight with nearly 100% accuracy, which reduces the time needed by a brute force attack by almost half.
These studies show that, by exploiting these attack vectors, an attacker can gain some information about the key. Hence, cryptosystem designers should be aware of this threat and counteract it in order to minimize the likelihood of key reconstruction.

D. Man-in-the-middle attack
Synchronization of two TPMs without additional layers of security is prone to man-in-the-middle attacks. Such an attack relies on placing a node C between the parties A and B performing a key agreement. The node eavesdrops on all messages exchanged between A and B. Based on the information collected, node C may be able to gain unauthorized access to the information sent between A and B. Moreover, if the nodes are not mutually authenticated, the adversary may be able to alter the messages in order to attempt an attack with a higher probability of success.
For TPMs, a man-in-the-middle attack comes down to capturing all input vectors X and the outputs of the intercepted parties. The adversarial TPM then performs the learning process on the acquired data. Let A and B be the parties wishing to exchange the key and let C be an intruder able to perform a man-in-the-middle attack. There are three scenarios to be considered while intercepting the key exchange, depending on which of the outputs of A, B and C agree. The last scenario, in which all outputs agree, brings the adversarial party closer to obtaining the exchanged key. Hence, this situation should be avoided at all costs.
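The three interception cases (no party learns when A and B disagree; only A and B learn when C's output differs; all three learn when all outputs agree) can be sketched by adding an eavesdropping TPM C to a simulated exchange. The code is illustrative only: it assumes the Hebbian rule (4) with clipping to [−L, L], an assumed tie rule for σ (the sender maps 0 to 1, the recipient and the attacker map 0 to −1), hypothetical function names and small TPM sizes. It reports the fraction of C's weights that match A's once A and B are synchronized.

```python
import random
from math import prod


def sigma(h, side):
    # Modified signum (1). ASSUMPTION: sender "s" maps 0 to +1, recipient "r" to -1.
    if h != 0:
        return 1 if h > 0 else -1
    return 1 if side == "s" else -1


class TPM:
    def __init__(self, K, N, L, side, rng):
        self.L, self.side = L, side
        self.w = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]

    def output(self, x):
        self.y = [sigma(sum(a * b for a, b in zip(wr, xr)), self.side)
                  for wr, xr in zip(self.w, x)]
        self.O = prod(self.y)
        return self.O

    def hebbian(self, x):
        # Rule (4) with clipping to [-L, L]; neurons with y_k != O are left unchanged.
        for k, xr in enumerate(x):
            if self.y[k] == self.O:
                self.w[k] = [max(-self.L, min(self.L, wkn + self.O * v))
                             for wkn, v in zip(self.w[k], xr)]


def eavesdrop(K=3, N=10, L=3, seed=None, max_iter=100_000):
    """Simulate A, B and an eavesdropper C; return the share of C's weights equal to A's."""
    rng = random.Random(seed)
    a, b = TPM(K, N, L, "s", rng), TPM(K, N, L, "r", rng)
    c = TPM(K, N, L, "r", rng)  # ASSUMPTION: C uses the recipient's tie rule
    for _ in range(max_iter):
        if a.w == b.w:
            break
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        o_a, o_b, o_c = a.output(x), b.output(x), c.output(x)
        if o_a != o_b:
            continue          # scenario 1: nobody learns
        a.hebbian(x)
        b.hebbian(x)          # scenarios 2 and 3: A and B learn
        if o_c == o_a:
            c.hebbian(x)      # scenario 3: C learns as well
    same = sum(wc == wa for rc, ra in zip(c.w, a.w) for wc, wa in zip(rc, ra))
    return same / (K * N)


print(eavesdrop(seed=0))
```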

III. ENTROPY
The quality of random number generation has a significant impact on the final security of a cryptosystem. A true random number generator produces every possible output with equal probability. Unfortunately, computers are incapable of generating fully random numbers. Numbers are therefore frequently produced by a pseudo-random number generator. Such a generator requires a seed supplied beforehand, which is the starting point of the pseudo-random sequence; every subsequent number depends on it. Many contemporary implementations lack important properties such as sound mathematical foundations, unpredictability and cryptographic security [15].
Entropy is one of the measures used to assess the quality of generated numbers. Let us assume that a random source generates $I$ different numbers $\alpha_1, \alpha_2, \ldots, \alpha_I$ with corresponding probabilities $p_1, p_2, \ldots, p_I$. The entropy of such a source is given by (7) [16]:
$H = -\sum_{i=1}^{I} p_i \log_j p_i$ (7)
The base $j$ of the logarithm determines the unit in which entropy is measured; e.g. for bases 2 and $e$ the units are bits and nats, respectively [17].
Let us consider a random source that produces one of two outputs, 0 or 1, with probabilities $P(X = 0) = p$ and $P(X = 1) = 1 - p$. The entropy of this source is given by (8) [17]:
$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ (8)
Figure 2 shows the entropy of this two-value random source as a function of $p$. The function reaches its maximum $H(p) = 1$ at $p = 0.5$, i.e. when the values 0 and 1 are equally probable. Hence, the entropy increases as the probability distribution of X approaches the uniform distribution. This can be generalized to sources producing more outcomes. The entropy function is used later to assess the quality of the keys generated by different types of TPMs. As formalized in (9), the effective length of a key should depend on the entropy of the synchronized weights, not just on the number of values they can take.
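Equations (7) and (8) translate directly into code. The sketch below estimates entropy from observed samples by using relative frequencies as the probabilities p_i (an assumption suited to simulation data) and evaluates the binary entropy plotted in Figure 2; the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(samples, base=2):
    """Equation (7): H = -sum_i p_i log_base p_i, with p_i taken as relative frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log(c / total, base) for c in counts.values())


def binary_entropy(p):
    """Equation (8) in bits, using the convention 0 * log 0 = 0."""
    return -sum(q * log(q, 2) for q in (p, 1 - p) if q > 0)


print(binary_entropy(0.5))    # 1.0, the maximum shown in Figure 2
print(binary_entropy(0.9))    # about 0.469: a skewed source carries less information
print(entropy([0, 1, 2, 3]))  # four equally likely values give 2 bits
```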

IV. NON-BINARY INPUT VECTORS
TPMs conventionally use binary input vectors X during the synchronization process [2]. Other approaches, presented in [18]-[20], use complex-valued, vector-valued and chaos-generated input vectors, respectively, to improve the learning process. Additionally, in [21] the authors propose whale optimization-based synchronization, which reduces the duration of the learning process.
This paper introduces a new approach: non-binary input vectors used to synchronize TPMs in a secure key agreement protocol. The authors propose that a mutual learning process using vectors whose elements have a wider range of possible values influences the synchronization time of two TPMs. The simulations presented in Section V verify this proposition and indicate that this approach can significantly increase the security of neural cryptography.
A. Non-binary vector tree parity machine architecture
So far, a TPM has been defined by the parameters K, L and N. In this paper the authors introduce a new parameter $M$, denoting the maximum absolute value of each element of the input vector X. Hence, the input vector has the form $X = [x_{11}, x_{12}, \ldots, x_{1N}, \ldots, x_{K1}, \ldots, x_{KN}]$, where $-M \le x_{kn} \le M$. Thus, during the synchronization process the entities can use non-binary input vectors instead of the binary vectors used in current practical implementations.
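A non-binary input vector only changes how the elements of X are drawn. The text fixes the bound −M ≤ x_kn ≤ M but not the exact value set, so the sketch below assumes uniformly drawn integers and excludes 0 by default, which makes M = 1 coincide with the binary case; random_input is a hypothetical helper.

```python
import random


def random_input(K, N, M, rng, allow_zero=False):
    """Draw a K x N input vector with elements bounded by -M <= x_kn <= M.

    ASSUMPTIONS: elements are integers drawn uniformly; 0 is excluded by default
    so that M = 1 reproduces the binary {-1, 1} case.
    """
    values = [v for v in range(-M, M + 1) if allow_zero or v != 0]
    return [[rng.choice(values) for _ in range(N)] for _ in range(K)]


rng = random.Random(0)
print(random_input(K=3, N=5, M=1, rng=rng))  # binary input vector
print(random_input(K=3, N=5, M=3, rng=rng))  # elements in {-3, ..., -1, 1, ..., 3}
```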
Introducing the parameter M affects neither the architecture of the TPM nor the learning process. The formulas shown in Section II remain valid despite the wider range of values of the learning vectors. However, the simulations presented in Section V show that as the input vectors become more differentiated, the distribution of the settled keys deviates further from the uniform distribution. Therefore, an unbiased estimate of the key length is required.

B. Agreed key length
After the synchronization process both parties share identical keys. The keys are distilled from the weights of the TPMs, which are identical after the mutual learning process. The key length depends on the size of the TPM as well as on the parameter L, which indicates the maximum absolute value the weights may reach during synchronization. Assuming an ideal uniform distribution of the weights, the key length is equal to $K \cdot N \cdot \log_2(2L + 1)$. However, the distribution of the weights differs from the uniform distribution [10]. Hence, entropy should be used to measure the quality of the key exchanged between the parties. The updated key length is defined as
$l = K \cdot N \cdot E(W)$ (9)
where $E(W)$ denotes the average entropy of the weights, with entropy as defined in Section III. However, the exact distribution of the weights is not known beforehand, so equation (9) has to be adapted. The estimated effective key length in equation (10) uses the entropy $\hat{E}(W)$ estimated from the simulation results. Additionally, we propose applying the floor function, since the effective key length is measured in whole bits:
$\hat{l} = \lfloor K \cdot N \cdot \hat{E}(W) \rfloor$ (10)
It should be noted that equation (9) indicates the theoretical maximum key length that can be extracted from the mutual weights. In practice, a dedicated algorithm that equalizes the probabilities can be used to obtain a cryptographic key from an unevenly distributed numerical sequence. This algorithm must be deterministic, since both parties derive the cryptographic key from their weights independently and must obtain the same result.
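Equations (9) and (10) can be evaluated from simulation output as sketched below. How the average entropy E(W) is estimated is an assumption here: the entropy of each weight position is computed across simulation runs and then averaged over the K·N positions. The function names are hypothetical. The idealized uniform case reproduces the bound K · N · log2(2L + 1) for K = 3, N = 40 and L = 5.

```python
from collections import Counter
from math import floor, log2


def entropy_bits(values):
    # Entropy (7) in bits, with probabilities estimated from relative frequencies.
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def average_weight_entropy(runs):
    """Estimate E_hat(W) from synchronized weights.

    runs: one flattened list of K*N shared weights per simulation.
    ASSUMPTION: each weight position's entropy is estimated across runs, then averaged.
    """
    positions = list(zip(*runs))
    return sum(entropy_bits(col) for col in positions) / len(positions)


def effective_key_length(K, N, e_hat):
    # Equation (10): floor(K * N * E_hat(W)), in whole bits.
    return floor(K * N * e_hat)


K, N, L = 3, 40, 5
# Idealized check: every weight takes each value in {-L, ..., L} exactly once across 2L + 1 runs.
runs = [[v] * (K * N) for v in range(-L, L + 1)]
e_hat = average_weight_entropy(runs)
print(e_hat, log2(2 * L + 1))             # both about 3.459 bits
print(effective_key_length(K, N, e_hat))  # 415 = floor(3 * 40 * log2(11))
```

With a skewed weight distribution, such as the one caused by the extrema values effect, the per-weight entropy drops below log2(2L + 1) and the effective key length shrinks accordingly.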

V. VERIFICATION
This section presents how the new parameter M, which bounds the values of the input vector elements during the synchronization process, affects the number of iterations required in the learning process and the quality of the output key.

A. Methodology
The quality of the output key is measured by its effective length, calculated according to (10). The simulation scenarios cover multiple TPM sizes. For each scenario, the statistical analysis is based on 1000 simulations, and the presented confidence intervals are calculated at the 95% confidence level. The scenarios include all combinations of the parameters N ∈ {40, 50, 60} and M ∈ {1, 2, 3, 4, 5}. In all scenarios, the parameters K and L are equal to 3 and 5, respectively. Synchronization time, entropy and effective key length are measured in order to compare the scenarios. Furthermore, we simulated man-in-the-middle attacks, during which we measured the average synchronization score of the malicious TPM.
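The text does not state how the 95% confidence intervals are computed. One common choice, assumed in the sketch below, is the normal approximation mean ± 1.96 · s/√n, which is reasonable for 1000 runs per scenario; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval_95(samples):
    """Normal-approximation 95% confidence interval for the mean of samples.

    ASSUMPTION: the paper does not state how its intervals are computed; this uses
    mean +/- 1.96 * s / sqrt(n), which is reasonable for n = 1000 runs.
    """
    m = mean(samples)
    half_width = 1.96 * stdev(samples) / sqrt(len(samples))
    return m - half_width, m + half_width


# Hypothetical iteration counts from a handful of runs, for illustration only.
print(confidence_interval_95([410, 385, 452, 398, 431]))
```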

B. Results
The synchronization process becomes longer as the size of the TPM increases; a larger TPM also generates a longer key for cryptographic purposes. However, the simulation results presented in Table I reveal that the TPM size and parameters are not the only factors that affect the duration of the synchronization process. Multiple simulations were performed with different values of the parameter M. Increasing M, which bounds the possible values $x_{kn}$ of the input vector X, significantly reduces the synchronization time. The synchronization time in Table I is expressed as the number of output bits exchanged between the parties to achieve full synchronization of the two TPMs (learning iterations). Thus, the volume of data exchanged between the parties performing the key agreement decreases as the value of M increases. It should be noted that faster synchronization increases security: as M increases, the key agreement process takes less time, so a longer and more secure key is obtained in a shorter period of time. This makes the solution more competitive with other key exchange protocols.
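The trend reported in Table I can be explored with a scaled-down experiment that repeats the synchronization for several values of M and averages the number of exchanged outputs. This sketch is not the authors' simulation: it uses N = 10 instead of 40-60 to keep the run short, the Hebbian rule (4) with clipping to [−L, L], nonzero integer inputs in [−M, M] and an assumed tie rule for σ; the function names are hypothetical, and its output is not meant to reproduce Table I.

```python
import random
from math import prod
from statistics import mean


def sigma(h, side):
    # Modified signum (1). ASSUMPTION: sender "s" maps 0 to +1, recipient "r" to -1.
    if h != 0:
        return 1 if h > 0 else -1
    return 1 if side == "s" else -1


def run_sync(K, N, L, M, rng, max_iter=200_000):
    """Return the number of exchanged outputs until two TPMs share all weights."""
    # ASSUMPTION: inputs are nonzero integers in [-M, M], so M = 1 is the binary case.
    values = [v for v in range(-M, M + 1) if v != 0]
    wa = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    wb = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    for t in range(max_iter):
        if wa == wb:
            return t
        x = [[rng.choice(values) for _ in range(N)] for _ in range(K)]
        ya = [sigma(sum(p * q for p, q in zip(r, xr)), "s") for r, xr in zip(wa, x)]
        yb = [sigma(sum(p * q for p, q in zip(r, xr)), "r") for r, xr in zip(wb, x)]
        o = prod(ya)
        if o != prod(yb):
            continue  # outputs differ: no party learns
        for w, y in ((wa, ya), (wb, yb)):
            for k in range(K):
                if y[k] == o:  # Hebbian rule (4), clipped to [-L, L]
                    w[k] = [max(-L, min(L, wkn + o * xkn)) for wkn, xkn in zip(w[k], x[k])]
    raise RuntimeError("TPMs did not synchronize within max_iter iterations")


def mean_iterations(M, K=3, N=10, L=5, runs=10, seed=0):
    rng = random.Random(seed)
    return mean(run_sync(K, N, L, M, rng) for _ in range(runs))


for M in (1, 2, 3, 4):
    print(M, mean_iterations(M))
```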

C. Extrema values effect
Numerous simulations of the TPM learning process with non-binary input vectors led to the discovery of an effect that the authors call the extrema values effect. A similar effect is shown in [20]; however, only binary input vectors are considered in that paper.
Faster synchronization and the lower number of messages exchanged between users have an impact on the distribution of weights. As M increases, the probabilities $P(w_{kn} = L)$ and $P(w_{kn} = -L)$ also increase. As a result, the probability distribution of weights deviates further from the uniform distribution, and every weight of the TPM carries less random information. The exact distribution of weights is presented in Figure 4.
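The weight histogram underlying Figure 4 can be approximated by collecting the shared weights of several synchronized TPM pairs. The sketch below uses small hypothetical parameters, the Hebbian rule (4) clipped to [−L, L], nonzero integer inputs in [−M, M] and an assumed tie rule for σ. It returns the share of weights at the extremes ±L of the weight range together with the full histogram, so the probability of any weight value can be read off; the function names are hypothetical.

```python
import random
from collections import Counter
from math import prod


def sigma(h, side):
    # Modified signum (1). ASSUMPTION: sender "s" maps 0 to +1, recipient "r" to -1.
    if h != 0:
        return 1 if h > 0 else -1
    return 1 if side == "s" else -1


def synchronized_weights(K, N, L, M, rng, max_iter=200_000):
    """Synchronize two TPMs and return their shared weights as a flat list."""
    values = [v for v in range(-M, M + 1) if v != 0]  # ASSUMPTION: nonzero integer inputs
    wa = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    wb = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    for _ in range(max_iter):
        if wa == wb:
            return [wkn for row in wa for wkn in row]
        x = [[rng.choice(values) for _ in range(N)] for _ in range(K)]
        ya = [sigma(sum(p * q for p, q in zip(r, xr)), "s") for r, xr in zip(wa, x)]
        yb = [sigma(sum(p * q for p, q in zip(r, xr)), "r") for r, xr in zip(wb, x)]
        o = prod(ya)
        if o != prod(yb):
            continue
        for w, y in ((wa, ya), (wb, yb)):
            for k in range(K):
                if y[k] == o:  # Hebbian rule (4), clipped to [-L, L]
                    w[k] = [max(-L, min(L, wkn + o * xkn)) for wkn, xkn in zip(w[k], x[k])]
    raise RuntimeError("TPMs did not synchronize within max_iter iterations")


def boundary_share(M, K=3, N=10, L=5, runs=10, seed=0):
    """Return the share of shared weights equal to -L or L, and the weight histogram."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(runs):
        hist.update(synchronized_weights(K, N, L, M, rng))
    total = sum(hist.values())
    return (hist[L] + hist[-L]) / total, hist


for M in (1, 3):
    share, hist = boundary_share(M)
    print(M, round(share, 3), sorted(hist.items()))
```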

D. Susceptibility to a man-in-the-middle attack
Many studies address the vulnerability of TPMs to man-in-the-middle attacks. Therefore, simulations with adversarial TPMs were conducted for learning with non-binary input vectors.
We assumed the worst-case scenario, in which the adversarial neural network is able to eavesdrop on all data exchanged between the parties performing the key exchange. During the simulations, the final synchronization score $S_{score}$ of the adversarial neural network was recorded. The synchronization score measures the similarity between two TPMs: the more weights they have in common, the higher the score. Hence, the formula must return higher values as the learning process progresses. The final score is calculated using equation (11), where $w^A$ denotes the weights of the adversarial TPM, $w$ the weights of the synchronized parties, and the function $\Theta(a, b)$ is defined in Section II:
$S_{score} = \frac{1}{KN} \sum_{k=1}^{K} \sum_{n=1}^{N} \Theta(w^A_{kn}, w_{kn})$ (11)
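The synchronization score in (11) reduces to counting matching weights. A minimal sketch with a hypothetical function name, taking the adversary's weights and the legitimate parties' weights as K x N nested lists:

```python
def sync_score(w_adv, w):
    """Equation (11): share of positions (k, n) where the adversary's weight equals the parties' weight."""
    K, N = len(w), len(w[0])
    matches = sum(1 for row_a, row in zip(w_adv, w) for a, b in zip(row_a, row) if a == b)
    return matches / (K * N)


print(sync_score([[1, 2], [3, 4]], [[1, 0], [3, 4]]))  # 0.75: three of four weights match
```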
In terms of security, the attacker's TPM should have the lowest possible synchronization score. The synchronization scores of the adversarial TPMs are presented in Table III. Additionally, the simulation results are shown in Figure 5 to visualize the relationship between scenarios with different TPMs. Higher values of the parameter M result in higher median synchronization scores; hence, the TPM becomes more prone to man-in-the-middle attacks. When M was equal to L, we observed cases in which the synchronization score was equal to 1. This means that the condition M < L should be maintained to ensure security. Additionally, the median is inversely proportional to the number of inputs N; therefore, the impact of non-binary input vectors on the synchronization score is less clear for larger TPMs. Furthermore, the confidence intervals are wide. This variability makes it difficult to predict the weights of the attacker's malicious TPM.

Fig. 4. Probability distribution of weights in TPMs using non-binary input vectors

VI. CONCLUSION
This article proposes an improved way of training TPMs. A significant acceleration of the key agreement process was achieved by using non-binary input vectors. This reduces the volume of data exchanged between the parties performing the key agreement. Faster synchronization increases security levels; in particular, it mitigates the risk of the key being obtained by an intruder using a man-in-the-middle attack. However, speeding up the process results in an unequal distribution of weights in the TPM. This was measured by calculating the effective key length based on the entropy of each weight. The proposed solution was also verified in an insecure environment in which two TPMs are subject to a man-in-the-middle attack.
We envisage that future work will explore the development of a secure key exchange protocol using non-binary input vectors in TPMs during mutual learning.This work will be focused on studying the extrema values effect thoroughly and minimizing the reduction of effective key length.

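Fig. 1. Architecture of the tree parity machine

Interception scenarios for the parties A and B and the intruder C (Section II-D), where $\Pi$ denotes the output of the respective TPM:
• If $\Pi_A \neq \Pi_B$, none of the TPMs are synchronized during this step.
• If $\Pi_A = \Pi_B \neq \Pi_C$, only TPMs A and B are synchronized, while TPM C (the attacker) does not update its weights.
• If $\Pi_A = \Pi_B = \Pi_C$, all the TPMs update their weights accordingly.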

Fig. 2. Entropy of the source generating two different values with the same probability

Fig. 3. Effective key length of TPMs with different parameters

The unequal distribution of weights in the TPM reduces the effective key length, since the entropy of the weights becomes lower. Entropy values and effective key lengths are presented in Table II. To visualize the proportion between the effective key lengths, the results for the considered M and N parameters are presented in Figure 3.