High-Speed Privacy Amplification Scheme Using GMP in Quantum Key Distribution

Privacy amplification (PA) is the art of distilling a highly secret key from a partially secure string by public discussion. It is a vital procedure in quantum key distribution (QKD) for producing a theoretically unconditionally secure key. The throughput of PA has become the bottleneck of most high-speed discrete-variable QKD (DV-QKD) systems. Although some Toeplitz-hash PA schemes can meet the throughput demand, their high throughput depends heavily on high-cost platforms such as MIC or GPU. In terms of development cost, integration level and power consumption, the CPU is a general low-cost platform. However, the throughput of CPU-based PA schemes has not been satisfactory so far, mainly because of the conflict between the intrinsically serial character of the CPU and the parallelism required by high-throughput Toeplitz-hash PA schemes. In this paper, a high-throughput modular arithmetic hash PA scheme using the GNU multiple precision arithmetic library (GMP) on a CPU platform is proposed. The experimental results show that the throughput of our scheme is nearly an order of magnitude higher than that of the comparative scheme on a similar CPU platform: 135 Mbps and 69 Mbps at block sizes of $10^6$ and $10^8$, respectively, on an Intel i3-2120 CPU. Moreover, our scheme provides the best throughput among DV-QKD PA schemes: 260 Mbps and 140 Mbps at block sizes of $10^6$ and $10^8$, respectively, on an Intel i9-9900k CPU.


I. INTRODUCTION
Quantum key distribution (QKD) is a notable technique that exploits the principles of quantum mechanics to perform theoretically unconditionally secure key distribution between two remote parties, named Alice and Bob. The first practical QKD protocol was proposed by Bennett and Brassard in 1984 [1]. Many protocols have been proposed since then. These QKD protocols can be divided into discrete-variable QKD (DV-QKD) and continuous-variable QKD (CV-QKD) protocols [2]-[4]. Since DV-QKD protocols were proposed earlier, many high-speed DV-QKD systems have been developed recently [5]-[7]. Therefore, we focus on the DV-QKD system in this paper. A general structure of the DV-QKD system is shown in Fig. 1.
There are two major parts in a DV-QKD system: the quantum subsystem and the post-processing subsystem. The quantum subsystem prepares quantum states and measures them through a quantum channel to produce the original key. The post-processing subsystem includes a public classical channel and four parts [8], [9]:
a. Sifting: filtering out the unusable key bits obtained with the wrong measurement mode, as the preparation and measurement of quantum states are random in QKD.
b. Error reconciliation: correcting the error bits in a key to obtain an identical corrected key while exposing as little information about the key as possible.
c. Privacy amplification: shrinking the information about the corrected key exposed on the quantum channel and the classical channel to almost zero [10], [11].
d. Authentication: verifying the consistency of the public information on the classical channel.
Privacy amplification is the last step in producing the unconditionally secure key in QKD and has great influence on the security of the final key. Bennett and Brassard proved, based on information theory, that mapping a long corrected bit string to a much shorter final key via universal2 hash function families satisfies the security requirement of privacy amplification [10], [11]. Renner et al. then proved that the same operation ensures security in the universally composable sense, even if the eavesdropper can acquire and store quantum information instead of classical information [12]. Nevertheless, the security proofs above hold in the asymptotic limit, meaning that the input size of privacy amplification at a time is infinite. According to [13], the input block size of privacy amplification should be at least $10^6$, and the security of the key improves as the input block size increases. The large block size leads to a large amount of computation, resulting in a low real-time processing speed of PA. Therefore, PA has become the bottleneck in many QKD systems.

Fig. 1. A general structure of the DV-QKD system

The universal2 hash function is the kernel of PA, and the selection of the universal2 hash function family plays the decisive role in the computation burden [14]. The Toeplitz hash function family is the most popular choice in PA [15]. J. Constantin et al. and S. S. Yang et al. each present a block parallel algorithm for Toeplitz-hash PA and implement it on a field-programmable gate array (FPGA) [7], [16]. Besides the block parallel algorithm, the fast Fourier transform (FFT) is an efficient method for Toeplitz-hash PA owing to its computational complexity of $O(n \log n)$. B. Liu et al. first implement Toeplitz-hash PA using the FFT algorithm on a central processing unit (CPU) platform [17]. Q. Li et al. implement modified Toeplitz-hash PA using the FFT algorithm on an FPGA platform, which raises the processing speed of PA to over 100 Mbps at a block size of $10^6$ [18]. R. Takahashi et al. utilize the number theoretic transform (NTT), an analogue of the FFT in number theory with the same computational complexity $O(n \log n)$, to implement Toeplitz-hash PA on a CPU with a coprocessor. This scheme achieves 108 Mbps at a block size of $10^8$ [19]. X. Y. Wang et al. further improve the speed of Toeplitz-hash PA with FFT on a GPU platform [20]. However, the GPU platform is more suitable for CV-QKD on account of its volume and power consumption. In summary, the Toeplitz-hash PA methods above cannot satisfy the demands of rapidly developing DV-QKD systems.

Therefore, it is necessary to study universal2 hash families other than the Toeplitz hash family. The modular arithmetic hash is another kind of universal2 hash family. It is constructed from modular arithmetic instead of matrix multiplication as in the Toeplitz hash family. C. M. Zhang et al. proposed an optimal multiplication algorithm for modular arithmetic hash PA, but the speed of this scheme is relatively slow [21]. Therefore, our research focuses on the modular arithmetic hash function family, with emphasis on accelerating the large-number multiplication in this hash computation. The GNU multiple precision arithmetic library (GMP) provides effective combined optimizations for the speed of large-number multiplication [22]. A modular arithmetic hash PA scheme using the GMP library is presented in this paper. The scheme is implemented on different CPU platforms to test its throughput. The test results indicate that the throughput of this scheme is higher than that of the reported PA schemes at block sizes between $10^6$ and $10^8$. This scheme demonstrates the application value of the modular arithmetic hash function in PA.
The rest of this paper is organized as follows. Related works are described in Section II as the basis. In Section III, the presented modular arithmetic hash PA scheme using GMP is introduced in detail. In Section IV, the experimental results and analysis are given. In Section V, conclusions are drawn.

A. Privacy Amplification
Privacy amplification (PA) in QKD allows two parties, Alice and Bob, to distill a secure final key from a partially secure bit string. Privacy amplification defined under universally composable security makes it convenient to analyze the influence of finite resources and of the quantum state held by the eavesdropper, Eve, on the secure key in QKD. The definition of universally composable privacy amplification is given below. Before the PA procedure in QKD, the input of PA for the two parties is an identical random n-bit binary string X from error reconciliation.
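The eavesdropper, Eve, learns quantum information ρ about X, where ρ is a random quantum state. Alice and Bob wish to publicly choose a compression function $g: \{0,1\}^n \rightarrow \{0,1\}^r$ and compress the string X to generate the final key Y, i.e., $Y = g(X)$. If the final key Y satisfies the inequality in (1), the final key Y is ε-secure.

$d(Y \mid \{g\} \otimes \rho) \le \varepsilon$ (1)

Here $d(Y \mid Z)$ means the non-uniformity of Y, i.e., the trace distance between Y and a uniformly distributed random string U under the condition of Z, and $\{g\} \otimes \rho$ means the joint information formed by the quantum information ρ owned by Eve and the compression function g. This procedure is shown in Fig. 2. It has been proved that a universal2 hash function family can serve as the compression function in PA. According to the conclusion of composable security, the compression rate R of the final key length r to the input key length n must satisfy the condition in (2), where $S(X)$ represents the von Neumann entropy of X. According to the finite resource analysis, the final key of PA is close to the infinite case when the input block size is over $10^6$. The large block size leads to heavy computation in the universal2 hash function. Therefore, a universal2 hash function with low computational cost is essential for implementing high-speed PA.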
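The non-uniformity used in the ε-security condition can be illustrated with a purely classical analogue. The sketch below is only an illustration under strong assumptions: it ignores Eve's quantum side information entirely and estimates the statistical distance between the empirical distribution of r-bit keys and the uniform distribution. The function name `distance_from_uniform` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def distance_from_uniform(samples, r):
    """Statistical distance between the empirical distribution of r-bit keys
    (given as integers in [0, 2**r)) and the uniform distribution.

    Classical illustration only: no conditioning on side information.
    """
    counts = Counter(samples)
    total = len(samples)
    size = 2 ** r
    uniform = Fraction(1, size)
    distance = Fraction(0)
    for y in range(size):
        distance += abs(Fraction(counts.get(y, 0), total) - uniform)
    return distance / 2


if __name__ == "__main__":
    print(distance_from_uniform([0, 1, 2, 3], 2))   # perfectly uniform sample
    print(distance_from_uniform([0, 0, 0, 0], 2))   # constant key
```

A perfectly uniform sample gives distance 0, while a constant 2-bit key gives 3/4; an ε-secure key is one for which the (conditional, quantum) counterpart of this quantity is at most ε.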

B. Universal2 Hash Function
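The universal2 hash function is the kernel function of privacy amplification. A δ-function is defined to evaluate the universality of a hash function.
Definition 1. If g is a hash function from A to B and $x, y \in A$, then $\delta_g(x, y) = 1$ if $x \ne y$ and $g(x) = g(y)$, and $\delta_g(x, y) = 0$ otherwise.
If an element g, x or y in $\delta_g(x, y)$ is replaced by a set, the result is the sum of δ over the members of that set. For example, if g is replaced by a collection of hash functions G and x is replaced by the set A, then $\delta_G(A, y) = \sum_{g \in G} \sum_{x \in A} \delta_g(x, y)$. On this basis, the definition of universal2 hash functions is given as follows.
Definition 2. Let G be a class of functions from A to B, and let $|A|$ and $|B|$ denote the number of elements in A and B. The hash function family G is universal2 if for any $x, y \in A$ with $x \ne y$, $\delta_G(x, y) \le |G| / |B|$.
The $G_{c,d}$ hash family is a universal2 hash family based on modular arithmetic. The definition of $G_{c,d}$ is as follows.
Definition 3. If the input size of $G_{c,d}$ is n bits and the output size is r bits, then $g_{c,d}(x) = \lfloor ((c x + d) \bmod 2^n) / 2^{n-r} \rfloor$.
The multiplication of large numbers (over $10^6$ bits) is the most complex part of the $G_{c,d}$ function computation. Many efficient algorithms have been presented for the multiplication of large numbers, e.g., the Karatsuba, Toom-Cook, and Schönhage and Strassen algorithms. The GMP library is an arithmetic library for multiple precision data that multiplies large numbers very fast by optimizing the above algorithms. Therefore, we present a $G_{c,d}$ hash privacy amplification scheme using the GMP library.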
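To make the universality condition concrete, the following sketch brute-forces it for a tiny instance of a modular arithmetic hash. It assumes the hash has the form described in Section III (compute $cx + d$, reduce modulo $2^n$, keep the most significant r bits). Restricting c to odd values is an assumption of this sketch, not something this section states; the names `g_hash`, `delta` and `is_universal2` are hypothetical.

```python
from itertools import product


def g_hash(x, c, d, n, r):
    """Top r bits of (c*x + d) mod 2^n, for an n-bit integer x."""
    return ((c * x + d) % (1 << n)) >> (n - r)


def delta(n, r, c_values, x, y):
    """delta_G(x, y): number of (c, d) pairs under which x and y collide."""
    return sum(
        1
        for c, d in product(c_values, range(1 << n))
        if g_hash(x, c, d, n, r) == g_hash(y, c, d, n, r)
    )


def is_universal2(n, r, odd_c_only=True):
    """Brute-force check of delta_G(x, y) <= |G| / |B| for all x != y."""
    if odd_c_only:
        c_values = range(1, 1 << n, 2)   # assumption: c is odd
    else:
        c_values = range(0, 1 << n)      # every n-bit c allowed
    family_size = len(c_values) * (1 << n)   # |G|: choices of (c, d)
    bound = family_size // (1 << r)          # |G| / |B| with |B| = 2^r
    return all(
        delta(n, r, c_values, x, y) <= bound
        for x in range(1 << n)
        for y in range(x + 1, 1 << n)
    )


if __name__ == "__main__":
    print(is_universal2(4, 2, odd_c_only=True))
    print(is_universal2(4, 2, odd_c_only=False))
```

For n = 4 and r = 2 the check passes when c is odd and fails when even c is allowed (inputs differing only in the top bit then collide for every even c), which is why the sketch fixes c to be odd. An exhaustive check is only feasible for toy sizes; it says nothing about performance at block sizes of $10^6$.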

III. $G_{c,d}$ HASH PRIVACY AMPLIFICATION SCHEME USING GMP
GMP is an arithmetic library written in C for arbitrary-precision integers, rational numbers and floating-point numbers. In privacy amplification, the size of the operands is always over millions of bits. The advantages of GMP are its automatic selection of a suitable algorithm according to the operand size and its assembly-level optimization for maximum speed. Therefore, a $G_{c,d}$ hash privacy amplification scheme using GMP is presented in this paper. The procedure of the scheme is shown in Fig. 3.

A. Data Import
The input data of privacy amplification is usually a bit string, while the operand of GMP multiplication is a single datum of type mpz rather than an array or a string. The format conversion can be time-consuming in PA. An efficient method is to store the bit string in an array of unsigned long data and then import it into an mpz datum with the function mpz_import. The data import procedure is shown in Fig. 4.
The parameter order can be 1 for most significant word first or -1 for least significant first.
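The import step can be sketched without GMP by letting a Python integer stand in for the mpz datum. This is an analogue, not the scheme's C code: the bit string is packed into bytes, most significant bit first, and `int.from_bytes` with big-endian byte order plays the role that mpz_import plays with order = 1. The helper name `bits_to_int` is hypothetical.

```python
def bits_to_int(bits):
    """Pack a list of 0/1 values (most significant bit first) into one integer.

    Python's int stands in for a GMP mpz value in this sketch.
    """
    pad = (-len(bits)) % 8                 # left-pad to a whole number of bytes
    padded = [0] * pad + list(bits)
    packed = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b         # shift in one bit at a time
        packed.append(byte)
    # Most significant byte first, analogous to order = 1 in mpz_import.
    return int.from_bytes(bytes(packed), byteorder="big")


if __name__ == "__main__":
    print(bits_to_int([1, 0, 1, 1]))        # binary 1011
    print(bits_to_int([1] + [0] * 8))       # binary 100000000
```

Packing into whole words before a single bulk conversion is the point of the design: converting bit by bit into a multi-precision integer would touch the large operand once per bit.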

B. Multiplication and Addition with GMP
Multiplication is the most complex computation in $G_{c,d}$ hash privacy amplification. The principles and advantages of GMP multiplication of large numbers are briefly analyzed here.
Seven multiplication algorithms can be selected by GMP multiplication based on the multiplication size N. The relationship between the algorithm selection and the size N is shown in Fig. 5.
As the multiplication size N in privacy amplification is over $10^6$, the multiplication algorithm for large sizes, i.e., the Schönhage and Strassen algorithm, is analyzed in particular.
The main principle of the Schönhage and Strassen algorithm is to use the FFT to transform polynomial multiplication into pointwise multiplication. The FFT in number theory, the NTT, is adopted in GMP to eliminate the influence of truncation error. The procedure of the algorithm contains seven steps: fill zero, split, evaluate, pointwise multiply, interpolate, combine and carry bit. The size $N'$ of each pointwise multiplication may still be very large, so GMP multiplication is used again, and the Schönhage and Strassen, Toom-Cook or other algorithms are selected based on the length $N'$.
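The seven steps can be illustrated with a small transform-based multiplication. This sketch is not GMP's implementation and not the Schönhage and Strassen algorithm proper: it substitutes a textbook radix-2 NTT over the prime field modulo 998244353 for the ring arithmetic that algorithm uses, and it is limited to operands of roughly $10^5$ bits. All names are hypothetical.

```python
P = 998244353      # NTT-friendly prime: P - 1 = 119 * 2**23
ROOT = 3           # primitive root modulo P


def ntt(values, invert=False):
    """Iterative radix-2 number theoretic transform over the integers mod P."""
    a = list(values)
    n = len(a)
    j = 0
    for i in range(1, n):                  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)   # inverse root for interpolation
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def ntt_multiply(a, b, chunk_bits=8):
    """Multiply non-negative integers a and b via split / NTT / pointwise product."""
    pieces = max(1, -(-max(a.bit_length(), b.bit_length()) // chunk_bits))
    if pieces * ((1 << chunk_bits) - 1) ** 2 >= P:
        raise ValueError("operands too large for this toy prime field")
    size = 1
    while size < 2 * pieces:               # room for the full linear convolution
        size <<= 1
    mask = (1 << chunk_bits) - 1
    # Fill zero and split: high chunks beyond the operand are zero.
    fa = [(a >> (chunk_bits * i)) & mask for i in range(size)]
    fb = [(b >> (chunk_bits * i)) & mask for i in range(size)]
    fa, fb = ntt(fa), ntt(fb)                       # evaluate
    fc = [x * y % P for x, y in zip(fa, fb)]        # pointwise multiply
    coeffs = ntt(fc, invert=True)                   # interpolate
    result = 0
    for i, coeff in enumerate(coeffs):              # combine; the big-integer
        result += coeff << (chunk_bits * i)         # addition propagates carries
    return result


if __name__ == "__main__":
    x, y = 3 ** 2500 + 17, 7 ** 1300 + 5
    print(ntt_multiply(x, y) == x * y)
```

Because the transform works with exact modular integers, no rounding occurs, which is the property the text attributes to using the NTT instead of a floating-point FFT.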
In the implementation, the above algorithm can be used by calling the function mpz_mul. The function mpz_addmul is adopted in this scheme instead, because it accomplishes one multiplication and one addition in a single call, which is more efficient for implementing $y = cx + d$ in this PA scheme.
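GMP's mpz_addmul adds the product of two operands to its result operand in place. A minimal Python analogue of using that accumulate form for $y = cx + d$ is sketched below; the helper names `addmul` and `affine_map` are hypothetical, and Python integers again stand in for mpz values.

```python
def addmul(rop, op1, op2):
    """Return rop + op1 * op2, mirroring the accumulate semantics of mpz_addmul."""
    return rop + op1 * op2


def affine_map(c, x, d):
    """Compute y = c*x + d by starting from d and accumulating the product."""
    y = d                    # initialise the result with the addend d
    y = addmul(y, c, x)      # y = d + c * x in one accumulate step
    return y


if __name__ == "__main__":
    print(affine_map(103, 181, 26))
```

Starting the accumulator at d avoids a separate temporary for the product followed by a second pass for the addition.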

C. Modulo $2^n$
In this PA scheme, a reduction modulo $2^n$ is necessary. Unlike a general modular reduction, which needs a division operation, reduction modulo $2^n$ can be accomplished by bitwise operations alone. Therefore, a specialized function named mpz_mod_2exp is adopted instead of mpz_mod to accelerate the PA process.
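The bitwise shortcut can be stated in one line: for a non-negative integer, the remainder modulo $2^n$ is its low n bits. A minimal sketch, with the hypothetical helper name `mod_pow2`:

```python
def mod_pow2(value, n):
    """Reduce a non-negative integer modulo 2^n by masking its low n bits."""
    return value & ((1 << n) - 1)


if __name__ == "__main__":
    y = 103 * 181 + 26
    print(mod_pow2(y, 8), y % 2 ** 8)   # both forms give the same remainder
```

The mask only has to look at the lowest n bits of the operand, whereas a general remainder has to perform a full multi-precision division.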

D. Data Export
Similar to data import, data export transforms the large integer result of type mpz into an array of type unsigned long with the function mpz_export.
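As a rough analogue of the export step, the sketch below splits a Python integer into fixed-size words, most significant word first. The 8-byte word size assumes a 64-bit unsigned long, which is platform dependent; the helper name `int_to_words` is hypothetical.

```python
def int_to_words(value, word_bytes=8):
    """Split a non-negative integer into fixed-size words, most significant first."""
    word_bits = 8 * word_bytes
    count = max(1, -(-value.bit_length() // word_bits))   # ceiling division
    raw = value.to_bytes(count * word_bytes, byteorder="big")
    return [
        int.from_bytes(raw[i:i + word_bytes], byteorder="big")
        for i in range(0, len(raw), word_bytes)
    ]


if __name__ == "__main__":
    print(int_to_words(2 ** 64 + 5))
```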

E. Export r bits final key
In this step the unsigned long array is transformed into a bit string, and only the most significant r bits of the bit string are exported. The value of r is calculated from the security analysis presented by D. Gottesman et al. [23] based on the practical system parameters.
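Putting the five steps together, an end-to-end toy version of the procedure might look as follows. This is a sketch under stated assumptions, not the authors' implementation: Python integers replace mpz values, c and d are assumed to be n-bit values chosen elsewhere with c odd, and r is simply passed in rather than derived from [23]. The function name `privacy_amplify` is hypothetical.

```python
def privacy_amplify(key_bits, c, d, r):
    """Toy modular arithmetic hash PA: n input bits in, r output bits out.

    Assumes 1 <= r <= len(key_bits); c and d are n-bit integers chosen elsewhere.
    """
    n = len(key_bits)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    # A. Data import: bit string (MSB first) to one large integer.
    x = int("".join(str(b) for b in key_bits), 2)
    # B. Multiplication and addition: y = c*x + d.
    y = c * x + d
    # C. Modulo 2^n by masking the low n bits.
    y &= (1 << n) - 1
    # D/E. Export: keep only the most significant r of the n bits.
    top = y >> (n - r)
    return [int(ch) for ch in format(top, "0{}b".format(r))]


if __name__ == "__main__":
    print(privacy_amplify([1, 0, 1, 1, 0, 1, 0, 1], c=103, d=26, r=3))
```

For the 8-bit input 10110101 (181) with c = 103, d = 26 and r = 3, the intermediate values are $cx + d = 18669$ and $18669 \bmod 256 = 237$, which is 11101101 in binary, so the three most significant bits 111 form the output.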

IV. RESULTS
The $G_{c,d}$ hash PA scheme above is implemented on two different CPU platforms. The implementation on an Intel i9 CPU tests the maximum throughput of the scheme. The implementation on an Intel i3 CPU tests the scheme under limited resources. These two implementations and three existing high-speed implementations are listed in Table I together with their platform parameters. The throughput of the two implementations is tested and compared with the existing implementations at the usual block sizes from $10^6$ to $10^8$. The result is shown in Fig. 6.
The throughput of scheme (1) reaches 262.13 Mbps at size $10^6$, 177.8 Mbps at size $10^7$ and 140.96 Mbps at size $10^8$. These results indicate that our scheme achieves the maximum throughput of existing PA schemes on CPU, even better than the schemes on hardware platforms. The throughput of scheme (2) reaches 135.56 Mbps at size $10^6$, 85.95 Mbps at size $10^7$ and 69.57 Mbps at size $10^8$. These results indicate that our scheme is still efficient under limited resources. The comparison of schemes (1), (2) with schemes (3), (4) indicates that the modular arithmetic based $G_{c,d}$ hash can match or even outperform the matrix multiplication based Toeplitz hash in PA. The comparison of scheme (1)(2) and scheme (5) results

V. CONCLUSION
The existing PA schemes in QKD are analyzed in this paper. Targeting the problem that Toeplitz-hash PA schemes are unable to satisfy the throughput demand of developing QKD systems, we focus on the modular arithmetic hash function to design a PA scheme. A $G_{c,d}$ hash privacy amplification scheme using GMP is then presented. This scheme is implemented and tested on the CPU platform. The results indicate that this scheme 1) reaches, to our knowledge, the maximum throughput of existing PA schemes on the CPU platform, even exceeding that of hardware PA schemes; 2) verifies that the modular arithmetic hash, and not only the matrix multiplication hash, is suited to high-speed PA; 3) verifies the high efficiency of using the GMP library in modular arithmetic hash PA.
Alice and Bob compress the string X to generate the final key Y, i.e., $Y = g(X)$. If the final key Y satisfies the inequality in (1), the final key Y is ε-secure. $d(Y \mid Z)$ means the non-uniformity of Y, i.e., the trace distance between Y and a uniformly distributed random string U under the condition of Z. $\{g\} \otimes \rho$ means the joint information formed by the quantum information ρ owned by Eve and the compression function g. This procedure is shown in Fig. 2. It has been proved that a universal2 hash function family can serve as the compression function in PA. According to the conclusion of composable security, the compression rate R of the final key length r to the input key length n must satisfy the condition in (2).

Fig. 2. The procedure of privacy amplification in QKD

Definition 2. Let G be a class of functions from A to B, and let $|A|$ and $|B|$ denote the number of elements in A and B. The hash function family G is universal2 if for any $x, y \in A$ with $x \ne y$, $\delta_G(x, y) \le |G| / |B|$. The $G_{c,d}$ hash family is a universal2 hash family based on modular arithmetic. The definition of $G_{c,d}$ is as follows. Definition 3. If the input size of $G_{c,d}$

Algorithm: Schönhage and Strassen algorithm. Input: a (N bits) and b (N bits); Output: $c = a \times b$.

Fig. 5. The algorithm selection based on the size N

Fig. 6. The throughput comparison of PA schemes

TABLE I. KEY PARAMETERS OF EXISTING SCHEMES