Overflow-Detectable Floating-Point Fully Homomorphic Encryption

A floating-point fully homomorphic encryption (FPFHE) scheme is proposed, which is based on torus fully homomorphic encryption (TFHE) equipped with programmable bootstrapping. Specifically, FPFHE for 32-bit and 64-bit floating-point messages is implemented, the latter showing state-of-the-art precision among FHEs. Also, a ciphertext is constructed for checking whether an overflow has occurred while evaluating arithmetic circuits with the proposed FPFHE, which is useful when the message space or arithmetic circuit is too complex to estimate a bound on the outputs, as in some deep learning applications. In addition, homomorphic algorithms that are crucial components of overflow-detectable (OD)-FPFHE are constructed. First, a state-of-the-art bootstrapping method of TFHE is extended to bootstrap larger messages by using an NTT-friendly integer modulus. Second, a subgaussian analysis method is proposed without assuming the independent heuristic on AP/GINX-bootstrapping, even if deterministic gadget decomposition is used. Third, the blind rotation algorithm of TFHE is modified such that any secret key having finite non-zero values can be used while keeping the number of NTT operations the same as when a binary key is used. Fourth, various homomorphic algorithms are proposed, such as evaluating min and max, lifting a constant message to the monomial exponent, counting the number of consecutive zeros from the most significant digit of the fraction, and performing carryover after homomorphic operations on floating-point numbers. Finally, 32-bit and 64-bit OD-FPFHEs are implemented, and simulation results are provided to confirm that they work well even for extreme cases. It is also verified that homomorphic overflow detection operates correctly.


I. INTRODUCTION
Since Gentry's seminal work on fully homomorphic encryption (FHE) [2], various FHEs such as BGV/FV [3], FHEW/TFHE [4], [5], and CKKS [6] have been proposed and intensively studied. FHE is a powerful methodology for evaluating arithmetic circuits while keeping the privacy of data. As applications of FHE with boolean circuits, private information retrieval (PIR) [7], [8] and private set intersection (PSI) [9] have been studied. Also, homomorphic evaluation of deep learning models [10] has mostly been studied by using CKKS. Since CKKS deals with a ciphertext packed with a large number of messages, CKKS is the most suitable for evaluating circuits with extensive parallel data.

The associate editor coordinating the review of this manuscript and approving it for publication was Sedat Akleylek.
complex and accurate operations such as calculating satellite collision probabilities [12]. Therefore, an accurate floating-point FHE (FPFHE) is required, which can achieve results almost identical to those of the corresponding plaintext operations.
Furthermore, the efficiency of the floating-point and fixed-point number systems varies for each application. The Minkowski message space is a fixed-point number system, which is an effective number system for computations when the range of input data is limited (e.g. when data are aligned between 0 and 1). However, in cases where the size of a user's input data is unknown, the floating-point number system is the efficient choice since it offers a broader range of representable numbers and bounds the relative error of the results. In other words, for achieving a general-purpose FHE system, research in floating-point FHE is essential.
Moreover, overflow occurrences are another problem. For example, the demand for verification and validation testing of streaming data has increased [13]. Apparently, if the server homomorphically manipulates ciphertexts of data having an unbounded size, the results may contain overflows. As another example, evaluating a deep arithmetic circuit may return irrelevant results when any of the intermediate results takes a value outside the message space.
However, in contrast to evaluation with plaintexts, such an overflow cannot be detected by a user in the ciphertext domain. If the range of the circuit output is not bounded or the input dimension is too large, overflow occurs frequently, and thus it cannot be guaranteed that a given input does not cause an overflow. Also, when past data is used to update the circuit, e.g. in privacy-preserving federated learning [8], inaccurate updates using meaningless values caused by overflow ruin the circuit performance. However, to the best of our knowledge, an overflow detection method for FHE has not been proposed.

B. CONTRIBUTIONS
Our main contributions are divided into two parts. First, we propose FPFHE with homomorphic normalization for the first time, which effectively resolves the error problem arising from encoding floating-point numbers and makes every operation on ciphertexts with FPFHE synchronized with the corresponding operation on plaintexts from the floating-point message space. Moreover, we implement FPFHE with single (32-bit) and double (64-bit) precision, which shows a much wider range of numbers (up to $2^{127}$ and $2^{1023}$, respectively) and exact significant digits. Note that the message space and operations of CKKS only guarantee fixed-point precision.
Second, an effective overflow-detection (OD) method with the proposed FPFHE is constructed, which can check whether an overflow occurs during homomorphic operations. By properly combining these two schemes, we construct OD-FPFHE.
Also, we propose homomorphic algorithms to handle the following technical issues, which are important components of OD-FPFHE and can also be used for other FHEs.
• Sequential bootstrapping on shared primes: FHE requires bootstrapping for reducing the amplified error from homomorphic operations. We extend the bootstrapping method in [1] to bootstrap a large number of message bits by using integers modulo shared primes $Q$, which is number theoretic transform (NTT)-friendly. This algorithm enables the proposed FPFHE to bootstrap more message bits by using NTT, which is a solution for removing the errors generated by the fast Fourier transform (FFT), listed as an open problem in [1].
• Modified blind rotation for GINX-bootstrapping: This algorithm keeps the number of NTT operations the same for any secret key having finite non-zero values and hence improves running time compared to the state-of-the-art GINX-bootstrapping [5].
• Error analysis without independent heuristic: This analysis is applicable even when deterministic gadget decomposition is used and makes it possible to choose small lattice parameters for enhancing running time.
• Various non-linear homomorphic algorithms: We propose various homomorphic algorithms such as evaluating min and max, lifting a constant message to the monomial exponent, counting the number of consecutive zeros from the most significant digit of the fraction of a floating-point message until a non-zero value occurs, and performing carryover after homomorphic operations. Note that they are run by using sequential bootstrapping.

C. RELATED WORKS
To the best of our knowledge, [14] suggests homomorphic operations on an approximated rational number; however, it is not an exact implementation of the floating-point number system. Reference [15] also implements a floating-point FHE. However, since they do not normalize the results after homomorphic operations (see Section II-F), the operation error may grow rapidly after consecutive homomorphic operations. Moreover, they suffer from slow operation time because they simply add floating-point operations directly on top of existing FHE schemes that use gate operations, such as TFHE and BGV/FV.

II. PRELIMINARIES
This section introduces the mathematical background and some fully homomorphic encryption schemes. The notation and main references for the algebraic and statistical background follow [11], [16], and [17].

A. NOTATION
Let $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{C}$ denote the sets of natural, integer, rational, and complex numbers, respectively. Let $\mathbb{Z}_q \cong \mathbb{Z}/q\mathbb{Z}$ be the integer ring $\mathbb{Z}$ modulo $q\mathbb{Z}$ for some $q \in \mathbb{N}$, and let $\mathbb{Z}_q[X]$ be the polynomial ring $\mathbb{Z}[X]$ modulo $q\mathbb{Z}[X]$. We use $[n]$ to denote the index set $\{0, 1, \ldots, n-1\}$ for $n \in \mathbb{N}$. More generally, $[n_1, n_2, \ldots, n_m] = \prod_{i=1}^{m} [n_i]$ is used as a product index set for $n_1, \ldots, n_m \in \mathbb{N}$.
We use boldface $\mathbf{a} = (a_i)_{i \in [n]}$ to denote a row vector where $a_i$ is the $i$-th element of $\mathbf{a}$. Analogously, for any polynomial $a(X) \in \mathbb{Z}[X]$, we use $a_i$ to denote the coefficient of $X^i$ in $a(X)$. For a polynomial vector $\mathbf{a}(X)$, $a_i(X)$ denotes the $i$-th polynomial of $\mathbf{a}(X)$ and $a_{i,j}$ is the $j$-th coefficient of $a_i(X)$.
As a magnitude of an element, we always use We use the following index function $\theta$ for mapping 2-D indices of the lower triangular part of a matrix into integers such as which is used for indexing tensor product keys. Note that $\theta^{-1}$ denotes its inverse function, and $\theta^{-1}$ ). We represent a non-negative rational number $x$ by $(x_n . x_{n-1} \ldots x_0)_{(\beta)}$, where $0 \le x_0, \ldots, x_n < \beta$ are non-negative integers and $\beta$ is called a radix. We use Big Oh and Omega notation as $O(\cdot)$ and $\Omega(\cdot)$, respectively.
having the property that $a_i$ and $b_i$ for all $i \in [N]$ appear only once in the expression of each coefficient of $a(X)b(X)$.
For the above algebraic structure, there are two types of Chinese remainder theorem (CRT). First, if $Q$ is a prime number satisfying $2N \mid (Q-1)$, then $\mathbb{Z}_Q$ has a primitive $2N$-th root of unity $\zeta \in \mathbb{Z}_Q$ such that there is an isomorphism by the CRT as follows: N with relatively prime integers $Q_i$, there exists the following canonical isomorphism: In this paper, a particular pair of primes is used. Let two NTT-friendly primes $Q_0$ and $Q_1$ be called shared primes if they share the same scaling factor $\nu \in \mathbb{N}$ such that $Q_0 = \nu 2^{\eta_0} + 1$ and $Q_1 = \nu 2^{\eta_1} + 1$. Note that the product of shared primes is used as an integer modulus $Q$ as in Section III-B.
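As an illustration of the shared-prime definition, the following minimal sketch searches for the smallest $\nu$ such that $\nu 2^{\eta_0} + 1$ and $\nu 2^{\eta_1} + 1$ are both prime and both satisfy $2N \mid (Q - 1)$. The function names, the bound `nu_max`, and the use of trial division are assumptions made for this toy example; they are not the search procedure used for the parameters of this paper.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; adequate only for toy-sized n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def find_shared_primes(N, eta0, eta1, nu_max=1000):
    """Return (nu, Q0, Q1) for the smallest nu <= nu_max such that
    Q0 = nu*2^eta0 + 1 and Q1 = nu*2^eta1 + 1 are both prime and
    both satisfy 2N | (Q - 1); return None if no such nu is found."""
    for nu in range(1, nu_max + 1):
        q0 = nu * (1 << eta0) + 1
        q1 = nu * (1 << eta1) + 1
        # NTT-friendliness: a primitive 2N-th root of unity must exist.
        if (q0 - 1) % (2 * N) or (q1 - 1) % (2 * N):
            continue
        if is_prime(q0) and is_prime(q1):
            return nu, q0, q1
    return None
```

Calling `find_shared_primes(1024, 18, 12)` returns a triple whose two moduli share the same factor $\nu$; realistic parameters (for example the near 40-bit primes mentioned later) would need a faster primality test than trial division.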

C. STATISTICAL BACKGROUND
Let $(\Omega, \mathcal{F}, P)$ be an ambient probability space and $X : \Omega \to \mathbb{R}$ be a random variable. When the co-domain of $X$ is defined over $\mathbb{Z}_Q$, we always define $X$ as a function Next, we briefly introduce the subgaussian random variable and its properties.
Definition 1 (Subgaussian [16]). A random variable $X$ is called a subgaussian random variable with a standard parameter $\sigma \ge 0$, denoted as $X \sim \mathrm{subG}(\sigma)$, if $E[X] = 0$ and the moment generating function $M_X(t)$ is bounded as follows:
Proposition 1 (Error bound [16]). For any random variable $X \sim \mathrm{subG}(\sigma$ It is also known that the space of subgaussians forms an $\mathbb{R}$-vector space with properties: ). Since (iii) is the best analytical result for the sum of two subgaussian random variables in terms of minimizing variance, we will focus on the conditions for ) and the conditions for satisfying Pythagorean additivity have been investigated as follows.
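For orientation, the usual textbook forms of the statements referenced above can be sketched as follows. These are the standard definitions and properties from the subgaussian literature; that they coincide with the displayed equations of Definition 1 and Proposition 1, and that the labels (i) to (iii) are assigned in this order, is an assumption.

```latex
\begin{align*}
&\text{MGF bound:} && M_X(t) = E\!\left[e^{tX}\right] \le \exp\!\left(\tfrac{\sigma^2 t^2}{2}\right) \quad \text{for all } t \in \mathbb{R},\\
&\text{Tail bound:} && P\left[\,|X| > t\,\right] \le 2\exp\!\left(-\tfrac{t^2}{2\sigma^2}\right) \quad \text{for all } t > 0,\\
&\text{(i) Homogeneity:} && cX \sim \mathrm{subG}(|c|\sigma) \quad \text{for } c \in \mathbb{R},\\
&\text{(ii) General sum:} && X_1 + X_2 \sim \mathrm{subG}(\sigma_1 + \sigma_2),\\
&\text{(iii) Pythagorean additivity:} && X_1 + X_2 \sim \mathrm{subG}\!\left(\sqrt{\sigma_1^2 + \sigma_2^2}\right) \quad \text{for independent } X_1, X_2.
\end{align*}
```

Property (ii) holds without any independence assumption, which is why (iii) gives the smaller parameter whenever it is available.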

Corollary 1. Let $Y_i \sim \mathrm{subG}(B_Y)$ be mutually independent $B_Y$-bounded random variables for all $i \in [n]$, and let $X_i \sim \mathrm{subG}(B_X)$ be $B_X$-bounded random variables for all $i \in [n]$, where $X_i$ depends only on $X_j$ and $Y_j$ for all $j < i$. Then $X_i Y_i$ for all $i \in [n]$ have Pythagorean additivity.
Corollary 1 ensures that although all $X_i Y_i$'s are dependent on each other, they can have Pythagorean additivity, similar to the case when the $X_i Y_i$'s are mutually independent. This is useful for analyzing a sum of a large number of dependent random variables, e.g. analyzing an error bound after bootstrapping in FHE. However, Corollary 1 implies that $X_i$ and $Y_i$ must be bounded with zero mean since $X_i$ and $Y_i$ are subgaussian. A similar analysis under more relaxed conditions is performed in Section IV-A.
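The effect of Pythagorean additivity for dependent products can be checked numerically on a toy case. The sketch below computes the exact moment generating function of $\sum_i X_i Y_i$ by enumerating all sign patterns of i.i.d. uniform $Y_i \in \{-1, +1\}$, with each $X_i$ computed from the earlier pairs. The dependence rule `example_next_x` is hypothetical, and its $X_i$ are bounded by 1 but not zero-mean, so the setting is closer to the relaxed conditions of Section IV-A than to Corollary 1 itself.

```python
import itertools
import math


def mgf_of_dependent_sum(n, t, next_x):
    """Exact E[exp(t * sum_i X_i * Y_i)] for i.i.d. uniform Y_i in {-1, +1},
    where X_i = next_x(history) is computed from the earlier (X_j, Y_j) pairs."""
    total = 0.0
    for ys in itertools.product((-1, 1), repeat=n):
        history, s = [], 0.0
        for y in ys:
            x = next_x(history)      # X_i depends only on the past
            s += x * y
            history.append((x, y))
        total += math.exp(t * s)
    return total / 2 ** n


def example_next_x(history):
    """Hypothetical dependence rule with |X_i| <= 1."""
    if not history:
        return 1.0
    x_prev, y_prev = history[-1]
    return 1.0 if x_prev * y_prev > 0 else -0.5
```

With $B_X = B_Y = 1$, the computed value stays below $\exp(n t^2 / 2)$, i.e. the sum behaves like a subgaussian with parameter $\sqrt{n}$ rather than the $n$ that the general sum rule would give.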

D. LWE/MLWE SYMMETRIC ENCRYPTION AND GADGET DECOMPOSITION
In this section, widely used lattice-based cryptosystems are introduced. First, we recall LWE symmetric encryption [19]. For given $q, n, t \in \mathbb{N}$, a $t$-bit message space $\mathbb{Z}_{2^t}$, a scaling factor $1 \le \Delta \le \lfloor q/2^t \rceil$, and a secret key $\mathbf{s} = (-s_1, \ldots, -s_n, 1) \in \mathbb{Z}_q^{n+1}$, a ciphertext for the message $m \in \mathbb{Z}_{2^t}$ is obtained as follows: where $m$ is chosen from $\{0, \ldots, 2^t - 1\}$, $T$ denotes the transpose of a matrix, $\mathbf{a} \in \mathbb{Z}_q^n$ is chosen uniformly at random, $e$ is sampled from a centered discrete Gaussian distribution $\chi_e$ on $\mathbb{Z}$ with standard deviation $\sigma$, and $\mathbf{s}$ is sampled from a distribution $\chi_s$. We adopt a ternary secret key $\mathbf{s}$, which is widely used for FHE [20]. In addition, $\mathbf{s}$ is called $h$-sparse if the number of non-zero elements of $\mathbf{s}$ is $h$.
The phase function $\phi_s$ and the decryption function $\varphi_s$ of $\mathrm{ct}_s$ are defined as: Therefore, if $|e| < \Delta/2$, then the message $m$ is correctly extracted by the decryption function $\varphi_s$ and we call such $\mathrm{ct}_s[\Delta m]$ a valid ciphertext.
If the structure $\mathbb{Z}_q^{(n+1) \times 1}$ of the LWE ciphertext in (4) is replaced by $R_{N,q}$, such a ciphertext is called a module-LWE (MLWE) ciphertext $\mathrm{CT}_{s(X)}[\Delta m(X)]$ for the message $m(X) \in R_{N,q}$ encrypted by the secret key $\mathbf{s}(X) \in R_{N,q}^{K}$. In addition, we use a generalized sample extraction for MLWE [1], [4], which generates LWE ciphertexts $\mathrm{ct}[\Delta m_i]$ from the MLWE ciphertext $\mathrm{CT}[\Delta m(X)]$ as follows: where is a valid LWE ciphertext for the secret key $\mathbf{s}' = (s_{0,0}, \ldots, s_{0,N-1}, s_{1,0}, \ldots, s_{K-1,N-1})$. In this paper, LWE and MLWE ciphertexts are denoted as ct and CT if they are clear from the context.
For a given N , a ciphertext ct is called squashed if the integer modulus used for ct is 2N .A squashed ct is used to explain FHEW/TFHE in Section II-E.
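The LWE encryption, phase, and decryption described above can be illustrated with a toy sketch. It assumes the common form $b = \langle \mathbf{a}, \mathbf{s} \rangle + \Delta m + e \bmod q$ with $\Delta = q/2^t$ for a power-of-two $q$, and uses a rounded continuous Gaussian in place of $\chi_e$. All function names and parameter values are illustrative, and the parameters are far too small to be secure.

```python
import random


def keygen(n, rng):
    """Ternary secret key with entries in {-1, 0, 1}."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]


def encrypt(m, s, q, t, sigma, rng):
    """Return (a, b) with b = <a, s> + delta*m + e (mod q)."""
    delta = q // (1 << t)              # scaling factor, assumed q / 2^t
    a = [rng.randrange(q) for _ in s]
    e = round(rng.gauss(0.0, sigma))   # rounded Gaussian stands in for chi_e
    b = (sum(ai * si for ai, si in zip(a, s)) + delta * m + e) % q
    return a, b


def phase(ct, s, q):
    """b - <a, s> mod q, which equals delta*m + e."""
    a, b = ct
    return (b - sum(ai * si for ai, si in zip(a, s))) % q


def decrypt(ct, s, q, t):
    """Round the phase to the nearest multiple of delta."""
    delta = q // (1 << t)
    return ((phase(ct, s, q) + delta // 2) // delta) % (1 << t)
```

Decryption succeeds exactly when $|e| < \Delta/2$, which is the validity condition stated above; for instance, with $q = 2^{16}$ and $t = 4$ the error may be as large as 2047 before rounding fails.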

1) GADGET DECOMPOSITION
Next, an approximated gadget for decomposing LWE and MLWE ciphertexts is introduced, which is used for constructing the GSW cryptosystem.
Definition 2 (Gadget [21]). For any finite additive group $R$, an $R$-gadget of size $l$, quality $\rho$, and precision $\epsilon$ is a vector $\mathbf{g} \in R^l$ such that any element $u \in R$ can be written as an approximated integer combination $\sum_i g_i \cdot x_i$ such that $\max |x_i| \le \rho$ for all $x_i \in \mathbb{Z}$ and the gadget error
Proposition 2 (Deterministic (signed) gadget decomposition [4], [5]). Assume that two finite additive groups $R_{N,Q}$ and $\mathbb{Z}_Q$, and gadget parameters $B \in \mathbb{N}$ and l ∈ N are given such that B l−2 ≤ Q < B l−1 . Then, there exists a gadget g= (B l−l , . . ., B l−1 ) and the deterministic gadget decomposition Q with size l = l − 1, quality $\rho = \lceil B/2 \rceil$ and precision ϵ = ⌈ l−l i=1 B i /2⌉. Note that if $\epsilon = 0$, i.e., l = l, it is an exact gadget decomposition and if l < l, it is an approximated gadget decomposition.
Next, the module GSW (MGSW) cryptosystem [22] and the external product of MGSW and MLWE ciphertexts are introduced based on gadget decomposition.
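To make the idea of a signed gadget decomposition concrete, the following sketch decomposes an element of $\mathbb{Z}_Q$ into balanced base-$B$ digits. It covers only the exact case ($\epsilon = 0$) and is a generic balanced-digit routine, not the decomposition of Proposition 2; the function name and the sufficient condition $B^l \ge 2Q$ for an even base $B \ge 4$ are assumptions of this example.

```python
def signed_decompose(u, B, l, Q):
    """Digits x_0..x_{l-1} in [-B/2, B/2) with sum_i x_i * B^i == u (mod Q).
    Assumes an even base B >= 4 and B^l >= 2Q so that l digits suffice."""
    u %= Q
    if u > Q // 2:          # centre u into (-Q/2, Q/2]
        u -= Q
    digits = []
    for _ in range(l):
        d = u % B
        if d >= B // 2:     # move the digit into [-B/2, B/2)
            d -= B
        digits.append(d)
        u = (u - d) // B    # exact division: u - d is a multiple of B
    if u != 0:
        raise ValueError("not enough digits for this modulus")
    return digits
```

The quality of this toy gadget is $\rho = B/2$, since every digit has magnitude at most $B/2$. An approximated decomposition would drop the least significant digits and accept a bounded gadget error in exchange for fewer terms in the external product.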

E. FULLY HOMOMORPHIC ENCRYPTIONS USED FOR CONSTRUCTING OD-FPFHE: (M)GSW, FHEW, AND TFHE
First, the (M)GSW cryptosystem is briefly introduced. Let $I_n \in R_{N,Q}^{n \times n}$ be the identity matrix and $\otimes$ denote the Kronecker product. Then for a message $m \in \{0, 1\}$, MLWE secret key $\mathbf{s}(X)$, and in matrix form is obtained by using MLWE ciphertexts as follows [5], [22]: where $\mathbf{S}'(X) = (I_{K+1} \otimes \mathbf{g})\mathbf{s}(X)$. The external product for any given two ciphertexts $\mathrm{CT}_{s(X)}[m_1(X)]$ and $\mathrm{GCT}_{s(X)}[m_2]$ is defined as: where $G^{-1}$ is a gadget decomposition algorithm given in Proposition 2. The phase of the external product of CT and GCT is known as follows [4]: where $e(X)$ is the noise in GCT, $e'(X)$ is the noise in CT, and $A(X)$ is the gadget error from Definition 2.
Since $|e_i(X)|$ and $|s_i(X)|$ are small, the validity of the external product of CT and GCT depends on $\rho$ and $\epsilon$ of the gadget decomposition.
For a given LWE ciphertext ct and an evaluating function $f$, bootstrapping in FHEW and TFHE performs the following operations: (i) the modulus of ct is reduced to $2N$ so that the valid squashed $\mathrm{ct}'[\Delta m]$ for $m = (m_{t-1}, \ldots, m_0)_{(2)}$ is obtained [5]; (ii) BlindRotate [4] is run with the accumulate polynomial $\mathrm{ACCPoly}(X) \in R_{N,Q}$, which is converted from the look-up table $f$ ((14) in Section IV-B), and then the following MLWE ciphertext CT is obtained (iii) the CT in (8) is extracted to obtain a valid LWE ciphertext $\mathrm{ct}''$ for the constant message of CT by using SampleExtract(CT, 0); (iv) the secret key of $\mathrm{ct}''$ is switched to the secret key of ct by using KeySwitch [4]. Therefore, if $m_{t-1}$ controls the sign of the look-up table, or if $m_{t-1}$ is zero, then (8) is correct.
Recently, programmable bootstrapping (PBS) was proposed [23], which can evaluate any look-up table. Moreover, without-padding PBS (WoP-PBS) was proposed to make the above bootstrapping correct even when $m_{t-1} = 1$. In this paper, WoP-PBS is applied to $R_{N,Q}$ when $Q = Q_0 Q_1$ with shared primes $Q_0$ and $Q_1$, as explained in Section IV-C.
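The role of the most significant bit $m_{t-1}$ can be seen in a plaintext simulation of the rotation step. The sketch below works on cleartext polynomials in $\mathbb{Z}[X]/(X^N + 1)$ and involves no encryption; the look-up-table layout in `build_acc_poly`, the half-window offset, and all names are assumptions chosen for illustration and do not reproduce (14).

```python
def negacyclic_rotate(poly, k):
    """Multiply poly(X) by X^k in Z[X]/(X^N + 1), where N = len(poly)."""
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        j = (i + k) % (2 * n)          # X^(2N) = 1
        if j >= n:                     # X^N = -1 flips the sign
            out[j - n] = -c
        else:
            out[j] = c
    return out


def build_acc_poly(f, N, t):
    """Toy accumulator: coefficient j holds f(j // w), w = 2N / 2^t."""
    w = 2 * N // (1 << t)
    return [f(j // w) for j in range(N)]


def bootstrap_plain(m, e, f, N, t):
    """Rotate by the noisy squashed phase and read the constant coefficient."""
    w = 2 * N // (1 << t)
    noisy_phase = (m * w + e + w // 2) % (2 * N)   # half-window offset absorbs |e| < w/2
    return negacyclic_rotate(build_acc_poly(f, N, t), -noisy_phase)[0]
```

When $m_{t-1} = 0$ the constant coefficient equals $f(m)$, and when $m_{t-1} = 1$ the negacyclic wrap-around returns $-f(m - 2^{t-1})$ instead. This is the sign behaviour that padding, or WoP-PBS, has to deal with.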

F. INTRODUCTION TO FLOATING-POINT NUMBER SYSTEMS
A floating-point number system can be defined as follows:
Definition 3 (Floating-point number system [24]). A floating-point number system is defined by four integers: (i) a radix $\beta \ge 2$; (ii) a precision $p \ge 2$; (iii) two extreme exponents $e_{\min}$ and $e_{\max}$ with $e_{\min} < e_{\max}$. Then a real number $x$ is called a floating-point number if $x$ has at least one representation $(s, m, e)$ such that
$$x = s \cdot m \cdot \beta^{e}, \tag{9}$$
where $s \in \{1, -1\}$ is the sign of $x$, $m = (m_{p-1} . m_{p-2} \ldots m_0)_{(\beta)}$ is a rational number satisfying $0 \le m < \beta$, and $e$ is an integer satisfying $e_{\min} \le e \le e_{\max}$. It is denoted as the $(\beta, p, e_{\min}, e_{\max})$ floating-point number system.
We call $m$ and $e$ in (9) the fraction and exponent, respectively. Since Definition 3 does not guarantee uniqueness of $(s, m, e)$, the normal form is defined as:
Definition 4 (Normal form [24]). For the $(\beta, p, e_{\min}, e_{\max})$ floating-point number system, $(s, m, e)$ of $x \in \mathbb{R}$ is called a normal form if $1 \le m < \beta$ or if $e = e_{\min}$ with $0 \le m < 1$.
If $x \ge \beta^{e_{\min}}$, we can choose the unique fraction $(m_{p-1} . m_{p-2} \ldots m_1 m_0)_{(\beta)}$ with $m_{p-1} \ge 1$, and otherwise, we can uniquely choose $m$ by setting $e = e_{\min}$.
The result of a floating-point operation should be rounded to a nearby floating-point number, and various rounding algorithms have been studied [24]. In this paper, the round-toward-zero (RZ) method is used [24] because it is easily implemented by rounding down when the calculated number is represented in normal form. Let RZ be a function from any real number $x$ to a floating-point number such that $\mathrm{RZ}(x)$ rounds down if $x \ge 0$ and rounds up if $x < 0$ [24]. Then a tight bound on the rounding error is known as follows:
Proposition 3 (Error bound [24]). Let $\top \in \{+, -, \cdot, /\}$ be an arithmetic operation. If $x, y \in \mathbb{R}$ satisfy $\beta^{e_{\min}} \le |x \top y| \le (\beta - \beta^{1-p})\beta^{e_{\max}}$, then the following inequality holds
If there is no overflow, Proposition 3 gives an error bound after the floating-point operation $\top$ and RZ. However, if the normal form is not used, RZ cannot be implemented by rounding down at the $p$-digit point of the fraction $m$, and hence a precision degradation occurs, meaning that Proposition 3 cannot be applied. Consider $x = 0.1 \cdot 10^1$ in the $(10, 2, 0, 2)$ floating-point number system. If $x^2 = 0.01 \cdot 10^2$ is rounded down on the second digit, the result is 0 but $\mathrm{RZ}(x^2) = 1$, and hence the error is 1, which is larger than $2^{-1}$ calculated by (10).
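The example above can be reproduced with exact rational arithmetic. The sketch below truncates a fraction to its $p$ leading radix-$\beta$ digits and compares the normal-form and non-normal-form representations of $x^2 = 1$ in the $(10, 2, 0, 2)$ system; the helper names are illustrative and only non-negative fractions are handled.

```python
import math
from fractions import Fraction


def truncate_fraction(m: Fraction, beta: int, p: int) -> Fraction:
    """Keep the p leading radix-beta digits of m >= 0 (one integer digit
    and p-1 fractional digits) and drop the rest."""
    scale = beta ** (p - 1)
    return Fraction(math.trunc(m * scale), scale)


def value(m: Fraction, e: int, beta: int) -> Fraction:
    """Value of the non-negative floating-point number m * beta^e."""
    return m * Fraction(beta) ** e


beta, p = 10, 2
# Normal form of x^2 = 1: fraction 1.0, exponent 0.
normal = value(truncate_fraction(Fraction(1), beta, p), 0, beta)
# Non-normal representation of the same value: fraction 0.01, exponent 2.
non_normal = value(truncate_fraction(Fraction(1, 100), beta, p), 2, beta)
```

Here `normal` evaluates to 1 while `non_normal` evaluates to 0, so truncating a non-normalized fraction loses the entire value, which is why normalization must precede rounding.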

III. FLOATING-POINT FULLY HOMOMORPHIC ENCRYPTION
In this section, FPFHE and OD-FHE are defined and a cryptosystem dealing with floating-point messages is proposed, which is used for constructing FPFHE.

A. DEFINITIONS
We formally define FPFHE and OD-FHE by extending the definitions in the homomorphic encryption standard [20], and we will focus on constructing a cryptosystem called OD-FPFHE using the operations O = {+, −, •}.

Definition 7 (OD-FHE). For a message space $M$ with norm $|\cdot|$ and a depth-bounded circuit family $C$, a $(\xi, C)$-overflow-detectable fully homomorphic encryption is an FHE with a homomorphic algorithm such that for any ciphertext ct of $m \in M$ and any $f \in C$, it returns a ciphertext for the message $m' = 1$ if $m$ is $\xi$-overflow over $f$, and $m' = 0$ otherwise, except with negligible probability.
For CKKS with message space $M_C \subset \mathbb{C}^N$ and $\xi = \max_{x \in M_C} |x|/2$, it clearly has $\xi$-overflow numbers over $f$ for any depth-bounded circuit $f$ because $M_C$ is finite. If the message space $M$ does not have a norm, such as a ring with positive characteristic, e.g. BGV/FV and TFHE, a $\xi$-overflow number is hard to define. However, if BGV/FV is used to encrypt a subset of $\mathbb{Z}$, then $\xi$-overflow numbers can be defined and exist, as in CKKS.
Generating a proof for overflow in CKKS and BGV/FV is a complicated problem because fixed-point operations are used and extra precision may be required to check if an overflow occurs.In the proposed FPFHE, however, only by inspecting the exponent of homomorphic operation result, we can easily check if an overflow occurs or not when ξ = β l for some l ∈ N and just one extra bit in the exponent is required to do that.
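The exponent-inspection idea can be sketched on plaintext values. The example below multiplies two radix-2 numbers in normal form with exact fractions (no rounding) and raises a flag when the exponent of the product exceeds `e_max`. Choosing the threshold as $\xi = 2^{e_{\max}+1}$ and the tuple layout `(sign, fraction, exponent)` are assumptions of this illustration, not the homomorphic algorithm itself.

```python
from fractions import Fraction


def multiply(x, y):
    """Multiply radix-2 floats given as (sign, fraction, exponent) with
    1 <= fraction < 2, renormalizing the product into the same form."""
    (sx, mx, ex), (sy, my, ey) = x, y
    s, m, e = sx * sy, mx * my, ex + ey
    if m >= 2:                 # product of two normal fractions lies in [1, 4)
        m, e = m / 2, e + 1
    return s, m, e


def overflow_flag(result, e_max):
    """Return 1 iff the exponent exceeds e_max. With input exponents at most
    e_max, the product exponent is at most 2*e_max + 1, so one extra
    exponent bit is enough to hold it before the check."""
    _, _, e = result
    return 1 if e > e_max else 0
```

For example, multiplying $1.5 \cdot 2^3$ by $-1.5 \cdot 2^3$ gives $-1.125 \cdot 2^7$, which is flagged when `e_max` is 6 and not flagged when `e_max` is 7.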
Note that the security considered in this paper is Chosen-Plaintext Attacks (CPA)-security.From that viewpoint, the overflow-detecting ciphertext does not give further information to the attacker having the encryption oracle.

B. FLOATING-POINT ENCODING AND DECODING
To implement FPFHE, we must have proper encoding and decoding algorithms between floating-point numbers and the corresponding message polynomials. We choose $q \in \mathbb{N}$ and shared primes $Q_0 = \nu 2^{\eta_0} + 1$ and $Q_1 = \nu 2^{\eta_1} + 1$, and use the scaling factors
Then, for a normal form $(s, m, e)$ of a floating-point number $x$, Encode and Decode are defined as follows: ), $e = M_e(0)$, and $s$ as the sign of $M_s(0)$. - Return $(s, m, e)$.
In this paper, $M_s(X)$, $M_f(X)$, and $M_e(X)$ are called the sign, fraction, and exponent polynomials, respectively, and the following facts will be derived after analyzing the bootstrapping error in Section IV-B: (i) The size of $Q_1$ controls the bootstrapping error; (ii) The rounding error after the tensor product of two MLWE ciphertexts is relatively small when $Q_0$ and $Q_1$ are shared primes; (iii) Moreover, the size of $Q_0$ should be large enough to hold messages increased by a carry and should be determined depending on $\beta$, $p$, and the carry system, which is analyzed in Section V-B. $q$ is also used as the integer modulus of the LWE ciphertext, and the shared primes $Q_0$ and $Q_1$ are found by exhaustive search.

C. CONSTRUCTION OF FLOATING-POINT FULLY HOMOMORPHIC ENCRYPTION
In this section, a formal FPFHE is proposed and its essential homomorphic operations will be introduced in Section V. We adopt the state-of-the-art improved TFHE cryptosystem with tensor product [1] but we change its torus modulus to shared primes (See details in [1]).
Our proposed evaluation keys are almost the same as those in [1], but for the proposed OD-FPFHE, these keys have been renamed to better reflect their roles. The evaluation keys P, BL, and Ten correspond to the key-switching key KSK, bootstrapping key BSK, and relinearization key RLK used in [1], respectively. Note that whereas the term KSK in [1] is used for reverting to canonical LWE ciphertexts, in the proposed FPFHE, P is used for packing ciphertexts into a single RLWE ciphertext after bootstrapping.
We use an additional key-switching key KS encrypted with an $h$-sparse secret key to reduce the lattice dimension and error magnitude. Therefore, the improved TFHE [1] takes the key-chain of BSK and KSK, and the proposed FPFHE takes the key-chain of BL, P, and KS. Fig. 1 compares the improved TFHE [1] and the proposed FPFHE in terms of the key-chain of evaluation keys and the bootstrapping procedure. In addition, an example of the encoding, encryption, decoding, and decryption procedures of the proposed FPFHE is shown in Fig. 2.
Note that generating evaluation keys in FHEs assumes circular security and key-dependent message (KDM) security [26] and we also construct FPFHE under these assumptions.Then, an overall procedure of the proposed FPFHE is given as follows and the detailed procedures will be explained in the next sections.
Determine the dimensions $N_{gct}$, $K_{gct}$, $N_{ct}$, $K_{ct}$, and $n$. Choose $t$ for the message space $\mathbb{Z}_{2^t}$ of the LWE ciphertext. Choose $h$ for the sparsity of the secret key.
• KeyGen($1^\lambda$): Set the evaluation key as ev = (P, BL, KS, Ten) and a secret key as sk.
• Generate two ternary MLWE keys sk-bl, sk and an $h$-sparse LWE key sk-ks.
• Let $S$ be the set of non-zero values of sk. Then, generate $\mathrm{BL}^{(s)}$ and $s \in S$ where $m^{(s)}_i$ is 1 if $\text{sk-ks}_i = s$, and 0 otherwise.
• $\mathrm{Enc}_{sk}(x)$: Choose the normal form $(s, m, e)$ of $x$ and run Encode($s$, $m$, $e$).
• $\mathrm{Dec}_{sk}(\mathrm{FCT}_{sk})$: Run $\varphi_{sk}$ for all inputs to obtain $M_s(X)$, $M_f(X)$, and $M_e(X)$.
MULT proposed in Section V. In Section VI, $Q_0$ and $Q_1$ are chosen as near 40-bit shared primes and every element of However, to perform the original gadget decomposition $G^{-1}$, $\varphi^{-1}(a, b)$ should be performed, which is the inverse function of (3) and requires 128-bit operations.
Therefore, we introduce an accelerated gadget decomposition $G^{-1}_{crt}$ using shared primes, which requires only 64-bit operations when $\nu Q_0$ and $\nu Q_1$ are less than $2^{64}$.
The correctness of $G^{-1}_{crt}$ in Algorithm 1 comes from the following property. For any $(a, b)$ where the equalities hold in $\mathbb{Z}_Q$, which means $x = (a - b) Q_0 \bmod Q_1$. Then we can split $c = y + 2^{\eta_1} z$ into a less significant $\eta_1$-bit number $y$ and a more significant $\eta_0$-bit number $z$ without calculating the exact value of $c$, and run gadget decomposition twice by using 64-bit integer operations.
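For comparison, the straightforward route that $G^{-1}_{crt}$ avoids can be sketched as follows: reconstruct $c \in \mathbb{Z}_Q$ from its residue pair with the textbook Garner formula and then split it as $c = y + 2^{\eta_1} z$. This is not Algorithm 1. The function names and the convention that $a$ is the residue modulo $Q_0$ and $b$ the residue modulo $Q_1$ are assumptions, and Python's arbitrary-precision integers hide the 128-bit intermediate product that motivates the accelerated method.

```python
def crt_pair(c, Q0, Q1):
    """Residues of c modulo the two coprime moduli."""
    return c % Q0, c % Q1


def crt_inverse(a, b, Q0, Q1):
    """Garner reconstruction: the unique c in [0, Q0*Q1) with
    c = a (mod Q0) and c = b (mod Q1). Requires Python 3.8+ for pow(x, -1, m)."""
    x = ((b - a) * pow(Q0, -1, Q1)) % Q1
    return a + Q0 * x          # this product is the wide intermediate value


def split_low_high(c, eta1):
    """Split c = y + 2^eta1 * z into the low eta1-bit part y and the rest z."""
    return c & ((1 << eta1) - 1), c >> eta1
```

With the toy pair $Q_0 = 3 \cdot 2^{18} + 1$ and $Q_1 = 3 \cdot 2^{12} + 1$, reconstructing and then splitting any $c < Q_0 Q_1$ reproduces $c$ from $y$ and $z$; the accelerated method obtains the two parts without forming $c$ explicitly.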
In the next section, error amplification after bootstrapping of the proposed FPFHE is analyzed.

IV. ERROR ANALYSIS AND SEQUENTIAL BOOTSTRAPPING
In this section, we revisit the error analysis of bootstrapping in Fig. 1. Section IV-A proves a more generalized result of [18], which is used to analyze the error amplification without the independent heuristic even when deterministic gadget decomposition is used for bootstrapping. Section IV-B performs an error analysis for the product of two fraction ciphertexts, which is the worst-case error amplification among the homomorphic operations in Section V. Therefore, for a valid MLWE ciphertext from TensorProd, the server runs KeySwitch, BlindRotate, and Packing sequentially and returns a valid MLWE ciphertext which can be multiplied using TensorProd again, as in Fig. 1. Therefore, a server can repeat this a polynomial number of times due to this bootstrapping. In detail, the following algorithms are run: (i) BlindRotate in Algorithm 2 runs to reduce the error and raise the modulus from $q$ to $Q$; (ii) Packing in Algorithm 4 runs to pack the outputs of BlindRotate into two MLWE ciphertexts $\mathrm{CT}_1$ and $\mathrm{CT}_2$; (iii) TensorProd in Algorithm 3 runs to multiply $\mathrm{CT}_1$ and $\mathrm{CT}_2$, and $p$ LWE ciphertexts $(\mathrm{ct}'_i)_{i \in [p]}$ are obtained by using SampleExtract; (iv) KeySwitch in Algorithm 5 runs to generate squashed LWE ciphertexts from $(\mathrm{ct}'_i)_{i \in [p]}$. However, after multiplying two fraction polynomials, the message in each coefficient may exceed $2^t$. To solve this problem, Section IV-C introduces a sequential bootstrapping with shared primes.

A. INVESTIGATION OF SUBGAUSSIAN RANDOM VARIABLES HAVING PYTHAGOREAN ADDITIVITY
For subgaussian random variables $X$ and $Y$, Corollary 1 requires boundedness of both $X$ and $Y$ to show that $XY$ is subgaussian. However, we will show that it is enough to require that one is bounded and the other is subgaussian. (11) it can be shown that (i) the k-th moment is bounded as ≤ lim where (a) holds from monotone convergence of measurable functions with boundedness of $M_{XY}(s)$ [16]
Proof: By Lemma 3, □
Note that Corollary 2 does not require that the $X_i$'s are subgaussian, meaning that $E[X]$ does not have to be zero, contrary to Corollary 1. Most of the additional errors after bootstrapping are products of the gadget decomposition output and the errors in the ciphertext or secret key. Since the output of deterministic gadget decomposition relies on its input, the mean value of the output may be non-zero, to which Corollary 2 can be applied.

B. ERROR ANALYSIS OF THE BOOTSTRAPPING ON SHARED PRIMES
In this section, BlindRotate, Packing, TensorProd, and KeySwitch are sequentially run for bootstrapping as in Fig. 1 and its error analysis is performed. Since Packing and TensorProd have been widely studied in FHE with GINX-bootstrapping [1], [4], their detailed algorithms are provided in the Appendix, and BlindRotate and KeySwitch [1], [4] are modified to operate properly in the proposed OD-FPFHE. Moreover, the overall error analysis is performed and the detailed proofs are provided in the Appendix.

1) SETUP BEFORE RUNNING BLINDROTATE
We have valid squashed LWE ciphertexts $\mathrm{ct}_0, \mathrm{ct}_1, \ldots, \mathrm{ct}_{2p-1}$, where $p$ denotes the precision of the floating-point number system, and the accumulate polynomial is given as where the scaling factor $\Delta'' \ge \Delta$ and the coefficients $m^{out}_i$ are chosen by the server according to the target look-up table.
In this paper, three options of $\Delta''$ are considered: (i) $\Delta'' = \Delta$ if the output of BlindRotate is used to run TensorProd or to generate a ciphertext of the fraction polynomial; (ii) $\Delta'' = \Delta^2$ if the output of BlindRotate is used to operate with the output of TensorProd, which has the scaling factor $\Delta^2$; (iii) $\Delta'' = \Delta'$ if the output of BlindRotate is a ciphertext of an exponent message. For a given $\mathrm{ct}_i[\Delta m]$, let $c \in \mathbb{N}$ be the largest number satisfying $2^c \mid a_j$ of $\mathrm{ct}_i$ and $2^c \mid b$ of $\mathrm{ct}_i$ for all $j \in [n]$, and let $s \in \mathbb{N}$ be the bit length of $m$ such that $s + c \le t$. In addition, we adopt the multi-output PBS technique with $2^c$ multi-outputs in [1]: for given server's target look-up tables $g_j : \mathbb{Z}_{2^s} \to \mathbb{Z}_{2^t}$ for all $j \in [2^c]$, we can convert them into polynomials ACCPoly($X$) in (13) with the coefficients Then by using BlindRotate with ACCPoly($X$) and SampleExtract($\cdot$, $i$) for all $i \in [2^c]$, the server can obtain $2^c$ LWE ciphertexts, each of which has the message $g_i(m)$, $i \in [2^c]$. Since one term in the summation of $m^{out}_i$ in (14) becomes the message $g_j(m)$ in the output of BlindRotate as in (8), the server can get the information about the message bit-length of the output because ACCPoly($X$) is public.

2) RUNNING BLINDROTATE
First, we propose Algorithm 2 which is obtained by modifying the blind rotation in [5].

Algorithm 2 BlindRotate
The difference between Algorithm 2 and the blind rotation in [5] is Line 3. When $S = \{-1, 1\}$, i.e. using a ternary secret key in blind rotation, the algorithm [5] runs with ACC To calculate the external product , gadget decomposition should be run with $G^{-1}_{crt}((X^{a_i} - 1)\mathrm{ACC})$ and $G^{-1}_{crt}((X^{-a_i} - 1)\mathrm{ACC})$ for evaluating (6). However, in Algorithm 2, only $G^{-1}_{crt}(\mathrm{ACC})$ is required to evaluate (6), meaning that the proposed method does not increase the number of NTT operations even if the size of $S$ increases. Therefore, $(K_{gct} + 1) l_{bl} + 1$ (I)NTT operations are required, compared to $|S|(K_{gct} + 1) l_{bl} + 1$ (I)NTT operations in [5].
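The two (I)NTT counts quoted above can be compared with a small helper. The function below only evaluates the two expressions from the text; its name, the argument `key_values` (standing for $|S|$), and the sample parameter values in the usage note are illustrative assumptions.

```python
def ntt_ops(K_gct: int, l_bl: int, key_values: int = 1) -> int:
    """(I)NTT count from the text: key_values * (K_gct + 1) * l_bl + 1.
    key_values = 1 models the proposed blind rotation; key_values = |S|
    models the count quoted for [5]."""
    return key_values * (K_gct + 1) * l_bl + 1
```

For hypothetical parameters $K_{gct} = 1$ and $l_{bl} = 3$, this gives 7 operations for the proposed method and 13 for a ternary key with $|S| = 2$, and the gap widens linearly in $|S|$.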
The error analysis for the output of BlindRotate is given as Proposition 5 in Appendix.

3) RUNNING PACKING AND TENSORPROD
After BlindRotate is run on $\mathrm{ct}_i$ for $i \in [p]$, every $p$ resulting ciphertexts generate one MLWE ciphertext of the fraction polynomial by using Packing (see Algorithm 4). If another MLWE ciphertext from Packing is given, these two ciphertexts are multiplied by using TensorProd (see Algorithm 3) and an MLWE ciphertext containing the product of two fraction polynomials is obtained. Note that Packing and TensorProd are widely used in many FHEs [1], [6], and the error analysis for the outputs of Packing and TensorProd in our case is given in Propositions 6 and 7 in the Appendix.

Algorithm 3 TensorProd
Input
OUT += $\sum_x v_x(X)\,\mathrm{Ten}_{k,x}$ ▷ $\mathrm{Ten}_{k,x}$ is one of the evaluation keys generated by the user and $l_{ten}$ is its gadget parameter (see Section III-C)
6: end for
7: return OUT

Algorithm 4 Packing
Input:
OUT += $\sum_{y \in [l_{pack}]} v_y \mathrm{P}_{j,x,y} X^i$ ▷ $\mathrm{P}_{j,x,y}$ is the evaluation key generated by the user and $l_{pack}$ is its gadget parameter (see Section III-C)
5: end for
6: return OUT

However, the resulting ciphertext should be bootstrapped again for two reasons. First, a large error is added after TensorProd, so that if TensorProd is run with this error again, the output may not be valid. Second, some coefficient of the message polynomial may contain a message larger than the radix $\beta$, or the degree of the message polynomial may be greater than $p - 1$ (recall that the fraction polynomial should have $p$ message coefficients), which requires a homomorphic RZ operation. Therefore, SampleExtract is applied to the output of TensorProd to generate $p$ LWE ciphertexts again and calculate carryovers to adjust the fraction polynomial homomorphically. This process is explained in Section V-B, and KeySwitch is again required to make valid squashed LWE ciphertexts. Since Packing and TensorProd have been widely used and studied in FHEs [1], [4], [5], [6], they are listed as Algorithms 4 and 3 in the Appendix.
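The need for carryover and truncation after multiplying two fraction polynomials can be seen in a plaintext analogue. The sketch below multiplies two digit vectors as polynomials, propagates carries so that every coefficient is below $\beta$, and keeps the $p$ most significant digits, which is RZ for non-negative values. The little-endian digit layout, the function name, and the assumption that both inputs are in normal form are choices made for this example; in the scheme these steps are carried out homomorphically as explained in Section V-B.

```python
def multiply_fractions(a, b, beta):
    """a, b: little-endian digit lists of equal length p, where the last
    entry is the integer digit (so [5, 2, 1] means 1.25 for beta = 10).
    Returns (digits, shift): the p leading digits of the product and the
    exponent increment (0 or 1). Assumes both inputs are in normal form."""
    p = len(a)
    raw = [0] * (2 * p)                   # one extra slot for the last carry
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            raw[i + j] += ai * bj         # coefficients may exceed beta
    carry = 0
    for k in range(2 * p):                # carryover, least significant first
        carry, raw[k] = divmod(raw[k] + carry, beta)
    shift = 1 if raw[2 * p - 1] else 0    # product >= beta needs exponent + 1
    top = 2 * p - 2 + shift               # index of the leading digit
    return raw[top - p + 1: top + 1], shift
```

For instance, $1.25 \times 9.99 = 12.4875$ yields digits for $1.24$ with an exponent increment of 1, while $1.25 \times 2.00$ yields $2.50$ with no increment.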

4) RUNNING KEYSWITCH
For p ciphertexts from TensorProd, KeySwitch is run to reduce the modulus and dimension such that a valid squashed LWE ciphertext is obtained.
Let ct i [ 2 m] be the LWE ciphertext extracted from the i-th coefficient message by SampleExtract(•, i), and let d be the number of zeros from the LSB such that $m_{d-1} = \cdots = m_1 = m_0 = 0$ for $m = (m_{t'-1} \ldots m_1 m_0)_{(2)}$. Then KeySwitch plays an important role in homomorphically generating the ciphertext for the s-bit message $(m_{d+s-1} \ldots m_{d+1} m_d)_{(2)}$ from the ciphertext for the message m. Since c is chosen by the server, the server can select s to satisfy s + c ≤ t and re-run BlindRotate.

Algorithm 5 KeySwitch
Input: ct[m Q , the start index d, the number of desired bootstrapping bits s, and the number of multi-outputs $2^c$.
Output: out
2: Calculate the bias $= 2^{d+\eta_0-1}\nu$ and add it to b of ct
3: ct ← $\lfloor \mathrm{ct} / (\nu 2^{\eta_1+s+c+1+d-q}) \rceil \bmod 2^q$
4: Set out = (0, 0, . . .,
▷ $a_{\mathrm{ct},j}$ is the j-th coefficient of a in ct
7: out += $\sum_{k} v_k \mathrm{KS}_{j,x,k}$
8: end for
9: return out ← $\lfloor \mathrm{out} / 2^{q-1-\log N_{\mathrm{gct}}} \rceil\, 2^c \bmod 2N_{\mathrm{gct}}$

The overall error amplification of bootstrapping is given in Lemma 4 by combining the errors from BlindRotate, Packing, TensorProd, and KeySwitch.

5) IMPORTANCE OF USING SHARED PRIME
The reason for using shared primes is to mitigate the distortion in the messages that occurs when the modulus is changed from $Q_0$ to $2^n$ for the purpose of bootstrapping. If the modulus $Q_0$ is changed to $2^n$ directly, $\lfloor \mathrm{ct} \cdot 2^n / Q_0 \rfloor$ is calculated and hence the scaling factor $\Delta$ becomes $\lfloor \Delta \cdot 2^n / Q_0 \rfloor$. The inherent drawback of this method is that the scaling factor becomes distorted, while the bootstrapping algorithm requires a power-of-two scaling factor. However, if shared primes are used and the modulus reduction is performed as in Line 1 of Algorithm 5, the approximated scaling factor $\Delta \approx \nu 2^{\eta_1}$ of the message and the approximated modulus $Q_0 \approx \nu 2^{\eta_0}$ share the same parameter $\nu$ with negligible error amplification (which is shown in the proof of Lemma 4). Therefore, by dividing ct by $\nu 2^i$ for some $i < \eta_1 < \eta_0$ as in Line 3 of Algorithm 5, both the scaling factor and $Q_0$ are easily reduced to powers of two.
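The effect of the shared factor ν can be illustrated on plain integers. The sketch below is not a ciphertext operation: it models only a noisy scaled message (the "phase" of an LWE sample), and the values of ν, η0, η1, and i are hypothetical toy numbers, assumed only so that the arithmetic fits on one screen.

```python
def round_div(x: int, d: int) -> int:
    """Nearest-integer division for a positive divisor d (ties round up)."""
    return (2 * x + d) // (2 * d)


# Hypothetical toy parameters; real parameters are far larger.
NU, ETA0, ETA1 = 97, 20, 10
Q0 = NU * 2**ETA0 + 1      # modulus close to nu * 2^eta0
DELTA = NU * 2**ETA1       # scaling factor close to nu * 2^eta1


def noisy_phase(m: int, e: int) -> int:
    """Scaled message plus a small error, reduced modulo Q0."""
    return (DELTA * m + e) % Q0


def to_power_of_two(b: int, i: int) -> int:
    """Divide by nu * 2^i: modulus becomes 2^(eta0 - i), scaling becomes 2^(eta1 - i)."""
    return round_div(b, NU * 2**i) % 2**(ETA0 - i)


def decode(b2: int, i: int) -> int:
    """Recover the message from the power-of-two representation."""
    return round_div(b2, 2**(ETA1 - i)) % 2**(ETA0 - ETA1)


for m, e in [(1000, -3), (0, -3), (7, 5)]:
    b2 = to_power_of_two(noisy_phase(m, e), i=4)
    print(m, b2, decode(b2, i=4))
```

Because ν divides out of both the modulus and the scaling factor, the result is the message times an exact power of two, which is the form the bootstrapping step expects.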
In addition, if a non-sparse ternary secret key is used for encrypting KS j,x,k , the third error term of ( 15) is changed to O( √ nv).Although the first and the second error terms of (15) can be reduced by increasing Q 1 = ν2 η 1 + 1 and q, O( √ nv) cannot be controlled and hence it makes an error floor.However, if a sparse ternary secret key is used for encrypting KS j,x,k , the third error term can be controlled by the sparsity h.

C. SEQUENTIAL BOOTSTRAPPING ON SHARED PRIMES FOR ACCOMMODATING LARGE NUMBERS AND EVALUATING LOOK-UP TABLES
We discuss how to bootstrap ciphertexts of large messages and how to evaluate arbitrary look-up tables (LUT) with the proposed valid FPFHE with t ≥ 2. Note that TFHE/FHEW-based bootstrapping can evaluate arbitrary LUTs with t-bit input and t-bit output, typically with t = 4 in practice. However, this does not imply the ability to evaluate arbitrary LUTs with n-bit input and n-bit output. Instead, if such an LUT can be decomposed into a product form of LUTs with t-bit input and t-bit output, it can be implemented using n/t rounds of bootstrapping in a parallel manner.
However, more challenges still exist. If each n-bit message is encoded as a number in the range [0, 2^n − 1] and encrypted in a message coefficient of m(X) of the ciphertext CT, it is difficult to bootstrap them all at once because the bootstrapped output is sign-reversed when the most significant bit is 1, as explained in (8). Note that an attempt to overcome this problem is explained in [1]. Analogously, in this section we explain how to bootstrap when the LUT is decomposed into LUTs with t-bit input and t-bit output, which is called sequential bootstrapping. In comparison to [1], sequential bootstrapping is performed over integers modulo Q = Q 0 Q 1 , which is greater than 2^64 and hence leaves a lot of room for the message space, while [1] is limited to a small modulus. In addition, sequential bootstrapping makes it possible to bootstrap the carry-over circuit after arithmetic operations.
To explain sequential bootstrapping dealing with worst errors, consider an output MLWE ciphertext of TensorProd as an input to sequential bootstrapping as in Fig. 3. First, the operation of sequential bootstrapping is explained by using a simple example.
Suppose that LWE ciphertexts ct encrypting each message m i are given (the expression m i = (m i2 m i1 m i0 ) (2) is a binary representation), the precision of each message coefficient is 3 bits, and the server wants to evaluate look-up tables $f_{i,0}: \mathbb{Z}_{2^s} \to \mathbb{Z}_{2^t}$ and $f_{i,1}: \mathbb{Z}_{2^{s'}} \to \mathbb{Z}_{2^t}$ with s = 1 and s ′ = 2, where the input to f i,0 is the LSB of the message and the input to f i,1 is the remaining 2 bits of the message. Then, the server can evaluate each f i,j on the ciphertext by sequential bootstrapping with ACCPoly 1, 2, and 3 and their equivalent look-up tables g 0 , . . ., g 4 .
The sequential bootstrapping is performed as follows: (i) At Fig. 3 a), two ciphertexts are generated by using ACCPoly 1 and (8); (ii) These two ciphertexts are multiplied and ct[ 2 m i,0 ] is generated. Since it contains a large error, the server runs KeySwitch and BlindRotate with ACCPoly 2 at Fig. 3 b) to obtain a partial result ct[ f i,0 (m i,0 )] and ct[ 2 m i,0 ] with small error; (iii) The server subtracts this low-error ciphertext from the output of TensorProd (the * ) part in Fig. 3) and runs KeySwitch and BlindRotate with ACCPoly 3 at Fig. 3 c) to obtain ct[2 f i,1 (m i,2 m i,1 )]; (iv) Packing is run to collect every partial ciphertext at Fig. 3 d).
The correctness of sequential bootstrapping in Fig. 3 relies on two facts. First, the subtraction at the * ) part generates negligible error amplification because the error amplification of BlindRotate is O(2^η0)-times less than the error amplification of TensorProd, as shown in Propositions 5 and 7 in Appendix. Therefore, the output of BlindRotate can be added to and subtracted from the output of TensorProd poly(η 0 ) times with negligible error amplification. Second, every output ciphertext in Fig. 3 is generated by using the proposed sequential bootstrapping in Fig. 1 (or its subroutines), except the output at the * ) part. Therefore, if the proposed FPFHE is valid, every output ciphertext is valid.
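The data flow of the 3-bit example can be mimicked on plaintext integers. The sketch below is only a plaintext analogue under stated assumptions: the function name, the chunk widths, and the sample tables are hypothetical, and no encryption, noise, or ACCPoly is modeled. It shows the order of operations: look up the lowest chunk, subtract it, shift, and continue with the remaining bits.

```python
def sequential_lookup_plain(m: int, widths, luts):
    """Plaintext analogue of sequential bootstrapping.

    widths[k] is the bit width of the k-th chunk (LSB first) and luts[k]
    is the table applied to it.
    """
    outputs = []
    for width, lut in zip(widths, luts):
        chunk = m % (1 << width)      # bits handled in this round
        outputs.append(lut[chunk])    # table evaluation (a bootstrapping in the encrypted setting)
        m = (m - chunk) >> width      # subtraction step removes the processed bits
    return outputs


# Hypothetical tables: f0 acts on the LSB (s = 1), f1 on the remaining two bits (s' = 2).
f0 = [3, 1]
f1 = [0, 2, 1, 3]
print(sequential_lookup_plain(0b101, [1, 2], [f0, f1]))
```

In the encrypted setting the subtraction is performed on ciphertexts, which is why its error amplification matters for correctness.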
Finally, we formally propose a sequential bootstrapping, which is the special case of WoP-PBS on our shared prime as follows.
Lemma 5 (Sequential Bootstrapping on Shared Primes, Analogous to Lemma 5 in [1]). Suppose that a valid FPFHE with LWE message space $\mathbb{Z}_{2^t}$ for t ≥ 2 and look-up tables f 0 , . . ., f t ′ −1 are given. Then, for each bit of the message m i = (m i,t ′ −1 . . . m i,0 ) (2) of the ciphertext ct[ 2 m i ] obtained by BlindRotate, Packing, TensorProd, and SampleExtract, a valid ciphertext $\mathrm{CT}[\Delta'' f_{i,j}(m_{i,j})]$ is generated for any scaling factor $\Delta'' \ge \Delta$.
Although Lemma 5 is similar to Lemma 5 in [1], NTT can be used in the proposed FPFHE and hence a large message can be processed without generating extra noise, contrary to the improved TFHE using FFT [1]. In addition, the proposed FPFHE allows packing up to p LWE ciphertexts for multiplying two packed ciphertexts and bootstrapping every coefficient bit by bit by applying sequential bootstrapping, which is essential for constructing homomorphic floating-point operations.
In the next section, floating-point homomorphic addition and multiplication are proposed for constructing FPFHE.

V. OVERFLOW-DETECTABLE FLOATING-POINT FHE
In Section V-A, floating-point homomorphic addition and multiplication, denoted as ADD and MULT, are proposed. Section V-B constructs various homomorphic (sub)algorithms for constructing ADD and MULT. Section V-C proposes a homomorphic method for normalizing floating-point outputs. Finally, Section V-D introduces a homomorphic algorithm for generating a ciphertext of the message indicating overflow occurrence.
Due to Lemma 5, various floating-point homomorphic algorithms can be constructed, and we implement (4,27,-511,511) and (4,12,-127,127) OD-FPFHEs as examples, which achieve double and single precision, respectively. Also, we choose the LWE message space with t = 6, meaning that the least significant 5 bits of each message can be sequentially bootstrapped. Note that various ACC initial polynomials for implementing each pseudo-code of the homomorphic algorithms are listed in Appendix.

A. OVERVIEW OF HOMOMORPHIC OPERATIONS FOR OD-FPFHE: ADDITION, MULTIPLICATION, AND OVERFLOW-DETECTION
For a better understanding of homomorphic operations, we briefly explain how to perform floating-point homomorphic addition and multiplication of ciphertexts by using the examples in Figs. 4 and 5. Their pseudo-codes are given in Algorithms 6 and 7, denoted as ADD and MULT. Note that all the homomorphic algorithms used for the construction of ADD and MULT are proposed and explained in detail in the following sections (therefore, the reader may read the following sections first and then come back to this section if needed).
Addition is performed as follows (see Fig. 4 A) Addition): (i) Take two floating-point ciphertexts; (ii) Calculate the maximum of the two exponents; (iii) For each exponent, subtract the exponent value from the max value and homomorphically calculate the minimum of p and the subtracted value; (iv) The difference values are sign-reversed and lifted to the monomial exponent, multiplied with the sign message, by ConstToExp; (v) The outputs of (iv) are multiplied with each fraction polynomial; (vi) CarryAdd bootstraps each coefficient of the output ciphertext of (v) to make it less than the radix β and moves its carry to higher-degree coefficients.
In Fig. 4 A), there are two noticeable points. The first one is that 1 is added to the max of the two exponents because, after moving carries to higher-degree coefficients, the p-th coefficient of the fraction polynomial may become non-zero. For this case only, the resulting fraction polynomial should be divided by X and the exponent should be increased by 1 to make it a normal form. Note that it may require a lot of computation at the server to homomorphically check the value of the p-th coefficient. However, if the server adds 1 to the max value in advance and regards the most significant position in the fraction polynomial as p, the normalizing process can be easily implemented.
[Figure legend: a dashed circle is the input, a dotted circle is the input value, a solid box is a homomorphic operation requiring sequential bootstrapping, and a dash-single-dotted box is a homomorphic operation without sequential bootstrapping.]

Algorithm 6 ADD
The second one is that the input to ConstToExp is forced to take a value between 0 and p by applying Min at (iii). If the monomial obtained at (iv) has degree less than or equal to −p, the multiplication of this monomial and each fraction polynomial, as performed at (v), generates a new fraction polynomial with coefficient 0 for every term X^i, 0 ≤ i < p, so clamping the shift at p does not change the result. Moreover, the time consumption of the proposed ConstToExp depends on the maximum input size, and therefore forcing its input to be less than or equal to p enhances the overall speed of ADD.
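Steps (ii) to (v) of addition, including the Min clamp, can be sketched on plaintext digits. The representation below is an assumption made for illustration: a fraction is a list of p digits with index i holding the coefficient of X^i (most significant digit last), multiplying by X^(-d) drops the d lowest coefficients, and all names are hypothetical. Carry handling, i.e., step (vi), is deliberately left out.

```python
def shift_down(frac, d):
    """Multiply the fraction polynomial by X^(-d), discarding terms below X^0."""
    p = len(frac)
    return frac[d:] + [0] * min(d, p)


def align_and_add(s1, e1, f1, s2, e2, f2):
    """Plaintext analogue of steps (ii)-(v): align to the max exponent, then add."""
    p = len(f1)
    e = max(e1, e2)                      # (ii) maximum of the two exponents
    d1 = min(p, e - e1)                  # (iii) clamp each shift at p
    d2 = min(p, e - e2)
    g1 = [s1 * c for c in shift_down(f1, d1)]   # (iv)-(v) signed monomial times fraction
    g2 = [s2 * c for c in shift_down(f2, d2)]
    return e, [a + b for a, b in zip(g1, g2)]   # coefficients may now exceed the radix


print(align_and_add(1, 5, [1, 2, 3], 1, 3, [3, 3, 3]))
print(align_and_add(1, 5, [1, 2, 3], 1, -100, [3, 3, 3]))
```

The second call shows the point of the clamp: a shift of 105 positions and a shift of p = 3 positions both leave an all-zero fraction, so the bounded input to ConstToExp loses nothing.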
Multiplication is performed as in Fig. 5 A): (i) Take two floating-point ciphertexts; (ii) Multiply the two fraction polynomials; (iii) CarryMul bootstraps each coefficient of the output of (ii) to make it less than the radix β and moves its carry to higher-degree coefficients; (iv) Add the two exponents; (v) Multiply the two signs.

Algorithm 7 MULT
For both ADD and MULT, the homomorphic operation output is converted into a normal form by applying Normalize as explained in Section V-C.The resulting exponent after performing ADD or MULT is examined to check if an overflow occurs or not by GenProof as in Figs. 4  C) and 5 C), and a ciphertext CT ′ proof containing overflow information is generated, as explained in Section V-D.

B. VARIOUS NON-LINEAR HOMOMORPHIC ALGORITHMS FOR FLOATING-POINT HOMOMORPHIC ADDITION AND MULTIPLICATION
In this section, various non-linear homomorphic algorithms are introduced which are necessary for ADD and MULT.
First, we propose Max in Algorithm 8 to calculate the maximum of two exponent values. For its correctness, consider the two exponent ciphertexts CT (1) exp and CT (2) exp , and assume that both messages take values between e min and e max . Since the magnitude of the message in CT (1) exp − CT (2) exp is less than 2 e , the server can add 2 e ′ to it and check whether m e is still one or not by using sequential bootstrapping in Line 2. Then, the ciphertext CTtmp e [m e ] can mask the other ciphertexts after processing the loop in Lines 3-5, which is equivalent to ReLU.
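The sign test behind Max can be checked on plain integers. The sketch below is a plaintext analogue with hypothetical names: it adds a power of two to the difference, reads one bit to decide whether the difference is non-negative, and uses that bit as a mask, which is the ReLU-style selection described above. It assumes the magnitude of the difference is below the added power of two.

```python
def max_via_sign_bit(a: int, b: int, e_bits: int) -> int:
    """Plaintext analogue of Max: max(a, b) = b + ReLU(a - b)."""
    d = a - b
    assert abs(d) < (1 << e_bits), "difference must fit below 2^e_bits"
    shifted = d + (1 << e_bits)          # lies in (0, 2^(e_bits + 1))
    nonneg = (shifted >> e_bits) & 1     # bit e_bits is 1 exactly when d >= 0
    return b + nonneg * d                # mask the difference with the bit


print(max_via_sign_bit(5, -3, 4), max_via_sign_bit(-7, 2, 4))
```

In the homomorphic setting the bit is obtained by sequential bootstrapping and the masking is a ciphertext multiplication.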
Next, we propose a homomorphic algorithm for lifting a constant message m ′ to the monomial exponent message as X m in ConstToExp (Algorithm 9), which is used for equalizing exponent values before addition and for normalizing the resulting ciphertext after addition or multiplication. For the former case, ConstToExp returns a ciphertext containing the message m s X −m for a given sign message m s , and for the latter case, it returns a ciphertext containing the message X m .

Algorithm 9 ConstToExp
Input:
7: end for
8: end for
9: return Out

The correctness of ConstToExp in Algorithm 9 is similar to the correctness of BlindRotate. Suppose that m = (m e m e−1 . . . m 0 ) (2) and CT sign [m s ] are given. When i = 0 in Line 4, the message 2 ((X − 1)m 0 + 1) = 2 X m 0 is assigned to Out. By induction on i, we can show that the message 2 X m i ...m 0 (2) is assigned to Out if the previous message is 2 X m i−1 ...m 0 (2) . In addition, we know that Out in Line 4 can be bootstrapped with sufficiently large Q 1 since the error added to Out is relatively small by Propositions 6 and 7.
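The induction in the correctness argument of ConstToExp can be replayed on plaintext polynomials. The sketch below is a plaintext analogue under stated assumptions: polynomials live in Z[X]/(X^N + 1) and are stored as coefficient lists, the update used is Out ← Out · ((X^(2^i) − 1) m_i + 1), and the sign message and scaling factor are omitted. All names are hypothetical.

```python
def mul_by_monomial(poly, k):
    """Multiply a polynomial by X^k in Z[X]/(X^N + 1), where N = len(poly)."""
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        j = (i + k) % (2 * n)
        if j < n:
            out[j] += c
        else:
            out[j - n] -= c          # X^N = -1 flips the sign on wrap-around
    return out


def const_to_exp_plain(m: int, bits: int, n: int):
    """Build X^m bit by bit: Out <- Out * ((X^(2^i) - 1) * m_i + 1)."""
    out = [1] + [0] * (n - 1)        # start from the constant polynomial 1
    for i in range(bits):
        m_i = (m >> i) & 1
        rotated = mul_by_monomial(out, 1 << i)
        # When m_i = 1 this selects the rotated value, otherwise it keeps out.
        out = [o + m_i * (r - o) for o, r in zip(out, rotated)]
    return out


print(const_to_exp_plain(5, 3, 8))   # coefficient 1 at index 5
print(const_to_exp_plain(11, 4, 8))  # X^11 = -X^3 modulo X^8 + 1
```

Each loop iteration multiplies the accumulated monomial by X^(2^i) when the i-th bit is set, so after processing all bits the exponent equals m.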
Next, a homomorphic carryover algorithm for addition is proposed in Algorithm 10, denoted as CarryAdd, which is a core part of performing carryover during the addition of two fractions. Let $\pi: \mathbb{Z} \to \mathbb{Z}_\beta$, $\pi(x) = x \bmod \beta$, be a message-extraction function. After two ciphertexts are added, the message in each coefficient should be adjusted by using $\pi$, and if a carry appears, it should be added to higher-degree coefficients.
There are many ways to move carries, and we first give an abstract definition of a carry system. Let $c_{i\to j}: \mathbb{Z} \to \mathbb{Z}$ be a carry function for all $i, j \in \mathbb{N}$ with $i < j$. After defining a carry collection $C_j$ recursively by using $c_{i\to j}$ and $C_0, \ldots, C_{j-1}$, we define a carry system $C$ over the polynomial ring $\mathbb{Z}[X]$, where $C_0\left(\sum_{i=0}^{n} \alpha_i X^i\right) = \alpha_0$, i.e., the constant coefficient does not take a carry. Therefore, the carry system $C$ depends only on the carry functions $c_{i\to j}$.
Intuitively, the carry collection $C_j$ adds every carry $(c_{i\to j} \circ C_i)(\alpha(X))$ to the $j$-th coefficient $\alpha_j$ from all coefficients of index strictly less than $j$, i.e., $C_j(\alpha(X)) = \alpha_j + \sum_{i<j} (c_{i\to j} \circ C_i)(\alpha(X))$. In addition, we call $C$ a valid carry system if $C(\alpha(X))$ and $\alpha(X)$ share the same value when evaluated at $X = \beta$. For ADD, $c_{i\to i+1}(x) = (x - \pi(x))/\beta$ is used for all $i \in \mathbb{N}$.
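The carry function used for ADD can be exercised on plain integers. The sketch below is a plaintext analogue with hypothetical names: it applies c(x) = (x − π(x))/β from each coefficient to the next one and returns the reduced digits together with the final carry. Treating negative coefficients through Python's non-negative remainder is an assumption of this sketch, made to show how a negative sum surfaces as a negative final carry.

```python
def carry_add_plain(coeffs, beta):
    """Nearest-neighbour carry: C_j = alpha_j + c(C_{j-1}), digit_j = pi(C_j)."""
    digits, carry = [], 0
    for alpha in coeffs:
        c_j = alpha + carry            # carry collection for this coefficient
        digit = c_j % beta             # pi(C_j), always in [0, beta)
        digits.append(digit)
        carry = (c_j - digit) // beta  # c(C_j), exact division
    return digits, carry


def evaluate(coeffs, beta):
    """Value of the polynomial at X = beta."""
    return sum(c * beta**i for i, c in enumerate(coeffs))


digits, last = carry_add_plain([5, 3, 0], 4)
# The value at X = beta is preserved, which is the validity condition.
print(digits, last, evaluate(digits, 4) + last * 4**3 == evaluate([5, 3, 0], 4))
print(carry_add_plain([-1, 0, 0], 4))
```

In the second call the final carry is −1 and the digits hold the radix complement, so the sign of the sum can be read from the last carry.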

Algorithm 10 CarryAdd
contains a message m with m = 1 if the message of ct ′′ i is zero, and m = 0 otherwise.
11: end for
The correctness of CarryAdd is as follows: The fraction polynomial is multiplied by X so that the least significant coefficient can take a carry from the previous coefficient.Note that the position of most significant coefficient is p + 1, not p − 1, since the fraction polynomial is multiplied by X , and the exponent message is increased by 1 in advance.
After CarryAdd runs every iteration of Line 2, a last carry ct c appears at the (p + 1)-th coefficient and its sign becomes the sign of the sum of the two ciphertexts. However, if this sign is negative, the packed message from ct ′ 0 , . . ., ct ′ p+1 is sign-reversed; to fix this problem, the packed ciphertext is multiplied by CT ′ sign from Line 6 to obtain CT ′ in Line 7. While adjusting and moving carries in Line 8, CarryAdd also checks whether each coefficient of C(m(X )) is zero or not and generates a ciphertext IsZero i . Note that IsZero i is used to make the message a normal form in Section V-C.
Similar to CarryAdd, a homomorphic carryover algorithm for multiplication CarryMul is proposed in Algorithm 11 using a valid C and carry functions c i→j .

Algorithm 11 CarryMul
for all j > i, from the ciphertext Tmp
5: Update carry ct c j += ct cc j for all j > i.
6: end for
7: return Packing(ct

We use the following carry functions for multiplication. For the given i-th coefficient with l-bit message m = (m l−1 . . . m 0 ) (2) , set c i→i+1 (m) = (m 3 m 2 ) (2) and c i→i+2j (m) = (m 4j+3 m 4j+2 m 4j+1 m 4j ) (2) for all j ≥ 1. Note that it is not trivial to design look-up tables for generating the ciphertexts in Line 4, and we list various look-up tables (ACCPoly(X)) for implementing the above carry system in Appendix.
For accelerating CarryMul, an upper bound of $C_i(m_1(X)m_2(X))$ is derived for any valid fraction polynomials $m_1(X)$ and $m_2(X)$ in Proposition 4.
Proposition 4. Suppose that two polynomials $a(X), b(X) \in \mathbb{N}[X]$ are given where $a_i \le b_i$, and carry functions $c_{i\to j}: \mathbb{N} \to \mathbb{N}$ are given for all $i, j \in \mathbb{N}$ such that $c_{i\to j}(x) \le c_{i\to j}(y)$ if $x \le y$ for all $x, y \in \mathbb{N}$. Then the inequality $C_j(\alpha(X)a(X)) \le C_j(\alpha(X)b(X))$ holds for all $j$ and any $\alpha(X) \in \mathbb{N}[X]$.
The proof is provided in Appendix. By Proposition 4, $C_i(m_1(X)m_2(X)) \le C_i(m_{\max}^2(X))$, where $m_{\max}(X) = (\beta - 1)\sum_{j=0}^{p-1} X^j$. Therefore, for any carry functions $c_{i\to j}$, the maximum value of each coefficient is less than or equal to $C_i(m_{\max}^2(X))$ while processing carries. From Proposition 4, we can derive the condition for the modulus Q 0 (as discussed in Section III-B, (iii)) such that 2 η 0 −η 1 > log max i C i (m 2 max (X )). Otherwise, messages may be deformed due to a small Q 0 . Moreover, this upper bound can be determined in advance, and the range of the index i in Line 2 of CarryMul can be reduced since the server knows the worst-case smallest index which affects the least significant position p − 1 by using max i C i (m 2 max (X )).
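The worst-case bound can be computed ahead of time on plaintext polynomials. The sketch below makes a simplifying assumption: it uses the nearest-neighbour carry c(x) = ⌊x/β⌋ (the one stated for ADD) instead of the multi-position carry functions of CarryMul, so the numbers only illustrate the monotonicity of Proposition 4 and are not the bound used by the paper's parameters. Names and the toy values β = 4, p = 3 are hypothetical.

```python
def poly_mul(a, b):
    """Schoolbook product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def carry_collections(alpha, beta):
    """C_0 = alpha_0 and C_j = alpha_j + floor(C_{j-1} / beta) (monotone carry)."""
    collected, prev = [], 0
    for j, a in enumerate(alpha):
        prev = a if j == 0 else a + prev // beta
        collected.append(prev)
    return collected


BETA, P = 4, 3                       # toy radix and fraction length
m_max = [BETA - 1] * P               # every digit at its maximum
bound = carry_collections(poly_mul(m_max, m_max), BETA)
sample = carry_collections(poly_mul([1, 2, 3], [3, 0, 2]), BETA)
print(bound, max(bound))
print(sample, all(s <= b for s, b in zip(sample, bound)))
```

Since every valid fraction polynomial is coefficient-wise at most m_max, the values collected for any product stay below the precomputed list.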

C. HOMOMORPHIC ALGORITHMS FOR FLOATING-POINT OUTPUTS
After doing CarryAdd or CarryMul, fraction and exponent should be adjusted to a normal form.The first step is to count the number of consecutive zeros from the most significant in the fraction until nonzero value appears and we propose HomCount in Algorithm 12 for that purpose.

Algorithm 12 HomCount
Input:
where m is the number of consecutive zeros from the most significant position until a nonzero value appears
5: Out += CT 2
6: end for
7: return Out[m ′ ] ← Bootstrap Out

Note that the messages of CT 1 and CT 2 are the same except for the scaling factor. In Line 4 of Algorithm 12, TensorProd(CT 1 , IsZero i ) works as an AND gate, meaning that the message of its output is set only if the messages of both CT 1 and IsZero i are set. Therefore, every returned ciphertext CT 1 in Line 4 encrypts a set message until the message of IsZero i is 0 at some index i for the first time, and encrypts 0 afterwards. Then, all the returned ciphertexts CT 2 are added to generate Out in Line 5, which has the message scaled by 2 . Finally, we bootstrap Out to obtain a ciphertext containing the number of consecutive zeros starting from the most significant position until a nonzero value appears.
By using HomCount, we construct Normalize in Algorithm 13 for normalizing the fraction and exponent of the output ciphertext. Since the number of consecutive zeros in the fraction is counted by HomCount, Normalize can subtract this value from the exponent ciphertext. However, since the subtracted message may be less than e min , Normalize evaluates Min in Line 1 and then subtracts this min value from the input exponent ciphertext in Line 4. Note that CT min exp of a constant message is converted to CT tmp of a monomial with this constant message as its exponent by ConstToExp in Line 2. Finally, all output ciphertexts out i are packed into one MLWE ciphertext.
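The counting trick of HomCount reduces to a running AND followed by a sum. The sketch below is a plaintext analogue with hypothetical names: the input is the list of zero flags ordered from the most significant coefficient, the running product plays the role of the repeated TensorProd, and the accumulated sum is the number of leading zeros.

```python
def count_leading_zeros_plain(is_zero_from_msb):
    """Count consecutive zero digits from the most significant position.

    is_zero_from_msb[k] is 1 if the k-th digit (MSB first) is zero, else 0.
    """
    running = 1          # stays 1 until the first nonzero digit is met
    count = 0
    for flag in is_zero_from_msb:
        running *= flag  # AND gate realised as a product
        count += running # adds 1 only while every digit so far was zero
    return count


print(count_leading_zeros_plain([1, 1, 0, 1]))
```

Zero flags that appear after the first nonzero digit contribute nothing, because the running product has already dropped to 0.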

D. GENERATING A PROOF TO DETECT OVERFLOW AND CONSTRUCTING OD-FPFHE
In this section, we propose GenProof in Algorithm 14 to generate a ciphertext, called a proof, containing the message indicating whether ξ -overflow occurs or not. Note that we use the threshold ξ = 2 β max +1 and an auxiliary number e ′ ≥ ⌊log max(|e max − 1|, |e max − 2e min + 1|)⌋ + 1 in Algorithm 14 to examine the value of the exponent, which is used for the proposed OD-FPFHE. GenProof operates similarly to Max as follows: If the previous proof has the message m pf = 0, i.e., an overflow has not occurred while performing the previous operations, then the exponent message m satisfies 2e min ≤ m ≤ 2e max . Since m ′ ≜ e max − m + 1 is strictly positive if and only if m ≤ e max , the e ′ -th bit in the binary representation of $2^{e'} - m'$ is zero if and only if m ′ is strictly positive, meaning that CT ′ proof in Line 2 has the message about whether m > e max or not. Otherwise, if the previous proof message m pf is non-zero, it indicates that an overflow has already occurred, and hence CT ′ proof has a non-zero message. Finally, a user can check whether an overflow has occurred or not by decrypting the returned proof, which is a sum of all the previous proofs. Therefore, by combining FPFHE with bootstrapping failure probability $2^{-\Omega(v)}$ given in Lemma 4 and an arithmetic circuit family C which contains a circuit f having poly(v)-bounded operations, a (β e max +1 , C)-OD-FPFHE is constructed.
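The bit test used by GenProof can be verified on plain integers. The sketch below is a plaintext analogue with hypothetical names. It assumes the exponent m lies in [2 e_min, 2 e_max] when no earlier overflow was recorded, takes e' as the smallest value allowed by the bound above, and reads bit e' of 2^(e') − m' with m' = e_max − m + 1; under these assumptions that bit equals 1 exactly when m > e_max. Proofs are accumulated by addition, so any non-zero result signals an overflow.

```python
def gen_proof_plain(m: int, prev_proof: int, e_min: int, e_max: int) -> int:
    """Plaintext analogue of GenProof for the upper exponent bound."""
    # floor(log2(x)) + 1 equals x.bit_length() for x >= 1.
    e_prime = max(abs(e_max - 1), abs(e_max - 2 * e_min + 1)).bit_length()
    m_prime = e_max - m + 1                      # strictly positive iff m <= e_max
    shifted = (1 << e_prime) - m_prime           # stays within [0, 2^(e_prime + 1))
    overflow_bit = (shifted >> e_prime) & 1      # 1 iff m_prime <= 0, i.e. m > e_max
    return prev_proof + overflow_bit             # non-zero once any overflow occurred


E_MIN, E_MAX = -511, 511                         # exponent range of the (4,27,-511,511) system
print(gen_proof_plain(511, 0, E_MIN, E_MAX), gen_proof_plain(512, 0, E_MIN, E_MAX))
```

A user who decrypts a non-zero proof learns that the result is unreliable without learning where in the circuit the overflow happened.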

VI. SECURITY ANALYSIS AND SIMULATION RESULTS
Like other FHEs, the proposed OD-FPFHE relies on key-dependent message (KDM) and circular security assumptions to generate public keys [2], [22], [26]. To determine concrete parameter values of OD-FPFHE for achieving the target security, we estimate the computational complexity of the primal uSVP and dual lattice attacks using k-block BKZ with an SVP oracle having the sieving cost $2^{0.292k+16.4}$ [27]. In addition, we apply the hybrid primal and dual attack [28] to the LWE key-switching key encrypted by the h-sparse sk-ks.
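The sieving cost model quoted above is a one-line formula, so it is easy to tabulate. The sketch below evaluates only that per-call SVP oracle cost, 2^(0.292k + 16.4); it is not a full attack estimate, since the block size k reached by a primal, dual, or hybrid attack depends on the LWE parameters and has to come from a lattice estimator. Function names are hypothetical.

```python
import math


def sieving_cost_bits(block_size: int) -> float:
    """log2 of the SVP oracle cost in the model 2^(0.292 * k + 16.4)."""
    return 0.292 * block_size + 16.4


def min_block_size(target_bits: float) -> int:
    """Smallest block size k whose oracle cost reaches 2^target_bits in this model."""
    return math.ceil((target_bits - 16.4) / 0.292)


for k in (300, 383, 450):
    print(k, round(sieving_cost_bits(k), 1))
print(min_block_size(128))
```

Under this model an attack must be forced to use a block size of at least 383 before a single oracle call costs 2^128 operations.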

A. SIMULATION RESULTS
Every simulation is performed on Ubuntu 20.04 LTS with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz having 20 cores and 40 threads and 256 GB of RAM. PALISADE is compiled by Clang++ 10.0.0 with the following CMake flags: WITH_NATIVEOPT = ON (machine-specific optimizations are applied by the compiler), WITH_INTEL_HEXL = ON (AVX-512 acceleration is used), and WITH_TCM = ON (Tcmalloc is used, which is suitable for multi-thread processing).
In Table 1, the number after D (double precision) refers to the security level in bits.For instance, D128 guarantees 128-bit security of double precision OD-FPFHE under Primal, Dual, and hybrid attacks.Note that OD-FPFHE with D128 can deal with the ciphertexts for both double and single precision messages.
We implement the (4,27,-511,511) and (4,12,-127,127) floating-point number systems by using PALISADE v1.11. We simulate ADD and MULT in Algorithms 6 and 7 with 1 (single-core), 4, and 10 threads using the parameters in Table 1 and list the operation times in Table 2. In addition, we measure the addition and multiplication time per thread, which is listed in Table 2 as the amortized time. Therefore, if many threads are available, the run time is expected to be close to the amortized time when a circuit is evaluated in parallel, as in matrix multiplication. However, if a circuit is evaluated by sequential operations, the run time is expected to be close to the Time (10 thread) in Table 2.

B. PERFORMANCE EVALUATION OF THE PROPOSED OD-FPFHE
ADD and MULT of the proposed OD-FPFHE with D128 are 100x and 2.6x faster, respectively, than those of the TFHE-based implementation [15]. While the floating-point implementation in [15] utilizes the existing homomorphic encryption inefficiently, the proposed approach redesigns homomorphic encryption to align with floating-point operations in the plaintext domain, resulting in a significant speed improvement. Also, ADD is 28.8x faster than that of the CKKS-based implementation [15]. However, MULT of the CKKS-based implementation [15] is faster than that of the proposed OD-FPFHE because the implementation in [15] does not normalize the result, and hence Proposition 3 is not guaranteed.
Next, we arbitrarily choose double precision and single precision messages x without encoding error, i.e., Decode(Encode(x)) = x. The error between the correct value and the decryption result is bounded as expected from Proposition 3 if an overflow does not occur. All the simulation codes are open to the public.

C. CONCLUSIONS AND FUTURE WORKS
We proposed a floating-point fully homomorphic encryption with homomorphic normalization. Since floating-point number systems are widely used in many areas such as high-precision deep learning and control systems, the proposed FPFHE can guarantee both privacy and accuracy for many precise applications. In addition, we proposed an OD-FPFHE, which also has many applications. For instance, it is useful for solving the satellite collision problem and for performing accurate continual learning while keeping the privacy of training data, because the encrypted training data causing an overflow can be excluded at the training stage to avoid degradation in learning.
However, there are still many issues to be studied. First, we do not propose a homomorphic floating-point division algorithm. Since the algebraic structures of many FHEs are not Euclidean domains, a simple and natural division algorithm is not trivial, and hence we have been searching for an effective and fast division circuit suitable for the proposed FPFHE. In addition, floating-point homomorphic elementary functions such as exponential, logarithm, and N -th root functions are also desirable in privacy-preserving machine learning.
One of the critical disadvantages of the current OD-FPFHE is its slow operation time. However, the operation speed can be improved in several ways. Since a large Q affects the bootstrapping time, reducing Q should be investigated. For instance, randomized gadget decomposition is reported to reduce the error amplification after running GSW-like multiplication [21]. Therefore, an effective randomized gadget decomposition for OD-FPFHE together with a rigorous and practical error analysis will improve the operation time.
Also, studies of accelerating speed of FHEs on GPUs have been performed [29] and in the near future, OD-FPFHE is expected to benefit from such hardware acceleration, potentially leading to the improved performance.

APPENDIX A PROOF OF PROPOSITION 4
We prove Proposition 4 by induction on $j$ of $C_j$. Assume that $C_0, \ldots, C_{j-1}$ satisfy Proposition 4. Since every coefficient of $b(X)$ is greater than or equal to the corresponding coefficient of $a(X)$, $C_0(a(X)\alpha(X)) = a_0\alpha_0 \le b_0\alpha_0 \le C_0(b(X)\alpha(X))$ holds. Then for the index j, C j (a(X )α(X )) (

Proof: When Algorithm 2 runs with i on Line 2, by using (7) and the CMux gate analysis in Section 3.4 of [4], the additive error is derived as follows:
$$\sum_j (X^{a_i} - 1)\,G^{-1}_{\mathrm{crt}}(\mathrm{ACC}^{(i)})\,E^{1}_j(X) + \sum_j (X^{-a_i} - 1)\,G^{-1}_{\mathrm{crt}}(\mathrm{ACC}^{(i)})\,E^{-1}_j(X) + \sum_j \left((X^{a_i} - 1)A^{1}_j(X) + (X^{-a_i} - 1)A^{-1}_j(X)\right)\text{sk-bl}_j(X), \quad (18)$$
where $\mathrm{ACC}^{(i)}$ is the computed value after the $(i-1)$-st iteration on Line 3, $A^{1}_j(X)$ and $A^{-1}_j(X)$ are gadget error polynomials, and $E^{1}_j(X)$ and $E^{-1}_j(X)$ are the j-column error polynomials of $\mathrm{BL}^{1}_i$ and $\mathrm{BL}^{-1}_i$, respectively. Since the errors $E^{1}_j(X)$ and $E^{-1}_j(X)$ and the secret key $\text{sk-bl}_j(X)$ follow symmetric distributions and each of them is multiplied by an independent and bounded random variable, the summands in each summation in (18) have Pythagorean additivity by Corollary 2. By induction on i, we can obtain (17) by using (2) and Proposition 1. □

FHE = (KeyGen, Enc, Dec, Eval) such that (i) it uses (β, p, e min , e max )-floating-point numbers as its message space; (ii) it has an operation set O ⊆ {+, −, •, /}; (iii) it uses a rounding function R, i.e., for ⊤ ∈ O and any x, y with $\beta^{e_{\min}} \le |x \top y| \le (\beta - \beta^{1-p})\beta^{e_{\max}}$, the ciphertexts ct[x] and ct[y] satisfy the inequality $\left|\mathrm{Dec}(\mathrm{Eval}(\mathrm{ct}[x], \mathrm{ct}[y], \top)) - R(x \top y)\right| \le \beta^{1-p} \min\left(|x \top y|, |R(x \top y)|\right)$, except with negligible probability.

Definition 6 (ξ -Overflow). For a message space M with norm | • | and ξ

FIGURE 1. Comparison of key-chain and bootstrapping between improved TFHE and the proposed FPFHE.
and X and Y are uncorrelated, then $XY \sim \mathrm{subG}(\sqrt{8}B\sigma)$.
Proof: Since X and Y are uncorrelated, E[XY ] = E[X ]E[Y ] = 0. Due to the following inequality

is a gamma function, and (ii) the moment generating function is bounded as $M_{XY}(s) \le \exp(4B^2\sigma^2 s^2)$ [17]. Therefore $XY \sim \mathrm{subG}(\sqrt{8}B\sigma)$.
The factor $\sqrt{8}$ in Lemma 2 is undesirable, and by putting an additional condition on Y, $\sqrt{8}$ can be removed as follows:
Lemma 3. Let X be a B-bounded random variable and Y be σ-subgaussian with symmetric distribution, i.e., $E[Y^{2n-1}] = 0$ for all $n \in \mathbb{N}$. If X and Y are independent, then $XY \sim \mathrm{subG}(B\sigma)$.
Proof: Since X and Y are independent, E[XY ] = E[X ]E[Y ] = 0. By Lemma 2, there exists a measurable function that is point-wise larger than the moment generating function $M_{XY}(s)$ for all $s \in \mathbb{R}$. Then, for any $s \in \mathbb{R}$,

FIGURE 3. An example of sequential bootstrapping to evaluate look-up tables (f i,j ) when t = 2.

Proposition 5. Assume that BlindRotate in Algorithm 2 runs with a valid squashed ct and returns out α ∈ Z N gct K gct +1 Q for the message (ACCPoly(X ) X −ϕ(ct) ) α . Then for any α ∈ [N gct ], the error in the α-coefficient of out α , denoted as E (α) bl , is bounded except with probability $2^{-\Omega(v)}$ as follows: |E (α) bl | = O B bl vN gct K gct n + σ nl bl .
The proof of Lemma 4 is provided in Appendix. If the overall error amplification E tot in (15) is less than or equal to 2N gct /2 t , the output of KeySwitch is a valid ciphertext and hence BlindRotate can be applied again. We say that FPFHE is valid if its parameters satisfy |E tot | ≤ 2N gct /2 t . Intuitively, a valid FPFHE can bootstrap a ciphertext for the message (m t−1 m t−2 . . . m 0 ) (2)

TABLE 1. Concrete parameters of OD-FPFHE for various security levels.

TABLE 2. Time consumption of ADD and MULT for various parameter values (seconds).