Optimization of Homomorphic Comparison Algorithm on RNS-CKKS Scheme

Since the sign function can be used to implement the comparison operation, the max function, and the rectified linear unit (ReLU) function, several studies have aimed to evaluate the sign function efficiently in the Cheon-Kim-Kim-Song (CKKS) scheme, one of the most promising fully homomorphic encryption schemes. Recently, Lee et al. (IEEE Trans. Depend. Sec. Comp.) proposed a practically optimal method for approximating the sign function on the CKKS scheme using a composition of minimax approximate polynomials. Lee et al. also proposed a polynomial-time algorithm that finds the degrees of the component polynomials minimizing the number of non-scalar multiplications for homomorphic comparison/max/ReLU functions. However, homomorphic comparison/max/ReLU functions using Lee et al.’s approximation method have not been successfully implemented on the residue number system variant of the CKKS scheme (RNS-CKKS). Moreover, the degrees found by Lee et al.’s algorithm are not optimized for the RNS-CKKS scheme, because the algorithm does not account for the fact that the running time of a non-scalar multiplication depends heavily on the ciphertext level in the RNS-CKKS scheme. In this paper, we propose a fast algorithm for the inverse minimax approximation error, a subroutine required to find the optimal set of degrees of component polynomials. This algorithm makes it possible to search for the optimal set of degrees among higher degrees than in the previous study. Using it, we also propose a method to find the degrees of component polynomials optimized for the RNS-CKKS scheme. We successfully implement the homomorphic comparison, max function, and ReLU function algorithms on the RNS-CKKS scheme with a low comparison failure rate ($< 2^{-15}$) and provide parameter sets for various values of the precision parameter α.
We reduce the depth consumption of the homomorphic comparison, max function, and ReLU function algorithms by one for several values of α. In addition, numerical analysis shows that the homomorphic comparison, max function, and ReLU function algorithms using the degrees of component polynomials found by the proposed algorithm reduce the running time by 6%, 7%, and 6% on average, respectively, compared with those using the degrees found by Lee et al.’s algorithm.


I. INTRODUCTION
Homomorphic encryption (HE) is a cryptosystem that allows certain algebraic operations on encrypted data. Fully homomorphic encryption (FHE) is HE that allows all algebraic operations on encrypted data, and Gentry proposed the first FHE scheme, based on bootstrapping, in [1].
Since then, FHE has attracted significant attention in various applications, and its standardization is in progress.
The Cheon-Kim-Kim-Song (CKKS) [2] scheme, one of the representative FHE schemes, supports the addition and multiplication of real and complex numbers. Since data are usually represented by real numbers, the CKKS scheme has attracted much attention in many applications, such as machine learning [3]- [6]. Accordingly, much research has been done to optimize the CKKS scheme [7]- [11]. In particular, Cheon et al. [7] proposed the residue number system variant of the CKKS scheme (RNS-CKKS), which runs ten times faster than the original CKKS scheme with one thread. Its performance can be improved further in a multi-core environment because the RNS-CKKS scheme enables parallel computation. Thus, many HE libraries, such as SEAL [12], PALISADE [13], and Lattigo [14], implement the RNS-CKKS scheme.
Although the CKKS scheme supports virtually all arithmetic operations on encrypted data, several applications require non-arithmetic operations. One of the core non-arithmetic operations is the comparison operation, denoted comp(a, b), which outputs 1 if a > b, 1/2 if a = b, and 0 if a < b. The comparison operation is widely used in real-world applications, including machine learning algorithms such as support-vector machines, cluster analysis, and gradient boosting [15], [16]. The max function and the rectified linear unit (ReLU) function are other essential non-arithmetic operations that are widely used in deep learning applications [17], [18]. All three operations can be implemented using the sign function sgn(x):
$$\mathrm{comp}(a,b) = \frac{1}{2}\left(\mathrm{sgn}(a-b) + 1\right), \qquad \max(a,b) = \frac{1}{2}\left(a + b + (a-b)\,\mathrm{sgn}(a-b)\right), \qquad \mathrm{ReLU}(x) = \frac{1}{2}\left(x + x\,\mathrm{sgn}(x)\right),$$
where $\mathrm{sgn}(x) = x/|x|$ for $x \neq 0$, and 0 otherwise. Thus, several studies have aimed to implement the sign function efficiently on the CKKS scheme [9], [19]. A method that approximates sgn(x) by a composition of component polynomials was proposed in [19], and it was proved to achieve the optimal asymptotic complexity.
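As a quick plaintext sanity check of the three sign-function identities above, the following Python sketch (our illustration, not part of the paper's implementation) evaluates comp, max, and ReLU through the exact sign function. In the homomorphic setting, sgn would be replaced by a polynomial approximation, which is the subject of the rest of the paper.

```python
def sgn(x: float) -> float:
    """Exact sign function: x/|x| for x != 0, and 0 otherwise."""
    if x == 0:
        return 0.0
    return 1.0 if x > 0 else -1.0


def comp(a: float, b: float) -> float:
    """comp(a, b) = (sgn(a - b) + 1) / 2: 1 if a > b, 1/2 if a == b, 0 if a < b."""
    return (sgn(a - b) + 1) / 2


def max_via_sgn(a: float, b: float) -> float:
    """max(a, b) = ((a + b) + (a - b) * sgn(a - b)) / 2."""
    return ((a + b) + (a - b) * sgn(a - b)) / 2


def relu_via_sgn(x: float) -> float:
    """ReLU(x) = (x + x * sgn(x)) / 2."""
    return (x + x * sgn(x)) / 2


if __name__ == "__main__":
    print(comp(0.7, 0.2), comp(0.5, 0.5), comp(0.1, 0.9))                # 1.0 0.5 0.0
    print(max_via_sgn(0.25, 0.75), relu_via_sgn(-0.5), relu_via_sgn(0.5))  # 0.75 0.0 0.5
```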
In addition, the authors of [9] proposed a practically optimal method that approximates sgn(x) with the minimum number of non-scalar multiplications using a composition of minimax approximate polynomials. Although [9] provides a comparison algorithm with practically optimal performance on the CKKS scheme, comparison on the RNS-CKKS scheme raises further research questions. First, unlike in the CKKS scheme, the rescaling error in the RNS-CKKS scheme is somewhat large, and it must be handled carefully to achieve low approximation failure rates. Second, a set of degrees of component polynomials that yields better comparison performance remains to be found. Although the authors of [9] also proposed a polynomial-time algorithm that determines the set of degrees minimizing the number of non-scalar multiplications, this set is optimized for the CKKS scheme but not for the RNS-CKKS scheme, because the running time of a non-scalar multiplication varies significantly with the current ciphertext level in the RNS-CKKS scheme. Thus, optimizing the degrees of component polynomials while taking the level-dependent running time of non-scalar multiplication into account should improve performance further.

A. OUR CONTRIBUTIONS
There are three contributions in this paper as follows.
1) For the first time, we successfully implement the homomorphic comparison, max function, and ReLU function algorithms using a composition of minimax approximate polynomials on the RNS-CKKS scheme with a low failure rate ($< 2^{-15}$), and we provide suitable parameter sets.
2) We improve the performance of the algorithm that finds the inverse minimax approximation error, a subroutine for finding the optimal set of degrees of component polynomials. While the previous study [9] searched for the optimal set of degrees minimizing the number of non-scalar multiplications only among degrees up to 31, we search among degrees up to 63 using the improved algorithm for the inverse minimax approximation error (see Algorithm 7). As a result, the depth consumption of the homomorphic comparison operation (resp. max/ReLU functions) is reduced by one when α is 9 or 14 (resp. when α is 16, 17, or 18), enabling one more multiplication. In addition, this improved algorithm enables finding a set of degrees of component polynomials optimized for the homomorphic comparison operation, max function, or ReLU function on the RNS-CKKS scheme (see Section IV). Our source code for finding optimized degrees is available at https://github.com/eslee3209/MinimaxComp_degrees.
3) We propose a method to find the set of degrees of component polynomials optimized for the homomorphic comparison, max function, and ReLU function on the RNS-CKKS scheme using the proposed fast algorithm for the inverse minimax approximation error. Using the set of degrees obtained in this way, we reduce the running time of the homomorphic comparison, max function, and ReLU function algorithms by 6%, 7%, and 6%, respectively, compared with the previous work [9] on the RNS-CKKS scheme library SEAL [12].

B. RELATED WORKS
While the comparison operation (or the max/ReLU function) is not difficult to perform in bit-wise FHE, such as the fastest homomorphic encryption in the West (FHEW) [20] or the fast fully homomorphic encryption over the torus (TFHE) [21], it is very challenging in word-wise FHE such as the CKKS scheme. Thus, several studies on comparison in the CKKS scheme based on the evaluation of approximate polynomials have been conducted [9], [19], [22]. Among them, the comparison operation proposed in [9] has the best performance, and we improve upon [9]. Another approach to comparison on the CKKS scheme, based on FHEW/TFHE bootstrapping, has recently been studied [23]- [26]. With this approach, users still perform efficient word-wise operations in the CKKS scheme and, when a comparison is required, switch the ciphertexts to FHEW/TFHE ciphertexts and perform the comparison using FHEW/TFHE bootstrapping. This approach can be less efficient than the proposed homomorphic comparison, which fully uses CKKS packing, in terms of amortized running time (running time per comparison). However, it remains an interesting research topic because it can be advantageous for large-precision comparison.

C. OUTLINE
The remainder of this paper is organized as follows. Section II describes preliminaries on the notation, the RNS-CKKS scheme, the scaling factor management technique, and the homomorphic comparison operation using minimax composite polynomials. Section III proposes a fast algorithm to find the inverse minimax approximation error. Section IV proposes a new algorithm that finds the set of degrees of component polynomials optimized for the homomorphic comparison on the RNS-CKKS scheme. Section V presents the application to the min/max and ReLU functions. Section VI provides numerical results on the RNS-CKKS scheme library SEAL for the homomorphic comparison, max function, and ReLU function algorithms that use the proposed sets of degrees for the component polynomials. Finally, Section VII gives concluding remarks.

A. NOTATION
Let $R = \mathbb{Z}[X]/(X^N + 1)$ and $R_q = R/qR$ be polynomial rings, where N is a power of two. Let $C = \{q_0, q_1, \ldots, q_{\ell-1}\}$ be a set of pairwise coprime positive integers. Then, for $a \in \mathbb{Z}_Q$, where $\mathbb{Z}_Q$ is the set of integers modulo Q and $Q = \prod_{i=0}^{\ell-1} q_i$, we denote the RNS representation of a with respect to C by $[a]_C = ([a]_{q_0}, \ldots, [a]_{q_{\ell-1}}) \in \mathbb{Z}_{q_0} \times \cdots \times \mathbb{Z}_{q_{\ell-1}}$. For the set of real numbers $\mathbb{R}$ and the set of complex numbers $\mathbb{C}$, a field isomorphism τ: … In particular, if a = 1 − τ and b = 1 + τ for some τ ∈ (0, 1), then $\tilde{R}_{a,b} = \tilde{R}_{1-\tau,1+\tau}$ is denoted by $\tilde{R}_\tau$. $|\{(n_1, n_2, \ldots, n_i) ; S(n_1, \ldots, n_i)\}|$ denotes the number of tuples $(n_1, \ldots, n_i)$ for which the statement $S(n_1, \ldots, n_i)$ is true. $\alpha_{max}$, $\ell_{max}$, $m_{max}$, $n_{max}$, and $t_{max}$ denote the upper bounds of the precision α, the ciphertext level, the number of non-scalar multiplications, the depth consumption, and the running time, respectively. These values should be set sufficiently large; in this paper, we set $\alpha_{max} = 20$, $\ell_{max} = 30$, $m_{max} = 70$, $n_{max} = 40$, and $t_{max} = 240$. $d_{max}$ denotes the upper bound of the degrees of component polynomials, and $d_{max} = 31$ or $d_{max} = 63$ is used in this paper.
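To illustrate the RNS representation $[a]_C$, the following Python sketch maps an integer modulo $Q = \prod q_i$ to its residues and back with the Chinese Remainder Theorem. The moduli are small toy values chosen only for readability (the paper's moduli are primes near $2^{50}$ or $2^{60}$), and the function names are ours.

```python
from math import prod


def to_rns(a, moduli):
    """RNS representation [a]_C = ([a]_{q_0}, ..., [a]_{q_{l-1}})."""
    return [a % q for q in moduli]


def from_rns(residues, moduli):
    """Recover a in Z_Q (Q = product of the moduli) via the Chinese Remainder Theorem."""
    Q = prod(moduli)
    a = 0
    for r, q in zip(residues, moduli):
        q_hat = Q // q                      # product of all the other moduli
        a += r * q_hat * pow(q_hat, -1, q)  # pow(x, -1, q) is the inverse mod q (Python 3.8+)
    return a % Q


C = [97, 101, 103]            # toy pairwise-coprime moduli, Q = 1,009,091
a, b = 123456, 654321
ra, rb = to_rns(a, C), to_rns(b, C)
# Addition and multiplication act independently on each residue (no carries between moduli).
sum_rns = [(x + y) % q for x, y, q in zip(ra, rb, C)]
prod_rns = [(x * y) % q for x, y, q in zip(ra, rb, C)]
print(from_rns(sum_rns, C), from_rns(prod_rns, C))
```

The component-wise arithmetic is what lets RNS-CKKS work on machine-word-sized residues in parallel, as mentioned in the introduction.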
This algorithm over integers, $\mathrm{Conv}_{C \to B}(\cdot): \ldots \to \prod_{i=0}^{k-1} \mathbb{Z}_{p_i}$, can be extended to an algorithm over the polynomial rings as $\mathrm{Conv}_{C \to B}(\cdot)$: … Then, the basic algorithms in the RNS-CKKS scheme are described as follows:
- Setup(λ; Δ, L): For a security parameter λ, a scaling factor Δ, and the number of levels L (also called the maximum level), we set some parameters. The polynomial degree N of R is chosen so that L levels can be supported with security λ. A secret key distribution $\chi_{key}$, an error distribution $\chi_{err}$ over R, and an encryption key distribution $\chi_{enc}$ are chosen according to λ. Bases of prime numbers $B = \{p_0, p_1, \ldots, p_{k-1}\}$ and $C = \{q_0, q_1, \ldots, q_L\}$ are selected so that … and $q_j \equiv 1 \bmod 2N$ for $0 \le j \le L$. $q_0$ is usually set close to $2^{60}$, and $|q_j - \Delta|$ is made as small as possible for $1 \le j \le L$. All prime numbers are distinct. Let … Then, the following numbers are computed: …
- KSGen($s_1, s_2$): This algorithm generates the switching key for switching the secret key $s_1$ to $s_2$. First, sample $(a'^{(0)}, \ldots, a'^{(k+L)}) \leftarrow U(\ldots)$ and $e' \leftarrow \chi_{err}$. Then, for given $s_1, s_2 \in R$, output the switching key swk = (…).
- KeyGen(λ): This algorithm generates the secret key, the public key, and the evaluation key. First, sample $s \leftarrow \chi_{key}$ and set the secret key $sk \leftarrow (1, s)$. Sample $(a^{(0)}, \ldots, a^{(L)}) \leftarrow U(\prod_{j=0}^{L} R_{q_j})$ and $e \leftarrow \chi_{err}$. Then, the public key is $pk \leftarrow (pk^{(j)} = (b^{(j)}, a^{(j)}) \in R_{q_j}^2)_{0 \le j \le L}$, where $b^{(j)} \leftarrow -a^{(j)} \cdot s + e \bmod q_j$ for $0 \le j \le L$. The evaluation key is $evk \leftarrow \mathrm{KSGen}(s^2, s)$.
- Enc(z; pk, Δ): For a message slot $z \in \mathbb{C}^{N/2}$, compute $m = \mathrm{Ecd}(z; \Delta)$. Then, sample $v \leftarrow \chi_{enc}$ and $e_0, e_1 \leftarrow \chi_{err}$, and output the ciphertext $ct = (ct^{(j)})_{0 \le j \le L} \in \prod_{j=0}^{L} R_{q_j}^2$, where $ct^{(j)} \leftarrow v \cdot pk^{(j)} + (m + e_0, e_1) \bmod q_j$ for $0 \le j \le L$.
- Dec(ct; sk, Δ): For a ciphertext $ct = (ct^{(j)})_{0 \le j \le \ell} \in \prod_{j=0}^{\ell} R_{q_j}^2$, obtain $\tilde{m} = \langle ct^{(0)}, sk \rangle \bmod q_0$. Then, output $z = \mathrm{Dcd}(\tilde{m}; \Delta)$.
- Add($ct_1, ct_2$): For two ciphertexts $ct_r = (ct_r^{(j)})_{0 \le j \le \ell}$ for r = 1, 2, output the ciphertext $ct_{add} = (ct_1^{(j)} + ct_2^{(j)} \bmod q_j)_{0 \le j \le \ell}$.

In this paper, we set the key distribution $\chi_{key} = \mathcal{HWT}_N(256)$, which samples uniformly at random an element of R with ternary coefficients, exactly 256 of which are nonzero.

C. SCALING FACTOR MANAGEMENT
A technique for eliminating the large rescaling error in the RNS-CKKS scheme was proposed in [27]: instead of using the same scaling factor at every level, different scaling factors are used at different levels. If the maximum level is L and the ciphertext modulus for level i is $q_i$, the scaling factor for each level is set as follows:
$$\Delta_L = q_L, \qquad \Delta_{i-1} = \frac{\Delta_i^2}{q_i} \quad (1 \le i \le L).$$
When two ciphertexts at the same level i are multiplied homomorphically and rescaled by $q_i$, the resulting scaling factor is exactly $\Delta_{i-1}$, so no approximate rescaling error is introduced. Next, consider two ciphertexts at different levels i and j with i > j. In this case, the moduli $q_i, q_{i-1}, \ldots, q_{j+2}$ of the first ciphertext are dropped so that it is at level j + 1, and the first ciphertext is multiplied by the constant $\lfloor \Delta_j q_{j+1} / \Delta_i \rceil$. Then, the first ciphertext is rescaled by $q_{j+1}$, which brings its scaling factor to approximately $\Delta_j$. Since both ciphertexts are now at the same level, conventional homomorphic multiplication can be performed. This procedure also reduces the approximate rescaling error.
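The scale bookkeeping described above can be checked with exact rational arithmetic. The Python sketch below assumes the recursion $\Delta_L = q_L$, $\Delta_{i-1} = \Delta_i^2/q_i$ as we read the technique of [27], and uses toy moduli (not necessarily prime) purely to track scales; no encryption is performed, and the function names are ours.

```python
from fractions import Fraction


def level_scaling_factors(q):
    """Delta_0..Delta_L for moduli q = [q_0, ..., q_L], assuming
    Delta_L = q_L and Delta_{i-1} = Delta_i**2 / q_i (kept as exact rationals)."""
    L = len(q) - 1
    delta = [Fraction(0)] * (L + 1)
    delta[L] = Fraction(q[L])
    for i in range(L, 0, -1):
        delta[i - 1] = delta[i] ** 2 / q[i]
    return delta


def adjust_scale(delta, q, i, j):
    """Scale of a level-i ciphertext after moving it to level j < i: drop q_i..q_{j+2}
    (scale unchanged), multiply by round(Delta_j * q_{j+1} / Delta_i), rescale by q_{j+1}."""
    const = round(delta[j] * q[j + 1] / delta[i])  # the integer constant actually used
    return delta[i] * const / q[j + 1]


q = [2**40 + 15, 2**20 - 3, 2**20 + 7, 2**20 - 17, 2**20 + 21]  # toy moduli q_0..q_4
delta = level_scaling_factors(q)
# Moving from level 4 to level 1 lands within a tiny relative error of Delta_1.
print(float(adjust_scale(delta, q, 4, 1) / delta[1]))
```

The only error left in the cross-level case comes from rounding the constant, whose relative size is about $1/(2\Delta)$.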

D. HOMOMORPHIC COMPARISON OPERATION USING MINIMAX COMPOSITE POLYNOMIAL
In this paper, dep(d) and mult(d) denote, respectively, the depth consumption and the number of non-scalar multiplications required to evaluate a polynomial of degree d with only odd-degree terms using the odd baby-step giant-step algorithm and the optimal level consumption technique. The values of dep(d) and mult(d) for odd degrees d up to 63 are presented in Table 1.
The minimax approximate polynomial of degree at most d on D for sgn(x) is denoted by MP(D; d). For the minimax approximate polynomial p(x) = MP(D; d), the minimax approximation error $\max_{x \in D} |p(x) - \mathrm{sgn}(x)|$ is denoted by ME(D; d). It is known that for any continuous function f on D, the minimax approximate polynomial of degree at most d on D is unique [29]. Furthermore, the minimax approximate polynomial can be obtained using the improved multi-interval Remez algorithm [30].
For a domain $D = \tilde{R}_{a,b}$ and a set of odd integers $\{d_i\}_{1 \le i \le k}$, a composite polynomial $p_k \circ \cdots \circ p_1$ is called a minimax composite polynomial on D for $\{d_i\}_{1 \le i \le k}$, denoted by $\mathrm{MCP}(D; \{d_i\}_{1 \le i \le k})$, if the following conditions are satisfied: … Since $\mathrm{ME}(\tilde{R}_\tau; d)$ is a strictly increasing function of τ, its inverse function exists; it is called the inverse minimax approximation error and is denoted by IME(τ; d). That is, for τ ∈ (0, 1) and d ∈ ℕ, IME(τ; d) is the value τ′ ∈ (0, 1) that satisfies $\mathrm{ME}(\tilde{R}_{\tau'}; d) = \tau$. An approximate value of IME(τ; d) can be obtained using binary search, as in Algorithm 1 [9].
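The binary-search idea can be sketched as follows. This is a generic bisection written for illustration, not a transcription of Algorithm 1 in [9]: it only assumes that `me(t, d)` returns $\mathrm{ME}(\tilde{R}_t; d)$ and is strictly increasing in t. Computing a real minimax error requires the multi-interval Remez algorithm, so the usage line plugs in a hypothetical stand-in function instead.

```python
def app_ime_binary(me, tau, d, iters=30):
    """Approximate IME(tau; d): the largest t in (0, 1) with me(t, d) <= tau.

    Assumes me(t, d) is strictly increasing in t. Every iteration calls me once,
    which is why this approach is expensive when ME requires a Remez run."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if me(mid, d) <= tau:
            lo = mid           # mid still meets the target error: move right
        else:
            hi = mid
    return lo                  # after `iters` steps, within 2**-iters of the root


# Hypothetical stand-in for ME (NOT a real minimax error): t ** ((d + 1) // 2).
print(app_ime_binary(lambda t, d: t ** ((d + 1) // 2), 0.027, 5))  # close to 0.3
```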
The procedure for obtaining an approximate value of comp(a, b) for given precision parameters α and ϵ and inputs a, b ∈ [0, 1] is summarized as follows: … The ComputeMinDep and ComputeMinMultDegs algorithms use the ComputehG algorithm as a subroutine, and the MinimaxComp algorithm, which takes inputs a and b, precision parameters α and ϵ, a depth consumption D, and a margin η, uses the ComputeMinMultDegs algorithm as a subroutine. The output of MinimaxComp satisfies the following comparison operation error condition:
$$|\mathrm{MinimaxComp}(a, b) - \mathrm{comp}(a, b)| \le 2^{-\alpha} \quad \text{for all } a, b \in [0, 1] \text{ with } |a - b| \ge \epsilon. \qquad (1)$$
The set of degrees $M_{degs} = \{d_1, \ldots, d_k\}$ obtained from the ComputeMinMultDegs algorithm satisfies $\deg(p_i) = d_i$ for $1 \le i \le k$. $M_{degs}$ is the optimal set of degrees in the sense that, with it, the homomorphic comparison operation minimizes the number of non-scalar multiplications while satisfying the comparison operation error condition in (1) for the given depth consumption D.
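To show how a composite polynomial turns into a comparison that meets condition (1), the Python sketch below runs entirely on plaintext. As a simplifying swap, it composes a fixed cubic, (3x − x³)/2, instead of minimax polynomials whose degrees come from ComputeMinMultDegs, so it says nothing about the optimality of MinimaxComp. The number of compositions k = 20 and the test parameters are our own choices.

```python
def f1(x: float) -> float:
    """Component polynomial (3x - x^3)/2: a simple stand-in, NOT a minimax polynomial."""
    return (3 * x - x ** 3) / 2


def approx_comp(a: float, b: float, k: int = 20) -> float:
    """Approximate comp(a, b) = (sgn(a - b) + 1) / 2 using f1 composed k times."""
    x = a - b                  # a, b in [0, 1], so x lies in [-1, 1]
    for _ in range(k):         # f1 maps [-1, 1] into itself and pushes x toward sgn(x)
        x = f1(x)
    return (x + 1) / 2


def meets_condition_1(a, b, alpha, eps, k=20):
    """Check |approx_comp(a, b) - comp(a, b)| <= 2**-alpha whenever |a - b| >= eps."""
    if abs(a - b) < eps:
        return True            # condition (1) places no requirement on these inputs
    exact = 1.0 if a > b else 0.0
    return abs(approx_comp(a, b, k) - exact) <= 2 ** -alpha


grid = [i / 64 for i in range(65)]
print(all(meets_condition_1(a, b, alpha=10, eps=2 ** -6) for a in grid for b in grid))
```

In the homomorphic version, every composition consumes depth and non-scalar multiplications. This cost is what the choice of component degrees trades against the approximation error.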
With $d_{max} = 63$, the number of computations of IME(τ′; d) is 117,675.
To obtain a precise approximate value of IME(τ′; d) with the AppIMEbinary algorithm, at least ten iterations are required. Then, the expected number of computations of $\mathrm{ME}(\tilde{R}_\tau; d)$ in ComputehG is at least 117,675 × 10 = 1,176,750. Note that this count is for only one value of the precision parameter α, where the input τ of the ComputehG algorithm is $2^{1-\alpha}$. Performing ComputehG for α from 4 to 20 (17 values) requires around 1,176,750 × 17 = 20,004,750 calls to $\mathrm{ME}(\tilde{R}_\tau; d)$. Because of this large number of calls, values of $d_{max}$ larger than 31 could not be used in [9], which prevented improving the homomorphic comparison operation with higher degrees. Thus, an efficient method for approximating IME(τ′; d) is desirable.

A. PROPOSED ALGORITHM FOR INVERSE MINIMAX APPROXIMATION ERROR
We propose a fast method to find an approximate value of IME(τ′; d), which enables using a value of $d_{max}$ larger than 31. The proposed method proceeds as follows:
1) Sample values of τ at moderate intervals.
2) Compute $\mathrm{ME}(\tilde{R}_\tau; d)$ for the sampled values of τ.
3) For τ′ ∈ (0, 1), obtain an approximate value of IME(τ′; d) by interpolating the computed sample values of $\mathrm{ME}(\tilde{R}_\tau; d)$.
For $\alpha_{max}$, the upper bound of α, we consider sampling τ between $2^{-\alpha_{max}-1}$ and $1 - 2^{-\alpha_{max}-1}$. If τ is close to zero or one, sampling should be very dense. However, densely sampling the whole range between $2^{-\alpha_{max}-1}$ and $1 - 2^{-\alpha_{max}-1}$ requires a large number of samples. Thus, we propose to sample densely when τ is close to zero or one, and sparsely otherwise.
Specifically, we first sample t uniformly between $-\alpha_{max} - 1$ and $-1$ and compute $\mathrm{ME}(\tilde{R}_{2^t}; d)$ for the sampled values of t. In addition, we sample t uniformly between 1 and $\alpha_{max} + 1$ and compute $\mathrm{ME}(\tilde{R}_{1-2^{-t}}; d)$ for the sampled values of t. Interpolating between these samples yields a precise approximation of IME(τ′; d) with a much smaller number of samples. The precision $\bar{n}$ determines how densely t is sampled (the sampling step is $1/\bar{n}$), and we set $\bar{n} = 10$. For a given maximum degree $d_{max}$ and precision $\bar{n}$, the StoreME algorithm (Algorithm 6) stores $2^t$ (resp. $1 - 2^{-t}$) in a two-dimensional table $\tilde{X}$ and $\mathrm{ME}(\tilde{R}_{2^t}; d)$ (resp. $\mathrm{ME}(\tilde{R}_{1-2^{-t}}; d)$) in a two-dimensional table $\tilde{Y}$ for all odd degrees d with $3 \le d \le d_{max}$. For $\bar{n} = 10$ and $\alpha_{max} = 20$, the number of calls to $\mathrm{ME}(\tilde{R}_\tau; d)$ is 6,030 for $d_{max} = 31$ and 12,462 for $d_{max} = 63$.
The AppIME algorithm (Algorithm 7) outputs an approximate value of IME(τ′; d) using the tables $\tilde{X}$ and $\tilde{Y}$ obtained from the StoreME algorithm. Many calls to AppIME share a single computation of the tables $\tilde{X}$ and $\tilde{Y}$, that is, a single execution of StoreME. In particular, StoreME is performed only once for all precision parameters α.
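The following Python sketch illustrates the sample-then-interpolate scheme. The sampling grid follows the description above: step $1/\bar{n}$ in t, giving $2(\bar{n}\alpha_{max} + 1) = 402$ points, which matches the loop bound of Algorithm 7 and the reported 6,030 = 15 × 402 and 12,462 = 31 × 402 calls. The use of linear interpolation between the two bracketing samples is our assumption, since the text does not fix the interpolation rule. A hypothetical stand-in replaces the real minimax error.

```python
from bisect import bisect_left


def sample_taus(alpha_max=20, nbar=10):
    """tau = 2**t for t in [-(alpha_max+1), -1] and tau = 1 - 2**-t for t in
    [1, alpha_max+1], both with step 1/nbar: dense near 0 and near 1."""
    steps = nbar * alpha_max + 1
    low = [2.0 ** (-(alpha_max + 1) + j / nbar) for j in range(steps)]
    high = [1.0 - 2.0 ** -(1 + j / nbar) for j in range(steps)]
    return low + high          # increasing, 2 * (nbar * alpha_max + 1) points


def store_me(me, d_max, alpha_max=20, nbar=10):
    """StoreME-style tables: shared tau samples xs and ME values per odd degree."""
    xs = sample_taus(alpha_max, nbar)
    ys = {d: [me(x, d) for x in xs] for d in range(3, d_max + 1, 2)}
    return xs, ys


def app_ime(tau, d, xs, ys):
    """AppIME-style lookup: invert ME by linear interpolation between bracketing samples."""
    y = ys[d]
    i = bisect_left(y, tau)    # first sample whose ME value is >= tau
    if i == 0:
        return xs[0]
    if i == len(y):
        return xs[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], y[i - 1], y[i]
    return x0 + (tau - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical stand-in ME(tau; d) = tau**((d + 1) / 2), not real minimax data.
xs, ys = store_me(lambda t, d: t ** ((d + 1) / 2), d_max=31)
print(len(xs) * len(ys))          # 402 samples x 15 odd degrees = 6030 ME evaluations
print(app_ime(0.09, 3, xs, ys))   # close to 0.3, since the stand-in gives 0.3**2 = 0.09
```

After the tables are built, each AppIME call costs only a binary search over 402 values. No further ME evaluation is needed.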

B. RUNNING TIME OF THE PROPOSED ALGORITHM
We compare the running time of the ComputehG algorithm using the previous algorithm for the inverse minimax approximation error with that using the proposed algorithm. Table 2 shows the expected running time of ComputehG for α from 4 to 20 using the previous and proposed algorithms for the inverse minimax approximation error. Table 2 shows that the proposed AppIME algorithm requires much less running time than the previous AppIMEbinary algorithm, which makes it feasible to execute ComputehG for $d_{max} = 63$. Although a running time of 17 hours may still seem large, this process needs to be done only once, because its goal is just to find the optimal set of degrees.
While the ComputehG algorithm (Algorithm 2) could only be performed for $d_{max} \le 31$ in [9], we perform it for $d_{max} \le 63$ using the proposed fast AppIME algorithm (Algorithm 7). Table 3 lists the optimal sets of degrees $M_{degs}$ and the corresponding minimum depth consumption $D_{min}$ for $d_{max} = 31$ and $d_{max} = 63$. Table 3 shows that the depth consumption is reduced by one when α is 9 or 14. That is, for α = 9 or α = 14, the higher $d_{max}$ enables one more non-scalar multiplication per homomorphic comparison operation in the FHE setting, where the number of operations available per bootstrapping is very limited. Furthermore, the proposed AppIME algorithm enables finding a set of degrees optimized for the RNS-CKKS scheme, as described in Section IV.

IV. FINDING DEGREES OF COMPONENT POLYNOMIALS OPTIMIZED FOR THE RNS-CKKS SCHEME
Unlike the previous study on homomorphic comparison in the CKKS scheme [9], we study homomorphic comparison in the RNS-CKKS scheme, which raises additional considerations. First, the RNS-CKKS scheme has a somewhat large rescaling error, unlike the CKKS scheme. This error risks a high failure rate in the homomorphic comparison operation using minimax composite polynomials [9]. We therefore replace every addition and multiplication required for polynomial evaluation with its counterpart that uses the scaling factor management technique. Section VI shows that a low failure rate is achieved using this technique and appropriate parameter sets.
There is another difference between the homomorphic comparison operation in the CKKS scheme and that in the RNS-CKKS scheme. For a given depth consumption D, the set of degrees $M_{degs}$ that minimizes the number of non-scalar multiplications can be obtained using the ComputeMinMultDegs algorithm. Because the computation time of a non-scalar multiplication does not depend much on the current ciphertext modulus in the CKKS scheme, minimizing the number of non-scalar multiplications corresponds to minimizing the running time. However, since the computation time of a non-scalar multiplication depends heavily on the current level in the RNS-CKKS scheme, minimizing the number of non-scalar multiplications does not always minimize the running time. Fig. 1 shows the computation time of an example polynomial of degree seven according to the current level on the RNS-CKKS scheme library SEAL [12]. Fig. 1 shows that the computation time of a polynomial tends to increase quadratically with the maximum level.
VOLUME , 2021
Algorithm 7: AppIME(τ, d; $d_{max}$, $\bar{n}$)
Input: Target maximum error τ, degree d, the odd maximum degree $d_{max}$, and precision $\bar{n}$
Output: An approximate value of IME(τ; d)
1 $\tilde{X}, \tilde{Y} \leftarrow \mathrm{StoreME}(d_{max}, \bar{n})$
2 for $j \leftarrow 0$ to $2\bar{n}\alpha_{max} + 1$ do
…
For example, consider the two ordered sets $M_{degs} = \{7, 7, 31, 31\}$ and $M_{degs} = \{31, 31, 7, 7\}$. In the CKKS scheme, the homomorphic comparison operation takes almost the same time with either set. In the RNS-CKKS scheme, however, the former is faster because its high-degree polynomials are evaluated at lower levels. Our core idea is to determine the set of degrees that minimizes the running time itself rather than the number of non-scalar multiplications. Specifically, we modify the previous ComputehG and ComputeMinMultDegs algorithms so that the modified algorithms find the set of degrees $M_{degs}$ that minimizes the running time.
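To make the ordering argument concrete, the Python sketch below compares the two orderings under a hypothetical cost model. Under this model, evaluating a degree-d component that starts at level ℓ costs d·ℓ² time units. The quadratic dependence mirrors the trend reported for Fig. 1, while the factor d is only a placeholder for the per-polynomial work (the actual mult(d) values are listed in Table 1). We also assume dep(d) = ⌈log₂(d + 1)⌉ for odd d.

```python
def dep(d: int) -> int:
    """Assumed depth of an odd-degree-d polynomial: ceil(log2(d + 1))."""
    return d.bit_length()


def total_time(degs, start_level, work=lambda d: d):
    """Hypothetical running time of p_k o ... o p_1 with degrees degs = [d_1, ..., d_k].

    p_1 is evaluated first, at start_level. Each component costs work(d) * level**2
    (quadratic in the level where it starts) and then consumes dep(d) levels."""
    level, t = start_level, 0
    for d in degs:
        if level < dep(d):
            raise ValueError("not enough levels left")
        t += work(d) * level ** 2
        level -= dep(d)
    return t


low_first = [7, 7, 31, 31]    # degree-31 components run at low levels (10 and 5)
high_first = [31, 31, 7, 7]   # degree-31 components run at high levels (16 and 11)
L = sum(dep(d) for d in low_first)  # 3 + 3 + 5 + 5 = 16 levels
print(total_time(low_first, L), total_time(high_first, L))  # 6850 12002
```

Both orderings consume the same depth and, in the CKKS scheme, roughly the same work. The level-dependent cost alone makes the first ordering cheaper.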
First, we set up the RNS-CKKS scheme with maximum level ℓ. Then, starting from level ℓ′ (≤ ℓ), a polynomial of degree 2i + 3 is evaluated using the optimal level consumption technique [10] and the odd baby-step giant-step algorithm [28]. If t is the running time of this polynomial evaluation in milliseconds, we define $C_\ell(i, \ell') = \lfloor t/100 \rceil$, that is, the running time rounded to the nearest multiple of 100 ms. If ℓ < ℓ′ or $\ell' < \lceil \log_2(2i + 3) \rceil$, the polynomial evaluation is infeasible, and $C_\ell(i, \ell')$ is set to a sufficiently large value, 100,000. We obtain the values of $C_\ell(i, \ell')$ by evaluating polynomials on encrypted data on an Intel Core i7-10700 CPU at 2.90 GHz in a single thread with an Ubuntu 20.04 LTS distribution. Then, $u_{\tau,L}(m, n)$ and $V_{\tau,L}(m, n)$ are defined recursively using the values of $C_\ell(i, \ell')$ as follows:
$$u_{\tau,L}(m, n) = \begin{cases} \tau, & \text{if } m \le 1 \text{ or } n \le 1, \\ \mathrm{IME}\big(u_{\tau,L}(m - C_L(j_{m,n} - 1, n),\, n - \mathrm{dep}(2j_{m,n} + 1));\, 2j_{m,n} + 1\big), & \text{otherwise,} \end{cases}$$
where
$$j_{m,n} = \operatorname*{argmax}_{\substack{1 \le i,\ C_L(i-1, n) \le m, \\ \mathrm{dep}(2i+1) \le n}} \mathrm{IME}\big(u_{\tau,L}(m - C_L(i - 1, n),\, n - \mathrm{dep}(2i + 1));\, 2i + 1\big).$$
$u_{\tau,L}(m, n)$ is the maximum value of τ′ ∈ (0, 1) such that there exists a set of degrees $\{d_i\}_{1 \le i \le k}$ satisfying the following conditions: … The ComputeuV algorithm (Algorithm 8) outputs two-dimensional tables $\tilde{u}$ and $\tilde{V}$ that store the values of $u_{\tau,L}$ and $V_{\tau,L}$, respectively. ComputeuV requires many computations of IME(τ′; d), but these can be performed quickly using the proposed AppIME algorithm.
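A single table entry $C_\ell(i, \ell')$ can be sketched as below. `measure_ms` is a hypothetical callback standing in for an actual encrypted polynomial evaluation in SEAL, which this sketch does not perform. The rounding implements "nearest integer, halves rounded up", and $\lceil \log_2 x \rceil$ is computed exactly with integer arithmetic.

```python
import math

INFEASIBLE = 100_000  # "sufficiently large" value for infeasible evaluations


def ceil_log2(x: int) -> int:
    """ceil(log2(x)) for a positive integer x, computed exactly with integers."""
    return (x - 1).bit_length()


def cost_entry(max_level, i, level, measure_ms):
    """C_l(i, l'): time of a degree-(2i+3) evaluation starting at `level` under
    maximum level `max_level`, in units of 100 ms rounded to the nearest integer.

    measure_ms(max_level, i, level) is a hypothetical callback that would run the
    encrypted evaluation and return its wall-clock time in milliseconds."""
    if max_level < level or level < ceil_log2(2 * i + 3):
        return INFEASIBLE
    t = measure_ms(max_level, i, level)
    return math.floor(t / 100 + 0.5)  # nearest integer, halves rounded up


fake_timer = lambda L, i, l: 1234.0   # pretend every evaluation takes 1234 ms
print(cost_entry(20, 2, 5, fake_timer), cost_entry(20, 2, 2, fake_timer))  # 12 100000
```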
Algorithm 8: ComputeuV(τ; L)
Input: τ, maximum level L
Output: Two-dimensional tables $\tilde{u}$ and $\tilde{V}$
1 Generate two-dimensional tables $\tilde{u}$ and $\tilde{V}$, both of size $(t_{max} + 1) \times (n_{max} + 1)$
…
Then, the ComputeMinTimeDegs algorithm (Algorithm 9) outputs the minimum running time $M_{time}$ (in units of 100 ms) and the optimal set of degrees $M_{degs}$ using the two tables $\tilde{u}$ and $\tilde{V}$ obtained from the ComputeuV algorithm. Finally, we propose the homomorphic comparison algorithm OptMinimaxComp (Algorithm 10), which uses the ComputeMinTimeDegs algorithm. It modifies the previous MinimaxComp algorithm (Algorithm 5) [9] so as to minimize the running time on the RNS-CKKS scheme for a given depth consumption D.
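The recursion for $u_{\tau,L}$ and the subsequent degree search can be sketched as a small dynamic program, shown in the Python code below. It reflects our reading of the definitions above, under several assumptions. The current level n doubles as the remaining depth, as with the choice L = D used in Section VI. `V[m][n]` records the degree of the first component chosen. When no degree fits the budget, the entry falls back to τ. Both `ime` and `cost` are hypothetical stand-ins rather than AppIME and measured $C_L$ values. The required input half-width `tau_in` is simply passed in, because its exact value depends on how ϵ is normalized in [9].

```python
def dep(d: int) -> int:
    """Assumed depth of an odd-degree-d polynomial: ceil(log2(d + 1))."""
    return d.bit_length()


def compute_uv(tau, L, t_max, d_max, ime, cost):
    """Tables u[m][n], V[m][n] for time budget m in 0..t_max and current level n in 0..L.

    u[m][n]: largest input half-width tau' such that components costing at most m
    time units in total, starting at level n, map R_tau' into R_tau.
    V[m][n]: degree of the first component chosen there, or None (no component)."""
    u = [[tau] * (L + 1) for _ in range(t_max + 1)]
    V = [[None] * (L + 1) for _ in range(t_max + 1)]
    for m in range(2, t_max + 1):          # rows m <= 1 and columns n <= 1 stay at tau
        for n in range(2, L + 1):
            for d in range(3, d_max + 1, 2):
                c = cost(d, n)             # cost of degree d evaluated at level n (>= 1)
                if c > m or dep(d) > n:
                    continue
                cand = ime(u[m - c][n - dep(d)], d)
                if cand > u[m][n]:
                    u[m][n], V[m][n] = cand, d
    return u, V


def compute_min_time_degs(tau_in, D, u, V, cost):
    """Smallest m with u[m][D] >= tau_in and the degrees [d_1, ..., d_k] (p_1 first)."""
    for m in range(len(u)):
        if u[m][D] >= tau_in:
            degs, mm, n = [], m, D
            while V[mm][n] is not None:    # replay the stored first-component choices
                d = V[mm][n]
                degs.append(d)
                mm, n = mm - cost(d, n), n - dep(d)
            return m, degs
    return None


# Hypothetical stand-ins (NOT measured data): ME(tau'; d) = tau'**((d+1)/2) and a cost
# that grows with the degree and quadratically with the level.
me = lambda t, d: t ** ((d + 1) / 2)
ime = lambda t, d: t ** (2 / (d + 1))
cost = lambda d, level: max(1, round((d + 1) / 2 * level ** 2 / 50))

u, V = compute_uv(tau=2 ** -10, L=12, t_max=200, d_max=31, ime=ime, cost=cost)
print(compute_min_time_degs(0.9, 12, u, V, cost))
```

Because the cost of each component depends on the level at which it starts, the table is indexed by (time, level) rather than by (number of multiplications, depth) as in ComputehG.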

V. APPLICATION TO MIN/MAX AND RELU FUNCTION
In this section, we apply the methods for improving the homomorphic comparison operation proposed in Sections III and IV to the max and ReLU functions. The max function is an important operation used in many applications, including deep learning. It is easily implemented using the sign function, that is,
$$\max(a, b) = \frac{(a + b) + (a - b)\,\mathrm{sgn}(a - b)}{2}.$$
Thus, the approximate polynomial $\bar{p}(a, b)$ for the max function can be obtained from the approximate polynomial p(x) for the sign function as
$$\bar{p}(a, b) = \frac{(a + b) + (a - b)\,p(a - b)}{2}.$$
Then, $\bar{p}(a, b)$ should satisfy the following max function error condition for the precision parameter α:
$$|\bar{p}(a, b) - \max(a, b)| \le 2^{-\alpha} \quad \text{for all } a, b \in [0, 1]. \qquad (2)$$
Since $\min(a, b) = a + b - \max(a, b)$, the approximate polynomial $\underline{p}(a, b)$ for the min function can also be obtained easily: $\underline{p}(a, b) = a + b - \bar{p}(a, b)$.
The previous homomorphic max function MinimaxMax [9] uses the set of degrees of component polynomials obtained by executing the ComputeMinMultDegs algorithm (Algorithm 4) with inputs α, $\zeta \cdot 2^{-\alpha}$, and D − 1, where ζ is a max function factor that can be determined experimentally. The proposed algorithm (Algorithm 11) improves MinimaxMax by obtaining the set of degrees with the ComputeMinTimeDegs algorithm instead of ComputeMinMultDegs.
Algorithm 11: OptMinimaxMax(a, b; α, ζ, D, η)
Input: Inputs a, b ∈ [0, 1], precision parameter α, max function factor ζ, depth consumption D, and margin η
Output: Approximate value of max(a, b)
In addition, the authors of [17] proposed a method that approximates the ReLU function precisely using the approximation of the sign function. This precise approximation of the ReLU function is necessary to evaluate pre-trained convolutional neural networks on FHE. The ReLU and sign functions have the following relationship:
$$\mathrm{ReLU}(x) = \frac{x + x\,\mathrm{sgn}(x)}{2}.$$
Thus, an approximate polynomial r(x) for the ReLU function can be implemented using the approximate polynomial p(x) for the sign function as
$$r(x) = \frac{x + x\,p(x)}{2}. \qquad (3)$$
Then, r(x) should satisfy the following ReLU function error condition for the precision parameter α:
$$|r(x) - \mathrm{ReLU}(x)| \le 2^{-\alpha} \quad \text{for all } x \in [-1, 1]. \qquad (4)$$
The previous ReLU function algorithm based on (3) can be described as Algorithm 12, which we call MinimaxReLU. The proposed ReLU function algorithm (Algorithm 13) improves MinimaxReLU by obtaining the set of degrees with the ComputeMinTimeDegs algorithm instead of ComputeMinMultDegs. Note that, for a given precision parameter α, the ReLU function algorithm uses the same max function factor ζ as the max function algorithm.
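The max, min, and ReLU constructions can be checked on plaintext against conditions (2) and (4). As in any such toy, the sign approximation p below is a swapped-in stand-in, the fixed cubic (3x − x³)/2 composed 20 times, not the minimax composite polynomial with degrees from ComputeMinTimeDegs. The precision α = 10 and the grids are our own test choices.

```python
def f1(x: float) -> float:
    """Component polynomial (3x - x^3)/2: a simple stand-in, NOT a minimax polynomial."""
    return (3 * x - x ** 3) / 2


def approx_sign(x: float, k: int = 20) -> float:
    """p(x): f1 composed k times, approximating sgn(x) on [-1, 1]."""
    for _ in range(k):
        x = f1(x)
    return x


def approx_max(a, b, k=20):
    """p_bar(a, b) = ((a + b) + (a - b) * p(a - b)) / 2."""
    return ((a + b) + (a - b) * approx_sign(a - b, k)) / 2


def approx_min(a, b, k=20):
    """p_underline(a, b) = a + b - p_bar(a, b)."""
    return a + b - approx_max(a, b, k)


def approx_relu(x, k=20):
    """r(x) = (x + x * p(x)) / 2."""
    return (x + x * approx_sign(x, k)) / 2


alpha = 10
grid = [i / 64 for i in range(65)]
worst_max = max(abs(approx_max(a, b) - max(a, b)) for a in grid for b in grid)
worst_relu = max(abs(approx_relu(j / 4096) - max(j / 4096, 0.0)) for j in range(-4096, 4097))
print(worst_max <= 2 ** -alpha, worst_relu <= 2 ** -alpha)
```

Unlike comparison, these conditions hold even for inputs near a = b or x = 0. There the error of p is multiplied by the small factor |a − b| or |x|, which is why no input precision ϵ is needed.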
As explained in Section III, the proposed AppIME algorithm makes it possible to perform ComputehG for $d_{max} = 63$, which yields better sets of degrees of component polynomials from ComputeMinMultDegs. Table 4 presents the optimal sets of degrees $M_{degs}$ for the max/ReLU functions and the corresponding minimum depth consumption $D_{min}$ for $d_{max} = 31$ and $d_{max} = 63$. Table 4 shows that the depth consumption is reduced by one when α is 16, 17, or 18, enabling one more non-scalar multiplication per homomorphic max or ReLU function.

VI. NUMERICAL RESULTS
This section presents numerical results for the proposed OptMinimaxComp, OptMinimaxMax, and OptMinimaxReLU algorithms (Algorithms 10, 11, and 13, respectively). The performance of OptMinimaxComp, OptMinimaxMax, and OptMinimaxReLU using the proposed ComputeMinTimeDegs algorithm is evaluated and compared with that of MinimaxComp, MinimaxMax, and MinimaxReLU using the previous ComputeMinMultDegs algorithm. The numerical analyses are conducted using the representative RNS-CKKS scheme library SEAL [12] on an Intel Core i7-10700 CPU at 2.90 GHz in a single thread with an Ubuntu 20.04 LTS distribution.

A. PARAMETER SETTING
The precision parameters ϵ and α are the input and output precisions of the homomorphic comparison algorithm MinimaxComp or OptMinimaxComp in the RNS-CKKS scheme. We set $\epsilon = 2^{-\alpha}$, so the input and output precisions are the same. In contrast, the homomorphic max function algorithms MinimaxMax and OptMinimaxMax and the homomorphic ReLU function algorithms MinimaxReLU and OptMinimaxReLU use only the precision parameter α. We set $N = 2^{16}$. Each of MinimaxComp, MinimaxMax, MinimaxReLU, OptMinimaxComp, OptMinimaxMax, and OptMinimaxReLU is performed simultaneously on N/2 tuples of real numbers, and the amortized running time is obtained by dividing the running time by N/2.

1) Scaling Values and Margins
We use the scaled Chebyshev polynomials $\tilde{T}_i(t) = T_i(t/w)$ for a scaling value w > 1 as basis polynomials. The scaled Chebyshev polynomials can be computed using the recursion
$$\tilde{T}_0(t) = 1, \qquad \tilde{T}_1(t) = \frac{t}{w}, \qquad \tilde{T}_{i+1}(t) = \frac{2t}{w}\,\tilde{T}_i(t) - \tilde{T}_{i-1}(t).$$
The scaling values w and margins η are obtained experimentally. The scaling values and margins used in our numerical analyses of the homomorphic comparison operation and the homomorphic max/ReLU functions are shown in Tables 5 and 6, respectively.
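The scaled Chebyshev basis follows directly from the standard three-term recurrence $T_{i+1}(y) = 2y\,T_i(y) - T_{i-1}(y)$ with $y = t/w$. The Python sketch below evaluates it on plaintext and cross-checks it against the identity $T_i(\cos\theta) = \cos(i\theta)$. The value w = 1.5 is arbitrary and not taken from Tables 5 or 6.

```python
import math


def scaled_chebyshev(t: float, w: float, n: int):
    """[T~_0(t), ..., T~_n(t)] with T~_i(t) = T_i(t / w), via
    T~_0 = 1, T~_1 = t / w, T~_{i+1} = 2 (t / w) T~_i - T~_{i-1}."""
    y = t / w
    basis = [1.0, y]
    for _ in range(2, n + 1):
        basis.append(2 * y * basis[-1] - basis[-2])
    return basis[: n + 1]


vals = scaled_chebyshev(0.7, 1.5, 5)
ref = [math.cos(i * math.acos(0.7 / 1.5)) for i in range(6)]  # T_i(cos x) = cos(i x)
print(max(abs(v - r) for v, r in zip(vals, ref)))             # tiny (rounding error only)
```

For |t| ≤ w, every basis value stays in [−1, 1], because |T_i(y)| ≤ 1 whenever |y| ≤ 1.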

2) Scaling Factor
If the output of the homomorphic comparison operation or the homomorphic max function for an input tuple of two real numbers a and b does not satisfy the comparison operation error condition in (1) or the max function error condition in (2), respectively, the evaluation is said to fail. Likewise, if the output of the homomorphic ReLU function for an input real number x does not satisfy the ReLU function error condition in (4), it is said to fail. For each α, the homomorphic comparison operation, max function, or ReLU function is performed on $2^{15}$ inputs, and the number of failures is counted. The failure rate is then the number of failures divided by the total number of inputs, $2^{15}$. We set the scaling factor large enough that the homomorphic comparison operation, max function, or ReLU function does not fail in any slot, and in this case the failure rate is said to be less than $2^{-15}$. We set the scaling factor $\Delta = 2^{50}$ in all our numerical analyses, and the number of failures is zero in all of the numerical results.
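The failure-rate bookkeeping described above can be sketched as follows. A slot is counted as failing when its error exceeds $2^{-\alpha}$. This is a simplified, assumed stand-in for the error conditions (1), (2), and (4), which are not reproduced here. The function name and the sample errors are hypothetical.

```python
def failure_report(errors, alpha):
    """Count failing slots and return (failures, failure-rate string).

    errors: per-slot errors |output - exact value| over all tested inputs.
    Assumption: a slot fails when its error exceeds 2^(-alpha); this simplifies
    the paper's error conditions (1), (2), and (4).
    """
    tol = 2.0 ** (-alpha)
    total = len(errors)
    failures = sum(1 for e in errors if e > tol)
    if failures == 0:
        # Convention used in the text: zero failures over `total` inputs is
        # reported as a rate below 1 / total (that is, 2^-15 when total = 2^15).
        return failures, f"< 1/{total}"
    return failures, f"{failures / total:.3g}"


# Hypothetical run: 2^15 slots whose errors all stay below 2^-20.
print(failure_report([2.0 ** -25] * 2 ** 15, alpha=20))  # (0, '< 1/32768')
```

Zero observed failures supports only an upper bound tied to the sample size, which is why the text reports "less than $2^{-15}$" rather than a failure rate of zero.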

3) Bases with Prime Numbers
Bases with prime numbers $B = \{p_0, p_1, \dots, p_{k-1}\}$ and $C = \{q_0, q_1, \dots, q_L\}$ should be selected. We set $k = 1$ and $p_0 \approx 2^{60}$. In the numerical analysis of the homomorphic comparison operation that consumes depth $D$, we set the maximum level $L = D$. We set $q_0 \approx 2^{60}$ and $q_j \approx \Delta$ for $1 \le j \le L$.
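The prime bases above determine the bit sizes of the modulus chain. The following sketch lists those bit sizes for a given depth, with the single special prime $p_0$ placed last. This follows the usual SEAL convention, in which such a list would be passed to `CoeffModulus::Create` and the last prime acts as the special prime. We have not confirmed the exact call used in the paper, and the depth $D = 20$ in the example is hypothetical rather than a value from Table 7.

```python
def modulus_bit_sizes(depth, log_delta=50, log_q0=60, log_p0=60):
    """Bit sizes for C = {q_0, ..., q_L} followed by the single special prime p_0.

    L = depth, q_0 ~ 2^60, q_j ~ Delta = 2^50 for 1 <= j <= L, and k = 1 so B = {p_0}.
    The special prime is listed last (assumed SEAL-style ordering).
    """
    return [log_q0] + [log_delta] * depth + [log_p0]


# Hypothetical depth consumption D = 20.
sizes = modulus_bit_sizes(20)
print(len(sizes), sum(sizes))  # 22 primes, 1120 bits in total
```

The total bit count grows by $\log_2 \Delta = 50$ bits per level. This is why saving one depth, as reported for several α, also shrinks the modulus needed for the computation.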

B. PERFORMANCE OF THE PROPOSED HOMOMORPHIC COMPARISON ALGORITHM
The previous homomorphic comparison operation uses the set of degrees $M_{\text{degs}}$ obtained from ComputeMinMultDegs for $d_{\max} = 31$. In contrast, the proposed homomorphic comparison operation obtains $M_{\text{degs}}$ for $d_{\max} = 63$ from ComputeMinTimeDegs. The depth consumption $D$ must satisfy $D \ge M_{\text{dep}}$, where $M_{\text{dep}}$ is the minimum depth consumption obtained from the ComputeMinDep algorithm. Table 7 shows the sets of degrees used and the running times (amortized running times) of the previous homomorphic comparison algorithm MinimaxComp and the proposed algorithm OptMinimaxComp. The proposed homomorphic comparison algorithm reduces the running time by 6% on average compared with the previous algorithm.
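The "on average" figure can be computed in more than one way. The following sketch assumes that it is the arithmetic mean of the per-row relative reductions $(t_{\text{prev}} - t_{\text{new}})/t_{\text{prev}}$. Comparing total running times would be another valid choice. The running times in the example are hypothetical, not values from Table 7.

```python
def average_reduction(prev_times, new_times):
    """Arithmetic mean of per-row relative reductions (t_prev - t_new) / t_prev.

    Assumption: this is how the average reduction is aggregated.
    """
    if len(prev_times) != len(new_times) or not prev_times:
        raise ValueError("need two equally long, non-empty lists of running times")
    ratios = [(p - n) / p for p, n in zip(prev_times, new_times)]
    return sum(ratios) / len(ratios)


# Hypothetical running times in seconds.
print(f"{average_reduction([10.0, 20.0], [9.0, 19.0]):.1%}")  # 7.5%
```

With a mean of ratios, each row of the table carries equal weight regardless of its absolute running time.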
Increasing the depth consumption $D$ sometimes increases the running time. In that case, there is no reason to use a depth consumption larger than $D$, so Table 7 omits such cases. Table 7 also omits cases in which the previous and proposed algorithms use the same set of degrees $M_{\text{degs}}$.
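One way to read the rule for excluding depths is as a dominance filter: a depth $D \ge M_{\text{dep}}$ is worth reporting only if it runs faster than every smaller depth. The following sketch implements that interpretation. The function name and the timings are hypothetical.

```python
def useful_depths(times_by_depth):
    """Depths D >= M_dep worth reporting: each must run faster than every smaller D.

    times_by_depth maps a depth consumption D to its measured running time.
    """
    kept, best = [], float("inf")
    for depth in sorted(times_by_depth):
        if times_by_depth[depth] < best:  # strictly faster than all shallower settings
            kept.append(depth)
            best = times_by_depth[depth]
    return kept


# Hypothetical timings in seconds for D = 12, 13, 14.
print(useful_depths({12: 5.0, 13: 4.6, 14: 4.8}))  # [12, 13]
```

Here $D = 14$ is dropped because it is slower than $D = 13$ while consuming more depth, so it offers no trade-off worth listing.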

C. PERFORMANCE OF THE PROPOSED HOMOMORPHIC MAX/RELU FUNCTION ALGORITHM
As in the numerical analysis of the homomorphic comparison operation, the proposed homomorphic max and ReLU function algorithms obtain $M_{\text{degs}}$ from ComputeMinTimeDegs for $d_{\max} = 63$. Table 8 shows the sets of degrees used and the running times (amortized running times) of the previous homomorphic max and ReLU function algorithms MinimaxMax and MinimaxReLU and the proposed algorithms OptMinimaxMax and OptMinimaxReLU. The proposed homomorphic max and ReLU function algorithms reduce the running time by 7% and 6% on average, respectively, compared with the previous homomorphic max and ReLU function algorithms. As with Table 7, Table 8 omits cases in which a larger depth increases the running time or in which the previous and proposed algorithms use the same set of degrees $M_{\text{degs}}$.

VII. CONCLUSION
We implemented the optimized homomorphic comparison, max function, and ReLU function algorithms on the RNS-CKKS scheme using a composition of minimax approximate polynomials for the first time. We successfully implemented these algorithms on the RNS-CKKS scheme with a low failure rate ($< 2^{-15}$) and provided parameter sets for each value of the precision parameter α. In addition, we proposed a fast algorithm for the inverse minimax approximation error, a subroutine required to find the optimal set of degrees. This algorithm allowed us to find the optimal set of degrees for a higher maximum degree than in the previous study. Finally, we proposed a method that uses this fast algorithm to find the set of degrees optimized for the RNS-CKKS scheme. We reduced the depth consumption of the homomorphic comparison operation (resp. max/ReLU functions) by one when α is 9 or 14 (resp. when α is 16, 17, or 18). Furthermore, the numerical analysis demonstrated that the proposed homomorphic comparison, max function, and ReLU function algorithms reduced the running time by 6%, 7%, and 6% on average, respectively, compared with the previous algorithms.