Analysis of Modified Shell Sort for Fully Homomorphic Encryption

The Shell sort algorithm is one of the most practically effective in-place sorting algorithms. However, it is difficult to execute this algorithm with its intended running time complexity on data encrypted using fully homomorphic encryption (FHE), because the insertion sort in Shell sort has to be performed by considering the worst-case input data. In this paper, in order for the sorting algorithm to be used on the FHE data, we modify the Shell sort with an additional parameter $\alpha$, allowing exponentially small sorting failure probability. For a gap sequence of powers of two, the modified Shell sort with input array length $n$ is found to have the trade-off between the running time complexity of $O(n^{3/2}\sqrt{\alpha + \log\log n})$ and the sorting failure probability of $2^{-\alpha}$. Its running time complexity is close to the intended running time complexity of $O(n^{3/2})$ and the sorting failure probability can be made very low with slightly increased running time. Further, the near-optimal window length of the modified Shell sort is also derived via convex optimization. The proposed analysis of the modified Shell sort is numerically confirmed by using randomly generated arrays. For the practical aspect, our modification can be applied to any gap sequence, and we show that Ciura's gap sequence, which is known to have good practical performance, is also practically effective when our modified Shell sort is applied.
We compare our modified Shell sort with other sorting algorithms using the FHE over the torus (TFHE) library, and show that the modified Shell sort has the best running time among in-place sorting algorithms on a homomorphic encryption scheme.


I. INTRODUCTION
Fully homomorphic encryption (FHE) is an encryption scheme that provides encrypted data with an evaluation algorithm, which enables addition or multiplication of plaintexts without decryption [1], [2]. The FHE enables specific operations to be performed on encrypted information without leaking any clue about the plaintext. The notion of the FHE was suggested by Rivest et al. [1]. Although several cryptography researchers attempted to construct an FHE scheme because of its usefulness for operations in cloud systems, no one succeeded until 2009, when Gentry developed an FHE scheme using ideal lattices [2]. Several researchers then suggested a series of different FHE algorithms using the bootstrapping technique in Gentry's scheme and optimized the FHE schemes [3]-[7]. Recently, FHE schemes have been significantly improved with regard to various performance criteria [8]-[12], which makes them practically applicable. Further, efficient implementations of the FHE schemes are being actively proposed [13]-[17].
The associate editor coordinating the review of this manuscript and approving it for publication was Cong Pu.
Further, the algorithms used on the FHE data are expected to demonstrate the oblivious property, i.e., providing the most appropriate outputs without knowing any information about the input. In other words, the behavior of an oblivious algorithm does not depend on the input data. If it depends on the input, it implies the leakage of input information. The oblivious property of an algorithm is essential for the FHE schemes to ensure privacy.
When processing large amounts of ciphertexts in cloud systems, it is frequently required to process sorted data rather than unsorted data. Thus, one of the most essential operations on the FHE data is sorting, which is used as a subroutine of many algorithms. However, most sorting algorithms are not suitable for the FHE data. For example, because the quick sort algorithm, one of the most popularly used sorting algorithms, is not oblivious, it cannot be used on the FHE data. Although numerous studies have been conducted to render the quick sort algorithm oblivious, its running time complexity then becomes $O(n^2)$, where $n$ is the input array length. Its actual running time is even longer than that of the bubble sort, which is considered to have the longest running time among all the known sorting algorithms. Therefore, modifying conventional sorting algorithms to make them suitable for the FHE data is necessary. Several studies have been conducted for this purpose [18]-[20].

A. MOTIVATION FOR SHELL SORT
Since the oblivious sorting algorithm can be applied for encrypted data with the FHE, Emmadi et al. [20] compared several oblivious algorithms for sorting the FHE data. We can divide sorting algorithms for the FHE data into two classes of oblivious sorting algorithms: in-place algorithm and recursive algorithm. The bubble sort and insertion sort are basic in-place oblivious algorithms, and the bitonic sort and odd-even merge sort are recursive oblivious algorithms. The recursive oblivious algorithms are much better than the in-place oblivious algorithms in the aspect of both the asymptotic performance and practical performance.
However, the recursive algorithms may be inefficient in some cases. Since the recursive algorithms involve many function calls and the ciphertext array occupies a large amount of memory, the total amount of data transmitted over the memory bus can be large. When the bandwidth of the memory bus is restricted, this transmission time can become a bottleneck for sorting encrypted data. This situation can occur in lightweight IoT devices, whose memory or bandwidth is limited. For this reason, it is desirable to devise an efficient in-place sorting algorithm for the FHE data. The Shell sort [21], [22], which is one of the oldest sorting algorithms, is a generalized version of the insertion sort. The Shell sort algorithm is an in-place algorithm, which is fast and easy to implement, and thus, many systems use it as a sorting algorithm.

B. MAIN RESEARCH PROBLEM AND PREVIOUS WORKS
It is known that Shell sort uses insertion sort as a subroutine algorithm, and insertion sort can be performed on the FHE data [18], [23]. However, the Shell sort should be modified to be used in the FHE setting. If we do not allow any error in sorting, then insertion sort is expected to be quite conservative, i.e., the number of operations for sorting must be set for the worst case, because the insertion sort algorithm in the FHE setting is an oblivious algorithm. Thus, if we use insertion sort in the Shell sort, the running time complexity of Shell sort in the FHE setting must be $O(n^2)$, which makes the use of Shell sort ineffective. Therefore, it is important to devise a sorting algorithm that is better than the Shell sort on the FHE data in terms of running time complexity.
Goodrich [24] suggested an asymptotically optimal randomized oblivious Shell sort. He proved that its running time complexity is $O(n \log n)$ and its sorting failure probability (SFP) is $O(1/n^b)$ for some constant $b \ge 1$, where $n$ is the length of an array. While it is efficient in the asymptotic sense, there are two points to be considered. First, the analytically derived SFP is an inverse polynomial in the length of the array. When we sort an array of small length with this algorithm, the resulting SFP may not be practically acceptable. Further, an inverse polynomial SFP is not considered a small probability in the asymptotic sense. Since users are often conservative with the SFP in sorting, an exponentially decaying SFP is more desirable. Second, it can be inefficient for arrays of small length. To lower the SFP, many additional processes are required in the randomized Shell sort, which causes a rather large overhead for an array of small length. Considering that the running time of the homomorphic operations in the FHE is quite large, sorting a large number of encrypted data is not yet practical. Thus, it is desirable to devise an oblivious variant of Shell sort which is practically efficient and has a truly negligible SFP that is independent of the array length.
On the other hand, it is known [23] that we can reduce the running time of insertion sort on the FHE data by allowing a very small sorting failure probability using what is known as the window technique. According to this technique, in each insertion sort, instead of inserting the $i$th element into the subarray $(a_1, a_2, \cdots, a_{i-1})$, we insert the $i$th element into the subarray $(a_{i-k}, a_{i-k+1}, \cdots, a_{i-1})$ of length $k$, called the window length, immediately to the left of the $i$th element. We call this subarray a ''window'' with window length $k$. This technique is used to reduce the number of bootstrapping operations in [23]. Since the insertion sort is the subroutine algorithm of the Shell sort, the window technique is well suited to the Shell sort. However, an effective application of the window technique to the Shell sort for homomorphic encryption has not been proposed.
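The window technique described above can be sketched on plaintext data as follows. This is a minimal illustration, not the implementation of [23]: the names `compare_swap` and `windowed_insertion_sort` are hypothetical, and `min`/`max` on integers stand in for the homomorphic minimum and maximum. The point of the sketch is that the schedule of compare-and-swap operations depends only on the array length and the window length `k`, never on the data.

```python
def compare_swap(arr, left, right):
    """Place the smaller value at index left and the larger at index right.
    In an FHE setting this would be a homomorphic min/max on ciphertexts."""
    a, b = arr[left], arr[right]
    arr[left], arr[right] = min(a, b), max(a, b)


def windowed_insertion_sort(arr, k):
    """Oblivious insertion sort restricted to a window of length k.
    Returns the resulting array and the number of compare-swap operations."""
    out = list(arr)
    ops = 0
    for i in range(1, len(out)):
        # Only the k elements immediately to the left of the pivot are visited,
        # from left to right, regardless of their values.
        for j in range(max(0, i - k), i):
            compare_swap(out, j, i)
            ops += 1
    return out, ops


print(windowed_insertion_sort([2, 1, 4, 3, 6, 5], 1))  # window 1 suffices here
print(windowed_insertion_sort([2, 3, 1], 1))           # window 1 is too short
print(windowed_insertion_sort([2, 3, 1], 2))           # window 2 sorts correctly
```

The second call shows a sorting failure: the element 1 needs to move two positions to the left, which a window of length 1 cannot accommodate.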

C. MAIN CONTRIBUTION
In this paper, we devise a method to modify the Shell sort in the FHE setting using the window technique, which is proved to be effective in the theoretical aspect and the practical aspect. It is referred to as a ''modified Shell sort''. The window technique in [18] is applied to each subroutine insertion sort in our modified Shell sort for FHE setting. We note that the role of the window technique in our algorithm is different from the original use of the window technique. Our algorithm does not reduce the bootstrapping itself compared to the number of the homomorphic gates, but we reduce the number of the comparison operations with the window technique. For this reason, the homomorphic comparison operation in our sorting algorithm does not generate a comparison error.
From the theoretical viewpoint, the running time complexity of the modified Shell sort is $O(n^{3/2}\sqrt{\alpha + \log\log n})$ with SFP $2^{-\alpha}$ when the gap sequence is powers of two, which is close to the average-case time complexity $O(n^{3/2})$ of the original Shell sort. The value of $\alpha$ is our additional parameter that controls the trade-off between the running time and the SFP. This trade-off is quite effective because the SFP decreases exponentially with $\alpha$ while the running time is proportional only to $\sqrt{\alpha}$. To this end, we use the exact distribution of the window lengths required for successful sorting of the subarrays in each gap of the Shell sort. If the length of the subarray for the insertion sort in some gap is $s$, the average of the required window length for successful sorting turns out to be proportional to $\sqrt{s}$, and the right tail of its probability distribution is very thin. In the sorting process, the window length is set to a constant multiple of $\sqrt{s}$, which ensures a negligible SFP. If the window length is close to $\beta\sqrt{s}$, the SFP decays as $e^{-\beta^2}$, which is a very fast-decaying function. Therefore, with a fixed negligible SFP, we can set a small window length so that the running time is asymptotically faster than that of the naive version of the Shell sort on the FHE data.
For the practical view, the running time of the modified Shell sort is effectively reduced even in the small arrays, compared to the basic in-place sorting algorithms on the FHE data, bubble sort, and insertion sort.
In this paper, we address only the gap sequence of powers of two in the analysis of the modified Shell sort, i.e., $2^h$, $h = 1, 2, 3, \cdots$. Although this gap sequence is not optimal in terms of running time complexity, we first analyze the running time complexity of the modified Shell sort on the FHE data, which is important for the FHE in cloud systems. The performance of the modified Shell sort is numerically compared with the cases of near-optimal window lengths obtained through convex optimization and Ciura's optimal gap sequence [25], which was evaluated numerically as an optimal gap sequence in the non-FHE settings. Although we do not analyze this case, the method of deriving the near-optimal window length in the modified Shell sort functions well for Ciura's optimal gap sequence.
We also suggest a convex optimization method to derive a tighter window length. In other words, the window length obtained by the convex optimization method makes the running time of the modified Shell sort shorter than with the window length obtained by the analytical method. The running time of the proposed modified Shell sort is compared with that of the conventional algorithms with the TFHE scheme, and it is shown that this modified Shell sort has the best performance in running time among in-place sorting algorithms on the TFHE scheme.
Thus, our contributions are summarized as follows:
• We propose a modified Shell sort with an additional parameter $\alpha$ on the FHE data, and derive its theoretical trade-off between the running time complexity $O(n^{3/2}\sqrt{\alpha + \log\log n})$ and the SFP $2^{-\alpha}$ when the gap sequence is the powers-of-two sequence.
• The near-optimal window length of each gap in the modified Shell sort is derived via the convex optimization technique.
• The numerical simulation with the TFHE homomorphic encryption scheme is performed, and the modified Shell sort with Ciura's gap sequence is shown to have the best running time performance in the practical situation among the in-place sorting algorithms for the FHE setting.

D. OUTLINE
The remainder of this paper is organized as follows. Section II presents the preliminary of the paper, which includes the related sorting algorithms and the notion of the FHE.
In Section III, we present the distribution of the required window length for each gap in the Shell sort on the FHE data with the gap sequence of powers of two. Then, we propose a modified Shell sort for the FHE and derive the trade-off between the running time complexity and the SFP. Section IV discusses a method to deduce the near-optimal window length of each gap of the modified Shell sort using the convex optimization technique. Section V shows numerical results that support the proposed analysis in the TFHE setting. From these results, the performance in the case of the optimal gap sequence or the near-optimal window lengths is compared with that of conventional algorithms using the TFHE library. Section VI concludes the study and discusses the scope for future research.

II. PRELIMINARIES

A. NOTATION
$\lceil x \rceil$ means the least integer which is larger than or equal to $x$. $\binom{m}{n}$ means the binomial coefficient, which is given by $\frac{m!}{n!(m-n)!}$. $\Pr[A]$ means the probability of an event $A$, and $\Pr[A|B]$ means the conditional probability of an event $A$ given $B$. $g(n) = O(f(n))$ means that there is a positive real number $C$ and a real number $n_0$ such that $|g(n)| \le C f(n)$ for all $n \ge n_0$. Plaintext data are denoted by normal math italic letters like $A$, and ciphertext data are denoted by typewriter letters like $\mathtt{A}$.

B. FULLY HOMOMORPHIC ENCRYPTION
The FHE is a public-key encryption scheme that supports an arbitrary number of additions and multiplications of plaintexts without decryption, so that anyone without the decryption key can evaluate a circuit on any ciphertext without leaking information about its plaintext.
Gentry suggested the bootstrapping technique to transform a somewhat homomorphic encryption scheme, which allows only a limited number of operations on the encrypted data, into a fully homomorphic encryption scheme [2]. The bootstrapping operation, which has enabled several researchers to construct the FHE schemes [2], [26], evaluates the decryption circuit on encrypted data using the evaluation algorithm, that is, the addition and multiplication algorithms in the FHE setting. All of the FHE schemes suggested thus far ensure security by adding the plaintext to an LWE sample or a ring-LWE sample, which are regarded as pseudorandom samples. For security reasons, the LWE sample or the ring-LWE sample includes some error. As the addition and multiplication operations are repeated, the accumulated error grows, and if it exceeds a certain limit, a decryption failure occurs. Thus, the error needs to be reduced after a certain number of operations on the encrypted data, so that the ciphertexts can be further evaluated. The purpose of the bootstrapping operation is to reset the error in the ciphertext before it becomes too large for correct decryption.
As bootstrapping utilizes a considerable amount of computation during the processing of the FHE, the number of bootstrapping operations significantly affects the total operation time of the FHE. In fact, the number of bootstrapping operations depends on the multiplicative depth of the circuit. The lower the depth of a circuit, the fewer the number of bootstrapping operations. Thus, it is crucial to consider the number of the bootstrapping operations for each element, when bootstrapping is implemented in the FHE schemes. If the total number of operations in an algorithm is fixed, it is better to evenly distribute the operations on the inputs. Furthermore, to stably address errors, deterministic algorithms are better than randomized algorithms. This is because we can predict the error size of each element in deterministic algorithms ensuring that these errors are handled easily and error control is optimized adequately.

C. TFHE HOMOMORPHIC ENCRYPTION
The TFHE homomorphic encryption scheme [27] is currently the most practical bit-wise homomorphic encryption scheme. There are two types of the TFHE scheme: the leveled homomorphic encryption and the fully homomorphic encryption. Since we use the fully homomorphic encryption version of the TFHE scheme, we deal only with that version in this subsection. Its basic elements are the bootstrapped homomorphic gates, each of which performs a gate followed by the bootstrapping. Although the noise in the ciphertext grows when we perform a homomorphic gate without the bootstrapping, the bootstrapping refreshes the noise to a level independent of the input noise. Hence, Boolean circuits of any depth can be evaluated without noise growth in the ciphertext using the TFHE scheme.
The secret key $s$ is a vector of length $n$ sampled uniformly from $\{0, 1\}^n$, and the ciphertext is formed by $(a, b) \in \mathbb{T}^n \times \mathbb{T}$, where $b = a \cdot s + e + \mu$ and $\mu \in \{-\frac{1}{8}, \frac{1}{8}\}$ is encoded by $\mu = \frac{1}{4}(b - \frac{1}{2})$ with the message bit $b \in \{0, 1\}$ (in the encoding formula, $b$ denotes the message bit, not the ciphertext component). The bootstrapping procedure makes the encoded message $\frac{1}{8}$ when the input encoded message is in $(0, \frac{1}{2})$ and makes the encoded message $-\frac{1}{8}$ when the input encoded message is in $(\frac{1}{2}, 1)$. Before the bootstrapping for each bootstrapped homomorphic gate, a matching linear operation is applied so that the encoded message is in $(0, \frac{1}{2})$ when the output bit is 1 and is in $(\frac{1}{2}, 1)$ when the output bit is 0. The linear operations can easily be performed homomorphically since the LWE ciphertext has the linear property. For example, the homomorphic NAND gate computes $\frac{1}{8} - a - b$ homomorphically before the bootstrapping, where $a, b$ are the encoded messages of the two input ciphertexts. All Boolean gates can be designed by this method, and thus we can compose any Boolean circuit with these bootstrapped homomorphic gates. The linear operation for each homomorphic gate and the detailed bootstrapping procedure can be found in [27].
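The encoding and the NAND example above can be traced on noise-free plaintext values. The following sketch is only a toy model under stated assumptions: it works directly on the encoded messages on the torus (represented as exact fractions modulo 1), ignores the LWE mask and the error $e$, and models the bootstrapping as the ideal sign-like map described in the text. The function names are hypothetical and are not part of the TFHE library.

```python
from fractions import Fraction

EIGHTH = Fraction(1, 8)


def encode(bit):
    """Encode a message bit as mu = (1/4) * (bit - 1/2), i.e. -1/8 or +1/8."""
    return Fraction(1, 4) * (bit - Fraction(1, 2))


def bootstrap(phase):
    """Idealized, noise-free bootstrapping: values in (0, 1/2) on the torus
    map to +1/8, and all other values map to -1/8."""
    t = phase % 1  # reduce to the torus [0, 1)
    return EIGHTH if 0 < t < Fraction(1, 2) else -EIGHTH


def decode(mu):
    """Recover the message bit from a freshly bootstrapped encoding."""
    return 1 if mu == EIGHTH else 0


def nand(mu_a, mu_b):
    """Linear step 1/8 - a - b followed by the idealized bootstrapping."""
    return bootstrap(EIGHTH - mu_a - mu_b)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, decode(nand(encode(x), encode(y))))
```

For two inputs equal to 1, the linear step gives $\frac{1}{8} - \frac{1}{4} = -\frac{1}{8}$, which is $\frac{7}{8}$ on the torus and therefore decodes to 0; the other three input pairs land in $(0, \frac{1}{2})$ and decode to 1.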

D. SORTING ALGORITHMS
Although there exist several sorting algorithms [28], we consider only the insertion sort and Shell sort in this paper. These are comparison-based sorting algorithms, which do not rely on the divide-and-conquer method.
The insertion sort is an iterative sorting algorithm that sorts from the leftmost element. In each iteration, we define an element to be sorted into its left-side subarray as the pivot element. It is assumed that the elements to the left of the pivot element are already sorted. We then compare the already sorted elements with the pivot element, deduce its proper position, and insert it into this position. Its worst-case and average-case running time complexities are both $O(n^2)$. It is known that the insertion sort is slightly faster than the bubble sort in practical cases.
The operations in the conventional insertion sort require knowledge of its input, and this is not allowed in the case of the FHE data. Therefore, we cannot determine the correct position of a pivot element in the already sorted subarray in the FHE setting, and thus, the operation and behavior of the insertion sort need to be modified. It is known [18] that we can perform an insertion sort on the FHE data by sequentially swapping the pivot element with the elements in the already sorted subarray to its left, from left to right. In fact, the FHE version of the insertion sort has already been proposed, and its performance has been assessed numerically in the previous works [18], [20]. This operation, however, is inefficient, as the number of operations is always the same as that in the worst case, that is, its average-case running time complexity is also $O(n^2)$.
The Shell sort is a generalized version of the insertion sort [21]. It requires a gap sequence, which is a decreasing sequence of positive integers ending with 1. For each gap $h$ and each integer $j$, $0 \le j \le h - 1$, the $(hi + j)$-th elements, $i = 0, 1, 2, \cdots$, are sorted using insertion sort. As the gap sequence ends with 1, we can finally obtain the correctly sorted array. Algorithm 1 shows the specific algorithm for classical Shell sort.
Even though the running time complexity of the Shell sort varies depending on the gap sequences [22], it is asymptotically better than that of insertion sort. To the best of our knowledge, a trial of the Shell sort on the FHE data has not been performed thus far.
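For reference, the classical Shell sort described above can be sketched as follows. This is the standard textbook formulation on plaintext data, not a reproduction of Algorithm 1; the helper `powers_of_two_gaps` is a hypothetical name that generates the powers of two smaller than the array length in decreasing order. Note that the inner `while` loop stops as soon as the pivot's position is found, so the control flow depends on the data and the routine is not oblivious.

```python
def powers_of_two_gaps(n):
    """Return all powers of two smaller than n, in decreasing order."""
    gaps = []
    g = 1
    while g < n:
        gaps.append(g)
        g *= 2
    return gaps[::-1]


def shell_sort(arr, gaps):
    """Classical (data-dependent) Shell sort with the given gap sequence."""
    out = list(arr)
    for h in gaps:
        # Gapped insertion sort: each chain i, i-h, i-2h, ... is kept sorted.
        for i in range(h, len(out)):
            pivot = out[i]
            j = i
            while j >= h and out[j - h] > pivot:
                out[j] = out[j - h]  # shift the larger element to the right
                j -= h
            out[j] = pivot
    return out


data = [5, 2, 9, 1, 7, 3, 8, 6, 4, 0]
print(powers_of_two_gaps(len(data)))
print(shell_sort(data, powers_of_two_gaps(len(data))))
```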

E. COMPARISON OPERATION IN FHE
In the sorting algorithms in the FHE setting, the swap operation is performed by comparing two encrypted elements. Although it is not possible to determine the larger element in the FHE setting, it has been established that computing the maximum and minimum elements out of the two elements is possible in the FHE setting, even though these elements are encrypted.
Bit-wise encrypted numbers can be compared by homomorphically computing the maximum or the minimum using Boolean circuits. Algorithm 2 shows the algorithm sorting two encrypted numbers with the HomXNOR gate and the HomMUX gate, which can be supported by any bit-wise FHE scheme. HomXNOR and HomMUX are the bootstrapped homomorphic gates for the XNOR gate and the MUX gate, respectively, where $\mathrm{Dec}(\mathrm{HomXNOR}(a, b)) = \overline{\mathrm{Dec}(a) \oplus \mathrm{Dec}(b)}$ and $\mathrm{Dec}(\mathrm{HomMUX}(a, b, c)) = \mathrm{Dec}(a)\ ?\ \mathrm{Dec}(b) : \mathrm{Dec}(c)$.
Algorithm 2 (sorting of two encrypted numbers). Input: encrypted numbers X and Y with $m$ bits. Output: X and Y are sorted in increasing order.
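One common way to sort two bit-wise encrypted numbers with only XNOR and MUX gates is sketched below on plaintext bits. It is an assumption-laden illustration rather than a reproduction of Algorithm 2: the functions `xnor`, `mux`, and `sort_two` are hypothetical plaintext stand-ins for HomXNOR, HomMUX, and the two-element sort, and the numbers are given as little-endian bit lists of equal length $m$.

```python
def xnor(a, b):
    """Plaintext stand-in for HomXNOR: 1 when the two bits are equal."""
    return 1 - (a ^ b)


def mux(sel, if_one, if_zero):
    """Plaintext stand-in for HomMUX: sel ? if_one : if_zero, branch-free."""
    return sel * if_one + (1 - sel) * if_zero


def to_bits(value, m):
    """Little-endian list of the m low-order bits of value."""
    return [(value >> i) & 1 for i in range(m)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def sort_two(x_bits, y_bits):
    """Return (min, max) of two m-bit numbers as bit lists."""
    gt = 0  # becomes 1 exactly when x > y
    for x, y in zip(x_bits, y_bits):  # from least to most significant bit
        # If the bits are equal, keep the verdict of the lower bits;
        # otherwise the current bit of x decides.
        gt = mux(xnor(x, y), gt, x)
    smaller = [mux(gt, y, x) for x, y in zip(x_bits, y_bits)]
    larger = [mux(gt, x, y) for x, y in zip(x_bits, y_bits)]
    return smaller, larger


lo, hi = sort_two(to_bits(5, 3), to_bits(3, 3))
print(from_bits(lo), from_bits(hi))
```

In this sketch the comparison uses $m$ XNOR and $m$ MUX evaluations, and the selection of the two outputs uses $2m$ further MUX evaluations, independent of the values being compared.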

III. ANALYSIS OF MODIFIED SHELL SORT OVER FHE
In this section, we propose a modified Shell sort using the window technique suggested in [23], and the probability distribution of the required window length for the successful sorting is also obtained when the powers-of-two gap sequence is used. Finally, the running time complexity of the modified Shell sort in each gap for the successful sorting of each subarray is determined for the FHE, considering the trade-off with the SFP when the powers-of-two gap sequence is used.

A. MODIFIED SHELL SORT OVER FHE
As insertion sort can be performed on the FHE data, the Shell sort, which uses the insertion sort as a subroutine algorithm, can also be performed on the FHE data. The Shell sort using the FHE variant of the insertion sort as a subroutine algorithm is shown in Algorithm 3. However, if the Shell sort is to be employed without any sorting failure in the FHE setting, it has to be quite conservative. In other words, as we need to consider the worst case for each gap, its running time complexity becomes $O(n^2)$, which does not provide any advantage in comparison with a simple insertion sort. This is because the FHE variant of the insertion sort cannot adapt to the intermediate state of the array. Thus, designing the Shell sort with a negligible SFP and a running time complexity close to the original average-case time complexity is necessary.
To this end, we employ the window technique [18], [23] in the Shell sort. During the insertion sort in each gap, instead of searching for the position of each element in the whole partially sorted array, we search for its position in the partially sorted subarray of a certain window length, located to the left of the pivot element, as shown in Fig. 1. Fig. 1 shows an example of the modified Shell sort using the window technique, where the gap is 4 and the window length is 2. Subarrays consisting of elements that are separated by the gap are sorted using the insertion sort. When sorting each subarray, the pivot element to be inserted is compared only with the elements of the subarray located to its left within a distance equal to the window length. We call this procedure the modified Shell sort.
The proposed modified Shell sort is described in Algorithm 4. As the minimum and maximum functions can be computed in the FHE setting without knowing the plaintexts, as discussed in Section II, none of the operations in Algorithm 4 requires any knowledge of the contents of the elements $A[i]$ of the array. Thus, Algorithm 4 can be executed in the FHE setting. In designing this algorithm, deciding the window length in each gap for successfully sorting each subarray in the Shell sort for the given SFP $2^{-\alpha}$ is not a trivial problem. Along with the design of the window length for each gap, we propose a modified Shell sort with an additional parameter $\alpha$.
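The structure of the modified Shell sort described above can be sketched on plaintext data as follows. This is a minimal sketch and not a reproduction of Algorithm 4: the function name, the `windows` argument (one window length per gap), and the operation counter are assumptions made for illustration, and `min`/`max` on integers stand in for the homomorphic minimum and maximum. The choice of the window lengths for a target SFP is the subject of the analysis and is not modeled here.

```python
def modified_shell_sort(arr, gaps, windows):
    """Shell sort whose gapped insertion sorts are oblivious and windowed.
    gaps[t] is the t-th gap and windows[t] its window length (in elements
    of the gapped subarray). Returns the array and the compare-swap count."""
    out = list(arr)
    ops = 0
    for h, k in zip(gaps, windows):
        for i in range(h, len(out)):
            # Leftmost index visited: k steps of size h to the left of the
            # pivot, clipped to the first element of the same gapped subarray.
            first = max(i - k * h, i % h)
            for j in range(first, i, h):
                lo, hi = min(out[j], out[i]), max(out[j], out[i])
                out[j], out[i] = lo, hi  # oblivious compare-and-swap
                ops += 1
    return out, ops


data = [5, 2, 9, 1, 7, 3, 8, 6, 4, 0]
gaps = [8, 4, 2, 1]

# Windows covering the whole subarray always sort correctly.
print(modified_shell_sort(data, gaps, [len(data)] * len(gaps)))

# The operation count depends only on the length, gaps, and windows.
print(modified_shell_sort(data, gaps, [1, 1, 1, 1])[1])
print(modified_shell_sort(sorted(data), gaps, [1, 1, 1, 1])[1])
```

With short windows the output is not guaranteed to be sorted; that event is exactly the sorting failure whose probability the analysis bounds.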
We prove that the running time complexity of the modified Shell sort is $O(n^{3/2}\sqrt{\alpha + \log\log n})$ with an SFP of $2^{-\alpha}$ for the powers-of-two gap sequence, which consists of all powers of two less than the length of the array. Note that the average-case time complexity of the classical Shell sort with the powers-of-two gap sequence is $O(n^{3/2})$ [29]. The parameter $\alpha$ is determined only from the SFP, regardless of the input length $n$. In fact, $\alpha$ is considerably smaller than $n$ and should be larger than or equal to $\sqrt{6}\log e - 1 \approx 2.534$, the derivation of which is provided in a subsequent section of this paper. It is noted that the proposed modified Shell sort considers the trade-off between the running time complexity and the SFP.
Before analyzing the running time complexity and the sorting failure probability of the modified Shell sort with power-of-two gap sequence, we introduce the main idea of the analysis to help readers to understand the following theorems and lemmas.
Lemmas 1 and 2 deal with useful properties of the intermediate arrays that make it easy to derive the exact number of acceptable arrays for a certain window length, which is dealt with in Lemma 3 and Theorem 4. Theorem 4 is the special case of Lemma 3 that matches our aim.
While the number of acceptable arrays is represented with binomial coefficients, the resultant running time complexity has to be represented with an analytic function. To this end, Lemmas 5 and 6 relate formulas containing binomial coefficients to exponential functions. With the help of these lemmas, Theorem 7 gives the running time complexity and the sorting failure probability of the modified Shell sort.
Remark: We assume that the modified Shell sort is performed with the bootstrapped homomorphic gate, a homomorphic Boolean gate followed by the bootstrapping. The bootstrapping always removes the noise in the ciphertext, and thus we do not have to consider the noise amplification in the ciphertext when processing any homomorphic evaluations. In addition, the ciphertext size does not grow in the fully homomorphic encryption scheme, and we also do not have to consider the amplification of the ciphertext size. Thus, we focus on the validity of the modified Shell sort itself in the following analysis. If we use the leveled homomorphic encryption instead of the FHE, the analysis for the noise growth or the ciphertext size growth will be needed, and this analysis becomes an interesting future work.

B. PROBABILITY DISTRIBUTION OF REQUIRED WINDOW LENGTH
In this subsection, we derive the probability distribution of the window length required in each gap for successfully sorting each subarray in the Shell sort. This probability distribution is essential in determining the window length of each gap in the modified Shell sort, because the properties of the tail of the probability distribution must be used to obtain the required window length.
The array of $n$ elements is denoted by its index vector $(a_1, a_2, \cdots, a_n)$, which is a permuted vector of $(1, 2, \cdots, n)$. If we handle real data, we map each datum to its respective index in $\{1, 2, \cdots, n\}$. Moreover, we assume that $n$ is an even integer. If $n$ is odd, the same analysis can be applied with an additional dummy element, larger than all other elements, inserted in the rightmost position. Several lemmas are needed for deriving the main theorem on the probability distribution of the required window length.
Before obtaining the probability distribution, the meaning of the probability distribution in this subsection has to be clarified. For each permutation of $(1, 2, \cdots, n)$, the required window length is defined as the minimum window length such that the insertion sort with the window length returns a perfectly sorted array. The required window length is a random variable when the sample space is the set of all permutations of $(1, 2, \cdots, n)$ or its subset. For analyzing the modified Shell sort, we are interested in the case when the sample space is the set of all permutations $(a_1, a_2, \cdots, a_n)$ of $(1, 2, \cdots, n)$ which satisfy $a_i < a_{i+2}$ for all $1 \le i \le n - 2$.
While the analysis of the conventional Shell sort is performed for an average number of operations, the analysis of the window length in the modified Shell sort involves the maximum number of insertion operations for each subarray.
We assume that the gap sequence is powers of two, i.e., $2^{\lceil \log n \rceil}, 2^{\lceil \log n \rceil - 1}, \cdots, 2^2, 2, 1$. With this gap sequence, each subarray that is sorted using insertion sort has the following structure. The elements in odd positions of the subarray for a gap $2^h$ are already sorted, and the elements in even positions are also sorted, by the previous insertion sort for the gap $2^{h+1}$. We analyze the insertion sort under this special situation.
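The structure described above can be checked on a small plaintext example. In the following sketch, which is an illustration only, the hypothetical helper `h_sort` sorts every gapped subarray for a gap $h$ (using Python's built-in `sorted`, so it is not oblivious), and `is_interleaved_sorted` tests the condition $a_i < a_{i+2}$ on a subarray. After sorting with gap 4, each subarray for gap 2 has its odd positions sorted and its even positions sorted.

```python
def h_sort(arr, h):
    """Sort every chain arr[r], arr[r+h], arr[r+2h], ... independently."""
    out = list(arr)
    for r in range(h):
        out[r::h] = sorted(out[r::h])
    return out


def is_interleaved_sorted(sub):
    """True if elements two positions apart are in increasing order."""
    return all(sub[i] < sub[i + 2] for i in range(len(sub) - 2))


data = [5, 2, 9, 1, 7, 3, 8, 6, 4, 0]
after_gap_4 = h_sort(data, 4)

# The two subarrays that the next pass (gap 2) will sort.
gap_2_subarrays = [after_gap_4[r::2] for r in range(2)]
for sub in gap_2_subarrays:
    print(sub, is_interleaved_sorted(sub))
```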
The following Lemma 1 and Lemma 2 show that, in our situation, the required window length of a permuted vector $(a_1, a_2, \cdots, a_{2m})$ of $(1, 2, \cdots, 2m)$ is equal to the maximum distance between the current position of an element and its correct position. After Lemma 1 and Lemma 2, we treat these two notions as equivalent.
Lemma 1: Let $a = (a_1, a_2, \cdots, a_{2m})$ be a subarray in each gap in the Shell sort with a gap sequence $2^h$, which is permuted from $(1, 2, \cdots, 2m)$ [...] Then there exists an even integer $j$ and an odd integer $k$ such that [...]
Proof: Let $M_1$, $M_2$, $M_3$, and $M_4$ be defined as [...] It is clear that at least one of $M_1$ and $M_2$, as well as at least one of $M_3$ and $M_4$, is a non-negative integer. If we establish that $M_1 = M_4$ and $M_2 = M_3$, the lemma can be proved by the following [...], and thus, there exist an odd index $j$ and an even index $k$, [...]
i) Firstly, we show that $M_1 \ge M_4$. Consider an index $l$ such that $a_{2l} - 2l = \min_{1 \le i \le m}(a_{2i} - 2i)$, which is $-M_4$. We establish this case for $a_{2l} = 2m$ or $a_{2l} < 2m$.
i)-1 If $a_{2l} = 2m$, $l$ must be $m$, as $2m$ is the largest element. Thus, we obtain $\min_{1 \le i \le m}(a_{2i} - 2i) = 0$ and $a_{2i} \ge 2i$ for all $i$, $1 \le i \le m$, which implies that 1 cannot be in an even index and must be in the first index, and $a_1 - 1 = 0$. Therefore, [...]
i)-2 If $a_{2l} < 2m$, we show that $a_{2l} + 1$ must be in an odd index. Suppose that $a_{2l} + 1$ is in an even index; this implies that $a_{2l} + 1 = a_{2l+2}$, because all the elements in the even indices are already sorted. Then, we obtain $a_{2l+2} - (2l + 2) = (a_{2l} + 1) - (2l + 2) = a_{2l} - 2l - 1 < a_{2l} - 2l$, which contradicts the assumption that $a_{2l} - 2l$ is the minimum value, and thus, $a_{2l} + 1$ must be in an odd index. Among $\{1, 2, \cdots, a_{2l} - 1\}$, $l - 1$ elements have to be placed in the even indices to the left of $a_{2l}$. The remaining $a_{2l} - l$ elements must be placed in the odd indices in increasing order from the first index 1. Thus, the index of $a_{2l} + 1$ must be $2(a_{2l} - l) + 1$. As $a_{2(a_{2l} - l)+1} - (2(a_{2l} - l)$ [...]
[...] We establish this case for $a_{2l} = 1$ or $a_{2l} > 1$.
ii)-1 If $a_{2l} = 1$, $l$ must be 1, as 1 is the smallest element, and therefore, $\max_{1 \le i \le m}(a_{2i} - 2i) = -1$.
Let W (a) be the required minimum window length to sort the subarray successfully. Then, we have When we insert a i into the partially sorted subarray, the following scenarios can be given; if a i < i, we require a window length of i − a i , and if a i ≥ i, a i stays in place regardless of the window length.
Consider the first case, where $a_i < i$. First, we assume that $i$ is even. Consider the elements to the left of $a_i$. From the condition $a_i < a_{i+2}$, it is clear that all the elements in even indices to the left of $a_i$ are less than $a_i$. As there are $i/2 - 1$ even indices to the left of $a_i$, the remaining $a_i - i/2$ elements in $\{1, 2, \cdots, a_i - 1\}$ have to be placed in odd indices in increasing order from the leftmost odd index. As the number of odd indices to the left of $a_i$ is $i/2$ and $i/2 > a_i - i/2$, all the elements less than $a_i$ are located to the left of $a_i$.
We then assume that $i$ is odd. The proof is almost the same as that for the scenario in which $i$ is even. As there are $(i-1)/2$ odd indices to the left of $a_i$, the remaining $a_i - (i+1)/2$ elements in $\{1, 2, \cdots, a_i - 1\}$ must be placed in even indices from the first even index, in increasing order. As the number of even indices to the left of $a_i$ is $(i-1)/2$ and $(i-1)/2 > a_i - (i+1)/2$, all the elements less than $a_i$ are located to the left of $a_i$.
Thus, we prove that all the elements less than $a_i$ are located to the left of $a_i$. The partially sorted subarray, therefore, must include the elements $\{1, 2, \cdots, a_i - 1\}$ in the indices $\{1, 2, \cdots, a_i - 1\}$ in the appropriate order. This implies that $a_i$ moves to the index $a_i$, and thus, we require a minimum window length of $i - a_i$.
Consider the second case, in which $a_i \ge i$. It is evident that $i/2 \le a_i - i/2$ when $i$ is even, and $(i-1)/2 \le a_i - (i+1)/2$ when $i$ is odd. This implies that all the elements to the left of $a_i$ are less than $a_i$. Thus, the partially sorted subarray in the indices $\{1, 2, \cdots, i-1\}$ comprises elements smaller than $a_i$. Therefore, $a_i$ does not move to the left but stays in its position, regardless of the window length.
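The displacement rule used in this proof can be checked by brute force on small cases. The following Python sketch is illustrative only: the function names are ours, and it assumes that the required window length equals the largest number of positions any element moves left during an ordinary insertion sort. It enumerates every permutation of $\{1, \cdots, 2m\}$ with $a_i < a_{i+2}$ and compares that simulated value with $\max_i \max(i - a_i, 0)$.

```python
from itertools import combinations


def two_sorted_permutations(m):
    """Yield every permutation a of 1..2m with a_i < a_{i+2} (1-based indices)."""
    universe = range(1, 2 * m + 1)
    for evens in combinations(universe, m):
        chosen = set(evens)
        odds = [v for v in universe if v not in chosen]
        a = [0] * (2 * m)
        a[0::2] = odds    # 1-based odd indices 1, 3, 5, ... (already increasing)
        a[1::2] = evens   # 1-based even indices 2, 4, 6, ... (already increasing)
        yield a


def required_window(a):
    """Largest leftward move of any element in a plain insertion sort (assumed meaning of W)."""
    work = list(a)
    worst = 0
    for i in range(1, len(work)):
        j = i
        while j > 0 and work[j - 1] > work[j]:
            work[j - 1], work[j] = work[j], work[j - 1]
            j -= 1
        worst = max(worst, i - j)
    return worst


def displacement_formula(a):
    """max(i - a_i, 0) over 1-based indices i, following the two cases of the proof."""
    return max(max(i - v, 0) for i, v in enumerate(a, start=1))


def check(m):
    """True if simulation and formula agree on all 2-sorted permutations of 1..2m."""
    return all(required_window(a) == displacement_formula(a)
               for a in two_sorted_permutations(m))
```

For example, `required_window([3, 1, 4, 2])` and `displacement_formula([3, 1, 4, 2])` both give 2, since the last element has to move past 3 and 4.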
From Lemma 1, it is noted that M (a) is equal to W (a). Lemma 3 is needed to obtain the exact number of the arrays whose required window length is some non-negative number k. Theorem 4 corresponds to the conclusion of this subsection, and this can be obtained by only considering the special case of Lemma 3.
Lemma 3: Let $p_k(n, m)$ be the number of distinct arrays $(a_1, a_2, \cdots, a_m)$ of length $m$, whose elements from $\{1, 2, \cdots, n\}$ are sorted in increasing order, $a_i < a_{i+1}$, and $\max_{1 \le i \le m} |a_i - 2i| \le k$ is satisfied for a positive integer $k$ and $n \ge m$. Let $(b_0, b_1, \cdots)$ and $(c_0, c_1, \cdots)$ be the two arrays defined as
Proof: It is clear that $p_k(1, 1) = 1$ for all $k \ge 1$, and $p_k(n, 1) = n$ for $n \le k + 2$. As the element $a_m$ in the last index must satisfy $2m - k \le a_m \le 2m + k$ from $\max_{1 \le i \le m} |a_i - 2i| \le k$, the following can be determined from the condition $2m - k \le a_m \le 2m + k$:
i) For $n < 2m - k$, $p_k(n, m) = 0$, because the minimum possible value of $a_m$ must be $2m - k$.
ii) For $n > 2m + k + 1$, because the maximum possible value of $a_m$ must be $2m + k$.
We derive the recurrence relation of $p_k(n, m)$ using the following three cases:
iii)-1 For $n = 2m + k + 1$, It is easy to derive that Note that this case can be included in ii). Although this separation of the case appears unnatural, it enables us to analyze $p_k(n, m)$ well.
iii)-2 For $2m - k + 1 \le n \le 2m + k$, if the element in the last index $m$ is $n$, the elements in the remaining indices should be selected from $\{1, 2, \cdots, n - 1\}$, and thus, there are $p_k(n - 1, m - 1)$ possible arrays. If the element in the last index $m$ is not $n$, the element $n$ cannot be located in one of the indices $\{1, 2, \cdots, m - 1\}$, because the elements are sorted in increasing order. Thus, the elements in the indices $\{1, 2, \cdots, m\}$ should be selected from $\{1, 2, \cdots, n - 1\}$, and there are $p_k(n - 1, m)$ possible arrays. Therefore, we obtain $p_k(n, m) = p_k(n - 1, m - 1) + p_k(n - 1, m)$.
iii)-3 For $n = 2m - k$, we obtain $p_k(2m - k, m) = p_k(2m - k - 1, m - 1)$, because the element $2m - k$ must be located in the index $m$.
Let $q_k(n, m)$ be the right-hand side in (1). Then, we prove that $q_k(2m + k + 2, m) = 0$ and $q_k(2m - k - 1, m) = 0$. First, $q_k(2m + k + 2, m)$ can be written as holds, and $q_k(2m - k - 1, m)$ is equal to 0. We now prove that $p_k(n, m) = q_k(n, m)$ when $2m - k \le n \le 2m + k + 1$.
Using the fact that $q_k(n, m)$ is simply a linear combination of binomial coefficients, together with the recurrence property of the binomial coefficients, it follows that $q_k(n, m) = q_k(n - 1, m) + q_k(n - 1, m - 1)$.
Remark: To give intuition for the proof of Lemma 3, an additional explanation of the proof is provided in Appendix B.
From the previous lemmas, we have the following theorem.
Theorem 4: Let $C(2m, k)$ be the number of the permutations $a$ of $\{1, 2, \cdots, 2m\}$, such that $a_i < a_{i+2}$ for all possible $i$, and $W(a) \le k$. Then, we have where $b_i$ and $c_i$ are defined in Lemma 3.
Proof: As $M(a)$ of the odd indices is equal to that of the even indices from Lemma 1, we consider only the even indices. Thus, we can consider this situation to be equivalent to the following simple situation: we choose $m$ distinct elements from $\{1, 2, \cdots, 2m\}$ randomly, sort them in increasing order, and consider $a_i - 2i$ rather than $a_i - i$. Then, $C(2m, k)$ is identical to $p_k(2m, m)$ in Lemma 3. This is established as $\binom{2m}{m+b_i} = \binom{2m}{m-b_i}$.
In fact, $C(2m, k)$ denotes the number of arrays for gap $2^h$ that can be successfully sorted using the proposed modified Shell sort with a window length of $k$. Clearly, the exact number of arrays with $W(a) = k$, such that $a_i < a_{i+2}$ for all $i$, can be obtained by computing $C(2m, k) - C(2m, k - 1)$. With this result, we derive the running time complexity of the modified Shell sort in the next subsection.
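The identity $C(2m, k) = p_k(2m, m)$ can be sanity-checked by exhaustive enumeration for small $m$. The Python sketch below is a hedged illustration, not the closed form of the theorem: the function names are ours, $W(a)$ is evaluated through the displacement rule $\max_i \max(i - a_i, 0)$ from the proof of Lemma 2, and both counts are obtained by plain enumeration.

```python
from itertools import combinations


def count_by_window(m, k):
    """C(2m, k): permutations of 1..2m with a_i < a_{i+2} whose required window is <= k."""
    universe = range(1, 2 * m + 1)
    total = 0
    for evens in combinations(universe, m):
        chosen = set(evens)
        odds = [v for v in universe if v not in chosen]
        a = [0] * (2 * m)
        a[0::2], a[1::2] = odds, evens
        # Required window taken as the largest leftward displacement max(i - a_i, 0).
        if max(max(i - v, 0) for i, v in enumerate(a, start=1)) <= k:
            total += 1
    return total


def p_k(n, m, k):
    """Increasing arrays a_1 < ... < a_m drawn from 1..n with max |a_i - 2i| <= k."""
    return sum(
        all(abs(v - 2 * i) <= k for i, v in enumerate(arr, start=1))
        for arr in combinations(range(1, n + 1), m)
    )


def exact_window_counts(m):
    """Number of arrays with W(a) == k for k = 0..m, via C(2m, k) - C(2m, k - 1)."""
    cumulative = [count_by_window(m, k) for k in range(m + 1)]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, m + 1)]
```

For $m = 2$, the six admissible permutations split as `exact_window_counts(2) == [1, 4, 1]`: one already sorted array, four that need a window of 1, and one that needs a window of 2.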

C. DERIVATION OF RUNNING TIME COMPLEXITY FOR A SPECIFIC SFP
In this subsection, we derive the running time complexity $O(n^{3/2}\sqrt{\alpha + \log\log n})$ of the proposed modified Shell sort with the powers-of-two gap sequence, considering the optimal trade-off with the SFP $2^{-\alpha}$, in which $\alpha$ is the parameter that controls the window length of each gap. In the running time complexity, $\log\log n$ grows very slowly as $n$ increases. Therefore, the running time complexity is approximately proportional to $n^{3/2}\sqrt{\alpha}$, whereas the probability that the output is not successfully sorted decreases exponentially as $\alpha$ increases. Note that the SFP $2^{-\alpha}$ does not depend on the number of input data. This independence is one of the advantages of the modified Shell sort, and thus we can obtain a trade-off between the SFP and the running time complexity by choosing an appropriate $\alpha$.
It is important to prove the following lemmas to determine the relation between the binomial coefficients and the exponential function. It is well known from the central limit theorem that a binomial distribution approaches a normal distribution as $n$ tends to infinity. Even though the binomial and normal distributions are similar, we still have to establish that certain binomial coefficients are upper-bounded by the probability density function of the normal distribution. The following Lemma 5 is used in the proof of Lemma 6, and Lemma 6 is used to prove Theorem 7.

Then, f (x) > M for all x ∈ [a, ∞).
Proof: It is sufficient to show that $f^{(m)}(x) \to 0$ as $x \to \infty$ and that $(-1)^m f^{(m)}(x)$ is a monotonically decreasing function for $m$, $1 \le m \le n - 1$. If this is proved, then $f(x)$ is a monotonically decreasing function and is larger than the limit value $M$ from the first condition in Lemma 5, as $f'(x)$ is negative on $(a, \infty)$. Since it is true for $m = n$ that $(-1)^m f^{(m)}(x) > 0$, we will prove the following: if it is true for $2 \le k \le n$ that $(-1)^k f^{(k)}(x) > 0$, then we have $\lim_{x \to \infty} f^{(k-1)}(x) = 0$, and $(-1)^{k-1} f^{(k-1)}(x)$ is a monotonically decreasing function.
Let g k (x) = (−1) k f (k) (x). As (−1) k−1 f (k) (x) = g k−1 (x) < 0 on (a, ∞), g k−1 (x) is a monotonically decreasing function. As a monotonically decreasing function always converges to a certain value, if it possesses some lower bound, we obtain lim x→∞ g k−1 (x) = T for some T , or lim x→∞ g k−1 (x) = −∞. We assume that lim x→∞ g k−1 (x) = T for some T = 0, or lim x→∞ g k−1 (x) = −∞. Then, we can deduce some N ∈ Consider the case of f (k−1) (x) > R. If we integrate both terms from N to x ∈ (N , ∞) iteratively as whose right-hand side tends to infinity, as x → ∞. In this case, f (x) tends to infinity as well, which contradicts the first condition. If we consider the case of f (m) (x) < −R, the inequality is changed to whose right-hand side tends to negative infinity, as x → ∞. Then f (x) tends to negative infinity as well, which also contradicts the first condition. Thus, we obtain lim x→∞ g k−1 (x) = 0. As g k−1 (x) is a monotonically decreasing function, g k−1 (x) > 0 on (a, ∞), which completes the proof.
Lemma 6 directly uses Lemma 5. To prove the inequality in Lemma 6, it suffices to show that the conditions of Lemma 5 hold for a suitable function.
Lemma 6: For any real number α ≥ √ 6 and any positive integer n ≥ α 2 , the following inequality holds Proof: It can be derived that We must prove that If we consider the logarithm on the left-hand side and change the form, we obtain Then, the right-hand side of (8) can be defined as which is a type of Riemann sum of f (x). As f (x) is a monotonically decreasing function, the Riemann sum demonstrates its lower bound as the integration of f (x) from To integrate right-hand side of (9), let g(x) = x ln x. We then obtain (7). We must establish lim .
We present the following theorem, which is the main theorem of this subsection. The situation in Theorem 4 occurs in Theorem 7, so that we can directly use Theorem 4. Then, we use Lemma 6 to obtain a simple upper bound of the complicated formula.
Thus, S(n) can be expressed as Thus, we obtain S(n) = O(n 3/2 √ α + log log n), because ∞ =1 At this point, we consider the SFP. Let B denote the event that the output of the sorting algorithm is not successfully sorted and let B denote the event that at least one subarray for the gap 2 is not successfully sorted. As B ⊆ log n where log n u= +1 B c u implies the event that the sorting is successful for the gaps 2 +1 , · · · , 2 log n . All of the subarrays satisfy the condition a i < a i+2 in Theorem 4, before we perform the insertion sort for the gap 2 . Clearly, there are 2 subarrays when the gap is 2 , and the length of subarray is less than or equal to 2 n 2 +1 . Let m = n 2 +1 , and β = (α + 1 + log log n + ) · 1 log e . As β ≥ √ 6, the probability that one subarray of length 2m is not successfully sorted can be upper-bounded as where the second inequality is obtained from Lemma 6 if m ≥ β 2 . If m < β 2 , the left term of (10) is 0, and thus (10) trivially holds. We then obtain and thus, the theorem is proved. Remark: Theorem 7 states that the asymptotic running time complexity of Algorithm 4 is lower than the trivially modified Shell sort in Algorithm 3, which is O(n 2 ). The reduction of the running time in a concrete sense is rather clear, in that the number of iterative steps in the last for statement in Algorithm 4 is lower than that in Algorithm 3. On the other hand, the asymptotic running time of the modified Shell sort is lower than that of the insertion sort for the FHE setting, but the concrete comparison for them in a practical situation is not clear in this theoretical analysis. In Section V, we numerically compare their running time using TFHE homomorphic encryption scheme.
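To make the trade-off concrete, the following Python sketch tabulates the analytically chosen window lengths for each gap $2^\ell$. It is a sketch under stated assumptions rather than a transcription of Algorithm 4: it takes $\beta_\ell = \sqrt{(\alpha + 1 + \log\log n + \ell)/\log e}$ with base-2 logarithms, rounds $m = n/2^{\ell+1}$ and the window up to integers, uses $\lfloor \log_2 n \rfloor$ as the largest exponent, and caps each window at the subarray length $2m$. The function names are ours.

```python
import math


def analytic_windows(n, alpha):
    """Map each exponent l (gap 2**l) to an integer window length beta_l * sqrt(m)."""
    top = int(math.floor(math.log2(n)))        # assumed largest exponent
    loglog = math.log2(math.log2(n))           # requires n >= 2
    windows = {}
    for l in range(top, -1, -1):
        beta = math.sqrt((alpha + 1 + loglog + l) / math.log2(math.e))
        half_len = math.ceil(n / 2 ** (l + 1))  # m in the proof (ceiling assumed)
        # A window longer than the subarray (at most 2m elements) is never needed.
        windows[l] = min(math.ceil(beta * math.sqrt(half_len)), 2 * half_len)
    return windows


def work_bound(n, windows):
    """Upper bound n * sum_l w_l on the number of compare-and-swap operations."""
    return n * sum(windows.values())


def sfp_union_bound(n, alpha):
    """Union bound sum_l 2**l * exp(-beta_l**2) on the sorting failure probability."""
    top = int(math.floor(math.log2(n)))
    loglog = math.log2(math.log2(n))
    total = 0.0
    for l in range(top + 1):
        beta_sq = (alpha + 1 + loglog + l) / math.log2(math.e)
        total += 2 ** l * math.exp(-beta_sq)
    return total
```

With $n = 1000$ and $\alpha = 3$, each gap contributes $2^{-\alpha-1}/\log n$ to the union bound, so `sfp_union_bound(1000, 3)` stays below $2^{-3}$, while `work_bound` stays well below the $n^2$ operations of a worst-case insertion sort.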

IV. NEAR-OPTIMAL WINDOW LENGTH BY CONVEX OPTIMIZATION
It is necessary to find the shortest window length for a given SFP so that the running time of the modified Shell sort is minimized. Generally, it is not easy to derive the optimal window length in closed form. In this section, we obtain the near-optimal window length using convex optimization [30]. Let $\beta_\ell\sqrt{n/2^{\ell+1}}$ be the window length for the gap $2^\ell$, and let $\Pr\big(B_\ell \mid \bigcap_{u=\ell+1}^{\log n} B_u^c\big)$ be the SFP for the gap $2^\ell$, when sorting is successful for the gaps $2^{\ell+1}, \cdots, 2^{\log n}$. From Theorem 4 and Lemma 6, this probability is upper-bounded by $2^\ell e^{-\beta_\ell^2}$. The objective function that needs to be minimized is the total number of swap operations, which determines the running time. As the exact running time formula is rather complicated, we consider a tight upper bound of the running time, $n\sum_{\ell=0}^{\log n} \beta_\ell\sqrt{n/2^{\ell+1}}$, which is used in the proof of Theorem 7. Let $p_\ell = 2^\ell e^{-\beta_\ell^2}$. Then, we have $\beta_\ell = \sqrt{(\ell + \log(1/p_\ell))/\log e}$. As it is sufficient to minimize $\sum_{\ell=0}^{\log n} \sqrt{(n/2^{\ell+1})(\ell + \log(1/p_\ell))}$, the problem of the near-optimal window length can be formulated as minimizing this sum over $p_0, \cdots, p_{\log n}$ subject to $\sum_{\ell=0}^{\log n} p_\ell \le p_{\mathrm{err}}$. This formulation implies that the total running time with the SFP upper-bounded by $p_{\mathrm{err}}$ needs to be minimized. We can validate that $\sqrt{c + \log(1/x)}$ is a convex function of $x$ for small positive values of $x$, where $c$ is a constant. As the weighted sum of convex functions is also a convex function, the objective function is a convex function, and the constraint is also convex. Thus, this is a convex optimization problem. As convex optimization problems can be solved numerically, it is easy to obtain the near-optimal window length. Then, we can deduce $p_\ell$, and the near-optimal window length is determined to be $\sqrt{(n/2^{\ell+1})(\ell + \log(1/p_\ell))/\log e}$ for each gap $2^\ell$. It is noted that the above formulation is not sufficiently tight, because it still uses the union bound. Constructing a tighter formulation that can still be solved easily can be a focus for future research.
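One way to solve a problem of this shape numerically is sketched below in Python. It is only an illustration and not the solver used for the reported results: it assumes the objective $\sum_\ell w_\ell \beta_\ell$ with $w_\ell = \sqrt{n/2^{\ell+1}}$ and $\beta_\ell = \sqrt{\ell \ln 2 - \ln p_\ell}$ (the same $\beta_\ell$ written with natural logarithms), the constraint $\sum_\ell p_\ell \le p_{\mathrm{err}}$, and $\lfloor \log_2 n \rfloor$ as the largest exponent. The stationarity condition $p_\ell \beta_\ell = w_\ell/(2\lambda)$ is solved by nested bisection on $\ln p_\ell$ and $\ln \lambda$; all function names are hypothetical.

```python
import math


def near_optimal_windows(n, p_err, iters=100):
    """Return (p, windows) minimising sum_l w_l * beta_l subject to sum_l p_l <= p_err."""
    assert n >= 2 and 0.0 < p_err < 0.5
    top = int(math.floor(math.log2(n)))
    ln2 = math.log(2.0)
    weights = [math.sqrt(n / 2 ** (l + 1)) for l in range(top + 1)]
    u_max = math.log(p_err)                  # ln of the largest allowed p_l

    def log_lhs(l, u):
        # ln(p * beta_l) with p = exp(u); increasing in u while beta_l**2 > 1/2.
        return u + 0.5 * math.log(l * ln2 - u)

    def p_given(l, log_lam):
        # Stationarity of the Lagrangian: p_l * beta_l = w_l / (2 * lambda).
        target = math.log(weights[l] / 2.0) - log_lam
        if log_lhs(l, u_max) <= target:
            return p_err                     # clamp at the largest allowed value
        lo, hi = -700.0, u_max
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if log_lhs(l, mid) < target:
                lo = mid
            else:
                hi = mid
        return math.exp(lo)

    lo, hi = -50.0, 600.0                    # bracket for ln(lambda)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(p_given(l, mid) for l in range(top + 1)) > p_err:
            lo = mid                         # budget exceeded: increase lambda
        else:
            hi = mid
    p = [p_given(l, hi) for l in range(top + 1)]
    windows = [weights[l] * math.sqrt(l * ln2 - math.log(p[l]))
               for l in range(top + 1)]
    return p, windows


def uniform_windows(n, p_err):
    """Baseline: the same failure budget split equally over all gaps."""
    top = int(math.floor(math.log2(n)))
    share = p_err / (top + 1)
    return [math.sqrt(n / 2 ** (l + 1))
            * math.sqrt(l * math.log(2.0) - math.log(share))
            for l in range(top + 1)]
```

Comparing `sum(near_optimal_windows(n, p_err)[1])` with `sum(uniform_windows(n, p_err))` shows how much the bound on the number of swaps improves when the failure budget is distributed unevenly across the gaps instead of equally.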

A. SIMULATION WITHOUT HOMOMORPHIC ENCRYPTION
The performance of the proposed modified Shell sort is numerically verified using a personal computer with an AMD Ryzen 9 5950X CPU running at 2.04GHz, and 128GB RAM. First, we validate the running time and SFP when the array length varies. Then, the running time and SFP are numerically obtained when the parameter $\alpha$ is varied. Finally, the performance of the modified Shell sort is compared with the cases corresponding to the near-optimal window length, which is obtained using convex optimization, and Ciura's optimal gap sequence, which has been validated numerically as an optimal gap sequence in the non-FHE settings. We first simulate these sorting algorithms without homomorphic encryption schemes, i.e., in the plaintext domain. Since the use of a homomorphic encryption scheme affects only the running time, the SFP values obtained in this simulation also apply to the case of using homomorphic encryption.
Fig. 2 shows the running time and SFP for various array lengths with $\alpha = 3$. The array length is varied from 50 to 1000. The input arrays are randomly generated, and $10^5$ input arrays are generated for each array length. It is observed from Fig. 2 that the running time increases in proportion to $n^{3/2}$, and the SFP is independent of the array length. This numerical result coincides well with the proposed analysis of the modified Shell sort. Note that the value $c = \alpha + 1 + \log\log n$ increases slightly as the length of the array increases.
In Fig. 3, Ciura's gap sequence is taken from [25], and a-win and o-win denote the analytically derived window length and the near-optimal window length derived by convex optimization, respectively. The input array length is fixed at 1000. Similar to the previous simulation, $10^5$ input arrays are randomly generated for each $\alpha$ value.
Algorithm 4 and the variants using Ciura's optimal gap sequence or the near-optimal window length are simulated, where the near-optimal window length is derived using the convex optimization discussed in Section IV. From Fig. 3, it is observed that the running time of Algorithm 4 increases as $\alpha$ increases, while the growth rate decreases. This observation coincides with the proposed analysis, i.e., the running time is approximately proportional to $\sqrt{\alpha}$. The logarithms of the SFP values of Algorithm 4 are parallel to those of the SFP bounds. This implies that the SFP is proportional to $2^{-\alpha}$ with some small proportionality constant. When the gap sequence is replaced with Ciura's gap sequence, the running time is reduced by approximately 0.5 ms. No sorting failure is detected in the simulation that uses Ciura's gap sequence. This implies that the SFP with Ciura's optimal gap sequence is on the order of $10^{-5}$ or less. Although the window lengths of each gap in this paper are analytically derived for the powers-of-two gap sequence, a better result is obtained when Ciura's optimal gap sequence is used.
We numerically find the value $c = \alpha + 1 + \log\log n$ at which the SFP value reaches $10^{-5}$ for several array lengths, and Table 1 shows these values. With the SFP value fixed, the value $c$ increases slightly as the length of the array increases when the gap sequence is powers-of-two, whereas it decreases sharply as the length of the array increases when Ciura's gap sequence is used. This suggests that the trade-off in the case of Ciura's gap sequence is asymptotically better than that in the case of the powers-of-two gap sequence. The exact asymptotic analysis of Ciura's gap sequence is an open problem.
The near-optimal window length is derived using the convex optimization problem described in Section IV. The running time in this case is marginally reduced compared with the case using the analytically obtained window length. However, the two running times become closer as $\alpha$ increases. The SFP of the case using the near-optimal window length for the powers-of-two gap sequence is closer to the SFP bound than that of the case using the analytically obtained window length. Thus, the running time can be reduced, while the SFP remains less than the SFP bound.
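A plaintext experiment of this kind can be reproduced in a few lines. The Python sketch below is a hedged illustration and not the code behind Fig. 2 or Fig. 3: it assumes that the modified Shell sort performs, for every element and every gap $2^\ell$, a fixed number of compare-and-swap steps given by the window length, and that the analytic window is $\lceil \beta_\ell \sqrt{\lceil n/2^{\ell+1} \rceil} \rceil$ with $\beta_\ell = \sqrt{(\alpha + 1 + \log\log n + \ell)/\log e}$. All names are ours.

```python
import math
import random


def analytic_window(n, alpha, l):
    """Assumed analytic window length for the gap 2**l (base-2 logarithms)."""
    beta = math.sqrt((alpha + 1 + math.log2(math.log2(n)) + l) / math.log2(math.e))
    return math.ceil(beta * math.sqrt(math.ceil(n / 2 ** (l + 1))))


def windowed_shell_sort(data, window_for_gap):
    """Shell sort with gaps 2**l whose insertion step makes at most w moves per element."""
    a = list(data)
    n = len(a)
    ops = 0
    for l in range(int(math.log2(n)), -1, -1):
        gap = 2 ** l
        w = window_for_gap(l)
        for i in range(gap, n):
            pos = i
            for _ in range(w):
                if pos - gap < 0:            # index check only, independent of the data
                    break
                if a[pos - gap] > a[pos]:    # stands in for a homomorphic compare-and-swap
                    a[pos - gap], a[pos] = a[pos], a[pos - gap]
                pos -= gap
                ops += 1
    return a, ops


def estimate_sfp(n, alpha, trials, seed=0):
    """Fraction of random inputs that are not fully sorted with the analytic windows."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        data = [rng.random() for _ in range(n)]
        out, _ops = windowed_shell_sort(data, lambda l: analytic_window(n, alpha, l))
        failures += out != sorted(data)
    return failures / trials
```

Passing `lambda l: len(data)` as `window_for_gap` removes the window restriction and recovers an ordinary Shell sort, which gives a baseline operation count to compare against.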

B. SIMULATION WITH TFHE SCHEME
In this subsection, we measure the running time of several sorting algorithms on encrypted data, including the modified Shell sort algorithm. We implement each sorting algorithm with the TFHE library [31]. The security parameter in the TFHE scheme is set to 128, and each data item is represented with 10 bits. Table 2 shows the main parameters used in the simulation satisfying 128-bit security. For the modified Shell sort, we set the value of $c$ so that the SFP is $10^{-5}$, and Ciura's gap sequence is used rather than the powers-of-two gap sequence. All running times are reported in seconds.
The sorting algorithms to be compared with the modified Shell sort are chosen as follows. Since the modified Shell sort can be regarded as a generalization of the insertion sort, we include the insertion sort in the comparison. The randomized Shell sort [24] is the sorting algorithm most closely related to the proposed modified Shell sort, and thus we also include it. These two sorting algorithms are in-place algorithms. As recursive sorting algorithms, the odd-even merge sort and the bitonic sort are chosen, which are the standard oblivious recursive sorting algorithms. These two recursive algorithms are also used in [20] to compare sorting algorithms for homomorphic encryption.
The odd-even merge sort and the bitonic sort originally assume that the length of the input array is a power of two. However, the array length cannot be chosen freely in general, and thus we perform the simulation with more general array lengths. Since the length of the input array in our simulation is not a power of two, we append dummy data to the end of the array to make its length a power of two, and these dummy data are assumed to be larger than all the data in the input array. Since these dummy data are not moved during the sorting, we skip every comparison in which dummy data would be homomorphically compared to other data, in order to remove the effect of adding the dummy data. Thus, we can fairly compare the running time of each sorting algorithm for general array lengths other than powers of two. We specify all the algorithms used in the simulation in Appendix A. Table 3 shows the running time of several sorting algorithms required to sort an array of encrypted data of length 500. The running time of the modified Shell sort with Ciura's gap sequence is far lower than that of the insertion sort, which is the basic in-place sorting algorithm. The modified Shell sort is thus shown to be efficient not only in the asymptotic sense but also in practice. Also, although the randomized Shell sort [24] is asymptotically better than our algorithm, the performance of our algorithm is better than that of the randomized Shell sort for an array of length 500.
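The dummy-padding rule can be illustrated in plaintext. The Python sketch below covers only the odd-even merge sort and is not the implementation from Appendix A: it generates the comparators of Batcher's odd-even merge sorting network, treats `None` as a dummy larger than every real value, and skips any comparator whose upper index lies in the padded tail. Because every comparator places the smaller value at the lower index, the dummies at the end never move, so skipping these comparators does not change the result. The function names are ours.

```python
def oddeven_merge(lo, hi, r):
    """Comparators merging the two sorted halves of positions lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo, hi):
    """Batcher's network on positions lo..hi (inclusive); the length must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def sort_with_dummy_padding(data):
    """Sort data of arbitrary length; return (sorted data, performed, skipped comparators)."""
    n = len(data)
    m = 1
    while m < n:
        m *= 2
    a = list(data) + [None] * (m - n)     # None marks a dummy, treated as +infinity
    performed = skipped = 0
    for i, j in oddeven_merge_sort(0, m - 1):
        if j >= n:                        # j > i, so a dummy is involved: nothing would move
            skipped += 1
            continue
        performed += 1
        if a[i] > a[j]:                   # stands in for a homomorphic compare-and-swap
            a[i], a[j] = a[j], a[i]
    return a[:n], performed, skipped      # drop the m - n dummy elements at the end
```

For an input of length 500, the array is padded to 512 positions and only the comparators between two real positions are counted as performed, which is the quantity that matters for the running time on encrypted data.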
Compared with the efficient recursive sorting algorithms, i.e., the bitonic sort and the odd-even merge sort, the running time of the modified Shell sort is still larger, but it is much closer to theirs than the running time of the original insertion sort is. Which algorithm is preferable will depend on the situation, especially on IoT devices. Since these recursive sorting algorithms make many recursive function calls and the memory occupied by the input array is quite large, the transmission time can be a serious problem when the memory bus bandwidth is not large enough. In this situation, the modified Shell sort will be useful in that it uses no function calls and almost no additional memory. Even though the running times of the bitonic sort and the odd-even merge sort are smaller than that of the modified Shell sort, the numbers of memory and function calls of the bitonic sort and the odd-even merge sort increase to 5121 and 9729, respectively. Table 4 shows the performance of the modified Shell sort, the insertion sort, and the bitonic sort for arrays of various lengths less than 500. While the running time of the insertion sort increases rapidly as the array length increases, the running time of the modified Shell sort with Ciura's gap sequence increases much more slowly. This growth rate is somewhat similar to that of the bitonic sort, whose running time complexity is better than ours.

VI. CONCLUSION AND FUTURE WORK
In this paper, we proposed a modified Shell sort with an additional parameter $\alpha$ in the FHE setting, and for a gap sequence of powers of two, we derived the running time complexity $O(n^{3/2}\sqrt{\alpha + \log\log n})$, considering a trade-off with the SFP $2^{-\alpha}$. We also established that the running time complexity of the proposed algorithm is almost the same as the average-case running time complexity of the original Shell sort, while the SFP is kept very low. We then obtained the near-optimal window length of each gap by numerically solving a convex optimization problem. We believe that this study lays a foundation for the analysis of the Shell sort in the FHE setting. Using the TFHE encryption scheme, the running time of the proposed modified Shell sort with Ciura's gap sequence was compared with that of the conventional sorting algorithms, and it showed the best running time performance among the in-place sorting algorithms in the FHE setting.
The performances of the recursive sorting algorithms on the FHE data, such as bitonic sort and odd-even merge sort, are better than our algorithms but use many function calls in the process of sorting. The detailed analysis of the memory usage in the sorting algorithms for FHE and the simulation in the practical environment with limited memory bus bandwidth is also an important research topic, and we leave it as future work. Designing a practically faster variant of the Shell sort on the FHE data than the recursive sorting algorithms is also our future work. Also, we plan to analyze the modified Shell sort with other gap sequences.
Recently, Hong et al. [32] proposed the k-way sorting method, which extends the conventional 2-way sorting. All previous works used the sorting of two elements as the building block, whereas they used the sorting of $k$ elements, with $k$ larger than two, as the building block. Since we used the 2-way sorting in our modified Shell sort algorithm, analyzing a k-way Shell sort algorithm will be interesting future work.

APPENDIX A IMPLEMENTATION OF OTHER SORTING ALGORITHMS
We specify the other sorting algorithms used in the simulation with the TFHE scheme. Algorithms 5, 6, 9, and 12 show the algorithms used in the simulation for the insertion sort, the randomized Shell sort, the odd-even merge sort, and the bitonic sort, respectively. The other algorithms are subroutines of these main algorithms. Note that the randomized Shell sort can be implemented without any function calls by including the subroutine algorithms in the body of the algorithm, but the odd-even merge sort and the bitonic sort should be performed recursively.

APPENDIX B EXPLANATION OF PROOF OF LEMMA 3
The recurrence relations (2)-(4) of $p_k(n, m)$ are similar to Pascal's triangle $\binom{n}{m} = \binom{n-1}{m} + \binom{n-1}{m-1}$ shown in Fig. 4(a), except that the width of the triangle for $p_k(n, m)$ is limited, as shown in Fig. 4(b) as well as (2) and (4). This recursive relation can then be transformed into overlapped Pascal's triangles. Fig. 4(c) shows a part of Fig. 4(b) near the boundary of the lower dotted line. Here, we only consider the lower dotted line. We then establish that this recursive relation near the boundary in Fig. 4(c) is equivalent to the situation of Fig. 4(d), which is two overlapped Pascal's triangles, in which $p_k(0, 0) = 1$ and $p_k(0, k + 1) = -1$. First, it can be obtained that the values on the dotted line in Fig. 4(d) are always 0, because of the symmetry of Pascal's triangles. As adding a 0 does not change the value, the cases of Fig. 4(c) and Fig. 4(d) are equivalent regarding the area to the left of the dotted line. However, the values on both the dotted lines in Fig. 4(b) must be 0. To satisfy the other boundary condition $p_k(2m + k + 2, m) = 0$ on the upper dotted line in Fig. 4(b), we consider another Pascal's triangle translated by $-(k + 2)$ with $p_k(0, -(k + 2)) = -1$. If we add these three Pascal's triangles $P_{-1}$, $P_0$, and $P_1$ shown in Fig. 4(e), there are zero boundary values on the lines from $Q_1$ to $Q_2$ and from $R_1$ to $R_2$. However, the boundary value after $Q_2$ or $R_2$ is not equal to 0. To obtain the boundary values on the lines from $Q_2$ to $Q_3$ and from $R_2$ to $R_3$, we must add the Pascal's triangles $P_2$ and $P_{-2}$. Therefore, we repeat this process, as shown in Fig. 4(e). The sequence $\{b_i\}$ in Lemma 3 is the distance from the initial vertex of $P_0$ to that of $P_i$, while $\{c_i\}$ is the distance from the initial vertex of $P_0$ to that of $P_{-i}$. The initial value at the initial vertex of $P_i$ is 1 if $i$ is even, and $-1$ if $i$ is odd.
Q i is defined as the intersection of the boundaries of the two Pascal's triangles starting from the initial vertices of P i−1 and P −i , and R i is defined as the intersection of the boundaries of the two Pascal's triangles starting from the initial vertices of P i and P −(i−1) . We establish that if the Pascal's triangles P i 's, i = · · · , −1, 0, 1, · · · , are overlapped, all of the integer points on the half-lines of −−→ Q 1 Q 2 and − − → R 1 R 2 must be 0s. The integer points on the upper half-line of −−→ Q 1 Q 2 exhibit the form n = 2m + k + 2 for all non-negative integers m, and those on the lower half-line of − − → R 1 R 2 exhibit the form n = 2m − k − 1 for all m ≥ k + 1. First, in the case of the points on the half-line of −−→ Q 1 Q 2 , we consider the integer points on Q j Q j+1 , which can be denoted as n 1 = 2m 1 + k + 2 and b i−1 ≤ m 1 ≤ b i . Then, we can only consider Pascal's triangles P −j , · · · , P j−1 . Considering the parallel translation of each Pascal's triangle, the overlapped values on the points are defined as As (m 1 + c i ) + (m 1 − b i−1 ) = 2m 1 + k + 2, 2m 1 +k+2 m 1 +c i = 2m 1 +k+2 m 1 −b i−1 holds, and (5) is equal to 0. In the case of the points on the half-line of − − → R 1 R 2 , we consider the integer points on R j R j+1 , which can be denoted as n 2 = 2m 2 − k − 1 and b i ≤ b i+1 . Then, we can only consider Pascal's triangles P j−1 , · · · , P j . The overlapped values on the points are defined as As (m 2 + c i−1 ) holds, and (6) is also equal to 0. Therefore, we establish that with respect to the region between the two dotted lines in Fig. 4(b), Fig. 4(b) is exactly equivalent to the hashed part of Fig. 4(e). 
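The overlapped-triangle construction can be tested numerically on small parameters. The Python sketch below reflects our reading of this appendix rather than a transcription of (1): it assumes $b_0 = c_0 = 0$, $b_i = c_{i-1} + k + 1$, and $c_i = b_{i-1} + k + 2$ (consistent with the offsets $k + 1$ and $-(k + 2)$ of $P_1$ and $P_{-1}$ and with $c_i - b_{i-1} = k + 2$ used above), with sign $(-1)^i$ for $P_{\pm i}$. It compares the resulting signed sum of binomial coefficients with a direct enumeration of $p_k(n, m)$ for $2m - k \le n \le 2m + k + 1$. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def binom(n, r):
    """Binomial coefficient that is 0 outside 0 <= r <= n."""
    return comb(n, r) if 0 <= r <= n else 0


def p_k_bruteforce(n, m, k):
    """Direct count of increasing arrays from 1..n with max |a_i - 2i| <= k."""
    return sum(
        all(abs(v - 2 * i) <= k for i, v in enumerate(arr, start=1))
        for arr in combinations(range(1, n + 1), m)
    )


def p_k_overlapped(n, m, k):
    """Signed sum over the triangles P_0, P_1, P_-1, P_2, P_-2, ... (assumed form)."""
    total = binom(n, m)                  # central triangle P_0
    b_prev, c_prev = 0, 0                # assumed b_0 = c_0 = 0
    sign = -1                            # P_1 and P_-1 enter with value -1
    while True:
        b = c_prev + k + 1               # assumed offset of P_i
        c = b_prev + k + 2               # assumed offset of P_-i
        if m - b < 0 and m + c > n:      # both triangles no longer reach (n, m)
            break
        total += sign * (binom(n, m - b) + binom(n, m + c))
        b_prev, c_prev, sign = b, c, -sign
    return total
```

For instance, with $k = 1$ and $m = 3$, both functions give 13 at $n = 6$, where the signed sum reduces to $\binom{6}{3} - \binom{6}{1} - \binom{6}{6}$.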
We obtain p k (n, m) by adding the values of points of several Pascal's triangles as in (1), where the first term is from the central Pascal's triangle P 0 ; the second term is from the right-side Pascal's triangles P i 's for the positive integer i; and the third term is from the left-side Pascal's triangles P −i 's for the positive integer i.