Memory-Efficient Random Order Exponentiation Algorithm

Randomizing the execution of the sequence of operations in an algorithm is one of the most frequently considered solutions to improve the security of cryptographic implementations against side-channel analysis. Such an algorithm for public-key cryptography was introduced by Tunstall at ACISP, 2009. In his right-to-left <inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula>-ary exponentiation algorithm, the radix-<inline-formula> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula> digits of the exponent are treated in somewhat random order. This randomized solution will inhibit attacks that allow operations to be distinguished from <italic>one acquisition</italic>. In this article, we present a memory-efficient variant of Tunstall’s random-order exponentiation algorithm, making it applicable to modular exponentiations in <inline-formula> <tex-math notation="LaTeX">$(\mathbb {Z}/N \mathbb {Z})^{*}$ </tex-math></inline-formula> (for instance, the RSA cryptosystem). The proposed algorithm requires only <inline-formula> <tex-math notation="LaTeX">$(m + 1)$ </tex-math></inline-formula> memory registers instead of <inline-formula> <tex-math notation="LaTeX">$(m + r)$ </tex-math></inline-formula>, where <inline-formula> <tex-math notation="LaTeX">$r > m$ </tex-math></inline-formula> as recommended in Tunstall’s algorithm. Namely, the proposed algorithm saves about half the memory registers. Our analysis shows that our algorithm can be used as a supplement in order to defeat statistical side-channel analysis attacks, especially recent collision-correlation power analysis in the <italic>horizontal setting</italic>. Last but not least, we present a random order binary implementation, which is the first right-to-left binary implementation resisting attacks in the horizontal setting.


I. INTRODUCTION
Side-channel analysis (SCA) attacks, formally introduced by Kocher et al. [16] and Kocher [17], are nowadays one of the most serious threats to the security of a given implementation of a cryptographic algorithm. This kind of attack uses side-channel information leaked from cryptographic devices to determine the secret key. Kocher et al. described two main attacks: simple power analysis and differential power analysis. While the former uses the power consumption from one or several measurements directly to determine the secret information, the latter (also called statistical side-channel analysis) requires a large number of power consumption traces and statistical tools that exploit the correlations between the leakage and the processed data to recover the secret information. Regular algorithms, e.g., square-multiply always [8] or the Montgomery powering ladder [14], can resist simple power analysis. However, to prevent differential power analysis, one would use blinding techniques [8], [16] or randomized techniques [19].
The associate editor coordinating the review of this manuscript and approving it for publication was Fan Zhang.
Shuffling, which was first introduced for symmetric encryption algorithms in [12], provides additional resistance against statistical SCA attacks. Basically, shuffling randomizes the execution of the sequence of operations in an algorithm and can be applied to any set of independent operations. Nowadays, shuffling is one of the main approaches used to thwart different power analysis attacks on symmetric cryptographic implementations. Readers are referred to [10], [22] for comprehensive studies of this method.
For public-key cryptosystems, Tunstall introduced such a shuffling technique in [21]. In his randomized right-to-left m-ary exponentiation algorithm, operations are performed in somewhat random order. The author observed that in the right-to-left exponentiation algorithm, the multiplication operations can be performed independently, and in any order without influencing the final result. To our knowledge, this is so far the only random-order countermeasure for public-key cryptosystems.
One disadvantage of Tunstall's algorithm is that it requires (m + r) memory registers to store group elements, where r > m is large enough to provide a suitable level of random ordering. Compared to the usual right-to-left exponentiation, his algorithm requires r extra memory registers. This may not matter in software implementations. However, in hardware implementations, e.g., smart devices with constrained resources, this algorithm is unlikely to be practical for exponentiation in (Z/NZ)*. In this article, we first propose a memory-efficient variant of Tunstall's algorithm. The proposed algorithm requires only (m + 1) group elements stored in memory instead of (m + r). This memory requirement is comparable to that of the usual m-ary algorithm. We then analyze the security of the proposed algorithm, as well as that of existing right-to-left m-ary algorithms, in the presence of the recent advanced differential power analysis in the horizontal setting [6], [7], [11], [23]. Last but not least, we present an efficient implementation of the random order algorithm in the binary case. To the best of our knowledge, it is the first right-to-left binary exponentiation algorithm that resists the horizontal collision-correlation attacks.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The rest of the paper is organized as follows. In Section II, we briefly recall power analysis and the random order m-ary exponentiation algorithm. Section III describes the proposed algorithm and analyzes its performance. Section IV describes our random order binary exponentiation, and Section V analyzes the security of the proposed algorithm, as well as the security of the existing right-to-left m-ary exponentiation algorithms, against the Big Mac attack and its extensions. We conclude in Section VI.

A. POWER ANALYSIS
Simple Power Analysis (SPA for short) attacks aim at recovering the secret key by submitting just one message to the embedded device. From the measured power consumption trace, an attacker makes use of a distinguisher to deduce a sequence of squaring and multiplication operations that, in some implementations, is equivalent to the secret exponent. Regular algorithms [8], [14] and atomic algorithms [5] can be used to thwart SPA attacks.
Differential Power Analysis (DPA for short) attacks, also known as statistical side-channel analysis attacks, exploit the correlations between the leakage and the processed data to defeat countermeasures that are immune to SPA. In contrast to SPA attacks, DPA attacks require a large number of power consumption traces and then make use of statistical tools to deduce the secret information. Consequently, many improvements to DPA have been introduced. For example, Correlation Power Analysis [4] and Collision-Correlation Analysis [24] require far fewer power traces than the original DPA. Randomizing techniques [8], [16], [19], [21] can be used to inhibit DPA attacks.

Big Mac Attack and Its Extensions
All the statistical analysis attacks mentioned above require numerous traces to be taken in order to reduce noise to the point where an attack will succeed. On the contrary, the Big Mac attack [23] requires only one power consumption trace to recover the secret key. This is a horizontal statistical analysis attack. The original Big Mac attack applies to m-ary exponentiation and to all similar algorithms which use a table of pre-computed values. Subsequent works further studied the Big Mac attack and demonstrated it with experimental results. Instead of using a Euclidean distance, Clavier et al. [7] used the Pearson correlation to detect a collision between two multiplications. The studies in [6], [11] presented further refined attacks that use collision-correlation and apply not only to RSA but also to elliptic curve cryptosystems.
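As an illustration of the collision test mentioned above, the following minimal sketch computes the Pearson correlation between two trace segments and flags a collision when it exceeds a threshold. The sample values, the function names, and the 0.9 threshold are hypothetical and are not taken from [7]; real traces would need alignment and far more samples.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sample lists."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def collides(seg1, seg2, threshold=0.9):
    """Assumed decision rule: report a collision for a high correlation."""
    return pearson(seg1, seg2) >= threshold

# Hypothetical toy segments: seg_b is seg_a plus small noise (same operand),
# seg_c follows an unrelated pattern (different operand).
seg_a = [1.0, 3.0, 2.0, 5.0, 4.0]
seg_b = [1.1, 2.9, 2.2, 4.8, 4.1]
seg_c = [5.0, 1.0, 4.0, 2.0, 3.0]
```

With these toy values, `collides(seg_a, seg_b)` holds while `collides(seg_a, seg_c)` does not.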
Randomizing intermediate computations can be used to thwart the horizontal (collision-)correlation analysis [7]. However, this approach, which requires randomizing the intermediate long-integer multiplications, is generally costly from a performance viewpoint.

B. RANDOM ORDER EXPONENTIATION ALGORITHM
For an input key, the square-and-multiply algorithm [20] outputs a unique sequence of squares and multiplications. This allows an attacker to immediately determine the private key. On the other hand, given the same input, randomized algorithms output different sequences of operations. The idea is to randomize the number and the sequence of operations executed in the exponentiation algorithm itself.
Let $n = (d_{\ell-1}, \ldots, d_1, d_0)_m$ denote the radix-$m$ representation of an exponent $n$, where $\ell = \lceil \log_2(n)/w \rceil$ and $w = \log_2(m)$, that is, $n = \sum_i d_i m^i$ with $d_i \in \{0, 1, \ldots, m-1\}$ and $d_{\ell-1} \neq 0$. From this expansion, Yao [25] introduced a right-to-left $m$-ary exponentiation. Its principle is based on the following equality:

$$x^n = \prod_{j=1}^{m-1} \Bigl( \prod_{\substack{0 \le i \le \ell-1 \\ d_i = j}} x^{m^i} \Bigr)^{j}.$$

In Yao's algorithm (Algorithm 4 in the Appendix), one uses $(m-1)$ accumulators, $R[1], \ldots, R[m-1]$, each of them initialized to $1_G$. A loop is processed which applies $w$ successive squarings in every iteration to compute $A = x^{m^i}$ from $x^{m^{i-1}}$, and which multiplies the result into the accumulator $R[j]$, where $j = d_i$. Let $R[j]^{(i)}$ (resp. $A^{(i)}$) denote the value of the accumulator $R[j]$ (resp. $A$) before entering step $i$. We have:

$$R[j]^{(i+1)} = \begin{cases} R[j]^{(i)} \cdot A^{(i)} & \text{if } j = d_i, \\ R[j]^{(i)} & \text{otherwise,} \end{cases} \qquad A^{(i+1)} = \bigl(A^{(i)}\bigr)^{m}.$$

At the end of the loop, each accumulator $R[j]$ contains the product $\prod_{0 \le i \le \ell-1,\; d_i = j} x^{m^i}$. The different accumulators are finally aggregated as $\prod_{1 \le j \le m-1} R[j]^{j} = x^n$. This algorithm requires more memory than the binary method, but it is faster since the number of multiplications is roughly reduced to $\bigl(1 + \frac{m-1}{m \log_2 m}\bigr) \log_2 n$ (see [18]).

As observed in [21], the multiplications into the accumulators are independent of one another, so they can be performed in any order once the values $S[i] = x^{m^i}$ have been precomputed. This approach certainly costs extra memory to store the precomputed values $S[i]$, for $0 \le i < \ell$. In order to reduce this required memory space, the idea can be repeatedly applied with only $r < \ell$ precomputed values. Tunstall [21] used this interesting observation to describe a random order $m$-ary exponentiation algorithm (Algorithm 1).

Algorithm 1: Random Order Right-to-Left m-Ary Exponentiation Algorithm [21]
Input:

return A
Basically, Algorithm 1 makes use of a precomputed table $S$ to store $r$ values $A^{(i)} = x^{m^i}$, $0 \le i \le \ell$, and a list of the $r$ corresponding digits $d_i$ of the exponent. Firstly, Algorithm 1 precomputes and stores $x^{m^i}$, for $i \in \{0, \ldots, r-1\}$. Then, at each step, the algorithm chooses at random a stored value $S[\tau]$, updates the corresponding accumulator (lines 10-12), computes the next precomputed value $S[\gamma]^m$, and finally writes this value to the register $S[\tau]$ that has just been processed (line 13). Because the operations are performed in such a random order, an attacker cannot guess the value of the digit being processed at a specific point in time for each acquisition in a set of acquisitions. The randomization of Algorithm 1 is performed within one exponentiation, and thus it is able to inhibit power analysis attacks that allow operations to be distinguished from one acquisition, as discussed in Section II-A. Although Algorithm 1 reduces the required memory compared to the original idea (i.e., $r$ instead of $\ell$ group elements), it still requires a large amount of memory to store the precomputed values in order to provide a suitable level of random ordering, and thus to guarantee security against the Big Mac attack and its extensions. In [21], the author stated that one needs $r > m$ to add as much randomness as in the exponent $n$, and that his algorithm is not suitable for implementing exponentiations in $(\mathbb{Z}/N\mathbb{Z})^{*}$ (see the analysis in [21, Section 6] for more details).
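The random-order principle described above can be illustrated by the following minimal sketch. It is written from the prose description only and is not a transcription of Algorithm 1 in [21]: the function and variable names, the use of Python's `secrets.choice` as the random source, the bookkeeping of the running power in `A`, and the final aggregation loop are all assumptions made for illustration. A digit equal to 0 is multiplied into an unused accumulator `R[0]`.

```python
import secrets

def digits_radix_m(n, m):
    """Radix-m digits of n, least significant digit first."""
    ds = []
    while n > 0:
        ds.append(n % m)
        n //= m
    return ds

def random_order_exp(x, n, N, m=4, r=4):
    """Compute x^n mod N, treating the radix-m digits in a random order.

    S holds up to r precomputed powers x^(m^i); D holds their digits d_i.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    d = digits_radix_m(n, m)
    ell = len(d)
    r = min(r, ell)
    R = [1] * m                      # accumulators; R[0] is never aggregated
    S, D = [], []
    A = x % N                        # running power x^(m^i)
    for i in range(r):               # precompute x^(m^i), i = 0..r-1
        S.append(A)
        D.append(d[i])
        if i < r - 1:
            A = pow(A, m, N)
    nxt = r                          # index of the next digit to load
    live = list(range(r))            # table slots that still hold work
    while live:
        tau = secrets.choice(live)   # pick a stored value at random
        R[D[tau]] = R[D[tau]] * S[tau] % N
        if nxt < ell:                # refill the slot with the next power
            A = pow(A, m, N)
            S[tau], D[tau] = A, d[nxt]
            nxt += 1
        else:
            live.remove(tau)
    acc, out = 1, 1                  # out = prod_j R[j]^j
    for j in range(m - 1, 0, -1):
        acc = acc * R[j] % N
        out = out * acc % N
    return out
```

Whatever order is drawn, the result equals `pow(x, n, N)`, since the multiplications into the accumulators commute.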

A. OBSERVATIONS
Our improvements stem from the following observations. In the precomputed table of Algorithm 1, there may exist two values whose corresponding digits $d_i$ are equal. For instance, the elements $S[1]$ and $S[2]$ may store the values $x^{m^6}$ and $x^{m^2}$, whose corresponding digits are $d_6$ and $d_2$, and these two digits may have the same value 3.
As mentioned in Section II-A, the horizontal collision-correlation power analysis attacks assume that a collision-correlation can be detected when the output of an operation is the input to another operation, i.e., these two operations are processing the same value of $d_i$. In the above example, if $S[1]$ and $S[2]$ are consecutively processed, an attacker will be able to detect a collision.
In order to avoid such a potential collision attack, we do not allow any repetition of digit values in the precomputed table, whose entries run up to $S[m-1]$. For any content of the table, the corresponding digits $d_i$ must therefore all be distinct. The size of the precomputed table can thus be fixed to $m$ instead of $r$ as in Algorithm 1.
We can do even better by using the sliding window technique to reduce the number of accumulators required. Since the digits $d_i$ are then only odd numbers, i.e., $d_i \in \{1, 3, \ldots, m-1\}$, our algorithm requires only $m/2$ memory registers for the precomputed table. For example, m = 8:

This section describes our random-order sliding window algorithm, which offers the following features:
1) it requires fewer memory registers than Tunstall's algorithm, that is, (m + 1) instead of (m + r) registers. It is thus more suitable for implementing exponentiations in $(\mathbb{Z}/N\mathbb{Z})^{*}$;
2) it doesn't use a fixed base (i.e., m), but varies the base.
Hence, more potential values of the digits $d_i$ are generated for a fixed exponent $n$ (lines 7-10 in Algorithm 2). This increases the level of randomness compared to Algorithm 1 and thus minimizes the collision-correlation between operations. Likewise, we denote by $R[j]^{(i)}$ (resp. $A^{(i)}$) the value of the accumulator $R[j]$ (resp. $A$) before entering iteration $i$. Like the left-to-right sliding window method, Algorithm 2 treats $w$ binary digits $d = (n_{i+w-1}, \ldots, n_i)_2$ in the case $n_i = 1$. The explicit description of the proposed algorithm is given in Algorithm 2. Instead of decomposing the exponent into $k$ fixed windows of $w$ ($= \log_2 m$) bits, the proposed algorithm treats the exponent bit by bit, from the least significant bit to the most significant bit. The algorithm performs a squaring of $A$ (line 7) if $n_i = 0$; however, this may not be the actual bit value, as $n$ is further processed at line 18.

Algorithm 2: Random Order Sliding Window Exponentiation
Input: $x \in G$, $w = \log_2 m$, and a $k$-bit integer $n = (n_{k-1}, \ldots, n_1, n_0)_2 \in \mathbb{N}$
Output: $x^n$

To inhibit the simple power analysis attack, the proposed algorithm requires squaring and multiplication operations to be performed in the same routine, i.e., using the atomic principle (see [5]).
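To make the window rule concrete, the following minimal sketch shows a plain, deterministic right-to-left sliding window exponentiation with odd digits only. It is not Algorithm 2: it omits the random ordering, the delayed multiplications, the varying base, and the atomic routine, and the function name and the final aggregation by `pow` are assumptions chosen for brevity. It only illustrates why $m/2$ accumulators suffice when a window is opened on a bit equal to 1.

```python
def rtl_sliding_window_exp(x, n, N, w=3):
    """Compute x^n mod N with a right-to-left sliding window of w bits."""
    m = 1 << w
    # One accumulator per odd digit 1, 3, ..., m-1, i.e. m/2 registers.
    R = {d: 1 for d in range(1, m, 2)}
    A = x % N                     # A = x^(2^i) at bit position i
    while n > 0:
        if n & 1 == 0:
            A = A * A % N         # bit 0: square only and move one bit
            n >>= 1
        else:
            d = n & (m - 1)       # window (n_{i+w-1}, ..., n_i), odd digit
            R[d] = R[d] * A % N
            A = pow(A, m, N)      # w successive squarings
            n >>= w
    out = 1
    for d, v in R.items():        # aggregate prod_d R[d]^d
        out = out * pow(v, d, N) % N
    return out
```

For instance, `rtl_sliding_window_exp(3, 871, 1000003)` agrees with `pow(3, 871, 1000003)`.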

D. PERFORMANCE CONSIDERATION
From the memory point of view, the main advantage of Algorithm 2 is that it requires only m + 1 memory registers instead of m + r as in the original algorithm. It is also worth noting that, unlike Tunstall's algorithm, the proposed algorithm doesn't require an array D of r elements (see Algorithm 1) to store the values of the digits. As suggested in [21], r should be greater than m. Thus, the proposed algorithm requires approximately half the number of registers required by Tunstall's algorithm. It is also worth noting that the number of registers required by the proposed algorithm is competitive with that of Yao's m-ary algorithm (Algorithm 4), which is insecure against the Big Mac attack (as analyzed in Section V). Since the size of the window is reduced in some cases (lines 11-12), Algorithm 2 may require slightly more multiplications than the m-ary window method. A summary of the comparison is given in Table 1.

We also define a function lookup(e, D), which looks for the first element in the array D whose value is equal to e. If found, it returns the index j of that element, that is, e = D[j]. If not, it returns ∅. Finally, as in the right-to-left square-and-multiply always algorithm, the accumulator R[1] (resp., R[0]) accumulates and outputs the value $x^n$ (resp., $x^{2^k-n-1}$), where k is the bit-length of the exponent n. The algorithm works as follows. At each step, depending on the randomly chosen value b (line 5), Algorithm 3 will perform a multiplication related to $R[n_i]$ (line 7 or line 18).
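The helper lookup(e, D) defined above admits a direct sketch. Representing ∅ by Python's `None` is an assumption of this illustration.

```python
def lookup(e, D):
    """Return the index j of the first element of D equal to e, else None."""
    for j, value in enumerate(D):
        if value == e:
            return j
    return None  # stands for the empty result
```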

B. EXAMPLE
We demonstrate the correctness of the above algorithm by the following example. Let us compute $x^n$, where $n = 135 = (1000\,0111)_2$.

return R[1]
Assuming b = 0, Algorithm 3 executes line 9, looking for a delayed operation. As there is no such operation, it goes to line 13 to look for an available accumulator to delay the operation (line 14). $\times\, x^{32} = x^{120}$). Finally, Algorithm 3 returns R[1] as its output, that is, $x^{135}$.

C. DISCUSSION
Like the right-to-left square-and-multiply always algorithm, the proposed binary algorithm performs two group operations, one multiplication and one squaring, per bit. From the security viewpoint, at round i, an attacker cannot determine the value of the bit $n_i$. That is because at that round the proposed algorithm performs a multiplication related to $R[n_i]$ or to $R[\neg n_i]$ with probability 1/2 if the bit values are uniformly distributed. The proposed algorithm is thus resistant to the Big Mac attack and its extensions. To the best of our knowledge, it is the first right-to-left binary exponentiation algorithm that resists such attacks while retaining the same performance, i.e., it requires 2 group operations per bit. On the other hand, Algorithm 3 requires 5 registers instead of the 3 required by the right-to-left square-and-multiply always algorithm.

Secure Against Fault Analysis
The outputs R[0] and R[1] of Algorithm 3 can be used to prevent fault attacks, as suggested by Boscher et al. in [3], due to this relation: $x \cdot R[0] \cdot R[1] = x^{2^k}$. As can be seen in the above example, we have $x \times x^{120} \times x^{135} = x^{256}$. Both Algorithm 2 and Algorithm 3 are resistant to combined attacks [2] and safe-error attacks [26] because at the i-th loop, the digit processed may not be $n_i$ (counting from right to left) with high probability, and hence the attacker learns nothing about the value of $n_i$. In addition, Algorithms 2 and 3 are both secure against safe-error attacks because they don't have dummy operations. A security comparison between right-to-left binary algorithms is shown in Table 2.
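The coherence check behind this countermeasure can be sketched as follows. For simplicity, the sketch uses the plain right-to-left square-and-multiply always loop rather than Algorithm 3, which produces the same two outputs $R[1] = x^n$ and $R[0] = x^{2^k-n-1}$; the function names and the exception raised on a mismatch are assumptions of this illustration.

```python
def rtl_sm_always(x, n, k, N):
    """Right-to-left square-and-multiply always over k bits, modulo N.

    Returns (R0, R1, A) with R1 = x^n, R0 = x^(2^k - n - 1), A = x^(2^k).
    """
    R = [1, 1]
    A = x % N
    for i in range(k):
        b = (n >> i) & 1
        R[b] = R[b] * A % N       # one multiplication per bit
        A = A * A % N             # one squaring per bit
    return R[0], R[1], A

def checked_exp(x, n, k, N):
    """Return x^n mod N after verifying x * R[0] * R[1] == x^(2^k)."""
    r0, r1, a = rtl_sm_always(x, n, k, N)
    if (x * r0 % N) * r1 % N != a:
        raise RuntimeError("fault detected: coherence check failed")
    return r1
```

With n = 135 and k = 8, the first output corresponds to the exponent 120 and the check reduces to $x \times x^{120} \times x^{135} = x^{256}$.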

V. SECURITY ANALYSIS
A. SIMPLE POWER ANALYSIS
Using the atomic principle [5], Algorithm 2 requires that the multiplication and squaring operations be implemented with identical code, so that they cannot be distinguished easily. In the case where the attacker can distinguish a multiplication from a squaring by using statistical methods (e.g., [1]), she/he may learn the number of '0' bits in the secret exponent, but it is unclear whether she/he could determine the real value of the current bit, because at each iteration the proposed algorithm performs a squaring regardless of whether the current bit is '0' or '1'.

B. DIFFERENTIAL POWER ANALYSIS
To thwart DPA attacks, classical blinding techniques (e.g., message or exponent blinding) with sufficiently large randomness (e.g., 48 bits) can be applied. However, as discussed in Section II-A, the Big Mac attack may defeat all these blinding techniques.
In the following section, we focus on analyzing the security of the proposed algorithm in the presence of the Big Mac attack and its extensions, that is, statistical side-channel analysis attacks in the horizontal setting. We start with an extension of the Big Mac collision-correlation attack to analyze the security of the right-to-left m-ary exponentiation and the security of the random order right-to-left m-ary exponentiation algorithm. Then, we analyze the security of the proposed algorithm.

C. HORIZONTAL COLLISION-CORRELATION ANALYSIS
1) ON RIGHT-TO-LEFT m-ARY EXPONENTIATION
For implementations of the right-to-left binary algorithms, Hanley et al. [11] presented a horizontal collision-correlation analysis of Joye's add-only exponentiation algorithm [13], and then Feix et al. [9] presented a similar attack on the right-to-left square-and-multiply always algorithm [13]. While the former uses the fact that the register $R_0$ (resp. $R_1$) remains the same when the bit being processed is $n_j = 0$ (resp. $n_j = 1$), the latter uses the fact that if two consecutive bits have the same value, then the output of the multiplication in the previous loop will be the input of the multiplication in the next loop.
In the case of the right-to-left m-ary exponentiation with m > 2, we assume that m is a power of 2, so that raising to the m-th power is a sequence of $\log_2 m$ squarings. For convenience, we also assume that the m-th power can be detected by distinguishing squarings from multiplications. Similar to [9], the attack uses the fact that the adversary can detect a collision-correlation when the output of an operation is the input to another operation (i.e., operations processing the same value of $d_i$). As in the Big Mac attack, the attacker must then partition the multiplications (line 7 in Algorithm 4) into disjoint sets for which the digits $d_i$ have the same value. Once the partitioning has been performed, there are (m−1)! ways of associating specific digit values with the m − 1 sets of multiplications. One of these choices will yield the sought key. Because the value of m is not too big in practice, this attack is computationally feasible.
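The final step of this attack, trying the (m−1)! possible assignments, can be sketched as follows. The sketch simulates the partition instead of extracting it from a trace, and it makes several assumptions that are not taken from the analysis above: positions without a multiplication are taken to reveal a digit 0, the attacker knows a pair (x, y = x^n mod N) with which to test candidates, and all function names are hypothetical.

```python
from itertools import permutations

def simulated_partition(n, m):
    """Hypothetical oracle standing in for the collision analysis.

    Returns one label per digit position (least significant first):
    None where the digit is 0, otherwise a class label shared by all
    positions holding the same digit. The label hides the digit value.
    """
    labels = []
    while n > 0:
        d = n % m
        labels.append(None if d == 0 else m - d)
        n //= m
    return labels

def recover_exponent(labels, m, x, y, N):
    """Try every assignment of distinct nonzero digits to the classes."""
    classes = sorted({c for c in labels if c is not None})
    tried = 0
    for values in permutations(range(1, m), len(classes)):
        tried += 1
        assign = dict(zip(classes, values))
        cand = sum(assign[c] * m ** i
                   for i, c in enumerate(labels) if c is not None)
        if pow(x, cand, N) == y:   # candidate reproduces the known output
            return cand, tried
    return None, tried
```

When every nonzero digit value occurs in the exponent, the loop visits at most (m−1)! candidates, e.g., at most 6 for m = 4.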

2) ON THE RANDOM ORDER m-ARY EXPONENTIATION
We revisit the security of Tunstall's algorithm against the Big Mac collision-correlation analysis attack and show that, under this attack, we do not need to set r > m, since doing so gives no more randomness than r = m. Likewise, we use the fact that the attacker would detect a collision-correlation when the output of a multiplication is the input to another multiplication (i.e., the multiplications involving the accumulators R[j]). If an attacker attempts a collision-correlation analysis attack, it would be assumed that the digit treated at the t-th loop has the same value as the (t + r − 1)-th digit of the exponent, $d_{t+r-1}$ (counting from the right), which has just been included in the set of digits from which the algorithm will randomly choose. Whenever such a digit is chosen, the attacker can detect a collision-correlation because the accumulator $R[d_{t+r-1}]$ will be the first operand of the multiplication in line 13 of Algorithm 1. This is different from the security analysis in [21, Section 6.2], where the partial correlation is detected due to the second operand of the multiplication, that is, there is a correlation when the digit treated is the one that has just been included. Let us assume that the values of the digits have a uniform distribution. Then, whatever the size of r (for r > m), the probability that the chosen digit has the same value as $d_{t+r-1}$ is 1/m. In this attack setting, to balance memory performance and security, the best value of r is m. On the other hand, since the digit treated is randomly chosen at the t-th loop, this digit does not have the same value as $d_{t+r-1}$ with probability $(m-1)/m$. Since this randomization is performed within one execution of the exponentiation, Tunstall's algorithm is secure against the horizontal collision-correlation power attacks.

3) SECURITY OF THE PROPOSED ALGORITHM
The proposed exponentiation algorithm (Algorithm 2) randomly performs multiplications from the set of delayed multiplications. Similar to Tunstall's algorithm, even if a side-channel attacker learns the value of the digit e (line 13, Algorithm 2) being processed at loop t, she/he may not learn the real position of this e. Thus, the proposed algorithm is secure against the Big Mac attack and its extensions, which deduce the secret information from a single power trace.
Unlike Tunstall's algorithm, Algorithm 2 doesn't use a fixed base m, but varies it by using the sliding-window technique. In each iteration, the proposed algorithm tries to find a good digit to execute; it therefore minimizes the possibility that the digit treated has the same value as the digit that has just been included. Moreover, this allows the proposed algorithm to generate multiple unpredictable sequences of digits $\{d_0, d_1, \ldots, d_{\ell-1}\}$ for a fixed exponent n. If the attacker collects different power traces, she/he would deduce different combinations of digits $d_i$.
As analyzed in [21], the random-order exponentiation algorithms can be used as a supplement to, rather than as a replacement for, the blinding countermeasures. By combining the random order algorithm with a blinding countermeasure, exponentiation implementations should be resistant to statistical side-channel analysis in both the vertical and horizontal settings.

VI. CONCLUSION
In this article, we revisited Tunstall's random order m-ary exponentiation algorithm. We considered its security against the Big Mac collision-correlation analysis attack and then presented a memory-efficient variant. The proposed algorithm requires only (m + 1) group elements in memory instead of (m + r). Finally, we presented an efficient implementation in the binary case. To the best of our knowledge, it is the first right-to-left binary exponentiation algorithm that resists side-channel attacks using only a single power consumption trace, such as the combined attacks or the horizontal collision-correlation attacks.

APPENDIX. EXPONENTIATION ALGORITHMS
A right-to-left m-ary version of Algorithm 4 was described by Yao in [25] and hence it is often referred to as Yao's algorithm.
For example, suppose one needs to compute $x^n$, where n = 871. The binary representation of n is $(1101100111)_2$. If one uses Algorithm 4 with m = 4 (i.e., $n = (31213)_4$), the register A
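For reference, Yao's right-to-left m-ary method can be sketched as below for m a power of 2, working modulo N. This is a minimal illustration written from the description in Section II-B, not a transcription of Algorithm 4; the function name and the aggregation loop, which computes $\prod_j R[j]^j$ with running products, are assumptions.

```python
def yao_exp(x, n, N, w=2):
    """Compute x^n mod N with Yao's right-to-left m-ary method, m = 2^w."""
    m = 1 << w
    R = [1] * m                  # accumulators R[1..m-1]; R[0] is unused
    A = x % N                    # A = x^(m^i) at digit position i
    while n > 0:
        d = n & (m - 1)          # current radix-m digit d_i
        if d != 0:
            R[d] = R[d] * A % N  # multiply A into the accumulator R[d_i]
        A = pow(A, m, N)         # w successive squarings
        n >>= w
    acc, out = 1, 1
    for j in range(m - 1, 0, -1):
        acc = acc * R[j] % N     # acc = R[m-1] * ... * R[j]
        out = out * acc % N      # out accumulates prod_j R[j]^j
    return out
```

For n = 871 and m = 4, the digits are processed in the order 3, 1, 2, 1, 3, and `yao_exp(3, 871, 1000003)` agrees with `pow(3, 871, 1000003)`.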