Chosen-Ciphertext Clustering Attack on CRYSTALS-KYBER Using the Side-Channel Leakage of Barrett Reduction

This study proposes a chosen-ciphertext side-channel attack against a lattice-based key encapsulation mechanism (KEM), a third-round candidate of the National Institute of Standards and Technology (NIST) standardization project. Unlike existing attacks that target operations such as the inverse NTT and message encoding/decoding, we target Barrett reduction in the decapsulation phase of CRYSTALS-KYBER to obtain the secret key. We show that a sensitive variable-dependent leakage of Barrett reduction exposes the entire secret key. In experiments conducted on an ARM Cortex-M4 microcontroller, the attack achieves a success rate of 100%. We need only six chosen ciphertexts for KYBER512 and KYBER768 and eight chosen ciphertexts for KYBER1024. We also show that the m4 scheme of the pqm4 library, an implementation with ARM Cortex-M4 specific optimizations (typically in assembly), is vulnerable to the proposed attack. In this scheme, six, nine, and twelve chosen ciphertexts are required for KYBER512, KYBER768, and KYBER1024, respectively.

that are particularly important for companies looking to secure their IoT devices and assets.
A key encapsulation mechanism (KEM), a public-key cryptosystem for generating a shared secret key between two parties, is needed to establish cloud-based peer-to-peer secure transactions. Diffie-Hellman (DH), Rivest-Shamir-Adleman (RSA), and elliptic curve cryptography (ECC) have mainly been used; however, they are insecure against quantum computer attacks [3]. Hence, if large-scale quantum computation is realized, KEMs built on them become vulnerable. Experts estimate that RSA with a 2000-bit public key will no longer guarantee security by 2030 [4]-[6].
To address this issue, the National Institute of Standards and Technology (NIST) is running the post-quantum cryptography (PQC) standardization project [7]. The third-round candidates (seven finalists and eight alternatives) of the NIST PQC project were announced on July 22, 2020 [8]. Of these 15 candidates (seven, excluding alternatives), nine (four, excluding alternatives) are public-key encryption (PKE)/KEM schemes [8]. Lattice-based KEMs have attracted increasing attention because of their balanced performance in size and speed. Among the third-round KEM candidates, five (three, excluding alternatives) schemes are lattice-based KEMs [9]-[13]. They are classified into two types: 1) schemes based on the learning with errors (LWE)/learning with rounding (LWR) problem [9]-[11] and 2) schemes based on the NTRU problem [12], [13]. CRYSTALS-KYBER, SABER, and FrodoKEM belong to the first class, whereas NTRU and NTRU Prime belong to the second.
Even if a cryptographic scheme is secure against mathematical analysis owing to the hardness of the underlying mathematical problem, it is still subject to side-channel attacks (SCAs). SCAs were first introduced by Paul Kocher in 1996 [14], and many cryptographic schemes have since been broken by them. SCAs allow recovering secret information (e.g., a cryptographic key) using physically measured side-channel information. Side-channel information includes consumed power, radiated electromagnetic waves, emitted sound, and execution time while the cryptographic device operates. Therefore, SCAs are considered major threats to the implementations of cryptographic schemes, especially for applications in embedded devices. Recently, the investigation of SCAs for PQC has attracted increasing attention in connection with the NIST PQC project. Given that most of the candidates are implemented to execute in constant time, simple timing attacks that measure only execution time can be prevented. Even if the algorithms have a constant-time implementation, they can be vulnerable to other SCAs, such as power analysis and electromagnetic analysis. Not only are many researchers finding SCA vulnerabilities in PQC implementations, but NIST also noted that implementations addressing SCAs are more meaningful than those that do not [15]. Therefore, various SCAs related to PQC are being studied to verify the side-channel resistance of PQC [16]-[38].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Most IoT devices come with limited resources, i.e., power constraints, strict memory limits, and limited chip area. Currently, NIST officially requires performance evaluations of PQC software implementations on ARM Cortex-M4 microcontrollers, which are available in a wide range of IoT devices. Accordingly, the open-source library pqm4, a testing and benchmarking framework for PQC schemes operating on the ARM Cortex-M4 microcontroller, was initiated by the PQCRYPTO project (ICT-645622) funded by the European Commission in the H2020 program [39]. The pqm4 library is specifically optimized for the ARM Cortex-M4 microcontroller. Therefore, securing IoT devices against SCAs requires evaluating the side-channel vulnerability of the pqm4 library.

A. Related Works
Lattice-based KEMs have been studied for different types of SCA vulnerabilities. In particular, several studies on side-channel-assisted chosen-ciphertext attacks (CCAs), which recover the secret key, have been conducted [30]-[38]. CCAs on various operations, such as error-correcting codes, inverse NTT, message encoding/decoding, and the Fujisaki-Okamoto (FO) transform, have been studied.
D'Anvers et al. [31] reported that the secret key of the Ring-LWE scheme LAC leaks through the variable runtime of error-correcting codes in decryption. They used fewer than $2^{16}$ decryption queries to recover the secret key. The following year, Ravi et al. [32] proposed generic side-channel-assisted CCAs on six lattice-based KEMs. They used binary information about the message, obtained through EM leakage in error-correcting procedures and FO transforms, to perform key recovery. Their attacks could also be applied to implementations that operate in constant time.
More recently, Xu et al. [34] showed that an attacker with complete knowledge of the decrypted message for chosen ciphertexts could perform full key recovery using a small number of decapsulation queries for KYBER512. They targeted the inverse NTT for the clean scheme and the message encoding function for the m4 scheme. Four and eight decapsulation queries were used to recover the secret key for the clean and m4 schemes, respectively. Ravi et al. [35] demonstrated side-channel-assisted message recovery attacks that target the storage of the decrypted message in memory. In more detail, they exploited the fact that the decrypted message is stored one bit at a time. That is, it is possible to restore a message by comparing the Hamming weight of the message stored immediately before. As a result, full message recovery for KYBER512 was possible with a single trace (actually, the success rate rises to 98.24% with five averaged traces), but this method required 128k traces for profiling. Another method they proposed was to recover the message by using targeted flips of message bits and the cyclic message rotation technique. In the presence of a side-channel Hamming weight classifier, this technique required (w + 1) traces to recover the full message, where w is the storage width. They mentioned that implementations with shuffling and masking countermeasures could also be attacked. Unfortunately, their attack on protected implementations with shuffling and masking requires the strong assumption that an attacker can turn off or deactivate the countermeasure to generate templates. They also proposed a key recovery attack based on the recovered message. Six chosen ciphertexts are needed to recover the secret key of KYBER512. However, the specification of CRYSTALS-KYBER was updated and the noise parameter of KYBER512 was increased [40]; thus, it is obvious that more chosen ciphertexts are needed than the numbers stated in [34] and [35].
Ngo et al. [36] proposed the first SCA on a first-order masked SABER. They used the incremental storage leakage presented in [35] and applied deep learning-based power analysis. Extracting the random mask at each execution was unnecessary because the input trace contains the points where both shares, m ⊕ r and r, were computed. Thus, they could improve the success probability by combining the score vectors of the multiple-trace attack. The $[8,4,4]_2$ extended Hamming codes were applied to improve the key-recovery attack, and 16 chosen ciphertexts were used for LightSaber.
Although CCAs on various operations have been studied, no study has been conducted on reductions. The input value of the reduction in decryption is also affected by the secret key; thus, it can lead to attacks that use CCAs to derive the secret key. Xu et al. [34] mentioned that operations after the inverse NTT could be vulnerable; however, they did not perform a detailed analysis. Additionally, the output of the inverse NTT can take various values; thus, there are many restrictions on finding a valid chosen ciphertext. Xu et al. showed that 15 possible binary classifiers and 40 possible ternary classifiers exist. The incremental storage leakage used in [35] relies only on the 1-bit value of the decoded message, requiring averaging as preprocessing to increase the signal-to-noise ratio (SNR). Moreover, template generation is necessary for these attacks. These works motivated us to investigate a new attack target for which constructing chosen ciphertexts is more efficient and the side-channel leakage can be maximized.

B. Main Contributions
In this study, we focus on a lattice-based KEM corresponding to the third-round candidate of the NIST PQC standardization project. Specifically, we present a comprehensive analysis and the corresponding experiment results on CRYSTALS-KYBER by focusing on Barrett reduction in the decapsulation phase, which was not considered a target operation against SCA-based CCAs. The main contributions of this study can be summarized as follows.
We introduce a chosen-ciphertext clustering attack using the side-channel leakage of Barrett reduction in the decapsulation phase. The obtained experimental results show that we can recover the full secret key using six chosen ciphertexts for KYBER512. In the ref, clean, and opt schemes, six and eight chosen ciphertexts are needed for KYBER768 and KYBER1024, respectively. In the m4 scheme, nine and twelve chosen ciphertexts are needed, respectively. Our target intermediate value can have only three values, and 14 496 782 valid chosen ciphertexts exist. Moreover, the maximum difference in leakage would be noise resistant because it is proportional to 13, which is the Hamming distance between the two intermediate values. Therefore, averaging is not required to increase SNR, and template building is also unnecessary.

C. Organization
The remainder of this article is organized as follows.
In Section II, we briefly explain the specification of CRYSTALS-KYBER. We explain the proposed chosen-ciphertext clustering attack methodology in Section III, and we show experimental results in Section IV. In Section V, we recommend countermeasures. Finally, we summarize the conclusions in Section VI.

II. PRELIMINARIES
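A. Notation
1) Let n and q be positive integers.
2) Let R be a base ring defined as $\mathbb{Z}[x]/(x^n + 1)$. R can be represented as $R = \{a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} : a_i \in \mathbb{Z}\}$.
3) Let $R_q := R/qR$. The quotient ring $R_q$ can be represented as $R_q = \{a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} : a_i \in \mathbb{Z}_q\}$.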

B. CRYSTALS-KYBER
CRYSTALS-KYBER [40] is a lattice-based KEM using a PKE scheme similar to the LPR encryption scheme suggested by Lyubashevsky et al. [41]. It is based on a polynomial ring $R_q = \mathbb{Z}_q[x]/(x^n + 1)$ of dimension n = 256 and modulus q = 3329. The parameters k, p, and t differ according to the security level. Three parameter sets, namely, KYBER512, KYBER768, and KYBER1024, aim to support NIST security levels 1, 3, and 5, respectively.
Hash1 and Hash2 are SHA3-256 and SHA3-512, respectively. KDF is implemented using SHAKE-256. $\mathrm{Compress}_{q,\log p}(x)$ and $\mathrm{Compress}_{q,\log t}(x)$ take an element $x \in \mathbb{Z}_q$ and output log p-bit and log t-bit integers, respectively. $\mathrm{Decompress}_{q,\log p}(x)$ and $\mathrm{Decompress}_{q,\log t}(x)$ take log p-bit and log t-bit integers, respectively, and output $y \in \mathbb{Z}_q$.
encode is message encoding that converts -bit message to a polynomial. decode is message decoding that is the inverse of encode. Algorithm 1 illustrates message decapsulation of CRYSTALS-KYBER. To construct the IND-CCA2-secure KEM, a slightly tweaked FO transform is applied on a CPA-secure PKE.

III. PROPOSED CHOSEN-CIPHERTEXT CLUSTERING ATTACK ON CRYSTALS-KYBER
In this section, we propose a chosen-ciphertext clustering attack on CRYSTALS-KYBER using a sensitive variable-dependent leakage of Barrett reduction.

A. Sensitive Variable-Dependent Leakage of Barrett Reduction
We target step 5 of Algorithm 1. We focus on the $v - s^{T}u \bmod q$ operation, which calculates the input of decode.
We downloaded the reference implementation submitted to NIST [42]. Listings 1-3 illustrate decryption, reduction, and Barrett reduction in the reference implementation of CRYSTALS-KYBER, respectively. In Listing 1, skpv, bp, and v are s, u, and v described in Algorithm 1, respectively. At steps 12 and 14 of Listing 1, $s^{T}u$ is calculated in the NTT domain, and Montgomery reduction is applied to the output. Hence, all coefficients $mp_i$ of the output polynomial mp of poly_invntt_tomont() lie in a bounded range. Here, mp is $s^{T}u$, and it is the input of poly_sub(). All coefficients $v_i$ of the polynomial v are likewise bounded. Accordingly, all coefficients $mp_i$ of the output polynomial mp of poly_sub() at step 16 of Listing 1 are also bounded. Here, mp is $v - s^{T}u$, and it is the input of poly_reduce().
As shown in steps 6 and 7 of Listing 2, Barrett reduction is applied to all coefficients of the input polynomial mp. The intermediate value t at steps 9 and 10 of Listing 3 takes one of three values, depending on the coefficient of $v - s^{T}u$. Given that s is the secret key, i.e., a sensitive variable, the intermediate value t can leak sensitive variable-dependent information.
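The three-valued behavior of t can be illustrated with a small Python model of a 16-bit Barrett reduction. This is only a sketch under stated assumptions, not the reference code: the constant V, the shift by 26 bits, the input range −q < a < 2q, and the helper names to_int16, barrett_t, and barrett_reduce are all assumptions made for illustration.

```python
Q = 3329
# Assumed Barrett constant: round(2**26 / Q) = 20159.
V = ((1 << 26) + Q // 2) // Q


def to_int16(x):
    """Wrap an integer to the signed 16-bit range, as an int16_t would."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def barrett_t(a):
    """Intermediate value t = floor(V * a / 2**26) * Q for a 16-bit input a."""
    quotient = (V * a) >> 26  # Python's >> floors, like an arithmetic shift
    return to_int16(quotient * Q)


def barrett_reduce(a):
    """Reduced coefficient a - t."""
    return to_int16(a - barrett_t(a))


# Collect every value of t seen for inputs in the assumed range -Q < a < 2Q.
t_values = sorted({barrett_t(a) for a in range(-Q + 1, 2 * Q)})
print(t_values)
```

In this model, negative inputs yield t = −3329 and inputs in [0, q) yield t = 0, which is the distinction the attack relies on.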

B. Designing Threat Model
Our threat model is a CCA using a sensitive variable-dependent side-channel leakage. Thus, we construct chosen ciphertexts to magnify the difference in the sensitive variable-dependent leakage of t depending on the coefficient value of s.

1) Constructing Chosen Ciphertexts:
We establish criteria for constructing chosen ciphertexts as follows.
1) Because the Hamming weight difference between 0 and −3329 is 13, which is the largest, we configure ciphertexts so that t is 0 or −3329.
2) t is configured so that only one coefficient value of the secret key is affected.
Here, $s_j$ and $u_j$ are polynomials in the ring $R_q$ for 0 ≤ j ≤ k − 1. We denote $s_{j,i}$ and $u_{j,i}$ as the ith coefficients of polynomials $s_j$ and $u_j$, respectively, for 0 ≤ i ≤ n − 1. To make the intermediate value t affected by only one coefficient value of $s_0$, we set all coefficients of u, except $u_{0,0}$, to zero. Thus, $u_0$ is a constant, and $u_j$ for 1 ≤ j ≤ k − 1 is zero. We also set v to zero to remove its effect. (We can instead set all coefficients $v_i$ to the same value; in this case, the value of the chosen ciphertext changes slightly.) Accordingly, all coefficients of the input polynomial mp of poly_reduce are determined as $mp_i = -s_{0,i} u_{0,0}$ for 0 ≤ i ≤ n − 1.
For KYBER512, $s = (s_0, s_1)$ and $u = (u_0, u_1)$. Thus, we set $(u_0, u_1) = (x, 0)$ and v = 0, where $x \in \mathbb{Z}_q$. We choose three $u_0$ values, as shown in Table I, and make sequences based on the value of t: a sequence element is set to 0 when t is zero and to 1 otherwise. As shown in Table I, the sequence differs according to the sensitive variable $s_{0,i}$. Therefore, if the sequence is obtained using the side-channel leakage, the sensitive variable $s_{0,i}$ is revealed. Accordingly, we can recover $s_0$, half of the secret key, by performing coefficientwise analysis. Similarly, we can recover $s_1$ by using chosen ciphertexts u = (0, 208), u = (0, 1109), and u = (0, 2217) (v is always zero). As a result, we can acquire the secret key s using six chosen ciphertexts. Here, we define the three chosen ciphertexts (3-CC) used for clustering the coefficients of $s_j$ as follows. Let $(cv_0, \ldots, cv_{2\eta_1})$ be $(0, 1, -1, 2, -2, \ldots, \eta_1, -\eta_1)$, i.e., $cv_0 = 0$, $cv_{2\alpha-1} = \alpha$, and $cv_{2\alpha} = -\alpha$ for $1 \le \alpha \le \eta_1$. Tables I and II are examples used to find $s_0$.
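The construction for KYBER512 can be sketched in Python as follows. The sketch makes a simplifying assumption that is not taken from the reference code: the coefficient entering Barrett reduction is modelled as the centered representative of $-s_{0,i} u_{0,0} \bmod q$ in (−q/2, q/2], so that t is nonzero exactly when that representative is negative. The helper names centered and sequence are hypothetical, and the resulting sequences are those of this model only; they need not coincide entry by entry with Table I.

```python
Q = 3329
ETA1 = 3                      # secret coefficients lie in {-3, ..., 3}
CHOSEN_U = (208, 1109, 2217)  # the three u_{0,0} values used in the text


def centered(x):
    """Representative of x mod Q in the interval (-Q/2, Q/2]."""
    r = x % Q
    return r - Q if r > Q // 2 else r


def sequence(s, us=CHOSEN_U):
    """One bit per chosen ciphertext: 1 if the modelled input -s*u is negative
    (t nonzero in this model), 0 otherwise (t = 0)."""
    return tuple(1 if centered(-s * u) < 0 else 0 for u in us)


table = {s: sequence(s) for s in range(-ETA1, ETA1 + 1)}
for s, seq in table.items():
    print(f"s = {s:2d} -> sequence {seq}")
```

Under this model the seven sequences are pairwise distinct, which is the property that lets three chosen ciphertexts identify each coefficient of $s_0$.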
For KYBER768 and KYBER1024, s j,i ∈ {−2, −1, 0, 1, 2} because the parameter η 1 is 2 at both levels. Therefore, similar chosen ciphertexts can be used as before. Since k = 3 and k = 4 for each level, 3 × 3 = 9 and 4 × 3 = 12 chosen ciphertexts are required, respectively. However, if we additionally use the leakage that occurs at steps 8 and 10 of Listing 3, we can reduce the number of chosen ciphertexts. If s j,i = 0, then the input coefficient of Barrett reduction is always zero; otherwise, it is nonzero. Thus, a leakage difference depending on the operand value at steps 8 and 10 of Listing 3 can be used to distinguish zero from the others. Accordingly, we can distinguish s 0,i values using u = (208, 0, 0) and u = (1109, 0, 0) for KYBER768. As a result, it only needs 3 × 2 = 6 and 4 × 2 = 8 chosen ciphertexts for KYBER768 and KYBER1024, respectively.
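The reduction from three to two chosen ciphertexts per polynomial can be sketched in Python as well. As a simplifying assumption (not taken from the reference code), the Barrett input is modelled as the centered representative of $-s_{j,i} u \bmod q$, and each observation is one of three hypothetical labels: "Z" for a zero input, "N" for a negative input (t nonzero), and "P" for a positive input (t = 0). The names centered, label, and decode are illustrative only.

```python
Q = 3329
ETA1 = 2                # KYBER768 / KYBER1024: coefficients in {-2, ..., 2}
CHOSEN_U = (208, 1109)  # two chosen values instead of three


def centered(x):
    """Representative of x mod Q in the interval (-Q/2, Q/2]."""
    r = x % Q
    return r - Q if r > Q // 2 else r


def label(s, u):
    """Three-way observation for one coefficient and one chosen ciphertext."""
    a = centered(-s * u)
    if a == 0:
        return "Z"
    return "N" if a < 0 else "P"


# Map each observed label pair back to the secret coefficient.
decode = {tuple(label(s, u) for u in CHOSEN_U): s
          for s in range(-ETA1, ETA1 + 1)}
print(decode)
```

Because the five label pairs are distinct in this model, two chosen ciphertexts per polynomial suffice when the zero case can be separated from the others.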
2) Proposed Threat Model: An attacker can find the secret key s by obtaining power consumption traces for chosen ciphertexts while the message decapsulation of CRYSTALS-KYBER runs on a target device that follows the Hamming weight power consumption model.

C. Attack Methodology
We target the reference code submitted to the NIST website by the developers. All reference code is implemented in C; thus, we apply the Hamming weight power consumption model, commonly assumed for software implementations. Based on the previous analysis, we can characterize the power consumption of steps 9 and 10 of Listing 3 as follows.
Property 1: The power consumed in a software implementation is proportional to the Hamming weight of an intermediate value. Therefore, when the intermediate value t is 0x0000, the consumed power is proportional to 0, whereas when t is equal to −3329 = 0xf2ff, the consumed power is proportional to 13. Here, 13 is the Hamming weight of t when t is a 16-bit integer.
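The Hamming weights in Property 1 can be checked with a few lines of Python. The helper name hamming_weight16 is hypothetical; the value 3329 is included only for comparison with the two values named in Property 1.

```python
def hamming_weight16(x):
    """Number of set bits in the 16-bit two's-complement encoding of x."""
    return bin(x & 0xFFFF).count("1")


for t in (0, -3329, 3329):
    print(f"t = {t:6d} -> 0x{t & 0xFFFF:04x}, HW = {hamming_weight16(t)}")
```

The gap of 13 bits between t = 0 and t = −3329 is what makes the two cases easy to separate under the Hamming weight model.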
Algorithm 2 shows an attack algorithm based on the leakage that occurs at steps 9 and 10 of Listing 3. The performance of the analysis differs significantly depending on the position of the attack. Therefore, specific Points of Interest (PoIs) must be found. Based on profiling, we can select the PoIs where significant variances are observed depending on secret coefficient values when using specifically chosen ciphertexts. We can identify the PoIs by calculating the sum of squared pairwise t-differences (SOST) [43] of the traces and then identifying the location of the information-leaking point. The SOST of two groups, $G_1$ and $G_2$, is calculated as follows:

$$\mathrm{SOST} = \sum_{i,j=1}^{g} \left( \frac{E(G_i) - E(G_j)}{\sqrt{\dfrac{\sigma(G_i)^2}{\#G_i} + \dfrac{\sigma(G_j)^2}{\#G_j}}} \right)^{2} \quad \text{for } i \ge j.$$

E(·), σ(·), #, and g denote the mean, standard deviation, number of elements, and number of groups, respectively. Here, g is 2.
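A PoI search based on SOST can be sketched in Python as follows. For each time sample, the squared difference of the group means is divided by the sum of the per-group variances over the group sizes, and the sample with the largest value is taken as the PoI. The function names sost and find_poi are hypothetical, the use of the sample variance is an assumption, and the tiny traces below are synthetic values made up for illustration, not measurements.

```python
from statistics import mean, variance


def sost(groups):
    """SOST at one time sample; groups is a list of lists of sample values."""
    total = 0.0
    for i in range(len(groups)):
        for j in range(i):
            gi, gj = groups[i], groups[j]
            denom = variance(gi) / len(gi) + variance(gj) / len(gj)
            total += (mean(gi) - mean(gj)) ** 2 / denom
    return total


def find_poi(trace_groups):
    """trace_groups[g][n][p]: sample p of trace n in group g."""
    n_points = len(trace_groups[0][0])
    curve = [sost([[tr[p] for tr in grp] for grp in trace_groups])
             for p in range(n_points)]
    return max(range(n_points), key=curve.__getitem__), curve


# Synthetic three-sample traces: only sample 1 depends on the group.
g1 = [[0.1, 0.9, 0.2], [0.0, 1.0, 0.1], [0.2, 1.1, 0.2]]
g2 = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.0, 0.3, 0.3]]
poi, sost_curve = find_poi([g1, g2])
print(poi, sost_curve)
```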
For each $s_{j,i}$, we take the points where the t value is computed, stored, and loaded. We take these points $p_{c,i}$, which consume power proportional to the Hamming weight of the t value, as the PoIs and sort them into two groups using a clustering algorithm. Here, various clustering algorithms can be applied, such as k-means, fuzzy k-means, and expectation-maximization (EM) [44]-[47].
By using one of these clustering algorithms, $p_{c,i}$ can be sorted into two groups, $G_1$ and $G_2$, each representing one cluster. Because power consumption depends on the Hamming weight of intermediate values, the mean values of $G_1$ and $G_2$ differ. Therefore, supposing that the larger the Hamming weight, the less power is consumed, we can identify the corresponding t value for each group according to the mean values of the two groups. This supposition depends on the structure of the measuring equipment; in this study, the supposition is established according to the structure of the ChipWhisperer-Lite main board used to obtain the power consumption of the target board [27].

Algorithm 2 Chosen-Ciphertext Clustering Attack on Barrett Reduction in CRYSTALS-KYBER
Require: Trace sets $T = (T_0, \cdots, T_{k-1})$
Require: Secret sequences $Seq = (seq_0, \cdots, seq_{2\eta_1})$
Require: Secret coefficient values $cv = (cv_0, \cdots, cv_{2\eta_1})$
Ensure: Secret key $s = (s_0, \cdots, s_{k-1})$
1: /*as many as rank*/
2: for j = 0 up to k − 1 do
...
end for
30: end for
31: Return $(s_0, \cdots, s_{k-1})$

Thus, for instance, when $E(G_1)$ is larger than $E(G_2)$, t has a value of 0 for the elements of $G_1$ and a value of −3329 for those of $G_2$. $E(G_1)$ and $E(G_2)$ are the mean values of $G_1$ and $G_2$, respectively. In Algorithm 2, $ss_{c,i}$ is the value used to create sequences: it is set to 0 when t is zero and to 1 otherwise. After repeating this for each chosen ciphertext, we acquire a sequence $(ss_{0,i}, \cdots, ss_{cc-1,i})$ for each coefficient $s_{j,i}$. Hence, $s_j$, a part of the secret key, can be found. As a result, by repeating this as many times as the rank, we can acquire the secret key s.
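The clustering and decoding steps described above can be sketched in Python. This is not Algorithm 2 itself: a plain one-dimensional 2-means stands in for the EM algorithm, the function names two_means and recover_coefficients are hypothetical, and both the PoI samples and the sequence table are small synthetic examples rather than measured data or the entries of Table I. The sketch follows the supposition stated in the text that the cluster with the larger mean corresponds to t = 0.

```python
def two_means(values, iters=50):
    """1-D 2-means; returns 1 for the higher-mean cluster, 0 for the lower."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g_lo = [v for v, l in zip(values, labels) if l == 0]
        g_hi = [v for v, l in zip(values, labels) if l == 1]
        if not g_lo or not g_hi:
            break  # degenerate case: all samples fall into one cluster
        new_lo, new_hi = sum(g_lo) / len(g_lo), sum(g_hi) / len(g_hi)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels


def recover_coefficients(poi_samples, seq_to_coeff):
    """poi_samples[c][i]: PoI sample for chosen ciphertext c, coefficient i."""
    # Higher-mean cluster -> t = 0 -> sequence bit 0; otherwise bit 1.
    bits = [[1 - l for l in two_means(row)] for row in poi_samples]
    n = len(poi_samples[0])
    return [seq_to_coeff[tuple(b[i] for b in bits)] for i in range(n)]


# Illustrative sequence table and synthetic PoI samples for four coefficients.
seq_to_coeff = {(0, 0, 0): 0, (1, 1, 0): 1, (0, 0, 1): -1}
poi_samples = [
    [1.02, 0.21, 0.98, 0.18],  # chosen ciphertext 0
    [0.99, 0.20, 1.01, 0.22],  # chosen ciphertext 1
    [1.00, 0.97, 0.19, 1.03],  # chosen ciphertext 2
]
recovered = recover_coefficients(poi_samples, seq_to_coeff)
print(recovered)
```

On real traces, the per-coefficient subtraces would first be normalized and the PoI selected, and a different clustering method could be substituted for two_means without changing the decoding step.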
Remark: The pqm4 library includes four schemes, namely, ref, clean, opt, and m4 [39]. The schemes ref, clean, and opt are implemented in plain C; Listings 1-3 are all identical in ref, clean, and opt. An implementation optimized for Cortex-M4 is the m4 scheme; it is typically implemented in assembly language as described in the Appendix.

IV. EXPERIMENT RESULTS
In this section, we present experimental results in which the secret key s was recovered using six chosen ciphertexts for KYBER512, showing that the proposed attack applies in practice as well as in theory. Side-channel vulnerability depends on how algorithms are implemented. Therefore, we used the reference code submitted to the NIST website by the developers. All reference code is implemented in C; thus, we used the Hamming weight power consumption model, commonly assumed for software implementations. The experiments were conducted on the ARM Cortex-M4, in line with NIST's request. We used the gcc-arm-none-eabi compiler with options -O3 and -Os, which optimize for speed (high) and size, respectively.
We drew lines th1 and th2 in Figs. 4 and 5, and we marked them as 0 if the value of the y-axis in the highlighted area is bigger than th2; otherwise, we marked them as 1. Sequences denoted in Figs. 4 and 5 are the same as the sequence (0, 0, 0, 0, 1, 1, 1) obtained in accordance with the t value. Therefore, we can see that the information of the t value is leaking, and the differences in power consumption are big enough to be exploited.
To identify the PoIs, we computed the SOST values of measured power consumption traces, as shown in Figs. 6 and 9. Figs. 7(b) and 10(b) show the distributions at 195 and 387 points, respectively. The differences between E(G 1 ) and E(G 2 ) are large enough to be visually distinct, and no error rate is observed.
Figs. 8 and 11 show the power consumption traces measured using 3-CC. The magnification of the positions for each coefficient is shown in Figs. 15 and 16. We marked sequences according to the value of the y-axis; thus, they are the same as the sequences in Table I. We split the power consumption traces in Figs. 8 and 11 into subtraces for each coefficient and applied min-max normalization. As a result, the secret key can be extracted with a 100% success rate using Algorithm 2 based on the EM algorithm.
As shown in Figs. 4 and 5, whether t = 0 or not can be distinguished by identifying whether power consumption trace is higher than th1 or not. Moreover, Figs. 7 and 10 show that clustering into three groups is possible; thus, distinguishing whether t = 0 or not is also possible. This reduces the number of chosen ciphertexts from three to two to recover s j of KYBER768 and KYBER1024. Accordingly, the number of chosen ciphertexts required to recover s of KYBER768 and KYBER1024 is six and eight, respectively. We split three power consumption traces into subtraces for each coefficient and applied min-max normalization. We then slightly modified steps 10-21 of Algorithm 2 to cluster into three groups. As a result, the secret key can be extracted with a 100% success rate using the EM algorithm.
Experimental Results on the m4 Scheme: We also show that the m4 implementation with Cortex-M4 specific optimizations (typically in assembly) is vulnerable to the proposed attack. Since Barrett reduction is implemented in assembly language as shown in Listings 7 and 8, we only report the experiment results for compiler option -O3.
In contrast to the ref scheme, the m4 scheme performs Barrett reduction on two coefficients simultaneously. The intermediate values tmp and tmp2 in Listing 8 are for the two coefficients $s_{0,i}$ and $s_{0,i+1}$, respectively. Fig. 12 shows the power consumption traces when Listing 8 is in operation. Power consumption is affected by the sequence of t values for the two coefficients. For example, if $s_{0,i} = -1$ and $s_{0,i+1} = 3$ when u = (208, 0) and v = 0, then the sequence of t values is 01. Accordingly, the traces can be classified into four groups according to the power consumption pattern of the four clock cycles in which steps 7-10 of Listing 8 are performed. In particular, Fig. 13(b) shows that clustering into four groups is possible with no errors. In Fig. 13(a), the distributions of 01 and 10 overlap; thus, they would be classified into the same group. Accordingly, if we use point 31, clustering into three groups is possible.
Figs. 14 and 17 show the power consumption traces measured using 3-CC. We marked sequences according to the patterns of the four clock cycles in which steps 7-10 of Listing 8 are performed. When rearranged into a sequence for each coefficient, they are the same as the sequences in Table I. In the m4 scheme, distinguishing the cases $s_{0,i} = 0$ or $s_{0,i+1} = 0$ from the other cases is difficult because two coefficients are computed simultaneously. Therefore, for KYBER768 and KYBER1024, 3 × 3 = 9 and 4 × 3 = 12 chosen ciphertexts are needed, respectively. We split the three power consumption traces into subtraces for two coefficients each and applied z-score normalization. As a result, the secret key can be extracted with a 100% success rate using the EM algorithm.
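The four possible patterns for a coefficient pair can be enumerated with a short Python sketch. As a simplifying assumption (not taken from the m4 assembly), each Barrett input is modelled as the centered representative of $-s \cdot u \bmod q$, and a bit is 1 when that representative is negative. The helper names centered, t_bit, and pair_pattern are hypothetical.

```python
Q = 3329


def centered(x):
    """Representative of x mod Q in the interval (-Q/2, Q/2]."""
    r = x % Q
    return r - Q if r > Q // 2 else r


def t_bit(s, u):
    """1 if the modelled Barrett input -s*u is negative (t nonzero), else 0."""
    return 1 if centered(-s * u) < 0 else 0


def pair_pattern(s_first, s_second, u):
    """Two-bit pattern for two coefficients reduced at the same time."""
    return f"{t_bit(s_first, u)}{t_bit(s_second, u)}"


# The example from the text: coefficients -1 and 3 with u = 208.
print(pair_pattern(-1, 3, 208))

# All patterns that occur for KYBER512 coefficients in {-3, ..., 3}.
patterns = sorted({pair_pattern(a, b, 208)
                   for a in range(-3, 4) for b in range(-3, 4)})
print(patterns)
```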
Remark: Because [34] and [35] did not experiment on the updated specification, accurate comparisons are not possible. However, since the noise parameter was increased [40], it is obvious that more chosen ciphertexts are needed than the numbers stated in [34] and [35]. Accordingly, our proposed method is much more efficient for the m4 scheme, as shown in Table III.

V. COUNTERMEASURES
Since the proposed attack constructs and exploits an intermediate value t that is affected by only one coefficient of $s_j$, masking [22], [50] can protect against it. However, masking requires substantial time and memory resources, so its high performance overhead makes it unsuitable for resource-constrained IoT devices. Thus, using shuffling and hardware noise addition to increase the attack complexity may be a better option. Similar to [29], shuffling can be applied by generating a shuffling index array, as shown in Listing 4. To attack the shuffled implementation, k × 3 × 256 chosen ciphertexts are required because the target coefficient $v_i$ must be changed; that is, only one coefficient can be recovered at a time. Thus, if the key reuse period is properly adjusted, the attack can be fully mitigated. Using another reduction method, such as Montgomery reduction, can also be a countermeasure.

Listing 6. Reduction of CRYSTALS-KYBER (m4 scheme [42]).
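The idea of shuffling the order in which coefficients are reduced can be sketched in Python. This is not Listing 4: the function names shuffled_indices and reduce_shuffled are hypothetical, a Fisher-Yates shuffle driven by Python's secrets module is assumed as the source of the index array, and a plain modulo operation stands in for Barrett reduction.

```python
import secrets

Q = 3329


def shuffled_indices(n):
    """Fisher-Yates shuffle of 0..n-1 using a cryptographic random source."""
    idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return idx


def reduce_shuffled(coeffs):
    """Reduce every coefficient, visiting them in a fresh random order."""
    out = [0] * len(coeffs)
    for i in shuffled_indices(len(coeffs)):
        out[i] = coeffs[i] % Q  # stand-in for Barrett reduction
    return out


print(reduce_shuffled([-1, 3329, 5, -208]))
```

The result is independent of the visiting order, but an attacker can no longer tell from the time position alone which coefficient a given leakage sample belongs to.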

VI. CONCLUSION
In this study, we proposed a chosen-ciphertext clustering attack on CRYSTALS-KYBER using a sensitive variable-dependent leakage of Barrett reduction. We took advantage of the fact that the intermediate value of the operation takes one of three values and that the difference in the Hamming weight of the intermediate value is larger than 4. To magnify the difference in the sensitive variable-dependent leakage, we used chosen ciphertexts. As a result, we could acquire the full secret key using only six chosen ciphertexts for KYBER512. Depending on the implementation scheme, recovering the secret key of KYBER768 requires six or nine chosen ciphertexts. For KYBER1024, eight or twelve chosen ciphertexts are required, depending on the implementation scheme.
The vulnerability arises from implementation methods that prevent timing leakage. Barrett reduction, as used in CRYSTALS-KYBER, is secure against timing attacks; however, it does not guarantee security against power analysis. In particular, the method applied to secure the implementation against timing attacks led to a greater amount of side-channel leakage. Therefore, research should be conducted on how to avoid such leakage.

Listing 7. Barrett reduction of CRYSTALS-KYBER (m4 scheme [42]).

APPENDIX
pqm4: TESTING AND BENCHMARKING NIST PQC ON ARM CORTEX-M4 [41]
Listings 5-8 are the code of the m4 scheme submitted to NIST [42]. Barrett reduction is implemented in assembly language and operates on two coefficients simultaneously. This is because the Cortex-M4 implements the ARMv7E-M architecture and offers single instruction multiple data (SIMD) instructions.

Listing 8. Barrett reduction of CRYSTALS-KYBER (m4 scheme [42]).