Modified pqsigRM: RM Code-Based Signature Scheme

We present a novel code-based signature scheme called modified pqsigRM. This scheme is based on a modified Reed–Muller (RM) code, which reduces the signing complexity and key size compared with existing code-based signature schemes. It strengthens pqsigRM, which was submitted to NIST for post-quantum cryptography standardization. The proposed scheme retains the advantages of the pqsigRM decoder and uses public codes that are more difficult to distinguish from random codes. We use $(U,U+V)$-codes with a high-dimensional hull to overcome the disadvantages of code-based schemes. For any given syndrome, the proposed decoder efficiently samples a coset element with small Hamming weight. Using a modified RM code, the proposed signature scheme resists various known attacks on RM-code-based cryptography. For 128 bits of classical security, the signature size is 4096 bits, and the public key size is less than 1 MB.


I. INTRODUCTION
Recently, code-based cryptographic algorithms have been extensively studied in post-quantum cryptography (PQC). Code-based cryptography is based on the syndrome decoding problem and its variants. The syndrome decoding problem is to find a vector $e$ satisfying $He^T = s^T$ and $\mathrm{wt}(e) \le w$, where $H$ is a parity check matrix of a random $(n, k)$ code, $s$ is a random syndrome vector, $w$ is a small value, and $\mathrm{wt}(e)$ denotes the Hamming weight of a vector $e$. Berlekamp and McEliece first proved the hardness of the syndrome decoding problem [19] and McEliece proposed a cryptosystem based on Goppa codes [22].
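To make the problem statement concrete, the following minimal sketch searches exhaustively for a low-weight vector with a prescribed syndrome. The function names and the use of the (7, 4) Hamming code are illustrative assumptions, not part of the scheme; exhaustive search is feasible only at toy sizes.

```python
from itertools import combinations

def syndrome(H, e):
    """Return H e^T over GF(2); H is a list of rows, e a list of bits."""
    return [sum(h * x for h, x in zip(row, e)) % 2 for row in H]

def solve_syndrome_bruteforce(H, s, w):
    """Search for e with H e^T = s and wt(e) <= w, by increasing weight."""
    n = len(H[0])
    for weight in range(w + 1):
        for support in combinations(range(n), weight):
            e = [1 if i in support else 0 for i in range(n)]
            if syndrome(H, e) == s:
                return e
    return None  # no solution of weight <= w exists

# Toy instance (assumption): parity check matrix of the (7, 4) Hamming code,
# whose j-th column is the binary expansion of j + 1.
H_hamming = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
e_found = solve_syndrome_bruteforce(H_hamming, [1, 0, 1], 1)
```

The cost of this search grows with the number of candidate supports, which is why practical attacks rely on information set decoding rather than enumeration.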
Courtois, Finiasz, and Sendrier proposed the CFS signature scheme [2], which is a code-based signature scheme using a full-domain hash (FDH) approach. In this scheme, $t!$ hashes and decodings are required on average to sign a message when an $(n, k)$ Goppa code with error correction capability $t$ is used. It is proposed to use high-rate Goppa codes, which have relatively small error correction capability $t = \frac{n-k}{\log n}$, to reduce the signing time. Even so, the scheme has a large signing complexity and certain drawbacks in terms of parameter scaling. Moreover, it has been shown in [4] that high-rate Goppa codes can be distinguished from random codes. This invalidates the assumption underlying the security proof of existential unforgeability under a chosen message attack (EUF-CMA) in [17], which is based on the indistinguishability of Goppa codes. Although Morozov et al. claimed to have proved the strong EUF-CMA security of the CFS signature scheme without the indistinguishability of Goppa codes [18], the large key size and expensive signing remain as drawbacks.
The associate editor coordinating the review of this manuscript and approving it for publication was Sedat Akleylek.
There are several variants of the CFS signature scheme, such as signature schemes using LDGM codes [7] and a blockwise-triangular secret key [9]. To find a signature with small Hamming weight, the scheme in [7] uses a sparse coset element added to a codeword with small Hamming weight. Even though this is efficient and has a small key size, an attack algorithm was presented in [6]. An attack algorithm for the signature scheme using a blockwise-triangular secret key was also proposed [8].
The Kabatianskii-Krouk-Smeets (KKS) signature scheme [30] and its variants [31], [32] take a different approach from the CFS signature scheme. However, owing to the attack proposed in [36], these are considered (at best) to be one-time signature schemes. Moreover, from the attacks in [37], it is known that the parameters in the KKS scheme and its variants should be carefully chosen.
SURF is a variant of the CFS signature scheme using $(U, U+V)$-codes [29]. SURF uses $(n, k_U + k_V)$ binary codes defined by $\{(u|u+v) \mid u \in U, v \in V\}$, where $U$ and $V$ are $(n/2, k_U)$ and $(n/2, k_V)$ random binary codes, respectively. A variant of the Prange decoder is applied to SURF to find an error vector with a small Hamming weight. The security of SURF is based on the decoding-one-out-of-many (DOOM) problem, in which a solution for the syndrome decoding problem is sought in the presence of several syndromes. Unfortunately, it has been demonstrated that the hull of a $(U, U+V)$-code is highly likely to be a two-repetition code when $U$ and $V$ are random binary codes [29]; hence, the hull of the public key can be used for key attacks on SURF. In the recently proposed signature scheme Wave [35], generalized ternary $(U, U+V)$-codes are used instead of binary codes as they efficiently resist the hull attack in [29]. Moreover, finding errors with large Hamming weight for the given syndrome allows small parameters. A tighter security reduction using rejection sampling and preimage samplable functions [34] was proved in [35].
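As an illustration of the $(U, U+V)$ construction described above, the sketch below assembles the generator matrix of $\{(u|u+v)\}$ from generator matrices of $U$ and $V$. The helper name `uv_generator` and the toy component codes are assumptions chosen only for the example.

```python
def uv_generator(G_U, G_V):
    """Generator matrix [[G_U, G_U], [0, G_V]] of the code {(u|u+v)}."""
    half = len(G_U[0])
    if any(len(row) != half for row in G_U + G_V):
        raise ValueError("U and V must have the same length n/2")
    top = [row + row for row in G_U]                # rows (u | u)
    bottom = [[0] * half + row for row in G_V]      # rows (0 | v)
    return top + bottom

# Toy component codes (assumption): U is a (4, 2) code, V the (4, 1) repetition code.
G_U_toy = [[1, 0, 1, 0], [0, 1, 0, 1]]
G_V_toy = [[1, 1, 1, 1]]
G_toy = uv_generator(G_U_toy, G_V_toy)

# The sum of the first and last rows is (u | u + v) with u = 1010 and v = 1111.
codeword = [a ^ b for a, b in zip(G_toy[0], G_toy[2])]
```

The resulting code has length $n = 8$ and dimension $k_U + k_V = 3$, matching the parameters stated in the text.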
In this paper, a new code-based signature scheme using binary codes with a $(U, U+V)$-code as its subcode is proposed. For two linear codes $C_1$ and $C_2$, $C_2$ is called a subcode of $C_1$ if all codewords in $C_2$ are in $C_1$. The subcode used in the proposed signature scheme is a binary $(U, U+V)$-code, where $U$ and $V$ are obtained by modifying the RM codes. We design $V$ and $U^{\perp}$ to have a sufficient number of common codewords, where $U^{\perp}$ denotes the dual code of $U$. Using the relationships between $U$ and $V$, it is shown that the proposed signature scheme resists the attack for $(U, U+V)$-codes in [29]. Further, an efficient and randomized decoding algorithm is proposed. This algorithm makes it possible to reduce the key size and signature length. As the codes in the proposed signature scheme are a modification of RM codes, the decoding algorithm makes use of the recursive structure. The proposed signature scheme is an improvement of pqsigRM [1] submitted to NIST for PQC standardization, and it resolves the weaknesses of early versions of pqsigRM by modifying the public code. Moreover, we ensure the indistinguishability of the public code of the proposed signature scheme.
The rest of this paper is organized as follows. In Section II, we discuss FDH code-based signature schemes and RM codes. A new code-based signature scheme, called modified pqsigRM, which uses modified RM codes, is proposed in Section III. In Section IV, the security of the proposed signature scheme is analyzed, and it is proved that the signature scheme is EUF-CMA secure. The proof is based on two ad-hoc problems and the assumption that these are hard.
The two problems are analyzed in Section V. Considering state-of-the-art attacks, we suggest security parameters in Section VI. The paper is concluded in Section VII.

A. BASIC NOTATION
A vector is denoted in boldface in the form of a column vector. $(x_0|x_1)$ denotes the concatenation of two vectors $x_0$ and $x_1$. For example, $h(m|r)$ means the hash function $h$ with input $(m|r)$, where $(m|r)$ represents the concatenation of the binary representation of the vector $m$ and a random value $r$. Matrices are denoted by boldfaced capital letters, for example, $A$. Matrix multiplication is denoted by $\cdot$, which can be omitted when it is unnecessary. Codes and probability distributions are denoted in calligraphic fonts, for example $\mathcal{C}$, and they can be distinguished by context. $x^{\sigma}$ denotes that a vector $x$ is permuted by a permutation $\sigma$.

B. CFS SIGNATURE SCHEME
The CFS signature scheme applies the FDH methodology to the Niederreiter cryptosystem. Like the McEliece cryptosystem, it is based on Goppa codes. A summary of the CFS signature scheme is given in Algorithm 1.
As described in Algorithm 1, the signing process iterates until a decodable syndrome is obtained. The probability that a given random syndrome can be decoded is approximately $1/t!$. Hence, the error correction capability $t = \frac{n-k}{\log n}$ should be sufficiently small to reduce the number of iterations. Thus, high-rate Goppa codes should be used. Regarding the key size, the complexity of the decoding attack on the CFS signature scheme is known to be a small power of the key size, namely, $\approx \mathrm{keysize}^{t/2}$. Hence, the key size should be fairly large to meet a certain security level. In summary, the CFS signature scheme is insecure and inefficient owing to the use of Goppa codes.

RM codes were introduced by Muller [23] and Reed [24], and their decoding algorithm, so-called recursive decoding, was proposed in [10]. There are various definitions of RM codes, but we adopt a recursive definition here as recursive decoding is defined by using this structure. An RM code $RM_{(r,m)}$ is a linear binary $(n = 2^m, k = \sum_{i=0}^{r} \binom{m}{i})$ code defined as
$$RM_{(r,m)} := \{(u|u+v) \mid u \in RM_{(r,m-1)},\ v \in RM_{(r-1,m-1)}\},$$
where $RM_{(0,m)}$ is the repetition code of length $2^m$ and $RM_{(m,m)} = \mathbb{F}_2^{2^m}$. This is the well-known Plotkin's construction, and its generator matrix is given by
$$G_{(r,m)} = \begin{bmatrix} G_{(r,m-1)} & G_{(r,m-1)} \\ 0 & G_{(r-1,m-1)} \end{bmatrix},$$
where $G_{(r,m)}$ is the generator matrix of $RM_{(r,m)}$. Recursive decoding is a soft-decision decoding algorithm that depends on the recursive structure of the RM codes; it is described in detail in Algorithm 2, where $y' \cdot y''$ denotes the component-wise multiplication of the vectors $y'$ and $y''$. In recursive decoding, a binary symbol $a \in \{0, 1\}$ is mapped onto $(-1)^a$, and it is assumed that all codewords belong to $\{-1, 1\}^n$.
VOLUME 8, 2020
First, $y''$ (the second half of the received vector $y$) is multiplied component-wise by $y'$ (the first half of the received vector). This removes the codeword $u$ of $RM_{(r,m-1)}$, as it is present in both $y'$ and $y''$, so that only $v$ and the error vector remain. The product is regarded as a codeword of $RM_{(r-1,m-1)}$ added to an error vector, and its decoding result is referred to as $\hat{v}$. Using $\hat{v}$, we can remove the codeword of $RM_{(r-1,m-1)}$ from the second half of the received vector: $y'$ is added to $y'' \cdot \hat{v}$, and the sum is divided by 2. This is regarded as a codeword of $RM_{(r,m-1)}$ added to the error vector, and then decoding is performed. Recursively, the received vector is further divided into sub-vectors of length $n/4$, $n/8$, etc. Finally, we reach $RM_{(m,m)}$ or $RM_{(0,m)}$; the division then terminates, and the minimum distance (MD) decoding of $RM_{(m,m)}$ or $RM_{(0,m)}$, which is trivial, is performed. The decoding for the entire code is performed by reconstructing these results into $(U, U+V)$ form.
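The recursion described above can be sketched as follows. This is a minimal illustration in the $\{-1, +1\}$ domain rather than the authors' implementation; the function names and the tie-breaking rule (a zero sum or zero symbol is decoded as $+1$) are assumptions.

```python
def md_decode_repetition(y):
    """MD decoding of RM(0, m): choose all +1 or all -1 by the sign of the sum."""
    sign = 1.0 if sum(y) >= 0 else -1.0
    return [sign] * len(y)

def md_decode_full(y):
    """MD decoding of RM(m, m): every +/-1 vector is a codeword, so hard-decide."""
    return [1.0 if x >= 0 else -1.0 for x in y]

def recursive_decoding(y, r, m):
    """Recursive decoding of RM(r, m) for a soft input y of length 2**m."""
    if r == 0:
        return md_decode_repetition(y)
    if r == m:
        return md_decode_full(y)
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    # The product y' * y'' cancels u and leaves a noisy version of v.
    v_hat = recursive_decoding([a * b for a, b in zip(y1, y2)], r - 1, m - 1)
    # Removing v from the second half yields two noisy copies of u; average them.
    y_u = [(a + b * v) / 2 for a, b, v in zip(y1, y2, v_hat)]
    u_hat = recursive_decoding(y_u, r, m - 1)
    # In the +/-1 domain, (u | u + v) becomes (u | u * v).
    return u_hat + [u * v for u, v in zip(u_hat, v_hat)]

# Toy check (assumption): codeword of RM(1, 3) with u = 1100 and v = 1111,
# mapped to +/-1, with one symbol flipped.
sent = [-1, -1, 1, 1, 1, 1, -1, -1]
received = sent[:]
received[2] = -received[2]
decoded = recursive_decoding(received, 1, 3)
```

Since $RM_{(1,3)}$ has minimum distance 4, a single flipped symbol is corrected in this example.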

III. MODIFIED REED-MULLER CODES AND PROPOSED SIGNATURE SCHEME
In this section, we propose new codes, their decoder, and a signature scheme that uses these codes and decoders. The proposed code essentially has a $(U, U+V)$-code as its subcode, and recursively, $U$ and $V$ are also $(U, U+V)$-codes. This recursive structure allows the decoding of any given vector. Then, we can find an error vector with small Hamming weight for any given syndrome corresponding to the received vector. Starting from $(U, U+V)$-codes, we replace certain rows and append random rows on the generator matrix of $(U, U+V)$-codes. Thus, these codes are no longer $(U, U+V)$-codes. However, they have a $(U, U+V)$-subcode and can use the decoder for $(U, U+V)$-codes.

Algorithm 2 Recursive Decoding of RM Code [10]
function RecursiveDecoding($y$, $r$, $m$)
    if $r = 0$ then
        Perform MD decoding on $RM(0, m)$
    else if $r = m$ then
        Perform MD decoding on $RM(r, r)$
    else
        $(y'|y'') \leftarrow y$
        $\hat{v} \leftarrow$ RecursiveDecoding($y' \cdot y''$, $r-1$, $m-1$)
        $\hat{u} \leftarrow$ RecursiveDecoding($(y' + y'' \cdot \hat{v})/2$, $r$, $m-1$)
        Output $(\hat{u}|\hat{u} \cdot \hat{v})$
    end if
end function

A. PARTIAL PERMUTATION OF GENERATOR MATRIX AND MODIFIED REED-MULLER CODES
New codes named modified RM codes are defined in this section. We first present the core of the proposed codes, which is a (U , U +V )-code. Subsequently, we describe which rows are replaced or appended to the generator matrix. The rationale for these operations is provided in Section V.
For a code $C$, we define its hull by the intersection of the code and its dual, in other words, $\mathrm{hull}(C) = C \cap C^{\perp}$. When $U^{\perp} \cap V = \{0\}$, $\mathrm{hull}(C)$ has only $(u|u)$ codewords, and this may reveal the secret key. To avoid this, the proposed code is designed so that $\dim(U^{\perp} \cap V)$ is large. For convenience, we focus on the generator matrix. First, we construct the generator matrix $G_{(r,m)}$ of an RM code and then permute its submatrices. An example is shown in Figure 1, where $\sigma_p^1$ and $\sigma_p^2$ denote two independent partial permutations that randomly permute only $p$ out of $n/4$ columns. As will be explained in Section VI-B, $p$ is related to the decoding performance. To generate $\sigma_p^1$ and $\sigma_p^2$, $p$ column indices are randomly selected from the index set $\{0, 1, \ldots, n/4 - 1\}$, and the selected indices are randomly permuted, whereas the others are not. Then, $\sigma_p^1$ is used to permute the submatrices corresponding to $G_{(r,m-2)}$ in the first $\dim(RM_{(r,m-2)})$ rows, and $\sigma_p^2$ is used to permute the submatrix corresponding to $G_{(r-2,m-2)}$ in the last $\dim(RM_{(r-2,m-2)})$ rows, as shown in Figure 1. The codes generated by the generator matrix in Figure 1 are called partially permuted RM codes. It should be noted that, unlike in other code-based cryptographic algorithms, we permute submatrices of the generator matrix rather than the entire matrix here. We note that the entire matrix should also be permuted to design a signature scheme. This will be discussed with the key generation in Section III-C.
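The generation of a partial permutation, as described above, can be sketched as follows. The function names, the use of Python's `random.Random`, and the convention that column $i$ moves to position $\sigma(i)$ are assumptions made for illustration; a real implementation would use a cryptographically secure source of randomness.

```python
import random

def partial_permutation(length, p, rng):
    """Permutation of range(length) that moves at most p randomly chosen indices."""
    chosen = rng.sample(range(length), p)   # the p selected column indices
    shuffled = chosen[:]
    rng.shuffle(shuffled)                   # random permutation of the selection
    sigma = list(range(length))             # all other indices stay fixed
    for src, dst in zip(chosen, shuffled):
        sigma[src] = dst
    return sigma

def permute_columns(G, sigma):
    """Move column i of G to position sigma[i] (convention assumed here)."""
    permuted = [[0] * len(row) for row in G]
    for new_row, row in zip(permuted, G):
        for i, bit in enumerate(row):
            new_row[sigma[i]] = bit
    return permuted

# Toy parameters (assumption): n = 64, so each block has n/4 = 16 columns, and p = 6.
sigma_1 = partial_permutation(16, 6, random.Random(0))
block = [[1 if (j >> i) & 1 else 0 for j in range(16)] for i in range(4)]
block_permuted = permute_columns(block, sigma_1)
```

Because only the selected indices are shuffled among themselves, at least $n/4 - p$ columns of each block stay in place.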
$\dim(U^{\perp} \cap V)$ is large for the following reasons. Let $G_U$ and $G_V$ denote the generator matrices of $U$ and $V$, respectively:
$$G_U = \begin{bmatrix} G^{\sigma_p^1}_{(r,m-2)} & G^{\sigma_p^1}_{(r,m-2)} \\ 0 & G_{(r-1,m-2)} \end{bmatrix}, \qquad G_V = \begin{bmatrix} G_{(r-1,m-2)} & G_{(r-1,m-2)} \\ 0 & G^{\sigma_p^2}_{(r-2,m-2)} \end{bmatrix}.$$
Then, the generator matrix of the dual code of $U$ is
$$G_{U^{\perp}} = \begin{bmatrix} G^{\perp}_{(r-1,m-2)} & G^{\perp}_{(r-1,m-2)} \\ \left(G^{\perp}_{(r,m-2)}\right)^{\sigma_p^1} & 0 \end{bmatrix}.$$
Thus, $U^{\perp} \cap V$ has a subcode that is the intersection of the codewords generated by $[G_{(r-1,m-2)}\ G_{(r-1,m-2)}]$ and the codewords generated by $[G^{\perp}_{(r-1,m-2)}\ G^{\perp}_{(r-1,m-2)}]$. Its dimension is $\min(\dim(RM_{(r-1,m-2)}), \dim(RM_{(m-r-2,m-2)}))$, as the dual of $RM_{(r,m)}$ is equal to $RM_{(m-r-1,m)}$ and RM codes of the same length are nested, that is, $RM_{(r_1,m)} \subseteq RM_{(r_2,m)}$ whenever $r_1 \le r_2$.
With the partially permuted RM codes, the received vector and the syndrome have the same parity, which leaks information about the signature. Thus, the generator matrix in Figure 1 should be further modified. That is, some rows are replaced with repetitions of random codewords, and random rows are appended to the generator matrix. Considering $G_U$, it is also a $(U, U+V)$-code, which can similarly be divided into (permuted) $(U, U+V)$-codes. By repeating this process $2^{m-r}$ times, the rows of the partially permuted RM code consist of the $2^{m-r}$ repeated generator matrices of $RM_{(r,r)}$, which are $2^r \times 2^r$ identity matrices. Then, $RM_{(r,r)}$ is replaced by a repeated random $(2^r, k_{rep})$ code such that its dual code has at least one non-zero codeword with odd Hamming weight.
We now append random independent rows to the generator matrix. One row to be appended is a random codeword of the dual code. This should be independent of the existing rows; i.e., it should not belong to the hull of the code. Furthermore, it should be verified that the hull has codewords with Hamming weight that is not a multiple of four as a result of appending this row. The others are $k_{app}$ random independent vectors including at least one vector of odd Hamming weight. These $k_{app}$ vectors are independent of the partially permuted RM codes and independent of each other.
After all these modifications, the resulting code is called a modified RM code. An example of its generator matrix is given in Figure 2.

B. DECODING OF MODIFIED REED-MULLER CODES
Unlike the Niederreiter cryptosystem and CFS signature scheme, it is required to find an error vector whose Hamming weight is greater than the error correction capability. Hence, there may exist several solutions $e$ satisfying $He^T = s^T$ and $\mathrm{wt}(e) \le w$ for a given syndrome $s$. Such decoding can be achieved by the modified Prange decoder using the $(U, U+V)$ structure, as in the signature schemes in [29], [35]. However, in this section, a new decoder is proposed that uses the recursive structure of the subcode of modified RM codes, and it achieves better performance than the modified Prange decoder. In other words, it finds error vectors whose Hamming weights are less than the result in [29]. This allows smaller parameters, considering attacks as in [28].
In addition to the decoding performance, a major difference between the proposed decoder and the modified Prange decoder is their input. The input of the modified Prange decoder used in [35] and [29] is a syndrome vector. In contrast, the input of the proposed decoder is an $n$-dimensional vector $r$ satisfying $Hr^T = s$, which is called the received vector in coding theory, and the decoder outputs a codeword close to the received vector. An error vector with a small Hamming weight is obtained by subtracting the output from the received vector. If two different received vectors in the same coset are given, the proposed decoder can return different outputs. Besides, as the input of the decoder is a random received vector, decoding can be performed even if random rows are appended to the generator matrix.
As stated in the previous section, random rows (one from the dual code and the others being $k_{app}$ independent random vectors) are appended to the generator matrix of the partially permuted RM codes. Let $C_{app}$ be the code spanned by the added $k_{app} + 1$ rows. When rows are appended, the number of codewords increases by a factor of $2^{k_{app}+1}$, because codewords of $C_{app}$ are added to each $(U, U+V)$-codeword. The decoding process when rows are appended is as follows: a codeword of $C_{app}$ (including 0) is chosen and subtracted from the received vector $r$, the result is decoded, and the subtracted codeword is added back. Thus, the code is decodable even if arbitrary random rows are appended to its generator matrix.
Hence, it suffices to explain the decoding algorithm for the $(U, U+V)$-subcode of a modified RM code. This decoding basically follows the recursive decoding of RM codes [10]. The difference is the partial permutation and the replacement of $RM_{(r,r)}$. Considering the decoding proposed in [10], we have $c = (u|u+v)$ for all $c \in RM_{(r,m)}$, where $u \in RM_{(r,m-1)}$ and $v \in RM_{(r-1,m-1)}$. $RM_{(r,m-1)}$ and $RM_{(r-1,m-1)}$ are also $(U, U+V)$-codes, except for $r = 0$ or $r = m$. Here, if the code corresponding to $u$ or $v$ is replaced with a code other than the RM code and the decoding of the replaced code can be performed appropriately, the entire code can also be decoded [15].
When the subcode of the RM code is replaced with its permutation, the entire code can also be decoded by slightly modifying the recursive decoding. Moreover, no decoding failure occurs because the recursion eventually reaches $RM_{(0,m')}$, $RM_{(r',r')}$, or the $(2^r, k_{rep})$ code that replaces $RM_{(r,r)}$, and there exists a polynomial-time MD decoder for each of these codes. Even the $(2^r, k_{rep})$ random code is MD decodable in constant time because it is a small code. To handle partial permutations, the decoder uses the fact that a permuted version of a decodable code is itself decodable if the permutation is known. Depermutation and decoding followed by permutation is the decoding process for permuted codes.
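The depermute, decode, and permute step for a permuted code can be sketched as below. The brute-force MD decoder, the toy code, and the convention that position $i$ moves to $\sigma(i)$ are assumptions used only to keep the example self-contained.

```python
def apply_perm(x, sigma):
    """Move the entry at position i to position sigma[i]."""
    out = [0] * len(x)
    for i, value in enumerate(x):
        out[sigma[i]] = value
    return out

def invert_perm(sigma):
    """Return the inverse permutation of sigma."""
    inverse = [0] * len(sigma)
    for i, j in enumerate(sigma):
        inverse[j] = i
    return inverse

def decode_permuted(decoder, y, sigma):
    """Decode a word of the sigma-permuted code using a decoder for the base code."""
    unpermuted = apply_perm(y, invert_perm(sigma))   # depermutation
    codeword = decoder(unpermuted)                   # decoding in the base code
    return apply_perm(codeword, sigma)               # permutation of the result

# Toy base code (assumption) with minimum distance 3 and a brute-force MD decoder.
BASE_CODE = [[0] * 6, [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1] * 6]

def md_decoder(y):
    return min(BASE_CODE, key=lambda c: sum(a != b for a, b in zip(c, y)))

sigma_toy = [3, 0, 4, 1, 5, 2]
sent_word = apply_perm([1, 1, 1, 0, 0, 0], sigma_toy)
noisy_word = sent_word[:]
noisy_word[5] ^= 1                                   # single bit error
recovered = decode_permuted(md_decoder, noisy_word, sigma_toy)
```

In the proposed decoder the same wrapper idea is applied inside the recursion, with the recursive decoder taking the place of the brute-force one.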
In general, the output distribution of decoding is crucial for security. Thus, we also propose a randomized decoding method, the output of which is almost uniformly distributed. Using the algorithm described above, a random decoder can easily be designed. Algorithm 3 summarizes the randomized decoding. It is easy to find a received vector (regardless of its Hamming weight) for any given syndrome; a coset element corresponding to the syndrome is randomly selected. This is given to the decoder as an input. Finally, the decoder finds a different error vector with a small Hamming weight for different inputs.

C. PROPOSED SIGNATURE SCHEME
Herein, the proposed modified pqsigRM signature scheme using the codes in the previous section is presented. Its decoding algorithm is presented in Section III-B.

1) KEY GENERATION
Let $G$ be the generator matrix of a modified $(n, k)$ RM code, and $H$ be the parity check matrix. Let $S$ be an $(n-k) \times (n-k)$ random non-singular matrix and $Q$ be an $n \times n$ random permutation matrix. Then, the public key is $H' = SHQ$, and the secret keys are $H$, $S$, and $Q$.
In the decoder output $y^{\sigma}$, $\sigma$ is $\sigma_p^1$ or $\sigma_p^2$ for a permuted block and the identity otherwise.

2) SIGNING
To sign a given message $m$, we randomly select a coin $i$ and compute the syndrome $s = h(h(m|H')|i)$. Our goal is to find the error vector $e$ satisfying $H'e^T = SHQe^T = s$. Let $s' = S^{-1}s$. Performing the decoding as in Algorithm 3, we find an error vector $e'$ satisfying $He'^T = s'$. If $\mathrm{wt}(e') \le w$, we compute $e^T = Q^{-1}e'^T$, and the signature is then given as $(m, e, i)$.

3) VERIFICATION
If $\mathrm{wt}(e) \le w$ and $H'e^T = h(h(m|H')|i)$, we return ACCEPT; otherwise, we return REJECT.
The key generation, signing, and verification processes are summarized in Algorithm 4. For simplicity, let $H$ represent all the secrets such as partial permutations $\sigma_p^1$ and $\sigma_p^2$, appended rows, and replaced codes. It should be noted that in the signing process, we choose a random coset element and perform ModDec(·). As ModDec(·) returns different outputs for different inputs even in the same coset, we can achieve randomized decoding. The output distribution of this randomized decoding is analyzed in Section V. We add a salt of length $\lambda_0$ to obtain a tight security proof.
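The verification rule can be illustrated with the following sketch. How $h$ is instantiated and how its output is mapped to an $(n-k)$-bit syndrome are not specified in this section; the use of SHA-256 for the inner hash, SHAKE-256 for the outer hash, and the byte encoding of the public key are all assumptions for illustration.

```python
import hashlib

def hash_to_syndrome(message, public_key_bytes, coin, n_minus_k):
    """Assumed instantiation of h(h(m|H')|i) with an (n-k)-bit output."""
    inner = hashlib.sha256(message + public_key_bytes).digest()
    stream = hashlib.shake_256(inner + coin).digest((n_minus_k + 7) // 8)
    bits = [(byte >> j) & 1 for byte in stream for j in range(8)]
    return bits[:n_minus_k]

def verify(H_pub, message, e, coin, w):
    """ACCEPT (True) iff wt(e) <= w and H' e^T equals the hashed syndrome."""
    if sum(e) > w:
        return False
    pk_bytes = bytes(bit for row in H_pub for bit in row)  # assumed key encoding
    target = hash_to_syndrome(message, pk_bytes, coin, len(H_pub))
    computed = [sum(h * x for h, x in zip(row, e)) % 2 for row in H_pub]
    return computed == target

# Toy public key (assumption): the 3 x 3 identity, so e equals the target syndrome.
H_pub_toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pk_toy = bytes(bit for row in H_pub_toy for bit in row)
e_toy = hash_to_syndrome(b"message", pk_toy, b"\x01", 3)
tampered = [e_toy[0] ^ 1] + e_toy[1:]
```

Including the public key in the hash input mirrors the rule $s = h(h(m|H')|i)$ used by the scheme.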

IV. SECURITY ANALYSIS OF MODIFIED pqsigRM
In this section, the security of the proposed modified pqsigRM will be analyzed. We will consider the best-known algorithms for solving DOOM. Thereafter, we will discuss the resistance of the proposed signature scheme against key substitution attacks. Finally, it will be proved that the modified pqsigRM is EUF-CMA secure.
As the public key of the proposed signature scheme is a modification of an RM code, one may consider key recovery attacks on RM codes, such as Minder and Shokrollahi [13] and Chizhov and Borodin [12] attacks, as well as square code attacks [11]. However, owing to the partial permutation as well as the appending and replacement of codewords in the generator matrix, these attacks cannot be adopted here. Table 1 shows the comparison between the proposed modified pqsigRM and the original pqsigRM.

A. DECODING ONE OUT OF MANY
Information set decoding is a brute-force attack method that finds an error vector $e$ such that $He^T = s$ and $\mathrm{wt}(e) \le w$. Stern improved the attack complexity in [14]. It has been extensively studied, and Dumer's algorithm [38] as well as more involved variants in [39], [40] have been proposed.
In the variants of the CFS signature scheme, there are several hash queries. Therefore, to launch a forgery attack, it suffices to find an error vector with small Hamming weight for any of the syndromes. Hence, the decoding problem DOOM given below is adequate for a tight security proof. The usual FDH proof for existential forgery using syndrome decoding would require a work factor $\ge q_H \cdot 2^{\lambda}$, where $q_H \le 2^{\lambda}$ is the number of hash queries. However, with DOOM, the work factor is required to be $\ge 2^{\lambda}$. Although the work factor of DOOM is greater than that of syndrome decoding, it provides tighter bounds for security.
Problem 1 (DOOM): Input: a parity check matrix $H$ of an $(n, k)$ linear code, syndromes $s_1, s_2, \cdots, s_q \in \mathbb{F}_2^{n-k}$, and an integer $w$. Output: $(e, i) \in \mathbb{F}_2^n \times [1, q]$ such that $\mathrm{wt}(e) \le w$ and $He^T = s_i^T$.
We consider the case in which the adversary has $q$ instances and $M = \max\left(1, \binom{n}{w}/2^{n-k}\right)$ solutions for each instance. Of course, in our case, $w$ is not small, and thus $M$ is $\binom{n}{w}/2^{n-k}$. In [28], the work factor of solving DOOM is given in terms of the complexity of solving the DOOM problem using Dumer's algorithm and the corresponding success probability. This work factor is the reference for choosing the parameters of the signature scheme. Although advanced algorithms for information set decoding can be adapted to DOOM to reduce complexity, this has not yet been conducted. The proposed signature scheme is designed to use codes with a high-dimensional hull. Hence, an attacker could attempt to exploit this. However, to our knowledge, there is no algorithm for information set decoding or DOOM that considers this.
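The number of solutions per instance, $M = \max(1, \binom{n}{w}/2^{n-k})$, is conveniently evaluated in the logarithmic domain to avoid overflow for cryptographic sizes. The sketch below is only an illustration; the function name and the toy parameters are assumptions and are not the parameters proposed for the scheme.

```python
from math import comb, log2

def log2_expected_solutions(n, k, w):
    """Return log2(M) with M = max(1, C(n, w) / 2^(n-k))."""
    return max(0.0, log2(comb(n, w)) - (n - k))

# Toy parameters (assumption): a unique-decoding regime and a many-solutions regime.
log_m_small_w = log2_expected_solutions(7, 4, 1)   # C(7,1)/2^3 < 1, so M = 1
log_m_large_w = log2_expected_solutions(8, 4, 3)   # C(8,3)/2^4 = 3.5
```

When $w$ is large, as in the proposed scheme, the second regime applies and many low-weight solutions exist for each syndrome.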

B. SECURITY AGAINST KEY SUBSTITUTION ATTACKS
In a key substitution attack, the adversary attempts to find a valid key that is different from the correct key and can be used for signature verification. If the adversary knows the secret key and the public key corresponding to a message-signature pair, we have a weak-key substitution attack, whereas if the adversary knows only the public key, we have a strong-key substitution attack. Both polynomial-time weak- and strong-key substitution attacks on the CFS signature scheme were proposed in [21]. A modification of the CFS scheme that resists such attacks was also proposed in [21]. In this modification, the syndrome $s$ is generated by hashing the message, counter, and public key, rather than hashing only the message and counter. It has been demonstrated that this modified CFS signature scheme is secure against key substitution attacks [18]. In the modified pqsigRM, the syndrome is given as $s = h(h(m|H')|i)$, and thus it is also secure against key substitution attacks.

C. EUF-CMA SECURITY
Here, we prove the EUF-CMA security of the modified pqsigRM. The methods presented below are adapted from the EUF-CMA security proof of SURF and Wave [29], [35]. It should be noted that although a key attack for SURF is presented in [29], its proof technique is valid and generally applicable. The proof is essentially the same except for the code used for the key and the decoding algorithm for signing.

1) BASIC TECHNIQUES FOR EUF-CMA SECURITY PROOF
EUF-CMA is a widely used attack model against signature schemes. In the security reduction task, EUF-CMA is viewed as a game played between an adversary and a challenger. The public key $PK$, hash oracle $\mathcal{H}$, and signing oracle $\Sigma$ are given to a $(t, q_H, q_{\Sigma}, \epsilon)$-adversary $A$, where $A$ can query at most $q_H$ hash values and $q_{\Sigma}$ signatures for inputs of its own choice. Within a maximum computation time $t$, $A$ attempts to find a valid message-signature pair $(m^*, \sigma^*)$. $A$ wins the game if $\mathrm{Verifying}(m^*, \sigma^*, PK) = 1$ and $\sigma^*$ has not been provided by $\Sigma$; otherwise, the challenger wins the game. The winning probability of the $(t, q_H, q_{\Sigma}, \epsilon)$-adversary is at least $\epsilon$.
Definition 1 (EUF-CMA Security): Let $S$ be a signature scheme. We define the EUF-CMA success probability against $S$ as
$$Succ^{EUF-CMA}_{S}(t, q_H, q_{\Sigma}) := \max\{\epsilon \mid \text{there exists a } (t, q_H, q_{\Sigma}, \epsilon)\text{-adversary}\}.$$
The signature scheme $S$ is called $(t, q_H, q_{\Sigma})$-secure in EUF-CMA if the above success probability is a negligible function of the security parameter $\lambda$.
We use the statistical and computational distance as basic metrics.
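Definition 2 (Statistical Distance): The statistical distance between two discrete probability distributions $D_0$ and $D_1$ over the same space $E$ is defined as
$$\rho(D_0, D_1) := \frac{1}{2} \sum_{x \in E} |D_0(x) - D_1(x)|.$$
Proposition 1 [29]: Let $(D_1^0, \ldots, D_n^0)$ and $(D_1^1, \ldots, D_n^1)$ be two $n$-tuples of discrete probability distributions over the same space. For all $n \ge 0$, we have
$$\rho(D_1^0 \otimes \cdots \otimes D_n^0,\ D_1^1 \otimes \cdots \otimes D_n^1) \le \sum_{i=1}^{n} \rho(D_i^0, D_i^1).$$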

Definition 3 (Computational Distance and Indistinguishability): The computational distance between two distributions $D_0$ and $D_1$ in time $t$ is
$$\rho_c(D_0, D_1)(t) := \max_{|A| \le t} Adv^{D_0, D_1}(A),$$
where $|A|$ denotes the running time of $A$, and $Adv^{D_0, D_1}$ is the advantage of distinguisher $A$, which returns $b \in \{0, 1\}$ against $D_0$ and $D_1$:
$$Adv^{D_0, D_1}(A) := \left| P_{\xi \sim D_0}(A(\xi) = 1) - P_{\xi \sim D_1}(A(\xi) = 1) \right|.$$
The EUF-CMA security of the modified pqsigRM is reduced to the modified RM code distinguishing problem and DOOM with high-dimensional hull. We assume here that the corresponding success probabilities are negligible (as a function of $\lambda$) for the parameters given in Table 2. We will discuss these problems in greater detail in Section V. It is worth noting that there are sufficiently many codes with high-dimensional hull for the parameters given in Tables 2 and 4 [25].
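The statistical distance used in the reduction can be illustrated numerically. The sketch below assumes the standard definition $\rho(D_0, D_1) = \frac{1}{2}\sum_x |D_0(x) - D_1(x)|$ and represents distributions as dictionaries; the helper names and the toy distributions are illustrative assumptions.

```python
from itertools import product

def statistical_distance(d0, d1):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(d0) | set(d1)
    return 0.5 * sum(abs(d0.get(x, 0.0) - d1.get(x, 0.0)) for x in support)

def product_distribution(dists):
    """Distribution of a tuple of independent samples, one from each entry of dists."""
    joint = {}
    for outcome in product(*(d.items() for d in dists)):
        values = tuple(v for v, _ in outcome)
        prob = 1.0
        for _, p in outcome:
            prob *= p
        joint[values] = joint.get(values, 0.0) + prob
    return joint

# Toy distributions (assumption): a fair bit and a biased bit.
fair = {0: 0.5, 1: 0.5}
biased = {0: 0.75, 1: 0.25}
single = statistical_distance(fair, biased)
pair = statistical_distance(product_distribution([fair, fair]),
                            product_distribution([biased, biased]))
```

For two independent samples the distance grows, but it stays below the sum of the individual distances, which is the subadditivity property invoked in the proof for $q_H$ or $q_{\Sigma}$ queries.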

2) PROOF OF EUF-CMA SECURITY
Let S pqsigRM denote the proposed modified pqsigRM. The following definitions as well as the theorem and its proof are adopted from those in [29], [35].
Definition 5 (Challenger Procedures in the EUF-CMA Game): The challenger procedures in the EUF-CMA game corresponding to $S_{pqsigRM}$ are defined as follows:
We note that the procedures in Definition 5 simplify Algorithm 4. We can now modify the security reduction in [29], [35] and prove the EUF-CMA security of the modified pqsigRM as follows.
Theorem 1 (Security Reduction): Let $Succ^{EUF-CMA}_{S_{pqsigRM}}(t, q_H, q_{\Sigma})$ be the success probability of the EUF-CMA game corresponding to $S_{pqsigRM}$ for time $t$ when the number of queries to the hash oracle (resp. signing oracle) is $q_H$ (resp. $q_{\Sigma}$). Then, in the random oracle model, this success probability is bounded for all $t$ in terms of the following quantities: $t_c = t + O(q_H \cdot n^2)$; $\mathcal{D}_w^{H'}$ is the distribution of the syndromes $H'e^T$ when $e$ is drawn uniformly from the binary vectors of weight $w$; $\mathcal{U}_s$ is the uniform distribution over $\mathbb{F}_2^{n-k}$; $\mathcal{D}_w$ is the distribution of the decoding result of Algorithm 3; $\mathcal{U}_w$ is the uniform distribution over the binary vectors of weight $w$; $\mathcal{D}_{rand}$ is the uniform distribution over the random codes with high-dimensional hull; and $\mathcal{D}_{pub}$ is the uniform distribution over the public keys of modified pqsigRM.
Proof: Let $A$ be a $(t, q_H, q_{\Sigma}, \epsilon)$-adversary against $S_{pqsigRM}$, and let $(H_0, s_1, \ldots, s_{q_H})$ be a random instance of DOOM with high-dimensional hull for the parameters $n$, $k$, $q_H$, and $w$. We stress that $s_1, \ldots, s_{q_H}$ are random independent vectors of $\mathbb{F}_2^{n-k}$. Let $P(S_i)$ denote the probability that $A$ wins Game $i$.
Game 0 is the EUF-CMA game for $S_{pqsigRM}$.
Game 1 is the same as Game 0 except for the following failure event $F$: there is a collision in a signature query. From the difference lemma in [41], we have
$$P(S_1) \le P(S_0) + P(F). \quad (1)$$
The following lemma is from [35].
Lemma 2: For $\lambda_0 = \lambda + 2\log_2(q_H)$, we have $P(F) \le \frac{1}{2^{\lambda}}$.
Game 2 is obtained from Game 1 by changing Hash and Sign as follows, where $S_w$ denotes the set of vectors with Hamming weight $w$ in $\mathbb{F}_2^n$: Index $j$ is initialized to 0 in the Init procedure. We introduce the list $L_m$, which contains $q_H$ random elements. The statistical distance between the syndromes generated by matrix $H'$ and the uniform distribution over $\mathbb{F}_2^{n-k}$ is $\rho(\mathcal{D}_w^{H'}, \mathcal{U}_s)$. This is the difference between Hash in Game 1 and Game 2 when $i \in L_m$. There are at most $q_H$ such instances. Thus, by Proposition 1, the success probabilities of Game 1 and Game 2 differ by at most $q_H\,\rho(\mathcal{D}_w^{H'}, \mathcal{U}_s)$.
Game 3 is obtained from Game 2 by replacing Decode with $e_{m,i}$ in the Sign procedure as follows: $e$ is drawn according to the proposed decoding algorithm Decode in Game 2, whereas it is now drawn according to the uniform distribution $\mathcal{U}_w$. By Proposition 1, the success probabilities of Game 2 and Game 3 differ by at most $q_{\Sigma}\,\rho(\mathcal{D}_w, \mathcal{U}_w)$.
Game 4 is the game in which $H'$ is replaced with $H_0$. This implies that the adversary is forced to construct a solution for DOOM with high-dimensional hull. Here, if a difference between Game 3 and Game 4 is detected, then this yields a distinguisher between $\mathcal{D}_{pub}$ and $\mathcal{D}_{rand}$. According to [29], the cost to call Hash does not exceed $O(n^2)$, and thus the running time of the challenger is $t_c = t + O(q_H \cdot n^2)$. Therefore, the success probabilities of Game 3 and Game 4 differ by at most $\rho_c(\mathcal{D}_{pub}, \mathcal{D}_{rand})(t_c)$.
Game 5 is modified in Finalize. The success of Game 5 implies $i \notin L_m$ and the success of Game 4. A valid forgery $m^*$ has never been queried by Sign, and the adversary has never accessed $L_{m^*}$. As there are $q_{\Sigma}$ signing queries, $P(S_5)$ is related to $P(S_4)$ through the factor $(1 - 2^{-\lambda_0})^{q_{\Sigma}}$. Moreover, $(1 - 2^{-\lambda_0})^{q_{\Sigma}} \ge \frac{1}{2}$ because we assumed $\lambda_0 = \lambda + 2\log_2(q_{\Sigma})$. Thus, this can be simplified to $P(S_4) \le 2P(S_5)$.
$P(S_5)$ is the probability that $A$ returns a solution for DOOM with high-dimensional hull, so it is bounded by the success probability of solving DOOM with high-dimensional hull within time $t_c$. Combining (1)-(6) concludes the proof.

V. INDISTINGUISHABILITY OF CODE AND SIGNATURE IN THE PROPOSED SCHEME
It is challenging to prove the hardness of distinguishing a public code of a code-based cryptographic algorithm from a random code; hence, several cryptographic algorithms are designed by assuming it. In this section, we will consider possible attack algorithms and the difficulty of distinguishing the public code from a random code. Moreover, the difficulty of distinguishing signatures from random errors is also analyzed.

A. MODIFICATIONS OF PUBLIC CODE
For successful decoding of any received vector, a $(U, U+V)$-code should be used in the modified RM codes. To resist the attack on $(U, U+V)$-codes proposed in [29], we design a code with a high-dimensional hull. Generally, the expected dimension of the hull of a random code is $O(1)$, and it is smaller than $d$ with probability $\ge 1 - O(2^{-d})$ [25]. This is a difference between random and public codes. However, there is currently no algorithm for solving the syndrome decoding problem by taking advantage of the hull. We consider that a high-dimensional hull is not a significant drawback unless the hull has a certain structure that may reveal the secret. Moreover, in [25], it is demonstrated that there are a large number of codes with a high-dimensional hull. Hence, we can expect the one-wayness of DOOM with a high-dimensional hull as in Definition 4. Cryptanalysis using hulls is widely used in code-based cryptography. However, this is valid only if the hull has a specific structure that allows information leakage about the secret key. Therefore, using only the fact that the dimension of the hull is large, it is difficult to distinguish whether a code is a public code or a random code with a high-dimensional hull.
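The hull dimension discussed above can be computed directly from a generator matrix. The sketch below relies on the identity $\dim \mathrm{hull}(C) = k - \mathrm{rank}(GG^T)$ over $\mathbb{F}_2$ for a full-rank $k \times n$ generator matrix $G$; the function names and the toy codes are assumptions for illustration.

```python
def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as Python integers (bit masks)."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def hull_dimension(G):
    """dim(C ∩ C^perp) = k - rank(G G^T) for a full-rank generator matrix G."""
    rows = [int("".join(str(bit) for bit in row), 2) for row in G]
    k = len(rows)
    if gf2_rank(rows) != k:
        raise ValueError("rows of G must be linearly independent")
    gram = []
    for a in rows:
        bits = 0
        for j, b in enumerate(rows):
            bits |= (bin(a & b).count("1") & 1) << j  # inner product mod 2
        gram.append(bits)
    return k - gf2_rank(gram)

# Toy codes (assumption): RM(1,3) is self-dual, so its hull is the whole code;
# the full space F_2^3 has a trivial dual and therefore a trivial hull.
G_rm13 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
G_full = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The same computation applied to a public generator matrix gives the hull dimension that an attacker can observe.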
The EUF-CMA security proof requires the indistinguishability between public and random codes, i.e., $\rho_c(D_{pub}, D_{rand})(t_c)$ should be negligible. We will discuss the design methodology and how these modifications can ensure indistinguishability.
Considering the key recovery attack in [29], a (U, U + V)-code used in code-based crypto-algorithms should have a high-dimensional hull for security. Even though the public code of the proposed signature scheme is not a (U, U + V)-code, it should contain a (U, U + V) subcode for efficient decoding.
The attack on SURF in [29] uses the fact that for any (U, U + V)-code, the hull of the public code is highly probable to have a (u|u) structure when $k_U \ge k_V$. This (u|u) structure reveals information about the secret permutation Q and enables the attacker to locate the U and U + V codes. To avoid this, we should maintain the high dimension of $U^{\perp} \cap V$, implying that the public code should have a high-dimensional hull. Hence, we define DOOM with a high-dimensional hull and assume that the public code of pqsigRM is indistinguishable from a random code with a hull of the same dimension as that of the public code, rather than from any random linear code.
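The appearance of (u|u) codewords in the hull can be seen from a short calculation. The following is a sketch under the usual convention that a (U, U + V)-code consists of the words (u | u + v), with U and V of length n/2 and dimensions $k_U$ and $k_V$; the permutation Q is ignored because it only reorders coordinates.

```latex
\begin{align*}
C &= \{(u \mid u+v) : u \in U,\ v \in V\},\\
(x \mid y)\cdot(u \mid u+v) &= (x+y)\cdot u + y\cdot v,\\
C^{\perp} &= \{(a+b \mid b) : a \in U^{\perp},\ b \in V^{\perp}\},\\
u \in U \cap V^{\perp} &\implies (u \mid u) \in C \cap C^{\perp}
  \quad (\text{take } v = 0,\ a = 0,\ b = u),\\
\dim\bigl(U \cap V^{\perp}\bigr) &\ge k_U + \Bigl(\tfrac{n}{2} - k_V\Bigr) - \tfrac{n}{2} = k_U - k_V .
\end{align*}
```

Under these assumptions, the hull is guaranteed to contain nonzero (u|u) words as soon as $k_U > k_V$, which is the structure exploited to locate the two halves of the code.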
Moreover, $k_{app}$ random rows are appended to the generator matrix, and $2^r$ rows of the generator matrix, that is, the repeated $\mathrm{RM}_{(r,r)}$, are replaced by $k_{rep}$ random rows; furthermore, a codeword from the dual code is appended to the generator matrix. These modifications are equivalent to increasing the dimension of the code itself, the dual of the code, and the hull, respectively, by appending random codewords. Moreover, by adding random codewords, the code is no longer a (U, U + V)-code, and thus distinguishing attacks are more difficult to perform.
We now explain the rationale for the aforementioned modifications, which are applied in addition to partial permutation.

1) k app RANDOM ROWS ARE APPENDED TO THE GENERATOR MATRIX
The Hamming weights of the codewords of a random code are binomially distributed. However, the partially permuted RM code has only codewords with even Hamming weight. This is because the Hamming weights of the codewords of $\mathrm{RM}_{(r,m)}$ are even numbers, and partial permutations do not affect parity.
By appending a random row with odd Hamming weight to the generator matrix, the Hamming weights of the public code become distributed binomially. The problem is that if only one row with odd Hamming weight is appended, it can easily be extracted. This can be resolved by appending more than one codeword. Hence, we append k app random rows such that at least one has odd Hamming weight. By the nature of the decoding process, it is still possible to decode the resulting code.
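The parity argument above can be checked on a toy code. The following Python sketch is illustrative only: the function names and the choice of RM(1, 3) are assumptions, a single hypothetical odd-weight row is appended instead of $k_{app}$ rows, and only the parity of the weights is examined, not their full distribution.

```python
from itertools import combinations


def rm_generator(r, m):
    """Generator rows of RM(r, m) as bitmasks: evaluations of all monomials of degree <= r."""
    n = 1 << m
    rows = []
    for degree in range(r + 1):
        for subset in combinations(range(m), degree):
            mask = sum(1 << i for i in subset)
            # Bit x of the row is 1 iff every variable of the monomial equals 1 at point x.
            rows.append(sum(1 << x for x in range(n) if (x & mask) == mask))
    return rows


def odd_weight_fraction(rows):
    """Fraction of the 2^k linear combinations of the rows that have odd Hamming weight."""
    words = [0]
    for g in rows:
        words += [w ^ g for w in words]  # double the list with the new row added in
    odd = sum(bin(w).count("1") % 2 for w in words)
    return odd / len(words)


if __name__ == "__main__":
    rm = rm_generator(1, 3)           # toy RM(1, 3): n = 8, k = 4
    print(odd_weight_fraction(rm))    # 0.0: every codeword has even weight
    appended = rm + [0b00000111]      # hypothetical appended row of odd weight 3
    print(odd_weight_fraction(appended))  # 0.5: half of the codewords are now odd
```

Because the parity of the weight is additive over F_2, a single odd-weight generator row is enough to make exactly half of the codewords odd, which is why that row would be easy to isolate if it were the only one appended.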

2) APPENDING A RANDOM CODEWORD OF THE DUAL CODE TO THE GENERATOR MATRIX
The Hamming weights of the codewords in the hull of the partially permuted RM code are only multiples of four. However, the Hamming weight of the codewords in the hull of a random code may be an arbitrary even number, not only a multiple of four. As in the previous modification, a random codeword is appended to the hull. Thereby, we force the codewords of the hull of the public code to have arbitrary even Hamming weights. As a randomly appended row to the generator matrix is unlikely to be appended to its hull, appending a codeword to the hull is more complicated. The following is the process for appending a random codeword to the hull.
Let hull(C) be the hull of a code C. We define $C'$ and $C''$ by $C = \mathrm{hull}(C) + C'$ and $C^{\perp} = \mathrm{hull}(C) + C''$, where hull(C), $C'$, and $C''$ are linearly independent. We can then generate a code with a hull of dimension dim(hull(C)) + 1 by the following procedure:
i) Find a codeword $c_{dual} \in C''$ such that $c_{dual} \cdot c_{dual} = 0$. This is easy because any codeword with even Hamming weight satisfies it.
ii) Let $C_{inc} = C + \{c_{dual}\}$.
iii) As $c_{dual} \cdot C = \{0\}$ and $c_{dual} \cdot c_{dual} = 0$, we have $c_{dual} \in C_{inc}^{\perp}$, where for a vector x and a set of vectors A, $x \cdot A$ is the set of all inner products of x and elements of A.
iv) It can be seen that $C_{inc} \cap C_{inc}^{\perp} = \mathrm{hull}(C) + \{c_{dual}\}$. Hence, $C_{inc}$ is a code whose hull has dimension dim(hull(C)) + 1.
If the Hamming weights of the codewords of the hull are only multiples of 4, then another c dual is selected, and the above process is repeated.
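The hull-extension procedure can be sketched on a small random code. The following Python example is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, rows are integer bitmasks, and the dual codeword is chosen by random sampling among dual codewords that have even weight and lie outside the code.

```python
import random


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]
    return len(pivots)


def hull_dim(rows):
    """Dimension of the hull: rank(G) - rank(G G^T) over GF(2)."""
    gram = [sum(1 << j for j, gj in enumerate(rows) if bin(gi & gj).count("1") % 2)
            for gi in rows]
    return rank_gf2(rows) - rank_gf2(gram)


def dual_basis(rows, n):
    """Basis of the dual code, read off from a reduced row echelon form."""
    pivots = {}  # pivot column -> row with a 1 there and 0 in all other pivot columns
    for r in rows:
        for col, p in pivots.items():
            if (r >> col) & 1:
                r ^= p
        if r:
            col = r.bit_length() - 1
            for other in list(pivots):
                if (pivots[other] >> col) & 1:
                    pivots[other] ^= r
            pivots[col] = r
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = 1 << free  # set one free coordinate, then solve for the pivot coordinates
        for col, p in pivots.items():
            if (p >> free) & 1:
                x |= 1 << col
        basis.append(x)
    return basis


def append_dual_codeword(rows, n, rng, max_tries=1000):
    """Append a dual codeword c with c.c = 0 that is not already in the code."""
    dual = dual_basis(rows, n)
    k = rank_gf2(rows)
    for _ in range(max_tries):
        c = 0
        for d in dual:
            if rng.getrandbits(1):
                c ^= d
        self_orthogonal = bin(c).count("1") % 2 == 0  # c.c equals wt(c) mod 2
        outside_code = rank_gf2(rows + [c]) == k + 1
        if self_orthogonal and outside_code:
            return rows + [c]
    raise ValueError("no suitable dual codeword found")


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 16, 6  # toy sizes, not the parameters of the proposed scheme
    code = [rng.getrandbits(n) for _ in range(k)]
    bigger = append_dual_codeword(code, n, rng)
    print(hull_dim(code), hull_dim(bigger))
```

Requiring the sampled word to lie outside the code plays the role of choosing it outside hull(C), since a dual codeword that belongs to the code is already in the hull.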

3) REPEATED $\mathrm{RM}_{(r,r)}$ IS REPLACED WITH RANDOM $(2^r, k_{rep})$ CODES
We note that by replacing the repeated $\mathrm{RM}_{(r,r)}$ by random $(2^r, k_{rep})$ codes, the dimension of the code is reduced by $2^r - k_{rep}$; this is equivalent to appending $2^r - k_{rep}$ rows to the parity check matrix. The dual code of the partially permuted RM code has only codewords of even Hamming weight owing to a subcode of the partially permuted RM code. This can be resolved by replacing this subcode with another random code such that its MD decoder exists. The partially permuted RM code includes $(\mathrm{RM}_{(r,r)} | \ldots | \mathrm{RM}_{(r,r)})$, and the dual code of this has only codewords of even Hamming weight by the proposition below. It is easy to verify that the dual code of the partially permuted RM code is a subset of the dual code of $(\mathrm{RM}_{(r,r)} | \ldots | \mathrm{RM}_{(r,r)})$. That is, $(\mathrm{RM}_{(r,r)} | \ldots | \mathrm{RM}_{(r,r)})$ causes the dual code of the partially permuted RM code to have only codewords of even Hamming weight.
Proposition 2: Let C be a code such that its dual code has only codewords of even Hamming weight. Then, the dual of the concatenated code, {(c|c)|c ∈ C}, has only codewords of even Hamming weight.
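Proof: Let $h \in (C|C)^{\perp}$, where C is an (n, k) code and C|C is the concatenated code {(c|c) | c ∈ C}. We define vectors $h_1$ and $h_2$ of length n so that $h = (h_1|h_2)$. For every c ∈ C, $h \cdot (c|c) = (h_1 + h_2) \cdot c = 0$, and thus $h_1 + h_2 \in C^{\perp}$. This implies that $\mathrm{wt}(h_1 + h_2)$ is even, i.e., $\mathrm{wt}(h_1) \equiv \mathrm{wt}(h_2) \pmod 2$; in particular, $h_1 = h_2$ when $C^{\perp} = \{0\}$. Hence, $\mathrm{wt}(h) = \mathrm{wt}(h_1) + \mathrm{wt}(h_2)$ is even.
By replacing the repeated $\mathrm{RM}_{(r,r)}$ with a random code such that its dual code has codewords of odd Hamming weight, we can force the dual of the public code to have codewords with odd Hamming weight.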
Clearly, the dual code of RM (r,r) is {0}. We replace RM (r,r) with a random (2 r , k rep ) code. We note that the dual code of this (2 r , k rep ) code must have codewords with odd Hamming weight. The generator matrix is modified in this manner, rather than by appending rows to the parity check matrix, to ensure that the entire code is decodable.
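Proposition 2 and the effect of the replacement can be checked by brute force on a toy instance. In the following Python sketch, the function names and the particular (4, 2) replacement code are hypothetical choices for illustration (with r = 2, so that $2^r = 4$ and $k_{rep} = 2^r - 2 = 2$), and the code is repeated only twice, as in the proposition.

```python
def dual_words(rows, n):
    """All vectors of length n orthogonal to every row (brute force, small n only)."""
    return [h for h in range(1 << n)
            if all(bin(h & g).count("1") % 2 == 0 for g in rows)]


def repeat_twice(rows, n):
    """Generator rows of the concatenated code {(c|c) : c in C}."""
    return [(g << n) | g for g in rows]


def all_even(words):
    """True if every word has even Hamming weight."""
    return all(bin(w).count("1") % 2 == 0 for w in words)


if __name__ == "__main__":
    n = 4
    full = [0b0001, 0b0010, 0b0100, 0b1000]  # F_2^4, i.e. RM(2, 2); its dual is {0}
    replaced = [0b0011, 0b0101]              # hypothetical (4, 2) replacement code

    # Dual of the repeated RM(2, 2): only even weights, as Proposition 2 predicts.
    print(all_even(dual_words(repeat_twice(full, n), 2 * n)))      # True
    # The replacement code has odd-weight dual codewords, and so does its repetition.
    print(all_even(dual_words(repeat_twice(replaced, n), 2 * n)))  # False
```

In this toy case, the dual of the replacement code contains the weight-1 word 1000, and padding it with zeros gives an odd-weight word in the dual of the repeated code.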

B. PUBLIC CODE INDISTINGUISHABILITY
In the EUF-CMA security proof, ρ c (D pub , D rand ) is required to be negligible, that is, the modified RM code distinguishing problem should be hard. As it is challenging to find the computational distance between public and random codes, in this section, we study the randomness of the public code and consider possible attacks.

1) PUBLIC CODE IS NOT A (U, U + V )-CODE
After random rows have been appended to the generator matrix of a (U, U + V)-code, the resulting code is unlikely to be a (U, U + V)-code. Considering the following proposition, it can be seen that with probability $O(2^{k_U - n/2})$, a (U, U + V)-code remains a (U, U + V)-code after a row has been appended to its generator matrix.
It is expected that attacking the modified RM code is difficult because the appended codewords change the algebraic structure of the code (i.e., the (U , U + V ) structure), there is considerable randomness, and there is currently no recovery algorithm.

2) DISTINGUISHING USING HULL
When a random row is appended to the generator matrix, it is unlikely to be included in the hull. For the appended row to lie in the hull, it should be a codeword of the dual code, and its inner product with itself should be zero. Hence, we append a codeword from the dual code to the generator matrix.
The appended row can be excluded if the attacker collects several independent codewords with Hamming weight 4 from the hull. However, the same process can be applied to any random code with a high-dimensional hull, after which only codewords whose Hamming weight is a multiple of 4 remain. Hence, this is not a valid distinguishing attack.
The hull of a random (U, U + V)-code is {0} when $k_U < k_V$ and is highly probable to have codewords of (u|u) form when $k_U \ge k_V$. However, the hull of an RM code is also an RM code, and in our case, the partial permutation randomizes its hull and retains its large dimension. As shown in Section VI, the hull is neither a subcode of the RM code nor a (U, U + V)-code. Moreover, most of the hull depends on the secret partial permutations $\sigma_p^1$ and $\sigma_p^2$.

C. SIGNATURE LEAKS
In the EUF-CMA security proof, it is required that $\rho(D_w, U_w)$ is a negligible function of the security parameter λ. If this is true, then the signature does not leak information. In several signature schemes, such as Durandal, SURF, and Wave, this is achieved and proved. In SURF and Wave, the rejection sampling method is applied to render $D_w$ indistinguishable.
To apply rejection sampling, the distribution of the decoding output should be known. In SURF and Wave, a simple and efficient decoding algorithm is used, and thus it is easy to find the distribution of the decoding output. However, in our case, the decoding output exhibits a high degree of randomness, and the structure of the decoder is complex. Therefore, it is difficult to analyze the distribution of the decoding output. Instead, we conduct a proof-of-concept implementation of the modified pqsigRM using SageMath. Then, we perform statistical randomness tests under NIST SP 800-22 [42] on the decoding output, and we compare the results with random errors in $\mathbb{F}_2^n$ with Hamming weight w. No significant difference is observed. However, it should be noted that the success of a statistical randomness test does not imply indistinguishability. Thus, the indistinguishability of the signature should be rigorously studied as future work.

VI. PARAMETER SELECTION

A. PARAMETER SETS
The constraint here is that n is a power of two. We can numerically find the feasible ranges of w once n and k are determined. If the security level λ is achieved in this range, we accept the value; otherwise, we increase n. Considering DOOM, a smaller value of w implies higher security. If w is so small that a large number of decoding iterations are required, we could reduce the partial permutation parameter p. The parameter p is at most n/4, and the characteristics of the codes are retained by lowering p to a certain degree. The method for obtaining the minimum values is described in the following subsection. The discussed state-of-the-art algorithm for DOOM is used as a basis for the parameters proposed in Table 2. We set $k_{app} = 2$ (the minimum value) and $k_{rep} = 2^r - 2$ (the maximum value). Regarding the key size, the public key is a parity check matrix given in the systematic form and requires $(n-k)k$ bits. The secret key does not include a scrambler matrix S because it can be obtained from $H'$, Q, and H. Moreover, $H'$ can be represented by $\sigma_p^1$, $\sigma_p^2$, the replacing code, and the appended rows. The comparison of parameter sets is given in Table 3.
The key size of the proposed modified pqsigRM is small compared to other algorithms. We note that this comparison is for reference only, as the actual parameters of each scheme vary with the trade-off against signing complexity and other factors. The security level in parallel-CFS is based on the generalized birthday algorithm [5], and the distinguisher for high-rate Goppa codes [4] is not considered. For detailed information, see [3] and [35].

B. STATISTICAL ANALYSIS FOR DETERMINING NUMBER OF PARTIAL PERMUTATIONS
If w is excessively small, there is a low probability of finding an error vector with Hamming weight less than or equal to w. We present two solutions. One is iterating until an appropriate error vector is obtained, and the other is improving the decoder. The number p of columns permuted in the partial permutation varies from 0 to n/4. From numerical analysis, it is demonstrated that small values of p result in low Hamming weight of the decoding output. However, it should be noted that when p = 0, the (U, U + V) part of the modified RM codes becomes identical to the RM code except that $\mathrm{RM}_{(r,r)}$ is replaced. Hence, we propose the lower bound of p that does not affect the randomness of the hull.
Regarding the modified RM code, its hull overlaps with (but is not a subset of) the original RM code. If the hull is a subset of the original RM code, and its dimension is large, the codeword of minimum Hamming weight of the original RM code may be included in the hull. Then, attacks such as the Minder-Shokrollahi attack may be applied using codewords with minimum Hamming weight. Therefore, to prevent attacks, the hull of the public code should not be a subset of the original RM code, and $\mathrm{hull}(C_{pub}) \setminus (\mathrm{RM}_{(r,m)}$ permuted by Q) should occupy a large portion of the hull, where $C_{pub}$ denotes the public code, and $\setminus$ denotes the relative complement.
As the permutation Q is not important for determining the parameter p, we ignore it in this subsection, and the term permutation refers to the partial permutations $\sigma_p^1$ and $\sigma_p^2$. When p = n/4, which implies that $\sigma_p^1$ and $\sigma_p^2$ are full permutations, the average dimension of the hull and the dimension of $\mathrm{hull}(C_{pub}) \setminus \mathrm{RM}_{(r,m)}$ are given in Table 4. The values may slightly change according to the permutation.
If p is small, the Hamming weight of the errors decreases. Hence, the signing time can be reduced by using partial permutation with p rather than full permutation. The aim is to find a smaller value for p while maintaining the dimension of $\mathrm{hull}(C_{pub}) \setminus \mathrm{RM}_{(r,m)}$ as large as that obtained by the full permutation. It can be seen that the average of the dimension of $\mathrm{hull}(C_{pub}) \setminus \mathrm{RM}_{(r,m)}$ tends to increase as p increases, and it is saturated when p is above a certain value, as in Figure 3. Specifically, the dimension of $\mathrm{hull}(C_{pub}) \setminus \mathrm{RM}_{(r,m)}$ is saturated when p is approximately equal to the average dimension of $\mathrm{hull}(C_{pub}) \setminus \mathrm{RM}_{(r,m)}$ with full permutation. Hence, we determine p as 130, 386, and 562 in Table 2.

VII. CONCLUSION
We introduced a new signature scheme, called modified pqsigRM, based on modified RM codes with partial permutation as well as row appending and replacement in the generator matrix. For any given syndrome, an error vector with a small Hamming weight can be obtained. Moreover, the decoding method achieves indistinguishability to some degree because it is collision-resistant. The proposed signature scheme resists all known attacks against cryptosystems based on the original RM codes. The partially permuted RM code improves the signature success condition in previous signature schemes such as CFS and can improve signing time and key size.
We further modified the RM code using row appending/replacement. The resulting code is expected to be indistinguishable from random codes with the same hull dimension; moreover, the decoding of the partially permuted RM code is maintained. Assuming indistinguishability and the hardness of DOOM with a high-dimensional hull, we proved the EUF-CMA security of the proposed signature scheme. The challenge of rigorously verifying these two assumptions will be addressed in the future.