Masta: An HE-Friendly Cipher Using Modular Arithmetic

The <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula> cipher, proposed by Dobraunig <italic>et al.</italic> (CRYPTO 2018), is an HE-friendly cipher enjoying the fewest ANDs per bit and the lowest ANDdepth among the existing ciphers. A novel feature of <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula> is that its affine layers are freshly and randomly generated for every encryption. In this paper, we propose a new variant of <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula>, dubbed <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula>. Similarly to <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula>, <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula> takes as input a (master) secret key and a nonce, and generates a keystream block for each counter. On the other hand, <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula> has two main differences from <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula>: <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula> uses modular arithmetic to support HE schemes over a non-binary plaintext space, and it uses a smaller number of random bits in the affine layers by defining them with finite field multiplication. In this way, <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula> outperforms <inline-formula> <tex-math notation="LaTeX">$\mathsf {Rasta}$ </tex-math></inline-formula> in a transciphering framework with <inline-formula> <tex-math notation="LaTeX">$\mathsf {BGV}/ \mathsf {FV}$ </tex-math></inline-formula>-style HE schemes. 
Our implementation shows that <inline-formula> <tex-math notation="LaTeX">$\mathsf {Masta}$ </tex-math></inline-formula> is 505 to 592 times faster in terms of the throughput on the client-side, while 4792 to 6986 times faster on the server-side.


I. INTRODUCTION
Recently, effective manipulation of large amounts of data has become one of the most important issues in IT-related industries. Smart devices measure and record immense amounts of data every day, and novel insights are being drawn from these data. However, in light of scalability, small companies are concerned about the cost of server maintenance. Cloud computing might mitigate their scalability concerns by outsourcing data and computation instead of maintaining their own servers. However, cloud computing carries privacy and security risks, including data exposure and misuse by the service provider. Conventional encryption schemes might efficiently address such security issues, but any computation over encrypted data would then be impossible. If the encryption scheme is homomorphic, then the cloud would be able to perform meaningful computations on the encrypted data, supporting a wide range of applications such as machine learning over a large amount of encrypted data.
Unfortunately, homomorphic encryption (HE) schemes commonly have two technical problems: speed and ciphertext expansion. The encryption/decryption time and the evaluation time of HE schemes are relatively slow compared to conventional encryption schemes. In particular, ciphertext expansion seems to be an intrinsic problem of homomorphic encryption due to the noise used in the encryption algorithm. Although the ciphertext expansion has been significantly reduced down to the order of hundreds in terms of the ratio of a ciphertext size to its plaintext size since the invention of the batching technique [32], it does not seem to be acceptable from a practical viewpoint. Furthermore, this ratio becomes even worse when it comes to encryption of a short message; encryption of a single bit might result in a ciphertext of a few megabytes.
Transciphering Framework. To address the issue of ciphertext expansion and client-side computational overload, a hybrid framework, also called a transciphering framework, has been proposed [48]. In the client-server model, a client encrypts a message $m$ using a symmetric cipher $E$ with a secret key $k$; this secret key is also encrypted using an HE algorithm $\mathsf{Enc}^{\mathsf{HE}}$. The resulting ciphertexts $c = E_k(m)$ and $\mathsf{Enc}^{\mathsf{HE}}(k)$ are stored by the server. When the server wants to compute $\mathsf{Enc}^{\mathsf{HE}}(m)$ (for computation over encrypted data), it first computes $\mathsf{Enc}^{\mathsf{HE}}(c)$ for the corresponding ciphertext $c$. Then the server homomorphically evaluates $E^{-1}$ over $\mathsf{Enc}^{\mathsf{HE}}(c)$ and $\mathsf{Enc}^{\mathsf{HE}}(k)$, securely obtaining $\mathsf{Enc}^{\mathsf{HE}}(m)$.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Given a symmetric cipher with low multiplicative depth and complexity, this framework has the following advantages on the client side.
• A client does not need to encrypt all its data using an HE algorithm (except the symmetric key). All the data can be encrypted using only a symmetric cipher, significantly saving computational resources in terms of time and memory.
• Symmetric encryption does not result in ciphertext expansion, so the communication overhead between the client and the server will be significantly lower than when using a homomorphic encryption scheme alone.
All these merits come at the cost of computational overload on the server side. That said, this trade-off would be worth considering in practice since servers are typically more powerful than clients.
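The data flow of the framework can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `MockHE` is a hypothetical stand-in that wraps a plaintext value and provides no security, and `toy_keystream` is a made-up low-degree keystream (not Masta). The point is that the server recovers an "encryption" of $m$ by evaluating the keystream circuit on the encrypted key and subtracting it from the symmetric ciphertext.

```python
T = 65537  # plaintext modulus; the value 2^16 + 1 is used later in the paper


def _val(x):
    """Unwrap a MockHE object, or pass a plain integer through."""
    return x.value if isinstance(x, MockHE) else x


class MockHE:
    """Hypothetical stand-in for an HE ciphertext: it only wraps the value.
    It offers NO security and exists to make the data flow executable."""

    def __init__(self, value):
        self.value = value % T

    def __add__(self, other):
        return MockHE(self.value + _val(other))

    def __sub__(self, other):
        return MockHE(self.value - _val(other))

    def __mul__(self, other):
        return MockHE(self.value * _val(other))


def toy_keystream(key, counter):
    """Made-up degree-2 keystream (NOT Masta); works on ints and on MockHE."""
    k0, k1 = key
    return k0 * k1 + k0 * (counter + 1) + k1


def client_encrypt(key, messages):
    """Client side: cheap symmetric encryption, no ciphertext expansion."""
    return [(m + toy_keystream(key, i)) % T for i, m in enumerate(messages)]


def server_transcipher(enc_key, ciphertexts):
    """Server side: evaluate the keystream on the encrypted key, then subtract."""
    return [MockHE(c) - toy_keystream(enc_key, i) for i, c in enumerate(ciphertexts)]


key = (123, 45678)
messages = [1, 2, 65536]
symmetric_cts = client_encrypt(key, messages)
enc_key = (MockHE(key[0]), MockHE(key[1]))      # sent once by the client
he_cts = server_transcipher(enc_key, symmetric_cts)
```

In a real deployment the `MockHE` operations would be homomorphic additions and multiplications, which is why the multiplicative depth and complexity of the keystream circuit dominate the server-side cost.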
HE-friendly Cipher. Symmetric ciphers are built on top of linear and non-linear primitives, and in a conventional environment, there has been no significant difference between the two types of primitives in terms of their implementation cost. However, when combined with BGV/FV-style HE schemes in a transciphering framework, the situation changes. Homomorphic addition is far cheaper than homomorphic multiplication in terms of computation time and noise growth.
With this observation, an HE-friendly cipher is evaluated by its multiplicative complexity and depth. In an arithmetic circuit, the multiplicative complexity is the number of multiplications (ANDs in the binary case). Since homomorphic multiplication is much slower than homomorphic addition, a smaller number of multiplications is desirable. Multiplicative depth is the depth of the tree that represents the arithmetic circuit, and it is closely related to the noise growth in the HE ciphertexts. These two metrics bring a new direction in the design of symmetric ciphers: to use simple non-linear layers at the cost of highly randomized linear layers.
Rasta and Its Drawbacks. The Rasta stream cipher [23] is one of the latest HE-friendly ciphers, which offers the fewest AND/bits and the lowest ANDdepth among the existing ciphers. It features an ASASA-like stream cipher, where affine layers are freshly generated for every encryption. Although this novel construction provides good results in terms of the cost metrics for HE-friendly ciphers, there are two drawbacks that limit its practical relevance.
The first drawback is the use of a binary plaintext space. Most HE schemes operate on a specific class of rings, basing their security on the hardness assumption of the Ring Learning-With-Errors (RLWE) problem. The choice of the ring affects the overall security and efficiency of the HE scheme. For example, one can use a batching technique (an encoding technique enabling SIMD-like operations) only by carefully choosing the ring and the plaintext space. With a binary plaintext space, the maximum number of slots (i.e., bits which can be packed in a single ciphertext) is significantly smaller than the maximum number of slots in the non-binary case. In particular, with a binary plaintext space the batching technique is not available in a power-of-two cyclotomic ring, which is one of the most widely used rings in RLWE-based cryptography. Practically, this leads to a loss of efficiency in the server-side computation.
The second drawback is that Rasta incurs a heavy computational cost on the client side due to the generation of a huge number of pseudorandom bits for its affine layers. To efficiently prevent algebraic attacks with a small number of rounds, Rasta generates fresh affine layers for every keystream generation. Numerically, a 128-bit secure Rasta (with $n = 525$ and $r = 5$) generates 202 KB of pseudorandom bits to encrypt a plaintext of 525 bits. This costs millions of cycles, increasing the computational cost on the client side.

A. OUR CONTRIBUTION
The main contribution of this paper is to propose a new variant of Rasta, dubbed Masta. The Masta cipher takes as input a (master) secret key and a nonce, and generates a keystream block for each counter. The design principle behind our construction is two-fold: using modular arithmetic to support HE schemes over a non-binary plaintext space, and reducing random bits used in the affine layers by defining them with finite field multiplication. In this way, we aim at improving its performance in a transciphering framework with BGV/FV-style HE schemes.
Using Modular Arithmetic. The plaintext space of Masta is given as $\mathbb{Z}_t^n$, where $t$ is chosen such that it allows full batching on the power-of-two cyclotomic ring of the HE scheme. The encryption of Masta is defined by using modular operations (such as multiplication and addition modulo $t$). This arithmetic makes Masta outperform Rasta when combined with BGV/FV-style HE schemes. Based on a comprehensive analysis of our non-binary cipher, we will carefully specify the parameters, and compare the performance of Masta to Rasta for such parameters.
Reducing Randomness in the Affine Layers. Constant multiplication in a finite field $\mathbb{F}_{t^n}$ can be represented as a linear map on $\mathbb{Z}_t^n$. In Masta, affine layers are generated by choosing a random element in $\mathbb{F}_{t^n}$. In this way, a single encryption requires only $O(\log M)$ pseudorandom bits, significantly fewer than the $O(\log^2 M)$ bits for Rasta, where $M$ denotes the size of the plaintext space for each cipher. We will show that this technique does not degrade the security of the cipher against conventional attacks such as differential and linear cryptanalysis as well as algebraic attacks.
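As a rough illustration of this gap, the sketch below counts the pseudorandom bits consumed by the affine layers alone. It assumes that Rasta draws $r + 1$ binary $n \times n$ matrices plus $n$-bit constants with no rejection sampling, and that Masta draws two vectors of $n$ 16-bit words per layer; the function names and the Masta block size $n = 64$ are hypothetical choices made for the example.

```python
def rasta_affine_bits(n, r):
    """r + 1 affine layers, each an n x n binary matrix plus an n-bit constant.
    Assumption: no extra bits are spent on rejecting singular matrices."""
    return (r + 1) * (n * n + n)


def masta_affine_bits(n, r, bits_per_word=16):
    """r + 1 affine layers, each defined by two vectors a, b of n words.
    Assumption: one 16-bit sample per word, as suggested for t = 2^16 + 1."""
    return (r + 1) * 2 * n * bits_per_word


rasta_bits = rasta_affine_bits(525, 5)      # 1656900 bits
rasta_kib = rasta_bits / 8 / 1024           # about 202.3 KiB
masta_bits = masta_affine_bits(64, 5)       # 12288 bits for the toy size
```

Under these assumptions the Rasta count reproduces the 202 KB figure quoted in the introduction for $n = 525$ and $r = 5$, while the Masta count grows only linearly in $n$.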
Implementation Results. We implemented Masta on both the client-side (with arithmetics on plaintexts) and the server-side (with arithmetics on HE-ciphertexts), and compared it to Rasta. We put our focus on speeds in both environments, and the results are summarized in Table 1.
In this table, client-side latency and throughput refer to latency and reciprocal of throughput on the client-side environment, respectively. Server-side latency and throughput refer to latency and throughput on the server-side environment, respectively. Ratio refers to the ratio of the speed of Masta to that of Rasta, so that ratio larger than 1 implies that Masta is faster than Rasta. Overall, Masta outperforms Rasta over all the sets of parameters. We note that Masta has not been fully optimized in order to make a fair comparison of Masta to Rasta, implemented with its public source code, which is not a fully optimized version. We provide fully optimized benchmarks of Masta in Section IV.

B. RELATED WORK
The transciphering framework was first proposed in [48]. In this framework, the circuit of the AES block cipher is homomorphically evaluated [32]. This work was followed by the implementation of lightweight block ciphers SIMON [41] and PRINCE [25]. Since these ciphers were not designed for the transciphering framework, the performance of the homomorphic evaluation was not satisfactory. In this line of research, low multiplicative complexity and depth became an important design principle, and LowMC is the first construction based on this design principle. However, it turned out that LowMC-80 and LowMC-128 are vulnerable to algebraic attacks and their variants [21], [24], [51].
Canteaut et al. claimed that stream ciphers might be advantageous in terms of online complexity compared to block ciphers, and proposed a new stream cipher Kreyvium [12]. However, its practical relevance is limited since the multiplicative depth (with respect to the secret key) keeps growing as keystreams are generated. A new stream cipher FLIP [47] is based on a novel design strategy in which the permutation layer is randomly generated for every encryption without increasing the algebraic degree with respect to the secret key. Rasta [23] is a stream cipher aiming at higher throughput at the cost of higher latency, using random affine layers defined by an extendable output function (XOF).

II. PRELIMINARIES

A. NOTATION
Throughout the paper, bold lowercase letters (resp. bold uppercase letters) denote vectors (resp. matrices). The usual dot product is denoted by $\langle \cdot, \cdot \rangle$. For two vectors (strings) $\mathbf{a}$ and $\mathbf{b}$, their concatenation is denoted $\mathbf{a} \| \mathbf{b}$. For an integer $q$, we identify $\mathbb{Z}_q$ with $\mathbb{Z} \cap (-q/2, q/2]$; for any integer $z$, $[z]_q$ denotes the mod $q$ reduction of $z$ into this interval. The notation $[\cdot]_q$ is extended to vectors (resp. polynomials) to denote their component-wise (resp. coefficient-wise) reduction. Throughout the paper, $\xi$ denotes a $2N$-th primitive root of unity over the finite field $\mathbb{Z}_t$, for fixed parameters $N$ and $t$. We assume that $t \equiv 1 \pmod{2N}$.
For a set $S$, we will write $a \leftarrow S$ to denote that $a$ is chosen from $S$ uniformly at random. For a probability distribution $D$, $a \leftarrow D$ will denote that $a$ is sampled according to the distribution $D$. Unless stated otherwise, all logarithms are to the base 2.

B. BGV SCHEME
We will briefly review the BGV homomorphic encryption scheme with a special case: power-of-two cyclotomic ring. For more details, we refer to [9].
For a positive integer $q$, an integer $M$ which is a power of two, and $N = M/2$, BGV uses $R_q = \mathbb{Z}_q[X]/(\Phi_M(X))$ as the base ring of its ciphertext space, where $\Phi_M(X) = X^N + 1$. We begin with its underlying hard problem.
LWE and RLWE. Let $n$ and $q$ be positive integers, and let $D$ be a probability distribution over $\mathbb{Z}$. For an unknown vector $\mathbf{s} \in \mathbb{Z}_q^n$, the LWE (Learning with Errors) distribution $A^{\mathsf{LWE}}_{n,q,D}(\mathbf{s})$ over $\mathbb{Z}_q^n \times \mathbb{Z}_q$ is obtained by sampling a vector $\mathbf{a}$ uniformly at random from $\mathbb{Z}_q^n$ and an error $e$ according to $D$, and outputting
$$(\mathbf{a},\ b = [\langle \mathbf{a}, \mathbf{s} \rangle + e]_q).$$
The search-LWE problem is to find $\mathbf{s} \in \mathbb{Z}_q^n$ when independent samples $(\mathbf{a}_i, b_i)$ are obtained according to $A^{\mathsf{LWE}}_{n,q,D}(\mathbf{s})$. The decision-LWE problem is to distinguish the distribution $A^{\mathsf{LWE}}_{n,q,D}(\mathbf{s})$ from the uniform distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_q$.
Lyubashevsky et al. [45] introduced the ring version of the LWE problem, which is also called Ring-LWE (RLWE). For a positive integer $M$, let $\Phi_M(X)$ be the $M$-th cyclotomic polynomial, $R = \mathbb{Z}[X]/(\Phi_M(X))$ and $R_q = R/qR$. The (decision) RLWE problem is to distinguish samples $(a, [a \cdot s + e]_q) \in R_q^2$ from the uniform distribution over $R_q^2$, where $s \in R$ is a secret polynomial, $a$ is sampled uniformly at random from $R_q$ and $e$ is sampled according to a certain error distribution over $R$. The security of BGV is based on the hardness assumption of the RLWE problem.
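Encoding and Decoding. The encoder $\mathsf{Ecd}$ maps a vector of slot values over $\mathbb{Z}_t$ to a plaintext polynomial in $R_t$ by means of the root of unity $\xi$, and the decoder is its inverse.
Key generation. Given a security parameter $\lambda > 0$, fix integers $N$, $P$, and $q_0, \ldots, q_L$ such that $q_i$ divides $q_{i+1}$ for $0 \le i \le L - 1$, and distributions $D_{key}$, $D_{err}$ and $D_{enc}$ over $R$ in a way that the resulting scheme is secure against any adversary with computational resource of $O(2^\lambda)$.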
1) Sample $a \leftarrow R_{q_L}$, $s \leftarrow D_{key}$, and $e \leftarrow D_{err}$.
2) The secret key is defined as $sk = (s, 1) \in R^2$, and the corresponding public key $pk$ is computed from $a$, $s$ and $e$.
Encryption. Given a public key $pk$ and a plaintext $m \in R_t$, sample $r \leftarrow D_{enc}$ and fresh error terms $e_0, \ldots$, and use them to mask $m$ under $pk$.
Decryption. Given a secret key $sk \in R^2$ and a ciphertext $ct \in R_q^2$, the plaintext is recovered from the inner product of $ct$ and $sk$, reduced modulo $q$ and then modulo $t$.
Addition. Given ciphertexts $ct_1$ and $ct_2$ in $R_q^2$, their sum is defined component-wise modulo $q$.
Multiplication. Given ciphertexts $ct_1 = (a_1, b_1)$ and $ct_2 = (a_2, b_2)$ in $R_q^2$, their product is defined as the tensor product of the two pairs. After the basic multiplication, the resulting ciphertext has 3 ring elements.
For fully functional BGV, two additional procedures are required. The first procedure is key switching, which reduces the ciphertext to 2 ring elements. The second procedure is modulus switching. After multiplication, the error in the ciphertext is multiplied by $t^2$; modulus switching reduces this multiplier again. For the details, we refer the reader to [9].
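For orientation, one common textbook instantiation of these operations that is compatible with $sk = (s, 1)$ is sketched below. The exact normalization (sign conventions and where the factor $t$ is placed) is an assumption made for this sketch and may differ from the formulation in [9].

```latex
\begin{align*}
\text{KeyGen:}\quad & pk = (a,\ b), \qquad b = [-a \cdot s + t \cdot e]_{q_L} \\
\text{Enc:}\quad & ct = \bigl[\, r \cdot pk + (t \cdot e_0,\ m + t \cdot e_1) \,\bigr]_{q_L} \\
\text{Dec:}\quad & m = \bigl[\,[\langle ct, sk \rangle]_q\,\bigr]_t
      = \bigl[\,[a' \cdot s + b']_q\,\bigr]_t \quad \text{for } ct = (a', b') \\
\text{Add:}\quad & ct_1 + ct_2 = [\,(a_1 + a_2,\ b_1 + b_2)\,]_q \\
\text{Mult:}\quad & ct_1 \cdot ct_2 = [\,(a_1 a_2,\ a_1 b_2 + a_2 b_1,\ b_1 b_2)\,]_q
\end{align*}
% Correctness of Enc/Dec: a' s + b' = m + t (e_0 s + r e + e_1) (mod q).
% The product decrypts under (s^2, s, 1), since
% (a_1 s + b_1)(a_2 s + b_2) = a_1 a_2 s^2 + (a_1 b_2 + a_2 b_1) s + b_1 b_2.
```

With this convention, decryption succeeds as long as the accumulated noise term stays below $q/2$ in absolute value, which is what key switching and modulus switching are meant to maintain.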

C. RASTA
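The Rasta stream cipher of $r$ rounds, denoted $\mathsf{Rasta}_r$, takes as input a secret key $\mathbf{k} \in \mathbb{Z}_2^n$ and a nonce $nc$, and returns a keystream block $\mathbf{k}_{nc} \in \mathbb{Z}_2^n$, where $nc$ is fed to an extendable output function XOF, and the output binary sequence from XOF defines invertible affine layers $\mathsf{Affine}_i$, $i = 0, \ldots, r$.
In the non-linear layers of Rasta, the $\chi$-transformation [20] is used. So Rasta computes its keystream block by applying $\mathsf{Affine}_i$ and $\chi$ in an interleaved manner, followed by the final key addition step, denoted AddKey (see Figure 1):
$$\mathsf{Rasta}_r(\mathbf{k}, nc) = \mathsf{AddKey} \circ \mathsf{Affine}_r \circ \chi \circ \mathsf{Affine}_{r-1} \circ \cdots \circ \chi \circ \mathsf{Affine}_0(\mathbf{k}).$$
Each affine layer $\mathsf{Affine}_i$ is defined by an invertible matrix $A^{(i)} \in \mathbb{Z}_2^{n \times n}$ and a vector $\mathbf{b}^{(i)} \in \mathbb{Z}_2^n$, namely,
$$\mathsf{Affine}_i(\mathbf{x}) = A^{(i)} \cdot \mathbf{x} + \mathbf{b}^{(i)}$$
for $\mathbf{x} \in \mathbb{Z}_2^n$. All the bits of the $A^{(i)}$'s and $\mathbf{b}^{(i)}$'s are defined by the sequence from the extendable output function XOF using $nc$.
Final Key Addition. The secret key $\mathbf{k} \in \mathbb{Z}_2^n$ is added to the output from the final affine layer. This step can be defined as follows.
$$\mathsf{AddKey}(\mathbf{x}) = \mathbf{x} + \mathbf{k}.$$
1 A primitive root of unity $\xi$ exists if the characteristic $t$ of the message space is an odd prime such that $t \equiv 1 \pmod M$.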

III. A NEW STREAM CIPHER: MASTA
In this section, we propose a $\mathbb{Z}_t$-variant of Rasta [23], dubbed Masta, and analyze its security.

A. DESIGN RATIONALE
Why Non-binary Rings? Server-side efficiency heavily depends on careful selection of HE parameters since the server will spend most of its time in homomorphic evaluation. A homomorphic encryption scheme is typically based on an RLWE problem over a ring $R/qR$, where $R = \mathbb{Z}[X]/(f(X))$ and $f(X)$ is a monic irreducible polynomial over $\mathbb{Z}$. If $f(X)$ is a cyclotomic polynomial whose degree is a power of two, then $R/qR$ is called a power-of-two cyclotomic ring, and such rings are widely used in RLWE-based schemes due to their merits in efficiency and security (e.g., [2], [5], [9], [27], [29]). A plaintext space $\mathbb{Z}_t^n$ and a modulus polynomial $f(X)$ determine the number of distinct roots of $f(X)$ in $\mathbb{Z}_t$, and hence the number of slots in the HE scheme. However, over a power-of-two cyclotomic ring with characteristic two (i.e., $t = 2$), one cannot use the batching technique since
$$X^N + 1 \equiv (X + 1)^N \pmod 2,$$
so the modulus polynomial has only a single repeated root. For this reason, we opted for non-binary rings and the corresponding modular operations in the design of our cipher.
Generating Affine Layers with Smaller Randomness. In Rasta, there are two major bottlenecks with respect to its efficiency: generating large $n \times n$ binary matrices and checking whether they are invertible (or, in a better way, one can generate lower and upper triangular matrices and multiply them). This part amounts to over 99% of the total encryption time as shown in Table 2.
In order to address this issue, we identify the plaintext space of Masta with a finite field $\mathbb{F}_{t^n}$, and define each affine layer by choosing a random nonzero element from $\mathbb{F}_{t^n}$ and multiplying the state by it. Any nonzero element ensures the invertibility of the affine layer. This approach quadratically speeds up the generation of affine layers. Through cryptanalysis, we show that the weaker randomness used in our design does not significantly degrade the overall security of our cipher.
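The batching argument can be checked numerically on small parameters. The brute-force sketch below counts the distinct roots of $X^N + 1$ modulo $t$; the helper name `count_roots` and the toy parameters are chosen only for illustration.

```python
def count_roots(N, t):
    """Number of distinct x in Z_t with x^N + 1 = 0 (mod t), by brute force."""
    return sum(1 for x in range(t) if (pow(x, N, t) + 1) % t == 0)


# t = 17 satisfies t = 1 (mod 2N) for N = 8, so X^8 + 1 splits completely.
roots_mod_17 = count_roots(8, 17)        # 8 distinct roots, i.e. full batching
# Over Z_2 the same polynomial equals (X + 1)^8 and has a single root.
roots_mod_2 = count_roots(8, 2)          # 1
# The modulus used later in the paper, with a toy degree N = 64.
roots_mod_fermat = count_roots(64, 65537)
```

Each distinct root corresponds to one plaintext slot, so the non-binary moduli give $N$ slots where the binary modulus gives only one.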

B. SPECIFICATION
The Masta stream cipher of $r$ rounds, denoted $\mathsf{Masta}_r$, takes as input a secret key $\mathbf{k} \in \mathbb{Z}_t^n$ and a nonce $nc$, and returns a keystream block $\mathbf{k}_{nc} \in \mathbb{Z}_t^n$, where $nc$ is fed to an extendable output function XOF, and the output binary sequence from XOF defines invertible affine layers $\mathsf{Affine}_i$, $i = 0, \ldots, r$. So the structure of Masta is similar to Rasta; Masta computes its keystream block by applying $\mathsf{Affine}_i$ and the $\chi$-transformation in an interleaved manner, followed by the final key addition step AddKey:
$$\mathsf{Masta}_r(\mathbf{k}, nc) = \mathsf{AddKey} \circ \mathsf{Affine}_r \circ \chi \circ \mathsf{Affine}_{r-1} \circ \cdots \circ \chi \circ \mathsf{Affine}_0(\mathbf{k}).$$
Affine Layers. An extendable output function XOF will be used in the generation of the affine layers of Masta. In practice, it can be instantiated with AES in the counter mode or a sponge-type hash function. Since XOF does not use any secret key, it is allowed to have a high algebraic degree; it is the degree of the cipher with respect to $\mathbf{k}$ that affects the overall performance of the cipher in the hybrid framework. We also note that the sequence from XOF can be precomputed on the client side.
Each affine layer $\mathsf{Affine}_i$, $i = 0, 1, \ldots, r$, is decomposed into two steps, which represent a linear transformation $A^{(i)} : \mathbb{Z}_t^n \to \mathbb{Z}_t^n$ and constant addition by $\mathbf{b}^{(i)} \in \mathbb{Z}_t^n$, respectively. Namely,
$$\mathsf{Affine}_i(\mathbf{x}) = A^{(i)}(\mathbf{x}) + \mathbf{b}^{(i)}$$
for $\mathbf{x} \in \mathbb{Z}_t^n$ and $i = 0, 1, \ldots, r$. Similarly to Rasta, both $A^{(i)}$ and $\mathbf{b}^{(i)}$ are generated by XOF, while, for Masta, $A^{(i)}$ is defined as finite field multiplication by a random element $\mathbf{a}^{(i)}$ of $\mathbb{F}_{t^n}$ (being identified with $\mathbb{Z}_t^n$), i.e., $A^{(i)}(\mathbf{x}) = \mathbf{a}^{(i)} \cdot \mathbf{x}$ (see Figure 2).
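The finite field $\mathbb{F}_{t^n}$ is constructed by extending $\mathbb{F}_t$ using an irreducible polynomial $f(X)$ of degree $n$ over $\mathbb{F}_t$. In order to make the field multiplication as simple as possible, we will use an irreducible polynomial with the smallest number of nonzero coefficients, namely, $f(X) = X^n - \alpha$ for some $\alpha \in \mathbb{Z}_t$. The following theorem guarantees existence of such irreducible polynomials.
Theorem 1 (Theorem 3.75 in [42]): For a prime power $q$, let $\alpha \in \mathbb{F}_q^\times$ with order $e$. Then $X^n - \alpha$ is irreducible in $\mathbb{F}_q[X]$ if and only if the integer $n \ge 2$ satisfies the following conditions: (i) each prime factor of $n$ divides $e$, but not $(q - 1)/e$, (ii) if 4 divides $n$ then 4 divides $q - 1$.
Let $t$ be a prime of the form $2^m + 1$ for some positive integer $m$, and let $n$ be a power of two. Then a generator $\alpha$ of the multiplicative group $\mathbb{Z}_t^\times$ satisfies all the above conditions. Once $\mathbb{Z}_t^n$ and $\mathbb{F}_{t^n}$ are identified with respect to an irreducible polynomial $f(X) = X^n - \alpha$, field multiplication by $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ can be represented by an $n \times n$ matrix
$$L = \begin{pmatrix}
\alpha \cdot a_n & \alpha \cdot a_{n-1} & \cdots & \alpha \cdot a_2 & \alpha \cdot a_1 \\
a_1 & \alpha \cdot a_n & \cdots & \alpha \cdot a_3 & \alpha \cdot a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n-2} & a_{n-3} & \cdots & \alpha \cdot a_n & \alpha \cdot a_{n-1} \\
a_{n-1} & a_{n-2} & \cdots & a_1 & \alpha \cdot a_n
\end{pmatrix}, \qquad (1)$$
that is, $L_{i,j} = a_{i-j}$ for $i > j$ and $L_{i,j} = \alpha \cdot a_{n+i-j}$ for $i \le j$.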
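The two ingredients above can be exercised on small parameters. The sketch below checks the conditions of Theorem 1 for a prime modulus and compares a multiplication matrix with direct polynomial multiplication modulo $X^n - \alpha$. The entry layout ($L_{i,j} = a_{i-j}$ below the diagonal and $\alpha \cdot a_{n+i-j}$ on and above it) and the identification of $\mathbf{a}$ with $a_1 X + \cdots + a_n X^n$ and of $\mathbf{x}$ with $x_1 + x_2 X + \cdots + x_n X^{n-1}$ are assumptions inferred from the equations of the cryptanalysis section; all function names are hypothetical.

```python
import random


def order(alpha, q):
    """Multiplicative order of alpha modulo a prime q (brute force)."""
    e, x = 1, alpha % q
    while x != 1:
        x = x * alpha % q
        e += 1
    return e


def prime_factors(n):
    """Set of prime factors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def binomial_is_irreducible(n, alpha, q):
    """Conditions (i) and (ii) of Theorem 1 for X^n - alpha over F_q, q prime."""
    e = order(alpha, q)
    cond_i = all(e % p == 0 and ((q - 1) // e) % p != 0 for p in prime_factors(n))
    cond_ii = (n % 4 != 0) or ((q - 1) % 4 == 0)
    return n >= 2 and cond_i and cond_ii


def mult_matrix(a, alpha, t):
    """Assumed layout of (1): a[k-1] holds a_k."""
    n = len(a)
    return [[a[i - j - 1] % t if i > j else alpha * a[n + i - j - 1] % t
             for j in range(1, n + 1)] for i in range(1, n + 1)]


def mat_vec(L, x, t):
    return [sum(l * v for l, v in zip(row, x)) % t for row in L]


def field_mul(a, x, alpha, t):
    """(a_1 X + ... + a_n X^n) * (x_1 + ... + x_n X^(n-1)) mod (X^n - alpha)."""
    n = len(a)
    out = [0] * n
    for k in range(1, n + 1):          # a_k is the coefficient of X^k
        for j in range(1, n + 1):      # x_j is the coefficient of X^(j-1)
            e = k + j - 1
            if e >= n:                 # reduce with X^n = alpha
                out[e - n] = (out[e - n] + alpha * a[k - 1] * x[j - 1]) % t
            else:
                out[e] = (out[e] + a[k - 1] * x[j - 1]) % t
    return out


rng = random.Random(0)
t, n, alpha = 65537, 16, 3
a = [rng.randrange(1, t) for _ in range(n)]
x = [rng.randrange(t) for _ in range(n)]
same = mat_vec(mult_matrix(a, alpha, t), x, t) == field_mul(a, x, alpha, t)
```

Because the matrix is fully determined by the $n$ entries of $\mathbf{a}$, only $n$ random elements of $\mathbb{Z}_t$ are needed per linear layer, instead of $n^2$ entries for a generic matrix.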
All the vectors $\mathbf{a}^{(i)}$ and $\mathbf{b}^{(i)}$ are defined by the sequence from XOF, so one can assume that all the vectors are random and independent over the rounds. In this paper, we will not specify any correspondence between the pseudorandom sequence and the affine layers; all the analyses are based on the assumption that all the vectors are truly random and independent.
For simplicity of analysis, we will assume that all entries of the vectors $\mathbf{a}^{(i)}$ and $\mathbf{b}^{(i)}$ are nonzero; if $t = 2^{16} + 1$ (as used in our implementation), then one can randomly choose a 16-bit element, and add 1 to the element modulo $t$. The result will always be a nonzero element of $\mathbb{Z}_t$.
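Non-linear Layers. Masta uses a $\mathbb{Z}_t$-variant of the $\chi$-transformation [20] as a single non-linear layer. Let $\mathbf{x} = (x_1, \ldots, x_n)$, where $x_i \in \mathbb{Z}_t$ for $i = 1, 2, \ldots, n$. Then the $i$-th component of $\chi(\mathbf{x})$, denoted $\chi(\mathbf{x})_i$, is defined as follows.
$$\chi(\mathbf{x})_i = x_i + x_{i+1} \cdot x_{i+2} + x_{i+2},$$
where the indices are taken modulo $n$.
Final Key Addition. The secret key $\mathbf{k} \in \mathbb{Z}_t^n$ is added to the output from the final affine layer. This step can be defined as follows.
$$\mathsf{AddKey}(\mathbf{x}) = \mathbf{x} + \mathbf{k} \pmod t.$$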
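Putting the layers together, a keystream generator in the spirit of this specification can be sketched as follows. Several points are assumptions rather than part of the specification: the mapping from the SHAKE256 output to the vectors $\mathbf{a}^{(i)}$ and $\mathbf{b}^{(i)}$ (the paper leaves it unspecified), the use of a counter in the XOF input, the toy parameters $n = 16$ and $r = 5$, the coefficient ordering in `field_mul`, the $\chi$ variant $x_i + x_{i+1} x_{i+2} + x_{i+2}$, and encryption by adding the keystream modulo $t$.

```python
import hashlib

T = 65537        # plaintext modulus t = 2^16 + 1
ALPHA = 3        # field modulus X^n - 3
N_WORDS = 16     # block size n (toy choice)
ROUNDS = 5       # number of rounds r (toy choice)


def xof_vectors(nonce, counter):
    """Derive a^(0), b^(0), ..., a^(r), b^(r) with nonzero entries from SHAKE256.
    The byte layout is an assumption made for this sketch."""
    need = 2 * (ROUNDS + 1) * N_WORDS
    stream = hashlib.shake_256(nonce + counter.to_bytes(8, "big")).digest(2 * need)
    # A 16-bit sample plus 1 lies in [1, 65536], hence is nonzero modulo T.
    words = [int.from_bytes(stream[2 * i:2 * i + 2], "big") + 1 for i in range(need)]
    return [words[i * N_WORDS:(i + 1) * N_WORDS] for i in range(2 * (ROUNDS + 1))]


def field_mul(a, x):
    """Multiply in Z_T[X]/(X^n - ALPHA); a_k sits at X^k, x_j at X^(j-1)."""
    out = [0] * N_WORDS
    for k in range(1, N_WORDS + 1):
        for j in range(1, N_WORDS + 1):
            e = k + j - 1
            if e >= N_WORDS:
                out[e - N_WORDS] = (out[e - N_WORDS] + ALPHA * a[k - 1] * x[j - 1]) % T
            else:
                out[e] = (out[e] + a[k - 1] * x[j - 1]) % T
    return out


def chi(x):
    """Assumed Z_t variant of chi: x_i + x_{i+1} * x_{i+2} + x_{i+2}."""
    n = len(x)
    return [(x[i] + x[(i + 1) % n] * x[(i + 2) % n] + x[(i + 2) % n]) % T
            for i in range(n)]


def masta_keystream(key, nonce, counter=0):
    """AddKey o Affine_r o chi o ... o chi o Affine_0 applied to the key."""
    vecs = xof_vectors(nonce, counter)
    state = list(key)
    for i in range(ROUNDS + 1):
        a, b = vecs[2 * i], vecs[2 * i + 1]
        if i > 0:
            state = chi(state)
        state = [(u + v) % T for u, v in zip(field_mul(a, state), b)]
    return [(s + k) % T for s, k in zip(state, key)]


def encrypt(key, nonce, message, counter=0):
    ks = masta_keystream(key, nonce, counter)
    return [(m + s) % T for m, s in zip(message, ks)]


def decrypt(key, nonce, ciphertext, counter=0):
    ks = masta_keystream(key, nonce, counter)
    return [(c - s) % T for c, s in zip(ciphertext, ks)]
```

Only the $r$ applications of $\chi$ involve multiplications that depend on the key; the field multiplications use public constants derived from the nonce, which is what keeps the multiplicative depth with respect to the key at $r$.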

C. CRYPTANALYSIS
Modeling XOF as a random oracle, we assume that XOF outputs a truly random sequence for every distinct nonce.

1) ALGEBRAIC ATTACKS
The Masta cipher can be represented by a set of polynomials over $\mathbb{Z}_t$ in unknowns $k_1, \ldots, k_n$, where $k_i \in \mathbb{Z}_t$ denotes the $i$-th component of the secret key $\mathbf{k} \in \mathbb{Z}_t^n$. Since multiplication is more expensive than addition in HE schemes, most HE-friendly ciphers have been designed to have a low multiplicative depth. This property might make such ciphers vulnerable to algebraic attacks. Indeed, some recent constructions have been analyzed by algebraic attacks due to their low algebraic depth [1], [21], [28]. In this section, we will consider two different types of algebraic attacks: trivial linearization and the Gröbner basis attack.
Trivial Linearization. Trivial linearization makes the system of polynomial equations linear by replacing all monomials by new variables. When the cipher is represented by a system of polynomial equations of degree $d$ over $\mathbb{Z}_t$ in $n$ unknowns (and $d < t$), the number of monomials appearing in this system is upper bounded by
$$S = \binom{n + d}{d}.$$
Therefore, at most $S$ equations will be enough to solve this system of equations. If the system is sparse, then it would require fewer equations to solve the system. We will show that almost all the monomials appear after $r$ rounds of Masta, and hence this attack requires $O(S)$ data and $O(S^\omega)$ time, where $\omega$ is the linear algebra constant such that $2 \le \omega \le 3$.
An adversary might also try the guess-and-determine attack before trivial linearization. By guessing $g$ variables, the number of possible monomials is reduced down to
$$S_g = \binom{n - g + d}{d}.$$
This approach will be useful in particular when almost every monomial appears in the system. In this case, the overall time complexity becomes $O(t^g S_g^\omega)$.
Gröbner Basis Attack. The Gröbner basis attack solves a system of equations by computing a Gröbner basis of the system. If such a Gröbner basis is found, then the variables can be eliminated one by one. A Gröbner basis can be computed with low data, unlike the trivial linearization. However, its computation is slower than the trivial linearization when only a small amount of data is used. For this reason, the Gröbner basis attack will be useful (compared to the trivial linearization) when either the data is limited or the number of monomials grows faster than the number of equations. For an example of the second case, we refer to [1]. From the perspective of the cipher design, it is enough to set the parameters such that $O(S^\omega)$ is large enough.
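These cost estimates are easy to tabulate. The sketch below evaluates the monomial count and the base-2 logarithms of the two time complexities, assuming the count of monomials of degree at most $d$ in $n$ variables, $\binom{n+d}{d}$; the function names and the example parameters ($n = 16$, $r = 5$, hence $d = 2^5$) are illustrative assumptions and are not taken from Table 3.

```python
from itertools import combinations_with_replacement
from math import comb, log2


def num_monomials(n, d):
    """Monomials of degree at most d in n variables (valid when d < t)."""
    return comb(n + d, d)


def linearization_cost_log2(n, d, omega=2.0):
    """log2 of S^omega, the time estimate of trivial linearization."""
    return omega * log2(num_monomials(n, d))


def guess_and_determine_cost_log2(n, d, t, g, omega=2.0):
    """log2 of t^g * S_g^omega after guessing g of the n key variables."""
    return g * log2(t) + omega * log2(num_monomials(n - g, d))


def brute_force_monomials(n, d):
    """Enumerate the monomials explicitly, for cross-checking small cases."""
    return sum(len(list(combinations_with_replacement(range(n), k)))
               for k in range(d + 1))


n, d, t = 16, 2 ** 5, 2 ** 16 + 1
no_guess = linearization_cost_log2(n, d)
one_guess = guess_and_determine_cost_log2(n, d, t, g=1)
```

For a large modulus such as $t = 2^{16} + 1$, each guessed variable costs a factor of about $2^{16}$ while shrinking $S$ only by the factor $(n + d)/n$, so guessing does not pay off in this example.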
Number of Monomials. If the number of monomials is small, then algebraic attacks may work well with low data and time complexities. Here, we estimate the average number of monomials appearing in an r-round Masta.
Consider a single round $\chi \circ \mathsf{Affine}(\mathbf{x})$ of Masta with input $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathsf{Affine}(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} + \mathbf{b}$. Finite field multiplication by $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ is equivalent to multiplication by an $n \times n$ matrix $L$ as defined in (1). Let $L_{i,j}$ be the entry in the $i$-th row and the $j$-th column of the matrix $L$. Then, the $i$-th component of the output $\chi \circ \mathsf{Affine}(\mathbf{x})_i$ is given as
$$\chi \circ \mathsf{Affine}(\mathbf{x})_i = \sum_{j=1}^{n} u^{(i)}_j x_j^2 + \sum_{1 \le j < l \le n} v^{(i)}_{j,l} x_j x_l + \sum_{j=1}^{n} w^{(i)}_j x_j + c^{(i)},$$
where, with $\bar{b}_j = b_j + 1$ and indices taken modulo $n$,
$$u^{(i)}_j = L_{i+1,j} L_{i+2,j}, \quad v^{(i)}_{j,l} = L_{i+1,j} L_{i+2,l} + L_{i+1,l} L_{i+2,j}, \quad w^{(i)}_j = L_{i,j} + b_{i+2} L_{i+1,j} + \bar{b}_{i+1} L_{i+2,j},$$
and $c^{(i)} = b_i + \bar{b}_{i+1} b_{i+2}$. Since every $a_j$ is nonzero, $u^{(i)}_j \neq 0$ for every $i = 1, 2, \ldots, n$. This implies that every monomial $x_j^2$ appears in the output $\chi \circ \mathsf{Affine}(\mathbf{x})$.
When it comes to a monomial $x_j x_l$ of degree two with $j < l$, it does not appear in the output if and only if $v^{(i)}_{j,l} = 0$ for every $i = 1, 2, \ldots, n$. This is equivalent to the following system of equations:
$$\begin{aligned}
a_{n-j}\, a_{n-l+1} + a_{n-j+1}\, a_{n-l} &= 0 \\
a_{n-j+1}\, a_{n-l+2} + a_{n-j+2}\, a_{n-l+1} &= 0 \\
&\ \,\vdots \\
a_{n-1}\, a_{n-l+j} + a_n\, a_{n-l+j-1} &= 0 \\
\alpha \cdot a_n\, a_{n-l+j+1} + a_1\, a_{n-l+j} &= 0 \\
a_1\, a_{n-l+j+2} + a_2\, a_{n-l+j+1} &= 0 \\
a_2\, a_{n-l+j+3} + a_3\, a_{n-l+j+2} &= 0 \\
&\ \,\vdots \\
a_{l-j}\, a_1 + \alpha \cdot a_{l-j+1}\, a_n &= 0 \\
a_{l-j+1}\, a_2 + a_{l-j+2}\, a_1 &= 0 \\
a_{l-j+2}\, a_3 + a_{l-j+3}\, a_2 &= 0 \\
&\ \,\vdots \\
a_{n-j-2}\, a_{n-l-1} + a_{n-j-1}\, a_{n-l-2} &= 0 \\
a_{n-j-1}\, a_{n-l} + a_{n-j}\, a_{n-l-1} &= 0
\end{aligned}$$
Once $a_{n-j}$ and $a_{n-l} \in \mathbb{Z}_t^\times$ are fixed, $a_{n-j+1}$ is uniquely determined by $a_{n-l+1}$ in the first equation. Given $a_{n-j+1}$ and $a_{n-l+1}$, $a_{n-j+2}$ is uniquely determined by $a_{n-l+2}$ in the second equation. By repeating this process, one will find the smallest $k$ such that either $n - l + k = n - j$ or $n - j + k = n - l \bmod n$. From here, each equation uniquely determines the one remaining variable, unless a contradiction occurs. Therefore, there are at most $(t - 1)^{2 + \min\{l - j - 1,\ n + j - l - 1\}} \le (t - 1)^{2 + n/2}$ solutions and hence,
$$\Pr\left[x_j x_l \text{ does not appear}\right] \le \frac{(t - 1)^{2 + n/2}}{(t - 1)^n} = (t - 1)^{2 - n/2},$$
which is close to 0 for sufficiently large $n$ and $t$. In other words, all quadratic terms appear after a single round of Masta except with a negligible probability.
A monomial $x_j$ does not appear in the output if and only if $w^{(i)}_j = 0$ for every $i = 1, 2, \ldots, n$, which is equivalent to the following system of equations:
$$\begin{aligned}
\alpha \cdot a_{n-j+1} + \alpha \cdot b_3\, a_{n-j+2} + \alpha \cdot \bar{b}_2\, a_{n-j+3} &= 0 \\
\alpha \cdot a_{n-j+2} + \alpha \cdot b_4\, a_{n-j+3} + \alpha \cdot \bar{b}_3\, a_{n-j+4} &= 0 \\
&\ \,\vdots \\
\alpha \cdot a_{n-2} + \alpha \cdot b_j\, a_{n-1} + \alpha \cdot \bar{b}_{j-1}\, a_n &= 0 \\
\alpha \cdot a_{n-1} + \alpha \cdot b_{j+1}\, a_n + \bar{b}_j\, a_1 &= 0 \\
\alpha \cdot a_n + b_{j+2}\, a_1 + \bar{b}_{j+1}\, a_2 &= 0 \\
a_1 + b_{j+3}\, a_2 + \bar{b}_{j+2}\, a_3 &= 0 \\
&\ \,\vdots \\
a_{n-j} + \alpha \cdot b_2\, a_{n-j+1} + \alpha \cdot \bar{b}_1\, a_{n-j+2} &= 0
\end{aligned} \qquad (5)$$
where $\bar{b}_j = b_j + 1$ for $j = 1, 2, \ldots, n$. Once $a_{n-j+1}$ and $a_{n-j+2}$ are fixed, $a_{n-j+3}$ is uniquely determined in the first equation, provided that $\bar{b}_2 \neq 0$. When $\bar{b}_2 = 0$, one can choose an arbitrary value for $a_{n-j+3}$. For the second equation, $a_{n-j+4}$ is uniquely determined by $a_{n-j+2}$ and $a_{n-j+3}$, provided that $\bar{b}_3 \neq 0$. Repeating this process, we see that there are at most $(t - 1)^{p+2}$ solutions satisfying (5), where $p$ is the number of $\bar{b}_j$ such that $\bar{b}_j = 0$. Therefore, for a fixed $\mathbf{b}$,
$$\Pr\left[x_j \text{ does not appear}\right] \le \frac{(t - 1)^{p + 2}}{(t - 1)^n} = (t - 1)^{p + 2 - n}.$$
Since $\bar{b}_j = 0$ only when $b_j = t - 1$, $p$ is small with high probability. This probability is therefore close to 0 for sufficiently large $n$ and $t$, which implies that almost all linear terms appear after a single round except with a negligible probability. Since all the monomials of degree at most 2 appear after a single round with overwhelming probability, one can conclude that almost all monomials of degree at most $2^r$ appear after $r$ rounds of Masta.
Parameters. With respect to the algebraic attacks, recommended parameters are given in Table 3 for various security levels. These parameters have been computed using the estimation of $S^\omega$ with $\omega = 2$. On the other hand, guessing variables will not affect the security of Masta when $t = 2^{16} + 1$. For $t = 2^{16} + 1$, we can use $\alpha = 3$ as the smallest generator of the multiplicative group $\mathbb{Z}_t^\times$.

2) LINEAR ATTACKS
Linear cryptanalysis [46] is based on a linear approximation of a fixed (keyed) permutation using a sufficient number of plaintext-ciphertext pairs, while this attack does not apply to Masta since this cipher uses independent random affine layers for every encryption.
In [23], a variant of the linear attack has been considered to be applied to Rasta; an adversary constructs an overdefined system of linear approximations that are valid with a certain probability, and recovers the secret key by using the LPN-problem solver. Considering this attack, the block size and the algebraic depth of Rasta have been newly estimated for each security level.
One might want to apply this type of attack to Masta (operating on a vector space $\mathbb{Z}_t^n$) by using an LWE-problem solver instead of an LPN-problem solver. However, in the LWE problem, an error $e$ is sampled according to the Gaussian distribution, while the error introduced in the linear approximations of Masta will follow the uniform random distribution. For this reason, a straightforward application of the LWE-problem solver does not seem to work.

3) DIFFERENTIAL ATTACKS
The resistance of a substitution-permutation-cipher to differential cryptanalysis is typically estimated by analyzing the maximum probability of differential trails [4]. Such a trail is detected by obtaining ciphertexts for multiple pairs of plaintexts with a fixed difference. However, the affine layers of Masta are generated independently at random for every encryption. For this reason, similarly to Rasta, the classical differential attack will not be applicable to Masta.
All these attacks are based on the evaluation of a fixed permutation on a large number of plaintexts and ciphertexts. However, Masta uses different affine layers for every encryption, so such attacks would not be applicable to Masta.

IV. IMPLEMENTATION
In this section, we implement Masta and Rasta, and compare their performance. For selected parameter sets, we evaluate their performance in terms of speed on both the client-side and the server-side.
The following notation is used for the parameters.
• $r$: the number of rounds;
• $n$: the block size;
• $t$: the plaintext modulus;
• $q$: the initial ciphertext modulus of the BGV scheme;
• $N$: the degree of the polynomial modulus in the BGV scheme;
• the slot structure: the multi-dimensional array structure of the slots in a BGV plaintext;
• $\lambda$: the security level of the BGV scheme with the given parameters.

Client-side Computation. The comparison between Masta and Rasta on the client side is summarized in Table 4. The source code of Rasta is taken from the GitHub repository [50] referenced in the paper [23]. The total time consists of the time for affine layer generation and the encryption time, all measured in the number of cycles. All the affine layers and the round constants are generated by SHAKE256. The encryption time covers all the computation except the generation of affine layers. Throughput is the total time divided by the block size ($n \log t$ bits), reported in cycles per byte. 'Optimization level' specifies which part has been optimized. The first one is for SHAKE256, where 'AVX2' stands for the use of AVX2-optimized code in XKCP while 'Naive' stands for the use of the SHAKE256 code in the Rasta GitHub repository (which is not AVX2-optimized). The second one is for the encryption part, where 'Opt' stands for a fully optimized encryption code (including AVX2) and 'Naive' stands for a non-optimized code. For Rasta, we use the code in the repository as it is (without any further optimization).
Our result shows that Masta outperforms Rasta for every set of parameters. For a ''fair'' comparison with the ''Naive-Naive'' optimization level at the 80-bit security level, Masta is 214 to 645 times faster than Rasta in latency, and 443 to 571 times faster in throughput. As for fully optimized code, a benchmark of 900,000 cycles is reported in [23] for 219-bit 6-round Rasta (with the 80-bit security level). Although the microarchitectures are different, we can roughly compare it to 16-word 5-round Masta, which is about 100 times faster than Rasta in latency.
Server-side Computation. The server side is implemented using the HElib library. The performance of Masta and Rasta is summarized in Table 5; (a) uses tight BGV parameters for evaluating Masta and Rasta, and (b) uses BGV parameters allowing 7 additional multiplications after evaluating Masta and Rasta, as considered in [23].
As mentioned in the design rationale of Masta, power-of-two cyclotomic rings are used in our implementation of Masta. The ciphertext modulus $q$ is chosen to guarantee a security level of at least $\lambda$ bits and enough noise capacity to evaluate Masta. In the case of Rasta, the slot structure is also important in the choice of the BGV parameters. When the slot structure is $s \times d$, $s/n \times d$ blocks are packed in each BGV plaintext, so $s$ should be at least $n$.
In the tables, the columns 'Affine' and 'Non-linear' refer to the total time to evaluate affine and non-linear layers, respectively. Both affine and non-linear layers are implemented using addition, multiplication and rotation. The column 'Eval' refers to the total time to evaluate the given cipher, including time not only for affine and non-linear layers but also for the final key addition.
Our implementation shows that Masta is also better than Rasta in homomorphic evaluation. For the BGV parameters with tight capacity, the throughput of Masta is 3729 to 11758 times larger than Rasta with the 80-bit security level, while 3317 to 8531 times larger with the 128-bit security level. For the BGV parameters with additional capacity, the throughput of Masta is 3806 to 9608 times larger than Rasta.

V. CONCLUSION
In this paper, we proposed a new HE-friendly cipher Masta by modifying the Rasta cipher. Masta uses modular arithmetic in its encryption in order to support HE schemes over a non-binary plaintext space; it allows a larger number of slots and a smaller number of multiplications per bit on the server side. The affine layers of Masta are defined by finite field multiplication, which requires a smaller number of random bits compared to Rasta. By comprehensive analysis, we showed that this modification does not degrade the overall security of the cipher. We also implemented Masta and Rasta on the client and the server side to compare their performance. Compared to Rasta, Masta turns out to be hundreds of times faster on the client side, and thousands of times faster on the server side.