Some New Sequential-Recovery LRCs Based on Good Polynomials

We propose a new construction of sequential-recovery Locally Repairable Codes (LRCs) of length $n$ with even locality $r$ for two erasures, based on some ‘good’ polynomials, over a relatively small alphabet of size $q \approx \frac{(r+1)n}{r+2}$, which becomes rate optimal in some cases. We also derive an explicit form of the upper bound on the minimum distance of these codes with some additional constraints. The minimum distance of the proposed sequential-recovery LRCs for $r=2$ achieves this explicit bound when $k = \frac{n}{2}$ and is one less than the bound when $k < \frac{n}{2}$.


I. INTRODUCTION
For the reliability of distributed storage systems (DSSs), locally repairable codes (LRCs) have drawn much attention, since any erased symbol can be repaired from only a few other symbols. Let $C$ be an $[n, k, d]$ linear code over $\mathbb{F}_q$ with length $n$, dimension $k$, and minimum distance $d$. Let $c = (c_0, c_1, \ldots, c_{n-1})$ be a codeword of $C$. The code $C$ is said to be an LRC with locality $r$ [1] if, for each $i = 0, 1, \ldots, n-1$, the coded symbol $c_i$ is a linear combination of $r$ other symbols; such a code is denoted as an $[n, k, d, r]$ LRC.
In this paper, we consider LRCs for multiple erasures. They are divided into sequential- and parallel-recovery LRCs according to whether the repair process is sequential or parallel. Now, let $C$ be an $[n, k, d, r]$ LRC with a codeword $c$. Then it is said to be a $t$-sequential-recovery ($t$-seq) LRC [2] if, for any $s$ ($\le t$) erased symbols, there exists an arrangement of the $s$ erased positions given by $(j_0, j_1, \ldots, j_{s-1})$ such that, for each $l = 0, 1, \ldots, s-1$, there is a subset $R_l \subset \{0, 1, \ldots, n-1\}$ satisfying 1) $j_l \in R_l$ and $|R_l| \le r+1$, 2) $R_l \cap \{j_{l+1}, j_{l+2}, \ldots, j_{s-1}\} = \emptyset$, and 3) $c_{j_l} = \sum_{i \in R_l \setminus \{j_l\}} a_i c_i$ for some $a_i \in \mathbb{F}_q$.
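The definition above amounts to a peeling procedure: an erased position can be repaired as soon as some repair set contains it and contains no other position that is still erased. The following Python sketch checks this for a given collection of repair sets. The function name `sequential_repair_order` and the toy repair sets are illustrative assumptions, not taken from [2], and only the repair sets passed in are tried (an actual code may admit more).

```python
def sequential_repair_order(repair_sets, erased, r):
    """Return an order (j_0, ..., j_{s-1}) in which the erased positions can be
    repaired one at a time, or None if the given repair sets do not allow it."""
    # Condition 1): only repair sets of size at most r + 1 are admissible.
    usable = [frozenset(R) for R in repair_sets if len(R) <= r + 1]
    remaining = set(erased)
    order = []
    while remaining:
        for j in sorted(remaining):
            others = remaining - {j}
            # Condition 2): the repair set of j avoids every position still erased.
            if any(j in R and not (R & others) for R in usable):
                order.append(j)
                remaining.remove(j)
                break
        else:
            return None  # no erased position can be repaired next
    return order


# Hypothetical toy example with locality r = 2 and two repair sets.
toy_sets = [{0, 1, 2}, {2, 3, 4}]
print(sequential_repair_order(toy_sets, {1, 2}, r=2))
print(sequential_repair_order(toy_sets, {0, 1}, r=2))
```

In the toy example, positions 1 and 2 can be repaired in the order (2, 1), whereas positions 0 and 1 cannot be repaired because the only repair set containing either of them contains both.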
The $t$-parallel-recovery ($t$-para) $[n, k, d, r]$ LRCs [2] are defined similarly to the sequential-recovery LRCs by
It is obvious that every $t$-para LRC is also a $t$-seq LRC, since its erasures can be locally repaired under any arrangement. As shown in Fig. 1, various types of parallel-recovery LRCs have been proposed.
The associate editor coordinating the review of this manuscript and approving it for publication was Zilong Liu.
where $C|_{S_i}$ is the punctured code obtained from $C$ by deleting the code symbols $c_j$, $j \in \{1, 2, \ldots, n\} \setminus S_i$.
The sequential-recovery LRCs have a considerable advantage in erasure tolerance over the parallel-recovery LRCs with the same $n$ and $k$ [18]. Bounds on the code rate and/or the block length of $t$-seq LRCs were proposed in [2], [19]-[22]. In [2], binary rate-optimal 2-seq LRCs are constructed based on regular graphs. For any $t$, three classes of $t$-seq LRCs have been proposed: 1) a graph-based construction [19]; 2) a construction based on resolvable configurations [20]; 3) the generalized direct product construction [20]. The first two constructions are rate optimal when $t = 2, 3$, and the last one is rate optimal for any $t$. Binary 2-seq and 3-seq length-optimal LRCs are proposed based on graphs [2], [21].
The upper bound on the minimum distance of 2-seq LRCs was proposed in [2]; as far as we know, it is the only bound on the distance of $t$-seq LRCs. Four explicit constructions of 2-seq LRCs for any $r$ that achieve this upper bound were also proposed in [2]. Note that $r = 2$ is the most interesting case in practice. The parameters of all the known distance-optimal 2-seq $[n, k, d, 2]$ LRCs are shown as follows [2]. From the above parameters, we see that the options for the values of $n$ and $k$ are very limited. It would be better to have a large family of 2-seq LRCs with a larger minimum distance over a relatively small field. In this paper, we focus on linear 2-seq LRCs with $k > r > 1$. Our first contribution is a construction of 2-seq $[n, k, d, r]$ LRCs for even $r$, based on some good polynomials, together with several of its properties. Our second contribution is the derivation of an explicit upper bound on the minimum distance of a certain class of 2-seq LRCs. We also prove that the proposed 2-seq LRCs for $r = 2$ are optimal or near-optimal in the sense of attaining the upper bound on the minimum distance that we derived.
Section II introduces some preliminaries about the good polynomial-based LRCs for single erasure and several bounds of 2-seq LRCs. Section III describes two main contributions of the paper in detail. An open problem for the near future is given in Section IV.

II. PRELIMINARIES
Let $g(x) \in \mathbb{F}_q[x]$ be a polynomial of degree $r+1$. If there exist $t_n$ disjoint subsets $A_0, A_1, \ldots, A_{t_n-1}$ of $\mathbb{F}_q$, each of size $r+1$, such that $g(x)$ is constant on each subset, then $g(x)$ is called good [1]. Note that a subset of size $j$ is called a $j$-subset.
Known Fact 1: (Construction 1 in [1]) Let $r$ be a positive integer. Let $k$ and $n$ be positive integers with $r \mid k$, $(r+1) \mid n$, and $\frac{k}{n} \le \frac{r}{r+1}$. Let $\mu = \frac{n}{r+1}$ and $A_0, A_1, \ldots, A_{\mu-1}$ be $\mu$ disjoint $(r+1)$-subsets of $\mathbb{F}_q$, and let $g(x)$ be a good polynomial with respect to these disjoint subsets. Then, the good polynomial-based $[n, k, d, r]$ LRC $C_1$ over $\mathbb{F}_q$ is defined as the set of codewords given as follows: where $a \in \mathbb{F}_q^k$ is an information vector written as $a = (a_{i,j},\ i = 0, 1, \ldots, r-1;\ j = 0, 1, \ldots, \frac{k}{r}-1)$, and $f_a(x)$ is the encoding polynomial of $a$, given as
The good polynomial-based LRCs are a class of optimal $[n, k, d, r]$ LRCs for a single erasure, in the sense that the minimum distance is $d = n - k - \frac{k}{r} + 2$, over a field of size $q \approx n$. Suppose the symbol $c_\gamma^{(j)} \triangleq f_a(\gamma)$ of a codeword is erased, where $\gamma \in A_j$ for some $j = 0, 1, \ldots, \mu-1$; then its decoding polynomial is given as [1] Then, $c_\gamma^{(j)} = \delta(\gamma)$.
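As an illustration of Known Fact 1, the following Python sketch encodes with the standard good-polynomial construction of [1] and repairs a single erased symbol from the other $r$ symbols of its subset. The displayed equations (1) and (2) are not reproduced in this text, so the forms used here are assumptions: $g(x) = x^{r+1}$, subsets $A_i = \alpha^i \langle \beta \rangle$ with $\beta$ of order $r+1$, encoding polynomial $f_a(x) = \sum_{i,j} a_{i,j}\, g(x)^j x^i$, and repair by Lagrange interpolation. The prime $q = 13$ and the message are arbitrary choices, and all function names are hypothetical.

```python
def element_order(x, q):
    """Multiplicative order of x modulo the prime q."""
    order, y = 1, x % q
    while y != 1:
        y = y * x % q
        order += 1
    return order


def coset_points(q, r, t_n):
    """Rows A_i = alpha^i * <beta>, where beta has order r + 1 (assumed layout)."""
    alpha = next(x for x in range(2, q) if element_order(x, q) == q - 1)
    beta = pow(alpha, (q - 1) // (r + 1), q)
    return [[pow(alpha, i, q) * pow(beta, j, q) % q for j in range(r + 1)]
            for i in range(t_n)]


def encode(a, q, r, t_n):
    """Evaluate f_a(x) = sum_{i,j} a[i][j] * g(x)^j * x^i with g(x) = x^(r+1)."""
    def f(x):
        g = pow(x, r + 1, q)
        return sum(a[i][j] * pow(g, j, q) * pow(x, i, q)
                   for i in range(r) for j in range(len(a[i]))) % q
    return [[f(x) for x in row] for row in coset_points(q, r, t_n)]


def repair(points, values, erased, q):
    """Recover values[erased] by Lagrange interpolation through the other r
    symbols of the same subset (f_a restricted to it has degree <= r - 1)."""
    gamma = points[erased]
    total = 0
    for m, x_m in enumerate(points):
        if m == erased:
            continue
        num, den = 1, 1
        for l, x_l in enumerate(points):
            if l not in (erased, m):
                num = num * (gamma - x_l) % q
                den = den * (x_m - x_l) % q
        total += values[m] * num * pow(den, -1, q)
    return total % q


Q, R, T_N = 13, 2, 4          # illustrative parameters: n = 12, k = 4
A = [[1, 2], [3, 4]]          # a[i][j], i < r, j < k / r
ROWS = coset_points(Q, R, T_N)
CODEWORD = encode(A, Q, R, T_N)
print(all(repair(p, v, e, Q) == v[e]
          for p, v in zip(ROWS, CODEWORD) for e in range(R + 1)))
```

The repair never touches symbols outside the erased symbol's own subset, which is the locality property stated above.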

B. RECURSIVE UPPER BOUND OF 2-SEQ LRCs
The support set of a vector $u = (u_0, u_1, \ldots, u_{n-1})$ is defined as $\mathrm{supp}(u) = \{i \mid u_i \neq 0\}$, and $w(u) = |\mathrm{supp}(u)|$ is the weight of $u$. The support set of a subcode $D$ of a code $C$ is defined as $\mathrm{supp}(D) = \cup_{c \in D} \mathrm{supp}(c)$. For $i = 1, 2, \ldots, k$, the $i$th generalized Hamming weight (GHW) of an $[n, k, d]$ linear code $C$ is defined as [2], [23] It is well-known that The $n-k$ remaining numbers when $d_1(C), d_2(C), \ldots, d_k(C)$ are removed from $\{1, 2, \ldots, n\}$ are called the gap numbers $g_1(C), g_2(C), \ldots, g_{n-k}(C)$, with $g_1(C) < g_2(C) < \cdots < g_{n-k}(C)$. A gap number is also called a gap for simplicity. For a given 2-seq $[n, k, d, r]$ LRC $C$, its local dual subcode $C^\perp$ is defined as [2] where $C^\perp$ is the dual code of $C$. It is also called the local dual for simplicity. It is well-known that [2] dim The necessary and sufficient condition for equality in (3) is widely open. It is known that the code becomes rate-optimal when the equality is satisfied [2].
where $e_i$ can be obtained recursively as follows: Furthermore, if there exists a unique integer $l$ such that then the upper bound on the minimum distance of $C$ is given as The 2-seq LRC $C$ is said to be distance optimal if (6) holds with equality, and is said to be distance near-optimal if $d_{\min}(C)$ is one less than the RHS of (6).
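The generalized Hamming weights and gap numbers of Subsection II-B can be computed by brute force for very small codes. The sketch below assumes the standard definition $d_i(C) = \min\{|\mathrm{supp}(D)| : D \subseteq C,\ \dim D = i\}$, which is not reproduced in this text, and uses the binary $[7, 4]$ Hamming code purely as an illustrative example that does not come from the paper. Codewords are stored as integer bitmasks, and the function names are hypothetical.

```python
from itertools import combinations


def span_gf2(gens):
    """All codewords (as bitmasks) spanned over GF(2) by the given bitmasks."""
    words = {0}
    for g in gens:
        words |= {w ^ g for w in words}
    return words


def weight_hierarchy(gens, n):
    """Brute-force GHWs d_1, ..., d_k of the binary code generated by gens
    (gens is assumed to be a basis)."""
    k = len(gens)
    nonzero = sorted(span_gf2(gens) - {0})
    hierarchy = []
    for i in range(1, k + 1):
        best = n
        for subset in combinations(nonzero, i):
            # The subset spans an i-dimensional subcode iff its span has 2^i words.
            if len(span_gf2(subset)) == 2 ** i:
                support = 0
                for word in subset:
                    support |= word  # support of the span = union of supports
                best = min(best, bin(support).count("1"))
        hierarchy.append(best)
    return hierarchy


def gap_numbers(hierarchy, n):
    """Numbers in {1, ..., n} that are not generalized Hamming weights."""
    return [m for m in range(1, n + 1) if m not in hierarchy]


# Illustrative example: a generator matrix [I_4 | P] of the [7, 4] Hamming code.
HAMMING_7_4 = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
GHW = weight_hierarchy(HAMMING_7_4, 7)
print(GHW, gap_numbers(GHW, 7))
```

For this example the weight hierarchy is (3, 5, 6, 7), so the $n - k = 3$ gap numbers are 1, 2, and 4.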

Known Fact 3: (Theorem 2 in [2]) Let C be a 2-seq [n, k, d, r] LRC. The upper bound on the rate of C is given by
We say that the 2-seq LRC C is rate optimal when its code rate achieves the equality in (7).

III. MAIN RESULT

A. NEW CONSTRUCTION OF 2-SEQ LRCs
In this subsection, we propose a construction of 2-seq LRCs with even locality $r$ over $\mathbb{F}_q$, which can be seen as an "extended code" of the good polynomial-based LRCs. Furthermore, we calculate its minimum distance for $r = 2$ and show that it is either optimal or near-optimal in terms of the minimum distance.
Let $r$ be an even integer and $q$ be a prime or a prime power with $(r+1) \mid (q-1)$. Let $t_n$ and $t_k$ be positive integers [1]. For the construction below, any codeword $u \in C_1$ is now written as an array $u = (u_{i,j})$ for $i = 0, 1, \ldots, t_n-1$ and $j = 0, 1, \ldots, r$.
Construction 1: We assume the same $r$ (even), $q$, $t_k$, and $t_n$ as above for the LRC $C_1$ over $\mathbb{F}_q$ with parameters $[(r+1)t_n, rt_k, d_1, r]$. The code $C_2$ is an $[(r+2)t_n, rt_k, d_2, r_2]$ LRC over $\mathbb{F}_q$ with codewords $c$ as the following array where, for $i = 0, 1, \ldots, t_n-1$,
Remark 1: The construction will work if $c_{i,r+1}$ is the sum of ANY $r$ symbols among $u_{i,0}, u_{i,1}, \ldots, u_{i,r}$. We call this an "overall" parity bit.
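Construction 1 can be tried on a small instance. The Python sketch below builds generators of $C_2$ for $r = 2$, $q = 7$, $t_n = t_k = 2$, enumerates by brute force all parity checks of weight at most $r + 1$, and tests every pair of erasures for sequential repairability. It rests on assumptions, since the array in Construction 1 is not reproduced in this text: $u_{i,j} = f_a(\alpha^i \beta^j)$ with $g(x) = x^{r+1}$, and $c_{i,r+1}$ taken as the sum of the first $r$ symbols of row $i$. All function names are hypothetical, and the parameters are illustrative.

```python
from itertools import combinations, product


def element_order(x, q):
    """Multiplicative order of x modulo the prime q."""
    order, y = 1, x % q
    while y != 1:
        y = y * x % q
        order += 1
    return order


def c2_generators(q, r, t_n, t_k):
    """Codewords of C_2 for the monomials x^((r+1)*j0 + i0), i0 < r, j0 < t_k."""
    alpha = next(x for x in range(2, q) if element_order(x, q) == q - 1)
    beta = pow(alpha, (q - 1) // (r + 1), q)
    gens = []
    for i0, j0 in product(range(r), range(t_k)):
        word = []
        for i in range(t_n):
            row = [pow(pow(alpha, i, q) * pow(beta, j, q) % q,
                       (r + 1) * j0 + i0, q) for j in range(r + 1)]
            row.append(sum(row[:r]) % q)  # "overall" parity: first r symbols
            word.extend(row)
        gens.append(word)
    return gens


def local_repair_sets(gens, q, r):
    """Supports of all parity checks of weight at most r + 1 (brute force)."""
    n = len(gens[0])
    found = set()
    for positions in combinations(range(n), r + 1):
        for coeffs in product(range(q), repeat=r + 1):
            if any(coeffs) and all(
                    sum(h * g[p] for h, p in zip(coeffs, positions)) % q == 0
                    for g in gens):
                found.add(frozenset(p for h, p in zip(coeffs, positions) if h))
    return found


def repairable_in_sequence(repair_sets, erased):
    """Peel erasures: repair any position whose repair set avoids the others."""
    remaining = set(erased)
    while remaining:
        ready = [j for j in remaining
                 if any(j in s and not (s & (remaining - {j})) for s in repair_sets)]
        if not ready:
            return False
        remaining.remove(ready[0])
    return True


GENS = c2_generators(7, 2, 2, 2)            # n = 8, k = 4
SETS = local_repair_sets(GENS, 7, 2)
ALL_PAIRS_OK = all(repairable_in_sequence(SETS, pair)
                   for pair in combinations(range(8), 2))
print(ALL_PAIRS_OK)
```

The exhaustive search over coefficient vectors is only practical for such tiny parameters; it is meant as a sanity check of the 2-seq property, not as a repair algorithm.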
Theorem 1: The code $C_2$ from Construction 1 is a 2-seq $[n = (r+2)t_n, rt_k, d_2, r_2]$ LRC over $\mathbb{F}_q$ with $d_2 \ge d_1$, $r_2 = r$ and $q \approx \frac{(r+1)n}{r+2}$.
Proof: Appending the "overall" parity bit $c_{i,r+1}$ does not decrease the minimum distance, and hence $d_2 \ge d_1$. For $C_1$ we have $q \approx (r+1)t_n$. The same value of $q$ is used for $C_2$ with $t_n = n/(r+2)$. Therefore, $q \approx \frac{(r+1)n}{r+2}$. We now prove that any 2 erasures of $C_2$ can be repaired locally and sequentially. We write any codeword of $C_2$ as an array $c = (c_{i,j})$ for $i = 0, 1, \ldots, t_n-1$ and $j = 0, 1, \ldots, r+1$; that is, we view the codeword as a matrix of size $t_n \times (r+2)$. When two erasures occur in two distinct rows, they can be repaired one by one in any order, because they belong to different, and hence disjoint, repair sets. We now consider the case with two erasures in the same row. Without loss of generality, assume that these two erasures belong to the top row of a codeword of $C_2$. For simplicity, we denote $c_j \triangleq c_{0,j}$ for $0 \le j \le r+1$. Assume $c_x$ and $c_y$ are the two erasures, denoted by $e_x$ and $e_y$, respectively. We distinguish the following two cases: 1) $0 \le x \le r$, $y = r+1$ and 2) $0 \le x < y \le r$.
Case 1) is easy, since $e_x$ can first be repaired from the other $r$ symbols by (2), and then $e_y$ is the sum of the first $r$ symbols.
Case 2) has two subcases: $y = r$ and $y < r$. When $y = r$, the sequential recovery is easy since $e_x$ can be repaired by $r$ symbols first as and then $e_y$ can be repaired by $r$ symbols from (2). We now consider the subcase $0 \le x < y < r$, in which the two erasures are repaired by the remaining $r$ unerased symbols in the top row. Using the decoding polynomial (2) for the code $C_1$, we have $e_x = \delta(\beta^x)$ and $e_y = \delta(\beta^y)$, where the polynomial $\delta(\beta^j)$ is determined as Adding these two relations, we have $e_x + e_y = \delta(\beta^x) + \delta(\beta^y)$.
Using the relation of $C_2$ in (8), we have It is now enough to show that the two equations (10) and (11) in the two unknowns $e_x$ and $e_y$ have a unique solution. The first equation (10) can be written as for some $u_1 \in \mathbb{F}_q$. Similarly, (11) can be written as $e_x + e_y = u_2$ for some $u_2 \in \mathbb{F}_q$. Some simple row operations give the following: for some constants $u_2$ and $u_3 \in \mathbb{F}_q$. This equation has a unique solution if the coefficient matrix is non-singular, or It is straightforward to show that, for any $0 \le x < y \le r$, we have $\prod_{\tau = 0,\ \tau \neq x, y}^{r} \frac{\beta^y - \beta^\tau}{\beta^x - \beta^\tau} = \pm\beta^m$, for $(r+1) \nmid m$, and the result is also an element of $A_0$, since $A_0$ is a multiplicative subgroup. Then we can get that since $2m \not\equiv 0 \pmod{r+1}$ for even $r$. Therefore, (12) has a unique solution.
From Known Fact 2, we have the following condition for the rate optimality of the proposed LRCs.
Corollary 1: The 2-seq LRC $C_2$ over $\mathbb{F}_q$ in Theorem 1 is rate optimal when $t_k = t_n$.
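The product identity used in the proof of Theorem 1 can be checked numerically. The Python sketch below verifies, for a few prime fields with $(r+1) \mid (q-1)$ and even $r$, that $\prod_{\tau \neq x, y} \frac{\beta^y - \beta^\tau}{\beta^x - \beta^\tau} = -\beta^{x-y}$ and that $\beta^{2m} \neq 1$ for $m = x - y$. The closed form $-\beta^{x-y}$ is not stated in the text; it is an assumption obtained by differentiating $z^{r+1} - 1$, and it is consistent with the $\pm\beta^m$ form given in the proof. The function name and the tested pairs $(q, r)$ are illustrative.

```python
def element_order(x, q):
    """Multiplicative order of x modulo the prime q."""
    order, y = 1, x % q
    while y != 1:
        y = y * x % q
        order += 1
    return order


def product_identity_holds(q, r):
    """Check the product identity and beta^(2m) != 1 for all 0 <= x < y <= r."""
    beta = next(x for x in range(2, q) if element_order(x, q) == r + 1)
    powers = [pow(beta, j, q) for j in range(r + 1)]
    for x in range(r + 1):
        for y in range(x + 1, r + 1):
            prod = 1
            for tau in range(r + 1):
                if tau not in (x, y):
                    num = (powers[y] - powers[tau]) % q
                    den = (powers[x] - powers[tau]) % q
                    prod = prod * num * pow(den, -1, q) % q
            m = (x - y) % (r + 1)
            # Expected value -beta^m, and beta^(2m) must differ from 1.
            if prod != (-pow(beta, m, q)) % q or pow(beta, 2 * m, q) == 1:
                return False
    return True


print([product_identity_holds(q, r) for q, r in [(7, 2), (13, 2), (11, 4), (29, 6)]])
```

The condition $\beta^{2m} \neq 1$ relies on $r + 1$ being odd, which is where the evenness of $r$ enters.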

B. EXPLICIT UPPER BOUND ON THE MINIMUM DISTANCE
In this subsection, we derive an explicit form of the upper bound on the minimum distance of $C$, and show in Corollary 3 that the proposed 2-seq LRCs with $r = 2$ are distance near-optimal when $t_k < t_n$, and distance optimal when $t_k = t_n$.
Lemma 1: Let $C$ be a 2-seq $[n, k, d, r]$ LRC and $C^\perp$ be its local dual. Denote by $d_i(C^\perp)$ the $i$th GHW of $C^\perp$ as defined in Subsection II-B. When $n = (r+2)t$ for some $t$ and $(r-1) \mid 2(t-1)$, the upper bound of $d_i(C^\perp)$ is given as, for $i = 1, 2, \ldots, 2t$, with $h \triangleq \frac{2(t-1)}{r-1}$,
Proof: Observe that $\frac{2n}{r+2} = 2t$. Denote by $\psi_i$ the RHS of (14); we will prove that $\psi_i$ satisfies the same recursion as $e_i$ in (4) for $i = 1, 2, \ldots, 2t$. We distinguish the cases $h > 1$ and $h = 1$.
For $h = 1$, the recursive relationship of $\psi_m$ can be proved similarly.
LRCs with $r = k$ can be seen as maximum distance separable codes, so we do not consider this case in this paper. Further, if $r(r-1) \mid 2(t-1)$ then $(r-1) \mid 2(t-1)$. Therefore, the explicit upper bound on the minimum distance of a 2-seq $[n, k, d, r]$ LRC $C$ with $(r+2) \mid n$ and $r < k$ can be derived based on the above explicit upper bound on the GHW of its local dual $C^\perp$.
Corollary 2: For the 2-seq LRC $C$ in Theorem 3, when $r = 2$ and $k = 2t_k$ for some $t_k \le t$, we have,
Proof: We note that $h = 2(t-1)$, $j = 0$ and $r = 2$ hence = 2.
Corollary 3: The 2-seq LRC $C_2$ over $\mathbb{F}_q$ in Theorem 2 is distance near-optimal when $t_k < t_n$, and is distance optimal when $t_k = t_n$.

IV. CONCLUDING REMARK
This paper constructed near-optimal 2-seq $[(r+2)t_n, rt_k, d, r]$ LRCs for even $r$ over a relatively small alphabet of size $q \approx \frac{(r+1)n}{r+2}$, where $t_k \le t_n \le \frac{q-1}{r+1}$. For comparison, we show the various parameters of the 2-seq LRCs in Table 1.
The proposed 2-seq LRCs are rate optimal, distance optimal, or distance near-optimal in some cases. In the future, it may be important to find a construction of optimal 2-seq LRCs with any locality $r \ge 2$ over a smaller alphabet.

APPENDIX A THE PROOF OF THEOREM 2
We fix the notation for $C_2$ in Theorem 2, and hence for the corresponding $C_1$ also. Any codeword $c \in C_2$ in the array representation of size $t_n \times (r+2)$ consists of the codeword $u \in C_1$ of size $t_n \times (r+1)$ on the left and the right-most column of length $t_n$. Here, we write $u_i = (u_{i,0}, u_{i,1}, u_{i,2})$ for the $i$th row of $u$ and $c_{i,3} = u_{i,0} + u_{i,1}$ for $i = 0, 1, \ldots, t_n-1$. Therefore, we may write, as an array, and where $u_{\mathrm{add}}$ is the last column of $c$, which consists of the "overall" parity bits. We first take a look at the encoding polynomial of the element $u_{i,j}$ of $u \in C_1$. From (1) with $r = 2$, We note that $g(\alpha^i \beta^j) = \alpha^{3i}$ for all $j = 0, 1, 2$. Therefore, for each $i = 0, 1, \ldots, t_n-1$, the encoding polynomial $f_{a,i}(x)$ of the $i$th row $u_i$ is given as where We will denote by $\tau_m$ the number of rows of weight $m$ in $u$ for $m = 0, 1, 2, 3$.
Lemma 2: Consider $C_1$ and $C_2$ in Theorem 2 and assume all the notation in the discussion after Theorem 2, leading to $f_{a,i}(x)$ in (18). Let $a \in \mathbb{F}_q^{2t_k}$ be a non-zero information vector, and let $u$ be its codeword of $C_1$. Let $u_{\mathrm{add}}$ be the last column of the corresponding codeword $c \in C_2$. Then, the following holds: . For a row $u_i$ of weight 3, 2) In any $u$, the number of rows with $w(u_i) = 3$ and $u_{i,0} + u_{i,1} = 0$ is at most $\min(\tau_3, 2t_k - 2\tau_0 - 1)$. 3) For any $c$, $w(u_{\mathrm{add}}) \ge \tau_2 + \tau_3 - \min(\tau_3, 2t_k - 2\tau_0 - 1)$.
Proof: We recall that $|A_i| = 3$ for all $i$. We skip the proof of a) of Case 1). For the subcase b), we note that $w(u_i) = 1$ implies that $f_{a,i}(x)$ in (18), of degree at most 1, must have two roots. For the subcase c), we note that $w(u_i) = 2$ if and only if $u_{i,j} = H_i + B_i \alpha^i \beta^j = 0$ or equivalently, For the second assertion, we assume that $w(u_i) = 2$ for some $i$. Then, $-\frac{H_i}{B_i} \in A_i$ implies $H_i \neq 0$. Suppose that $u_{i,0} + u_{i,1} = 0$ on the contrary. Then, $u_{i,0}$ and $u_{i,1}$ must both be non-zero and $u_{i,2} = 0$, and hence $u_{i,0} + u_{i,1} + u_{i,2} = 0$, which contradicts the following: For the subcase d), we note that $w(u_i) = 3$ if and only if $B_i$ and $H_i$ satisfy the condition which is the complement of the union of the previous cases. For the second assertion, we observe first that u i, 2 for some $\tau_0$ values of $i$. This is equivalent to saying that some $2\tau_0$ elements in such $a$ are linear combinations of the remaining $2t_k - 2\tau_0$ elements. When $2t_k - 2\tau_0 < \tau_3$, therefore, the number of additional linear dependencies of elements in such $a$ can be at most $2t_k - 2\tau_0 - 1$. The necessary and sufficient condition for $u_{i,0} + u_{i,1} = 0$, namely $2H_i - \alpha^i \beta^2 B_i = 0$, is also a linear dependency in $a$. Therefore, the number of rows $u_i$ with weight 3 satisfying $2H_i - \alpha^i \beta^2 B_i = 0$ can be at most $2t_k - 2\tau_0 - 1$. When $2t_k - 2\tau_0 \ge \tau_3$, on the other hand, it is obvious that the number of rows $u_i$ with weight 3 satisfying $2H_i - \alpha^i \beta^2 B_i = 0$ can be at most $\tau_3$. Case 3) follows easily from the previous cases.
Remark 2: The equality of Case 3) in Lemma 2 holds when, for information $a \in \mathbb{F}_q^{2t_k}$, there is only one choice of freedom in $a$ and the remaining $2t_k - 1$ elements of $a$ are decided by $2\tau_0$ equations of the type in (19) and $2t_k - 2\tau_0 - 1$ equations of the type in (20).
Now, we continue the proof of Theorem 2. Let $u, u' \in C_1$ be codewords with the same weight such that $2\tau_2 + 3\tau_3 = 2\tau'_2 + 3\tau'_3$. If $\tau_2 > \tau'_2$ then $\tau_2 - \tau'_2 = 3l$, $\tau_0 - \tau'_0 = -l$ and $\tau_3 - \tau'_3 = -2l$, for some positive integer $l$, and hence we have $w(u_{\mathrm{add}}) > w(u'_{\mathrm{add}})$ by Lemma 2, which implies that $w(c) > w(c')$.
We now claim that $w(c) \ge 4(t_n - t_k) + 3 - \min(t_n - t_k, 1) \triangleq w_{\min}$ (21) for any $c \in C_2$. Observe that it is enough to prove (21) for all the codewords $c$ corresponding to $u \in C_1$ with $\tau_2 \in \{0, 1, 2\}$. We will prove this by induction on the weight of $u$.
VOLUME 10, 2022
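The claim (21) can be compared against an exhaustive search on small instances with $r = 2$. The Python sketch below enumerates all non-zero messages, builds the corresponding codeword of $C_2$ row by row, and returns the minimum weight. It assumes the layout described in this appendix ($u_{i,j} = f_a(\alpha^i \beta^j)$ with $g(x) = x^3$ and $c_{i,3} = u_{i,0} + u_{i,1}$), picks $\alpha$ as the smallest primitive root of the prime $q$, and uses hypothetical function names. The three parameter sets are illustrative choices small enough for brute force.

```python
from itertools import product


def element_order(x, q):
    """Multiplicative order of x modulo the prime q."""
    order, y = 1, x % q
    while y != 1:
        y = y * x % q
        order += 1
    return order


def evaluation_rows(q, r, t_n):
    """Evaluation points alpha^i * beta^j, one row per subset A_i."""
    alpha = next(x for x in range(2, q) if element_order(x, q) == q - 1)
    beta = pow(alpha, (q - 1) // (r + 1), q)
    return [[pow(alpha, i, q) * pow(beta, j, q) % q for j in range(r + 1)]
            for i in range(t_n)]


def c2_min_distance(q, r, t_n, t_k):
    """Minimum weight over all non-zero codewords of C_2 (exhaustive search)."""
    rows = evaluation_rows(q, r, t_n)
    best = None
    for msg in product(range(q), repeat=r * t_k):
        if not any(msg):
            continue
        weight = 0
        for row in rows:
            # msg[i * t_k + j] plays the role of a_{i,j}; g(x)^j x^i = x^(3j + i).
            u = [sum(msg[i * t_k + j] * pow(x, (r + 1) * j + i, q)
                     for i in range(r) for j in range(t_k)) % q for x in row]
            parity = sum(u[:r]) % q  # "overall" parity of the row
            weight += sum(v != 0 for v in u) + (parity != 0)
        best = weight if best is None else min(best, weight)
    return best


def w_min(t_n, t_k):
    """Right-hand side of (21), stated for r = 2."""
    return 4 * (t_n - t_k) + 3 - min(t_n - t_k, 1)


for q, t_n, t_k in [(7, 2, 2), (7, 2, 1), (13, 3, 1)]:
    print(q, t_n, t_k, c2_min_distance(q, 2, t_n, t_k), w_min(t_n, t_k))
```

On these three instances the exhaustive minimum weight coincides with $w_{\min}$; the search grows as $q^{2t_k}$, so it is only a sanity check for very small parameters.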