Constructions of Binary Locally Repairable Codes With Multiple Recovering Sets

Locally repairable codes (LRCs) with multiple recovering sets are in high demand in distributed storage systems. In this letter, we generalize the construction of the WZL code proposed by Wang et al. and give a construction of optimal binary LRCs with multiple disjoint recovering sets, which reaches the upper bound on the code rate given by Kadhe et al. We then generalize the idea further to obtain a construction of binary LRCs with intersecting recovering sets. Its code rate is much higher than that of the WZL code and is very close to that of the construction of Kruglik et al. Moreover, two special cases of this construction reach the upper bound on the minimum distance.


I. INTRODUCTION
Distributed storage systems use redundancy, such as replication and MDS codes, to ensure data reliability. Compared with traditional [n, k] MDS codes, locally repairable codes (LRCs) only need to access r ≪ k active nodes to recover a failed node, at the cost of a small amount of storage overhead, where r is called the locality. The set of these r nodes (symbols) participating in the recovery of a failed node is referred to as a recovering set of the node.
The formal definition of LRCs was first introduced by Gopalan et al. [2]. Analogous to the classical Singleton bound, they established a tradeoff between minimum distance and locality, referred to as the Singleton-type bound. A code achieving this Singleton-type bound is called optimal. After their work, other bounds were given in [3]-[6]. A tighter upper bound on the dimension k of LRCs depending on the alphabet size was given in [7], [8]. For more studies on bounds for LRCs, one can refer to [9]-[11]. The first breakthrough construction of optimal LRCs was given in [12] by generalizing Reed-Solomon codes. For more constructions of optimal LRCs, one can refer to [13]-[17].
However, if some nodes in the recovering set are not available, we have to find an alternative set of nodes to repair the failed node. Thus, it is desirable to have multiple disjoint sets of nodes available to repair the data in each node. The number t of such disjoint sets is called the availability. A code is said to have locality r and availability t if every symbol has t disjoint recovering sets; such codes are denoted as (r, t)-LRCs [18]. (r, t)-LRCs also support parallel reading of data, which is very effective for handling degraded reads and hot data. The first upper bound on the minimum distance of (r, t)-LRCs was given in [19].
The associate editor coordinating the review of this manuscript and approving it for publication was Xueqin Jiang.
In [20], the authors gave a bound on the code rate of (r, t)-LRCs.
This bound applies to both linear and non-linear codes. Kadhe et al. gave a tighter bound on the rate of (r, 3)-LRCs over $\mathbb{F}_2$ in [21].
In 2017, Kruglik et al. [22] generalized the definition of (r, t)-LRCs, allowing the recovering sets of each coordinate to intersect in at most x coordinates. We refer to these codes as (r, t, x)-LRCs. This feature can increase the maximum achievable code rate [20] while still meeting load-balancing requirements. A bound on the minimum distance of (r, t, x)-LRCs is given in [1]. Note that this bound is also valid for standard (r, t)-LRCs.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this paper, we generalize the construction of WZL codes [23] and propose a construction of binary (r, t)-LRCs, which we refer to as Construction 1. The code rate of Construction 1 reaches the upper bound (3). Note that Construction 1 is similar to the construction given in [24]. Although both constructions are based on the same inclusion matrix of linear subspaces in $\mathbb{F}_q^m$, we additionally provide the block form of the parity-check matrix. We also establish the rank of the parity-check matrix in Lemma 4 in a more complete and general way, so that the same argument can be reused in Lemma 7.
Then, we use the same method to construct binary (r, t, x)-LRCs, which we refer to as Construction 2. Its code rate is much higher than that of WZL codes and is very close to that of the construction of Kruglik et al. [22]. The minimum distance of this code is twice that of Kruglik et al.'s construction. Moreover, the minimum distance of two special cases of Construction 2 attains the upper bound (4).

II. PRELIMINARIES
Let C be an $[n, k, d]_q$ linear code over $\mathbb{F}_q$. Denote by $R = k/n$ the rate of the code C. Denote $[m] = \{1, 2, \ldots, m\}$ for a positive integer m. For any subset $I \subseteq [n]$ of coordinates of a code C, denote by $C_I$ the restriction of C to I. Given $\alpha \in \mathbb{F}_q$, define $C(i, \alpha) = \{c \in C : c_i = \alpha\}$. The support of a vector v is defined as $\mathrm{supp}(v) := \{i : v_i \neq 0\}$.

A. GAUSSIAN BINOMIAL COEFFICIENTS
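The Gaussian binomial coefficient $\binom{n}{k}_q$ is the q-analog of the binomial coefficient. Let n, k and q > 1 be positive integers with k ≤ n. Then $\binom{n}{k}_q$ is defined to be
$$\binom{n}{k}_q = \prod_{i=0}^{k-1} \frac{q^{n-i} - 1}{q^{k-i} - 1}.$$
When q is a prime power, $\binom{n}{k}_q$ counts the number of subspaces of dimension k in a vector space of dimension n over the finite field $\mathbb{F}_q$.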
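As a quick numerical illustration of the product formula for the Gaussian binomial coefficient, the following minimal sketch evaluates it with exact integer arithmetic. The function name `gaussian_binomial` is our own choice for illustration and does not come from any library.

```python
def gaussian_binomial(n: int, k: int, q: int) -> int:
    """Gaussian binomial coefficient [n choose k]_q.

    For a prime power q this is the number of k-dimensional subspaces
    of an n-dimensional vector space over F_q.
    """
    if k < 0 or k > n:
        return 0
    numerator, denominator = 1, 1
    for i in range(k):
        numerator *= q ** (n - i) - 1
        denominator *= q ** (k - i) - 1
    # The quotient is always an integer, so floor division is exact.
    return numerator // denominator


if __name__ == "__main__":
    for n, k in [(3, 1), (3, 2), (4, 2), (5, 2)]:
        print(n, k, gaussian_binomial(n, k, 2))
```

For q = 2 these values give the numbers of rows and columns of the matrices used in Section III.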

B. LRC WITH AVAILABILITY
If every symbol of the code C can be recovered from t disjoint subsets of size r, then the code C is said to have locality r and availability t. The formal definition is as follows.
Wang et al. [23] gave a construction of binary (r, t)-LRCs for arbitrary r and t. We refer to this code as the WZL code; its parameters are $n_w = \binom{r+t}{t}$, $d_w = t + 1$, $R_w = \frac{r}{r+t}$. They also gave the relation between (r, t)-LRCs and block designs, which we summarize in the following lemma.
Lemma 1 ([23]): The incidence matrix of a 1-(n, r + 1, t) design can be taken as the parity check matrix of an (r, t)-LRC of length n if its blocks $B_1, B_2, \ldots, B_b$ satisfy the following condition:
$$|B_i \cap B_j| \le 1 \quad \text{for all } 1 \le i < j \le b,$$
where $b = \frac{nt}{r+1}$.

C. CODES WITH AVAILABILITY AND INTERSECT RECOVERING SET
If the recovering sets of an (r, t)-LRC are allowed to intersect in at most x positions, the code is called an (r, t, x)-LRC.
Definition 2: A code is said to be an (r, t, x)-LRC if for any coordinate $i \in [n]$, there exist t subsets of coordinates and for all $j \in [t]$, $|R_i^j| \le r$, every pair of symbols $\alpha, \beta \in \mathbb{F}_q$, $\alpha \neq \beta$ we have
We generalize Lemma 1 to the following result.
Corollary 1: The incidence matrix of a 1-(n, r + 1, t) design can be taken as the parity check matrix of an (r, t, x)-LRC of length n if its blocks $B_1, B_2, \ldots, B_b$ satisfy the following condition:
$$|B_i \cap B_j| \le x + 1 \quad \text{for all } 1 \le i < j \le b,$$
where $b = \frac{nt}{r+1}$.
Proof: If the given conditions are satisfied, then each row (resp. column) of the incidence matrix of such a design has r + 1 (resp. t) 1s, and the supports of any two different rows intersect in at most x + 1 positions. The rest of the proof is similar to that of Lemma 1, so we omit it here.

III. CODE CONSTRUCTIONS
For any positive integers m, a, b such that a < b and a + b ≤ m, we define a matrix over $\mathbb{F}_2$, denoted by $H_q(m, a, b)$, with $\binom{m}{a}_q$ rows and $\binom{m}{b}_q$ columns. Each row of $H_q(m, a, b)$ is associated with an a-dimensional subspace of $\mathbb{F}_q^m$, and each column is associated with a b-dimensional subspace of $\mathbb{F}_q^m$. Suppose the i-th row is associated with the subspace $W_i$ and the j-th column is associated with the subspace $V_j$. Then the (i, j)-th element $h_{ij}$ of $H_q(m, a, b)$ is defined as follows:
$$h_{ij} = \begin{cases} 1, & \text{if } W_i \subseteq V_j, \\ 0, & \text{otherwise.} \end{cases}$$
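To make the matrix $H_q(m, a, b)$ concrete, the following minimal sketch builds it for q = 2 by brute force. It assumes the inclusion rule (an entry is 1 exactly when the row subspace is contained in the column subspace), which is how we read the term inclusion matrix in Section I. The helper names `span`, `subspaces` and `inclusion_matrix` are hypothetical, and the subspaces are ordered by their sorted element lists, which is not necessarily the RREF-based order used later in this section.

```python
from itertools import combinations


def span(gens):
    """All F_2-linear combinations of the generators (ints used as bit vectors)."""
    vecs = {0}
    for g in gens:
        vecs |= {v ^ g for v in vecs}
    return frozenset(vecs)


def subspaces(m, k):
    """All k-dimensional subspaces of F_2^m, each given as a frozenset of ints."""
    found = set()
    for gens in combinations(range(1, 2 ** m), k):
        s = span(gens)
        if len(s) == 2 ** k:  # the k generators are linearly independent
            found.add(s)
    return sorted(found, key=sorted)


def inclusion_matrix(m, a, b):
    """0/1 matrix with rows indexed by a-dim and columns by b-dim subspaces of F_2^m."""
    rows, cols = subspaces(m, a), subspaces(m, b)
    return [[1 if w <= v else 0 for v in cols] for w in rows]


if __name__ == "__main__":
    H = inclusion_matrix(3, 1, 2)
    print("size:", len(H), "x", len(H[0]))
    print("row weights:", {sum(row) for row in H})
    print("column weights:", {sum(col) for col in zip(*H)})
```

The enumeration is exponential in m and is only meant for small parameters such as m ≤ 5.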

A. A GENERAL CONSTRUCTION OF BINARY (r, t)-LRCs
When a = b − 1, the matrix H q (m, a, b) can be regarded as a parity check matrix of an LRC with availability. We have the following theorem.

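Theorem 1: The code C with $H_q(m, b-1, b)$ as a parity check matrix is a binary [n, k, d] LRC with locality r and availability t, where $n = \binom{m}{b}_q$, $r = \binom{m-b+1}{1}_q - 1$ and $t = \binom{b}{1}_q$.
Proof: Consider any coordinate i, that is, the i-th column of $H_q(m, b-1, b)$, which is associated with the b-dimensional subspace $V_i$. There are $\binom{b}{b-1}_q$ rows which have 1 in this column. We claim that, excluding the coordinate i, the supports of these rows are pairwise disjoint. Otherwise, assume there are two rows, say the j-th row (denoted as $h_j$, associated with the subspace $W_j$) and the l-th row (denoted as $h_l$, associated with the subspace $W_l$), whose supports both contain another coordinate $i' \neq i$. Then $W_j + W_l \subseteq V_i \cap V_{i'}$. The sum of two different (b − 1)-dimensional subspaces is of dimension at least b, while the intersection of two different b-dimensional subspaces is of dimension at most b − 1, which leads to a contradiction. Therefore, the matrix $H_q(m, b-1, b)$ satisfies the conditions of Lemma 1, and this completes the proof.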
In the binary case, it is equivalent to show that the sum of any s columns of Hence the sum of the first s columns of H 2 (m, b − 1, b) is not 0.
To see its properties more directly, we sort all the subspaces in the following manner. It is well known that every k-dimensional subspace of $\mathbb{F}_q^n$ is the row space of a unique k × n matrix of rank k in reduced row echelon form (RREF), so we can use the RREF matrix to represent each subspace. We then concatenate the rows of each RREF matrix, in row order, into a 0-1 sequence, and sort these sequences in lexicographic order to obtain the order of the subspaces. Below is an example of $H_2(m, 1, 2)$.
Example 1: Suppose m = 3. The code C with $H_2(3, 1, 2)$ (Fig. 1) as a parity check matrix is a (2, 3)-LRC with length n = 7, dimension k = 3 and minimum distance d = 4. We label the rows and columns with the vectors of the corresponding subspaces (regarding each vector as a binary number and converting it to a decimal number). This example has the same parameters as the Simplex code with m = 3.
Next we prove some properties of $H_2(m, 1, 2)$ to help understand the code C.
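The parameters in Example 1 can be checked numerically. The following minimal sketch builds $H_2(m, 1, 2)$ with rows indexed by the nonzero vectors of $\mathbb{F}_2^m$ and columns by the 2-dimensional subspaces, then computes the rank over $\mathbb{F}_2$ and, by exhaustive search, the minimum distance. The function names are our own, the column order is a convenient stand-in for the RREF order, and the exhaustive search is only feasible for very short codes such as n = 7.

```python
from itertools import combinations, product


def h2_points_lines(m):
    """H_2(m, 1, 2): rows are nonzero vectors of F_2^m, columns are 2-dim subspaces."""
    points = list(range(1, 2 ** m))
    # A 2-dim subspace over F_2 has exactly three nonzero vectors {u, v, u + v}.
    lines = sorted({frozenset((u, v, u ^ v)) for u, v in combinations(points, 2)},
                   key=sorted)
    return [[1 if p in line else 0 for line in lines] for p in points]


def rank_gf2(matrix):
    """Rank over F_2 by Gaussian elimination on rows packed into integers."""
    rows = [int("".join(map(str, r)), 2) for r in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def min_distance(H):
    """Minimum weight of a nonzero vector c with H c = 0 over F_2 (brute force)."""
    n = len(H[0])
    best = None
    for c in product((0, 1), repeat=n):
        if any(c) and all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H):
            weight = sum(c)
            best = weight if best is None else min(best, weight)
    return best


if __name__ == "__main__":
    H = h2_points_lines(3)
    n = len(H[0])
    k = n - rank_gf2(H)
    print("n =", n, "k =", k, "d =", min_distance(H))
```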
Proof: It is easy to see that $H_2(2, 1, 2) = (1, 1, 1)^{\tau}$. For the case m ≥ 3, notice that the RREF matrices corresponding to the first $\binom{m-1}{1}_2 = 2^{m-1} - 1$ rows of the matrix $H_2(m, 1, 2)$ all have 0 as their first entry, so these rows can be regarded as 1-dimensional subspaces of an (m − 1)-dimensional space. Since the subspaces associated with the columns are sorted, the subspaces whose RREF matrices have an all-zero first column must come first, and these columns can be regarded as 2-dimensional subspaces of an (m − 1)-dimensional space. There are $\binom{m-1}{2}_2$ such subspaces in total. Therefore, the upper left block of $H_2(m, 1, 2)$ (i.e., the first $\binom{m-1}{1}_2$ rows and the first $\binom{m-1}{2}_2$ columns) is the matrix $H_2(m-1, 1, 2)$.
The RREF matrices corresponding to the last $2^{m-1}$ rows of $H_2(m, 1, 2)$ all have 1 as their first entry, whereas the RREF matrices corresponding to the first $\binom{m-1}{2}_2$ columns all have 0s in their first column. As a result, the bottom left block is a $2^{m-1} \times \binom{m-1}{2}_2$ zero submatrix.
Regarding the matrix A, note that the RREF matrix of the first row of A is $(1, 0, \ldots, 0)$, which has m − 1 zeros. Since the subspaces are sorted, the subspaces whose RREF matrices have $(1, 0, \ldots, 0)$ as their first row must be ranked after those whose RREF matrices have an all-zero first column, and before all other subspaces. There are $\binom{m-1}{1}_2$ such subspaces in total. Therefore, the first $\binom{m-1}{1}_2$ entries of the first row of A are all 1s. Moreover, if we add the two rows of the RREF matrix corresponding to each of the first $\binom{m-1}{1}_2$ columns of A, we get the vectors whose binary representations run from $(1, 0, \ldots, 0, 1)$ to $(1, 1, \ldots, 1)$ (in decimal, from $2^{m-1} + 1$ to $2^m - 1$). The vectors corresponding to the rows of A, starting from the second row, also have binary representations running from $(1, 0, \ldots, 0, 1)$ to $(1, 1, \ldots, 1)$. Therefore, the bottom left block of the matrix A is an identity matrix of size $\binom{m-1}{1}_2$.
According to the block form of the matrix H 2 (m, 1, 2), it is easy to see rank(H 2 (m, 1, 2)) ≥ In fact, the rows of H 2 (m, 1, 2) are linearly dependent, so some rows can be deleted. We define the following set.
From the definition of the set $E_i$, if we fix m = 4, we get
$E_0 = \{1, 3, 5, 7, 9, 11, 13, 15\}$, $E_1 = \{2, 3, 6, 7, 10, 11, 14, 15\}$, $E_2 = \{4, 5, 6, 7, 12, 13, 14, 15\}$, $E_3 = \{8, 9, 10, 11, 12, 13, 14, 15\}$.
If all the elements of the $E_i$ are converted into binary form, we find that the first digit (counting from the least significant digit) of every element of $E_0$ is 1, that is, $E_0$ consists of all the odd numbers; the second digit of every element of $E_1$ is 1; the third digit of every element of $E_2$ is 1; and the fourth digit of every element of $E_3$ is 1. In fact, $E_i$ is the set of all elements of the complete set $U = \{1, 2, \ldots, 2^m - 1\}$ whose binary representation has 1 as its (i + 1)-th digit. For convenience, we also define
In the following, we show that each $2^i$-th row of the matrix $H_2(m, 1, 2)$ is an $\mathbb{F}_2$-linear combination of all rows in $R_i$ for $i = 0, 1, \ldots, m - 1$.
Theorem 1 has shown that every column of the matrix $H_2(m, 1, 2)$ has three 1s. It is sufficient to show that the number of 1s in any column of the rows in $R_i$ is 0 or 2. Since the subspaces are sorted in lexicographic order, the binary representation of the vector corresponding to each row can be regarded as its row number, and the (i + 1)-th digit of the binary representation of every element of the set $E_i$ is 1. Consider any column, say the j-th column, $j \in \{1, 2, \ldots, \binom{m}{2}_2\}$. Suppose the number of 1s in the j-th column of these rows is 1 (resp. 3). This means that exactly one (resp. all three) of the nonzero vectors in the subspace corresponding to the j-th column has (i + 1)-th coordinate 1, which is impossible. In the first case, the (i + 1)-th coordinate of the remaining two vectors is 0, so the (i + 1)-th coordinate of their sum, which is the third vector, must also be 0. In the second case, the (i + 1)-th coordinate of the sum of any two of the vectors is 1 + 1 = 0, whereas this sum is the third vector. Either way we reach a contradiction, from which the result follows.
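The counting argument above can be checked by brute force for small m. The following minimal sketch counts, for every column of $H_2(m, 1, 2)$, the number of 1s in the rows indexed by $E_i$, and also checks that the $2^i$-th row equals the sum over $\mathbb{F}_2$ of the other rows indexed by $E_i$. Reading the row set as $E_i$ without $2^i$ is our assumption, since it is only described in words here, and the function names are hypothetical.

```python
from itertools import combinations


def h2_rows(m):
    """Rows of H_2(m, 1, 2), keyed by the nonzero vector (as an int) labelling the row."""
    points = range(1, 2 ** m)
    lines = sorted({frozenset((u, v, u ^ v)) for u, v in combinations(points, 2)},
                   key=sorted)
    return {p: tuple(int(p in line) for line in lines) for p in points}


def column_counts(m, i):
    """For each column, the number of 1s in the rows whose label has bit i set (the set E_i)."""
    rows = h2_rows(m)
    e_i = [v for v in rows if (v >> i) & 1]
    return [sum(col) for col in zip(*(rows[v] for v in e_i))]


def row_is_sum_of_others(m, i):
    """Check that row 2^i is the F_2-sum of the other rows indexed by E_i."""
    rows = h2_rows(m)
    acc = [0] * len(rows[1])
    for v, row in rows.items():
        if (v >> i) & 1 and v != 2 ** i:
            acc = [a ^ b for a, b in zip(acc, row)]
    return tuple(acc) == rows[2 ** i]


if __name__ == "__main__":
    for m in (3, 4):
        for i in range(m):
            print(m, i, sorted(set(column_counts(m, i))), row_is_sum_of_others(m, i))
```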
Lemma 5: When m ≥ 3, the code C which has the parity check matrix H 2 (m, 1, 2) has minimum distance d = 4.
Proof: From Lemma 3, the upper left block of the matrix $H_2(m, 1, 2)$ contains $H_2(3, 1, 2)$ for m ≥ 3, and the bottom left block is a zero matrix. The matrix $H_2(3, 1, 2)$ has 4 columns that are linearly dependent (see Fig. 1), so there are 4 columns in the matrix $H_2(m, 1, 2)$ that are linearly dependent. Combining this with Lemma 2, the result follows.

C. CONSTRUCTIONS OF (r, t, x)-LRCs
When q = 2, a = 1, b ≥ 3, we can get the LRC with availability in which the recovering sets can intersect in a small number of coordinates. We refer to this construction as Construction 2.
Let us give an example of H 2 (m, 1, b).
Theorem 2: The code C which has the parity check matrix b-dimensional subspace which contains such a 2-dimensional subspace $W_i + W_j$; that is, the supports of any two rows of $H_2(m, 1, b)$ satisfy the conditions of Corollary 1, and this completes the proof.
Then we give a brief analysis of the structure of the matrix
Theorem 3: For any positive integers b and m with 3 ≤ b < m, the matrix $H_2(m, 1, b)$ is of the block form
where 0 is a zero matrix and * represents an arbitrary matrix over $\mathbb{F}_2$.
In particular, $H_2(b, 1, b) = (1, \ldots, 1)^{\tau}$ is a column vector of length $\binom{b}{1}_2$.
Proof: The proof is similar to the proof of Lemma 3, so we omit it here.
Next, we generalize the method used to determine the rank of $H_2(m, 1, 2)$ to determine the rank of $H_2(m, 1, b)$ for b ≥ 3. Since all the subspaces are defined over $\mathbb{F}_2$, for convenience we again view each vector in a subspace as a binary number.
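Lemma 6: Let V be an m-dimensional vector space over $\mathbb{F}_2$, and let S be a set of column indices such that 1 ≤ |S| ≤ m − 1. Then the number of vectors in V whose entries in all the columns indexed by S are 1 is even.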
Proof: Let n ≥ m be the length of the vectors in V. We represent V by an RREF matrix G of size m × n. Denote by $G_S$ the submatrix formed by the columns with indices in S. We transform G by elementary row operations into a matrix $G'$ such that $G'_S$ is in RREF. If k is the rank of $G'_S$, 0 ≤ k ≤ |S|, then the last (m − k) rows of $G'_S$ are all zero vectors. Notice that the vector space generated by the first k rows of $G'_S$ contains at most one all-1s vector. Now consider the vector space generated by the last (m − k) rows of $G'$: its cardinality is $2^{m-k}$, and all of its vectors have 0 entries in the columns with indices in S. Therefore, the vector space generated by $G'$, which is V, contains either 0 or $2^{m-k}$ vectors that have all 1s in the columns with indices in S. Since k ≤ |S| ≤ m − 1, the number $2^{m-k}$ is even, from which the result follows.
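The parity statement proved above can be checked exhaustively for small parameters. The following minimal sketch enumerates every m-dimensional subspace of $\mathbb{F}_2^n$ and every index set S with 1 ≤ |S| ≤ m − 1, and counts the vectors that are 1 in all positions of S. The function names are hypothetical, and positions are numbered from the least significant bit, which is a convention of this sketch only.

```python
from itertools import combinations


def span(gens):
    """All F_2-linear combinations of the generators (ints used as bit vectors)."""
    vecs = {0}
    for g in gens:
        vecs |= {v ^ g for v in vecs}
    return vecs


def count_all_ones(space, positions):
    """Number of vectors in the space with a 1 in every listed bit position."""
    mask = sum(1 << p for p in positions)
    return sum(1 for v in space if v & mask == mask)


def parity_claim_holds(n, m):
    """True if the count is even for every m-dim subspace of F_2^n and every S with 1 <= |S| <= m - 1."""
    for gens in combinations(range(1, 2 ** n), m):
        space = span(gens)
        if len(space) != 2 ** m:  # skip linearly dependent generator sets
            continue
        for size in range(1, m):
            for S in combinations(range(n), size):
                if count_all_ones(space, S) % 2 != 0:
                    return False
    return True


if __name__ == "__main__":
    print(parity_claim_holds(4, 2), parity_claim_holds(4, 3), parity_claim_holds(5, 3))
    # With |S| = m the count can be odd: in F_2^2 only the vector 11 is 1 in both positions.
    print(count_all_ones(span((1, 2)), (0, 1)))
```

The last line illustrates why the condition |S| ≤ m − 1 is needed.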
In fact, the rows of the matrix $H_2(m, 1, b)$ are linearly dependent, so some rows can be deleted. We define the following set consisting of s-tuples of positive integers:
Let $E_j^s$ be the set of all the m-bit binary numbers whose i-th bit is 1 for all $i \in (\alpha_1, \alpha_2, \ldots, \alpha_s)_j$, where $(\alpha_1, \alpha_2, \ldots, \alpha_s)_j$ is the j-th tuple in $C_s^m$.
In the following, we show that each $(E_j^s)_{\min}$-th row of the matrix $H_2(m, 1, b)$ is an $\mathbb{F}_2$-linear combination of all rows in $R_j^s$ for $j = 1, 2, \ldots, \binom{m}{s}$ and $s = 0, 1, \ldots, b - 1$.
Since the matrix $H_2(m, 1, b)$ is defined over $\mathbb{F}_2$, it is sufficient to show that the number of 1s in any column of the rows in $R_j^s$ is even. Since the subspaces are sorted in lexicographic order, the binary representation of the vector corresponding to each row can be regarded as its row number. According to Lemma 6 and the definition of $E_j^s$, for each row in $R_j^s$, the i-th coordinate of its vector is 1 for all $i \in (\alpha_1, \alpha_2, \ldots, \alpha_s)_j$, where $(\alpha_1, \alpha_2, \ldots, \alpha_s)_j$ is the j-th tuple in $C_s^m$. Therefore, the number of 1s in any column of the rows in $R_j^s$ is even. Moreover, because all these $(E_j^s)_{\min}$ are different, we can sort them in increasing order, and then all the $(E_j^s)_{\min}$-th rows of $H_2(m, 1, b)$ can be deleted in this order, from which the result follows.
Corollary 3: The code C which has the parity check matrix
so the code rate is, (13)
Proof: It is obvious that any two columns of the matrix cannot be equal. It is sufficient to show that the sum of any 3 columns of $H_2(m, 1, b)$ is not 0. We denote the b-dimensional subspace corresponding to the i-th column of
This means that the sum of any 3 columns cannot be 0, from which the result follows.

IV. COMPARISON WITH OTHER CONSTRUCTIONS

A. GENERAL CONSTRUCTION
Our general construction is a binary regular LDPC code with girth greater than 4. Hao et al. [25] proposed a construction of LRC codes with information symbols by combining an existing regular LDPC code and an identity matrix. In contrast, we directly construct the parity-check matrix to obtain an LRC code, and this matrix can be viewed as an incidence matrix of a BIBD with λ = 1, so our construction can also be regarded as a kind of BIBD-LDPC code.

B. CONSTRUCTION 1
Among (r, t)-LRCs that have the same availability t = 3 as our Construction 1, WZL code is the one that has good parameters. It has been shown that WZL code has a higher rate than that of direct product code and Prakash et al.'s construction [3].
Recall that the parameters of a WZL code are $n_w = \binom{r+t}{t}$, $d_w = t + 1$, $R_w = \frac{r}{r+t}$. For all r > 0, the code length of Construction 1 is $n_1 = \frac{(r+1)(2r+3)}{3}$, which is shorter than the length $n_w = \binom{r+3}{3}$ of the WZL code. Both constructions have the same minimum distance. As for the code rate, Construction 1 is rate optimal; indeed, it is easy to see that its code rate is always greater than $R_w$ for r > 0.
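The comparison with the WZL code can be tabulated for small m. The following minimal sketch uses the length formulas stated above with t = 3 and locality $r = 2^{m-1} - 2$, which follows from each row of $H_2(m, 1, 2)$ having $2^{m-1} - 1$ ones. For the rate of Construction 1 it assumes that the rank of $H_2(m, 1, 2)$ is $2^m - 1 - m$; this value is consistent with k = 3 in Example 1 but is an assumption of this sketch. The function name `compare` is hypothetical.

```python
from fractions import Fraction
from math import comb


def compare(m, t=3):
    """Return (n_1, n_w, R_1, R_w) for Construction 1 with H_2(m, 1, 2) and the WZL code."""
    r = 2 ** (m - 1) - 2                   # locality: row weight is r + 1 = 2^(m-1) - 1
    n1 = (r + 1) * (2 * r + 3) // 3        # length of Construction 1
    nw = comb(r + t, t)                    # length of the WZL code
    rate_w = Fraction(r, r + t)            # rate of the WZL code
    # ASSUMPTION: rank(H_2(m, 1, 2)) = 2^m - 1 - m, so k_1 = n_1 - rank.
    k1 = n1 - (2 ** m - 1 - m)
    return n1, nw, Fraction(k1, n1), rate_w


if __name__ == "__main__":
    for m in range(3, 9):
        n1, nw, rate1, rate_w = compare(m)
        print(f"m={m}: n_1={n1}, n_w={nw}, R_1={float(rate1):.4f}, R_w={float(rate_w):.4f}")
```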

C. CONSTRUCTION 2
There are few works on constructions of (r, t, x)-LRCs. Kruglik et al.'s construction is based on the WZL code; its parameters are $n_k = (x + 1)\binom{r+t}{t}$, $d_k = 2$, $r_k = (r + 1)(x + 1) - 1$, $R_k = \frac{r + (t-1)x}{r + t + (t-1)x}$. Our Construction 2 therefore has a much shorter code length and twice the minimum distance of Kruglik et al.'s construction. We also compare our code rate with those of Kruglik et al.'s construction and the WZL code with the same locality r and availability t, as shown in Fig. 2 and Fig. 3. The figures show that our code rate is greater than that of the WZL code, but slightly less than that of Kruglik et al.'s construction. Moreover, when m = 4, b = 3 and when m = 5, b = 4, our construction reaches the minimum distance bound (4) of (r, t, x)-LRCs. Note that since the matrix $H_2(m, 1, b)$ has the above-mentioned block form (see Theorem 3), when b is fixed the minimum distance is also fixed and equals that of the code with parity check matrix $H_2(b + 1, 1, b)$.

V. CONCLUSION
In this paper, we generalize the construction of WZL codes and propose two constructions of LRCs. Construction 1 produces optimal (r, t)-LRCs, which reach the upper bound (3) on the code rate. Construction 2 has a much higher rate than WZL codes and attains the upper bound (4) on the minimum distance in two special cases. Moreover, we give a sufficient condition under which the incidence matrix of a 1-design can be the parity-check matrix of an (r, t, x)-LRC.