Optimal Locally Repairable Codes for Parallel Reading

Locally repairable codes (LRCs) have important applications in distributed storage systems. In this paper, we study $q$-ary $[n,k,d]$ LRCs with $(r,t,\delta)$-information-locality, where the $i$-th information symbol, for each $1 \le i \le k$, is contained in $t$ punctured subcodes with length $\le r+\delta-1$ and minimum distance $\delta$, the $i$-th information symbol is the unique common code symbol of these $t$ subcodes, and each subcode contains exactly $\delta-1$ parity symbols. Firstly, an upper bound on the minimum distance of such $q$-ary LRCs with $(r,t,\delta)$-information-locality is given.
Then, we propose a general construction framework of $q$-ary optimal LRCs with $(r,t,\delta)$-information-locality and minimum distance $d=t(\delta-1)+1$, where the required field size is only $q \ge r+\delta-2$. The proposed optimal LRCs can always repair a failed information node locally in case of at most $t\delta-1$ node failures. Moreover, multiple repair subcodes can support parallel reading of data, making the proposed codes attractive for distributed storage systems with hot data.


I. INTRODUCTION
Modern large-scale distributed storage systems usually store redundant data to ensure data reliability in case of storage node failures. Due to the large volume of data, redundancy schemes based on erasure codes are more attractive than replication because of their higher storage efficiency. However, when a storage node fails, the repair process of traditional erasure codes, especially maximum distance separable (MDS) codes, usually requires reading a large amount of data from surviving nodes. In recent years, locally repairable codes (LRCs) [1], which can repair failed nodes efficiently, have attracted a lot of interest.
In an [n, k, d] linear code over F_q, a code symbol is said to have r-locality if it can be repaired by accessing at most r other code symbols. LRCs are linear codes with locality properties for code symbols. For an [n, k, d] linear code with r-locality for information symbols, Gopalan et al. proved the well-known Singleton-like bound [1]
$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \tag{1}$$

The associate editor coordinating the review of this manuscript and approving it for publication was Zihuai Lin.
For optimal code constructions, Tamo and Barg proposed a family of optimal LRCs using polynomial methods [2]. Complete enumerations of all optimal binary LRCs meeting the bound (1) were given in [3]. Optimal cyclic LRCs were proposed in [4], [5]. Classifications of optimal ternary LRCs were given in [6]. Wang and Zhang proposed a refined bound for LRCs and gave corresponding optimal codes [7]. Optimal LRCs with a single parity symbol in each group were studied in [8]. Sometimes multiple node failures occur in distributed storage systems. For a failed node with r-locality, if one of the r repairing nodes also fails, local recovery cannot be accomplished. To accomplish local recovery in case of multiple node failures, two parallel generalizations of locality, i.e., (r, t)-locality [9]–[11] and (r, δ)-locality [12], were proposed. The (r, t)-locality type also has the advantage of supporting parallel reading of data. A code symbol in an [n, k, d] linear code is said to have (r, t)-locality if there exist t disjoint groups of other symbols, each of size at most r, each of which can repair this symbol. By this definition, a symbol with (r, t)-locality can always be repaired locally in case of t − 1 other node failures. Moreover, multiple disjoint repair groups can support parallel reading of data. This functionality is quite important for storage systems with hot data. For an [n, k, d] linear code with (r, t)-locality for information symbols, the minimum distance satisfies [9], [11]
$$d \le n - k - \left\lceil \frac{t(k-1)+1}{t(r-1)+1} \right\rceil + 2. \tag{2}$$
Constructions of LRCs with (r, t)-locality were given in [9]–[11], [13], [14]. Existing optimal LRCs meeting the bound (2) have low code rate [9]. In particular, Rawat et al. proved that when only information symbols have (r, t)-locality and every repair group contains exactly one parity symbol, the minimum distance satisfies [11]
$$d \le n - k - \left\lceil \frac{kt}{r} \right\rceil + t + 1. \tag{3}$$
Corresponding optimal LRCs meeting the bound (3) with relatively high code rate were proposed in [11], [15], and optimal binary LRCs were given in [16], [17].

VOLUME 8, 2020
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
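To make bounds (1)–(3) concrete, the following Python sketch evaluates them for given parameters. The helper names (`ceil_div`, `bound_1`, `bound_2`, `bound_3`) are hypothetical. Bound (2) is assumed here to take the form d ≤ n − k − ⌈(t(k − 1) + 1)/(t(r − 1) + 1)⌉ + 2. The sample parameters n = 14, k = 6, r = 3, t = 2 are chosen only for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b for positive b (avoids float rounding)."""
    return -(-a // b)

def bound_1(n: int, k: int, r: int) -> int:
    """Singleton-like bound (1) for r-locality of information symbols."""
    return n - k - ceil_div(k, r) + 2

def bound_2(n: int, k: int, r: int, t: int) -> int:
    """Bound (2) for (r, t)-locality of information symbols (form assumed above)."""
    return n - k - ceil_div(t * (k - 1) + 1, t * (r - 1) + 1) + 2

def bound_3(n: int, k: int, r: int, t: int) -> int:
    """Bound (3): (r, t)-locality with exactly one parity symbol per repair group."""
    return n - k - ceil_div(k * t, r) + t + 1

print(bound_1(14, 6, 3), bound_2(14, 6, 3, 2), bound_3(14, 6, 3, 2))  # prints: 8 7 7
```

With t = 1, both (2) and (3) collapse to (1), which is a quick sanity check on the formulas.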
For the other type of generalization, a code symbol in a linear code is said to have (r, δ)-locality if it is contained in a punctured subcode with length at most r + δ − 1 and minimum distance at least δ. When the number of erasures is less than δ, a code symbol with (r, δ)-locality can always be repaired from at most r code symbols. For an [n, k, d] linear code with (r, δ)-locality for information symbols, the minimum distance satisfies [12]
$$d \le n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1). \tag{4}$$
Corresponding optimal LRCs with (r, δ)-locality were given in [12], [18]–[24]. Recently, Cai et al. proposed a more general concept of locality that combines (r, t)-locality and (r, δ)-locality, and showed that the bounds (2) and (4) are extreme cases of the bound in [25]. However, optimal LRCs attaining the bound given in [25] still have low code rate.

In the rest of this paper, we assume that the first k symbols c_i, for i = 1, 2, …, k, of an [n, k, d] linear code C are information symbols, and the last n − k symbols are parity symbols. We follow the line of [25] to combine the notion of (r, δ)-locality with (r, t)-locality, focusing on the case where the number of parity symbols in each local subcode is fixed. A linear code is said to have (r, t, δ)-information-locality if the i-th information symbol, for 1 ≤ i ≤ k, is contained in t punctured subcodes, each of length at most r + δ − 1 and minimum distance δ, whose supports intersect only in the i-th coordinate; furthermore, each local subcode contains exactly δ − 1 parity symbols. More precisely, there exist t subsets Γ_1, …, Γ_t ⊂ [n] such that
• |Γ_j| ≤ r + δ − 1, and the minimum distance of the subcode C_j obtained by puncturing the code symbols in [n] \ Γ_j equals δ, for all j ∈ [t];
• Γ_j ∩ Γ_s = {i}, for any j ≠ s ∈ [t];
• |Γ_j ∩ {k + 1, …, n}| = δ − 1, for all j ∈ [t].
In this paper, we study the bounds and constructions of [n, k, d] LRCs with (r, t, δ)-information-locality where each local subcode contains exactly δ − 1 parity symbols.
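The support conditions in the definition of (r, t, δ)-information-locality are purely combinatorial, so they can be checked mechanically. The sketch below is a hypothetical helper, not part of the paper. It uses the paper's convention that coordinates 1..k are information symbols and k + 1..n are parity symbols. It deliberately does not check the minimum-distance condition on each punctured subcode C_j, because that depends on the code itself and not only on the supports. The sample supports are hypothetical, for a [12, 3] code with r = 2, t = 2, δ = 3.

```python
from itertools import combinations

def check_repair_sets(i, repair_sets, k, r, delta):
    """Check the support conditions of (r, t, delta)-information-locality for the
    information coordinate i (1-based). The minimum-distance condition on each
    punctured subcode is NOT checked here."""
    for gamma in repair_sets:
        # each support must contain i and have size at most r + delta - 1
        if i not in gamma or len(gamma) > r + delta - 1:
            return False
        # each local subcode must contain exactly delta - 1 parity coordinates (> k)
        if sum(1 for c in gamma if c > k) != delta - 1:
            return False
    # supports of any two local subcodes must meet only in coordinate i
    return all(a & b == {i} for a, b in combinations(repair_sets, 2))

# Hypothetical supports for information symbol 1 of a [12, 3] code (r = 2, t = 2, delta = 3).
print(check_repair_sets(1, [{1, 2, 4, 5}, {1, 3, 6, 7}], k=3, r=2, delta=3))  # True
print(check_repair_sets(1, [{1, 2, 4, 5}, {1, 2, 6, 7}], k=3, r=2, delta=3))  # False: overlap {1, 2}
```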
Firstly, we give an upper bound on the minimum distance of such [n, k, d] LRCs with (r, t, δ)-information-locality. The proposed bound reduces to the bound (3) when δ = 2 and to the bound (4) when t = 1. Then, we propose a general construction framework of q-ary optimal LRCs with (r, t, δ)-information-locality by exhibiting their explicit parity-check matrices. Our optimal LRCs have minimum distance d = t(δ − 1) + 1, which attains the proposed upper bound. The required field size is only q ≥ r + δ − 2. The proposed codes can always repair a failed information node locally in case of at most tδ − 1 node failures, and the t repair subcodes of each information symbol can support parallel reading of data, which benefits storage systems with hot data. Moreover, the relatively small field size implies fast encoding, decoding and repair operations, making the codes appealing for practical implementations of distributed storage systems with hot data.
The rest of this paper is organized as follows. In Section II, we present the upper bound. Section III gives the optimal code constructions and Section IV concludes the paper.

II. THE UPPER BOUND
Let C be a q-ary [n, k, d] LRC with (r, t, δ)-information-locality where each local subcode contains exactly δ − 1 parity symbols. In this section, we present an upper bound on the minimum distance of such [n, k, d] LRCs with (r, t, δ)-information-locality.
The upper bound is obtained by analyzing the standard form of the generator matrix of C according to its locality property. Consider its generator matrix in standard form
$$G = (I_k, B) = [e_1, e_2, \ldots, e_k, g_1, g_2, \ldots, g_{n-k}], \tag{5}$$
where e_i, for i = 1, 2, …, k, is the i-th standard basis vector, and g_i, for i = 1, 2, …, n − k, called a parity column, corresponds to a parity symbol. The support supp(v) of a vector v is the set of coordinates of its nonzero entries, and its weight is defined to be wt(v) = |supp(v)|. In this section, all generator matrices are in the standard form (5). We begin with a simple observation on the structure of the local subcodes of C from the viewpoint of the generator matrix.
Proposition 1: Let C' be a local subcode of C with length ρ + δ − 1 and minimum distance δ, which contains 1 ≤ ρ ≤ r information symbols c_{j_1}, …, c_{j_ρ} and δ − 1 parity symbols of C, corresponding to the parity columns g_{i_1}, …, g_{i_{δ−1}}. Let Γ = {j_1, …, j_ρ} be the set of coordinates of these ρ information symbols. Then
$$\mathrm{supp}(g_{i_1}) = \mathrm{supp}(g_{i_2}) = \cdots = \mathrm{supp}(g_{i_{\delta-1}}) = \Gamma,$$
i.e., all parity columns of C' have uniform weight |Γ| ≤ r.

Proof: The generator matrix of C' is [e_{j_1}, …, e_{j_ρ}, g_{i_1}, g_{i_2}, …, g_{i_{δ−1}}], and each of its rows is a codeword of C'. A row indexed outside Γ is zero in the first ρ columns, so if it were nonzero it would be a nonzero codeword of weight at most δ − 1 < δ; hence all such rows are zero. Since C' has minimum distance δ, each nonzero row, i.e., each row indexed by Γ, has weight at least δ, and it has exactly one nonzero entry among the first ρ columns. Thus, in each such row, all the δ − 1 entries in the last δ − 1 columns [g_{i_1}, g_{i_2}, …, g_{i_{δ−1}}] must be nonzero. In other words, supp(g_{i_1}) = ⋯ = supp(g_{i_{δ−1}}) = Γ, which completes the proof.

In the following, we call the set of such δ − 1 parity columns Λ = [g_{i_1}, g_{i_2}, …, g_{i_{δ−1}}] of a local subcode C' in the standard form of generator matrix G its local-parity-block. The δ − 1 parity columns in a local-parity-block are called locality-columns. By Proposition 1, all locality-columns in Λ have the same support. Denote the support of the local-parity-block of a local subcode as
$$\mathrm{supp}(\Lambda) = \Gamma,$$
where Γ is the set of coordinates of the ρ information symbols in C'. Each local-parity-block contains a ρ × (δ − 1) submatrix with exactly ρ(δ − 1) nonzero entries. By the definition of (r, t, δ)-information-locality, in the standard form of generator matrix G, each information symbol c_i (1 ≤ i ≤ k) has t local-parity-blocks Λ_1, …, Λ_t corresponding to its t local subcodes, and these t local-parity-blocks satisfy
$$\mathrm{supp}(\Lambda_j) \cap \mathrm{supp}(\Lambda_s) = \{i\}, \quad \text{for any } j \ne s \in [t].$$
By investigating the structure of G according to its locality properties, we can obtain the following upper bound on the minimum distance of C.
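Proposition 1 can be checked by brute force on a toy local subcode before moving on to the bound. The sketch below assumes a prime field GF(p), so arithmetic is plain modular arithmetic. It uses a hypothetical local generator [I_2 | P] over GF(5) with ρ = 2 and δ = 3. With every entry of P nonzero, the code reaches distance 3 and both parity columns share the support {0, 1}. Rows are 0-based in the code. Zeroing one entry of P drops the distance to 2, which is the contrapositive of the proposition.

```python
from itertools import product

def min_distance(G, p):
    """Brute-force minimum distance of the code spanned by the rows of G over GF(p), p prime."""
    k, n = len(G), len(G[0])
    best = n
    for msg in product(range(p), repeat=k):
        if any(msg):
            word = [sum(m * G[row][col] for row, m in enumerate(msg)) % p for col in range(n)]
            best = min(best, sum(1 for x in word if x))
    return best

def parity_supports(G, rho):
    """Row supports (0-based) of the parity columns rho, rho + 1, ... of G = [I_rho | P]."""
    return [frozenset(row for row in range(len(G)) if G[row][col]) for col in range(rho, len(G[0]))]

# Hypothetical local generator over GF(5): rho = 2 information symbols, delta - 1 = 2 parity columns.
good = [[1, 0, 1, 1],
        [0, 1, 1, 2]]
# Same shape, but one parity column has a zero entry in an information row.
bad = [[1, 0, 1, 1],
       [0, 1, 0, 1]]
print(min_distance(good, 5), parity_supports(good, 2))  # distance 3; both supports {0, 1}
print(min_distance(bad, 5), parity_supports(bad, 2))    # distance 2; supports {0} and {0, 1}
```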
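Theorem 1: Let C be a q-ary [n, k, d] LRC with (r, t, δ)-information-locality where each local subcode contains exactly δ − 1 parity symbols. Then
$$d \le n - k + 1 - \left( \left\lceil \frac{kt}{r} \right\rceil - t \right)(\delta - 1). \tag{6}$$

Proof: Let G = (I_k, B) be the standard form of the generator matrix of C, where I_k consists of the k information columns and B consists of the n − k parity columns, including the locality-columns. Next, we investigate the structure of B by moving the locality-columns, each of weight at most r, to the left part of B according to the locality property.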
Let L be a set of columns in B, initialized as the empty set. For the first information symbol, we add its t local-parity-blocks Λ_1, …, Λ_t, totaling t(δ − 1) parity columns, into L. The second information symbol also has t local-parity-blocks. Suppose there are already t_2 local-parity-blocks of the second information symbol in L, where 0 ≤ t_2 ≤ t; we add its remaining t − t_2 local-parity-blocks into L. We repeat this process until every one of the k information symbols has at least t local-parity-blocks of locality-columns in L. Note that at the end of this process, some information symbols might have more than t local-parity-blocks in L, but there must exist at least one information symbol which has exactly t local-parity-blocks in L (for instance, the last information symbol for which new blocks were added).
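The greedy process above, together with the double counting of the nonzero entries Θ of L used next in the proof, can be simulated on a small instance. In this sketch, each local-parity-block is represented only by a hypothetical label and the support of its information coordinates (0-based). The block structure mirrors a [12, 3] code with r = 2, t = 2, δ = 3, where each of the three blocks covers two of the three information symbols. This is an assumed example, not taken from the paper's figures.

```python
def ceil_div(a, b):
    """Exact integer ceiling of a / b for positive b."""
    return -(-a // b)

def build_L(blocks, own_blocks, t):
    """Greedy process from the proof. blocks maps a label to the set of information
    coordinates covered by that local-parity-block; own_blocks[i] lists the t labels
    of symbol i's own blocks. Returns the labels placed in L, in insertion order."""
    L = []
    for i, own in enumerate(own_blocks):
        t_i = sum(1 for b in L if i in blocks[b])   # blocks covering symbol i already in L
        if t_i < t:
            missing = [b for b in own if b not in L]
            L.extend(missing[: t - t_i])
    return L

def count_theta(blocks, L, k, delta):
    """Count the nonzero entries of L by rows and by columns (both equal Theta)."""
    by_rows = sum((delta - 1) * sum(1 for b in L if i in blocks[b]) for i in range(k))
    by_cols = sum((delta - 1) * len(blocks[b]) for b in L)
    return by_rows, by_cols

# Hypothetical block structure: k = 3, r = 2, t = 2, delta = 3.
blocks = {"A": {0, 1}, "B": {0, 2}, "C": {1, 2}}
own_blocks = [["A", "B"], ["A", "C"], ["B", "C"]]
k, r, t, delta = 3, 2, 2, 3
L = build_L(blocks, own_blocks, t)
s = len(L)
by_rows, by_cols = count_theta(blocks, L, k, delta)
print(L, s, by_rows, by_cols)  # ['A', 'B', 'C'] 3 12 12
# The row bound k*t*(delta-1) <= Theta and the column bound Theta <= s*r*(delta-1) give s >= ceil(kt/r).
print(k * t * (delta - 1) <= by_rows <= s * r * (delta - 1), s >= ceil_div(k * t, r))  # True True
```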
Let s be the number of local-parity-blocks in L, and let l = s(δ − 1) be the number of locality-columns of these s local-parity-blocks in L = [Λ_1, …, Λ_s]. Denote the remaining parity columns in B by [g_{l+1}, …, g_{n−k}]. Then we rearrange the columns of B as follows:
$$B = [L, g_{l+1}, \ldots, g_{n-k}] = [\Lambda_1, \ldots, \Lambda_s, g_{l+1}, \ldots, g_{n-k}]. \tag{7}$$
Let Θ be the number of nonzero entries in L = [Λ_1, …, Λ_s]. First, we count Θ by rows. Each i-th (1 ≤ i ≤ k) information symbol has at least t local-parity-blocks in L, and each local-parity-block contains exactly δ − 1 locality-columns whose supports cover the i-th information symbol. Hence, each i-th row of L has weight at least t(δ − 1), and
$$\Theta \ge kt(\delta - 1). \tag{8}$$
Next, we count Θ by columns. Each local-parity-block Λ_m, for 1 ≤ m ≤ s, in L has δ − 1 locality-columns. By Proposition 1, each locality-column in Λ_m has uniform weight |supp(Λ_m)| ≤ r, so each local-parity-block contains at most r(δ − 1) nonzero entries. There are s local-parity-blocks in L in total. Hence,
$$\Theta \le sr(\delta - 1). \tag{9}$$
Combining (8) and (9), we have kt(δ − 1) ≤ Θ ≤ sr(δ − 1), i.e., s ≥ kt/r, and since s is an integer, s ≥ ⌈kt/r⌉. Thus, the number of locality-columns in L is
$$l = s(\delta - 1) \ge \left\lceil \frac{kt}{r} \right\rceil (\delta - 1).$$
Since there exists an information symbol which has exactly t local-parity-blocks of locality-columns in L, at least one row of the submatrix L has weight exactly t(δ − 1). Since every row of I_k has weight one and each row of [g_{l+1}, …, g_{n−k}] has weight at most n − k − l, there exists a row of G with weight at most 1 + t(δ − 1) + (n − k − l). Every row of G is a nonzero codeword, so the minimum distance of C satisfies
$$d \le 1 + t(\delta - 1) + n - k - l \le n - k + 1 - \left( \left\lceil \frac{kt}{r} \right\rceil - t \right)(\delta - 1),$$
which completes the proof.

We use a simple example to illustrate the above proof idea.

Example 1: Consider the standard form of the generator matrix G of a [12, 3] LRC with (r = 2, t = 2, δ = 3)-information-locality. Each information symbol is contained in two [4, 2, 3] local subcodes which intersect only in this information symbol. Then, each of the three information symbols has two local-parity-blocks, and the nonzero entries of each local-parity-block form a 2 × 2 submatrix. By rearranging the parity columns of G, suppose that G has the following form:
$$G = \left[\begin{array}{ccc|cccccc|ccc}
1 & 0 & 0 & * & * & * & * & 0 & 0 & \# & \# & \# \\
0 & 1 & 0 & * & * & 0 & 0 & * & * & \# & \# & \# \\
0 & 0 & 1 & 0 & 0 & * & * & * & * & \# & \# & \#
\end{array}\right],$$
where the first three columns form I_3, the middle six columns are the locality-columns in L, the last three columns are the other columns, *'s denote nonzero entries, and #'s can be zero or nonzero. There are s = 3 local-parity-blocks in L, and the number of locality-columns in L is l = s(δ − 1) = 6. For each row of G, the weight in I_3 is one, the weight in L is t(δ − 1) = 4, and the weight in the last part is at most 3. Hence, each row of G has weight at most 8. Indeed, according to the bound (6), the minimum distance satisfies
$$d \le 12 - 3 + 1 - \left( \left\lceil \frac{3 \cdot 2}{2} \right\rceil - 2 \right)(3 - 1) = 8,$$
which matches this upper bound on the row weights of G.

Remark 1: Note that the proposed upper bound (6) for LRCs with (r, t, δ)-information-locality generalizes the bound (3) for (r, t)-locality and the bound (4) for (r, δ)-locality:
• when δ = 2, the proposed upper bound (6) reduces to the upper bound (3);
• when t = 1, the proposed upper bound (6) reduces to the upper bound (4).
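Remark 1 and Example 1 can be checked numerically. This sketch takes bound (6) to be d ≤ n − k + 1 − (⌈kt/r⌉ − t)(δ − 1), which is the form that follows from the counting in the proof of the bound. It also takes (3) and (4) as d ≤ n − k − ⌈kt/r⌉ + t + 1 and d ≤ n − k + 1 − (⌈k/r⌉ − 1)(δ − 1). The helper names are hypothetical, and the exhaustive comparison only covers small parameter ranges.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b for positive b."""
    return -(-a // b)

def bound_6(n, k, r, t, delta):
    """Bound (6) for (r, t, delta)-information-locality, delta - 1 parity symbols per local subcode."""
    return n - k + 1 - (ceil_div(k * t, r) - t) * (delta - 1)

def bound_3(n, k, r, t):
    """Bound (3): (r, t)-locality with one parity symbol per repair group."""
    return n - k - ceil_div(k * t, r) + t + 1

def bound_4(n, k, r, delta):
    """Bound (4): (r, delta)-locality of information symbols."""
    return n - k + 1 - (ceil_div(k, r) - 1) * (delta - 1)

print(bound_6(12, 3, 2, 2, 3))  # Example 1: 8
ranges = [(n, k, r) for n in range(10, 30) for k in range(1, 10) for r in range(1, 6)]
ok_delta2 = all(bound_6(n, k, r, t, 2) == bound_3(n, k, r, t) for n, k, r in ranges for t in range(1, 4))
ok_t1 = all(bound_6(n, k, r, 1, delta) == bound_4(n, k, r, delta) for n, k, r in ranges for delta in range(2, 6))
print(ok_delta2, ok_t1)  # True True
```

Both reductions hold algebraically: setting δ = 2 turns the factor (δ − 1) into 1, and setting t = 1 gives ⌈k/r⌉ − 1.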

III. OPTIMAL CODE CONSTRUCTIONS
In this section, we propose a general construction framework of q-ary optimal [n, k, d] LRCs with (r, t, δ)-information-locality and minimum distance d = t(δ − 1) + 1, which attains the upper bound (6). The optimal codes are presented by exhibiting their explicit parity-check matrices. A binary matrix A is said to be (t, r)-regular if it has uniform column weight t and uniform row weight r. Furthermore, if the supports of any two rows of A intersect in at most one coordinate, A is said to have girth g > 4. There are various constructions of (t, r)-regular matrices with girth g > 4. One important class of such matrices consists of incidence matrices of certain objects from combinatorial designs, e.g., generalized quadrangles, 2-designs, or projective planes [26]. Another large class of regular matrices with g > 4 consists of parity-check matrices of regular LDPC codes, e.g., array LDPC codes [27]. A simple connection between regular matrices with g > 4 and optimal binary LRCs with (r, t)-locality was established in [16]. Inspired by this idea, we employ regular matrices with girth g > 4 to construct q-ary optimal LRCs with (r, t, δ)-information-locality.
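The two properties required of A, (t, r)-regularity and girth g > 4, can be tested directly: girth g > 4 means that no two rows share more than one nonzero column. As an assumed example (not necessarily the matrix used in the paper), the vertex-edge incidence matrix of the complete graph K_4 is a 4 × 6 binary matrix. Every edge has two endpoints (t = 2), every vertex lies on three edges (r = 3), and two vertices share exactly one edge.

```python
from itertools import combinations

def is_regular_girth_gt4(A, t, r):
    """True if binary matrix A has uniform column weight t, uniform row weight r,
    and any two rows share at most one nonzero column (girth g > 4)."""
    n_rows, n_cols = len(A), len(A[0])
    if any(sum(A[i][j] for i in range(n_rows)) != t for j in range(n_cols)):
        return False
    if any(sum(row) != r for row in A):
        return False
    return all(sum(x & y for x, y in zip(A[a], A[b])) <= 1
               for a, b in combinations(range(n_rows), 2))

def k4_incidence():
    """Vertex-edge incidence matrix of K_4: 4 rows (vertices) by 6 columns (edges)."""
    edges = list(combinations(range(4), 2))
    return [[1 if v in e else 0 for e in edges] for v in range(4)]

print(is_regular_girth_gt4(k4_incidence(), t=2, r=3))  # True
print(is_regular_girth_gt4([[1, 1], [1, 1]], 2, 2))    # False: the rows share two columns
```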
Next, we use a simple example to illustrate the construction framework in Construction 1.

in each row of A_{4×6} with the j-th column of H and the 0's with two-dimensional all-zero column vectors. By Construction 1, the parity-check matrix H = [A*_{8×6}, I_{8×8}] is given in (17). Then, the linear code C with H as its parity-check matrix is an optimal quaternary [n = 14, k = 6, d = 5] LRC with (3, 2, 3)-information-locality, and each local subcode contains exactly 2 parity symbols.
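Since the matrices of this example are not reproduced in the text, the following sketch rebuilds a code with the same parameters under explicit assumptions and checks its minimum distance by brute force. The sketch assumes that A_{4×6} is the vertex-edge incidence matrix of K_4, which is (2, 3)-regular with girth g > 4. It also assumes the local matrix [M | I_2] over GF(4) with M = [[1, 1, 1], [1, α, α²]], in which any two of the five columns are linearly independent. These choices are hypothetical and may differ from the paper's (17), but they yield H = [A*_{8×6}, I_{8×8}] of size 8 × 14 and a code with d = 5.

```python
from itertools import combinations, product

# GF(4) = {0, 1, 2, 3}: 2 stands for a root alpha of x^2 + x + 1 and 3 for alpha^2 = alpha + 1.
# Addition is bitwise XOR (characteristic 2); multiplication uses discrete logarithms.
EXP = [1, 2, 3]            # alpha^0, alpha^1, alpha^2
LOG = {1: 0, 2: 1, 3: 2}

def gf4_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]

# Assumed local matrix [M | I_2]: any 2 of its 5 columns are linearly independent over GF(4).
M = [[1, 1, 1],
     [1, 2, 3]]

def k4_incidence():
    """Assumed A_{4x6}: vertex-edge incidence matrix of K_4."""
    edges = list(combinations(range(4), 2))
    return [[1 if v in e else 0 for e in edges] for v in range(4)]

def expand(A, M):
    """A*: replace the j-th 1 in each row of A by the j-th column of M, each 0 by a zero column."""
    rows = []
    for a_row in A:
        block = [[0] * len(a_row) for _ in range(len(M))]
        ones = [col for col, bit in enumerate(a_row) if bit]
        for j, col in enumerate(ones):
            for e in range(len(M)):
                block[e][col] = M[e][j]
        rows.extend(block)
    return rows

def min_distance(A_star):
    """Brute force over ker([A* | I]): in characteristic 2 each codeword is (x, A* x)."""
    best = None
    for x in product(range(4), repeat=len(A_star[0])):
        if not any(x):
            continue
        parity = []
        for row in A_star:
            acc = 0
            for a, xi in zip(row, x):
                acc ^= gf4_mul(a, xi)
            parity.append(acc)
        weight = sum(1 for v in x if v) + sum(1 for v in parity if v)
        best = weight if best is None else min(best, weight)
    return best

A_star = expand(k4_incidence(), M)                        # 8 x 6
H = [row + [int(i == j) for j in range(len(A_star))]      # H = [A* | I_8], 8 x 14
     for i, row in enumerate(A_star)]
print(len(H), len(H[0]), min_distance(A_star))            # 8 14 5
```

The brute force enumerates all 4^6 messages, which is cheap here. For larger parameters, a rank test on column subsets would be the natural replacement.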
In the following, we verify the correctness of Construction 1. Clearly, the parity-check matrix H in (15) has full rank, since its last µ(δ − 1) columns form an identity matrix, and C has code length ν + µ(δ − 1) and dimension ν. The first ν columns of H correspond to the ν information symbols of C, and the last µ(δ − 1) columns correspond to parity symbols. The parity-check matrix H in (15) is divided into µ row blocks, each containing δ − 1 rows. The i-th row block, for 1 ≤ i ≤ µ, contains the [(i − 1)(δ − 1) + 1]-th to the i(δ − 1)-th rows of H. Moreover, the nonzero columns in the i-th row block constitute the q-ary (δ − 1) × (r + δ − 1) local parity-check matrix H in (14), any δ − 1 columns of which are linearly independent. By puncturing the n − (r + δ − 1) coordinates of C corresponding to the all-zero columns of the i-th row block, the resulting q-ary local subcode, with the matrix in (14) as its local parity-check matrix, has length r + δ − 1, dimension r and minimum distance δ. Moreover, this local subcode contains exactly δ − 1 parity symbols, which correspond to the last δ − 1 nonzero columns of the i-th row block. Since A_{µ×ν} has uniform column weight t and A*_{µ(δ−1)×ν} is obtained by replacing all the 1's in A_{µ×ν} with columns of the local matrix, each i-th information symbol (1 ≤ i ≤ ν) is contained in t such q-ary [r + δ − 1, r, δ] local subcodes. Since A_{µ×ν} has girth g > 4, the supports of these t local subcodes intersect only in the i-th coordinate: distinct row blocks share no parity coordinate, and the corresponding rows of A share no column other than the i-th. Therefore, all the ν information symbols satisfy (r, t, δ)-information-locality, and each local subcode contains exactly δ − 1 parity symbols.
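The locality argument of Construction 1 depends only on the zero pattern of H, so it can be checked without any field arithmetic. The sketch below derives, for each information coordinate i, the supports of the local subcodes from the row blocks of A that contain column i. Parity coordinates of row block b are assumed to be ν + b(δ − 1), …, ν + (b + 1)(δ − 1) − 1, counted 0-based. Each support is then tested for size r + δ − 1, for exactly δ − 1 parity coordinates, and for pairwise intersection {i}. The K_4 incidence matrix is again an assumed example, and the 2 × 2 all-ones matrix serves as a girth-4 counterexample.

```python
from itertools import combinations

def local_supports(A, delta):
    """For each information coordinate i (0-based), the supports of the local
    subcodes from the row blocks of A that contain column i."""
    mu, nu = len(A), len(A[0])
    result = []
    for i in range(nu):
        sets = []
        for b in range(mu):
            if A[b][i]:
                info = {j for j in range(nu) if A[b][j]}
                parity = set(range(nu + b * (delta - 1), nu + (b + 1) * (delta - 1)))
                sets.append(info | parity)
        result.append(sets)
    return result

def satisfies_locality(A, r, t, delta):
    """Check the support conditions of (r, t, delta)-information-locality for H = [A* | I]."""
    nu = len(A[0])
    for i, sets in enumerate(local_supports(A, delta)):
        if len(sets) != t:
            return False
        for s in sets:
            if len(s) != r + delta - 1 or sum(1 for c in s if c >= nu) != delta - 1:
                return False
        if any(a & b != {i} for a, b in combinations(sets, 2)):
            return False
    return True

k4 = [[1 if v in e else 0 for e in combinations(range(4), 2)] for v in range(4)]
print(satisfies_locality(k4, r=3, t=2, delta=3))              # True
print(satisfies_locality([[1, 1], [1, 1]], r=2, t=2, delta=3))  # False: girth 4
```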
Lemma 1: The code C given by Construction 1 has minimum distance d = t(δ − 1) + 1.

Proof: We first show that any t(δ − 1) columns of H are linearly independent, and we divide the analysis into two cases. If all these t(δ − 1) columns are from the last µ(δ − 1) columns of H, they are columns of an identity matrix and are clearly linearly independent. Otherwise, suppose that among these t(δ − 1) columns there is at least one column ξ lying in the first ν columns of H; such a column has weight t(δ − 1). These t(δ − 1) nonzero entries of ξ are contained in t local parity-check matrices. We partition the t(δ − 1) nonzero entries of the column vector ξ into t subvectors of weight δ − 1, such that each subvector is contained in a single local parity-check matrix. Since any δ − 1 columns of a local parity-check matrix are linearly independent, eliminating the δ − 1 nonzero entries of a subvector requires at least δ − 1 other columns from this local parity-check matrix. Since the t local parity-check matrices containing the coordinate of ξ intersect only in this coordinate, at least t(δ − 1) other columns are needed to eliminate these t subvectors, i.e., all the t(δ − 1) nonzero entries of ξ. This implies that any t(δ − 1) columns including ξ must be linearly independent, and hence d ≥ t(δ − 1) + 1. Conversely, ξ together with the t(δ − 1) identity columns of H located at the positions of its nonzero entries forms a set of t(δ − 1) + 1 linearly dependent columns, so d ≤ t(δ − 1) + 1. Combining the above discussions, C has minimum distance d = t(δ − 1) + 1.
Theorem 2: The code C given by Construction 1 is a q-ary optimal [n = ν + µ(δ − 1), k = ν, d = t(δ − 1) + 1] LRC with (r, t, δ)-information-locality, where each local subcode contains exactly δ − 1 parity symbols.

Proof: By the above discussion, the [n = ν + µ(δ − 1), k = ν] linear code C has (r, t, δ)-information-locality, and each local subcode contains exactly δ − 1 parity symbols. By Lemma 1, C has minimum distance d = t(δ − 1) + 1. Consider the submatrix A*_{µ(δ−1)×ν} of H in (15), which consists of the first ν columns of H. Let Θ be the number of nonzero entries in A*_{µ(δ−1)×ν}. Since each row of A*_{µ(δ−1)×ν} has uniform weight r, Θ = µ(δ − 1)·r. On the other hand, each column of A*_{µ(δ−1)×ν} has uniform weight t(δ − 1), so Θ = ν·t(δ − 1). Hence,
$$\mu = \frac{\nu t}{r} = \frac{kt}{r} = \left\lceil \frac{kt}{r} \right\rceil.$$
By the upper bound (6), the minimum distance satisfies
$$d \le n - k + 1 - \left( \left\lceil \frac{kt}{r} \right\rceil - t \right)(\delta - 1) = \mu(\delta - 1) + 1 - (\mu - t)(\delta - 1) = t(\delta - 1) + 1.$$
Thus, the minimum distance of C attains the upper bound (6) with equality, which completes the proof.

Note that since A can be chosen from various regular matrices with g > 4, the construction framework in Construction 1 is very flexible. Different choices of the regular matrix A result in optimal LRCs with different parameters. For example, if we choose A to be the incidence matrix of a specific generalized quadrangle or projective plane, we can obtain the following optimal LRCs by Construction 1.
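The parameter bookkeeping in the proof above reduces to a few lines of arithmetic. Given a (t, r)-regular µ × ν matrix A with girth g > 4, this sketch computes n, k, the achieved distance t(δ − 1) + 1, the value of bound (6) taken as d ≤ n − k + 1 − (⌈kt/r⌉ − t)(δ − 1), and the minimum field size r + δ − 2. The two inputs are illustrative assumptions rather than entries from the paper's list. The first is the 4 × 6 incidence matrix of K_4. The second is the 7 × 7 incidence matrix of the Fano plane, which is (3, 3)-regular, and any two of its lines meet in exactly one point. The helper name is hypothetical. The actual field size must also be a prime power at least r + δ − 2.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b for positive b."""
    return -(-a // b)

def construction_params(mu, nu, t, r, delta):
    """Parameters obtained from a (t, r)-regular mu x nu matrix A with girth > 4
    (hypothetical helper following the counting argument above)."""
    if mu * r != nu * t:
        raise ValueError("a (t, r)-regular mu x nu matrix requires mu * r == nu * t")
    n, k = nu + mu * (delta - 1), nu
    d = t * (delta - 1) + 1
    bound6 = n - k + 1 - (ceil_div(k * t, r) - t) * (delta - 1)
    return {"n": n, "k": k, "d": d, "bound6": bound6, "min_field_size": r + delta - 2}

print(construction_params(mu=4, nu=6, t=2, r=3, delta=3))  # K_4 incidence matrix: n=14, k=6, d=5
print(construction_params(mu=7, nu=7, t=3, r=3, delta=3))  # Fano plane incidence matrix: n=21, k=7, d=7
```

In both cases d equals the value of bound (6), as the proof predicts whenever µr = νt.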

IV. CONCLUSION
In this paper, we studied the bounds and constructions of q-ary [n, k, d] LRCs with (r, t, δ)-information-locality where each local subcode contains exactly δ − 1 parity symbols. An upper bound on the minimum distance of such q-ary LRCs with (r, t, δ)-information-locality was given, which reduces to the bound (3) when δ = 2 and to the bound (4) when t = 1. Then, we proposed a flexible construction framework of q-ary optimal LRCs with (r, t, δ)-information-locality and minimum distance d = t(δ − 1) + 1, which attains the proposed bound (6) with equality. Our optimal LRCs only require the field size to be q ≥ r + δ − 2, which is relatively small and thus implies fast encoding and decoding. The proposed optimal LRCs can always repair a failed information node by accessing at most r surviving nodes in case of at most tδ − 1 node failures. Multiple repair subcodes of each information symbol enable the proposed codes to support parallel reading of data, making them promising for practical implementations of distributed storage systems with hot data.

SHU-TAO XIA received the B.S. degree in mathematics and the Ph.D. degree in applied mathematics from Nankai University, Tianjin, China, in 1992 and 1997, respectively. Since January 2004, he has been with the Tsinghua Shenzhen International Graduate School, Tsinghua University, Guangdong, China. He is currently a Full Professor. From March 1997 to April 1999, he was with the Research Group of Information Theory, Department of Mathematics, Nankai University. His current research interests include coding and information theory, networking, and machine learning.
DEYIN LI received the B.S. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2018, where he is currently pursuing the master's degree with the Information Security Center.