Constructions of Binary MDS Array Codes With Optimal Repair/Access Bandwidth

Maximum distance separable (MDS) codes are commonly deployed in distributed storage systems because they provide the maximum failure tolerance for a given redundancy. The repair problem of MDS codes has drawn much attention, and various constructions of MDS array codes with optimal repair bandwidth have been proposed in the last decade. However, few of the existing codes are constructed over the binary field. In this paper, we propose new constructions of binary MDS array codes with optimal repair (or access) bandwidth for single-node failure. Specifically, by stacking multiple Blaum-Roth code instances whose parity-check matrices are judiciously designed, we obtain three families of binary MDS array codes with optimal repair bandwidth; using permutation matrices as building blocks, we also construct two families of binary MDS array codes with optimal access bandwidth. Moreover, when the number of helper nodes satisfies $d < n-1$, the codes are error-resilient while still achieving the lower bound on repair (or access) bandwidth. All the codes in this paper are constructed over a particular ring of binary polynomials. Consequently, the computation involved in the encoding, decoding and node repair procedures consists only of XORs and cyclic shifts, avoiding complex multiplications and divisions over large finite fields.


Lei Li, Graduate Student Member, IEEE, Xinchun Yu, Member, IEEE, Liang Chen, Yuanyuan Dong, and Yuan Luo, Member, IEEE
Index Terms-Distributed storage system, repair bandwidth, optimal access, MSR codes, binary MDS codes.

I. INTRODUCTION
Distributed storage systems (DSS) are built upon a large number of individually unreliable nodes to store and analyze massive amounts of data, where transient and permanent node failures may occur as daily events. To provide reliability and availability in the face of node failures, the system needs to store some redundant data. The traditional mechanism for introducing redundancy in DSS, such as the Google File System [1], Hadoop Distributed File System [2], and Microsoft Azure [3], is replication. Clearly, replication becomes very costly as the amount of data increases exponentially. Error-correcting codes (ECC) have been introduced into DSS as a viable alternative to replication, since they can achieve higher reliability for a given redundancy.

Lei Li and Yuan Luo are with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: staroverseas@sjtu.edu.cn; yuanluo@sjtu.edu.cn).
Xinchun Yu is with the Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China (e-mail: yuxinchun@sz.tsinghua.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCOMM.2023.3343421.
Digital Object Identifier 10.1109/TCOMM.2023.3343421
An important class of ECC widely deployed in DSS is maximum distance separable (MDS) codes, which achieve the optimal trade-off between storage efficiency and fault tolerance. For a file of size M, a system using an (n, k) MDS code first divides the file into k equal-size packets and then encodes them into n packets, which are distributed over n distinct storage nodes. The MDS property guarantees that the original file can be reconstructed as long as any k of these n packets are accessible. In such a system, a failed node can be repaired by accessing and communicating an amount of data equal to the size of the original file from any k of the surviving nodes. However, this is not efficient for single-node failure, because the amount of data communicated, called the repair bandwidth, is k times the amount of data lost.
Among all failure patterns, single-node failure is the most common scenario in real-world storage systems [21], so it is important to construct codes that can repair any single node efficiently. Most of the existing MSR codes are constructed over finite fields whose size cannot be too small; thus, many multiplications and divisions over the field are needed during node repair and file reconstruction. In the present paper, we focus on single-node repair and construct binary MDS array codes with optimal repair bandwidth, i.e., codes whose underlying field is $\mathbb{F}_2$.
Only a few binary MDS array codes with optimal repair bandwidth have been proposed in the literature [22], [23]. In [22], the authors proposed a general transformation framework to construct binary MDS array codes with optimal repair bandwidth for $k+1 \le d \le n-1$, where some of the helper nodes must be specific nodes. In [23], another generic transformation was proposed that converts any (n, k) binary MDS array code into a new one in which r = n − k chosen nodes have optimal repair bandwidth. After multiple transformations, the original code becomes a binary MDS array code with optimal repair bandwidth for all nodes, with d = n − 1 helper nodes. Note that the codes in [22] and [23] also have the optimal-access property, i.e., the amount of data accessed during node repair equals the minimum amount of data that needs to be communicated.
Motivated by the code constructions over nonbinary fields in [24], we propose new constructions of binary MDS array codes with optimal repair/access bandwidth. Specifically, by stacking multiple Blaum-Roth code instances, we construct three families of binary MDS array codes, named $C_1$, $C_2$ and $C_3$, with optimal repair bandwidth. Code $C_1$ achieves the optimal repair bandwidth for $d = n-1$, and with a slight modification to $C_1$, we obtain code $C_2$, which has optimal repair bandwidth for an arbitrary fixed $d \in [k+1, n-1]$.

The remainder of the paper is organized as follows. Section II presents some necessary preliminaries. Binary MDS array codes with optimal repair bandwidth and binary MDS array codes with optimal access bandwidth are constructed in Section III and Section IV, respectively. Evaluations and comparisons are made in Section V. Finally, Section VI draws the conclusion.

II. PRELIMINARIES
Given two integers i and j with i < j, denote by [i] and [i, j] the ordered sets $\{1, 2, \dots, i\}$ and $\{i, i+1, \dots, j\}$, respectively. For an (n, k) code, denote by $r := n-k$ the number of parity nodes. Following the literature on codes for distributed storage, we use the words coordinate and node interchangeably; as a result, repairing failed nodes in a distributed storage system can be viewed as correcting erasures of a codeword. In this paper, the boldface $\mathbf{0}$ denotes a zero vector and the plain $0$ denotes the scalar zero. Given M distinct positive integers $i_1, i_2, \dots, i_M$, denote by $\mathrm{lcm}(i_1, i_2, \dots, i_M)$ and $\gcd(i_1, i_2, \dots, i_M)$ their least common multiple and greatest common divisor, respectively.

A. A Binary Polynomial Ring
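For a prime p and the binary field $\mathbb{F}_2$, let R be the ring of polynomials of degree less than $p-1$ over $\mathbb{F}_2$, with multiplication taken modulo $1 + x + \cdots + x^{p-1}$, i.e., $R = \mathbb{F}_2[x]/(1 + x + \cdots + x^{p-1})$. Clearly, multiplication in R is commutative.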
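In this paper, we use special elements of $R^*$, the set of invertible elements of R, to construct codes with the desired properties, so it is useful to introduce some of these elements here. First, observe that $x \in R^*$ since $\gcd(x, 1 + x + \cdots + x^{p-1}) = 1$; indeed, $x^{-1} = x^{p-1}$ because $x^p = 1$ in R.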
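The arithmetic of R can be sketched in a few lines of Python. This is a minimal sketch, assuming that an element of R is stored as an integer bit mask (bit m holds the coefficient of $x^m$); the helper names `rot`, `reduce_mp` and `mul` are hypothetical. Since $1 + x + \cdots + x^{p-1}$ divides $x^p + 1$, a product can first be computed in $\mathbb{F}_2[x]/(x^p+1)$, where multiplying by $x^i$ is a cyclic shift of a p-bit word, and then reduced once: if the coefficient of $x^{p-1}$ is 1, adding $1 + x + \cdots + x^{p-1}$ flips all p bits. Only XORs and cyclic shifts are used.

```python
def rot(v: int, i: int, p: int) -> int:
    """Multiply v by x^i in F2[x]/(x^p + 1): a cyclic left shift of a p-bit word."""
    i %= p
    mask = (1 << p) - 1
    return ((v << i) | (v >> (p - i))) & mask


def reduce_mp(v: int, p: int) -> int:
    """Reduce a p-bit word modulo 1 + x + ... + x^(p-1); the result has degree < p - 1."""
    if (v >> (p - 1)) & 1:
        v ^= (1 << p) - 1  # add 1 + x + ... + x^(p-1), i.e. flip all p bits
    return v


def mul(a: int, b: int, p: int) -> int:
    """Multiply a and b in R = F2[x]/(1 + x + ... + x^(p-1)) using XORs and shifts."""
    acc = 0
    for i in range(p - 1):  # b has degree < p - 1
        if (b >> i) & 1:
            acc ^= rot(a, i, p)
    return reduce_mp(acc, p)


if __name__ == "__main__":
    p = 5
    x = 0b10
    x_inv = reduce_mp(1 << (p - 1), p)  # x^(p-1) = 1 + x + x^2 + x^3 in R
    print(bin(x_inv), mul(x, x_inv, p))
```

For p = 5 the script prints `0b1111 1`, confirming that $x \cdot x^{p-1} = 1$ in R.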

B. Blaum-Roth Codes
An (n, k, l) binary array code can be viewed as a set of $l \times n$ matrices over $\mathbb{F}_2$. Let p be a prime number; a Blaum-Roth code $C_{BR}$ is a code over the polynomial ring R.
A Blaum-Roth code is defined by its parity-check matrix

$$H_{BR} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & x & \cdots & x^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x^{r-1} & \cdots & x^{(r-1)(n-1)} \end{bmatrix}, \qquad (1)$$

where $1, x, x^2, \dots, x^{n-1}$ are n distinct nonzero elements that have multiplicative inverses in R. Note that $H_{BR}$ is determined by $1, x, x^2, \dots, x^{n-1}$, and we call these n elements the "evaluation points" of the (n, k, l) Blaum-Roth code.

Lemma 1: Every r columns of $H_{BR}$ are linearly independent over the ring R.

The proof of Lemma 1 is omitted; the reader can refer to [25] for details.
Since every r columns of $H_{BR}$ are linearly independent over R, the code $C_{BR}$ has length n, dimension n − r, and minimum distance r + 1 over R. Thus, $C_{BR}$ is an MDS code.
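For small p, the condition behind Lemma 1 can be checked numerically. This is a minimal sketch, assuming the bit-mask representation of polynomials; the helper names `poly_mod`, `poly_gcd`, `is_unit` and `pairwise_differences_are_units` are hypothetical. An element a is invertible in R exactly when $\gcd(a, 1 + x + \cdots + x^{p-1}) = 1$. For r = 2, the 2 × 2 submatrix with evaluation points $x^i$ and $x^j$ is invertible exactly when its determinant $x^i + x^j$ is a unit, and for larger r the Vandermonde determinants are products of such differences. The last line shows that R is not a field in general: for p = 7, $1 + x + x^3$ divides $1 + x + \cdots + x^6$.

```python
def poly_mod(a: int, b: int) -> int:
    """Remainder of a modulo b in F2[x]; polynomials are int bit masks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def poly_gcd(a: int, b: int) -> int:
    """Euclid's algorithm in F2[x]."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def is_unit(a: int, p: int) -> bool:
    """a is invertible in R = F2[x]/(1 + ... + x^(p-1)) iff it is coprime to the modulus."""
    m_p = (1 << p) - 1
    return poly_gcd(m_p, a) == 1


def pairwise_differences_are_units(p: int) -> bool:
    """Check that x^i + x^j is a unit in R for all 0 <= i < j <= p - 1."""
    return all(is_unit((1 << i) ^ (1 << j), p)
               for i in range(p) for j in range(i + 1, p))


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        print(p, pairwise_differences_are_units(p))
    print(is_unit(0b1011, 7))  # 1 + x + x^3 is a zero divisor for p = 7
```

The script should print `True` for every listed p and then `False`.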

C. Lower Bounds on Repair/Access Bandwidth of MDS Codes
An (n, k, l) MDS array code can tolerate any r = n − k erasures because any k coordinates of a codeword can reconstruct the whole codeword; hence the storage system remains reliable even when r storage nodes fail. However, when only one node has failed, downloading data from k nodes to repair it is inefficient in terms of repair bandwidth. Specifically, the amount of data communicated during the repair procedure is kl, which is k times the amount l of data stored on the failed node.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In [4], the authors analyzed the information flow graph of storage systems and derived a lower bound on the repair bandwidth γ of any (n, k, l) MDS code for single-node repair. This bound, also called the cut-set bound, can be written as

$$\gamma \ge \frac{dl}{d-k+1}, \qquad (2)$$

where $d \in [k, n-1]$ is the number of helper nodes. The optimal repair bandwidth decreases as the number of helper nodes d increases. In particular, when d = n − 1, the optimal repair bandwidth of an (n, k, l) MDS code attains its minimum value, $\frac{(n-1)l}{n-k}$. In [26], error-resilient capability was considered in the repair of a single-node failure, i.e., there may be erroneous or malicious nodes among the helper nodes. For $e \le \frac{n-k}{2}$, it is proved that the lower bound on the repair bandwidth $\gamma_e$ for single-node failure with e-error resilient capability can be written as

$$\gamma_e \ge \frac{dl}{d-2e-k+1}, \qquad (3)$$

where d is the number of helper nodes, among which there are e erroneous ones.
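Both bounds are easy to evaluate numerically. The following is a minimal sketch with hypothetical function names that uses exact fractions; for (3) it assumes the form stated above, with d counting all helper nodes, including the e erroneous ones. For n = 4, k = 2, l = 160 and d = 3, the cut-set bound is 240 bits, compared with kl = 320 bits for trivial repair.

```python
from fractions import Fraction


def cut_set_bound(n: int, k: int, d: int, l: int) -> Fraction:
    """Lower bound (2) on the repair bandwidth of an (n, k, l) MDS code with d helpers."""
    if not k <= d <= n - 1:
        raise ValueError("need k <= d <= n - 1")
    return Fraction(d * l, d - k + 1)


def error_resilient_bound(n: int, k: int, d: int, l: int, e: int) -> Fraction:
    """Lower bound (3): d helpers, of which up to e may send erroneous data."""
    if not (0 <= 2 * e <= n - k and k + 2 * e <= d <= n - 1):
        raise ValueError("need 2e <= n - k and k + 2e <= d <= n - 1")
    return Fraction(d * l, d - 2 * e - k + 1)


if __name__ == "__main__":
    n, k, l = 4, 2, 160
    print("trivial repair:", k * l)
    for d in range(k, n):
        print("d =", d, "cut-set bound =", cut_set_bound(n, k, d, l))
```

With e = 0 the two functions coincide, and adding 2e helper nodes exactly compensates for e errors in the denominator.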

III. BINARY MDS ARRAY CODES WITH OPTIMAL REPAIR BANDWIDTH
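In this section, we present explicit constructions of three families of codes, named $C_1$, $C_2$ and $C_3$, over the binary polynomial ring R. Given integers n, r and a prime p, let $C \subset R^{r^n \times n}$ be an $(n, k = n-r, l)$ binary MDS array code with $l = (p-1)r^n$. Node $i \in [n]$ stores $C_i$, where $C_i$ is a column vector consisting of $r^n$ polynomials in R. In the present paper, we define a code by its parity-check equations as follows:

$$\sum_{i=1}^{n} H_{t,i} C_i = \mathbf{0}, \quad t \in [0, r-1], \qquad (4)$$

where each $H_{t,i}$ is an $r^n \times r^n$ matrix over R. The parity-check matrix H of code C can be written as

$$H = \begin{bmatrix} H_{0,1} & H_{0,2} & \cdots & H_{0,n} \\ H_{1,1} & H_{1,2} & \cdots & H_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ H_{r-1,1} & H_{r-1,2} & \cdots & H_{r-1,n} \end{bmatrix}. \qquad (5)$$

We note that the sub-packetization levels of the codes in the present paper differ from one another and are not necessarily equal to $(p-1)r^n$.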
A. Binary MDS Array Code With Optimal Repair Bandwidth for d = n − 1

In this subsection, we present the construction of binary MDS array codes with optimal repair bandwidth for d = n − 1.
Before presenting the general code construction, we give toy examples to highlight the main ideas behind the construction.
To illustrate how the repair bandwidth is reduced simply by stacking multiple Blaum-Roth code instances, we first give an example code with (n = 4, k = 2). This example code is obtained by stacking two Blaum-Roth codes whose "evaluation points" are $x_{1,1} = x, x_2 = x^2, x_3 = x^3, x_4 = x^4$ and $x_{1,2} = x^5, x_2 = x^2, x_3 = x^3, x_4 = x^4$, respectively. These five distinct "evaluation points" are chosen from the binary polynomial ring $R = \mathbb{F}_2[x]/(1 + x + \cdots + x^6)$, i.e., p = 7. The parity-check equations can be written as

$$x_{1,1}^t c_{1,1} + x_2^t c_{2,1} + x_3^t c_{3,1} + x_4^t c_{4,1} = 0 \qquad (6)$$

and

$$x_{1,2}^t c_{1,2} + x_2^t c_{2,2} + x_3^t c_{3,2} + x_4^t c_{4,2} = 0, \qquad (7)$$

where $t \in [0, 1]$. Summing up equations (6) and (7), we have

$$x_{1,1}^t c_{1,1} + x_{1,2}^t c_{1,2} + \sum_{i=2}^{4} x_i^t (c_{i,1} + c_{i,2}) = 0, \qquad (8)$$

where $t \in [0, 1]$. Equation (8) defines a new Blaum-Roth code with (n = 5, k = 3) whose "evaluation points" are $x_{1,1}, x_{1,2}, x_2, x_3, x_4$, so we can download $\{c_{i,1} + c_{i,2}\}_{i \in [2,4]}$ to repair the first node. The repair bandwidth for the first node is 3(p − 1) bits: each helper node sends p − 1 bits, half of the 2(p − 1) bits it stores, compared with the 4(p − 1) bits downloaded by trivial repair, and this achieves the lower bound (2) on repair bandwidth. In the following Example 1, we present a code that has optimal repair bandwidth for all nodes.
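The toy example can be checked end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: elements of R are int bit masks, nodes 1 and 2 are chosen as the parity nodes of each instance, inverses are found by brute force (adequate only for tiny p), and all helper names are hypothetical. Each of nodes 2, 3 and 4 sends a single element of R, the XOR $c_{i,1} + c_{i,2}$, and node 1 is rebuilt by solving the two equations of (8).

```python
import random

P = 7                       # R = F2[x]/(1 + x + ... + x^6)
MASK = (1 << P) - 1


def mul(a: int, b: int) -> int:
    """Multiply in R: XOR of cyclic shifts, then one reduction modulo 1 + ... + x^(P-1)."""
    acc = 0
    for i in range(P - 1):
        if (b >> i) & 1:
            acc ^= ((a << i) | (a >> (P - i))) & MASK
    return acc ^ MASK if (acc >> (P - 1)) & 1 else acc


def inv(a: int) -> int:
    """Brute-force inverse in R (only 2^6 candidates for p = 7)."""
    return next(b for b in range(1, 1 << (P - 1)) if mul(a, b) == 1)


def xp(e: int) -> int:
    """The element x^e of R (e < P - 1 here, so no reduction is needed)."""
    return 1 << e


def solve_two(xu, xv, s0, s1):
    """Solve c_u + c_v = s0 and xu*c_u + xv*c_v = s1 over R (characteristic 2)."""
    cv = mul(s1 ^ mul(xu, s0), inv(xu ^ xv))
    return s0 ^ cv, cv


# Evaluation points of the two stacked instances; nodes 1..4 -> list positions 0..3.
PTS = [[xp(1), xp(2), xp(3), xp(4)],   # x_{1,1} = x,   x_2, x_3, x_4
       [xp(5), xp(2), xp(3), xp(4)]]   # x_{1,2} = x^5, x_2, x_3, x_4


def encode(info):
    """info[m] = (c_{3,m}, c_{4,m}); nodes 1 and 2 hold the parity (a choice for this sketch)."""
    code = []
    for m in range(2):
        x = PTS[m]
        c3, c4 = info[m]
        c1, c2 = solve_two(x[0], x[1], c3 ^ c4, mul(x[2], c3) ^ mul(x[3], c4))
        code.append([c1, c2, c3, c4])
    return code  # code[m][i] = c_{i+1, m+1}


def repair_node1(code):
    """Rebuild (c_{1,1}, c_{1,2}) from the sums c_{i,1} + c_{i,2} sent by nodes 2, 3, 4."""
    sums = {i: code[0][i] ^ code[1][i] for i in (1, 2, 3)}
    s0 = sums[1] ^ sums[2] ^ sums[3]
    s1 = mul(PTS[0][1], sums[1]) ^ mul(PTS[0][2], sums[2]) ^ mul(PTS[0][3], sums[3])
    return solve_two(PTS[0][0], PTS[1][0], s0, s1), 3 * (P - 1)


if __name__ == "__main__":
    info = [(random.getrandbits(P - 1), random.getrandbits(P - 1)) for _ in range(2)]
    code = encode(info)
    (c11, c12), bits = repair_node1(code)
    print(c11 == code[0][0], c12 == code[1][0], bits)
```

The script should print `True True 18`: 18 = 3(p − 1) bits are downloaded in total.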
Example 1: Take n = 4 and k = 2. The binary MDS array code C(n = 4, k = 2, l) can be defined by a set of matrices $\{H_{t,i} : t \in [0,1], i \in [4]\}$ according to equation (4). Instead of constructing rn = 8 matrices, by setting $H_{t,i} = H_i^t$, we only need n = 4 matrices $\{H_i : i \in [4]\}$ to define the code. Let p = 11 and $l = (p-1)r^n = 160$. The code C(4, 2, 160) is obtained by stacking $r^n = 16$ Blaum-Roth codes with parameters (4, 2, p − 1) = (4, 2, 10), whose "evaluation points" are chosen from a set of rn = 8 elements $\{x_{i,j} : i \in [4], j \in [0,1]\}$: the "evaluation points" of the a-th Blaum-Roth code are $x_{1,a_1}, x_{2,a_2}, x_{3,a_3}, x_{4,a_4}$, where $(a_4, a_3, a_2, a_1)$ is the binary expansion of a. For example, when a = 5 = (0, 1, 0, 1), the "evaluation points" of the corresponding Blaum-Roth code are $x_{1,1}, x_{2,0}, x_{3,1}, x_{4,0}$. The above example shows the core structure of the codes constructed in this section. We now present the general construction and analyze its repair bandwidth.
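The indexing in Example 1 can be made concrete with a short sketch (helper names are hypothetical). It lists the evaluation-point indices $(i, a_i)$ used by the a-th instance and the repair groups $\{a(i, 0), \dots, a(i, r-1)\}$ that the repair of node i combines; for n = 4 and r = 2, the 16 instances split into 8 groups of 2 for every node.

```python
def digits(a: int, r: int, n: int) -> list:
    """r-ary digits [a_1, ..., a_n] of a = sum_i a_i * r^(i-1)."""
    return [(a // r**(i - 1)) % r for i in range(1, n + 1)]


def eval_point_indices(a: int, r: int, n: int) -> list:
    """Indices (i, a_i) of the evaluation points x_{i,a_i} of the a-th instance."""
    return [(i, ai) for i, ai in enumerate(digits(a, r, n), start=1)]


def repair_group(a: int, i: int, r: int) -> list:
    """[a(i,0), ..., a(i,r-1)]: a with its i-th digit replaced by each u in [0, r-1]."""
    ai = (a // r**(i - 1)) % r
    return [a + (u - ai) * r**(i - 1) for u in range(r)]


if __name__ == "__main__":
    n, r = 4, 2
    print(eval_point_indices(5, r, n))   # a = 5 = (0, 1, 0, 1)
    groups = {tuple(repair_group(a, 1, r)) for a in range(r**n)}
    print(sorted(groups))
```

The first line of output is `[(1, 1), (2, 0), (3, 1), (4, 0)]`, i.e. the points $x_{1,1}, x_{2,0}, x_{3,1}, x_{4,0}$ listed above.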
Construction 1 ($C_1$): Given integers n and r, let $l = (p-1)r^n$ and let $\{x_{i,j} : i \in [n], j \in [0, r-1]\}$ be a set of rn distinct powers of x in R, where p is a prime and $p - 2 \ge rn$. Code $C_1$ defined by (4) is constructed by taking $H_{t,i} = H_i^t$ with

$$H_i = \sum_{a=0}^{r^n-1} x_{i,a_i} e_a e_a^T.$$

Here, $e_a \in R^{r^n}$ is a column vector whose a-th entry is 1 and whose other entries are 0s. Note that $a \in [0, r^n-1]$ is also represented by its r-ary expansion, i.e., $a = (a_n, a_{n-1}, \dots, a_1) = \sum_{i=1}^{n} a_i r^{i-1}$, where $a_i \in [0, r-1]$. It is not hard to verify that $H_i$ is an $r^n \times r^n$ diagonal matrix whose a-th diagonal entry is $x_{i,a_i}$. The content $C_i$ of node i in (4) is a column vector of length $l = (p-1)r^n$ over $\mathbb{F}_2$, and it can also be viewed as a column vector of length $r^n$ over R. Specifically, $C_i = (c_{i,0}, c_{i,1}, \dots, c_{i,r^n-1})^T$. For brevity, we use $c_{i,a}$, the a-th entry of $C_i$, to denote both a vector of length $p-1$ over $\mathbb{F}_2$ and a polynomial in R. For any $a \in [0, r^n-1]$, the corresponding parity-check equations in (4) can be written as

$$\sum_{i=1}^{n} x_{i,a_i}^t c_{i,a} = 0, \quad t \in [0, r-1]. \qquad (9)$$

The code constructed in Construction 1 is named $C_1$ in this paper. We now give the first theorem, which states that $C_1$ has optimal repair bandwidth for $d = n-1$.
Theorem 1: Code $C_1$ achieves the lower bound (2) on repair bandwidth for single-node repair with $d = n-1$ helper nodes.
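Proof: Assume node $i \in [n]$ has failed and let $a(i,u) = (a_n, \dots, a_{i+1}, u, a_{i-1}, \dots, a_1)$, where $u \in [0, r-1]$. According to (9), for fixed u and a(i, u) we have

$$x_{i,u}^t c_{i,a(i,u)} + \sum_{j \in [n]\setminus\{i\}} x_{j,a_j}^t c_{j,a(i,u)} = 0, \quad t \in [0, r-1]. \qquad (10)$$

Summing up (10) over $u = 0, 1, \dots, r-1$, we have

$$\sum_{u=0}^{r-1} x_{i,u}^t c_{i,a(i,u)} + \sum_{j \in [n]\setminus\{i\}} x_{j,a_j}^t \sum_{u=0}^{r-1} c_{j,a(i,u)} = 0, \quad t \in [0, r-1], \qquad (11)$$

where $\sum_{u=0}^{r-1} c_{j,a(i,u)}$ is the data communicated from helper node j, denoted $D^{i,a}_j$. According to (1), formula (11) defines an $(n+r-1, n-1, p-1)$ binary MDS array code whose parity-check matrix can be written as

$$\begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ x_{i,0} & \cdots & x_{i,r-1} & x_{j_1,a_{j_1}} & \cdots & x_{j_{n-1},a_{j_{n-1}}} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{i,0}^{r-1} & \cdots & x_{i,r-1}^{r-1} & x_{j_1,a_{j_1}}^{r-1} & \cdots & x_{j_{n-1},a_{j_{n-1}}}^{r-1} \end{bmatrix}, \qquad (12)$$

where $\{j_1, \dots, j_{n-1}\} = [n]\setminus\{i\}$. Thus, the lost data $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,r-1)}\}$ can be obtained by downloading $\{D^{i,a}_j : j \in [n]\setminus\{i\}\}$ from the n − 1 helper nodes, where p − 1 bits are downloaded from each node. For the overall repair of $C_i$, note that the entries of $C_i$ can be partitioned into $r^{n-1}$ disjoint sets, each of the form $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,r-1)}\}$ for some $a \in [0, r^n-1]$, and each set can be repaired by downloading p − 1 bits from every helper node. As a result, a total of $(p-1)r^{n-1}$ bits are communicated from each helper node for the overall repair of $C_i$, achieving the lower bound (2) on repair bandwidth.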
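The repair procedure in the proof can be simulated for the parameters of Example 1 (n = 4, r = 2, p = 11). This is a sketch, not the authors' implementation: the concrete evaluation points $x_{i,j} = x^{ri+j+1}$ (with a 0-based node index i) are a hypothetical choice of rn distinct powers of x, nodes 3 and 4 serve as parity, inverses are found by brute force, and the two-unknown solver only covers r = 2. Each helper sends one element of R (p − 1 = 10 bits) per repair group, so $(p-1)r^{n-1} = 80$ bits in total, matching the count in the proof.

```python
import random

P, N, R_ = 11, 4, 2          # prime p, n nodes, r parity nodes (Example 1)
MASK = (1 << P) - 1


def mul(a: int, b: int) -> int:
    """Multiply in R = F2[x]/(1 + x + ... + x^(P-1)) with XORs and cyclic shifts."""
    acc = 0
    for i in range(P - 1):
        if (b >> i) & 1:
            acc ^= ((a << i) | (a >> (P - i))) & MASK
    return acc ^ MASK if (acc >> (P - 1)) & 1 else acc


def inv(a: int) -> int:
    """Brute-force inverse in R (2^(P-1) = 1024 candidates)."""
    return next(b for b in range(1, 1 << (P - 1)) if mul(a, b) == 1)


def solve_two(xu, xv, s0, s1):
    """Solve c_u + c_v = s0 and xu*c_u + xv*c_v = s1 over R (characteristic 2)."""
    cv = mul(s1 ^ mul(xu, s0), inv(xu ^ xv))
    return s0 ^ cv, cv


# Hypothetical choice of the rn = 8 distinct powers: x_{i,j} = x^(r*i + j + 1), 0-based i.
X = [[1 << (R_ * i + j + 1) for j in range(R_)] for i in range(N)]


def digit(a, i):
    return (a // R_**i) % R_          # digit a_{i+1} for 0-based node i


def with_digit(a, i, u):
    return a + (u - digit(a, i)) * R_**i


def encode():
    """Random data on nodes 1, 2; nodes 3, 4 solve (9) for every instance a."""
    c = [[0] * R_**N for _ in range(N)]
    for a in range(R_**N):
        x = [X[i][digit(a, i)] for i in range(N)]
        c[0][a], c[1][a] = random.getrandbits(P - 1), random.getrandbits(P - 1)
        s0, s1 = c[0][a] ^ c[1][a], mul(x[0], c[0][a]) ^ mul(x[1], c[1][a])
        c[2][a], c[3][a] = solve_two(x[2], x[3], s0, s1)
    return c


def repair(c, i):
    """Rebuild node i from D_j = c_{j,a(i,0)} + c_{j,a(i,1)}, one element of R per group."""
    rebuilt, sent = [None] * R_**N, 0
    for a in range(R_**N):
        if digit(a, i) != 0:
            continue                   # one representative per repair group
        a0, a1 = with_digit(a, i, 0), with_digit(a, i, 1)
        s0 = s1 = 0
        for j in range(N):
            if j != i:
                dj = c[j][a0] ^ c[j][a1]     # what helper j sends for this group
                s0 ^= dj
                s1 ^= mul(X[j][digit(a, j)], dj)
        rebuilt[a0], rebuilt[a1] = solve_two(X[i][0], X[i][1], s0, s1)
        sent += P - 1                  # bits sent by each helper for this group
    return rebuilt, sent


if __name__ == "__main__":
    c = encode()
    for i in range(N):
        rebuilt, sent = repair(c, i)
        print("node", i + 1, rebuilt == c[i], sent)
```

With three helpers, each node repair downloads 3 × 80 = 240 bits, which equals $dl/(d-k+1)$ for d = 3, k = 2 and l = 160.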
We now analyze the MDS property of code $C_1$ in the following Property 1.

Property 1: Code $C_1$ is an MDS array code.

Proof: For any $a \in [0, r^n-1]$, rewriting (9) in matrix form, we have

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{1,a_1} & x_{2,a_2} & \cdots & x_{n,a_n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,a_1}^{r-1} & x_{2,a_2}^{r-1} & \cdots & x_{n,a_n}^{r-1} \end{bmatrix} \begin{bmatrix} c_{1,a} \\ c_{2,a} \\ \vdots \\ c_{n,a} \end{bmatrix} = \mathbf{0}. \qquad (13)$$

By Lemma 1, (13) defines an $(n, n-r, p-1)$ binary MDS code. Thus, any $k = n-r$ of the n coordinates of the codeword $(c_{1,a}, c_{2,a}, \dots, c_{n,a})$ can reconstruct the whole codeword. The MDS property of $C_1$ follows since this holds for all $a \in [0, r^n-1]$.

The MDS property of $C_1$ is obtained by selecting rn special polynomials of the form $x^i$, i.e., powers of x in R, and the optimal repair bandwidth is achieved through $r^n$ combinations of these special polynomials. With this technique, we construct the other two families of codes, $C_2$ and $C_3$, by modifying $C_1$.

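B. Binary MDS Array Code With Optimal Repair Bandwidth for Arbitrary d

In this subsection, we show that Construction 1 can be slightly modified to construct codes with optimal repair bandwidth for an arbitrary fixed $d \in [k+1, n-1]$.

Construction 2 ($C_2$): Given integers n, k and $d \in [k+1, n-1]$, let $s = d-k+1$, $l = (p-1)s^n$, and let $\{x_{i,j} : i \in [n], j \in [0, s-1]\}$ be a set of sn distinct powers of x in R, where p is a prime and $p - 2 \ge sn$. Code $C_2$ defined by (4) is constructed by taking $H_{t,i} = H_i^t$ with

$$H_i = \sum_{a=0}^{s^n-1} x_{i,a_i} e_a e_a^T.$$

Here, $e_a \in R^{s^n}$ is a column vector whose a-th entry is 1 and whose other entries are 0s. Also, $a \in [0, s^n-1]$ is represented by its s-ary expansion, i.e., $a = (a_n, a_{n-1}, \dots, a_1) = \sum_{i=1}^{n} a_i s^{i-1}$, where $a_i \in [0, s-1]$. The code constructed in Construction 2 is named $C_2$ in this paper.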
Theorem 2: Code $C_2$ achieves the lower bound (2) on repair bandwidth for an arbitrary fixed $d \in [k+1, n-1]$.

Proof: The proof follows arguments similar to those of Theorem 1, and we present it in detail for completeness. Assume node $i \in [n]$ has failed and let $a(i, u) = (a_n, \dots, a_{i+1}, u, a_{i-1}, \dots, a_1)$, where $u \in [0, s-1]$. Since the parity-check matrix consists of diagonal matrices over R, we can write down the parity-check equations for any $a \in [0, s^n-1]$ as

$$\sum_{j=1}^{n} x_{j,a_j}^t c_{j,a} = 0, \quad t \in [0, r-1]. \qquad (14)$$

For fixed $u \in [0, s-1]$ and a(i, u), we have

$$x_{i,u}^t c_{i,a(i,u)} + \sum_{j \in [n]\setminus\{i\}} x_{j,a_j}^t c_{j,a(i,u)} = 0, \quad t \in [0, r-1]. \qquad (15)$$

Summing up (15) over $u = 0, 1, \dots, s-1$, we have

$$\sum_{u=0}^{s-1} x_{i,u}^t c_{i,a(i,u)} + \sum_{j \in [n]\setminus\{i\}} x_{j,a_j}^t D^{i,a}_j = 0, \quad t \in [0, r-1], \qquad (16)$$

where $D^{i,a}_j = \sum_{u=0}^{s-1} c_{j,a(i,u)}$ is the data communicated from helper node j. According to Lemma 1, formula (16) defines a $(d+r, d, p-1)$ binary MDS code (its length is $n - 1 + s = d + r$) whose parity-check matrix can be written as

$$\begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ x_{i,0} & \cdots & x_{i,s-1} & x_{j_1,a_{j_1}} & \cdots & x_{j_{n-1},a_{j_{n-1}}} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{i,0}^{r-1} & \cdots & x_{i,s-1}^{r-1} & x_{j_1,a_{j_1}}^{r-1} & \cdots & x_{j_{n-1},a_{j_{n-1}}}^{r-1} \end{bmatrix}, \qquad (17)$$

where $\{j_1, \dots, j_{n-1}\} = [n]\setminus\{i\}$. Thus, the lost data $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,s-1)}\}$ can be obtained by downloading $D^{i,a}_j$ from any d of the n − 1 helper nodes, where p − 1 bits are downloaded from each node. For the overall repair of $C_i$, note that the entries of $C_i$ can be partitioned into $s^{n-1}$ disjoint sets, each of the form $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,s-1)}\}$ for some $a \in [0, s^n-1]$, and each set can be repaired by downloading p − 1 bits from each helper node. Consequently, a total of $(p-1)s^{n-1}$ bits are communicated from each helper node for the overall repair of $C_i$, achieving the lower bound (2) on repair bandwidth for an arbitrary fixed $d \in [k+1, n-1]$.

Property 2: Code $C_2$ is an MDS array code.
Proof: The proof follows the same arguments as those in the proof of Property 1 and is omitted.
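The parameter bookkeeping for $C_2$ can be checked with a small sketch. It assumes s = d − k + 1 (the value that makes the summed code in (16) a (d + r, d, p − 1) code) and picks the smallest prime p with p − 2 ≥ sn; the function names are hypothetical. For each d it confirms that the summed code has parameters (d + r, d) and that the total download $d(p-1)s^{n-1}$ equals the cut-set bound (2). It also covers the error-resilient variant discussed next: contacting d + 2e helpers costs $(d+2e)(p-1)s^{n-1}$ bits, which equals $\frac{(d+2e)l}{(d+2e)-2e-k+1}$.

```python
from fractions import Fraction
from itertools import count


def is_prime(m: int) -> bool:
    return m >= 2 and all(m % q for q in range(2, int(m**0.5) + 1))


def c2_parameters(n: int, k: int, d: int) -> dict:
    """Parameter bookkeeping for C_2, assuming s = d - k + 1."""
    r = n - k
    s = d - k + 1
    p = next(q for q in count(s * n + 2) if is_prime(q))   # smallest prime with p - 2 >= sn
    l = (p - 1) * s**n
    per_helper = (p - 1) * s**(n - 1)          # one element of R per repair group
    return {
        "s": s, "p": p, "l": l, "per_helper": per_helper,
        "summed_code": (n - 1 + s, n - 1 + s - r),   # (length, dimension) of the code in (16)
        "total": d * per_helper,
        "bound": Fraction(d * l, d - k + 1),          # cut-set bound (2)
    }


def error_resilient_download(n: int, k: int, d: int, e: int) -> int:
    """Bits downloaded when d + 2e helper nodes are contacted."""
    if d + 2 * e > n - 1:
        raise ValueError("need d + 2e <= n - 1")
    return (d + 2 * e) * c2_parameters(n, k, d)["per_helper"]


if __name__ == "__main__":
    prm = c2_parameters(6, 3, 4)
    print(prm["s"], prm["p"], prm["summed_code"], prm["total"], prm["bound"])
    print(error_resilient_download(6, 2, 3, 1))
```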
In the above repair procedure, only d of the n − 1 surviving nodes are accessed to recover the lost bits, and the data in the remaining n − 1 − d nodes go unused. When more than d nodes take part in the repair procedure, code $C_2$ possesses error-resilient capability while achieving the corresponding lower bound (3) on repair bandwidth. The following remark describes this capability in detail.
Remark 2: By (16) and (17), $(c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,s-1)}, D^{i,a}_{j_1}, D^{i,a}_{j_2}, \dots, D^{i,a}_{j_{n-1}})$ is a codeword of a $(d+r, d, p-1)$ binary MDS array code for any $a \in [0, s^n-1]$, and the minimum distance of this code is r + 1. Thus, for an integer e with $e \le \frac{r}{2}$ and $d + 2e \le n-1$, any $d+2e$ of the n − 1 remaining coordinates $\{D^{i,a}_j : j \in [n]\setminus\{i\}\}$ can reconstruct the codeword as long as at most e of these $d+2e$ coordinates are erroneous. As a result, for any $i \in [n]$, $C_i$ can be repaired by connecting to any $d+2e$ helper nodes and downloading $(p-1)s^{n-1}$ bits from each node, as long as at most e of the helper nodes are erroneous. In total, there are $(p-1)(d+2e)s^{n-1} = (d+2e)l/s$ bits downloaded, which achieves the lower bound (3) with $d+2e$ helper nodes.

In the previous two subsections, we constructed binary MDS array codes with optimal repair bandwidth for a single fixed d, meaning that d takes only one value in [k + 1, n − 1]. In the present subsection, we construct codes with optimal repair bandwidth for multiple values of d simultaneously, using the previous constructions as building blocks.
Construction 3 ($C_3$): …, where p is a prime and $p - 2 \ge sn$. Code $C_3$ defined by (4) is constructed by taking … . Here, $e_a \in R^{s^n}$ is a column vector whose a-th entry is 1 and whose other entries are 0s. Also, $a \in [0, s^n-1]$ is represented by its s-ary expansion, i.e., $a = (a_n, a_{n-1}, \dots, a_1) = \sum_{i=1}^{n} a_i s^{i-1}$, where $a_i \in [0, s-1]$. The code constructed in Construction 3 is named $C_3$ in this paper.
Theorem 3: Code $C_3$ achieves the lower bound (2) on repair bandwidth for $d_1, d_2, \dots, d_M$ simultaneously.

Proof: The parity-check matrix of code $C_3$ consists of rn diagonal matrices of size $s^n \times s^n$ over R, and for any $a \in [0, s^n-1]$ we can write down the corresponding parity-check equations (18). For fixed $u \in I_\delta$, $\delta \in [s/s_m]$, and a(i, u), restricting (18) yields the equations (19). Summing up (19) over $u \in I_\delta$, we have (20),
where $D^{i,a,\delta}_j = \sum_{u \in I_\delta} c_{j,a(i,u)}$ is the data communicated from helper node j. According to Lemma 1, formula (20) defines a $(d_m + r, d_m, p-1)$ binary MDS array code. Thus, any $d_m$ of the n − 1 coordinates $\{D^{i,a,\delta}_j : j \in [n]\setminus\{i\}\}$ can reconstruct the lost coordinates $\{c_{i,a(i,u)} : u \in I_\delta\}$, with p − 1 bits downloaded from each helper node. By setting $\delta = 1, 2, \dots, s/s_m$, we conclude that $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,s-1)}\}$ can be repaired by downloading $(p-1)s/s_m$ bits from each of the $d_m$ helper nodes. Note that $C_i$ can be partitioned into $s^{n-1}$ disjoint sets, each of the form $\{c_{i,a(i,0)}, c_{i,a(i,1)}, \dots, c_{i,a(i,s-1)}\}$ for some $a \in [0, s^n-1]$. As a result, for the overall repair of $C_i$, a total of $(p-1)s^n/s_m$ bits are communicated from each of the $d_m$ helper nodes, achieving the lower bound (2) on repair bandwidth for $d = d_m$.
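The counting in the proof of Theorem 3 can be sketched as follows. Because Construction 3 is only partially reproduced above, this sketch assumes $s_m = d_m - k + 1$ and $s = \mathrm{lcm}(s_1, \dots, s_M)$, which is consistent with the $(d_m + r, d_m, p-1)$ code in (20) and with every $s_m$ dividing s; it also takes the sets $I_\delta$ to be consecutive blocks of [0, s − 1]. All names are hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def c3_repair_plan(n: int, k: int, ds: list, p: int) -> dict:
    """For each d_m: the partition {I_delta} of [0, s-1] and the per-helper download."""
    sms = [d - k + 1 for d in ds]          # assumed s_m = d_m - k + 1
    s = lcm(*sms)                          # assumed s = lcm(s_1, ..., s_M)
    l = (p - 1) * s**n
    plan = {}
    for d, sm in zip(ds, sms):
        blocks = [list(range(delta * sm, (delta + 1) * sm)) for delta in range(s // sm)]
        per_helper = (p - 1) * s**n // sm  # (p - 1) s^n / s_m bits from each helper
        plan[d] = {"I": blocks, "per_helper": per_helper,
                   "total": d * per_helper, "bound": Fraction(d * l, d - k + 1)}
    return plan


if __name__ == "__main__":
    # n = 6, k = 2, d in {3, 4, 5}: s = lcm(2, 3, 4) = 12, and p = 79 satisfies p - 2 >= sn = 72.
    for d, info in c3_repair_plan(6, 2, [3, 4, 5], 79).items():
        print(d, len(info["I"]), info["total"] == info["bound"])
```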

IV. BINARY MDS ARRAY CODES WITH OPTIMAL ACCESS BANDWIDTH
In the previous section, we constructed three families of binary MDS array codes with optimal repair bandwidth. However, although the amount of data communicated through the storage network achieves the lower bound, all the data stored on the helper nodes are accessed to repair the failed node. As a result, the total amount of data accessed is d/k > 1 times that of the trivial repair of an MDS code, which consumes many disk I/Os.
In this section, we construct binary MDS array codes with the optimal-access property, i.e., the amount of data accessed during node repair equals the minimum amount of data that needs to be communicated. Specifically, two families of codes, $C_4$ and $C_5$, are presented, which achieve the optimal access bandwidth for an arbitrary fixed $d \in [k+1, n-1]$ and for multiple values of $d \in [k+1, n-1]$, respectively. The codes in this section are also constructed over the binary polynomial ring R.

The above example shows the core structure of the codes constructed in this section. We present the general construction and analyze the access bandwidth in the sequel.

Construction 4 ($C_4$): For integers n, k and d, let $s = d-k+1$ and $l = (p-1)s^n$, where p is a prime with $p - 2 \ge n$. Code $C_4$ defined by (4) is constructed by taking $H_{t,i} = H_i^t$ with

$$H_i = \sum_{a=0}^{s^n-1} x_{i,a_i} e_a e_{a(i,a_i \oplus 1)}^T,$$

where ⊕ denotes addition modulo s, $\{x_{i,0} = x_i\}_{i \in [n]}$ is a set of n distinct nonzero elements in the polynomial ring R, and $x_{i,u} = 1$ for $u \in [1, s-1]$. Here, $e_a \in R^{s^n}$ is a column vector whose a-th entry is 1 and whose other entries are 0s. Note that $a \in [0, s^n-1]$ is also represented by its s-ary expansion, i.e., $a = (a_n, a_{n-1}, \dots, a_1) = \sum_{i=1}^{n} a_i s^{i-1}$, where $a_i \in [0, s-1]$, and a(i, u) denotes the index obtained from a by replacing $a_i$ with u. It is not hard to verify that for every $i \in [n]$, $H_i$ is an $s^n \times s^n$ permutation matrix over R and the $a(i, a_i \oplus 1)$-th entry in the a-th row of the matrix is $x_i$ or 1 for $a \in [0, s^n-1]$, meaning that $H_i$ is invertible over R. To find the structure of $H_i^t$, where $t \in [0, s-1]$, we first compute the square of $H_i$ as

$$H_i^2 = \Big(\sum_{a=0}^{s^n-1} x_{i,a_i} e_a e_{a(i,a_i\oplus1)}^T\Big)\Big(\sum_{a=0}^{s^n-1} x_{i,a_i} e_a e_{a(i,a_i\oplus1)}^T\Big) = \sum_{a=0}^{s^n-1} x_{i,a_i} x_{i,a_i\oplus1} e_a e_{a(i,a_i\oplus2)}^T.$$
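Similarly, for $i \in [n]$ and $t \in [0, s-1]$, we have

$$H_i^t = \sum_{a=0}^{s^n-1} x_{i,a_i,t} e_a e_{a(i,a_i\oplus t)}^T, \qquad (21)$$

where $x_{i,u,0} = 1 \in R$ and $x_{i,u,t} = \prod_{v=u}^{u\oplus(t-1)} x_{i,v}$, the product being taken over $v = u, u\oplus1, \dots, u\oplus(t-1)$. Hence $H_i^t$ is also an $s^n \times s^n$ permutation matrix whose nonzero entries are all invertible in R.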
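The structure of $H_i$ and its powers can be checked with a compact representation. This is a sketch under two assumptions, both flagged in the code: the nonzero entries are powers of x, with $x_{i,0} = x_i = x^{e_i}$ for hypothetical exponents $e_i$ and $x_{i,u} = 1$ for $u \ne 0$, so each entry is stored as an exponent modulo p (valid because $x^p = 1$ in R). A permutation matrix over R is stored row by row as a pair (column, exponent), and node and digit indices are 0-based. The check confirms that $H_i^t$ sends row a to column $a(i, a_i \oplus t)$, that $H_i^s = x_i I$, and that $H_i H_j = H_j H_i$, which is the first part of Lemma 2.

```python
def make_h(i: int, n: int, s: int, e_i: int, p: int) -> list:
    """Row a of H_i has x_{i,a_i} in column a(i, a_i + 1 mod s); i is 0-based."""
    rows = []
    for a in range(s**n):
        ai = (a // s**i) % s
        col = a + ((ai + 1) % s - ai) * s**i
        exp = e_i % p if ai == 0 else 0    # assumed: x_{i,0} = x^{e_i}, x_{i,u} = 1 otherwise
        rows.append((col, exp))
    return rows


def matmul(A: list, B: list, p: int) -> list:
    """Product of two monomial matrices whose entries are x^exp (exponents mod p)."""
    return [(B[c][0], (e + B[c][1]) % p) for c, e in A]


def matpow(A: list, t: int, p: int) -> list:
    result = [(a, 0) for a in range(len(A))]   # identity matrix
    for _ in range(t):
        result = matmul(result, A, p)
    return result


if __name__ == "__main__":
    n, s, p = 3, 3, 7
    H = [make_h(i, n, s, i + 1, p) for i in range(n)]
    print(matpow(H[0], s, p) == [(a, 1) for a in range(s**n)])   # H_1^s = x_1 I
    print(matmul(H[0], H[1], p) == matmul(H[1], H[0], p))        # H_1 H_2 = H_2 H_1
```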
Before establishing the optimal access property of code $C_4$ during node repair, we introduce two properties, one of the permutation matrices $H_i$, $i \in [n]$, and one of an invertible block matrix over R, in the following two lemmas.
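Lemma 2: For any $i, j \in [n]$ with $i \ne j$, we have $H_i H_j = H_j H_i$, and $H_i - H_j$ is invertible over R.

Proof: According to the construction of $H_i$, it is easy to find that

$$H_i H_j = \Big(\sum_{a} x_{i,a_i} e_a e_{a(i,a_i\oplus1)}^T\Big)\Big(\sum_{a} x_{j,a_j} e_a e_{a(j,a_j\oplus1)}^T\Big) = \sum_{a} x_{i,a_i} x_{j,a_j} e_a e_{a(i,j,a_i\oplus1,a_j\oplus1)}^T = H_j H_i,$$

where $a(i, j, a_i\oplus1, a_j\oplus1)$ is obtained from a by replacing $a_i$ and $a_j$ with $a_i \oplus 1$ and $a_j \oplus 1$, respectively. Note that this part of the proof relies on the commutativity of multiplication in R.

To prove the second part of the lemma, that $H_i - H_j$ is invertible, assume that $H_i f = H_j f$, where $f = \sum_{a=0}^{s^n-1} f_a e_a$ is a column vector of length $s^n$ over R. Clearly, we have

$$H_i f = \sum_{a} x_{i,a_i} f_{a(i,a_i\oplus1)} e_a \quad \text{and similarly} \quad H_j f = \sum_{a} x_{j,a_j} f_{a(j,a_j\oplus1)} e_a.$$

Thus, for all $a \in [0, s^n-1]$ we have

$$x_{i,a_i} f_{a(i,a_i\oplus1)} = x_{j,a_j} f_{a(j,a_j\oplus1)}. \qquad (22)$$

Since $x_{i,a_i} \in R^*$ has a multiplicative inverse $x_{i,a_i}^{-1}$ in R for all $i \in [n]$ and $a \in [0, s^n-1]$, multiplying (22) by $x_{i,a_i}^{-1}$, we obtain $f_{a(i,a_i\oplus1)} = x_{i,a_i}^{-1} x_{j,a_j} f_{a(j,a_j\oplus1)}$, or equivalently $f_a = x_{i,a_i\ominus1}^{-1} x_{j,a_j} f_{a(i,j,a_i\ominus1,a_j\oplus1)}$, where ⊖ denotes subtraction modulo s. Expanding this relation recursively s times, we have $f_a = x_i^{-1} x_j f_a$, i.e., $(x_i - x_j) f_a = 0$, where the commutativity of multiplication in R is used. Note that for i, j with $1 \le i, j \le n \le p$ and $i \ne j$, $x_i - x_j$ is invertible in R; hence $f_a = 0$ for all a, and $H_i - H_j$ is invertible.

Lemma 3: Let $A_1, A_2, \dots, A_m$ be $l \times l$ matrices over R. The block matrix

$$\begin{bmatrix} I & I & \cdots & I \\ A_1 & A_2 & \cdots & A_m \\ \vdots & \vdots & \ddots & \vdots \\ A_1^{m-1} & A_2^{m-1} & \cdots & A_m^{m-1} \end{bmatrix},$$

where I is the $l \times l$ identity matrix over R, is invertible if $A_i A_j = A_j A_i$ and $A_i - A_j$ is invertible over R for all $i \ne j$.

Proof: The proof of Lemma 3 can be obtained by following an induction procedure similar to that in [24], and we omit the details for brevity.

We now establish the optimal access property of code $C_4$ in the following theorem.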
Theorem 4: For n, k and an arbitrary fixed $d \in [k+1, n-1]$, code $C_4$ achieves the lower bound (2) on repair bandwidth with the optimal access property.
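Proof: For every $i \in [n]$, $H_i$ is an $s^n \times s^n$ permutation matrix over R, and by (21) the parity-check equations can be written as

$$\sum_{j=1}^{n} x_{j,a_j,t}\, c_{j,a(j,a_j\oplus t)} = 0, \quad t \in [0, s-1], \qquad (23)$$

where $a \in [0, s^n-1]$. Assume node i has failed. For each a with $a_i = 0$, rewrite (23) as

$$x_{i,0,t}\, c_{i,a(i,t)} + \sum_{j \in [n]\setminus\{i\}} x_{j,a_j,t}\, c_{j,a(j,a_j\oplus t)} = 0, \quad t \in [0, s-1]. \qquad (24)$$

From (24) we can see that the data $\{c_{i,a} : a_i = t\}$ can be computed once we know the data $\{c_{j,a} : j \in [n]\setminus\{i\}, a_i = 0\}$, since every index $a(j, a_j\oplus t)$ with $j \ne i$ keeps its i-th digit equal to 0. Since (24) holds for every $t \in [0, s-1] \subseteq [0, r-1]$, knowing the set $\{c_{j,a} : j \in [n]\setminus\{i\}, a_i = 0\}$ is sufficient to repair all the data $C_i$ stored on node i.

We now define an injection $\tau_i : [0, s^{n-1}-1] \to [0, s^n-1]$ that maps $b = (b_{n-1}, \dots, b_1)$ to $(b_{n-1}, \dots, b_i, 0, b_{i-1}, \dots, b_1)$, i.e., it inserts the digit 0 at position i, and let

$$C'_j = \big(c_{j,\tau_i(0)}, c_{j,\tau_i(1)}, \dots, c_{j,\tau_i(s^{n-1}-1)}\big)^T, \quad j \in [n]\setminus\{i\}. \qquad (25)$$

By (21), $H_j^s = \sum_a x_j e_a e_a^T = x_j I$, so for $w \in [0, r-s-1]$ we have

$$\sum_{j=1}^{n} H_j^w C_j = \mathbf{0} \qquad (26)$$

and

$$\sum_{j=1}^{n} x_j H_j^w C_j = \sum_{j=1}^{n} H_j^{w+s} C_j = \mathbf{0} \qquad (27)$$

according to (4) with $H_{w,j} = H_j^w$. Multiplying (26) by $x_i \in R^*$ and subtracting the result from (27), we obtain

$$\sum_{j \in [n]\setminus\{i\}} (x_j - x_i) H_j^w C_j = \mathbf{0}, \quad w \in [0, r-s-1]. \qquad (28)$$

For $a \in [0, s^{n-1}-1]$, let $e'_a \in R^{s^{n-1}}$ be a column vector whose a-th entry is $1 \in R$ and whose other entries are $0 \in R$. For $j \in [n-1]$, let $j' = j$ if $j \in [i-1]$ and $j' = j+1$ if $j \in [i, n-1]$, and let $A_j$ be the $s^{n-1} \times s^{n-1}$ matrix over R computed as

$$A_j = \sum_{a=0}^{s^{n-1}-1} x_{j',a_j}\, e'_a (e'_{a(j,a_j\oplus1)})^T.$$

It can be verified that for $j \in [i-1]$, $A_j$ is a permutation matrix and the $a(j, a_j\oplus1)$-th entry in the a-th row of the matrix is $x_j$ or 1; for $j \in [i, n-1]$, $A_j$ is also a permutation matrix and the $a(j, a_j\oplus1)$-th entry in the a-th row of the matrix is $x_{j+1}$ or 1, where $a \in [0, s^{n-1}-1]$. Restricting (28) to the rows indexed by $\tau_i(a)$, we have

$$\sum_{j=1}^{n-1} (x_{j'} - x_i) A_j^w C'_{j'} = \mathbf{0}, \quad w \in [0, r-s-1]. \qquad (29)$$

With the commutativity of multiplication in R, we can rewrite (29) as $\sum_{j=1}^{n-1} A_j^w \big((x_{j'} - x_i) C'_{j'}\big) = \mathbf{0}$. Following arguments similar to those in the proof of Lemma 2, one can verify that for any distinct $j, j'' \in [n-1]$, $A_j A_{j''} = A_{j''} A_j$ and $A_j - A_{j''}$ is invertible over R. Thus, by Lemma 3, the vectors $\{(x_j - x_i) C'_j : j \in [n]\setminus\{i\}\}$ form an $(n-1, d, (p-1)s^{n-1})$ MDS array code, since (29) contains $r - s = n-1-d$ block parity-check equations. Since each $x_j - x_i$ has a multiplicative inverse in R, any d of the n − 1 coordinates $\{C'_j : j \in [n]\setminus\{i\}\}$ are sufficient to reconstruct the whole codeword, and with (25), the set $\{c_{j,a} : j \in [n]\setminus\{i\}, a_i = 0\}$ is further obtained. Finally, $C_i$ is recovered by (24). It is not hard to verify that in the above procedure, $d(p-1)s^{n-1}$ bits are accessed and communicated, achieving the lower bound (2) on repair bandwidth for an arbitrary fixed $d \in [k+1, n-1]$.

Property 4: Code $C_4$ is an MDS array code.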
Proof: The MDS property of $C_4$ follows directly from Lemma 2 and Lemma 3.
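The access pattern behind Theorem 4 can be checked combinatorially. This is a minimal sketch with 0-based node and digit indices and hypothetical helper names. It assumes, as the proof indicates, that the injection inserts the digit 0 at position i. It then verifies that every equation of (24) with $a_i = 0$ reads only helper symbols whose i-th digit is 0, and that the targets $a(i, t)$ cover every symbol of node i. Each helper therefore reads only $s^{n-1}$ of its $s^n$ symbols, i.e., $l/s$ bits, and the d helpers read $d(p-1)s^{n-1} = dl/(d-k+1)$ bits in total.

```python
def digit(a: int, j: int, s: int) -> int:
    return (a // s**j) % s


def with_digit(a: int, j: int, u: int, s: int) -> int:
    return a + (u - digit(a, j, s)) * s**j


def tau(b: int, i: int, s: int) -> int:
    """Insert the digit 0 at position i of b (an (n-1)-digit s-ary number)."""
    low, high = b % s**i, b // s**i
    return high * s**(i + 1) + low


def check_access(n: int, k: int, d: int, i: int) -> int:
    """Return the number of symbols each helper reads when node i of C_4 is repaired."""
    s = d - k + 1
    accessed = {tau(b, i, s) for b in range(s**(n - 1))}
    assert accessed == {a for a in range(s**n) if digit(a, i, s) == 0}
    covered = set()
    for a in accessed:
        for t in range(s):
            covered.add(with_digit(a, i, t, s))            # target c_{i, a(i,t)}
            for j in range(n):
                if j != i:
                    src = with_digit(a, j, (digit(a, j, s) + t) % s, s)
                    assert src in accessed                 # helper reads stay in the set
    assert covered == set(range(s**n))                     # all of node i is rebuilt
    return len(accessed)


if __name__ == "__main__":
    n, k, d = 5, 2, 3
    print([check_access(n, k, d, i) for i in range(n)])    # s^(n-1) symbols per helper
```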
During the repair of node i, since the data read from the surviving nodes forms an MDS array code, code $C_4$ possesses error-resilient capability while achieving the lower bound (3) on repair bandwidth with the optimal access property. The following remark presents this capability of $C_4$ in detail.
Remark 5: For an integer e with $d + 2e \le n-1$, any $d+2e$ of the n − 1 coordinates $\{C'_j : j \in [n]\setminus\{i\}\}$ can reconstruct the codeword as long as at most e of these $d+2e$ coordinates are erroneous. Thus, the recovery of the whole coordinate $C_i$ is guaranteed by (24). In total, there are $(p-1)(d+2e)s^{n-1} = (d+2e)l/s$ bits accessed and communicated.

In this subsection, we present the construction of binary MDS array codes with the optimal access property for multiple values of d simultaneously.

Construction 5 ($C_5$): …, and $l = (p-1)s^n$, where p is a prime with $p - 2 \ge n$. Code $C_5$ defined by (4) is constructed by taking $H_{t,i} = H_i^t$ with …, where ⊕ denotes addition modulo s and $x_{i,0} = \dots$ . Here, $e_a \in R^{s^n}$ is a column vector whose a-th entry is 1 and whose other entries are 0s. Note that $a \in [0, s^n-1]$ is also represented by its s-ary expansion, i.e., $a = (a_n, a_{n-1}, \dots, a_1) = \sum_{i=1}^{n} a_i s^{i-1}$, where $a_i \in [0, s-1]$. The code constructed in Construction 5 is named $C_5$ in this paper.
Similar to code $C_4$, for code $C_5$ it can be verified that for any $i \in [n]$, $H_i$ is an $s^n \times s^n$ permutation matrix and the $a(i, a_i \oplus 1)$-th entry in the a-th row of the matrix is $x_i$ or 1 for $a \in [0, s^n-1]$, meaning that $H_i$ is invertible over R. The power of $H_i$, i.e., $H_i^t$, is also an $s^n \times s^n$ permutation matrix whose nonzero entries are all invertible in R.
Theorem 5: Code $C_5$ achieves the lower bound (2) on repair bandwidth with the optimal access property for $d_1, d_2, \dots, d_M$ simultaneously.
Proof: For any a ∈ [0, s n − 1], the parity-check equation of code C 5 can be written as where x i,u,0 = 1 ∈ R, and x i,u,t = u⊕(t−1) v=u For Multiplying (33) by H sm n and substracting the result from (34), we have , let e ′ a ∈ R s n /sm be a column vector whose a-th entry is 1 ∈ R and the other entries are 0 ∈ R. For j ∈ [n − 1], let A j be an s n /s m × s n /s m matrix over R which is computed as x j,aj e ′ a (e ′ a(j,aj ⊕1) ) T , j ∈ [n − 1], and x n,u )e ′ a (e ′ a(n,(sman⊕sm)/sm) ) T .
With (35) and g i , we have × (e ′ a(j,n,aj +1,(smbn⊕sm)/sm) ) T = A n A j where a(j, n, a j +1, (s m b n ⊕s m )/s m ) is obtained by replacing a j and a n by a j + 1 and (s m b n ⊕ s m )/s m , respectively.With this multiplication commutative property, equation (36) can be rewritten as where w ∈ [0, r − s m − 1].Following the similar arguments to that in the proof of Lemma (2), one can verify that forms an (n − 1, d m , (p − 1)s n /s m ) binary MDS array code.To further obtain that (C ′ 1 , C ′ 2 , . . ., C ′ n−1 ) forms an (n − 1, d m , (p − 1)s n /s m ) MDS code, we need to prove that A sm j − A n is invertible over R for all j ∈ [n − 1].We now assume that A sm j f = A n f for some j ∈ [n − 1], where f = (f 0 , f 1 , . . ., f s n /sm−1 ) = f a e ′ a ∈ R s n /sm is a column vector of length s n /s m over R. Since x j,aj e ′ a (e ′ a(j,aj ⊕1) ) T ) sm ( = ( x n,u )f a(n,(sman⊕sm)/sm) , we have and × f a(j,n,aj ⊖sm,(sman⊕sm)/sm) .
Expanding $f_{a(j,n,a_j \ominus s_m,(s_m a_n \oplus s_m)/s_m)}$ recursively, we have …
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
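Two facts used in the proof of Theorem 5 can be checked on a toy model: (i) in the $t$-th power of a weighted cyclic shift, the nonzero entry of row $u$ has weight $x_u x_{u \oplus 1} \cdots x_{u \oplus (t-1)}$, matching the definition of $x_{i,u,t}$; (ii) weighted shifts acting on different digits, whose weights depend only on their own digit, commute, as in $A_n A_j = A_j A_n$. The sketch below is a simplified model under our own assumptions: ring elements are replaced by formal symbols, the shift $(s_m a_n \oplus s_m)/s_m$ on digit $n$ is replaced by a plain $+1$ shift, and all function names are hypothetical.

```python
def weighted_digit_shift(q, n, digit, name):
    """Toy weighted shift on indices a in [0, q^n - 1]: row a has one nonzero entry,
    in the column obtained by adding 1 (mod q) to base-q digit `digit` of a, with
    formal weight name{a_digit} that depends only on that digit."""
    rows = []
    for a in range(q ** n):
        ds = [(a // q ** k) % q for k in range(n)]
        weight = (f"{name}{ds[digit]}",)
        ds[digit] = (ds[digit] + 1) % q
        rows.append((sum(d * q ** k for k, d in enumerate(ds)), weight))
    return rows


def compose(A, B):
    """Row map of the matrix product A·B; weights multiply (sorted, since R is commutative)."""
    return [(B[c][0], tuple(sorted(w + B[c][1]))) for c, w in A]


def power(A, t):
    """Row map of A^t, starting from the identity (empty product = 1)."""
    P = [(a, ()) for a in range(len(A))]
    for _ in range(t):
        P = compose(P, A)
    return P


# (i) Single digit (n = 1, q = s = 4): row u of H^t sits in column u ⊕ t with weight
#     x_u * x_{u⊕1} * ... * x_{u⊕(t-1)}, matching the definition of x_{i,u,t}.
H = weighted_digit_shift(4, 1, 0, "x")
print(power(H, 3)[2])                       # (1, ('x0', 'x2', 'x3'))

# (ii) Shifts on different digits commute, as in A_n A_j = A_j A_n.
Aj = weighted_digit_shift(3, 2, 0, "y")
An = weighted_digit_shift(3, 2, 1, "z")
print(compose(Aj, An) == compose(An, Aj))   # True
```

Two shifts acting on the same digit with digit-dependent weights do not commute in general, so the commutativity used in the proof relies on $A_j$ and $A_n$ acting on different digits.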
Proof: As for code $C_4$, the MDS property of code $C_5$ follows directly from Lemma 2 and Lemma 3.
During the repair of node … $(p-1)s^n/s_m)$ binary MDS array code, we conclude that code $C_5$ possesses the error-resilient capability while achieving the lower bound (3) with the optimal access property. The details are presented in the following remark.

V. EVALUATION
In this section, we evaluate the proposed codes in terms of encoding and decoding complexity. We also compare our codes with some existing codes in the literature.
According to (9), (14), and (18), the three families of codes in Section III are constructed by stacking multiple Blaum-Roth codes whose parity-check matrices are judiciously designed. We now analyze the encoding and decoding complexity of these codes. Without loss of generality, we take code $C_1$ as an example and rewrite its parity-check equations for any $a \in [0, r^n - 1]$ in matrix form. We also assume that the first $k$ nodes are information nodes and the last $r = n - k$ nodes are parity nodes. From (13), we obtain … As a result, instead of inverting an $r^{n+1} \times r^{n+1}$ matrix over $R$, we only need to invert an $r \times r$ matrix $r^n$ times to complete encoding. Similarly, decoding only involves inverting $r \times r$ matrices over $R$. Moreover, the fast encoding and decoding algorithms of Blaum-Roth codes can be used directly to implement these three families of codes. Both encoding and decoding can be performed in parallel, since the computations for different $a \in [0, r^n - 1]$ are independent.
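To give a feel for the savings, the sketch below compares operation counts under a crude cubic cost model for matrix inversion. The cost model is our assumption: it ignores the differing cost of individual operations in $R$ and the structure exploited by the fast Blaum-Roth algorithms. Function names are hypothetical.

```python
def elimination_cost(m):
    """Cubic cost model (an assumption): ring operations to invert an m x m matrix."""
    return m ** 3


def encoding_costs(r, n):
    """Compare one monolithic r^{n+1} x r^{n+1} inversion with r^n independent r x r ones."""
    monolithic = elimination_cost(r ** (n + 1))
    decomposed = r ** n * elimination_cost(r)
    return monolithic, decomposed


mono, dec = encoding_costs(r=2, n=4)
print(mono, dec, mono // dec)   # 32768 128 256  (ratio r^{2n})
```

Under this model the ratio is $(r^{n+1})^3 / (r^n \cdot r^3) = r^{2n}$, and the $r^n$ small systems can additionally be solved in parallel.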
We now analyze the encoding and decoding complexity of the codes $C_4$ and $C_5$ constructed in Section IV. Without loss of generality, we take code $C_4$ with $d = n - 1$ as an example. Given any $b_1, b_2, \ldots, b_k \in [0, r-1]$, note that the $r^{r+1}$ unknown elements $\{c_{i,a} : i \in [k+1, n],\ a_j = b_j \text{ for all } j \in [k]\}$ appear in exactly $r^{r+1}$ equations in (23), and these $r^{r+1}$ equations contain no other unknowns. As a result, these $r^{r+1}$ unknown elements can be obtained by inverting the corresponding $r^{r+1} \times r^{r+1}$ matrix over $R$. Similarly, decoding of code $C_4$ with $d = n - 1$ only involves inverting $r^{r+1} \times r^{r+1}$ matrices $r^k$ times, instead of inverting an $r^{n+1} \times r^{n+1}$ matrix over $R$. Both encoding and decoding of these codes can be performed in parallel, since the $r^k$ matrix inversions are independent.
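The counting behind this decomposition can be reproduced with a short sketch. It only partitions the parity unknowns of $C_4$ with $d = n - 1$ (so the index digits are $r$-ary, an assumption consistent with the counts above) by their first $k$ digits; the fact that each group's equations in (23) involve no other unknowns is taken from the text, not verified here. The function name is hypothetical.

```python
from collections import defaultdict


def group_parity_unknowns(k, r):
    """Group the parity unknowns c_{i,a} (i in [k+1, n], a in [0, r^n - 1]) by the
    first k r-ary digits (a_1, ..., a_k) = (b_1, ..., b_k) of a."""
    n = k + r
    groups = defaultdict(list)
    for a in range(r ** n):
        ds = [(a // r ** (t - 1)) % r for t in range(1, n + 1)]   # [a_1, ..., a_n]
        for i in range(k + 1, n + 1):                              # parity nodes
            groups[tuple(ds[:k])].append((i, a))
    return groups


g = group_parity_unknowns(k=2, r=2)
print(len(g))                         # 4: r^k independent subsystems
print({len(v) for v in g.values()})   # {8}: r^{r+1} unknowns in each
```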
The works most closely related to the present paper are [10], [22], [23], [24], and [27]. We conclude this section by comparing our codes with each of them.
The codes in [24] and our codes share the same core structure, and the main differences between them are twofold. First, the codes in [24] are constructed over finite fields, while our codes are constructed over the binary field. Consequently, multiplications and divisions over a finite field are avoided, which benefits hardware implementation since XORs and cyclic shifts are fast. Second, for the codes in [24], the entries of the parity-check matrices are arbitrary distinct nonzero elements of a finite field, while we choose elements of a special form, i.e., powers of $x$, in the polynomial ring $R = \mathbb{F}_2[x]/(1 + x + x^2 + \cdots + x^{p-1})$ to guarantee the MDS property. One may argue that the MSR codes in [24] constructed over the finite field $\mathbb{F}_{2^m}$ can easily be converted to binary MSR codes, since the field $\mathbb{F}_{2^m}$ is isomorphic to the vector space $\mathbb{F}_2^m$. However, this conversion cannot avoid operations over the field, i.e., the encoding and decoding of the resulting codes still involve multiplications and divisions over $\mathbb{F}_{2^m}$, which are usually implemented through look-up tables. We choose special elements (powers of $x$) in $R$ to construct the codes, which facilitates the encoding and decoding procedures, as the fast encoding/decoding algorithms of Blaum-Roth codes can be used directly.
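To illustrate why only XORs and cyclic shifts are needed, here is a minimal Python sketch of arithmetic in $R = \mathbb{F}_2[x]/(1 + x + \cdots + x^{p-1})$. It assumes the usual Blaum-Roth trick of embedding $R$ into $\mathbb{F}_2[x]/(x^p - 1)$: multiplication by $x^t$ is a cyclic shift of $p$ bits, and reduction modulo $1 + x + \cdots + x^{p-1}$ flips all bits when the coefficient of $x^{p-1}$ is $1$. The bit-mask representation and the function names are our own choices.

```python
def mul_by_x_power(a: int, t: int, p: int) -> int:
    """Multiply a by x^t in R = F2[x]/(1 + x + ... + x^{p-1}).

    a is a (p-1)-bit mask: bit j holds the coefficient of x^j, j in [0, p-2].
    """
    full = (1 << p) - 1                      # bit mask of M_p(x) = 1 + x + ... + x^{p-1}
    t %= p                                   # x^p = 1 in R, because M_p(x) divides x^p - 1
    v = ((a << t) | (a >> (p - t))) & full   # cyclic shift by t in F2[x]/(x^p - 1)
    if (v >> (p - 1)) & 1:                   # x^{p-1} present: add M_p(x), i.e. flip all p bits
        v ^= full
    return v


def mul(a: int, b: int, p: int) -> int:
    """General product in R: XOR of shifted copies of a, one per nonzero coefficient of b."""
    out = 0
    for j in range(p - 1):
        if (b >> j) & 1:
            out ^= mul_by_x_power(a, j, p)
    return out


p = 5
print(bin(mul_by_x_power(0b1000, 1, p)))   # 0b1111: x^3 * x = x^4 = 1 + x + x^2 + x^3
print(bin(mul(0b0011, 0b1111, p)))         # 0b1110: (1 + x)(1 + x + x^2 + x^3) = x + x^2 + x^3
```

Because the reduction step is $\mathbb{F}_2$-linear, the general product can be assembled by XORing reduced shifts, so no look-up tables or field multiplications are involved.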
The binary MDS array codes constructed in [10] achieve the optimal repair bandwidth only for information nodes, while the codes in the present paper have optimal repair/access bandwidth for both information nodes and parity nodes. Moreover, the codes in [10] have fixed redundancy, i.e., $r = 2$. It is important to construct codes with a wider range of redundancy choices to satisfy different fault-tolerance requirements, and the codes in this paper work for any choice of redundancy.
The codes in [22] are binary MDS array codes with optimal access bandwidth, constructed over the same polynomial ring as in this paper. Compared with our codes, the codes in [22] fall short in two respects. First, the codes in [22] have only one repair degree, i.e., the number of helper nodes $d$ is fixed, while our codes $C_3$ and $C_5$ work for multiple values of $d$ simultaneously. Second, for some $d < n - 1$, the codes in [22] require some of the helper nodes to be specific nodes, whereas for the codes in this paper any $d$ of the $n - 1$ surviving nodes can help repair the failed node. The "arbitrary $d$" property provides more flexibility in the choice of helper nodes and may be important when some nodes are too busy to help repair the failed node.
The codes in [23] are also binary MDS array codes with optimal access bandwidth, constructed through a pairwise coupling technique similar to the one used in [22]. The transformations in [22] and [23] differ as follows: the choice of the PCT in [22] depends on the underlying MDS code and must be determined case by case, and the properties of the resulting codes must also be proved case by case, whereas the PCT in [23] is uniform and independent of the underlying MDS code. The transformation in [23] is more general and can be applied to any binary MDS code. However, the codes in [23] only work for $d = n - 1$, while our codes work for any $d \in [k+1, n-1]$. Note that the transformation in [23] can be used directly to generate codes with $d < n - 1$; however, as the authors of [23] explained, the resulting codes lack the "arbitrary $d$" property. They left the case where $d < n - 1$ and the $d$ helper nodes can be chosen arbitrarily as future work; this case is addressed in the present paper.
The codes in [27] are binary MDS array codes with a very small sub-packetization level $l = (p-1)r$, at the cost of non-optimal repair bandwidth. It is worth mentioning that the sub-packetization level (column length of the array code) of our codes is much larger than that of the codes in [22] and [23]. Thus, constructing binary MDS array codes with a small sub-packetization level is part of our future work.

VI. CONCLUSION
In this paper, we proposed new constructions of binary MDS array codes with optimal repair/access bandwidth. By stacking multiple Blaum-Roth codes whose parity-check matrices are judiciously designed, we constructed three families of binary MDS array codes with optimal repair bandwidth. The fast encoding and decoding algorithms of Blaum-Roth codes can be used directly in the implementation of these codes. With the help of permutation matrices, we constructed two families of binary MDS array codes with optimal access bandwidth. For $d \in [k+1, n-1]$, all the codes achieve the lower bound on repair/access bandwidth with error-resilient capability. Note that the sub-packetization level of the codes in this paper is very large; constructing binary MDS array codes with a small sub-packetization level is part of our future work.

Manuscript received 15 July 2023; revised 14 November 2023; accepted 7 December 2023. Date of publication 15 December 2023; date of current version 18 June 2024. This work was supported in part by National Natural Science Foundation of China under Grant 62171279, National Key R&D Program of China under Grant 2022YFA1005000, and Fundings of SJTU-Alibaba Joint Research Lab on Cooperative Intelligent Computing. An earlier version of this paper was presented in part at the 2023 IEEE International Symposium on Information Theory [DOI: 10.1109/ISIT54713.2023.10206994]. The associate editor coordinating the review of this article and approving it for publication was J. Chen. (Corresponding author: Yuan Luo.)

Through a further modification of $C_2$, we arrive at the construction of $C_3$, which has optimal repair bandwidth for multiple values of $d \in [k+1, n-1]$ simultaneously. Using permutation matrices, we construct two families of codes, $C_4$ and $C_5$, with optimal access bandwidth for an arbitrary fixed $d \in [k+1, n-1]$ and for multiple values of $d \in [k+1, n-1]$ simultaneously, respectively. We note that codes $C_1$, $C_2$, and $C_3$ were presented at the 2023 IEEE International Symposium on Information Theory.

$(d+2e)l/(d-k+1)$ bits communicated to repair a failed node. Thus, code $C_2$ achieves the lower bound on repair bandwidth for any fixed $d \in [k+1, n-1]$ with $e$-error resilient capability, where $e \le r/2$.

C. Binary MDS Array Code With Optimal Repair Bandwidth for Multiple Values of $d \in [k+1, n-1]$ Simultaneously

… on repair bandwidth for $d_m$. The proof is completed since this holds for any $m \in [M]$.

Property 3: Code $C_3$ is an MDS array code.

Proof: The proof follows the same arguments as the proof of Property 1 and is omitted.

Similar to code $C_2$, code $C_3$ possesses the error-resilient capability while achieving the lower bound (3) on repair bandwidth for several values of $d \in [k+1, n-1]$ simultaneously. Details about this error-resilient capability of code $C_3$ are given in the following remark.

Remark 3: Since formula (20) defines a $(d_m + r, d_m, p-1)$ binary MDS array code with minimum distance $r + 1$, we conclude that, for any integer $e \le r/2$, the lost bits $\{c_{i,a(i,u)} : u \in I_\delta\}$ can be recovered by connecting to any $d_m + 2e$ of the $n - 1$ coordinates $\{D^{j}_{i,a,\delta} : j \in [n] \setminus \{i\}\}$, as long as at most $e$ of these $d_m + 2e$ coordinates are erroneous. As a result, for any $i \in [n]$ and $e \le r/2$, $C_i$ can be repaired by connecting to any $d_m + 2e$ helper nodes and downloading $(p-1)s^n/s_m$ bits from each node, as long as at most $e$ nodes are erroneous. Thus, code $C_3$ achieves the lower bound (3) on repair bandwidth for all $d_m$, $m \in [M]$, with $e$-error resilient capability, where $e \le r/2$.

Through a further slight modification of Construction 3, presented in the following remark, we obtain a binary MSR code for all values of $d \in [k+1, n-1]$ simultaneously.

Remark 4: By substituting $s$ in Construction 3 with $s = \operatorname{lcm}(2, \ldots, r)$, we obtain a binary MDS array code that achieves the lower bound (2). The resulting code also achieves the lower bound (3) on repair bandwidth for all values of $d \in [k+1, n-1]$ simultaneously with $e$-error resilient capability, where $e \le r/2$.
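The error-resilient recovery argument above (any $d_m + 2e$ coordinates of a code with minimum distance $r + 1$, at most $e$ of them erroneous) can be illustrated with a toy example. The sketch swaps in a small Reed-Solomon code over the prime field GF(7) as a stand-in MDS code; it is not the binary array code defined by (20), and the brute-force nearest-codeword search is for illustration only. All names and parameters are hypothetical.

```python
from itertools import product

Q = 7  # prime field GF(7): a stand-in for the binary array code defined by (20)


def rs_encode(msg, points):
    """Evaluate the polynomial with coefficient list msg at each point (Reed-Solomon, MDS)."""
    return [sum(c * pow(x, j, Q) for j, c in enumerate(msg)) % Q for x in points]


def recover(received, points, dim, e):
    """Brute-force decoding: the unique message whose codeword, restricted to `points`,
    lies within Hamming distance e of `received`."""
    hits = [list(m) for m in product(range(Q), repeat=dim)
            if sum(u != v for u, v in zip(rs_encode(list(m), points), received)) <= e]
    if len(hits) != 1:
        raise ValueError("more than e errors, or too few coordinates for e errors")
    return hits[0]


# Toy parameters: d_m = 2 (dimension), r = 3, so length d_m + r = 5 and minimum distance 4.
points = [1, 2, 3, 4, 5]
codeword = rs_encode([3, 5], points)
helpers = [0, 2, 3, 4]                     # any d_m + 2e = 4 coordinates, with e = 1
received = [codeword[t] for t in helpers]
received[1] = (received[1] + 1) % Q        # one erroneous helper
print(recover(received, [points[t] for t in helpers], dim=2, e=1))   # [3, 5]
```

Restricting an MDS code of dimension $d_m$ to any $d_m + 2e$ coordinates leaves an MDS code with minimum distance $2e + 1$, which is why up to $e$ erroneous coordinates are tolerated.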

… code, any $d_m + 2e$ of the $n - 1$ coordinates can reconstruct the whole codeword as long as at most $e$ of these $d_m + 2e$ coordinates are erroneous. The recovery of coordinate $C_i$ is guaranteed by (32), where a total of $(d_m + 2e)(p-1)s^n/s_m = (d_m + 2e)l/(d_m - k + 1)$ bits are accessed and communicated, achieving the lower bound (3) on access bandwidth for all $d_m$, $m \in [M]$, simultaneously.

Through a further slight modification of Construction 5, presented in the following remark, we obtain a binary MSR code with the optimal access property for all $d \in [k+1, n-1]$ simultaneously.

Remark 7: By substituting $s$ in Construction 5 with $s = \operatorname{lcm}(2, \ldots, r)$, we obtain a binary MDS array code that achieves the lower bound (2) with the optimal access property. The resulting code also achieves the lower bound (3) on access bandwidth for all values of $d \in [k+1, n-1]$ simultaneously, with $e$-error resilient capability as long as $d + 2e \le n - 1$.
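The effect of the substitution $s = \operatorname{lcm}(2, \ldots, r)$ can be made concrete with a small calculation. The sketch below assumes $l = (p-1)s^n$ as in the construction of $C_5$ and that the per-helper access for repair degree $d$ is $l/(d-k+1)$; choosing the smallest prime $p$ with $p - 2 \ge n$ is our own choice, and the function names are hypothetical.

```python
from math import lcm  # Python 3.9+


def is_prime(m):
    return m >= 2 and all(m % q for q in range(2, int(m ** 0.5) + 1))


def all_d_parameters(n, k):
    """Toy parameters for the modified C_5: s = lcm(2, ..., r), l = (p - 1) s^n."""
    r = n - k
    s = lcm(*range(2, r + 1))
    p = n + 2
    while not is_prime(p):      # smallest prime with p - 2 >= n (our choice)
        p += 1
    l = (p - 1) * s ** n
    # bits accessed per helper for each d in [k+1, n-1]; exact because d - k + 1 divides s
    per_helper = {d: l // (d - k + 1) for d in range(k + 1, n)}
    return s, p, l, per_helper


print(all_d_parameters(n=6, k=3))   # (6, 11, 466560, {4: 233280, 5: 155520})
```

Since every $d \in [k+1, n-1]$ gives $d - k + 1 \in [2, r]$, a single $s$ divisible by all of these values serves every repair degree at once; even for these small parameters, however, $l$ is already in the hundreds of thousands.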