ReShape: a decoder for hypergraph product codes

The design of decoding algorithms is a significant technological component in the development of fault-tolerant quantum computers. Often the design of quantum decoders is inspired by classical decoding algorithms, but there are no general principles for building quantum decoders from classical decoders. Given any pair of classical codes, we can build a quantum code using the hypergraph product, yielding a hypergraph product code. Here we show we can also lift the decoders for these classical codes. That is, given oracle access to a minimum weight decoder for the relevant classical codes, the corresponding $[[n,k,d]]$ quantum code can be efficiently decoded for any error of weight smaller than $(d-1)/2$. The quantum decoder requires only $O(k)$ oracle calls to the classical decoder and $O(n^2)$ classical resources. The lift and the correctness proof of the decoder have a purely algebraic nature that draws on the discovery of some novel homological invariants of the hypergraph product codespace. While the decoder works perfectly for adversarial errors, it is not suitable for more realistic stochastic noise models and therefore cannot be used to establish an error correcting threshold.

The construction of quantum codes often takes classical codes as a starting point. The CSS construction is one method for combining a pair of classical codes into a quantum code. However, the CSS recipe only works when the pair of classical codes are dual to each other. Unfortunately, some of the best known classical code families, such as those based on expander graphs, do not come in convenient dual pairs. The hypergraph product is a different recipe that allows a pair of arbitrary classical codes to form the basis of a quantum code [1]. Crucially, when the hypergraph product uses families of classical low-density parity check (LDPC) codes, it leads to families of quantum-LDPC codes. The quantum-LDPC property eases the experimental difficulty of implementation and, combined with suitably growing distance, ensures the existence of an error correction threshold [2].
Two of the most widely known quantum codes, the toric and planar surface codes, are hypergraph product codes that use the classical repetition code as their seed classical code. The decoding problem for the surface code can be recast as a minimum-weight perfect-matching problem, which is efficiently solved by the blossom algorithm [3], [4]; the union-find algorithm [5] offers an efficient alternative. Another interesting class of hypergraph product codes uses classical expander codes as their seed, with the resulting offspring called quantum expander codes [6], which are quantum-LDPC codes achieving both constant rate and $\Omega(\sqrt{n})$ distance. The classical expander codes can be decoded by a very simple bit-flip algorithm discovered by Sipser and Spielman [7]. This inspired the small-set flip decoder for quantum expander codes, which follows a similar idea with slight modifications, and has been shown to correct adversarial errors [6], stochastic errors [8] and also to operate as a single-shot decoder [9]. However, any binary linear code can be used as a seed to build hypergraph product codes. Using classical codes other than repetition and expander codes, for instance the semi-topological codes proposed in [10], yields a broad range of hypergraph product codes for which there is no general purpose decoder that is proven to work across the whole code family. For classical LDPC codes, a belief propagation (BP) decoder works well in practice, but it cannot be used out of the box on quantum-LDPC codes. In fact, whenever a decoding instance is degenerate, meaning that it has more than one minimum weight solution, BP does not converge and yields a decoding failure. Degeneracy is the quintessential feature of quantum codes and therefore some workarounds are needed to use BP on quantum-LDPC codes [11], [12]. The literature offers many examples of BP-inspired decoders for quantum-LDPC codes which show an error correcting threshold [10], [13]-[17]; however, none of them comes with a correctness proof.
Recently, a union-find like decoder has been proposed to decode quantum-LDPC codes [18]. The authors of [18] prove that their union-find decoder corrects all errors of weight up to a polynomial in the distance for three classes of quantum-LDPC codes: codes with linear confinement (see [19], [20]), $D$-dimensional hyperbolic codes and $D$-dimensional toric codes for $D \geq 3$. The decoder in [18] is therefore provably correct for adversarial noise; nonetheless, a comprehensive investigation of its performance under stochastic noise is still missing.
Here we introduce the ReShape decoder for generic hypergraph product codes. Given an $[[n, k, d]]$ hypergraph product code built using classical codes with parity matrices $\delta_A$ and $\delta_B$, we assume access to a minimum weight decoder for parity matrices $\delta_A$, $\delta_B$, $\delta_A^T$ and $\delta_B^T$. The ReShape decoder calls these classical decoders as black-box oracles without any modification or knowledge of their internal working; furthermore, it requires only $O(k)$ oracle calls and only a polynomial amount of additional classical computation. Under these conditions we prove that ReShape works in the adversarial setting, correcting errors (up to stabilisers) of weight less than half the code distance. Therefore, ReShape lifts the classical decoders to the status of a quantum decoder, providing the first general purpose decoder for hypergraph product codes proven to correct adversarial errors. Formally we prove:
Theorem 1. Any $[[n, k, d]]$ hypergraph product code constructed from the classical parity check matrices $\delta_A$ and $\delta_B$ can be successfully decoded from errors of weight up to $(d-1)/2$ using $O(k)$ oracle calls to classical decoders for the seed matrices and their transposes plus $O(n^2)$ classical operations.
arXiv:2105.02370v2 [quant-ph] 13 Jul 2022
Theorem 1, though, does not state anything about stochastic noise or error correcting thresholds. Families of $n$-qubit hypergraph product codes have distance of at most $O(\sqrt{n})$ and so they are bad codes in the sense that the distance is sub-linear. However, given a stochastic noise model with each qubit affected independently with probability $p$, the typical error size will be $pn$. Thus, for $n > d/(2p)$, the most likely errors will not necessarily be corrected by ReShape and there is no guarantee that a threshold will be observed. Indeed, we implemented ReShape for several code families and found evidence that ReShape fails to provide a threshold (see Figure 4). A clear open problem is whether there exists a similar general lifting procedure, or modification of ReShape, for which one can prove good performance in the stochastic setting. Hence, while Theorem 1 provides a solution to the adversarial decoding problem for hypergraph product codes, a stronger, more difficult and long-sought result is still desirable: namely, the solution of the stochastic decoding problem for hypergraph product codes both on a theoretical level (proof of a threshold) and on a practical one (numerical observation of a high error correcting threshold). Even so, ReShape still provides some improvement over state-of-the-art BP and union-find like decoders for stochastic noise. First, ReShape comes with a proof of correctness, which BP lacks; second, the proof works for all errors up to the optimal value of $(d-1)/2$, whilst the modification of union-find proposed in [18] is provably correct only for errors of weight up to $A d^{\alpha}$, for some $A > 0$ and $0 < \alpha < 1$.

I. PRELIMINARIES AND NOTATION
A classical $[n, k, d]$ linear code is compactly described by its parity check matrix $H$. The matrix $H$ is a binary matrix of size $m \times n$ such that the codespace $C(H) \subseteq \mathbb{F}_2^n$ is described by:
$$C(H) = \{ v \in \mathbb{F}_2^n : Hv = 0 \}.$$
The codespace $C(H)$ has dimension $k = n - \mathrm{rank}(H)$ and distance $d$ defined as:
$$d = \min \{ |v| : v \in C(H),\ v \neq 0 \},$$
where $|v|$ is the Hamming weight of the binary vector $v$. Whenever the parity check matrix has columns and rows of small weight we say that it is a low density parity check (LDPC) matrix; when $H$ has constant column and row weight $w_c$, $w_r$ we say for short that it is a $(w_c, w_r)$-matrix.
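The definitions above can be checked by brute force on a small example. The following is a minimal sketch, not part of the original presentation: the helper names `rank_gf2`, `codewords` and `parameters` are illustrative, and the enumeration is exponential in $n$, so it is only meant for toy codes such as the $[3, 1, 3]$ repetition code.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over GF(2), by Gaussian elimination on integer bitmasks."""
    masks = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while masks:
        pivot = masks.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            masks = [m ^ pivot if m & low else m for m in masks]
    return rank

def codewords(H):
    """All v in F_2^n with H v = 0 (brute force)."""
    n = len(H[0])
    return [v for v in product((0, 1), repeat=n)
            if all(sum(h * x for h, x in zip(row, v)) % 2 == 0 for row in H)]

def parameters(H):
    """Return (n, k, d); the trivial code {0} gets distance infinity."""
    n = len(H[0])
    k = n - rank_gf2(H)
    weights = [sum(v) for v in codewords(H) if any(v)]
    d = min(weights) if weights else float("inf")
    return n, k, d

# Parity check matrix of the [3, 1, 3] repetition code.
H_rep = [[1, 1, 0],
         [0, 1, 1]]
print(parameters(H_rep))  # (3, 1, 3)
```

The same function applied to a full-rank square matrix returns $k = 0$ and $d = \infty$, matching the convention for the trivial code used later in the text.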
The classical decoding problem can be stated as: given a syndrome vector $s \in \mathbb{F}_2^m$, find the minimum weight solution $e \in \mathbb{F}_2^n$ to the equation $He = s$. It is easy to show that the optimal decoder for any classical linear code can correct errors of weight up to half the code distance (see, for instance, [21]).
A quantum $[[n, k, d]]$ stabiliser code [22] is a subspace of dimension $2^k$ of the Hilbert space $(\mathbb{C}^2)^{\otimes n}$. It is described as the common $+1$ eigenspace of its stabiliser group $\mathcal{S}$, an Abelian subgroup of the Pauli group $\mathcal{P}_n$ such that $-1 \notin \mathcal{S}$. The Pauli group on $n$ qubits is the group generated by the $n$-fold tensor product of single qubit Pauli operators. The weight $|P|$ of a Pauli operator $P \in \mathcal{P}_n$ is the number of its non-identity factors. We indicate by $N(\mathcal{S})$ the normaliser of $\mathcal{S}$, i.e. the group of Paulis which commute with the stabiliser group $\mathcal{S}$. Because $\mathcal{S} \subseteq N(\mathcal{S})$, the quotient group $\mathcal{L} := N(\mathcal{S})/\mathcal{S}$ is well defined and referred to as the homology group, see Appendix B. Elements $[P]$ of $\mathcal{L}$ are homology classes: equivalence classes with respect to the congruence modulo multiplication by stabiliser operators. Explicitly:
$$[P] = [Q] \iff P = QS \ \text{for some } S \in \mathcal{S}, \tag{3}$$
and for any Pauli $P$, its homology class $[P]$ is uniquely defined via Eq. (3). Importantly, each Pauli $P$ such that $[P] \neq [1]$ in $\mathcal{L}$ is an operator that preserves the codespace and has non-trivial action on it. We refer to such code operators modulo $\mathcal{S}$ as logical Pauli operators; with slight abuse of notation we write $P \in \mathcal{L}$, meaning $[P] \in \mathcal{L}$. Importantly, for a code of dimension $k$, $\mathcal{L} \simeq \mathcal{P}_k$. The distance $d$ of the code is the minimum weight of any non-trivial logical operator in $\mathcal{L}$. Any generating set of the stabiliser group $\mathcal{S}$ induces a syndrome map $\sigma$. Namely, if $\mathcal{S} = \langle S_1, \dots, S_m \rangle$, the associated syndrome function $\sigma$ maps any Pauli $P \in \mathcal{P}_n$ to a binary vector $s = (s_1, \dots, s_m)^T \in \mathbb{F}_2^m$ such that $s_i = 0$ if and only if $P$ commutes with $S_i$ and $s_i = 1$ otherwise. We refer to the vector $s$ as the syndrome.
Conventionally, when considering a stabiliser code, it is always intended that a generating set {S 1 , . . . , S m } for the stabiliser group is chosen and with it a syndrome map. We say that a stabiliser code is LDPC if each S i has low weight and each qubit is in the support of only a few generators.
The decoding problem for stabiliser codes can be stated as: given a syndrome vector $s \in \mathbb{F}_2^m$, find an operator $E_r \in \mathcal{P}_n$ such that (i) $\sigma(E_r) = s$ and (ii) $E_r E_{\min} \in \mathcal{S}$, where $E_{\min}$ is a minimum weight operator with syndrome $s$. We call any operator that satisfies (i) a valid solution of the syndrome equation, and operators for which both (i) and (ii) are true correct solutions.
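Pauli operators can be put into a one-to-one correspondence with binary vectors, if we discard the phase factors $\pm 1, \pm i$. In fact, any Pauli $P$ can be written as:
$$P \propto X[v]Z[w], \qquad X[v] := \bigotimes_{i=1}^{n} X^{v_i}, \quad Z[w] := \bigotimes_{i=1}^{n} Z^{w_i}, \quad v, w \in \mathbb{F}_2^n, \tag{4}$$
from which it follows:
$$X[v]Z[w] \cdot X[v']Z[w'] = (-1)^{\langle v, w' \rangle + \langle v', w \rangle}\, X[v']Z[w'] \cdot X[v]Z[w], \tag{5}$$
and two operators commute if and only if $\langle v, w' \rangle + \langle v', w \rangle = 0 \bmod 2$ and anti-commute otherwise. This correspondence between binary vectors and Pauli operators is particularly handy when dealing with CSS codes [23], [24]. CSS codes are stabiliser codes for which the stabiliser group can be generated by two disjoint sets $S_x$ and $S_z$ of $X$ and $Z$ type operators respectively.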
If $S_x = \{X[v_1], \dots, X[v_{m_x}]\}$ and $S_z = \{Z[w_1], \dots, Z[w_{m_z}]\}$ and we define $H_X$ and $H_Z$ as the matrices whose rows are the $v_i$s and the $w_i$s respectively, then the commutation relation on the stabiliser generators translates into the binary constraint $H_X H_Z^T = 0$. Using Eq. (5), it is easy to show that the syndrome for a Pauli error $E = X[e_x]Z[e_z]$ is described by the two binary vectors $s_z = H_Z e_x$ and $s_x = H_X e_z$. Since these two linear equations are independent, we can treat the X-part and Z-part of the error separately. For CSS codes, we define the X-distance $d_x$ as the minimum weight of an operator $X[v]$ which commutes with all the stabilisers in $S_z$ but does not belong to the group generated by $S_x$. Note that the weight of an operator $X[v]$ equals the Hamming weight $|v|$ of the vector $v$. Therefore, combining Eq. (4), Eq. (5) and the definition of $d_x$, we say for short that $d_x$ is the minimum weight of a vector $v$ in $\ker H_Z$ which does not belong to the row span of $H_X$, i.e.
$$d_x = \min \{ |v| : v \in \ker H_Z \setminus \mathrm{Im}\, H_X^T \}.$$
Similarly, $d_z$ is the minimum weight of a vector in $\ker H_X$ not in $\mathrm{Im}\, H_Z^T$. The Z-error decoding problem for CSS codes can be stated as: given a syndrome vector $s \in \mathbb{F}_2^{m_x}$, find a valid and correct solution $e \in \mathbb{F}_2^n$ to the equation:
$$H_X e = s, \tag{7}$$
see [25], [26].
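The binary description of Pauli operators and CSS syndromes can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the function names are illustrative, a Pauli $X[v]Z[w]$ is stored as the pair `(v, w)`, and the $[[4, 2, 2]]$ code with $H_X = H_Z = (1\,1\,1\,1)$ is used only as a convenient small CSS example that does not appear in the text.

```python
def inner(u, v):
    """Inner product of two binary vectors, modulo 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def commute(p, q):
    """p = (v, w) stands for X[v]Z[w]; True if the two Paulis commute."""
    (v, w), (v2, w2) = p, q
    return (inner(v, w2) + inner(v2, w)) % 2 == 0

def syndrome(H, e):
    """Binary syndrome H e of the error vector e."""
    return [inner(row, e) for row in H]

# Assumed toy example: the [[4, 2, 2]] CSS code.
H_X = [[1, 1, 1, 1]]
H_Z = [[1, 1, 1, 1]]

# CSS condition H_X H_Z^T = 0: every X generator commutes with every Z generator.
css_ok = all(inner(hx, hz) == 0 for hx in H_X for hz in H_Z)

# A single Z error on the first qubit is detected by the X-type check.
e_z = [1, 0, 0, 0]
s_x = syndrome(H_X, e_z)

zero = [0, 0, 0, 0]
x_on_first = ([1, 0, 0, 0], zero)  # X on qubit 1
z_on_first = (zero, [1, 0, 0, 0])  # Z on qubit 1
print(css_ok, s_x, commute(x_on_first, z_on_first))  # True [1] False
```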
In this article we focus on a sub-class of CSS codes, the hypergraph product codes [27]-[30]. We give a minimal description of these codes in Section II and we refer the reader to Appendix B for a more detailed presentation. We study some homology invariants for the logical operators of the hypergraph product codes in Section III-A. These invariants are the algebraic core upon which we design a decoder for these codes, the ReShape decoder. We prove that ReShape is an efficient and correct decoder for adversarial noise in Section III-B. We conclude with some considerations on the performance of ReShape under stochastic noise in Section III-C.

II. HYPERGRAPH PRODUCT CODES
Here we present a bottom-up overview of hypergraph product codes. The purpose of this Section is twofold: we want both to describe the hypergraph product codes with the least possible technical overhead and to introduce the notation necessary to motivate and give an intuition for the results presented in Section III. We refer the reader interested in the homology theory approach to Appendix B.
The most well-known example of a hypergraph product code is the toric code and its variations [31], [32]. The toric code is conventionally represented by a square lattice where qubits sit on edges, X-stabilisers are identified with vertices and Z-stabilisers with faces. Since a square lattice has two kinds of edges, vertical and horizontal, the first evident feature of this identification is that, accordingly, there are two types of qubits. The second is that each vertex/X-stabiliser uniquely identifies a row of horizontal edges and a column of vertical ones, starting from the four that are incident to it. The third is that faces/Z-stabilisers, similarly to vertices, uniquely identify a column of horizontal edges and a row of vertical ones, starting from the four which lie on its boundary. Very similar attributes can be found in all the hypergraph product codes, as we now explain.
Consider two classical parity check matrices $\delta_A$, $\delta_B$ of size $m_a \times n_a$ and $m_b \times n_b$; we indicate with $C(\delta_A, \delta_B)$ their hypergraph product code and refer to the matrices $\delta_A$ and $\delta_B$ as seed matrices. The qubits of the code $C(\delta_A, \delta_B)$ can be labelled as left and right qubits. Left qubits can be placed in an $n_a \times n_b$ grid and right qubits in an $m_a \times m_b$ grid, see Figure 1. Under this labelling, left and right qubits are uniquely identified by pairs of indices $(j_a, j_b)$ and $(i_a, i_b)$ respectively, where $j_a$, $j_b$ vary among the column indices of $\delta_A$, $\delta_B$ while $i_a$, $i_b$ vary among their row indices. Given a pair $(L, R)$ of binary matrices, of size $n_a \times n_b$ and $m_a \times m_b$ respectively, we define the Z-operator:
$$Z[(L, R)] := \bigotimes_{j_a, j_b} Z^{L_{j_a j_b}} \otimes \bigotimes_{i_a, i_b} Z^{R_{i_a i_b}}, \tag{8}$$
and similarly for X-operators. We refer to $L$ as the left part of the operator and to $R$ as its right part. The code $C(\delta_A, \delta_B)$ has $m_a \times n_b$ X-stabiliser generators which can be indexed by $(i_a, j_b)$. The X-stabiliser $S_x(i_a, j_b)$ has support contained in the union of the $j_b$th column of left qubits and the $i_a$th row of right qubits. More precisely [footnote 1], it acts as $X[(\delta_A)_{i_a}]$ on the left qubits located at column $j_b$ and as $X[(\delta_B)^{j_b}]$ on the right qubits located on row $i_a$, see Figure 1b. Using the X-version of Eq. (8), $S_x(i_a, j_b)$ is uniquely represented by the pair of matrices
$$S_x(i_a, j_b) = \left( \delta_A^T E_{i_a, j_b},\; E_{i_a, j_b}\, \delta_B^T \right),$$
where $E_{i_a, j_b}$ is the all-zero $m_a \times n_b$ matrix but for the $(i_a, j_b)$th entry which is 1. From the characteristic 'cross' shape of the stabiliser generators $S_x(i_a, j_b)$, it follows that if $(G_L, G_R)$ is an X-stabiliser for $C(\delta_A, \delta_B)$, then (i) each column of $G_L$, as a vector in $\mathbb{F}_2^{n_a}$, belongs to $\mathrm{Im}\, \delta_A^T$ and (ii) each row of $G_R$, as a vector in $\mathbb{F}_2^{m_b}$, belongs to $\mathrm{Im}\, \delta_B$. Similarly, Z-stabiliser generators are indexed by $(j_a, i_b)$ for $1 \leq j_a \leq n_a$ and $1 \leq i_b \leq m_b$ and $S_z(j_a, i_b)$ is uniquely represented by the pair of matrices:
$$S_z(j_a, i_b) = \left( E_{j_a, i_b}\, \delta_B,\; \delta_A E_{j_a, i_b} \right),$$
for $E_{j_a, i_b}$ of size $n_a \times m_b$, with all entries 0 but for the $(j_a, i_b)$th entry which is 1.
[Footnote 1] Here and in the following, for an $m \times n$ matrix $\delta$ we indicate by $\delta_i \in \mathbb{F}_2^n$ the transpose of its $i$th row, and by $\delta^j \in \mathbb{F}_2^m$ its $j$th column.
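The syndrome equation for hypergraph product codes can be derived by combining Eq. (7) and the expression for the stabiliser generators. By Eq. (7), the $i$th bit of the syndrome vector $s \in \mathbb{F}_2^{m_x}$ equals the inner product between the $i$th X-stabiliser generator, which corresponds to the $i$th row of the matrix $H_X$, and the error vector. In the same way, by reshaping vectors into matrices (see Appendix B-A), the $(i_a, j_b)$th bit of the syndrome matrix $S \in \mathbb{F}_2^{m_a \times n_b}$ equals the inner product of the $(i_a, j_b)$th X-stabiliser generator and the error matrices $(L, R)$:
$$S_{i_a j_b} = \langle \delta_A^T E_{i_a, j_b}, L \rangle + \langle E_{i_a, j_b}\, \delta_B^T, R \rangle, \qquad \langle A, B \rangle := \sum_{i,j} A_{ij} B_{ij} \bmod 2,$$
and by linearity:
$$S = \delta_A L + R\, \delta_B.$$
It is easy to show that any Z-stabiliser has trivial X-syndrome, which is equivalent to X-stabilisers and Z-stabilisers commuting. As a consequence, $C(\delta_A, \delta_B)$ is a well-defined CSS code. A minimal generating set of logical Z-operators for $C(\delta_A, \delta_B)$ is given by $\mathcal{L}_z := \mathcal{L}_z^{\mathrm{left}} \cup \mathcal{L}_z^{\mathrm{right}}$, where:
$\mathcal{L}_z^{\mathrm{left}} := \{ (L, 0) : L = k_a \cdot e_{j_b}^T$, $k_a$ varies among a basis of $\ker \delta_A$, $e_{j_b}$ varies among a basis of $(\mathrm{Im}\, \delta_B^T)^{\bullet}$, $|e_{j_b}| = 1 \}$,
$\mathcal{L}_z^{\mathrm{right}} := \{ (0, R) : R = e_{i_a} \cdot \bar{k}_b^T$, $\bar{k}_b$ varies among a basis of $\ker \delta_B^T$, $e_{i_a}$ varies among a basis of $(\mathrm{Im}\, \delta_A)^{\bullet}$, $|e_{i_a}| = 1 \}$.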
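Here, given a vector space $V \subseteq \mathbb{F}_2^n$, $V^{\bullet}$ denotes a complement of $V$ spanned by unit vectors, so that $V \oplus V^{\bullet} = \mathbb{F}_2^n$. In particular, the space $V^{\bullet}$ is in general different from the orthogonal complement $V^{\perp}$ of the space $V$, see Appendix A for details. Similarly, a minimal generating set of logical X-operators is $\mathcal{L}_x := \mathcal{L}_x^{\mathrm{left}} \cup \mathcal{L}_x^{\mathrm{right}}$, where:
$\mathcal{L}_x^{\mathrm{left}} := \{ (L, 0) : L = e_{j_a} \cdot k_b^T$, $k_b$ varies among a basis of $\ker \delta_B$, $e_{j_a}$ varies among a basis of $(\mathrm{Im}\, \delta_A^T)^{\bullet}$, $|e_{j_a}| = 1 \}$,
$\mathcal{L}_x^{\mathrm{right}} := \{ (0, R) : R = \bar{k}_a \cdot e_{i_b}^T$, $\bar{k}_a$ varies among a basis of $\ker \delta_A^T$, $e_{i_b}$ varies among a basis of $(\mathrm{Im}\, \delta_B)^{\bullet}$, $|e_{i_b}| = 1 \}$.
The code $C(\delta_A, \delta_B)$ has $n = n_a n_b + m_a m_b$ qubits, dimension $k = (n_a - \mathrm{rk}_a)(n_b - \mathrm{rk}_b) + (m_a - \mathrm{rk}_a)(m_b - \mathrm{rk}_b)$ and distances $d_x = \min\{d_B, d_A^T\}$, $d_z = \min\{d_A, d_B^T\}$, where $\mathrm{rk}_a := \mathrm{rank}(\delta_A)$, $\mathrm{rk}_b := \mathrm{rank}(\delta_B)$ and $d_A$, $d_B$, $d_A^T$, $d_B^T$ are the distances of the classical codes with parity check matrices $\delta_A$, $\delta_B$, $\delta_A^T$, $\delta_B^T$. By convention, we define the distance of the trivial code $\{0\}$ to be $\infty$. In particular, whenever one or both seed matrices (or their transposes) are full rank, one of the summands in the expression for $k$ cancels out, e.g. if $\delta_A$ or $\delta_B$ has full rank, then $k = (n_a - \mathrm{rk}_a)(n_b - \mathrm{rk}_b)$.
The similarities in structure between general hypergraph product codes $C(\delta_A, \delta_B)$ and the toric codes (with and without boundaries) should now be clear: the toric code with boundaries (resp. without) of lattice size $L$ is just the hypergraph product code $C(\delta_L, \delta_L)$ where $\delta_L$ is the full-rank $(L-1) \times L$ (resp. non-full-rank $L \times L$) parity check matrix of the classical $[L, 1, L]$ repetition code, e.g. for $L = 3$:
$$\delta_3 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \quad \left( \text{resp. } \delta_3 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} \right).$$
Left and right qubits correspond to vertical and horizontal edges; vertices and faces on the square lattice can be indexed in the natural way, yielding the same stabiliser indexing as for general hypergraph product codes; string-like (resp. loop-like) logical operators correspond precisely to the left and right logical operators described above, which have single column/single row support.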
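The construction can be made concrete by assembling the parity check matrices of $C(\delta_A, \delta_B)$ from the seeds. The sketch below assumes the standard Kronecker-product form $H_X = (\delta_A \otimes 1_{n_b} \mid 1_{m_a} \otimes \delta_B^T)$ and $H_Z = (1_{n_a} \otimes \delta_B \mid \delta_A^T \otimes 1_{m_b})$ with row-major flattening of the two qubit grids; this is one way of writing the stabiliser generators described in this Section, and the helper names are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product of two binary matrices."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def hstack(A, B):
    return [ra + rb for ra, rb in zip(A, B)]

def rank_gf2(rows):
    """Rank over GF(2), by Gaussian elimination on integer bitmasks."""
    masks = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while masks:
        pivot = masks.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            masks = [m ^ pivot if m & low else m for m in masks]
    return rank

def hypergraph_product(dA, dB):
    """Return (H_X, H_Z); left qubits first, then right qubits."""
    ma, na, mb, nb = len(dA), len(dA[0]), len(dB), len(dB[0])
    HX = hstack(kron(dA, identity(nb)), kron(identity(ma), transpose(dB)))
    HZ = hstack(kron(identity(na), dB), kron(transpose(dA), identity(mb)))
    return HX, HZ

def n_and_k(dA, dB):
    HX, HZ = hypergraph_product(dA, dB)
    # CSS condition: every row of H_X is orthogonal to every row of H_Z.
    assert all(sum(x * z for x, z in zip(hx, hz)) % 2 == 0 for hx in HX for hz in HZ)
    n = len(HX[0])
    return n, n - rank_gf2(HX) - rank_gf2(HZ)

planar = [[1, 1, 0], [0, 1, 1]]            # full-rank repetition code checks
toric = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # cyclic repetition code checks
print(n_and_k(planar, planar), n_and_k(toric, toric))  # (13, 1) (18, 2)
```

Both outputs agree with the dimension formula above: $(3-2)(3-2) + 0 = 1$ for the planar code and $(3-2)(3-2) + (3-2)(3-2) = 2$ for the toric code.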
In what follows, we focus on Z-errors and their correction. With slight abuse of notation, we will refer to pairs of matrices $(L, R)$ as operators (and sometimes vice versa), where the identification is clear via Eq. (8). The corresponding results for X-errors are easily obtained by duality, as for any CSS code. More precisely, by swapping the role of X and Z but also the role of rows and columns; alternatively, considering : : :
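Working with pairs of matrices makes the syndrome computation a two-line matrix expression. The following minimal sketch evaluates the syndrome of a Z-operator $(L, R)$ as $\delta_A L + R\,\delta_B$ and checks that every Z-stabiliser generator $(E_{j_a, i_b}\,\delta_B,\ \delta_A E_{j_a, i_b})$ has trivial syndrome. The function names are illustrative and the planar-code seeds are chosen only as a small example.

```python
def matmul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]

def add(A, B):
    """Entry-wise sum over GF(2)."""
    return [[(a + b) % 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def unit(rows, cols, i, j):
    """All-zero rows x cols matrix with a single 1 in position (i, j)."""
    return [[int((r, c) == (i, j)) for c in range(cols)] for r in range(rows)]

def syndrome(dA, dB, L, R):
    """X-syndrome matrix (m_a x n_b) of the Z-operator (L, R)."""
    return add(matmul(dA, L), matmul(R, dB))

def z_stabiliser(dA, dB, ja, ib):
    """Pair of matrices representing the Z-stabiliser generator S_z(ja, ib)."""
    E = unit(len(dA[0]), len(dB), ja, ib)  # n_a x m_b
    return matmul(E, dB), matmul(dA, E)

dA = dB = [[1, 1, 0], [0, 1, 1]]  # planar code seeds (assumed example)

# Every Z-stabiliser generator commutes with all X-stabilisers.
trivial = all(
    syndrome(dA, dB, *z_stabiliser(dA, dB, ja, ib)) == [[0, 0, 0], [0, 0, 0]]
    for ja in range(3) for ib in range(2)
)

# A single Z error on the left qubit (0, 0), no error on the right qubits.
L_err = unit(3, 3, 0, 0)
R_err = [[0, 0], [0, 0]]
S = syndrome(dA, dB, L_err, R_err)
print(trivial, S)  # True [[1, 0, 0], [0, 0, 0]]
```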

III. RESULTS
Here we present the ReShape decoder. The intuition behind ReShape is that we can look at hypergraph product codes as codes built by combining (taking the product of) multiple copies of the same classical codes. As such, with due care, we can 'decouple' these copies and retrieve the original classical seed codes.
On an $[[n, k, d]]$ hypergraph product code $C(\delta_A, \delta_B)$, ReShape works by splitting the decoding problem into $k$ smaller classical decoding problems which can be solved using classical decoding algorithms for the seed matrices. In order to identify the $k$ classical decoding problems, it applies a linear transformation, a change of basis, on the $n$-dimensional codespace of $C(\delta_A, \delta_B)$, yielding a canonical form for error operators. This canonical form exposes two important features of the codespace: the first is that logical operators of $C(\delta_A, \delta_B)$ are naturally partitioned into two sets, of left and right operators; the second is that the weight of each logical operator directly depends on the weight of the classical codewords of the seed codes. By writing an operator in its canonical form, we can immediately assess to which of the two classes it belongs and, via classical decoding, to which logical operator it is closest. Hence, we successfully detect and correct errors.
In this Section, we first study the algebraic invariants of the logical operators upon which the canonical form is defined. The correctness of ReShape, and so the proof of Theorem 1, strongly relies on the existence of these invariants. We detail the ReShape algorithm in Section III-B and discuss its limitations in Section III-C. All the proofs of this Section are deferred to Appendix C.

A. Invariants
The characteristic shape of operators on the codespace of $C(\delta_A, \delta_B)$ and the structure of its stabilisers and logical operators induce a canonical form for Z-operators in $C(\delta_A, \delta_B)$. More precisely, by combining the construction outlined in Section II and the definition of complement of a vector subspace (see Appendix A) we have proven the following:
Proposition 1. Let $(L, R)$ be a Z-operator on $C(\delta_A, \delta_B)$. Then, for the operator $(L, R)$, the left part $L$ can be expressed as a sum of a free part $M_L$ and a logical part $O_L$ such that every row of $M_L$ belongs to $\mathrm{Im}\, \delta_B^T$ and every row of $O_L$ belongs to $(\mathrm{Im}\, \delta_B^T)^{\bullet}$. Similarly, the right part $R$ can be expressed as a sum of a free part $M_R$ and a logical part $O_R$ such that every column of $M_R$ belongs to $\mathrm{Im}\, \delta_A$ and every column of $O_R$ belongs to $(\mathrm{Im}\, \delta_A)^{\bullet}$. Hence, for $(L, R)$ it holds that:
$$(L, R) = (M_L + O_L,\; M_R + O_R). \tag{CF}$$
We refer to the expression given by Eq. (CF) as the canonical form of the operator $(L, R)$.
Crucially, as we detail in Appendix C, it is always possible to 'move' the support of the free part of an operator from the left qubits to the right qubits and vice versa, by adding stabilisers. The opposite holds for the logical part: the support of the logical part of an operator cannot be moved from the left to the right qubits without changing its homology class. These two observations justify the names free and logical part in the canonical form of a Z-operator on $C(\delta_A, \delta_B)$. We refer to Figures 1 and 2 for a visual representation of the canonical form of a Z-operator on $C(\delta_A, \delta_B)$. In Figures 1b and 1e we see stabiliser operators in their canonical form: their free part has the support pictured, their logical part is 0. In Figures 1c, 1d, 1f, 1g we see logical operators in their canonical form: their free part is 0, whilst their logical part, pictured, has support contained in either a row or a column of one of the two grids of qubits. In Figure 2 we see a Z-operator whose free and logical parts are both non-trivial.
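The splitting of the left part into free and logical parts can be sketched as follows. The code assumes one particular choice of complement: the unit vectors sitting on the non-pivot columns of the reduced row echelon form of $\delta_B$, which span a valid complement of $\mathrm{Im}\, \delta_B^T$. The definition in Appendix A may fix a different but equivalent choice, and the function names are illustrative.

```python
def rref(rows):
    """Reduced row echelon form over GF(2); returns (basis_rows, pivot_columns)."""
    rows = [r[:] for r in rows]
    basis, pivots = [], []
    for c in range(len(rows[0])):
        pivot = next((r for r in rows if r[c]), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        # Clear column c from the remaining rows and from the earlier basis rows.
        rows = [[(x + y) % 2 for x, y in zip(r, pivot)] if r[c] else r for r in rows]
        basis = [[(x + y) % 2 for x, y in zip(b, pivot)] if b[c] else b for b in basis]
        basis.append(pivot)
        pivots.append(c)
    return basis, pivots

def split(v, basis, pivots):
    """Write v = m + o with m in the row space and o supported on non-pivot columns."""
    m = [0] * len(v)
    for b, p in zip(basis, pivots):
        if v[p]:
            m = [(x + y) % 2 for x, y in zip(m, b)]
    o = [(x + y) % 2 for x, y in zip(v, m)]
    return m, o

def canonical_rows(M, delta):
    """Split every row of M with respect to Im(delta^T), the row space of delta."""
    basis, pivots = rref(delta)
    parts = [split(row, basis, pivots) for row in M]
    return [m for m, _ in parts], [o for _, o in parts]

delta_B = [[1, 1, 0], [0, 1, 1]]
L = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
M_L, O_L = canonical_rows(L, delta_B)
print(M_L)  # [[1, 0, 1], [0, 1, 1], [0, 0, 0]]
print(O_L)  # [[0, 0, 1], [0, 0, 0], [0, 0, 1]]
```

For the right part, the same function can be applied to the transpose of $R$ with $\delta_A^T$ in place of $\delta_B$, since the columns of $R$ are split with respect to $\mathrm{Im}\, \delta_A$, the row space of $\delta_A^T$.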
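Given a Z-operator $(L, R)$ we define its row-column weight as the pair $(\#\mathrm{row}(L), \#\mathrm{col}(R))$, where $\#\mathrm{row}(L)$ is the number of non-zero rows of $L$ and $\#\mathrm{col}(R)$ is the number of non-zero columns of $R$. The primary significance of this novel notion of weight is explained by Proposition 2, which also represents a key result towards the construction of the ReShape decoder.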
Corollary 1 below further specifies the structure of logical Z-operators and it is easily derived from the proof of Proposition 2, which is deferred to Appendix C.
Equivalently, if $(L, R)$ has canonical form given by Eq. (CF), for its logical row-column weight it holds that $(\#\mathrm{row}_{\log}(L), \#\mathrm{col}_{\log}(R)) = (\#\mathrm{row}(O_L), \#\mathrm{col}(O_R))$. The pivotal property of the logical row-column weight is expressed by Proposition 3.
Proposition 3. The logical row-column weight of a Z-operator on $C(\delta_A, \delta_B)$ is an invariant of its homology class.
Proposition 3 not only justifies the introduction of the notion of logical row-column weight but also constitutes the core resource upon which we prove the correctness of the ReShape decoder, which we now introduce.
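The invariance stated in Proposition 3 can be observed numerically on a small instance. The sketch below takes the logical row-column weight to be $(\#\mathrm{row}(O_L), \#\mathrm{col}(O_R))$, with the complements chosen as the unit vectors on the non-pivot columns of the reduced row echelon form (an assumption about the choice of basis, as are all helper names), and checks that adding any Z-stabiliser generator of the distance-3 toric code leaves it unchanged.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]

def add(A, B):
    return [[(a + b) % 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rref(rows):
    """Reduced row echelon form over GF(2); returns (basis_rows, pivot_columns)."""
    rows, basis, pivots = [r[:] for r in rows], [], []
    for c in range(len(rows[0])):
        pivot = next((r for r in rows if r[c]), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [[(x + y) % 2 for x, y in zip(r, pivot)] if r[c] else r for r in rows]
        basis = [[(x + y) % 2 for x, y in zip(b, pivot)] if b[c] else b for b in basis]
        basis.append(pivot)
        pivots.append(c)
    return basis, pivots

def logical_rows(M, delta):
    """Logical part of each row of M with respect to the row space of delta."""
    basis, pivots = rref(delta)
    out = []
    for v in M:
        for b, p in zip(basis, pivots):
            if v[p]:
                v = [(x + y) % 2 for x, y in zip(v, b)]
        out.append(v)
    return out

def logical_rc_weight(dA, dB, L, R):
    O_L = logical_rows(L, dB)                         # rows of L vs Im(dB^T)
    O_Rt = logical_rows(transpose(R), transpose(dA))  # columns of R vs Im(dA)
    return sum(any(r) for r in O_L), sum(any(r) for r in O_Rt)

def z_stabiliser(dA, dB, ja, ib):
    E = [[int((r, c) == (ja, ib)) for c in range(len(dB))] for r in range(len(dA[0]))]
    return matmul(E, dB), matmul(dA, E)

dA = dB = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # toric code seeds, d = 3
L = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]        # left logical Z: codeword 111 on one column
R = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
weights = set()
for ja in range(3):
    for ib in range(3):
        GL, GR = z_stabiliser(dA, dB, ja, ib)
        weights.add(logical_rc_weight(dA, dB, add(L, GL), add(R, GR)))
print(logical_rc_weight(dA, dB, L, R), weights)  # (3, 0) {(3, 0)}
```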

B. The ReShape decoder
A hypergraph product code $C(\delta_A, \delta_B)$ is a CSS code and as such the decoding for X and Z errors can be treated separately but in a symmetric way. Here we focus on Z-errors and therefore we measure a generating set of X-stabilisers. The Z-error decoding problem for $C(\delta_A, \delta_B)$ can be stated as: given an $m_a \times n_b$ syndrome matrix $S$, find a valid and correct solution $(\tilde{L}, \tilde{R})$ to the equation:
$$\delta_A L + R\, \delta_B = S. \tag{SE}$$
ReShape (Algorithm 1) works separately on the left part $L$ and on the right part $R$ of the operator $(L, R)$ and in fact the two parts could be processed in parallel (lines 1-10 and lines 11-20). Starting from a valid solution $(L, R)$, it minimises its logical row-column weight by minimising $\#\mathrm{row}_{\log}(L)$ first (lines 1-10) and $\#\mathrm{col}_{\log}(R)$ after (lines 11-20). Because the logical row-column weight is a homology invariant for Z-operators (Proposition 3) and ReShape minimises it, this suffices to ensure that ReShape is correct, as stated in Proposition 4. ReShape works on the left part $L$ of the input valid solution $(L, R)$ (lines 1-10) in two steps: Split and Decode. Each of these two steps exploits a characteristic feature of the Z-operators on the codespace of $C(\delta_A, \delta_B)$: (i) Split step: a Z-stabiliser $(G_L, G_R)$ has left part $G_L$ such that every row is in the image of $\delta_B^T$; (ii) Decode step: a logical Z-operator which acts non-trivially on the left qubits has a representative $(L_z, R_z)$ such that at least one column of $L_z$ is in $\ker \delta_A \setminus \{0\}$. The Split and Decode steps are similarly performed on the right part $R$, as specified in lines 11-20 of the pseudocode in Algorithm 1. Again with the left part as guiding case, we now describe the Split and Decode steps in detail and specify their computational cost. By extending this analysis to the right part, and thanks to Proposition 4, Theorem 1 is proved.
Let (L, R) be any valid solution of (SE) given in input to ReShape.
(i) Split. First, in lines 1-3, $L$ is written in its canonical form with respect to the basis described by Eq. (16). This operation has the cost of a change of basis over the vector space $\mathbb{F}_2^{n_b}$, namely from the canonical basis to the basis described by Eq. (16). A change of basis over a vector space is a linear operation that corresponds to a multiplication by an invertible square matrix. Since we are interested in computing the image of this linear transformation for each of the $n_a$ rows of $L$ (as vectors in $\mathbb{F}_2^{n_b}$), this amounts to the multiplication of an $n_a \times n_b$ and an $n_b \times n_b$ matrix. To sum up, the Split step of ReShape has cost $O(n_a n_b^2)$.
Algorithm 1 (ReShape), lines 7-11:
7: Decode: $\rho^j = D_{\delta_A}(O_L^j)$
8: end for
9: $\tilde{L} \leftarrow$ matrix whose columns are $\rho^j$
10: $\tilde{L} \leftarrow \tilde{L} + O_L + M_L$
11: for all $R^j$ columns of $R$ do
(ii) Decode. The second step performed by ReShape (lines 6-10) aims to minimise the logical row-column weight of $(L, R)$ by looking at non-homologically equivalent operators. If the computational cost of the classical decoder $D_{\delta_A}$ is $O(c_a)$, the computational cost of the second step of ReShape is $O(k_b c_a)$. The Split and Decode steps described for the left part are replicated, with appropriate modifications, for the right part. To be exact, if one or both of $\delta_A$ and $\delta_B$ are full rank, then the right part does not encode any logical operator, so the algorithm terminates [footnote 2].
Proposition 4 below ensures that the recovery operator (L,R) found by ReShape is a correct solution of (SE), as long as the classical decoders D δ A and D δ T B succeed. It is important to note that the condition (13) on the weight of the original error is on its row-column weight, while usually decoding success is assessed depending on the weight of an operator, meaning the number of its non-identity factors. Obviously, for any operator (L, R) it holds: #row(L) ≤ |L| and #col(R) ≤ |R|.
As a consequence, Proposition 4 entails that ReShape succeeds in correcting any Z-error of weight up to half the code distance d z = min{d A , d T B }. Combining this with the cost analysis of the Split and Decode steps detailed above, gives a proof of Theorem 1.
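The Split and Decode steps on the left part can be sketched on the distance-3 planar code. Several choices below are assumptions rather than the paper's implementation: the classical oracle $D_{\delta_A}$ is replaced by a brute-force minimum weight syndrome decoder, each non-zero column of $O_L$ is replaced by the minimum weight vector with the same $\delta_A$-syndrome (our reading of lines 7-10 of Algorithm 1, where this vector equals $\rho^j + O_L^j$), the complement is taken on the non-pivot columns of the reduced row echelon form, and all function names are illustrative.

```python
from itertools import product

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[(a + b) % 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def apply(H, v):
    """Matrix-vector product H v over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def rref(rows):
    rows, basis, pivots = [r[:] for r in rows], [], []
    for c in range(len(rows[0])):
        pivot = next((r for r in rows if r[c]), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [[(x + y) % 2 for x, y in zip(r, pivot)] if r[c] else r for r in rows]
        basis = [[(x + y) % 2 for x, y in zip(b, pivot)] if b[c] else b for b in basis]
        basis.append(pivot)
        pivots.append(c)
    return basis, pivots

def split_rows(M, delta):
    """Split step: rows of M as (free part, logical part) w.r.t. Im(delta^T)."""
    basis, pivots = rref(delta)
    free, logical = [], []
    for v in M:
        m = [0] * len(v)
        for b, p in zip(basis, pivots):
            if v[p]:
                m = [(x + y) % 2 for x, y in zip(m, b)]
        free.append(m)
        logical.append([(x + y) % 2 for x, y in zip(v, m)])
    return free, logical

def min_weight_decode(H, s):
    """Brute-force stand-in for the classical oracle (exponential in n)."""
    for e in sorted(product((0, 1), repeat=len(H[0])), key=sum):
        if apply(H, e) == s:
            return list(e)
    raise ValueError("syndrome is not in the image of H")

def reshape_left(dA, dB, L):
    M_L, O_L = split_rows(L, dB)
    new_cols = []
    for col in transpose(O_L):  # Decode step: one oracle call per non-zero column
        if any(col):
            col = min_weight_decode(dA, apply(dA, col))
        new_cols.append(col)
    return add(M_L, transpose(new_cols))

dA = dB = [[1, 1, 0], [0, 1, 1]]                 # planar code, d = 3
error = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]        # weight-1 Z error on left qubits
logical = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]      # left logical Z operator
stabiliser = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]   # left part of S_z(1, 0)
valid = add(add(error, logical), stabiliser)     # valid but incorrect solution
recovery = reshape_left(dA, dB, valid)
print(recovery)              # [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
print(add(recovery, error))  # [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
```

In this example the right part of the valid solution is the right part of the same stabiliser generator and is left untouched, so the product of the recovery and the error is exactly the generator $S_z(1, 0)$: the logical operator hidden in the input solution has been removed.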
It is worth observing that ReShape can actually correct errors of weight strictly bigger than half the code distance, as long as they are not too 'spread'. In fact, whenever an error is homologically equivalent to an operator $(L, R)$ such that $L$ has 'few' non-zero rows and $R$ has 'few' non-zero columns, ReShape succeeds. Formally, because by definition: $\#\mathrm{row}(L) \geq \#\mathrm{row}_{\log}(L)$ and $\#\mathrm{col}(R) \geq \#\mathrm{col}_{\log}(R)$.
[Footnote 2] In fact, as per Eq. (9) and Eq. (10), if $\mathrm{rank}(\delta_A) = m_a$ or rank(…

Proposition 4 yields
Corollary 2. Provided that the classical decoders succeed, ReShape successfully corrects any Z-error $(L, R)$ with bounded row-column weight.
To sum up, ReShape successfully solves the decoding problem for any hypergraph product code requiring only $k$ oracle calls to a classical decoder for the seed matrices, where $k$ is the logical dimension of the code. Furthermore, it is able to correct a vast class of errors of weight strictly bigger than half the code distance, provided that they have a 'good' shape. Here by 'good' we mean errors of low logical row-column weight but arbitrary Hamming weight, as for instance the Z-operator pictured in Figure 2, which has Hamming weight 23 but logical row-column weight $(1, 0)$ and would therefore be successfully corrected by the ReShape decoder.
The next Section focuses on what happens when we cannot control the shape of the errors but we assume that the probability of a given error to occur decays exponentially in its weight.

C. ReShape for Stochastic noise
Up till now, we have focused on the adversarial noise model: errors on qubits are always correctable because we assume they have weight less than half the code distance. In real systems though, this is rarely the case and it is more faithful to assume that errors are sampled according to a local stochastic noise model, where qubit errors have arbitrary location but the probability of a given error decays exponentially in its weight [3]. More precisely, under Eq. (14) each of the $n$ qubits is affected independently with the same probability $p$, meaning that Pauli errors on each of the $n$ qubits are independent and identically distributed. Under the binomial distribution associated to Eq. (14), the expected error weight on the encoded state is $pn$. Because the best possible distance scaling for the hypergraph product codes is $\sim \sqrt{n}$ (when the classical seed codes have linear distance), as $n$ increases, we eventually find $pn > \sqrt{n}/2 \sim d/2$. Nonetheless, it is well known that LDPC hypergraph product codes do have a positive error correcting threshold [2]. A family of codes has threshold $p_{\mathrm{th}} > 0$ if, for noise rate below $p_{\mathrm{th}}$, non-correctable errors that destroy the logical information occur with probability $p_{\text{non-correctable}}$ which decays exponentially in the system size, with constants $\alpha, \beta > 0$ (Eq. (15)) [2], [8], [33]. It is important to stress that Eq. (15) does not contradict the fact that the typical error under the stochastic noise model will have weight $pn$. Instead, Eq. (15) entails that, among all the errors sampled, the non-correctable ones are only a small fraction. Beyond the theoretical threshold that Kovalev and Pryadko proved in [2], the literature offers ample numerical evidence of decoders for hypergraph product or related families of codes which exhibit a threshold. Nonetheless, these decoders either lack a correctness proof, e.g. BP in [10], [15], or need some additional constraints on the seed matrices, e.g. expander codes with the small-set flip decoder [8], or augmented surface codes with the union-find decoder in [34]. On the contrary, for any choice of the seed matrices in the hypergraph product, ReShape is provably correct for adversarial errors. Not surprisingly though, ReShape does not show a threshold and a possible intuition for its anti-threshold behaviour is the following.
If we contrast ReShape with pairs of LDPC code families and decoders which exhibit a threshold, such as expander codes with the small-set flip decoder [8] or hypergraph product codes with BP [10], [15], one distinguishing feature is the 'locality' of the decoding algorithms. Loosely, we say that a decoding algorithm is local if errors affecting distant regions on the qubit graph are dealt with separately and independently. We stress that a decoder's locality is a feature of the algorithm and it is not related to the locality of the code's stabiliser generators. A code can have local stabilisers, meaning that for a given layout of qubits in space, stabiliser generators only involve qubits in a limited area, and yet be equipped with a non-local decoding algorithm. Indeed, ReShape is such a decoder. It is a non-local decoder that can be used on the very much local planar code. Locality of the decoding algorithm is relevant because local stochastic errors tend to form small disjoint clusters on the qubit graph which do not destroy the logical information as long as they are (1) small enough and (2) sufficiently far apart. Therefore, if a decoder manages to mimic the error cluster distribution on the qubit graph and finds recovery operators accordingly, then it is likely to preserve the logical information and show a threshold. ReShape, on the other hand, has a deeply global nature. The Split step groups all the clusters of flipped qubits scattered across the qubit graph in a small pre-assigned region; a recovery is then chosen (Decode step) based on the syndrome information in this pre-assigned region. If we take the planar code as an example (see Figure 3), the Split step groups the error (and the syndrome) weight on one column of the left qubits. The subsequent Decode step decodes that column and finds a recovery operator supported on that column.
Because for the planar code a logical Z-operator can be chosen to have support on only one column of the left qubits, this procedure can easily destroy the logical information.
Our intuition on the performance of ReShape under stochastic noise is confirmed by the plots reported in Figure 4. Even if at first sight the plots in Figures 4a and 4b could indicate the presence of a very low threshold (below 1%), a closer analysis suggests that this is not the case. In fact, as $d$ increases, the crossing point between the dashed curve labelled $d = 0$ and the $d$-curves slips leftwards. Since the dashed curve is the locus of points where the failure probability $p_{\mathrm{fail}}$ equals the noise rate $p$, it corresponds to the case of no encoding, i.e. $d = 0$. The crossing point, in other words, represents the pseudo-threshold of the code [35]. Importantly, if a code family has a threshold $p_{\mathrm{th}}$ in the sense of Eq. (15), then all the codes of the family cross the curve $d = 0$ at the same point of coordinates $(p_{\mathrm{th}}, p_{\mathrm{th}})$. Figure 4c clearly illustrates this left-slipping phenomenon for the toric codes without boundaries. For close distances $d = 6, 8, 10$, there seems to be a common crossing point with the $d = 0$ curve. However, the crossing point lowers if we increase $d$ more substantially, e.g. $d = 20$. The situation appears less clear in Figure 4d because the pseudo-threshold seems to increase with the distance of the code. Still, there is no common crossing point of the three curves; besides, we would expect the same trend as the one observed for the toric codes if codes of larger distance were considered.
In conclusion, ReShape is not suited to tackle stochastic errors on an $[[n, k, d]]$ code in the regime where typical errors have weight exceeding $d/2$.
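The crossing-point argument above can be made concrete with a small numerical sketch. The helper below (a hypothetical `pseudo_threshold`, introduced here only for illustration) estimates by linear interpolation the noise rate at which a sampled failure curve $p_{\mathrm{fail}}(p)$ meets the line $p_{\mathrm{fail}} = p$. The sample data is a synthetic toy curve $p_{\mathrm{fail}} = 40p^2$ and is not the data of Figure 4.

```python
def pseudo_threshold(ps, p_fails):
    """Estimate where a sampled curve p_fail(p) crosses the line p_fail = p.

    ps must be sorted increasingly. Returns the first crossing from below
    (p_fail < p) to above (p_fail >= p), or None if no such crossing exists.
    """
    diffs = [f - p for p, f in zip(ps, p_fails)]
    for i in range(len(ps) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 < 0 <= d1:
            t = -d0 / (d1 - d0)  # linear interpolation between two samples
            return ps[i] + t * (ps[i + 1] - ps[i])
    return None


# Synthetic toy curve (assumption, not measured data): p_fail = 40 p^2,
# whose exact crossing with p_fail = p is at p = 1/40.
ps = [0.01, 0.02, 0.03, 0.04]
p_fails = [40 * p * p for p in ps]
estimate = pseudo_threshold(ps, p_fails)
```

Applied to each $d$-curve in turn, an estimate of this kind would show whether the crossing points stay put or drift leftwards as $d$ grows.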

IV. CONCLUSIONS AND OUTLOOK
In this paper we determined some important homology invariants of hypergraph product codes. Exploiting these invariants, we designed the ReShape decoder. ReShape is the first decoder to efficiently decode for all errors up to half the code distance, across the whole spectrum of hypergraph product codes.
We foresee two natural extensions of this work. The first is to adapt ReShape to work in the stochastic noise setting. Because ReShape actually succeeds in correcting errors of weight substantially bigger than $(d - 1)/2$ (namely, it corrects errors of weight as big as $\sim d^2$ when they have the right shape!), there is some hope that ReShape would work under stochastic noise if paired with the right clustering technique, for instance something along the lines of the clustering methods used in the renormalisation group or the union-find decoders [5], [18], [36], [37].
The second is to find the corresponding invariants for other families of homological product codes: specifically, for the codes in [38], which have 'rectangular'-shaped logical operators instead of the 'string'-like ones of the standard hypergraph product codes studied here, or for the balanced product codes proposed in [39]. Once found, the right invariants could be plugged into an appropriately modified version of ReShape and yield a provably correct decoder for these classes of codes too.

APPENDIX A LINEAR ALGEBRA: SPACE COMPLEMENT
In this Appendix we review some known linear algebra facts that we use in our proofs. We refer the reader for instance to [40], [41] for a detailed presentation on the topic.
Fig. 3. Graphical representation of one instance of ReShape for Z-errors. The code considered is the planar code of distance 3 (toric code with boundaries) or, equivalently, the $[[13,1,3]]$ hypergraph product code $C(\delta, \delta)$, where $\delta$ is the full-rank parity check matrix of the distance-3 repetition code, i.e. the leftmost matrix in (11). We use the same graphical representation as in Figure 1. A minimum weight logical Z-operator for $C(\delta, \delta)$ can be chosen to have support on all the qubits of a column of left qubits (Decode, bottom grid of qubits, support in red). The row span of $\delta$ consists of all vectors in $\mathbb{F}_2^3$ of even weight; hence, for a generic Z-operator on $C(\delta, \delta)$, all the rows of left qubits that have an even number of filled qubits belong to its free part and do not contribute to its logical row-column weight. Since $\delta$ is full-rank, its column span is the whole space $\mathbb{F}_2^2$, and therefore a column displaying any choice of filled qubits is in the image of $\delta$. As such, there is no contribution to the logical row-column weight from the right part of the operator. In particular, there is no need to run the ReShape decoder on the right part: Algorithm 1 will not execute lines 11-20. The figure is divided into four sectors, one for each stage of the decoding cycle: Input, Split, Decode and Output. By what was said about the row span of $\delta$, the free part $M_L$ of $L$ is a matrix whose rows all have even Hamming weight. Since $O_L$ has 2 non-zero rows, $(L, R)$ has logical row-column weight $\mathrm{wt}^{\log}_{rc}(L, R) = 2$. Decode (Algorithm 1, lines 1-9): the non-zero column $(0, 1, 1)^T$ of the logical part $O_L$ of $L$ is given in input to a decoder $D_\delta$ for the classical distance-3 repetition code. The solution found is $(1, 1, 1)^T$, represented by the single column of black bits on the top. This solution is plugged in the hypergraph product code and yields a logical operator correction, represented by the operator at the bottom with support on the red qubits. Output (Algorithm 1, line 10): the output solution $(\tilde{L}, \tilde{R})$ is obtained by adding the input operator $(L, R)$ and the operator found in the Decode step with support on the red qubits. The support of $(\tilde{L}, \tilde{R})$ is represented by the yellow qubits. We note that the solution found $(\tilde{L}, \tilde{R})$ has Hamming weight 7 and, by observing that only the first row has odd weight, we deduce that its logical row-column weight is 1. It is easy to verify that $(\tilde{L}, \tilde{R})$ is indeed homologically equivalent to the minimum weight solution $(L_{1,3}, 0)$, where $L_{1,3} \in \mathbb{F}_2^{3 \times 3}$ is the all-zero matrix except for the $(1, 3)$-th entry, which is 1. In fact, $(\tilde{L}, \tilde{R}) = (L_{1,3}, 0) + S_z(1, 1) + S_z(2, 1) + S_z(3, 1)$, cf. (3).

Consider an $m \times n$ binary matrix $\delta$. If $\mathrm{rank}(\delta) = rk$, then we can choose binary vectors $v_1, \dots, v_{rk}$ in $\mathbb{F}_2^m$ whose span is $\mathrm{Im}\,\delta$: $\mathrm{Im}\,\delta = \langle v_1, \dots, v_{rk} \rangle$. By selecting the pivot rows, we obtain a basis of $\mathbb{F}_2^m$ of the form: $\{v_1, \dots, v_{rk}, f_1, \dots, f_{m-rk}\}$, where the $f_i$ are unit vectors. Letting: $(\mathrm{Im}\,\delta)^\bullet = \langle f_1, \dots, f_{m-rk} \rangle$, we have: $\mathbb{F}_2^m = \mathrm{Im}\,\delta \oplus (\mathrm{Im}\,\delta)^\bullet$. We refer to the space $(\mathrm{Im}\,\delta)^\bullet$ as the complement of the space $\mathrm{Im}\,\delta$. We remark that the complement $V^\bullet$ is not equal to the orthogonal complement $V^\perp$. To see how this is the case, consider [...] Then the spaces $V^\bullet$ and $V^\perp$ can be chosen as [...] In particular, [...]
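The complement construction of Appendix A can be sketched in a few lines. The code below is a minimal illustration under our own conventions (generators given as rows of a binary matrix, hypothetical helper names `rref_gf2` and `complement_basis`): it row-reduces the generators over $\mathbb{F}_2$ and returns the unit vectors sitting on the non-pivot coordinates, which span a complement of the row span. The small example $V = \langle (1,1) \rangle \subseteq \mathbb{F}_2^2$ is chosen here for illustration and is not taken from the text: for it, $V^\perp = V$, while the complement found is $\langle (0,1) \rangle$.

```python
import numpy as np


def rref_gf2(M):
    """Row-reduce M over F_2; return (reduced matrix, list of pivot columns)."""
    A = np.array(M, dtype=np.uint8) % 2
    pivots, r = [], 0
    for c in range(A.shape[1]):
        rows = [i for i in range(r, A.shape[0]) if A[i, c]]
        if not rows:
            continue
        A[[r, rows[0]]] = A[[rows[0], r]]  # move a pivot row into position r
        for i in range(A.shape[0]):
            if i != r and A[i, c]:
                A[i] ^= A[r]  # clear column c in every other row
        pivots.append(c)
        r += 1
        if r == A.shape[0]:
            break
    return A, pivots


def complement_basis(generators, m):
    """Unit vectors spanning a complement of span(generators) in F_2^m."""
    _, pivots = rref_gf2(generators)
    eye = np.eye(m, dtype=np.uint8)
    return [eye[i] for i in range(m) if i not in pivots]


# Illustrative example (assumption): V = <(1,1)> in F_2^2.
comp_V = complement_basis([[1, 1]], 2)
# Row span of the repetition-code check matrix: even-weight vectors of F_2^3.
comp_rep = complement_basis([[1, 1, 0], [0, 1, 1]], 3)
```

Different orderings of the coordinates give different, equally valid, complements; the choice of pivots above is only one of them.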

APPENDIX B HYPERGRAPH PRODUCT CODES
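CSS codes can be easily described in terms of homology theory [31], [42], [43] via the identification of the objects of the code with a chain complex [44]. For our purposes, a length-$\ell$ chain complex is an object described by a sequence of $\ell + 1$ vector spaces $\{C_i\}_i$ over $\mathbb{F}_2$ and binary matrices $\{\partial_i : C_i \to C_{i+1}\}_i$ such that $\partial_{i+1}\partial_i = 0$. In the following, we use the symbol $\partial$ to indicate the maps of a chain complex of length $\ell > 1$ and the symbol $\delta$ to indicate the map of a chain complex of length 1. Given a chain complex $C$: $C_{-1} \xrightarrow{\partial_{-1}} C_0 \xrightarrow{\partial_0} C_1$, we can define a CSS code $\mathcal{C}$ by equating: $H_Z^T = \partial_{-1}$, $H_X = \partial_0$. Since $\partial_0 \partial_{-1} = 0$ by construction, X-type and Z-type operators do commute, i.e. $H_X \cdot H_Z^T = 0$, and the code $\mathcal{C}$ associated to the chain complex $(C)$ is well defined. The code $\mathcal{C}$ has length $n = \dim(C_0)$ and its dimension $k$ equals the dimension of the 0th homology group $H_0 = \ker \partial_0 / \mathrm{Im}\,\partial_{-1}$ or, equivalently, the dimension of the 0th co-homology group $H_0^* = \ker \partial_{-1}^T / \mathrm{Im}\,\partial_0^T$. Its Z-distance and X-distance are given by the minimum Hamming weight of any representative of a non-zero element in $H_0$ and $H_0^*$ respectively: $d_z = \min\{|v| : v \in \ker \partial_0 \setminus \mathrm{Im}\,\partial_{-1}\}$, $d_x = \min\{|v| : v \in \ker \partial_{-1}^T \setminus \mathrm{Im}\,\partial_0^T\}$.

A hypergraph product code $C(\delta_A, \delta_B)$, which is a CSS code, can be easily defined in terms of a product of chain complexes. Consider the two length-1 chain complexes defined by the seed matrices $\delta_A$ and $\delta_B$: $C_A^0 \xrightarrow{\delta_A} C_A^1$ and $C_B^0 \xrightarrow{\delta_B} C_B^1$. We define their homological product as follows. Take the tensor product spaces $C_{-1} = C_A^0 \otimes C_B^1$, $C_0 = (C_A^0 \otimes C_B^0) \oplus (C_A^1 \otimes C_B^1)$, $C_1 = C_A^1 \otimes C_B^0$, together with the maps $\partial_{-1} = \begin{pmatrix} I \otimes \delta_B^T \\ \delta_A \otimes I \end{pmatrix}$ and $\partial_0 = \begin{pmatrix} \delta_A \otimes I & I \otimes \delta_B^T \end{pmatrix}$. The chain complex $C_{A,B}$: $C_{-1} \xrightarrow{\partial_{-1}} C_0 \xrightarrow{\partial_0} C_1$ is well defined. In fact: $\partial_0 \partial_{-1} = \delta_A \otimes \delta_B^T + \delta_A \otimes \delta_B^T = 0$. Therefore, the complex $(C_{A,B})$ defines a valid CSS code, which we denote by $C(\delta_A, \delta_B)$ and refer to as the hypergraph product code of the seed matrices $\delta_A$ and $\delta_B$. If the classical codes with parity check matrices $\delta_\ell$, $\delta_\ell^T$ have parameters $[n_\ell, k_\ell, d_\ell]$ and $[n_\ell^T, k_\ell^T, d_\ell^T]$ respectively ($\ell = A, B$), then the hypergraph product code $C(\delta_A, \delta_B)$ has parameters: $n = n_a n_b + n_a^T n_b^T$, $k = k_a k_b + k_a^T k_b^T$, $d_x = \min\{d_a^T, d_b\}$, $d_z = \min\{d_a, d_b^T\}$ [43].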
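As a sanity check of the construction above, the following sketch builds the two boundary maps of the product complex with Kronecker products and verifies the planar-code parameters quoted in Figure 3. It assumes the row-major identification of tensors with matrices of Appendix B-A and the convention $\partial_0(L, R) = \delta_A L + R \delta_B$; the function names `rank_gf2` and `hypergraph_product` are ours.

```python
import numpy as np


def rank_gf2(M):
    """Rank of a binary matrix over F_2, by Gaussian elimination."""
    A = np.array(M, dtype=np.uint8) % 2
    r = 0
    for c in range(A.shape[1]):
        rows = np.nonzero(A[r:, c])[0]
        if rows.size == 0:
            continue
        p = r + rows[0]
        A[[r, p]] = A[[p, r]]
        for i in range(A.shape[0]):
            if i != r and A[i, c]:
                A[i] ^= A[r]
        r += 1
        if r == A.shape[0]:
            break
    return r


def hypergraph_product(dA, dB):
    """Return (d_m1, d_0): the maps C_{-1} -> C_0 -> C_1 of the product complex."""
    dA = np.array(dA, dtype=np.uint8) % 2
    dB = np.array(dB, dtype=np.uint8) % 2
    mA, nA = dA.shape
    mB, nB = dB.shape

    def eye(n):
        return np.eye(n, dtype=np.uint8)

    # d_m1 sends U to (U dB, dA U); d_0 sends (L, R) to dA L + R dB.
    d_m1 = np.vstack([np.kron(eye(nA), dB.T), np.kron(dA, eye(mB))])
    d_0 = np.hstack([np.kron(dA, eye(nB)), np.kron(eye(mA), dB.T)])
    return d_m1, d_0


# Seed: full-rank parity check matrix of the distance-3 repetition code.
delta = [[1, 1, 0], [0, 1, 1]]
d_m1, d_0 = hypergraph_product(delta, delta)
H_Z, H_X = d_m1.T, d_0
n = H_X.shape[1]
k = n - rank_gf2(H_X) - rank_gf2(H_Z)
```

For this seed the sketch gives $n = 13$ and $k = 1$, in agreement with the $[[13,1,3]]$ planar code; the distance is not computed here.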

A. Reshaping of vectors
One tool we make extensive use of, and from which our decoder takes its name, is the reshaping of vectors of a twofold tensor product space into matrices (see, for instance, [43], [45]). Consider a basis $B$ of the vector space $\mathbb{F}_2^{n_1} \otimes \mathbb{F}_2^{n_2}$: $B = \{a_i \otimes b_j \mid i = 1, \dots, n_1 \text{ and } j = 1, \dots, n_2\}$.
Then any $v \in \mathbb{F}_2^{n_1} \otimes \mathbb{F}_2^{n_2}$ can be written as: $v = \sum_{i,j} v_{ij}\, a_i \otimes b_j$ for some $v_{ij} \in \mathbb{F}_2$. We call the $n_1 \times n_2$ matrix $V$ with entries $v_{ij}$ the reshaping of the vector $v$. By this identification, if $\varphi$, $\theta$ are respectively $m_1 \times n_1$ and $m_2 \times n_2$ matrices, then $(\varphi \otimes \theta)(V) = \varphi V \theta^T$. The inner product between $u \otimes w$ and $v$ in $\mathbb{F}_2^{n_1} \otimes \mathbb{F}_2^{n_2}$ can be computed as $\langle u \otimes w, v \rangle = u^T V w$ (18). As we detail here, the identification of operators on the code space $C(\delta_A, \delta_B)$ with pairs of binary matrices that we used in the main text is rigorously justified by the reshaping of vectors into matrices. With slight abuse of notation, we refer to binary vectors and binary matrices as operators and vice versa, where the identification is clear via Eq. (8).
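The two reshaping identities above are easy to check numerically. The sketch below assumes the row-major convention (the coefficient of $a_i \otimes b_j$ is stored at position $i \cdot n_2 + j$ of the vector), which is what NumPy's default `reshape` and `kron` use; the random test matrices are arbitrary.

```python
import numpy as np


def reshape_vec(v, n1, n2):
    """Reshape a vector of F_2^{n1} (x) F_2^{n2} into an n1 x n2 matrix (row-major)."""
    return np.asarray(v).reshape(n1, n2)


rng = np.random.default_rng(0)
phi = rng.integers(0, 2, size=(2, 3))    # m1 x n1
theta = rng.integers(0, 2, size=(4, 5))  # m2 x n2
V = rng.integers(0, 2, size=(3, 5))      # n1 x n2
v = V.reshape(-1)                        # vector form of V

# (phi (x) theta) v, reshaped, versus phi V theta^T.
lhs = reshape_vec((np.kron(phi, theta) @ v) % 2, 2, 4)
rhs = (phi @ V @ theta.T) % 2

# Inner product <u (x) w, v> versus u^T V w.
u = rng.integers(0, 2, size=3)
w = rng.integers(0, 2, size=5)
ip_vec = int(np.kron(u, w) @ v) % 2
ip_mat = int(u @ V @ w) % 2
```

With a column-major convention the same identities hold with the roles of the two tensor factors exchanged, so the convention must be fixed once and used consistently.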

B. Graphical representation
Physical qubits of the code $C(\delta_A, \delta_B)$ are in one-to-one correspondence with basis elements of the space $C_0$. If we fix bases of the spaces $C_A^0$, $C_B^0$, $C_A^1$, $C_B^1$, of dimension $n_a$, $n_b$, $m_a$, $m_b$ respectively, then the union of the two sets $B_L$, formed by the tensor products of basis elements of $C_A^0$ and $C_B^0$, and $B_R$, formed by the tensor products of basis elements of $C_A^1$ and $C_B^1$, is a basis of $C_0$. We refer to qubits associated to elements in $B_L$, or its span, as left qubits and to those associated to $B_R$, or its span, as right qubits. Since qubit operators are vectors in $C_0$, by reshaping, they can be identified with pairs of matrices $(L, R)$ where $L$ has size $n_a \times n_b$ and $R$ has size $m_a \times m_b$; in particular, $L$ acts on the left qubits while $R$ acts on the right qubits.
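A Z-stabiliser for the code associated to the complex $(C_{A,B})$ is any vector in $\mathrm{Im}\,\partial_{-1}$. A generating set for Z-stabilisers is: $\{\partial_{-1}(e_{j_a} \otimes e_{i_b}) \mid j_a = 1, \dots, n_a \text{ and } i_b = 1, \dots, m_b\}$, where $e_{j_a}$ and $e_{i_b}$ are unit vectors of $C_A^0$ and $C_B^1$ respectively, i.e. they are a basis of the two spaces. Let $E_{j_a i_b} \in C_A^0 \otimes C_B^1$ be the reshaping of $(e_{j_a} \otimes e_{i_b})$, i.e. the matrix with all-zero entries except for the $(j_a, i_b)$-th entry, which is 1. The reshaping of $\partial_{-1}(e_{j_a} \otimes e_{i_b})$ is then given by the pair of matrices: $S_z(j_a, i_b) = (E_{j_a i_b}\delta_B, \delta_A E_{j_a i_b})$. Logical Z-operators are vectors in $\ker \partial_0$ which are not in $\mathrm{Im}\,\partial_{-1}$. Specifically, a minimal generating set of logical Z-operators is given by [30]: $\bar{\mathcal{L}}_z = \{(k \otimes f, 0) \mid k \in \ker \delta_A,\ f \in (\mathrm{Im}\,\delta_B^T)^\bullet\} \cup \{(0, f \otimes k) \mid f \in (\mathrm{Im}\,\delta_A)^\bullet,\ k \in \ker \delta_B^T\}$ (19), where $k$ and $f$ vary among basis elements of the respective spaces and the $f$ are unit vectors. The reshaping of vectors in $\bar{\mathcal{L}}_z$ gives the set $\mathcal{L}_z$ of Eq. (9) in the main text. The vector version of logical X-operators is likewise obtained from the set of matrices $\mathcal{L}_x$ of Eq. (10).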
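The matrix form of the stabiliser generators can be checked against the worked example in the caption of Figure 3. The sketch below, with helper names of our own choosing, assumes the convention $\partial_0(L, R) = \delta_A L + R\delta_B$ for the X-syndrome of a Z-operator and adds the three generators $S_z(1,1)$, $S_z(2,1)$, $S_z(3,1)$ to $(L_{1,3}, 0)$ for the distance-3 planar code.

```python
import numpy as np

# Seed matrix of the distance-3 planar code C(delta, delta).
delta = np.array([[1, 1, 0], [0, 1, 1]])


def stabiliser(ja, ib, dA, dB):
    """Reshaped Z-stabiliser generator S_z(ja, ib) = (E dB, dA E), 1-based indices."""
    E = np.zeros((dA.shape[1], dB.shape[0]), dtype=int)
    E[ja - 1, ib - 1] = 1
    return (E @ dB) % 2, (dA @ E) % 2


def syndrome(L, R, dA, dB):
    """X-syndrome of the Z-operator (L, R), as an m_a x n_b matrix."""
    return (dA @ L + R @ dB) % 2


# Minimum weight solution (L_{1,3}, 0) from the caption of Figure 3.
L13 = np.zeros((3, 3), dtype=int)
L13[0, 2] = 1
L, R = L13.copy(), np.zeros((2, 2), dtype=int)
for ja in (1, 2, 3):
    GL, GR = stabiliser(ja, 1, delta, delta)
    L, R = (L + GL) % 2, (R + GR) % 2
```

The resulting operator has Hamming weight 7, is supported on left qubits only, and has the same syndrome as $(L_{1,3}, 0)$, as stated in the caption.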

APPENDIX C PROOFS
This Section contains all the proofs of the statements made in the main text.
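Broadly speaking, in this work we wanted to characterize Z-error operators on the codespace of $C(\delta_A, \delta_B)$ associated to the chain complex $(C_{A,B})$. In order to do so, we first studied the logical Z-operators of $C(\delta_A, \delta_B)$ and introduced a canonical form for them. From homology theory, we know that non-trivial logical Z-operators are associated to vectors in $\ker \partial_0$ which do not belong to $\mathrm{Im}\,\partial_{-1}$. Lemma 1 below describes all the vectors in $\ker \partial_0$.

Lemma 1. Let $(L, R) \in C_0$ be in $\ker \partial_0$, then: $L = K_A + U\delta_B$ and $R = \bar{K}_B + \delta_A U$, for some $U \in C_A^0 \otimes C_B^1$, where the columns of $K_A$ belong to $\ker \delta_A$ and the rows of $\bar{K}_B$ belong to $\ker \delta_B^T$.

Proof. Let $(L, R) \in \ker \partial_0$. Then: $\delta_A L + R\delta_B = 0$ (20). Eq. (20) yields: $\delta_A L = R\delta_B = V$ (21), for some $V \in C_A^1 \otimes C_B^0$. Eq. (21) entails that all columns of $V$ belong to $\mathrm{Im}\,\delta_A$ while its rows belong to $\mathrm{Im}\,\delta_B^T$. As a consequence, there must exist $U \in C_A^0 \otimes C_B^1$ such that: $V = \delta_A U \delta_B$. Therefore Eq. (21) can be re-written as: $\delta_A L = \delta_A U \delta_B$, which yields: $\delta_A (L + U\delta_B) = 0$ (22). Equivalently, Eq. (22) states that $L + U\delta_B$ has columns in $\ker \delta_A$: $L + U\delta_B = K_A$, and therefore: $L = K_A + U\delta_B$, as in the thesis. Similarly, we find $R = \bar{K}_B + \delta_A U$.

A proof of Proposition 1, reported below for clarity, follows directly by combining what was said in Appendix B-A and Lemma 1.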
Proposition 1 (Canonical form). Let $(L, R)$ be a Z-operator on the codespace of $C(\delta_A, \delta_B)$. For a vector space $V \subseteq \mathbb{F}_2^n$, we denote by $V^\bullet$ any space such that $V \oplus V^\bullet \cong \mathbb{F}_2^n$ (see Appendix A). Then, for the operator $(L, R)$, the left part $L$ can be expressed as a sum of a free part $M_L$ and a logical part $O_L$ such that every row of $M_L$ belongs to $\mathrm{Im}\,\delta_B^T$ and every row of $O_L$ belongs to $(\mathrm{Im}\,\delta_B^T)^\bullet$. Similarly, the right part $R$ can be expressed as a sum of a free part $M_R$ and a logical part $O_R$ such that every column of $M_R$ belongs to $\mathrm{Im}\,\delta_A$ and every column of $O_R$ belongs to $(\mathrm{Im}\,\delta_A)^\bullet$. Hence, for $(L, R)$ the following holds: $(L, R) = (M_L + O_L, M_R + O_R)$ (CF). We refer to the expression given by Eq. (CF) as the canonical form of the operator $(L, R)$.
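The left half of the canonical form can be computed row by row. The sketch below is one possible realisation under our own conventions (helper names `rref_gf2` and `split_rows` are ours): it takes as complement of $\mathrm{Im}\,\delta_B^T$ the span of the unit vectors on the non-pivot coordinates of the row-reduced $\delta_B$, and reduces each row of $L$ against the row-reduced generators. The example operator is the output $\tilde{L}$ described in the caption of Figure 3.

```python
import numpy as np


def rref_gf2(M):
    """Row-reduce M over F_2; return (reduced matrix, list of pivot columns)."""
    A = np.array(M, dtype=np.uint8) % 2
    pivots, r = [], 0
    for c in range(A.shape[1]):
        rows = [i for i in range(r, A.shape[0]) if A[i, c]]
        if not rows:
            continue
        A[[r, rows[0]]] = A[[rows[0], r]]
        for i in range(A.shape[0]):
            if i != r and A[i, c]:
                A[i] ^= A[r]
        pivots.append(c)
        r += 1
        if r == A.shape[0]:
            break
    return A, pivots


def split_rows(L, dB):
    """Split L = M_L + O_L with rows of M_L in the row span of dB and rows of
    O_L supported on the non-pivot coordinates (a unit-vector complement)."""
    reduced, pivots = rref_gf2(dB)
    O = np.array(L, dtype=np.uint8) % 2
    for row in O:  # each row is a view, so it is updated in place
        for i, p in enumerate(pivots):
            if row[p]:
                row ^= reduced[i]
    M = (np.array(L, dtype=np.uint8) + O) % 2
    return M, O


delta = [[1, 1, 0], [0, 1, 1]]
L_out = [[1, 1, 1], [1, 1, 0], [1, 1, 0]]  # left part of the output in Figure 3
M_L, O_L = split_rows(L_out, delta)
n_logical_rows = int(O_L.any(axis=1).sum())
```

The right part can be treated in the same way by applying `split_rows` to $R^T$ and $\delta_A^T$, since the columns of $R$ are split with respect to $\mathrm{Im}\,\delta_A$.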
In the main text, we have introduced the notions of row-column weight and logical row-column weight for a Z-operator on $C(\delta_A, \delta_B)$. The definition of these two quantities finds its explanation in Proposition 2, whose proof builds on the results of Lemma 1.

Proof. If $(L, R)$ is a non-trivial logical Z-operator, it must anti-commute with at least one logical X-operator $(L_x, R_x)$. Because a Z-operator and an X-operator anti-commute if and only if their supports overlap on an odd number of positions, either $L$ and $L_x$ or $R$ and $R_x$ have odd overlap. Without loss of generality, we can assume that the former holds and we can choose $(L_x, R_x)$ as a left operator of the form $(f \otimes k, 0)$, where $f$ is a unit vector in $(\mathrm{Im}\,\delta_A^T)^\bullet$ and $k \in \ker \delta_B$. In other words, we choose the logical X-operator $(f \otimes k, 0)$ from the set of generators of logical X-operators $\bar{\mathcal{L}}_x^{\mathrm{left}}$, as in the X-version of Eq. (19). The inner product equation for reshaped vectors, Eq. (18), then yields: $f^T L k = 1$ (23). In particular, $(L, R)$ belongs to $\ker \partial_0$ and, thanks to Lemma 1, we can re-write it as: $(L, R) = (K_A + U\delta_B, \bar{K}_B + \delta_A U)$, where the columns of $K_A$ belong to $\ker \delta_A$ and the rows of $\bar{K}_B$ belong to $\ker \delta_B^T$. Using Lemma 1's decomposition for $(L, R) \in \ker \partial_0$, we can expand the matrix-vector product $Lk$ as: $Lk = K_A k + U\delta_B k$ (24). Since $k \in \ker \delta_B$, Eq. (24) entails $Lk = K_A k$ and therefore that $Lk$, being a linear combination of column vectors in $\ker \delta_A$, belongs to $\ker \delta_A$ itself. Furthermore, by Eq. (23), $Lk \neq 0$. To sum up, $Lk$ is a non-zero vector in $\ker \delta_A$ and therefore it must have Hamming weight at least $d_a$. As a consequence, $L$ is a matrix with at least $d_a$ non-zero rows: $\#\mathrm{row}(L) \geq d_a$. Similarly, we would have found $\#\mathrm{col}(R) \geq d_b^T$ if we had assumed that $(L, R)$ anti-commuted with a logical X-operator $(0, R_x)$ in $\bar{\mathcal{L}}_x^{\mathrm{right}}$. Corollary 1 follows easily.

Corollary 1. If $(L, R)$ is a non-trivial logical Z-operator on $C(\delta_A, \delta_B)$, at least one of the following holds: (i) $L$ has at least $d_a$ rows which are not in $\mathrm{Im}\,\delta_B^T$ when seen as vectors in $C_B^0$.
(ii) $R$ has at least $d_b^T$ columns which are not in $\mathrm{Im}\,\delta_A$ when seen as vectors of $C_A^1$.

Proof. Write $(L, R)$ in its canonical form: $(L, R) = (M_L + O_L, M_R + O_R)$, and let $M_L = N_L\delta_B$, $M_R = \delta_A N_R$ for some binary matrices $N_L$, $N_R$ of size $n_a \times m_b$. As done in the proof of Proposition 2, consider a logical X-operator $(f \otimes k, 0)$ that anti-commutes with $(L, R)$. Combining the canonical form of $L$ and Eq. (24) yields: $O_L k = (L + N_L\delta_B)k = Lk = K_A k$, by Eq. (24), for some $n_a \times n_b$ matrix $K_A$ with columns in $\ker \delta_A$. By the same argument used in the proof of Proposition 2, we find: $|O_L k| \geq d_a$, and in particular that $O_L$ has at least $d_a$ non-zero rows. Since by definition of canonical form the non-zero rows of $O_L$ are precisely those rows of $L$ which do not belong to $\mathrm{Im}\,\delta_B^T$, we have proven point (i). Point (ii) follows similarly in the case $(L, R)$ anti-commutes with at least one logical X-operator of the form $(0, R_x)$.
Corollary 1, together with Proposition 3 below, justifies the definition of the logical row-column weight for Z-operators on $C(\delta_A, \delta_B)$ (Definition 2). The logical row-column weight of $(L, R)$ is denoted by the symbol $\mathrm{wt}^{\log}_{rc}(L, R)$ and stands for the integer pair $(\#\mathrm{row}_{\log}(L), \#\mathrm{col}_{\log}(R))$, where $\#\mathrm{row}_{\log}(L)$ is the number of rows of $L$ that are not in $\mathrm{Im}\,\delta_B^T$ and $\#\mathrm{col}_{\log}(R)$ is the number of columns of $R$ which are not in $\mathrm{Im}\,\delta_A$. Proposition 3, which we now prove, states that the logical row-column weight of a Z-operator on $C(\delta_A, \delta_B)$ is a homology invariant of the chain complex $(C_{A,B})$, which justifies the name chosen for this quantity.
Proof. A Z-stabiliser $(G_L, G_R)$ is a vector in $\mathrm{Im}\,\partial_{-1}$ and can therefore be written as $(G_L, G_R) = (U\delta_B, \delta_A U)$ (25) for some $n_a \times m_b$ binary matrix $U$. Eq. (25) entails that any row of $G_L$ belongs to $\mathrm{Im}\,\delta_B^T$ and any column of $G_R$ belongs to $\mathrm{Im}\,\delta_A$. Therefore, if we write $(L, R)$ in its canonical form: $(L, R) = (M_L + O_L, M_R + O_R)$, we see that we can 'delete' all the rows of $M_L$ by adding a stabiliser and hence 'move' part of the support of the operator $(L, R)$ from the left qubits to the right qubits. Specifically, if $M_L = N_L\delta_B$ for some $n_a \times m_b$ binary matrix $N_L$, we consider the stabiliser $G = (N_L\delta_B, \delta_A N_L)$ and we obtain: $(L, R) + G = (O_L, M_R + O_R + \delta_A N_L)$. Similarly, we could move the $M_R$ part of the operator $(L, R)$ from the right qubits to the left qubits, by adding the stabiliser $G = (N_R\delta_B, \delta_A N_R)$, for an $n_a \times m_b$ matrix $N_R$ such that $M_R = \delta_A N_R$. On the other hand, it is not possible to delete non-zero rows of $O_L$ via stabiliser addition. In other words, it is not possible to remove, via stabiliser addition, any of the rows of $L$ that are not in $\mathrm{Im}\,\delta_B^T$. Hence, the number $\#\mathrm{row}_{\log}(L)$ of non-zero rows of $O_L$ is a homology invariant. Likewise, we find that it is not possible to delete any column of $O_R$ by adding stabilisers and therefore $\#\mathrm{col}_{\log}(R)$ is a homology invariant too.
The proof of Proposition 3 actually entails a stronger result than the invariance of the logical row-column weight of Z-operators on $C(\delta_A, \delta_B)$. Namely, we have proven that the indices of the rows and the columns in the sets $\mathrm{row}_{\log}$ and $\mathrm{col}_{\log}$ respectively are homology invariants of the reshaped Z-operators $(L, R)$ on $C(\delta_A, \delta_B)$. However, because to prove the correctness of ReShape it is sufficient to look at the cardinality of the two sets $\mathrm{row}_{\log}$ and $\mathrm{col}_{\log}$, we decided to state Proposition 3 in this more compact and elegant form.
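The invariance just discussed can be observed numerically. The following sketch, written under our own conventions and with hypothetical helper names, tests membership of a row in $\mathrm{Im}\,\delta_B^T$ by a rank comparison over $\mathbb{F}_2$ and checks that the set of logical row indices of a random operator is unchanged by the addition of a random stabiliser $(U\delta_B, \delta_A U)$. The planar-code seed and the random seed are arbitrary choices.

```python
import numpy as np


def rank_gf2(M):
    """Rank of a binary matrix over F_2, by Gaussian elimination."""
    A = np.array(M, dtype=np.uint8) % 2
    r = 0
    for c in range(A.shape[1]):
        rows = np.nonzero(A[r:, c])[0]
        if rows.size == 0:
            continue
        p = r + rows[0]
        A[[r, p]] = A[[p, r]]
        for i in range(A.shape[0]):
            if i != r and A[i, c]:
                A[i] ^= A[r]
        r += 1
        if r == A.shape[0]:
            break
    return r


def logical_rows(L, dB):
    """Indices of the rows of L that are not in the row span of dB."""
    base = rank_gf2(dB)
    L = np.array(L) % 2
    return [i for i, row in enumerate(L) if rank_gf2(np.vstack([dB, row])) > base]


def logical_cols(R, dA):
    """Indices of the columns of R that are not in the column span of dA."""
    return logical_rows(np.array(R).T, np.array(dA).T)


delta = np.array([[1, 1, 0], [0, 1, 1]])
rng = np.random.default_rng(1)
L = rng.integers(0, 2, size=(3, 3))
R = rng.integers(0, 2, size=(2, 2))
U = rng.integers(0, 2, size=(3, 2))       # labels a stabiliser (U dB, dA U)
L2, R2 = (L + U @ delta) % 2, (R + delta @ U) % 2

rows_before, rows_after = logical_rows(L, delta), logical_rows(L2, delta)
cols_before, cols_after = logical_cols(R, delta), logical_cols(R2, delta)
```

For this full-rank seed every column of $R$ lies in $\mathrm{Im}\,\delta$, so the column sets are empty, in line with the remark in the caption of Figure 3 that the right part does not contribute.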
We can now prove Proposition 4.

Proposition 4. Suppose that the minimum weight operator $(L_{\min}, R_{\min})$ with syndrome $S$ has $(d_a/2, d_b^T/2)$-bounded logical row-column weight, i.e. $\mathrm{wt}^{\log}_{rc}(L_{\min}, R_{\min}) = (\#\mathrm{row}_{\log}(L_{\min}), \#\mathrm{col}_{\log}(R_{\min}))$ is such that $\#\mathrm{row}_{\log}(L_{\min}) < d_a/2$ and $\#\mathrm{col}_{\log}(R_{\min}) < d_b^T/2$. Then, on input $D_{\delta_A}$, $D_{\delta_B^T}$, $S$ and $(L, R)$, ReShape outputs a correct solution $(\tilde{L}, \tilde{R})$ of (SE), provided that the classical decoders $D_{\delta_A}$, $D_{\delta_B^T}$ succeed. In other words, the solution $(\tilde{L}, \tilde{R})$ found by ReShape is in the same homology class as the minimum weight operator with syndrome $S$: $(\tilde{L}, \tilde{R}) + (L_{\min}, R_{\min}) \in \mathrm{Im}\,\partial_{-1}$.

Proof. This is a proof by contradiction: we suppose that the minimum weight solution and the solution found by ReShape (Algorithm 1) are not homologically equivalent and we find, as a consequence, that the minimum weight solution needs to have high logical row-column weight.
Let $(L, R)$ be the valid solution of (SE) given in input to ReShape and $(\tilde{L}, \tilde{R})$ be the recovery operator found.
First note that $\sigma(\tilde{L}, \tilde{R}) = \sigma(L, R)$. In fact, the Split step only finds the canonical form of $(L, R)$ and therefore changes neither the operator $(L, R)$ nor its syndrome. The Decode step possibly adds to $(L, R)$ logical Z-operators $(L_z, R_z)$ such that $\sigma(L_z, R_z) = 0$ and therefore, even when it changes the operator, it preserves its syndrome.
Suppose now that the solution found by ReShape and the minimum weight solution $(L_{\min}, R_{\min})$ of (SE) belong to two different homology classes: $(\tilde{L}, \tilde{R}) + (L_{\min}, R_{\min}) \notin \mathrm{Im}\,\partial_{-1}$. Since both $(L_{\min}, R_{\min})$ and $(\tilde{L}, \tilde{R})$ are valid solutions of (SE), they must differ by an operator with zero X-syndrome. Because $(L_{\min}, R_{\min})$ and $(\tilde{L}, \tilde{R})$ are not homologically equivalent, they must differ by a non-trivial Z-operator in the normaliser $N(S)$ of the stabiliser group. As such, they must differ by an operator which is the sum of a Z-stabiliser and a non-trivial logical operator: $(L_{\min}, R_{\min}) = (\tilde{L}, \tilde{R}) + (G_L, G_R) + (L_z, R_z)$, where $(G_L, G_R)$ is a Z-stabiliser and $(L_z, R_z)$ is a non-trivial logical Z-operator.
Without loss of generality we assume that $(L_z, R_z)$ is non-trivial on the left qubits, meaning that $L_z$ has at least one non-zero column in $\ker \delta_A$. The proof is substantially the same in case it is non-trivial on the right qubits.
First, write the left operators $L_{\min}$ and $\tilde{L}$ in their canonical form with respect to the same unit-vector basis used to write the logical operators in $\mathcal{L}_z$ (see Eq. (16)): $L_{\min} = M_{L_{\min}} + O_{L_{\min}}$ and $\tilde{L} = M_{\tilde{L}} + O_{\tilde{L}}$, so that $M_{L_{\min}} + O_{L_{\min}} = M_{\tilde{L}} + O_{\tilde{L}} + G_L + L_z$ (26). Note that, by construction, the left operator $L_z + G_L$ is already in its canonical form, where $L_z$ is its logical part and $G_L$ is its free part. By Eq. (17), the sum is direct and therefore the equality given by Eq. (26) must hold component-wise for the free part and the logical part: