Near-Term n to k Distillation Protocols Using Graph Codes

Noisy hardware forms one of the main hurdles to the realization of a near-term quantum internet. Distillation protocols allow one to overcome this noise at the cost of an increased overhead. We consider here an experimentally relevant class of distillation protocols, which distill $n$ to $k$ end-to-end entangled pairs using bilocal Clifford operations, a single round of communication and a possible final local operation depending on the observed measurement outcomes. In the case of permutationally invariant depolarizing noise on the input states, we find a correspondence between these distillation protocols and graph codes. We leverage this correspondence to find provably optimal distillation protocols in this class for several tasks important for the quantum internet. This correspondence allows us to investigate use cases for so-called non-trivial measurement syndromes. Furthermore, we detail a recipe to construct the circuit used for the distillation protocol given a graph code. We use this to find circuits with short depth and a small number of two-qubit gates. Additionally, we develop a black-box circuit optimization algorithm, and find that both approaches yield comparable circuits. Finally, we investigate the teleportation of encoded states and find protocols which jointly improve the rate and fidelities with respect to prior art.


I. INTRODUCTION
Entanglement is a key feature of quantum mechanics, and is the fundamental resource to be distributed in the quantum internet. Unfortunately, experimental setups are imperfect, leaving entanglement noisy in practice. Entanglement distillation is any procedure using local operations and classical communication that (usually probabilistically) converts $n$ input states to a (usually) smaller number $k$ of states with increased fidelity [1], [2], [3], [4]. Distillation thus allows for overcoming the effects of inherent noise in any physical implementation of a quantum network.
Finding good distillation protocols that are also feasible experimentally is thus important for the workings of future quantum networks [5], [6]. This motivates us to study distillation protocols that 1) distill from $n$ to $k$ pairs for $n$ relatively small, i.e. $n \lesssim 10$, 2) require only a single round of communication, and 3) use only operations that are relatively simple to implement. For the latter, we allow both parties to apply operations of the form $C^T \otimes C^\dagger$, where $C$ is a Clifford circuit, i.e. constructed from $H$, $S$ and CNOT gates. Such Clifford circuits are relevant since they form a key component for quantum applications.

Fig. 1: Correspondence between bilocal Clifford distillation protocols and stabilizer codes. On the left we show the general form bilocal Clifford distillation protocols can take. That is, Alice and Bob apply $C^T$ and $C^\dagger$ for some Clifford circuit $C$, and then measure out the last $n - k$ pairs. They then use classical communication to send the measurement outcomes to one another, and use those to decide on whether to keep the states and/or apply a final correction. On the right we show a stabilizer code, which takes in a state $|\psi\rangle$, and transforms it to a logical state $|\psi_L\rangle$ by applying a Clifford circuit $C$ to $|\psi\rangle$ and $n - k$ auxiliary qubits. There is a one-to-one correspondence between bilocal Clifford distillation protocols and stabilizer codes, given by using a fixed Clifford circuit $C$ in both cases.
Our goal is to find good near-term bilocal Clifford distillation protocols. To this end, we use two methods. The first is an approach based on graph theory, which finds provably optimal (with respect to any measure) bilocal Clifford protocols in the case of uniformly depolarized states and noiseless operations. The second is an approach based on black-box optimization with genetic algorithms [5]. The latter framework is flexible, allowing for a heuristic optimization even when considering arbitrary Pauli noise, noisy circuits and limitations on the number of qubits that can be simultaneously processed.
The graph-theoretical framework reduces the optimization over bilocal Clifford protocols to a smaller set of certain equivalence classes on graphs of $n + k$ vertices. The number of equivalence classes is significantly smaller than the number of possible Clifford circuits, allowing us to optimize by performing a full enumeration.
We compare circuits found using the graph-theoretical approach with those found using the black-box algorithm. We find that both approaches yield similar results, where each approach works best in different parameter regimes. Finally, we consider the procedure of teleporting and correcting encoded states. This requires two parties to share a bipartite state of local dimension $2^k$. These states can be generated in multiple ways. Here, we consider three options: creating $k$ bipartite states directly, creating $k$ distilled bipartite states out of $2k$ states through use of the DEJMPS protocol [3], or distilling $n$ pairs to $k$ pairs in a single step. We find that the latter option can provide higher fidelities and success probabilities, while also using fewer resources than distilling $k$ pairs independently.
The rest of this work is structured as follows. We start by laying down the preliminaries and the notation used in Section II. In Section III we detail explicitly the correspondence between stabilizer codes and bilocal Clifford protocols. We specialize this correspondence to the case of distilling an $n$-fold tensor power of a Werner state in Section IV. This allows us to study bilocal Clifford distillation protocols through the study of graph codes. In particular, we show it is possible to find all bilocal Clifford distillation protocols on an $n$-fold tensor power of a Werner state for several values of $n$ and $k$ by searching over all graph codes. In Section V, we detail a way to convert a bilocal Clifford distillation protocol via a corresponding graph code into a circuit. We then discuss certain heuristics that can be used to improve circuits (such as reducing the depth) given a graph code. Given a circuit of a distillation protocol, we discuss briefly how to calculate the quantities of interest in Section VI. These quantities are the probability and the coefficients of the output state as a function of the observed measurements. Using the above tools, we analyse the performance of our found protocols for several communication tasks/metrics in Section VII. We end with concluding remarks and potential avenues for further research in Section VIII.

Fig. 2: Here we show an alternative approach to how one could implement a subset of the stabilizer encodings. That is, first prepare the $k$-qubit input state (the corresponding qubits are called input vertices). Then prepare $n$ output qubits in the $|+\rangle$ state. Then, CZ gates are applied according to some simple graph on $n + k$ vertices, where we distinguish between the input and output vertices. Such objects we call $(n, k)$-graphs. Then, on the right the input qubits are measured in the $X$-basis, initializing the remaining $n$ output qubits in some logical state $|\psi_L\rangle$. When correcting against depolarizing noise, it suffices to consider encodings performed in this way [15]. This thus reduces the optimization to one over $(n, k)$-graphs. Finally, we reduce the search space even further by showing that $(n, k)$-graphs that are equivalent under so-called local complementations, edge flips and (in the case of permutationally invariant depolarizing noise) permutations of the input vertices and permutations of the output vertices yield equivalent distillation protocols. We note that the $(n, k)$-graph formalism can also be used to construct circuits that implement the corresponding distillation protocols/stabilizer codes (not shown in this figure).
II. PRELIMINARIES
Here we set our notation and definitions, most of which are similar to the notation in [14]. We denote by $\mathbb{F}_2$ the field with two elements. Relevant single-qubit operations are given by the Pauli operators $I, X, Y, Z$, the Hadamard gate $H$ and the phase gate $S$. A subscript indicates a specific qubit, e.g. $H_2$ denotes a Hadamard gate acting on the second qubit and the identity $I$ acting on the remaining qubits, where we assume there is an ordering given on the qubits. We use the term single-qubit Clifford operations to refer to the elements in the group generated by Hadamard and phase gates on each qubit.
The relevant two-qubit operations are given by the controlled-not operation $\mathrm{CNOT}_{ij}$, controlled-Z operation $CZ_{ij}$ and swap operation $\mathrm{SWAP}_{ij}$. For the $\mathrm{CNOT}_{ij}$ operation, the subscripts $i$ and $j$ indicate the control and target, respectively.
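The Pauli operators expanded in the computational basis are given by
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
These single-qubit Pauli operators can be extended to $n$ qubits, yielding the Pauli group $\tilde{P}_n$, whose elements are tensor products of Pauli operators with phases from $\{\pm 1, \pm i\}$. The group $P_n$ consists of all matrices that are tensor products of Pauli operators, up to phases from $\{\pm 1, \pm i\}$. That is, $P_n \cong \tilde{P}_n / \langle i I^{\otimes n} \rangle$. With abuse of terminology we will say that two elements of $P_n$ (anti-)commute if arbitrary elements in their pre-images (anti-)commute. Note that this is well-defined, since it does not depend on the choice of elements in the pre-image.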
The weight wt of an element of $P_n$ is the number of non-identity Pauli elements in the string. For a subset $S$ of $P_n$, let $E_w(S)$ be the number of elements in $S$ with weight $w$. We will refer to the collection of $E_w(S)$ as the weight enumerator of $S$. Furthermore, define the weight enumerator polynomial of $S$ as
$$E(S, x, y) = \sum_{w=0}^{n} E_w(S)\, x^{n-w} y^{w}.$$
These objects are related to the weight enumerators used in (quantum) error correction [16], and will turn out to be useful for expressing the output states of distillation protocols.
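As an illustration of these definitions, the following minimal sketch computes the weight enumerator and evaluates the weight enumerator polynomial for a small set of Pauli strings. The string encoding (`"XIZ"` style) and the function names are assumptions made for this example only.

```python
from collections import Counter


def weight(pauli: str) -> int:
    """Number of non-identity tensor factors in a Pauli string such as 'XIZ'."""
    return sum(1 for c in pauli if c != "I")


def weight_enumerator(paulis, n):
    """Return [E_0, ..., E_n], where E_w counts the strings of weight w."""
    counts = Counter(weight(p) for p in paulis)
    return [counts.get(w, 0) for w in range(n + 1)]


def enumerator_polynomial(paulis, n, x, y):
    """Evaluate E(S, x, y) = sum_w E_w(S) x^(n-w) y^w at numeric x and y."""
    coeffs = weight_enumerator(paulis, n)
    return sum(e * x ** (n - w) * y ** w for w, e in enumerate(coeffs))


# Hypothetical example set S of three-qubit Pauli strings (phases ignored).
S = ["III", "XXI", "IZZ", "XYZ"]
enumerator = weight_enumerator(S, 3)
```

Here `enumerator` is `[1, 0, 2, 1]`, since `S` contains one string of weight 0, two of weight 2 and one of weight 3.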
The Clifford group $\mathcal{C}_n$ on $n$ qubits is the group generated by $H$ and $S$ operations on any qubit, and $\mathrm{CNOT}_{ij}$ between any two qubits $i$ and $j$. The Clifford group acts on $P_n$ by conjugation, and in fact each automorphism of $P_n$ that preserves the commutation relations arises as the conjugation by some $C \in \mathcal{C}_n$.

A. Symplectic representation
There is a convenient representation of Pauli operators (without phase) and the action of the Clifford group on the Pauli operators in terms of linear algebra over $\mathbb{F}_2$.
Elements of $P_n$ are represented by elements of $\mathbb{F}_2^{2n}$. In particular, $X_i$ and $Z_i$ are represented by the standard basis vectors $e_i$ and $e_{i+n}$, respectively. The representation can then be linearly extended to arbitrary Pauli strings. It can be checked that multiplication in $P_n$ corresponds to vector addition in $\mathbb{F}_2^{2n}$. Let
$$\Omega = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}$$
and let $\omega : \mathbb{F}_2^{2n} \times \mathbb{F}_2^{2n} \to \mathbb{F}_2$ be the standard symplectic bilinear form given by
$$\omega(v, w) = v^T \Omega\, w.$$
Two Pauli strings commute iff $\omega$ evaluated on the two corresponding binary vectors $v, w$ equals zero. Furthermore, conjugation by a Clifford corresponds to a symplectic linear transformation, i.e. there is a surjective group homomorphism from the Clifford group to the symplectic group of order $n$ over $\mathbb{F}_2$,
$$\mathrm{Sp}(2n, \mathbb{F}_2) = \{ M \in \mathrm{GL}(2n, \mathbb{F}_2) \mid M^T \Omega M = \Omega \}.$$
Thus $\mathrm{Sp}(2n, \mathbb{F}_2)$ consists of those matrices $M$ such that $\omega(Mv, Mw) = \omega(v, w)$ for all $v, w \in \mathbb{F}_2^{2n}$.
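The symplectic representation can be sketched in a few lines. The example below maps Pauli strings to binary vectors and decides commutation via the symplectic form; the string encoding and the function names are illustrative assumptions, not notation from this work.

```python
import numpy as np


def pauli_to_vector(pauli: str) -> np.ndarray:
    """Map a Pauli string (phase ignored) to a vector in F_2^{2n}."""
    n = len(pauli)
    v = np.zeros(2 * n, dtype=int)
    for i, c in enumerate(pauli):
        if c in "XY":
            v[i] = 1          # X component, basis vector e_i
        if c in "ZY":
            v[i + n] = 1      # Z component, basis vector e_{i+n}
    return v


def symplectic_form(v, w) -> int:
    """Evaluate v^T Omega w over F_2 with Omega = [[0, I], [-I, 0]]."""
    n = len(v) // 2
    zero = np.zeros((n, n), dtype=int)
    eye = np.eye(n, dtype=int)
    omega = np.block([[zero, eye], [-eye, zero]])
    return int(v @ omega @ w) % 2


def commute(p: str, q: str) -> bool:
    """Two Pauli strings commute iff the symplectic form vanishes."""
    return symplectic_form(pauli_to_vector(p), pauli_to_vector(q)) == 0
```

For instance, `commute("X", "Z")` is `False` while `commute("XX", "ZZ")` is `True`, and adding the vectors of `"X"` and `"Z"` modulo 2 gives the vector of `"Y"`, reflecting that multiplication of Pauli strings corresponds to vector addition.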

B. Graph theory
We consider here only simple undirected graphs, that is, graphs with no loops and at most one edge between any two vertices. A graph $G = (V, E)$ has a vertex set $V$ and edge set $E$, the latter of which has as elements unordered pairs of vertices. The neighborhood $N_v$ of a vertex $v$ is the set of all adjacent vertices of $v$, i.e.
$$N_v = \{ u \in V \mid \{u, v\} \in E \}.$$
For a subset $S \subseteq V$, the induced subgraph $G[S]$ is defined as the graph with vertex set $S$ and an edge set containing all edges that are incident with vertices in $S$ only. Furthermore, $G - S$ is defined as $G[V \setminus S]$. A local complementation $\tau_v$ is an operation on a graph $G$ that for a vertex $v$ takes the graph complement on the induced subgraph $G[N_v]$, while leaving the rest of the edges invariant [17]. That is, for each pair of vertices in the neighborhood of $v$, an edge is added if it was not present, and removed if it was present. We show an example of a local complementation in Fig. 3. Two graphs that are related by a sequence of local complementations are LC equivalent. These operations will be important to describe operations on representations of distillation protocols. Finally, the chromatic index of a graph $G$ will be useful for expressing minimum circuit depths. The chromatic index of a graph $G$ is the smallest number of colors needed to color the edges of $G$ such that no two incident edges have the same color.
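A local complementation is straightforward to express in code. The sketch below assumes a graph stored as a dictionary from vertices to neighbor sets; this data structure and the function name are choices made for illustration only.

```python
from itertools import combinations


def local_complement(adj, v):
    """Return a new graph with the subgraph induced on N_v complemented.

    adj maps each vertex to the set of its neighbours (simple, undirected).
    """
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    for a, b in combinations(sorted(adj[v]), 2):
        if b in new_adj[a]:
            # Edge was present between two neighbours of v: remove it.
            new_adj[a].discard(b)
            new_adj[b].discard(a)
        else:
            # Edge was absent: add it.
            new_adj[a].add(b)
            new_adj[b].add(a)
    return new_adj


# Star graph on four vertices with centre 0.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
complete = local_complement(star, 0)
```

Applying the operation at the centre of the star yields the complete graph on four vertices, and applying it a second time at the same vertex returns the star, so the two graphs are LC equivalent.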

III. DISTILLATION AND ERROR CORRECTION
In this section we define bilocal Clifford distillation protocols and stabilizer codes, and demonstrate a useful correspondence between the two.

A. Bilocal Clifford protocols
Bilocal Clifford protocols are distillation protocols where Alice and Bob first apply $C^T \otimes C^\dagger$, for some Clifford circuit $C$, see Fig. 4. These Clifford circuits are composed of Hadamard gates $H$, $S$ gates, and CNOT gates. Afterwards, they measure out the last $n - k$ qubit pairs in the computational basis, and communicate their outcomes to each other. They both calculate the syndrome string $b$ of length $n$, where $b_i$ equals zero for $1 \leq i \leq k$, and equals the parity of the sum of the two outcome bits of the measurement on the $i$'th pair for $k < i \leq n$. Depending on the outcome, Alice and Bob declare the distillation a success or a failure, and in the case of success they are allowed a final local unitary. We will consider first only the case of post-selecting on $b = 0$ (which we will also refer to as the trivial measurement syndrome), and consider the general case later in Section VI.
The states that Alice and Bob distill are Bell pairs with noise applied to them. In particular, we assume Bell-diagonal noise, i.e. $\mathcal{N}_P(\cdot) = \sum_{P \in P_n} p_P\, P (\cdot) P^\dagger$. That is, the noise corresponds to having applied the Pauli string $P$ with probability $p_P$. We can assume without loss of generality that the noise is applied to only one side of the Bell pairs. This is due to the identity $(A^T \otimes I) |\Phi^+\rangle^{\otimes n} = (I \otimes A) |\Phi^+\rangle^{\otimes n}$, where $A$ is any matrix of the appropriate size [18]. Bell-diagonal noise is not only a relevant error model [14], but states can always be transformed to be of Bell-diagonal form by applying only local operations and classical communication whilst preserving the fidelity [19].
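The identity used to move the noise to one side can be checked numerically. The sketch below assumes that Alice's $n$ qubits are ordered before Bob's $n$ qubits, so that $|\Phi^+\rangle^{\otimes n}$ is the maximally entangled state of local dimension $2^n$; the helper name and the random test matrix are illustrative assumptions.

```python
import numpy as np


def max_entangled(n_pairs: int) -> np.ndarray:
    """Maximally entangled state sum_i |i>|i> / sqrt(d) with d = 2**n_pairs.

    With Alice's qubits ordered first, this equals n_pairs copies of |Phi+>.
    """
    d = 2 ** n_pairs
    return np.eye(d).reshape(d * d) / np.sqrt(d)


rng = np.random.default_rng(0)
n_pairs = 2
d = 2 ** n_pairs

# Arbitrary complex matrix A acting on one side of the pairs.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
phi = max_entangled(n_pairs)

lhs = np.kron(A.T, np.eye(d)) @ phi   # (A^T x I) |Phi+>^n
rhs = np.kron(np.eye(d), A) @ phi     # (I x A) |Phi+>^n
identity_holds = bool(np.allclose(lhs, rhs))
```

Both sides reduce to the row-major vectorization of $A^T$ divided by $\sqrt{d}$, which is why the check succeeds for any choice of `A`.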
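Define the set $P_k$ by
$$P_k = \{ P \otimes Q \mid P \in \{I, X, Y, Z\}^{\otimes k},\ Q \in \{I, Z\}^{\otimes (n-k)} \}.$$

Fig. 4: The noise is described by the channel $\mathcal{N}_P(\cdot) = \sum_{P \in P_n} p_P\, P (\cdot) P^\dagger$. In c), we use that $(A^T \otimes I) |\Phi^+\rangle^{\otimes n} = (I \otimes A) |\Phi^+\rangle^{\otimes n}$ for any matrix $A$ of the appropriate size [18]. For d), we use that Cliffords act on the group of Pauli strings $P_n$ by conjugation. The channel can therefore be written as $\mathcal{N}_{P'}(\cdot) = \sum_{P \in P_n} p_P\, P' (\cdot) P'^\dagger$ with $P' = C^\dagger P C$.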
The probability of a measurement with the all-zero syndrome string $b = 0$ depends only on the set of $P \in P_n$ that are mapped to $P_k$ under the map $P \mapsto C P C^\dagger$ [14]. Equivalently, these are all elements in the subgroup $C^\dagger(P_k)$, where we abuse notation and use the shorthand $C^\dagger(P_k) = \{ C^\dagger P C \mid P \in P_k \}$. The probability $p^{b}_{\mathrm{succ}}$ for observing the $b = 0$ syndrome is given by
$$p^{b}_{\mathrm{succ}} = \sum_{P \in C^\dagger(P_k)} p_P.$$
Similarly, the fidelity for the all-zero syndrome string $b = 0$ is determined by the $P \in C^\dagger(B_k)$, where $B_k$ is the set defined as
$$B_k = \{ I^{\otimes k} \otimes Q \mid Q \in \{I, Z\}^{\otimes (n-k)} \}.$$
The output fidelity $F^{b}$ (with respect to the $k$-fold tensor power of $|\Phi^+\rangle$) for the case of $b = 0$ is given by
$$F^{b} = \frac{\sum_{P \in C^\dagger(B_k)} p_P}{\sum_{P \in C^\dagger(P_k)} p_P}.$$
As was shown in [14], the set $C^\dagger(P_k)$ determines the set $C^\dagger(B_k)$ and vice versa. This is because the elements of $P_k$ are exactly the elements that commute with all of $B_k$, and vice versa. Since conjugation by Cliffords is an automorphism on $P_n$, the image of $P_k$ is uniquely determined by the image of $B_k$ (and vice versa) under such a conjugation. We note here that constructing the inverse of $C$ (in particular in the symplectic picture) can be done efficiently. A distillation protocol is characterized by its distillation statistics, that is, the multiset of its output states (up to local operations) and success probabilities, for all possible values of $b$.
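To make the role of the two sets concrete, the following sketch evaluates the success probability and output fidelity for the trivial syndrome as sums of Pauli probabilities. It is a minimal example under stated assumptions: $n = 2$, $k = 1$, $C$ a single CNOT with the kept pair as control, and i.i.d. Werner inputs of fidelity $F$, so that a Pauli string of weight $w$ has probability $F^{n-w} ((1-F)/3)^{w}$. The conjugated generators `XX`, `ZI`, `ZZ` (images of $X_1$, $Z_1$, $Z_2$) were worked out by hand for this choice of $C$, and all function names are hypothetical.

```python
from itertools import product

BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
LETTER = {bits: letter for letter, bits in BITS.items()}


def multiply(p: str, q: str) -> str:
    """Product of two Pauli strings, ignoring phases."""
    return "".join(
        LETTER[(BITS[a][0] ^ BITS[b][0], BITS[a][1] ^ BITS[b][1])]
        for a, b in zip(p, q)
    )


def generated_group(generators):
    """All phase-free products of the given Pauli strings."""
    n = len(generators[0])
    group = set()
    for mask in product([0, 1], repeat=len(generators)):
        element = "I" * n
        for bit, g in zip(mask, generators):
            if bit:
                element = multiply(element, g)
        group.add(element)
    return group


def pauli_probability(pauli: str, fidelity: float) -> float:
    """Probability of a Pauli string for i.i.d. Werner-state noise."""
    q = (1 - fidelity) / 3
    prob = 1.0
    for c in pauli:
        prob *= fidelity if c == "I" else q
    return prob


def trivial_syndrome_statistics(success_set, fidelity_set, fidelity):
    """Success probability and output fidelity for the syndrome b = 0."""
    p_succ = sum(pauli_probability(p, fidelity) for p in success_set)
    f_out = sum(pauli_probability(p, fidelity) for p in fidelity_set) / p_succ
    return p_succ, f_out


success_set = generated_group(["XX", "ZI", "ZZ"])   # assumed image of P_k
fidelity_set = generated_group(["ZZ"])              # assumed image of B_k
p_succ, f_out = trivial_syndrome_statistics(success_set, fidelity_set, 0.9)
```

For this choice the sums reproduce the familiar two-to-one recurrence expressions $p_{\mathrm{succ}} = F^2 + \tfrac{2}{3}F(1-F) + \tfrac{5}{9}(1-F)^2$ and $F_{\mathrm{out}} = \left(F^2 + \tfrac{1}{9}(1-F)^2\right)/p_{\mathrm{succ}}$.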

B. Stabilizer codes
A stabilizer group $B$ is defined as an Abelian subgroup of the Pauli group on $n$ qubits, not containing the $-I$ element. A stabilizer group with $n - k$ independent generators acts on $\mathbb{C}^{2^n}$, the state space of $n$ qubits, and stabilizes a subspace of dimension $2^k$. This subspace is the stabilizer code associated with $B$. The basis codewords of a stabilizer code are a (non-unique) collection of states that form a basis for the stabilized subspace, the elements of which we will also refer to as codewords.
Given a stabilizer group $B$, let $B^\perp$ be the set of elements in $P_n$ that commute with all elements in the stabilizer group. This set forms another group, which turns out to be an important group for quantum error correction [16]. In the symplectic picture, the two subgroups correspond to so-called isotropic and co-isotropic subspaces, respectively [20], and form each other's complement under the symplectic form $\omega$.
An important further quantity of a code is its distance $d$. The distance is the smallest weight of an error $E \in P_n$ that maps one codeword to another. In terms of the stabilizer group $B$, this is the largest integer $d$ such that $E_w(B) = E_w(B^\perp)$ for all $0 \leq w < d$, see [16].
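The characterization of the distance through weight enumerators can be checked by brute force for small codes. The sketch below uses the four-qubit code with stabilizer generators `XXXX` and `ZZZZ` as an assumed example; the function names are hypothetical, and the exhaustive search over all $4^n$ Pauli strings is only practical for small $n$.

```python
from itertools import product


def commutes(p: str, q: str) -> bool:
    """Pauli strings commute iff they differ non-trivially on an even number of sites."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0


def centralizer(generators, n):
    """All n-qubit Pauli strings (phases ignored) commuting with every generator."""
    return [
        "".join(s)
        for s in product("IXYZ", repeat=n)
        if all(commutes("".join(s), g) for g in generators)
    ]


def weight_counts(paulis, n):
    """Weight enumerator [E_0, ..., E_n] of a collection of Pauli strings."""
    counts = [0] * (n + 1)
    for p in paulis:
        counts[sum(c != "I" for c in p)] += 1
    return counts


def distance(stabilizer_group, generators, n):
    """Largest d with E_w(B) = E_w(B_perp) for all w < d (assumes k > 0)."""
    e_b = weight_counts(stabilizer_group, n)
    e_perp = weight_counts(centralizer(generators, n), n)
    d = 0
    while d <= n and e_b[d] == e_perp[d]:
        d += 1
    return d


generators = ["XXXX", "ZZZZ"]
group = ["IIII", "XXXX", "ZZZZ", "YYYY"]   # products of the generators, up to sign
d = distance(group, generators, 4)
```

The two enumerators agree at weights 0 and 1 but differ at weight 2 (for example `XXII` commutes with both generators without being a stabilizer), so the sketch returns $d = 2$.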
The Clifford group acts transitively on all stabilizer codes of fixed $n$ and $k$. In other words, given a fixed $[n, k, d]$ stabilizer code, it is possible to apply Clifford operations to it to obtain any other possible $[n, k, d']$ stabilizer code, which follows from the fact that the symplectic group acts transitively on symplectic bases [20].
For such a fixed stabilizer code, we can choose a particularly simple one. For given $n$ and $k$, we fix the stabilizer subgroup $B_{\mathrm{base}}$ as the one generated by $Z_{k+1}, Z_{k+2}, \ldots, Z_n$. Applying a Clifford circuit $C^\dagger$ to the stabilizer group $B_{\mathrm{base}}$ gives a new stabilizer group $C^\dagger B_{\mathrm{base}} C$. We have used $C^\dagger$ instead of $C$, which will turn out to be convenient later on. We note that the states stabilized by $B_{\mathrm{base}}$ are the states of the form $|\psi\rangle |0\rangle^{\otimes (n-k)}$, where $|\psi\rangle$ is an arbitrary state on $k$ qubits. We note that stabilizer states correspond precisely to $[n, 0, d]$ stabilizer codes [21], [22].

C. Correspondence
The above-mentioned stabilizer subgroup $B_{\mathrm{base}}$ is exactly the same as $B_k$. Furthermore, $P_k$ is the same as $B_{\mathrm{base}}^\perp$. Thus, applying $C^\dagger$ to $B_{\mathrm{base}}$ defines a new code $C^\dagger B_{\mathrm{base}} C$, which also fixes the $P \in P_n$ that are mapped to $B_k$ under $P \mapsto C P C^\dagger$. As mentioned above, this specifies the output state (up to local unitaries) and the success probability. More explicitly, for a given stabilizer code that encodes a $k$-qubit state $|\psi\rangle$ into $n$ qubits by applying $C$ to $|\psi\rangle |0\rangle^{\otimes (n-k)}$, the corresponding distillation protocol consists of Alice and Bob applying the circuit $C^T \otimes C^\dagger$ and then measuring out the last $n - k$ pairs in the computational basis, in effect measuring the stabilizers of the code.
We show the correspondence in Fig. 5. We note that the general correspondence between quantum codes and distillation was considered in [23]; the correspondence between stabilizer codes and bilocal Clifford protocols considered here is a special case of it. From now on, we will refer interchangeably to codes and distillation protocols.
One detail here is that in the bilocal Clifford protocol picture a ± factor in front of a stabilizer is immaterial.In the stabilizer picture these prefactors do not change the actual error-correcting properties of the code, and we will ignore them here as well.

IV. REDUCTION TO GRAPH CODES
Here we show how we can reduce an optimization over all bilocal Clifford distillation protocols to one over a subset of graph codes in the case of permutationally invariant depolarizing noise. Depolarizing noise is a common noise model for quantum systems and for a single qubit corresponds to the map $\rho \mapsto (1 - p)\, \rho + p\, \mathrm{Tr}(\rho)\, \frac{I}{2}$, where $\mathrm{Tr}$ indicates the trace and $I$ is the identity operator on the corresponding qubit.
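Single-qubit depolarizing noise is itself a Pauli channel, which is what connects it to the Bell-diagonal noise of Section III. The sketch below checks numerically that the map above equals $(1 - \tfrac{3p}{4})\rho + \tfrac{p}{4}(X\rho X + Y\rho Y + Z\rho Z)$; the function names and the random test state are assumptions made for the example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarize(rho, p):
    """rho -> (1 - p) rho + p Tr(rho) I / 2."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2


def pauli_form(rho, p):
    """Same channel written as a mixture of Pauli conjugations."""
    return (1 - 3 * p / 4) * rho + (p / 4) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)


# Random single-qubit density matrix as a test input.
rng = np.random.default_rng(1)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = M @ M.conj().T
rho /= np.trace(rho)

forms_agree = bool(np.allclose(depolarize(rho, 0.3), pauli_form(rho, 0.3)))
```

The agreement follows from the identity $\rho + X\rho X + Y\rho Y + Z\rho Z = 2\,\mathrm{Tr}(\rho)\, I$, valid for any $2 \times 2$ matrix $\rho$.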
Graph codes are a subset of stabilizer codes, and any of the basis codewords can be conveniently described by a graph $G'$ with $n$ vertices, along with a linear combination of $k$ linearly independent bitstrings $a_i$ of length $n$. First, we define
$$|+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |G'\rangle = \prod_{\{i,j\} \in G'} CZ_{ij}\, |+\rangle^{\otimes n},$$
where with $\prod_{\{i,j\} \in G'} CZ_{ij}$ we abuse notation to mean that a $CZ_{ij}$ gate is applied for every edge $\{i, j\}$ in the graph $G'$. The basis codewords are then of the form
$$Z^{b}\, |G'\rangle = Z^{b} \prod_{\{i,j\} \in G'} CZ_{ij}\, |+\rangle^{\otimes n}, \qquad (6)$$
where $Z^{b}$ is shorthand for a $Z$ gate for each qubit corresponding to a 1 in the bitstring $b \in \mathbb{F}_2^{n}$, and the $b$ are all linear combinations of the $a_i$. Since the $a_i$ are linearly independent, there are $2^k$ distinct $b$, so that the corresponding space is $2^k$-dimensional. The viewpoint of graph codes as built from a graph $G'$ with a collection of $Z$-type operators/bitstrings has been used in for example [24] to construct quantum error correction codes. We note that for the case of $k = 0$, one retrieves the case of graph states [21], [22], since the span of the empty set is the trivial vector space.
An $[n, k, d]$ graph code can also be described by the following procedure [15], [21], which will turn out to be useful for our purposes. First, prepare $n$ output qubits in the $|+\rangle$ state, and prepare the state to be encoded in $k$ input qubits. Now, CZ gates are applied between pairs of qubits, i.e. $\prod_{\{i,j\} \in G} CZ_{ij}$ is applied for some graph $G$. Unlike the codeword picture, the graph $G$ here specifies the CZ gates to be applied also between input qubits and output qubits. As such, the graph $G$ has $n + k$ vertices, and not $n$ vertices as in the graph used in Eq. (6).
To such a graph $G$ we can thus associate a (family of) states of the form $\prod_{\{i,j\} \in G} CZ_{ij}\, |\psi\rangle |+\rangle^{\otimes n}$, where $|\psi\rangle$ is an arbitrary state on $k$ qubits. The choice of $|\psi\rangle$ only changes the state to be encoded, and does not change the error correcting properties of the code.
By measuring all the $k$ input qubits in the $X$ basis and applying a correction dependent only on the measurement outcomes, the input qubits are encoded in the $n$ remaining output qubits [21]. To specify a graph code, it thus suffices to specify a graph $G$ and label the vertices as input and output qubits, see Fig. 6 for an example. The example given there corresponds to the $[4, 2, 2]$ code [25].
Furthermore, we will interchangeably refer to input (output) qubits and input (output) vertices. Finally, we will refer to permutations of the vertices that permute the $n$ output and $k$ input vertices separately as $(n, k)$-permutations.

Fig. 6: An $(n, k)$-graph for the $[4, 2, 2]$ code [25]. The two-qubit input to the code is initialized on the diamond vertices, and then measured in the $X$ basis. After (local) corrections depending on the measurement outcomes, the input state is encoded on the remaining four vertices.
Let us now investigate the relation between the $(n, k)$-graph picture and the codeword picture from Eq. (6). First let us consider the case of $k = 1$, i.e. a single input qubit. Fix an $(n, k)$-graph $G$ with a single input qubit (labelled by $v$), and prepare the input qubit in the state $\alpha |0\rangle + \beta |1\rangle$. A measurement on that input qubit leads (after a correction consisting solely of Pauli operations) to a state $\alpha\, |G - \{v\}\rangle + \beta\, Z^{N_v} |G - \{v\}\rangle$, where $|G - \{v\}\rangle$ is the graph state corresponding to the graph $G$ with vertex $v$ deleted, and $Z^{N_v}$ is shorthand for $\prod_{i \in N_v} Z_i$. Now let us consider $k$ arbitrary. After measuring out all input qubits $v_i \in V_{\mathrm{in}}$ and applying the necessary corrections, we find that we end up with a superposition of (in general) $2^k$ states of the form
$$\Big( \prod_{i \in S} Z^{N_{v_i} \setminus V_{\mathrm{in}}} \Big)\, |G - V_{\mathrm{in}}\rangle, \qquad S \subseteq \{1, \ldots, k\}. \qquad (7)$$
By equating Eqs. (6) and (7), we find that $G' = G - V_{\mathrm{in}}$ is the graph obtained by removing all input vertices $v \in V_{\mathrm{in}}$ from $G$, and that the possible $b$ are linear combinations (over $\mathbb{F}_2$) of the $k$ strings $a_i$. Importantly, the $a_i$ are exactly those bitstrings that have a 1 for the vertices in $G - V_{\mathrm{in}}$ connected to $v_i$ for each $v_i \in V_{\mathrm{in}}$, and zero otherwise. We note that the correction that needs to be performed is a stabilizer of $|G\rangle$ (and thus consists of only Pauli corrections), and is chosen to anti-commute with exactly those $Z^{b}$ that acquired a minus sign after the measurement. While both the codeword (from Eq. (6)) and $(n, k)$-graph picture are useful for understanding graph codes, the $(n, k)$-graph picture will be more fruitful than the codeword picture for the enumeration of such codes. On the other hand, the codeword picture is particularly useful for understanding how to construct distillation circuits (see Section V). For related literature on the $(n, k)$-graph picture, see [26], [25].
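The passage from the $(n, k)$-graph picture to the codeword picture amounts to deleting the input vertices and recording which output vertices each input vertex is connected to. The sketch below does this for a hypothetical $(4, 2)$-graph; the edge list, the vertex labels and the function name are assumptions chosen for illustration and are not taken from Fig. 6.

```python
def codeword_data(edges, inputs, outputs):
    """Return (edges of G - V_in, bitstrings a_i) for an (n, k)-graph.

    edges   : iterable of 2-tuples of vertex labels (simple graph)
    inputs  : ordered list of the k input vertices
    outputs : ordered list of the n output vertices
    """
    input_set = set(inputs)
    position = {v: idx for idx, v in enumerate(outputs)}

    # G' = G - V_in keeps only the edges between output vertices.
    reduced_edges = {frozenset(e) for e in edges if not (set(e) & input_set)}

    # a_i has a 1 exactly on the output vertices adjacent to input vertex v_i.
    strings = []
    for v in inputs:
        a = [0] * len(outputs)
        for e in edges:
            if v in e:
                (other,) = set(e) - {v}
                if other in position:   # edges between inputs are ignored
                    a[position[other]] = 1
        strings.append(a)
    return reduced_edges, strings


# Hypothetical (4, 2)-graph with inputs v1, v2 and outputs 1..4.
edges = [("v1", 1), ("v1", 3), ("v1", 4), ("v2", 2), ("v2", 4), (1, 2), (3, 4)]
reduced_edges, strings = codeword_data(edges, ["v1", "v2"], [1, 2, 3, 4])
```

For this graph the bitstrings are `[1, 0, 1, 1]` and `[0, 1, 0, 1]`, and $G - V_{\mathrm{in}}$ has the two edges $\{1, 2\}$ and $\{3, 4\}$.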
As mentioned above, graph codes are a strict subset of stabilizer codes that admit a convenient graphical representation. However, we will show that we can restrict to graph codes. First, let us define the subgroup $K_n$ of the Clifford group on $n$ qubits as
$$K_n = \langle H_i, S_i, \mathrm{SWAP}_{ij} \mid 1 \leq i, j \leq n \rangle.$$
This subgroup corresponds to permutations of the qubits and single-qubit Clifford operations. We now define two equivalence relations on distillation protocols.
Definition IV.2. Two bilocal Clifford distillation protocols are distillation equivalent if the two protocols yield the same output states (up to local rotations) with the same success probability when distilling an $n$-fold tensor power of a Werner state and when conditioning on seeing the trivial measurement syndrome $b = 0$.
Definition IV.3. Two bilocal Clifford distillation protocols are locally equivalent if their associated subgroups $B_1$ and $B_2$ are equal up to conjugation by an element $K$ in $K_n$, i.e. $B_2 = K B_1 K^\dagger$ for some $K \in K_n$.
The motivation for the first equivalence is clear: if two protocols output the same state with the same probability, they are indistinguishable in their distillation capabilities, at least for $b = 0$. Ideally, one would call two distillation protocols equivalent if for each syndrome string $b$ there exists another syndrome string $b'$ such that the output state for the first protocol with syndrome string $b$ is the same as the output state up to local rotations for the second protocol with syndrome string $b'$. This is however impractical for enumeration purposes, since the number of possible syndrome strings grows as $2^{n-k}$, the number of coefficients to compare grows as $4^k$, and each coefficient is described by a weight enumerator of length $n + 1$ (see Section VI). In Section VII we provide a heuristic motivation for restricting to the $b = 0$ case. Thus, an enumeration over distillation protocols means finding a set of pairwise inequivalent distillation protocols for fixed $n$ and $k$. The second equivalence is motivated by the fact that $K_n$ is the subgroup of the Clifford group that stabilizes an $n$-fold tensor power of a Werner state. Thus, the states before measuring when distilling with circuits $C$ and $CK$ with $K \in K_n$ are equal, which means they are indistinguishable in their performance as a distillation circuit in the case of no noise. We note that the same equivalence was given in terms of double cosets in [14], and that local equivalence implies distillation equivalence. Now, every stabilizer code is equal to some graph code, up to single-qubit Cliffords [15], [27]. This means that it suffices to consider graph codes up to permutation of the qubits.
While every bilocal Clifford protocol is equivalent to a graph code, this graph code is not unique. This induces an equivalence relation on graph codes themselves. It will turn out to be most convenient to phrase this equivalence on $(n, k)$-graphs.
Definition IV.4. Two $(n, k)$-graphs $G_1, G_2$ are locally equivalent if there are two stabilizer states $|\psi\rangle, |\psi'\rangle$ such that $\prod_{\{i,j\} \in G_1} CZ_{ij}\, |\psi\rangle |+\rangle^{\otimes n}$ and $\prod_{\{i,j\} \in G_2} CZ_{ij}\, |\psi'\rangle |+\rangle^{\otimes n}$ are the same up to (not necessarily single-qubit) Clifford operations on the input qubits and single-qubit Clifford operations plus permutations on the output qubits.
This equivalence under single-qubit Clifford operations and permutations on the output qubits stems from the same reasoning as in Definition IV.3 when distilling Werner states. The equivalence under arbitrary Clifford operations on the input qubits stems from the fact that the state to be encoded does not change the error correcting properties of the code, as noted before. That is, the resultant codewords from Eq. (6) will not change, only their weights. The term locally equivalent is motivated by imagining the input qubits to be local to a single node, while the remaining qubits are assumed to be separated in space. We note that permutations on the output qubits are not local in this sense, however.
Proposition IV.5. Local equivalence on $(n, k)$-graphs is equivalent to the underlying $(n, k)$-graphs being related by a sequence of $(n, k)$-permutations, local complementations and edge flips, i.e. the addition or removal of an edge between two input vertices.
The above proposition follows from a result from [28], which deals with transforming graph states when qubits are grouped in such a way as to be local to a node. In other words, each party is allowed to perform arbitrary Clifford operations on their locally held qubits. The result from [28] states that two graph states $|G\rangle, |G'\rangle$ are related by such party-local Clifford transformations if and only if the underlying graphs are related by a sequence of edge flips and local complementations. Here, the edge flips are only allowed between vertices corresponding to a local party.
Furthermore, the equivalence relation can be replaced by a finer, but better studied, equivalence relation.
Corollary IV.6. To enumerate all $[n, k, d]$ bilocal Clifford distillation protocols, it suffices to enumerate over all graphs with $n + k$ vertices up to graph isomorphism and local complementation, together with all subsets of the vertices with size $k$ (which effectively corresponds to selecting the $k$ input vertices).
We can furthermore restrict to connected graphs. That is because if $G$ is not connected, there are qubits that do not interact with each other. The corresponding distillation protocol would then naturally decompose into smaller distillation protocols. Connected representatives under the LC + permutation equivalence relation have been found for graphs with up to 12 vertices [29], meaning that in principle we can enumerate all $n$ to $k$ distillation protocols such that $n + k \leq 12$. We note that a restriction to connected graphs was not possible from the viewpoint considered in for example [24].
For distillation protocols with $n + k > 12$, a naive method would be to partition the set of $(n, k)$-graphs into the equivalence classes directly. A more efficient approach exists, however, similar to the approach from [30], [31]. This approach is based on so-called extensions. We have not used this approach, but detail it for completeness in Appendix A.
We close this section with two remarks. First, a slightly more general scenario can be considered where besides input and output qubits there also exist auxiliary qubits. Similarly to the output qubits, these qubits are prepared in the $|+\rangle$ state and have the CZ gates applied to them. Unlike the output qubits however, they are measured out in the $X$ basis, similar to the input qubits. We do not have to consider the case of auxiliary qubits, since measuring an auxiliary qubit in the $X$ basis maps graph states to graph states, where importantly the two possible graph states that can arise are LC equivalent [22]. Thus, the resulting states can be transformed into each other by single-qubit Cliffords, and thus will yield equivalent codes.
Finally, we note that we restricted ourselves in Definitions IV.3 and IV.4 to equivalences phrased in terms of arbitrary Clifford operations, instead of arbitrary unitaries. This is motivated by the following. It was conjectured that equivalence of two graph states up to single-qubit unitaries implies equivalence up to single-qubit Clifford operations [32], [33]. This was shown to be false, however [34]. So far, there has been no good (graph-theoretical) understanding of the equivalence up to single-qubit unitaries for graph states, let alone for the case of $k > 0$. For this reason, we consider only equivalence up to Clifford operations.

V. DISTILLATION CIRCUITS
In the previous sections we used the $(n, k)$-graph representation to enumerate over bilocal Clifford distillation protocols. However, given an $(n, k)$-graph, it is not clear how to construct a bilocal Clifford circuit corresponding to the code. In particular, the encoding picture requires a total of $n + k$ qubits, while there exists a bilocal Clifford circuit that only processes $n$ qubits simultaneously.
In this section we first provide a way to construct a bilocal Clifford circuit from an $(n, k)$-graph. We then introduce heuristics for reducing the number of two-qubit gates (and/or optimizing any other quantity of interest) of the corresponding circuits.

A. From graph codes to circuits
To find a circuit from a given graph code, we find a way to map the codewords of the code to codewords of the form $Z^{b}|+\rangle^{\otimes n}$, where the $b$ are all the $2^k$ bitstrings that are 0 on the last $n - k$ indices. These codewords are chosen since they correspond to the situation after decoding, see the left-hand side of Fig. 5.
The codewords of a graph code are always of the form shown in Eq. (6). Applying the circuit $\prod_{\{i,j\} \in G - V_{\mathrm{in}}} CZ_{ij}$ to such codewords yields codewords of the form $Z^{b}|+\rangle^{\otimes n}$ (where we have assumed an ordering on the vertices). Since the $b$ are the linear combinations of the $a_i$, it suffices to map the $a_i$ to a basis of the subspace that has a 0 for all the qubits that are to be measured. Since $a_1, a_2, \ldots, a_k$ are linearly independent, it is possible to bring the matrix $A = [a_1, \ldots, a_k]^T$ into row reduced echelon form with $k$ pivots. By relabeling the vertices, it is possible to set the reduced echelon form to have pivots in columns 1 to $k$. It will be convenient to use such a labeling. In particular, let the input and output vertices of an (n, k)-graph be labeled by $V_{\mathrm{in}}$ and $V_{\mathrm{out}} = \{1, \ldots, n\}$, respectively. Such a labeling also splits the output vertices into those that are kept and those that are measured out by setting $V_{\mathrm{out}}^{\mathrm{keep}} = \{1, \ldots, k\}$ and $V_{\mathrm{out}}^{\mathrm{meas}} = \{k+1, \ldots, n\}$.
Definition V.1. A labeling $V_{\mathrm{in}}$, $V_{\mathrm{out}}$ is a valid labeling if the row reduced echelon form of the matrix $A$ has pivots in columns 1 to $k$.
An example of a valid labeling is shown in Fig. 7. A non-valid labeling would be one with output vertices 2 and 4 switched, since then $A$ is already in reduced echelon form but has pivots in columns 1 and 3.
Given a valid labeling of an (n, k)-graph, it is possible to find a canonical set of CNOT gates (up to ordering) such that the $Z^{a_{v_i}}$ operators are mapped to have support on only $V_{\mathrm{out}}^{\mathrm{keep}}$. In particular, for $1 \le i \le k$, perform a $CNOT_{ji}$ for every non-zero entry $j \neq i$ in the $i$'th row of $A$. For example, the matrix $A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}$ corresponds to performing $CNOT_{31} CNOT_{41} CNOT_{42}$. Note that the CNOT gates in this construction have the control on qubits in $V_{\mathrm{out}}^{\mathrm{meas}}$ and the target on qubits in $V_{\mathrm{out}}^{\mathrm{keep}}$, and thus all commute. This fact will turn out to be useful for our heuristics for circuit construction later in this section.
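The validity check and the CNOT construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `rref_gf2`, `is_valid_labeling` and `canonical_cnots` are hypothetical, and the matrix $A$ is assumed to be given as a list of 0/1 rows $a_1, \ldots, a_k$.

```python
def rref_gf2(rows):
    """Return (reduced row echelon form over GF(2), list of pivot columns)."""
    m = [list(r) for r in rows]
    pivots = []
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # Look for a row at or below position r with a 1 in column c.
        pr = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        # Clear column c in every other row (addition over GF(2) is XOR).
        for i in range(len(m)):
            if i != r and m[i][c]:
                m[i] = [x ^ y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def is_valid_labeling(A):
    """A labeling is valid if the pivots sit in columns 1..k (0..k-1 here)."""
    _, pivots = rref_gf2(A)
    return pivots == list(range(len(A)))


def canonical_cnots(A):
    """List the gates CNOT_{ji} as 1-indexed (control j, target i) pairs."""
    if not is_valid_labeling(A):
        raise ValueError("labeling is not valid: relabel the output vertices")
    R, _ = rref_gf2(A)
    return [(j + 1, i + 1)
            for i, row in enumerate(R)
            for j, entry in enumerate(row)
            if entry and j != i]


# The example matrix from the text.
A = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
print(canonical_cnots(A))  # [(3, 1), (4, 1), (4, 2)]
```

For the example matrix this reproduces the gates $CNOT_{31}$, $CNOT_{41}$ and $CNOT_{42}$ quoted in the text; a matrix whose pivots do not fall in the first $k$ columns is rejected.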
Thus, to construct a circuit corresponding to an (n, k)-graph, a valid labeling needs to be established first. We emphasise that the labeling does not change the statistics when distilling an n-fold tensor power of a Werner state, and only affects the construction of the circuit. Then, $CZ_{ij}$ is applied for each edge in the graph $G - V_{\mathrm{in}}$. Afterwards the above construction for the CNOT gates is applied. Finally, for each qubit in $V_{\mathrm{out}}^{\mathrm{meas}}$ a Hadamard is applied, after which the qubit is measured out. See Fig. 7 for an example of the circuit constructed from an (n, k)-graph (with the associated valid labeling). We note a related approach was taken in [26]. We now use this circuit picture to show that it is always possible to remove CZ gates that act only on qubits that are kept (i.e. vertices in $V_{\mathrm{out}}^{\mathrm{keep}}$), without changing the distillation statistics. Important for the proof are the following commutation relations, where $i, j, k, l$ are distinct and $CNOT_{kl}$ has control $k$ and target $l$: $CZ_{ij} CNOT_{kl} = CNOT_{kl} CZ_{ij}$, $CZ_{ik} CNOT_{kl} = CNOT_{kl} CZ_{ik}$, and $CZ_{il} CNOT_{kl} = CNOT_{kl} CZ_{il} CZ_{ik}$.
Lemma V.2. Fix a valid labeling of an (n, k)-graph $G$. Then $G$ is locally equivalent to an (n, k)-graph $G'$ that has no edges between any pair of vertices in $V_{\mathrm{out}}^{\mathrm{keep}}$. Furthermore, the edge sets of $G$ and $G'$ differ only by edges $\{i, j\}$ with $i, j \in V_{\mathrm{out}}^{\mathrm{keep}}$ or $i, j \in V_{\mathrm{out}}^{\mathrm{meas}}$.
Proof. Given a valid labeling, there is a canonical circuit (up to ordering of the CZ and CNOT gates). Assume that for that circuit there is some $CZ_{ij}$ with $i, j \in V_{\mathrm{out}}^{\mathrm{keep}}$. Let us now attempt to commute this $CZ_{ij}$ gate through one of the $CNOT_{kl}$ gates, where by construction $k \in V_{\mathrm{out}}^{\mathrm{meas}}$ and $l \in V_{\mathrm{out}}^{\mathrm{keep}}$. There are two cases: either the $CZ_{ij}$ and $CNOT_{kl}$ gates commute, or a $CZ_{ik}$ gate is added. Repeating this procedure until the $CZ_{ij}$ gate is moved to the end will thus lead to a sequence of $CZ_{pq}$ gates (with $p \in V_{\mathrm{out}}^{\mathrm{keep}}$, $q \in V_{\mathrm{out}}^{\mathrm{meas}}$) and $CNOT_{kl}$ gates (where as before $k \in V_{\mathrm{out}}^{\mathrm{meas}}$, $l \in V_{\mathrm{out}}^{\mathrm{keep}}$), followed by the $CZ_{ij}$ at the end. Note that the $CZ_{ij}$ gate at the end is on qubits in $V_{\mathrm{out}}^{\mathrm{keep}}$, and thus does not change the distillation statistics. Each of the other $CZ_{pq}$ gates with $p \in V_{\mathrm{out}}^{\mathrm{keep}}$, $q \in V_{\mathrm{out}}^{\mathrm{meas}}$ can now be commuted back to the other CZ gates at the beginning of the circuit. As before, a $CZ_{pq}$ gate will either commute with a $CNOT_{kl}$ gate, or add a $CZ_{qk}$ gate, with $q, k \in V_{\mathrm{out}}^{\mathrm{meas}}$. To summarize, since $q, k \in V_{\mathrm{out}}^{\mathrm{meas}}$, it is possible to commute a CZ gate acting on qubits in $V_{\mathrm{out}}^{\mathrm{keep}}$ through the CNOT gates (after which it can be ignored since it acts on qubit pairs that are to be kept), without introducing any CZ gate acting only on qubits in $V_{\mathrm{out}}^{\mathrm{keep}}$. By repeating the above procedure for every $CZ_{ij}$ gate with $i, j \in V_{\mathrm{out}}^{\mathrm{keep}}$, there will eventually be no such $CZ_{ij}$ gate remaining. Furthermore, this procedure only added CZ gates between vertices in $V_{\mathrm{out}}^{\mathrm{meas}}$, and did not change any edges incident with $V_{\mathrm{out}}$. Thus, since the above procedure did not depend on which valid labeling was used, the statement follows.
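The two cases used in the proof (a CZ gate either commutes with a CNOT gate or picks up one extra CZ gate) can be checked by brute force. The sketch below is an illustration under stated assumptions, not the authors' code: gates are tracked through their action on computational basis states, which suffices because CZ and CNOT are signed permutations of that basis. The helper names are hypothetical, qubits are 0-indexed, and a circuit is a list of gates in time order.

```python
from itertools import product


def apply_gate(state, gate):
    """Apply ("CZ", a, b) or ("CNOT", control, target) to a (bits, sign) pair."""
    bits, sign = state
    name, a, b = gate
    if name == "CZ":
        # CZ flips the sign when both qubits are 1.
        return bits, (-sign if bits[a] and bits[b] else sign)
    # CNOT adds the control bit onto the target bit.
    out = list(bits)
    out[b] ^= out[a]
    return tuple(out), sign


def action(circuit, n):
    """Map every n-bit basis state to its image under the circuit."""
    table = {}
    for bits in product((0, 1), repeat=n):
        state = (bits, 1)
        for gate in circuit:
            state = apply_gate(state, gate)
        table[bits] = state
    return table


def equal(c1, c2, n=4):
    return action(c1, n) == action(c2, n)


i, j, k, l = 0, 1, 2, 3  # distinct qubits; CNOT has control k and target l

# CZ on qubits disjoint from the CNOT: the gates commute.
disjoint = equal([("CZ", i, j), ("CNOT", k, l)], [("CNOT", k, l), ("CZ", i, j)])
# CZ touching the CNOT control: the gates commute.
on_control = equal([("CZ", i, k), ("CNOT", k, l)], [("CNOT", k, l), ("CZ", i, k)])
# CZ touching the CNOT target: an extra CZ on the control appears.
on_target = equal([("CZ", i, l), ("CNOT", k, l)],
                  [("CNOT", k, l), ("CZ", i, l), ("CZ", i, k)])
print(disjoint, on_control, on_target)  # True True True
```

The third identity is the one that generates the additional CZ gates tracked in the proof.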
We will use this Lemma in Section VI to find another way to enumerate distillation protocols using the symplectic formalism.
We close this section with the following two subtleties. While it is true that any [n, k, d] code is locally equivalent to a graph code specified by an (n, k)-graph, the converse is not true. That is, while any (n, k)-graph specifies a stabilizer/graph code, that code is not necessarily a stabilizer [n, k, d] code. A trivial example is given when none of the k input qubits are connected with any output qubit. In this case, while the number of input qubits is greater than zero, the input state is prepared in a fixed state, and thus encodes no logical qubits. A less trivial example is given by the (n, k)-graph in Fig. 8. Here, the problem is that the resultant codewords of the code span a space of dimension less than k. This is because the two input vertices share the same neighbors. As noted before, this is due to the fact that the $a_i$ are not linearly independent. Note that such examples do not have any impact on any of the statements made in this section regarding our search for distillation protocols.
Finally, in the construction of the circuit a valid labeling of the (n, k)-graph was required. Different valid labelings lead to different constructed circuits, some of which could potentially be better. We do not pursue optimizing over the different labelings, however.

B. Heuristics for circuit compilation
In the previous subsection we found a way to systematically construct a circuit from an (n, k)-graph. Here we are concerned with constructing good circuits that achieve the same distillation statistics. Depending on the physical model, different criteria/metrics can be used for defining a good circuit. The first and most important metric we use is the number of two-qubit gates, which should be minimized. If decoherence over time is significant, it is important to minimize the depth of the circuit. If the gate noise is the predominant source of noise, we aim to reduce the number of two-qubit gates acting on the qubit(s) to be kept. We will refer to gates that act on the qubits to be kept as keep-gates for short. In what follows, we detail three heuristic methods to search through a set of circuits that achieve the distillation statistics corresponding to a given (n, k)-graph.
Fig. 8: A graph with 2 input vertices, but whose corresponding code only encodes one qubit.
Firstly, given an (n, k)-graph G, we can construct a circuit using any (n, k)-graph that is locally equivalent to G. This is because the distillation statistics will necessarily be the same for the constructed circuits. As an example, we show a graph in Fig. 9 that is LC equivalent to the graph in Fig. 7 (by an LC on vertex 2). Note that the graph in Fig. 7 yields a shorter circuit.
Fig. 9: A graph code and corresponding circuit that is LC equivalent to the code and circuit in Fig. 7. Here the CNOT gates map $Z_1 Z_3 Z_4$ and $Z_1 Z_2$ to $Z_1$ and $Z_1 Z_2$, respectively.
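As a building block for this first heuristic, local complementation itself is simple to implement. The sketch below is a generic version on an adjacency-set representation (the names `local_complement` and `edge_count` are hypothetical); it does not encode which vertices of an (n, k)-graph may be complemented, which is fixed by the equivalence definitions of Section IV.

```python
def local_complement(adj, v):
    """Return a copy of the graph with the neighbourhood of v complemented.

    adj maps each vertex to the set of its neighbours.
    """
    new = {u: set(nb) for u, nb in adj.items()}
    nbrs = sorted(adj[v])
    for idx, a in enumerate(nbrs):
        for b in nbrs[idx + 1:]:
            if b in new[a]:
                # Edge between two neighbours of v is present: remove it.
                new[a].discard(b)
                new[b].discard(a)
            else:
                # Edge is absent: add it.
                new[a].add(b)
                new[b].add(a)
    return new


def edge_count(adj):
    return sum(len(nb) for nb in adj.values()) // 2


# A star graph on four vertices becomes the complete graph under LC at the centre.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
complete = local_complement(star, 0)
print(edge_count(star), edge_count(complete))  # 3 6
```

The example shows why the choice of representative matters for circuit size: two LC-equivalent graphs can differ by a factor of two in the number of edges, and hence in the number of CZ gates.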
Before moving on to the other heuristics, we briefly investigate how to calculate (upper bounds on) the depth and the number of two-qubit gates corresponding to the circuit of an (n, k)-graph. First, the number of two-qubit gates is given by the sum of the number of CZ and CNOT gates. The number of CZ gates is equal to the number of edges in $G - V_{\mathrm{in}}$. The number of CNOT gates is equal to the number of non-zero entries of the reduced row echelon form of $[a_1, \ldots, a_k]^T$ minus the number of pivots. The depth needed to perform the CZ gates is equal to the chromatic index of $G - V_{\mathrm{in}}$, see Section II. For calculating the depth of the CNOT gates, we note that all the CNOT gates commute. Thus, the minimum depth for the CNOT gates is the chromatic index of the graph with n vertices and an edge between two vertices $v_i$, $v_j$ if there is a $CNOT_{ij}$ gate. Finally, one more time step is needed to perform the layer of Hadamard gates. We note that in certain cases it is possible to perform some of the CZ and CNOT gates at the same time, which can reduce the depth even further.
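The counting rules above translate directly into code. The following sketch assumes the edges of $G - V_{\mathrm{in}}$ are given as pairs and that $A$ is already in reduced row echelon form; since computing the chromatic index exactly is hard in general, a greedy layering is used here as an upper bound on the depth. The function names and the example edge list are hypothetical.

```python
def count_two_qubit_gates(output_edges, A_rref):
    """Return (#CZ, #CNOT): edges of G - V_in, and non-zero entries minus pivots."""
    n_cz = len(output_edges)
    n_cnot = sum(sum(row) for row in A_rref) - len(A_rref)
    return n_cz, n_cnot


def greedy_layers(pairs):
    """Upper bound on the chromatic index: pack gates greedily into layers."""
    layers = []  # each layer is the set of qubits already busy in that time step
    for u, v in pairs:
        for busy in layers:
            if u not in busy and v not in busy:
                busy.update((u, v))
                break
        else:
            layers.append({u, v})
    return len(layers)


# Hypothetical example: a 4-cycle on the output vertices and the matrix A from the text.
output_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
A_rref = [[1, 0, 1, 1],
          [0, 1, 0, 1]]
cnots = [(3, 1), (4, 1), (4, 2)]

n_cz, n_cnot = count_two_qubit_gates(output_edges, A_rref)
# CZ layers + CNOT layers + one layer of Hadamards.
depth_bound = greedy_layers(output_edges) + greedy_layers(cnots) + 1
print(n_cz, n_cnot, depth_bound)  # 4 3 5
```

The bound ignores the possibility, noted in the text, of executing some CZ and CNOT gates in the same time step.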
Secondly, it is possible to change the order of the CZ and CNOT gates by commuting them through each other. For this, we use the commutation relations from Eq. (9). In certain cases, the additional CZ gates incurred will cancel with CZ gates already present, potentially leading to a smaller number of two-qubit gates, a smaller depth, or fewer keep-gates.
In the above paragraphs we had circuits that first had a round of CZ/CNOT gates, followed by a round of CNOT/CZ gates. As our final heuristic, we break this structure to find better circuits. First, note that it is possible to apply a CNOT gate (just before measuring) with control and target on the n − k qubits that are measured out, without changing the distillation statistics. By commuting such a gate through (one of) the CZ gates, it is possible that some CZ gates will cancel. This can keep the total number of two-qubit gates the same (or even lower it), while in certain cases reducing the depth or the number of keep-gates. Similarly, we also consider permuting at most one of the CZ gates with the CNOT gates. We show an example of our heuristics in Fig. 10. In this example, we reconstruct the circuit also presented in [14], which was found there using a brute-force method.
Fig. 10: The top left graph is known to correspond to the five qubit code [15]. The top right graph is obtained after a local complementation on two adjacent output vertices. For the next graph we choose a specific labeling of the input qubits. The first circuit is constructed from the graph above it. The second circuit is obtained from commuting the CNOT gates through the CZ gates. The last circuit results from adding a $CNOT_{34}$ gate before measuring (which does not change the statistics), and commuting it through the CZ gates. Note that the last circuit has a lower depth than the one above it. We note that the last circuit is the same circuit as found in [14], which was found there using a brute-force method.
Thus, to heuristically find good enough circuit(s) for a given (n, k)-graph $G$, we first sample (n, k)-equivalent graphs by randomly applying local complementations (using the implementation from [35]) and edge flips. For each (n, k)-equivalent graph $G'$, we calculate the number of two-qubit gates for the circuit found directly from $G'$, and also for the circuit found from commuting all CZ gates through the CNOT gates. Out of these, only the circuits with the smallest number of two-qubit gates are kept. After having sampled a sufficient number of (n, k)-graphs, the heuristics from the previous paragraph are applied to minimize either the depth or the number of keep-gates.

VI. ENUMERATING PROTOCOLS AND CALCULATING STATISTICS IN THE SYMPLECTIC PICTURE
With the ability to enumerate all bilocal Clifford protocols, we need a way to gauge the performance of a given distillation protocol. The quantities of interest are the success probability (for a given observed syndrome $b$) and the coefficients of the output state (conditioned on observing $b$). These quantities will depend on the initial probability distribution of the input state $\{p_P\}_{P \in P_n}$ and the given Clifford circuit $C$. Calculating these quantities in the density matrix formalism becomes unwieldy and impractical. Luckily, all of the necessary calculations can be phrased in the stabilizer/symplectic formalism.
In this section, we first construct the symplectic matrix given an (n, k)-graph. Then, we show how to reduce the search space of distillation protocols to symplectic matrices of a certain form. We close by discussing how to calculate the quantities of interest for distillation.

A. Constructing symplectic matrices
Here we describe how to find the symplectic matrix M given an (n, k)-graph.Following the recipe from Section V, we first apply a CZ ij gate for each edge {i, j} ∈ G − V in .We use the fact that the symplectic representation of {i,j}∈G−V in CZ ij is equal to , is the adjacency matrix of G−V in where we have rewritten the matrix without loss of generality with . Now, the CNOT ij gates are applied.Let T ∈ F (n−k)×k 2 be the matrix with T i,j = 1 if a CNOT gate is performed between j + k and i and 0 otherwise.Note that T is the bottom (n − k) × k submatrix of A T .The resulting symplectic matrix is then of the form Now the final layer of Hadamard gates is applied.For convenience, we multiply both from the left and right with H ⊗n .Note that multiplying by the right with H ⊗n does not change the distillation statistics, since H ⊗n ∈ K n .
The symplectic matrix is then of the form with A and B as above.
Note that by Lemma V.2 it suffices to consider those (n, k)-graphs such that Q = 0. We then retrieve the following.
Theorem VI.1.Given a symplectic matrix M corresponding to a distillation protocol, there is always a matrix M of the following form that will yield the same distillation statistics, and S ∈ F is symmetric with zeroes on the diagonal.Now let t i , r i be the i'th column of T and R, respectively.Using a similar argument from [14], it suffices to consider those T, R such that for each 1 ≤ i ≤ k it holds that t i ≤ r i ≤ t i + r i .Furthermore, it suffices to consider for S the adjacency matrices of all graphs of order n − k up to graph isomorphism.This result is a generalization from Lemma V.I in [14].
We use Corollary IV.6 and Theorem VI.1 to perform our enumeration over distillation protocols. Interestingly, in certain cases one of the two approaches works better. For example, Corollary IV.6 allows for a full enumeration over all n = 9 to k = 1 protocols within a reasonable time, while this is not possible using the approach from Theorem VI.1. On the other hand, since a characterization of LC equivalent graphs is missing for up to 17 vertices, we could only enumerate over all n = 10 to k = 7 protocols using Theorem VI.1.

B. Distillation statistics from symplectic matrices
With a given symplectic matrix $M$ in hand, we now turn to calculating the corresponding distillation statistics. As defined before, let $b \in \{0, 1\}^n$ be such that $b_i = 0$ for $1 \le i \le k$, and $b_i$ is the parity of the two outcome bits of the measurement on the $i$'th pair for $k < i \le n$. Before delving into the calculations, let us first motivate the idea of post-selecting on sets of different measurement syndromes. A number of entanglement distillation protocols (such as those studied in [14] and [5]) were based on error detection, that is, only the $b = 0$ case was deemed a success. On the other hand, one can consider all possible syndrome strings, as done in [36]. This is commonly called error correction. Error detection succeeds with a lower probability than error correction (since there are fewer accepted syndromes), but will have a higher (average) fidelity. This motivates us to consider arbitrary sets of syndrome strings to accept, which will lead to a more fine-grained trade-off between the success probability and average fidelity.
For the symplectic matrix M corresponding to a given distillation protocol and observing a given syndrome b, we find a success probability of where v b is the symplectic representation of the operator X b = n i=1 X b i .This is because observing the syndrome b corresponds to applying the operator X b just before measuring.Furthermore, we abuse notation and use P k to refer to the symplectic representation of P k .Similarly, the corresponding fidelity is The fidelity corresponds to the coefficient belonging to the identity Pauli string.Generalizing the above, the coefficient F b P corresponding to an arbitrary Pauli string P is where v P is the symplectic representation of P .
We will now specialize to simplifying the calculation for the case of distilling an n-fold tensor power of a Werner state. In this case, the coefficient $p_P$ of a Pauli string $P$ is entirely determined by the input fidelity and the weight $\mathrm{wt}(P)$ of the string. Concretely, $p_P = F^{n - \mathrm{wt}(P)} \left(\frac{1 - F}{3}\right)^{\mathrm{wt}(P)}$, where $F$ is the initial fidelity of the input Werner states. This implies that it is sufficient to keep track only of the number of different weight operators for calculations. In particular, in the terminology introduced in Section II, it suffices to consider the weight enumerators Similarly, we find that the success probability equals Furthermore, we do not have to find the individual summands of the numerator and denominator of Eq. (16) for the case of $b = 0$, $P = I^{\otimes n}$. This is because $E_w^{(M^{-1}(P_k))}$ and $E_w^{(M^{-1}(B_k))}$ are related by the so-called quantum MacWilliams identity [37], [16], Calculating the probability using Eq. (17) requires $2^{n+k}$ sums. However, using Eq. (18) it suffices to calculate only $E_w^{(M^{-1}(B_k))}$, which requires only a sum over $2^{n-k}$ terms, and then performing $O(n^3)$ sums. This gives a speedup for calculating the fidelity and success probability for the case of $b = 0$.
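To make the role of the weight enumerators concrete, the sketch below evaluates the success probability and fidelity for $b = 0$ from two enumerator lists. It rests on assumptions that are not spelled out in the surviving text: that the fidelity is the ratio of the enumerator sums for $M^{-1}(B_k)$ and $M^{-1}(P_k)$, each weighted by $F^{n-w}((1-F)/3)^w$, and that the success probability is the denominator of that ratio. The function names are hypothetical, and the example enumerators are chosen by hand so that they reproduce the well-known 2 to 1 recurrence formula for Werner states; they are not taken from the paper.

```python
def pauli_weight_probability(F, n, w):
    """Probability of a single weight-w Pauli string on n Werner pairs of fidelity F."""
    return F ** (n - w) * ((1 - F) / 3) ** w


def statistics_from_enumerators(F, n, enum_B, enum_P):
    """Return (success probability, fidelity) for b = 0.

    enum_B[w] and enum_P[w] count the weight-w elements of the two sets
    (assumed to play the roles of M^{-1}(B_k) and M^{-1}(P_k)).
    """
    p_succ = sum(E * pauli_weight_probability(F, n, w) for w, E in enumerate(enum_P))
    numerator = sum(E * pauli_weight_probability(F, n, w) for w, E in enumerate(enum_B))
    return p_succ, numerator / p_succ


# Illustrative 2 to 1 example: 2^(n-k) = 2 and 2^(n+k) = 8 elements, respectively.
enum_B = [1, 0, 1]
enum_P = [1, 2, 5]
p_succ, fidelity = statistics_from_enumerators(0.75, 2, enum_B, enum_P)
print(round(p_succ, 4), round(fidelity, 4))  # 0.7222 0.7885
```

Only the $2^{n-k}$ elements behind `enum_B` need to be enumerated explicitly if the MacWilliams identity is used to obtain `enum_P`, which is the source of the speedup mentioned above.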
This motivates generalizing the MacWilliams identity to the case of $b \neq 0$. That is, finding a relationship between We note that an invertible relation does not exist for the case of $v_b$ replaced with general $v_P$. This is because examples were found of symplectic matrices $M_1$ and $M_2$ such that . but there exist no $P_1, P_2 \neq I^{\otimes n}$ such that . More informally, this is because we found examples of symplectic matrices $M_1$ and $M_2$ such that the resulting states have the same fidelity and success probability, but the other coefficients of the output state differ (even after local operations). This is related to the existence of codes/stabilizer states that are locally inequivalent, yet share the same $E_w^{(M^{-1}(B_k))}$ and $E_w^{(M^{-1}(P_k))}$ [30].
We note here that, since an [n, k, d] code has $E_w^{(M^{-1}(B_k))} = E_w^{(M^{-1}(P_k))}$ for all $w < d$ [16], expanding the expression for the fidelity for $b = 0$ in Eq. (16) around $F_{\mathrm{in}} = 1$ gives a distillation protocol with output fidelity $F_{\mathrm{out}} = 1 - O\left((1 - F_{\mathrm{in}})^d\right)$. Finally, we note that it is also possible to formulate the calculation of the weight enumerators in terms of the (n, k)-graph only (i.e. without constructing a symplectic matrix first). This is done by first constructing the codewords, and then calculating the weight distributions as in [24]. A related approach was given in [24], where a graph-theoretical approach was given to calculate the distance of a graph code.

VII. RESULTS
We have used our tools to find practical distillation protocols, which we report on here. As in the previous sections, we focus on the scenario of distilling an n-fold tensor power of a Werner state.
First, we investigate the potential benefits that considering non-trivial measurement syndromes (i.e. $b \neq 0$) can give for n to 1 distillation. Secondly, we evaluate how well the heuristically found circuits perform under gate and measurement noise. We compare the output fidelities of our circuits with those found using the genetic algorithm from [5]. Finally, we explore the advantages more general n to k distillation protocols can bring in comparison with n to 1 distillation. To this end, we use the highest fidelity 10 to 7 distillation protocol to teleport one half of a maximally entangled state encoded in the Steane code between two parties. We compare this approach with two more standard approaches: one based on no distillation at all, and one that concatenates the 2 to 1 DEJMPS distillation protocol [3].

A. Non-trivial measurement syndromes
For our first exploration of the impact of non-trivial measurement syndromes, we consider both the success probability and output fidelity $F_{\mathrm{out}}$ for different input fidelities $F_{\mathrm{in}}$. In Fig. 11 we consider the envelope of all found protocols, both with only $b = 0$ (solid) and optimizing over all syndrome sets (dashed). Since the possible number of syndrome sets to condition on is equal to $2^{2^{n-k}}$, the results shown are only for up to n = 5.
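The doubly exponential growth of the number of syndrome sets explains the cut-off at n = 5. A short calculation (the function name is hypothetical) makes the numbers explicit for k = 1:

```python
def number_of_syndrome_sets(n, k):
    """Number of subsets of the 2^(n-k) possible syndrome strings."""
    return 2 ** (2 ** (n - k))


for n in range(2, 7):
    print(n, number_of_syndrome_sets(n, 1))
# 2 4
# 3 16
# 4 256
# 5 65536
# 6 4294967296
```

Going from n = 5 to n = 6 therefore increases the number of candidate sets per protocol from about $6.6 \times 10^4$ to about $4.3 \times 10^9$.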
From Fig. 11 it can be seen that including non-trivial measurement syndromes provides a more significant benefit for larger input fidelities. However, note that it is in principle possible to always achieve the convex hull of a set of distillation protocols by probabilistically mixing distillation protocols. Observe that the convex hulls of the solid and dotted lines are equal for input fidelities equal to or less than 0.85. This implies that for input fidelities $\lesssim 0.85$ the inclusion of non-trivial measurement syndromes provides no benefit, while for input fidelities somewhere in between 0.85 and 0.95 non-trivial measurement syndromes start to perform better than probabilistic mixing of trivial measurement syndromes. This is consistent with the results from [36].
Secondly, we consider using distillation for quantum key distribution. We consider the secret-key rate achieved when using asymptotic asymmetric BB84 [38] after performing n to 1 distillation. Furthermore, we consider two different approaches. Firstly, we consider only using the output state when measuring a trivial measurement syndrome $b = 0$. Secondly, we consider using all the possible states for each possible syndrome string $b$. Importantly, we bin the states. That is, we separate the measured statistics into bins according to the syndrome string $b$. This allows us to separate the observed bits into those that had smaller or greater quantum bit error rates. From the convexity of the secret-key rate this can lead to increased secret-key rates, see for example [39] for a similar approach. We show the resultant rates for n = 2, ..., 7 in Fig. 12, where the solid line corresponds to the above-mentioned binning approach and the dotted line corresponds to only using the syndrome string $b = 0$. The plot only shows the results for up to n = 7, since calculating the output states for the $2^{n-1}$ different syndromes became too computationally intensive.
Fig. 12: Achieved secret-key rate using the asymptotic BB84 protocol after distilling from n to 1 pairs, where the envelope is taken over all n to 1 protocols for fixed n. The solid line corresponds to separating the generated states into bins according to the observed syndrome, and performing the BB84 post-processing for each such bin separately. The dotted line corresponds to only using the state with the syndrome string $b = 0$ (i.e. error detection).
As would be expected, distilling with a larger number of pairs allows for a higher noise tolerance. Furthermore, we see that the envelope of both strategies is the same. This suggests that it suffices to condition only on the $b = 0$ syndrome when one can choose the number of pairs n to distill one pair out of, similar to the conclusion from [39]. Even for larger n, any potential difference between the strategies would be marginal and for a small range of fidelities.
That is, for tasks such as QKD, considering non-trivial measurement syndromes does not provide a benefit. This then provides a heuristic motivation for the equivalence defined in Definition IV.2, where two distillation protocols were deemed distillation equivalent if the output states for $b = 0$ were the same up to local rotations. On the other hand, deterministic distillation (i.e. including all possible measurement syndromes) is a key component of second generation quantum repeaters [36]. Furthermore, it is not clear how non-trivial syndromes would impact the capabilities of general n to k bilocal Clifford protocols, especially for such tasks as QKD.
We conclude this subsection by noting that a possible strategy is to take the average state over all syndrome strings $b$ after local corrections. However, for the values of n considered here, this only increases the output fidelity for input fidelities $F_{\mathrm{in}} \gtrsim 0.88$ [36]. Since asymptotic BB84 requires an input fidelity of $F_{\mathrm{in}} \gtrsim 0.835$ (assuming a Werner state as input), distilling does not allow for generating key at input fidelities lower than $F_{\mathrm{in}} \approx 0.835$. At the same time, more states are used and the success probabilities drop as n increases, so that distilling with bilocal Clifford protocols with such a strategy does not bring any benefits for quantum key distribution. This shows the benefits of using additional measurement information and binning accordingly for certain quantum communication tasks [39].
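The convexity argument behind binning, and the threshold of roughly 0.835, can be illustrated with a small calculation. The sketch assumes the standard asymptotic BB84 rate $1 - 2h(Q)$ with equal error rates in both bases, a Werner state in every bin (so that $Q = 2(1 - F)/3$), and made-up bin probabilities and fidelities. It is not the calculation used to produce Fig. 12.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bb84_rate(F):
    """Asymptotic BB84 rate for a Werner state of fidelity F (assumed model)."""
    qber = 2 * (1 - F) / 3
    return max(0.0, 1 - 2 * binary_entropy(qber))


def binned_rate(bins):
    """Post-process every bin separately; bins is a list of (probability, fidelity)."""
    return sum(p * bb84_rate(F) for p, F in bins)


def averaged_rate(bins):
    """Ignore the syndrome information and post-process the average state."""
    total = sum(p for p, _ in bins)
    average_fidelity = sum(p * F for p, F in bins) / total
    return total * bb84_rate(average_fidelity)


# Hypothetical example: two equally likely syndromes with different output fidelities.
bins = [(0.5, 0.95), (0.5, 0.75)]
print(binned_rate(bins) > averaged_rate(bins))  # True
print(bb84_rate(0.83), bb84_rate(0.84) > 0)     # 0.0 True
```

In this toy example the low-fidelity bin contributes no key on its own, but mixing it into the average state drags the high-fidelity bin down with it, which is why keeping the bins separate gives the higher rate.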

B. Noisy circuit comparison
The results from the previous section assumed perfect gates and measurements. In practice operations will be noisy, reducing the benefits of distillation. This motivates us to investigate how well our found circuits perform in the case of noise. As a comparison, we use the genetic algorithm tools from [5]. The approach taken there is to represent purification protocols as sequences of gates, permitting, however, only gates that map Bell states to other Bell states. As detailed in the Appendix, that is sufficient to describe the purification protocols considered here and it permits very efficient simulation. Moreover, the simulation can take into account local gate and measurement noise, not only network noise in the initial Bell pairs. Thus, the optimizer, which is a simple genetic algorithm over the sequence of gates, can find circuits more resilient to the imperfections of real hardware.
We note that the framework from [5] explicitly allows for the optimization of circuits in the case of there being a limit on the number of qubits that can be processed simultaneously. Such considerations are especially relevant for distillation on NISQ devices [5], [40]. In the framework considered in the present paper, there is no such restriction. Furthermore, the software from [5] allows for an optimization when considering arbitrary Pauli noise, i.e. it is not restricted to depolarizing noise.
Lastly, the genetic algorithm black-box optimizer needs to be executed for every set of hardware parameters, as different levels of noise might be addressed by different circuits, as seen in Fig. 14.
We model the noise in the circuit by gate and measurement noise. Measurement noise is modeled with a probability $p_m$ of the measurement producing the wrong outcome. Gate noise is included with a two-qubit depolarizing channel with error probability $p_g$. In the simulations, we set $p_g = p_m$ and vary this noise probability parameter between 0.001 and 0.045.
We have applied the heuristics in Section V to find good circuits. We show the circuits we used in Appendix A. In Fig. 13, we show how these circuits behave in the presence of operation noise versus circuits found with the genetic tools of [5], for three different input fidelities $F_{\mathrm{in}}$ of the initial Bell states. Details about how the data is generated can be found in Appendix A.
It is clear from Fig. 13 that the genetic algorithm is more consistent in finding good protocols at $4 \le n \le 7$ than at n = 8 and n = 9. As explained in more detail in Appendix A, we used approximately 12 hours of calculation time for each genetic algorithm data point. We expect that the n = 8 and n = 9 results become more consistent if one increases the calculation time.
Furthermore, for each data point of the black-box method in Fig. 13, we plot a closed marker if the noiseless version of the circuit achieves the same distillation statistics as the protocol that achieves the highest fidelity in the case of no noise. Data points with an open marker have different distillation statistics without operation noise. From the results it becomes clear that, typically, at low $p_g = p_m$, the circuits found with [5] have the same distillation statistics as the best-performing noiseless circuits. At higher $p_g = p_m$, this is typically no longer the case: it is in this regime where the black-box method clearly outperforms the purely theoretical approach. This behaviour is not consistently present for n = 8 and n = 9: it might be that increasing the calculation time will show that protocols with the same distillation statistics as the optimal circuit with no operation noise will also work the best at low $p_g = p_m$ for n = 8 and n = 9.
Fig. 13: Output fidelity when distilling from n to 1 pairs as a function of the gate noise. The input fidelity of the initial states is $F_{\mathrm{in}} = 0.7$ (top row), $F_{\mathrm{in}} = 0.9$ (middle row), and $F_{\mathrm{in}} = 0.95$ (bottom row). We consider two cases: distilling with circuits found using our heuristics ('Heur.') and distilling with circuits found using the software from [5] ('ML'). For the heuristic results, all n to 1 data points for a specific n represent the same circuit: for each n, this circuit can be found in Appendix A. For the 'ML' data, each data point is a specific circuit that came out of the black-box optimizer. For these data, an open (closed) marker indicates that this circuit has different (the same) distillation statistics as the optimal circuit in case of no gate and measurement noise (i.e., as the corresponding circuit of Appendix A).
We now show the results for a 10 to 7 distillation protocol in Fig. 14. For the 10 to 7 protocol we first found the (n, k)-graph that achieves the highest fidelity. Then, we applied random local complementations and edge flips to find an (n, k)-equivalent (n, k)-graph that would yield a low number of two-qubit gates and a small number of keep-gates. We show our found representative and corresponding circuit in Figs. 16 and 17. As before, we find that for significant gate noise (i.e. $p_g = p_m = 0.05$) the black-box method achieves a higher fidelity. Furthermore, for $p_g = p_m = 0.01$ both approaches perform comparably, with the heuristic optimization performing slightly better for lower input fidelities and worse for high input fidelities. We find in particular that the black-box algorithm cannot find the optimal protocol in the case of no noise.

C. Applying a 10 to 7 protocol to the teleportation of encoded states
We now consider the teleportation of logical states between two users Alice and Bob.Teleportation ensures that the states are transmitted unconditionally, and the encoding increases the resilience against The dashed lines do not correspond to a single circuit, rather at each parameter value we optimize a new circuit (e.g., at very high F in the circuit is the trivial no-operation circuit, in order to avoid adding additional noise).For low gate noise parameters, the graph enumeration method discovers the best possible circuits and it outperforms the black-box method.On the other hand, the black-box method performs better in the presence of significant gate noise.
noise.As such, it can form a basis for quantum repeater schemes [36].We emphasize that, unlike the previous subsection, we consider here only the case of no noise on the gates in the circuits.More concretely, Alice first creates a maximally entangled state, after which she encodes it into 2n qubits using an error correction code (n , 1, d ) code.Then, she teleports one half of the state using n bipartite states shared with Bob.Finally, Bob decodes his share of the state.Here, n to k protocols with k = n > 1 could provide a potential benefit over the k = 1 case, through reducing both the resultant infidelity and the number of initial states required.We use our tools for the case of the seven qubit Steane code [41] (i.e.n = 7), for which we have found the n = 10 to k = 7 protocol with the highest fidelity, i.e. the same one found in the previous subsection.
We compare this 10 to 7 protocol with two more standard approaches: seven times the 2 to 1 DEJMPS protocol [3], and seven undistilled pairs. We compare the resultant (in)fidelities for several input fidelities in Fig. 15. We find that for input fidelities greater than ≈ 0.85 the 10 to 7 protocol works best. Furthermore, taking into account the finite success probabilities of these protocols, we find that the 10 to 7 protocol requires fewer states on average than the seven times 2 to 1 protocol for input fidelities greater than ≈ 0.95, demonstrating the benefits of distillation protocols with k > 1.

VIII. CONCLUSIONS
In this work, we used a correspondence between stabilizer codes and bilocal Clifford protocols to reduce the search for distillation protocols to one over graphs. Furthermore, we found a way to map between such graphs and explicit circuits, allowing us to systematically construct distillation circuits with a small number of two-qubit gates and depth.
We have found that there is no distillation protocol (for fixed n and k) that is optimal for a number of relevant quantities at the same time. That is, depending on the quantity of interest and the input fidelity, different distillation protocols turned out to be optimal, highlighting the benefits of a full enumeration. For the teleportation of states encoded in the Steane code [41], we find that using states from a 10 to 7 distillation protocol leads to an increase in fidelity for initial fidelities greater than ≈ 0.85.
Moreover, we have shown that our results compare favorably with numerical optimization methods that explicitly take into account noise.
We have primarily focused here on the case of entanglement distillation. However, due to the correspondence between distillation and error correction, our enumeration can also be of interest for finding better quantum error correction protocols.
Fig. 17: Corresponding circuit constructed from the (n, k)-graph in Fig. 16. This circuit has depth 6 (where each time-step is demarcated by the dashed lines) and 15 two-qubit gates.
Proof. The proof follows the same logic as that in [30], [31]. First, let L^k_{n+1} be an arbitrary transversal of the local equivalence relation on (n + 1, k)-graphs, and choose an arbitrary (n + 1, k)-graph G. From the n + 1 + k vertices of G, choose an arbitrary subset V that excludes exactly one of the output vertices. Since the induced subgraph G[V] is an (n, k)-graph, it is possible to perform local complementations on the vertices in V, together with edge flips on the k input vertices, such that G[V] is equivalent up to an (n, k)-permutation to some representative G' ∈ L^k_n. But then G is equivalent up to an (n, k)-permutation to an extension of G'. A similar argument holds for input extensions, but now an arbitrary (n, k + 1)-graph G is chosen and V is a subset that excludes one input qubit. An input extension of the induced subgraph G[V] is then equivalent up to (n, k)-permutations and edge flips to G.
The main body of this work deals with first-principles, analytical, efficient enumeration of good purification protocols. However, this approach does not automatically provide the best circuit implementing a given protocol, nor does it consider the detrimental effects of imperfect local gates. We used alternative tools in order to study how effective our approach is when considering the aforementioned additional constraints. Namely, we employed a known black-box optimizer for the generation of good noisy purification circuits [5], albeit without optimality guarantees. This black-box optimizer consists of two parts: a noisy entanglement simulator and a genetic optimization algorithm.
The simulator works by restricting the representation of the Bell pairs to only states that can be expressed as density matrices diagonal in the Bell basis. Gates in the purification protocols are simply permutations of the Bell basis and measurements are simply deletion of half of the basis states, thus providing for very efficient simulation (faster than Clifford circuit simulation). Our particular simulator is exponentially costly in the number of Bell pairs due to purely classical reasons: we track all possible correlations between Bell-diagonal states. However, if that becomes a practical problem, a standard classical Monte Carlo approach would be enough to speed up the simulation at a fairly modest cost to the precision of the simulation results (as we do in a yet to be published related work [42]).
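To make the idea of "gates as permutations, measurements as deletions" concrete, the following is a minimal sketch for a single 2 to 1 step with noiseless gates. It is not the simulator of [5]. It assumes each Bell state is labelled by a phase bit a and a parity bit b, with (0, 0) the target state, and that a bilocal CNOT maps the labels (a1, b1, a2, b2) of the control and target pair to (a1 ⊕ a2, b1, a2, b1 ⊕ b2). The function names are hypothetical.

```python
from itertools import product

# Bell-state labels: (phase bit a, parity bit b); (0, 0) is the target state.
BELL_LABELS = list(product((0, 1), repeat=2))


def werner(F):
    """Bell-diagonal distribution with weight F on (0, 0), uniform elsewhere."""
    q = (1.0 - F) / 3.0
    return {label: (F if label == (0, 0) else q) for label in BELL_LABELS}


def bilocal_cnot(joint):
    """Bilocal CNOT as a permutation of the 16 two-pair Bell labels."""
    return {(a1 ^ a2, b1, a2, b1 ^ b2): p
            for (a1, b1, a2, b2), p in joint.items()}


def measure_target(joint):
    """Measure the target pair in Z on both sides, keeping coinciding outcomes.

    This deletes the half of the labels with b2 == 1 and sums over the
    unobserved phase bit a2 of the measured pair.
    """
    kept = {label: 0.0 for label in BELL_LABELS}
    for (a1, b1, a2, b2), p in joint.items():
        if b2 == 0:
            kept[(a1, b1)] += p
    p_success = sum(kept.values())
    return {k: v / p_success for k, v in kept.items()}, p_success


def distill_2_to_1(pair1, pair2):
    """One 2 to 1 step on two independent Bell-diagonal pairs."""
    joint = {(a1, b1, a2, b2): pair1[(a1, b1)] * pair2[(a2, b2)]
             for (a1, b1), (a2, b2) in product(BELL_LABELS, repeat=2)}
    return measure_target(bilocal_cnot(joint))


output_state, p_success = distill_2_to_1(werner(0.9), werner(0.9))
```

The joint distribution over n pairs has 4^n entries, which is the exponential classical cost mentioned above. For two Werner input states this sketch reproduces the familiar recurrence with output fidelity (F² + q²)/(F² + 2Fq + 5q²), where q = (1 − F)/3.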
The genetic algorithm employed for the simulation is fairly conventional: we represent circuits as a sequence of gates. That sequence forms the "genome" of the circuit. Each circuit is an "individual" in a large "population" of circuits. At each iteration of the optimization algorithm we generate "offspring" circuits by randomly mixing up the genome of "parent" circuits. At each iteration we also generate "mutant" circuits by randomly perturbing existing circuits. A random perturbation can be anything from swapping the order of a pair of gates, to changing the parameters of a gate (e.g. a CNOT becomes a CPHASE). This new "generation" of circuits is evaluated and the worst performers are culled. The procedure is repeated until we converge on good circuits, which usually takes a hundred generations and less than an hour on commodity hardware for registers of width under 8 qubits.
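The loop described above (offspring, mutants, evaluation, culling) can be sketched as follows. This is a generic illustration and not the code of [5]: the gate alphabet, the two mutation types, and in particular toy_fitness are placeholders chosen for this example. In the actual optimizer the fitness would be obtained from the noisy simulation of the circuit.

```python
import random

GATES = ("CNOT", "CPHASE", "MEASURE")  # hypothetical gate alphabet


def crossover(rng, mother, father):
    """Offspring genome: a prefix of one parent joined to a suffix of the other."""
    return mother[:rng.randint(0, len(mother))] + father[rng.randint(0, len(father)):]


def mutate(rng, genome):
    """Either swap two gates or change the type of one gate."""
    genome = list(genome)
    if len(genome) >= 2 and rng.random() < 0.5:
        i, j = rng.sample(range(len(genome)), 2)
        genome[i], genome[j] = genome[j], genome[i]
    elif genome:
        genome[rng.randrange(len(genome))] = rng.choice(GATES)
    return genome


def evolve(fitness, population, generations, n_pairs, n_children, seed=0):
    """Run the generate / evaluate / cull loop and return the final population."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        offspring = []
        for _ in range(n_pairs):
            mother, father = rng.sample(population, 2)
            offspring += [crossover(rng, mother, father) for _ in range(n_children)]
        mutants = [mutate(rng, genome) for genome in population]
        # Evaluate everyone and cull the worst performers.
        candidates = population + offspring + mutants
        population = sorted(candidates, key=fitness, reverse=True)[:size]
    return population


def toy_fitness(genome):
    """Placeholder score; NOT a fidelity. Stands in for a simulator call."""
    return genome.count("CNOT") - 0.25 * len(genome)


init_rng = random.Random(1)
start = [[init_rng.choice(GATES) for _ in range(5)] for _ in range(20)]
final = evolve(toy_fitness, start, generations=10, n_pairs=5, n_children=4)
```

Since the parents compete with their offspring and mutants in the culling step, the best score in the population never decreases from one generation to the next in this sketch.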
The only gates permitted in the genome are gates that map "good" Bell pairs to the same Bell pair, but permute the other possible basis states arbitrarily.
In Sec. VII-B and Fig. 13 of the main text, we compare protocols found with our heuristic method to protocols found with the genetic tools of [5] in situations with gate and measurement noise. Here, we will provide details on how the data of Fig. 13 is generated.
Because we wanted to compare our results to the circuits generated with [5] for specific Bell state numbers n, we had to slightly adjust the code of [5]. In creating the new generation of circuits, we introduced a check that made sure that the number of 'raw' (i.e., input) Bell pairs used for the specific individual circuit did not exceed n. This adjustment is very similar to the already existing check in the code that made sure the number of total operations does not exceed a preset number.
To generate the results, we set the number of register qubits of the circuits to n. Strictly speaking, one could also generate circuits for a certain number of input Bell states n with a smaller register, as the circuits re-use measured-out qubits. However, to make sure we would not exclude distillation circuits, we decided to use the maximum register size. For each of the initial individuals of the population, we selected n + 2 random operations. During evolution, we let the number of gates and measurements grow or shrink without restrictions. We made use of a population size of 300 circuits. When creating children, we used 20 random pairs of this population, and generated 100 children for each pair. During mutation, per individual of the population, we generated 2 mutants for each of the 4 different mutant types included in the code.
We let the software generate a maximum of 100 generations, but also, for each data point of Fig. 13, cut off the creation of new generations after 12 hours. If all of the 100 generations were generated before the 12 hour mark, or if the population converged with a smaller number of generations before the 12 hour mark, we started a new iteration of the software with a new random starting population and a different seed. At the end, we selected the best result from all iterations.
We present here some of the circuits found with our optimization.For each n, we selected the circuits based on the output fidelity of the final state at input state fidelity F in = 0.9 and operation noise p g = p m = 0.03.
Fig. 2: Here we show an alternative approach to how one could implement a subset of the stabilizer encodings. That is, first prepare the k qubit input state (the corresponding qubits are called input vertices). Then prepare n output qubits in the |+⟩ state. Then, CZ gates are applied according to some simple graph on n + k vertices, where we distinguish between the input and output vertices. Such objects we call (n, k)-graphs. Then, on the right, the input qubits are measured in the X-basis, initializing the remaining n output qubits in some logical state |ψ_L⟩. When correcting against depolarizing noise, it suffices to consider encodings performed in this way [15]. This thus reduces the optimization to one over (n, k)-graphs. Finally, we reduce the search space even further by showing that (n, k)-graphs that are equivalent under so-called local complementations, edge flips and (in the case of permutationally invariant depolarizing noise) permutations of the input vertices and permutations of the output vertices yield equivalent distillation protocols. We note that the (n, k)-graph formalism can also be used to construct circuits that implement the corresponding distillation protocols/stabilizer codes (not shown in this figure).

Fig. 3: Example of a local complementation on a graph. The local complementation is performed on the encircled vertex. The unconnected edges indicate that the graph shown can be part of a larger graph, that is left unchanged after the local complementation.

Fig. 5: Relation between bilocal Clifford protocols and stabilizer codes, for the specific case of n = 4, k = 2. The left figure corresponds to a two-qubit state |ψ⟩ being encoded into four qubits through a Clifford circuit C. The right figure shows a bilocal Clifford protocol, where C is the same Clifford circuit as in the left. The circuit C^T ⊗ C^† acts on the input state of the distillation protocol.

Fig. 6: Example of an (n, k)-graph, corresponding to the [4, 2, 2] graph code [25]. The two-qubit input to the code is initialized on the diamond vertices, and then measured in the X basis. After (local) corrections depending on the measurement outcomes, the input state is encoded on the remaining four vertices.

Fig. 7: Constructing a circuit from an (n, k)-graph. The CZ gates correspond to the induced subgraph on the output qubits, while the CNOT gates map Z_1 Z_3 Z_4 (the neighbors of the left input qubit) and Z_2 Z_3 (neighbors of the right input qubit) to Z_1 and Z_2.

Fig. 11: Envelope of the achieved fidelity and success probabilities, where we compare post-selecting on detecting all correlated outcomes (solid) with optimizing over all possible corrections after measuring (dotted). The envelope is shown for several input fidelities, and for all n to 1 bilocal Clifford protocols with n = 2, ..., 5.

Fig. 18: Left) An (n, k) = (4, 2) graph on n + k = 4 + 2 vertices. Middle) An extension of the first graph. Right) An input extension of the first graph. Possible edges are indicated by dashed lines. Input vertices are indicated by diamond nodes.
Fig. 14: Black-box optimization of circuits helps in the presence of local errors, but misleads if the local node hardware is perfect. This plot details the performance of circuits obtained from black-box optimization with genetic algorithms (dashed line) versus the performance of the best circuits obtained through graph enumeration (solid line). Output fidelity (vertical axis) is plotted against input fidelity (horizontal axis) at different local gate noise levels (color coded) for the best 10-to-7 purification circuit.
Fig. 15: Resultant infidelity after first teleporting one half of the logical maximally entangled state of the Steane code [41], and then decoding the transmitted state.