Decoding Quantum Error Correction Codes With Local Variation

In this article, we investigate the role of local information in the decoding of the repetition and surface error correction codes for the protection of quantum states. Our key result is an improvement in resource efficiency when local information is taken into account during the decoding process: the code distance associated with a given logical error rate is reduced with a magnitude depending on the proximity of the physical error rate to the accuracy threshold of the code. We also briefly discuss an averaged approach with local information for table lookup and localized decoding schemes, an expected breakdown of these effects for large-scale systems, and the importance of this resource reduction in the near term.


I. INTRODUCTION
It has long been known that quantum information processing devices at any significant scale will face the obstacles of cumulative noise and error [1][2][3][4]. Quantum error correction codes [5][6][7] were developed to overcome these obstacles, at the cost of increased resource (qubit) and time overheads. Even for smaller devices in the near term, partially error-corrected approaches have been proposed to mitigate limiting noise processes [8][9][10][11]. Present devices face tight resource restrictions and error rates comparable to even the largest accuracy thresholds, so it is not (yet) sufficient to treat quantum error correction schemes as if their choice were agnostic with respect to the underlying technology or application: we must consider all idiosyncrasies and constraints before us. The constraints of most physical systems mean that the family of topological quantum error correction codes seems most promising for any near- to mid-term development, having three characteristic advantages: large accuracy thresholds, small correction circuits, and local interactions. The surface code [12][13][14], for example, requires only nearest-neighbour interactions. Conversely, many codes without such local constraints, such as Shor's code [15] and other concatenated codes [16], are simply out of reach for many real physical systems.
Just as the form of interaction varies according to our choice of physical system, so too does the form of the noise and error we confront; the very earliest proposals for quantum error correction in fact relied on error detection schemes [17][18][19][20], under the assumption that the error of the state was subject to the quantum Zeno effect [21]. Standard models did eventually settle on the depolarising noise channel [22], but even then parallel streams of development emerged to deal with quantum channels for which depolarising noise was insufficient, such as loss channels [23]. In the last decade we have seen a plethora of results looking at different noise models, verifying the performance of the codes under such models and asking what modifications, if any, might be made to improve performance. Early examples focussed on the tendency for errors to be highly biased toward one particular basis (such as dephasing) [24][25][26]. Investigations of qubit loss [27][28][29][30][31], amplitude damping [32,33], correlation [34][35][36][37][38][39], and qubit leakage [40,41] have since been undertaken.
In this work we focus not on qualitatively distinct channel behaviour, such as loss or amplitude damping, but on local variation in a standard depolarising noise channel. Specifically, we assume that the measurement outcomes associated with each stabiliser operation may be distinct with respect to their information content. This variability will not be the result of any changing external influence, but inherent in the information content associated with the measurement outcomes themselves. Any multi-shot [42] or long-time count-threshold [43] measurement scheme in the presence of error is expected to display such local variation. We perform pseudo-threshold simulations for the repetition and surface codes, comparing the standard, fixed-error-rate phenomenological error model with the case of an error rate drawn from a discrete, balanced, two-component distribution D of equal mean p_µ but a fixed relative width σ (Equation (1)), where δ[•] is the delta function and p_µ will serve as the phenomenological physical error rate [44,45] in addition to the mean measurement error rate. This toy distribution is chosen to maximise the contrast between different sites, to accentuate the effects of the variability, and because the kind of feedback in measurement that we envisage is expected to result in a discrete distribution (as opposed to continuous alternatives such as the normal distribution). We also introduce two approximate measures consistent with the simulated results to extrapolate the significance of variation for larger codes and higher dimensions. Our paper is organised as follows: In Section II we describe and justify the approximate measures we introduce. Section III then defines and takes the repetition code as an exemplar of the significance of local variation for increasing code distance, while Section IV extends the analysis to the surface code for comparative inference about the behaviour of codes in higher dimensions. In Section V we summarise our results and discuss a potential generalisation for alternative decoding schemes before concluding.
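As a minimal sketch of how such a two-component distribution might be sampled, the Python snippet below draws local error rates with mean p_µ. The placement of the two components at p_µ(1 − σ) and p_µ(1 + σ) is an assumption made here for illustration, as are the function name and the sample count; the text above fixes only the mean and the relative width.

```python
import random

def sample_local_rate(p_mean, sigma, rng=random):
    """Draw one local error rate from a balanced two-point distribution.

    Assumption (not taken from the text): the two components sit at
    p_mean * (1 - sigma) and p_mean * (1 + sigma), each with weight 1/2.
    """
    low = p_mean * (1.0 - sigma)
    high = p_mean * (1.0 + sigma)
    return low if rng.random() < 0.5 else high

rng = random.Random(1)
rates = [sample_local_rate(0.07, 0.5, rng) for _ in range(10_000)]
# Only two distinct values appear, and their sample mean sits close to p_mean.
print(sorted(set(rates)), sum(rates) / len(rates))
```

Under this assumed placement the mean is preserved exactly for any σ, so a decoder given only p_µ and a decoder given the local rates see the same average error rate.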

II. QUANTIFYING SIGNIFICANCE
It is important to investigate the impact of local measurement variability (σ, for our bimodal model) on the error rate as a function of the code size and structure. This will be addressed in two ways: Firstly, numerical pseudo-threshold simulations will be performed for the repetition and surface codes, allowing us to compare the logical error rate between these two codes and across a range of code distances and local error rates. Secondly, we explain the observed numerical behaviour by modelling the probability that local variance allows an error chain of linear dimension L/2 to be less likely than one of linear dimension L/2 + 1, with L the code distance. Let us first consider chains of adjacent lengths to investigate the transition point at which local information becomes useful; modifications to chains resulting in the same logical state are of at least second order in the link probability. We observe that the exponential suppression of a chain's probability with length means that local information will have the most impact when two options are close to one another in length. A chain's length is modelled as a number of successful Bernoulli trials, since a chain need not be contiguous along a given dimension of the lattice to cause a logical error. We expect chains of length L/2 will have a lesser share of the tail of this distribution if they are far above the mean number of errors Q·p_µ, where Q is the total number of qubits and p_µ is the mean error rate per qubit (taken to be equal to the corresponding term for measurement error in Equation (1)). As the code distance increases a greater fraction of logical errors should be caused by chains deviating from the half-code-distance, so that our focus on adjacent chain lengths is more valid for small code distances. Take P_{=L/2} to be the probability of sampling an error chain of length L/2 from L qubits via the binomial distribution, and P_{≥L/2} to be the probability of sampling an error chain of length greater than or equal to L/2, $P_{=L/2} = \binom{L}{L/2} p_\mu^{L/2} (1 - p_\mu)^{L/2}$ and $P_{\geq L/2} = \sum_{k=L/2}^{L} \binom{L}{k} p_\mu^{k} (1 - p_\mu)^{L-k}$. Along a single dimension, we can then justify this assertion by explicitly computing the ratio of these two probabilities, $R = P_{=L/2} / P_{\geq L/2}$. The behaviour of this ratio with increasing L and fixed p_µ = 0.1 is shown in Figure 1. Over the range of code distances considered in this paper, approximations of relative likelihood based on adjacent-length error chains should correspond well to true behaviour; this will be significant in Subsections III B and IV A, where we develop an intuition and attempt an explanation of our observed numerical results.
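The ratio described above can be evaluated directly from the binomial distribution. The following Python sketch does so for a few odd code distances at p_µ = 0.1; rounding L/2 up to the next integer for odd L is an assumption made here, and the function name is illustrative only.

```python
from math import comb

def tail_ratio(L, p):
    """R = P(exactly k errors) / P(at least k errors) for k = ceil(L / 2).

    Errors on the L qubits are treated as independent Bernoulli trials
    with probability p. Rounding L / 2 up is an assumption for odd L.
    """
    k = (L + 1) // 2

    def pmf(j):
        return comb(L, j) * p**j * (1 - p) ** (L - j)

    return pmf(k) / sum(pmf(j) for j in range(k, L + 1))

for L in (9, 15, 21, 31):
    print(L, tail_ratio(L, 0.1))
```

A value of R close to 1 indicates that chains of the minimum failing length dominate the tail, which is the regime in which comparing adjacent chain lengths is a reasonable proxy.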

III. THE REPETITION CODE AND CHAIN LENGTH
The repetition code, depicted in Figure 2, is defined by mapping qubit subsystems and operations to a 1 × L chain. It is essentially a classical code, but may nonetheless be used to partially protect quantum information and is useful when the limiting source of error is highly biased along a single dimension. The repetition code embeds one bit within the +1 eigenspace of parity operators Ŝ_X(v) acting on adjacent bits in this 1-dimensional chain, $\hat{S}_X(v) = \prod_{e \,:\, v \in \partial e} \sigma_x^{(e)}$, (5) where v are vertices, e are edges, ∂ denotes the boundary, and σ_x^{(e)} is the application of the Pauli X matrix to the qubit represented by edge e. Vertices of degree 1 are excluded. Equation (5) uses a common shorthand notation for operators that are sparse with respect to the set of qubit subsystems, ignoring the order of, and trivial elements in, the tensor product in favour of superscripts.

FIG. 2. A graphical representation of the repetition code. Edges represent qubits, while nodes represent σ_x^{(i)} σ_x^{(i+1)} parity (stabiliser) operations between adjacent qubits. Errors in a basis orthogonal to the parity operations are detectable.
A single local operation on any bit in the basis protected by the code (the basis orthogonal to the parity check operators) will be detected by measurement of the parity operators and may be corrected so long as the number of such errors is less than half the length of the chain. Measurement errors are incorporated by repeating parity measurements, with the effect of extending the lattice of the code into a second dimension [14,46]. The probability that accumulated error after the total set of such measurement rounds cannot be corrected is called the logical error rate. We restrict our attention to the phenomenological error model for the duration of this report; in this model individual qubit and measurement error rates are defined per measurement round, rather than per gate, and are associated with lattice edges.
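The half-length correction condition can be illustrated with a small Monte Carlo sketch in Python. This is a simplification and not the simulation used for the results in this paper: parity measurements are assumed perfect, so a trial fails exactly when more than half of the L bits are flipped. All names and the trial count are illustrative.

```python
import random

def logical_failure(L, p, rng):
    """One trial of an L-bit repetition code with perfect parity checks.

    Minimum-weight correction flips the smaller of the two error patterns
    consistent with the syndrome, so it fails when more than half of the
    bits carry an error.
    """
    flips = sum(rng.random() < p for _ in range(L))
    return flips > L // 2

def logical_error_rate(L, p, trials, seed=0):
    """Fraction of trials ending in a logical error."""
    rng = random.Random(seed)
    return sum(logical_failure(L, p, rng) for _ in range(trials)) / trials

for L in (3, 5, 9):
    print(L, logical_error_rate(L, 0.07, 50_000))
```

Including faulty measurements requires tracking syndromes over repeated rounds and matching them in the resulting two-dimensional lattice, which this sketch deliberately omits.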

A. Time-Constant Error Rates
We begin by considering the simplest case where the measurement error varies spatially between sites, but is constant at each site in time; this form of error we call 'static'. This is in contrast to the runtime-error to be considered in the following sections, but given the near-universality of static inhomogeneity in quantum devices, and given the computational costs of real-time decoding as systems increase in size, some distinct emphasis on this form of variation is thought useful. Inhomogeneity in detector efficiency is very common. Ranges for system detection efficiencies appear to be of order 10%, so even once we get the mean values down toward our target threshold, we expect significant remaining spreads.
As a simple demonstration, we take the repetition code with imperfect measurements under an error model equivalent to the phenomenological error model of the surface code [44,45]. The minimum distance between two points is found for this case by taking the horizontal Manhattan distance [47] according to site indices, and then performing a minimisation over vertical-edge weights in the horizontal region bounded by the two points for the corresponding vertical Manhattan distance. This is in contrast to later sections where variation in time makes a full evaluation of the minimum distance across the lattice necessary. A single boolean variable to represent the parity of a qubit at the far left edge of the lattice is maintained, as is a row of 2-bit values to track the evolution of parity measurement outcomes, with the indices of odd parity outcomes passed to an extensible list in an online fashion. Correction is performed only on the tracked left-hand qubit, and is determined as the parity of the number of connections between internal lattice sites and the left lattice boundary. Assuming a final round of perfect measurement to close a single trial, the final state of this tracked qubit then records the presence or absence of logical error. Results for 10^6 trials, for σ ∈ {0.1, ..., 0.5}, L ∈ {9, ..., 31} and p_µ = 0.07 are displayed in Figure 3. We find that improvements on the order of 10% are observed for relative widths of order 0.4-0.5, and that these improvements appear to be increasing with code distance [48].
FIG. 3. Relative error rates for the repetition code under time-constant, but space-variable measurement error as described in Section III A. σ ∈ {0.1, 0.2, ..., 0.5} (red, green, blue, black, yellow) and L ∈ {9, 11, ..., 31}. p_µ = 0.07 was the physical error rate as well as the mean measurement error rate. 10^6 trials were used per point.
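The static minimum-distance rule described in this subsection can be sketched in Python as follows. The log-likelihood edge weight ln((1 − p)/p) is a common choice for matching decoders and is assumed here, since the weighting is not stated explicitly; the function names and the representation of syndrome points as (site, round) pairs are likewise illustrative.

```python
import math

def edge_weight(p):
    """Assumed log-likelihood weight of an edge failing with probability p."""
    return math.log((1.0 - p) / p)

def static_distance(a, b, meas_rates, p_qubit):
    """Weighted distance between syndrome points a and b, each (site, round).

    meas_rates[i] is the time-constant measurement error rate at parity
    site i; p_qubit is the uniform physical error rate on horizontal edges.
    """
    (xa, ta), (xb, tb) = a, b
    lo, hi = min(xa, xb), max(xa, xb)
    horizontal = (hi - lo) * edge_weight(p_qubit)
    # Vertical steps are taken at the cheapest site between the two points.
    cheapest_vertical = min(edge_weight(p) for p in meas_rates[lo:hi + 1])
    return horizontal + abs(ta - tb) * cheapest_vertical

rates = [0.035, 0.105, 0.035, 0.105]
print(static_distance((0, 0), (2, 3), rates, 0.07))
```

Because the rates do not change in time, the vertical part of the path can be routed entirely through the single least reliable measurement site in range, which is why no full shortest-path search is needed in the static case.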

B. The Impact of Local Variance
Let us now move away from the time-constant case to more general local error rates. The variance in the total weight of a sampled error chain, as its length increases, depends upon the assumed local distribution. It is not the absolute variance that is important, since this will be suppressed for longer chains, but rather the variance relative to the chains' mean weight; the variance must offset the effect of the additional multiplicative factor associated with incrementing the length of the chain. For the approximate measure defined in Section II, we will look at two distributions: the uniform distribution, 1/(b − a) on the interval [a, b], and the discrete, balanced, two-component distribution, δ(x − a)/2 + δ(x − b)/2. We calculate the ratio between the standard deviation of the weight of a chain of length L/2 and the difference between the mean probabilities of chains of lengths L/2 and L/2 + 1. The resultant product distributions are not normally distributed, so the standard deviation provides only a rough characterisation of the width. Without analytic formulae for the sample-product distributions, we compute the considered ratio numerically via random sampling. The results are shown in Figure 4, where the ratio is found to increase exponentially with the chain length. Local variability is therefore expected to become more significant as the code distance increases, and this is reflected in the results of our pseudo-threshold simulations, shown in Figure 5. This behaviour necessarily results in a slight upward shift in the accuracy threshold.
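A reduced version of this sampling procedure might look as follows in Python. The parameters a = 0.05 and b = 0.15 are those quoted for Figure 4; the sample count is cut far below the 10^7 used there so that the sketch runs quickly, and all function names are illustrative.

```python
import random
import statistics

def chain_probability(length, draw, rng):
    """Product of independently drawn link probabilities for one chain."""
    prob = 1.0
    for _ in range(length):
        prob *= draw(rng)
    return prob

def width_ratio(length, draw, samples, rng):
    """Sampled std of P(length) over mean P(length) minus mean P(length + 1)."""
    short = [chain_probability(length, draw, rng) for _ in range(samples)]
    longer = [chain_probability(length + 1, draw, rng) for _ in range(samples)]
    gap = statistics.fmean(short) - statistics.fmean(longer)
    return statistics.pstdev(short) / gap

A, B = 0.05, 0.15  # parameters quoted for Figure 4

def two_point(rng):
    return A if rng.random() < 0.5 else B

def uniform(rng):
    return rng.uniform(A, B)

rng = random.Random(0)
for n in (2, 4, 6, 8):
    print(n, width_ratio(n, two_point, 20_000, rng), width_ratio(n, uniform, 20_000, rng))
```

With so few samples the estimates are noisy at longer lengths, which mirrors the growing uncertainty noted for Figure 4.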
The repetition code is quite a simple code and very useful for explaining the local variance issue here. However, while the repetition code may be used for quantum sensing and in near-term biased-noise applications, we would also like to know how general this issue is for larger scale quantum computation. Let us now examine the surface code, one of the leading error correction codes being considered for large scale quantum computation [49].

IV. THE SURFACE CODE AND CHAIN ENTROPY
The surface code is defined by mapping qubits and operations to an l × m rectangular lattice [12][13][14]. Edges of this lattice represent qubits, while faces and vertices represent measurements of parity operators in the Z and X bases respectively (the bases are arbitrary, but must be orthogonal). These measurements are defined by Pauli-operator products acting non-trivially on qubits (edges), $\hat{S}_X(v) = \prod_{e \,:\, v \in \partial e} \sigma_x^{(e)}$ and $\hat{S}_Z(f) = \prod_{e \in \partial f} \sigma_z^{(e)}$, as represented graphically in Figure 6. Here v are vertices, e are edges, f are faces, and ∂ denotes the boundary. Vertices of degree 1 are excluded. The set of these measured operators generates the stabiliser group, S [53].
Elements of this stabiliser group commute with all logical operations and therefore preserve the subspace in which the logical qubit is encoded. We require that our system exists in the +1 eigenspace of the stabiliser group. By then ensuring that there is exactly one more physical qubit than there are generators of this group, we restrict the total space of our system to a two-dimensional subspace within which we can define a logical qubit. At the boundaries of the surface code, faces and vertices need not have the full complement of four adjacent edges. If a boundary consists of three-edge vertices, it is called smooth, while if it consists of three-edge faces, it is called rough. For the identification we have chosen, X-basis (Z-basis) operations on vertices (faces), a contiguous chain of Pauli σ_z (σ_x) errors with both end-points at a rough (smooth) boundary will be undetectable. If both ends of the chain meet a single, contiguous such boundary, then the chain is equivalent to the application of a stabilising operation and therefore acts trivially on the logical qubit. On the other hand, if such a chain has its end-points at two non-contiguous such boundaries, then there is no equivalent stabilising operation and the chain is by definition a logical operation. A logical operation is only unique up to elements of the stabiliser group, and is in this sense equivalent to any string of single-qubit operations stretching between its two boundaries, though canonical representatives σ_x^{(L)} and σ_z^{(L)} are usually defined as in Equation (10), where the qubits are designated on the lattice by the two-dimensional indices (i, j).
A surface code on an L × L lattice encodes at most 1 logical qubit and has a code distance of L. The code distance indicates that states in the code space are topologically separated by L local qubit operations. As in Section III, parity measurements are repeated to account for faulty measurements, extending the code into a third (time) dimension.

A. Lattice Dimension
The discussion in Section III assumed a simple, one-dimensional repetition code. With the surface code as a point of comparison, we can now discuss the effect of local variation in higher dimensions. Since we are assuming that physical errors occur at a constant rate, the impact of variation will be affected by the fraction of links in a given error chain corresponding to measurement error. Increasing the dimension of the code will decrease this fraction. However, extending the lattice along an additional dimension also increases the number of qubits as well as the multiplicity of equivalent error chains: the effect of moving to higher dimensions is not trivially apparent.
The number of direct paths connecting two vertices in an m-dimensional lattice, when these points are separated by an equal number of links, n, along each dimension, is $(mn)!/(n!)^m$. More generally, when points are separated by a number of steps d_i in dimension i, the number of direct paths is $(\sum_i d_i)!/\prod_i (d_i!)$; here we assume an average symmetry on the grounds of equal mean error rates p_µ. The assumption that the most likely error chain can be used as a proxy for the most likely error class relies on the condition that the exponential suppression in likelihood with length overcomes the additional entropic contribution from the increase in the number of chains. Approximating this as $\frac{(mn)!}{(n!)^m}\, p^{mn} \geq \frac{(m(n+1))!}{((n+1)!)^m}\, p^{m(n+1)}$, (11) and using Stirling's approximation $n! \approx \sqrt{2\pi n}\,(n/e)^n$, we find that p must satisfy an upper bound that approaches the finite value p_Critical = 1/m as n increases, and remains larger than the accuracy thresholds of the surface code variants known at low dimensions. The accuracy threshold of the code p_th indicates the regime in which it is likely to operate, at least in the near term. Taking the ratio between the accuracy threshold of the code p_th and this critical probability p_Critical gives us a measure indicating the relative impact of the diminishing probability of a chain as against the increasing multiplicity of its class. For the repetition code of Section III we find that p_th/p_Critical ≈ 0.2, while for the surface code we have p_th/p_Critical ≈ 0.09. Here lower values are more significant. We conjecture that, given the dominance of direct error paths as indicated by our discussion in Section II, the approximate factor of 2 separating these ratios represents the relative significance of variation in the probability of a single error chain; this would be consistent with the approximate factor of 2 between the relative improvements found for the sampled results shown in Figures 5 and 7.
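The path counting and the limiting behaviour of the critical probability can be checked numerically. The Python sketch below solves Equation (11) for p exactly, using factorials rather than Stirling's approximation, and shows the bound approaching 1/m; the function names are illustrative.

```python
from math import factorial, prod

def direct_paths(steps):
    """Number of shortest lattice paths for separations steps[i] per dimension."""
    return factorial(sum(steps)) // prod(factorial(d) for d in steps)

def critical_probability(m, n):
    """Largest p satisfying Equation (11) for an m-dimensional lattice.

    Rearranging (11) gives p**m <= N(n) / N(n + 1), where N(n) counts
    the direct paths with n steps along each of the m dimensions.
    """
    ratio = direct_paths([n] * m) / direct_paths([n + 1] * m)
    return ratio ** (1.0 / m)

for m in (2, 3):
    print(m, [critical_probability(m, n) for n in (1, 5, 50)], 1.0 / m)
```

For m = 2 the exact bound is sqrt((n + 1) / (2(2n + 1))), which decreases toward 1/2 from above as n grows, in line with the limit quoted in the text.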
To state this another way, while the accuracy threshold is related to a distance from the point of zero-returns, we believe this critical probability indicates a gradient of logical error with code distance.

V. DISCUSSION
In this work we have performed pseudo-threshold simulations using minimum-weight perfect matching and Kolmogorov's Blossom V algorithm [51,52], and have introduced two intuitive but approximate measures of qualitative, predictive utility. Our results show that accounting for local variability in measurement errors can reduce logical error rates by factors of order 30%, and also show evidence that this reduction increases for higher code distances and dimensions, under the minimum-weight perfect matching decoder. There are two intuitive explanations for this behaviour: firstly, it is known that the performance of these codes under loss (which can be modelled as perfect mixing of a known subset of qubits) exceeds their performance under the depolarising channel, and we therefore expect some advantage in the spectrum between these two extremes; secondly, the gradient of the curve for logical error rate is not constant, and therefore it is not surprising that we achieve some advantage by spreading the physical error rates across this curve. In light of these points, it is not the presence but the magnitude of the observed advantage to which we would like to draw the reader's attention.
The minimum-weight perfect matching decoder may run into difficulties at higher code distances when the weight of each chain is allowed to vary. The increase in the variance relative to the weight of the chain that we observed in Figure 4 indicates that the most likely single chain becomes less representative of its entire class as the length increases. At the same time, the variance of the entire class will itself increase; individual error chains will become less significant but variability in the set of such chains should become more useful. However, the number of chains in a set increases rapidly with the length (see Equation (11)); when the distance between syndrome points is large, the inefficiency in classical processing required to account for the full class becomes prohibitive. An online treatment of local variability therefore seems applicable only in small- to mid-level applications.
FIG. 7. (Top) Sampled logical error rates for the surface code over the code distances 15, 17, and 19, as a function of the mean physical error rate p_µ. Two sets of series are shown: the error rates when the mean error probability p_µ is used for decoding (light) and those when local variation is incorporated (dark). For the latter series, the relative local width σ is 0.5. Also shown is the line of equality between the two axes (grey, dashed). Error bars denote 3 standard deviations from the mean, calculated according to the Wilson Score [50]. (Bottom) Relative change in the logical error rates when local information is incorporated, at a mean error rate p_µ = 0.024, as a function of code distances between 11 and 19 and for a relative width of local variation σ = 0.5. The dashed line is added to guide the eye. Standard estimates of sample error do not apply to this ratio distribution, but it is derived from points similar to those in the top section of this figure, for which error bars are shown. Each point in either graph is the result of 10^5 trials, decoded using Kolmogorov's Blossom V algorithm [51,52] for minimum-weight perfect matching.
Beyond small- to mid-level codes, the computational cost of the minimum-weight perfect matching decoder motivates the use of alternative decoding schemes. Belief propagation [54] would be one approach to incorporate local variation without having to consider the global lattice. By contrast, the renormalisation group decoder of Cianci et al. [55][56][57] is another popular alternative but relies on pre-computed local tables. While this decoder is important because it allows us to parallelise the classical processing involved in decoding, it does not allow real-time local feedback. However, as the probability over the links of an error chain is multiplicative, the appropriate mean is geometric: the mean probability of a chain of fixed length should decrease as the variance of individual links increases. For some more involved error interdependence, corresponding maps from length to probability can be imagined. The prior distribution, through its variance, therefore has a direct macroscopic impact on the logical error rate and can be accounted for even in alternative decoding schemes using pre-computed tables. Additionally, we could consider qubit rotations subject to random analogue rotation errors, without the local feedback from measurement.
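The effect of link variance on the geometric mean can be made concrete with a short Python sketch. It assumes, for illustration only, a balanced two-point distribution with components at p_µ(1 − σ) and p_µ(1 + σ); the function names and the chain length of 5 are hypothetical choices.

```python
import math

def geometric_mean(p_mean, sigma):
    """Geometric mean of a balanced two-point distribution.

    Assumption: components at p_mean * (1 - sigma) and p_mean * (1 + sigma),
    so the result equals p_mean * sqrt(1 - sigma**2).
    """
    low, high = p_mean * (1.0 - sigma), p_mean * (1.0 + sigma)
    return math.sqrt(low * high)

def typical_chain_probability(p_mean, sigma, length):
    """Chain probability obtained from the geometric-mean link probability."""
    return geometric_mean(p_mean, sigma) ** length

for sigma in (0.0, 0.1, 0.3, 0.5):
    print(sigma, geometric_mean(0.07, sigma), typical_chain_probability(0.07, sigma, 5))
```

A table-based decoder could, in principle, use such a variance-adjusted link probability in place of p_µ when its tables are pre-computed, without needing any run-time feedback.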
Finally, we note that the impact of measurement error is magnified for measurement-based quantum computing, for which it is involved in every operation, and for distributed schemes, wherein heralded transmission losses allow a distinct source of local information about operation error rates. We expect local variability to provide large effective reductions in resource requirements in the near term, as resources are severely limited and gate error rates for many systems remain at or near the surface code accuracy threshold of ∼1%.
FIG. 4. The ratio between the standard deviation of a product distribution associated with a chain of length L and the mean difference in probabilities between chains of lengths L and L + 1, plotted as a function of the length L. 10^7 samples are taken for each point, and increasing variability among samples manifests in visibly increasing uncertainty in the ratio as the lengths increase. The ratio is found to increase exponentially with the lengths of the chains, and may therefore alter corrections inferred from comparisons between error chains of differing length, even with larger code distances for which the fraction R of Figure 1 declines. (Blue) A discrete, balanced, two-component distribution (δ(x − a)/2 + δ(x − b)/2) with parameters a = 0.05 and b = 0.15. (Orange) A uniform distribution (1/(b − a)) with the same parameters.

FIG. 5. (Top) Sampled logical error rates for the repetition code over the code distances 9, 23, 27, and 31, as a function of the mean physical error rate p_µ. Two sets of series are shown: the error rates when the mean error probability p_µ is used for decoding (light) and those when local variation is incorporated (dark). For the latter series, the relative local width σ is 0.5. Also shown is the line of equality between the two axes (grey, dashed). Error bars denote 3 standard deviations from the mean, calculated according to the Wilson Score [50]. (Bottom) Relative change in the logical error rates when local information is incorporated, at a mean error rate p_µ = 0.091, as a function of code distances between 9 and 31 and for relative widths of local variation σ between 0.1 and 0.5. Dashed lines are added to guide the eye. Standard estimates of sample error do not apply to this ratio distribution, but it is derived from points similar to those in the top section of this figure, for which error bars are shown. Each point in either graph is the result of 10^5 trials, decoded using Kolmogorov's Blossom V algorithm [51,52] for minimum-weight perfect matching.
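For reference, error bars of the kind quoted in the caption above can be obtained from the standard Wilson score interval for a binomial proportion. The Python sketch below is an illustration rather than the code used to produce the figures; the function name is hypothetical, and z = 3 is taken from the stated 3 standard deviations.

```python
import math

def wilson_interval(failures, trials, z=3.0):
    """Wilson score interval for a binomial proportion at z standard deviations."""
    p_hat = failures / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    spread = math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
    half_width = z * spread / denom
    return centre - half_width, centre + half_width

# Example: 500 logical errors observed in 10**5 trials.
print(wilson_interval(500, 10**5))
```

Unlike the simple normal approximation, this interval stays within [0, 1] and remains informative when very few logical errors are observed, which matters at low physical error rates.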
FIG. 6. Illustration of a length-7 surface code. Dotted edges correspond to qubits in their initial state. An X-basis measurement (at a vertex) detects local σ_z operations and vice versa. Logical σ_x^{(L)} operations stretch from the left side of the lattice to the right, while logical σ_z^{(L)} operations stretch from the top edge of the lattice to the bottom. A sample error syndrome is shown; blue (red, purple) edges correspond to qubits following a local σ_z (σ_x, iσ_y) operation.
Sampled logical error rates for the surface code over the code distances 15, 17, and 19, as a function of the mean physical error rate pµ.Two sets of series are