$O(N^3)$ Measurement Cost for Variational Quantum Eigensolver on Molecular Hamiltonians

Variational Quantum Eigensolver (VQE) is a promising algorithm for near-term quantum machines. It can be used to estimate the ground state energy of a molecule by performing separate measurements of $O(N^4)$ terms. Several recent papers observed that this scaling may be reducible to $O(N^3)$ by partitioning the terms into linear-sized commuting families that can be measured simultaneously. We confirm these empirical observations by studying the MIN-COMMUTING-PARTITION problem at the level of the fermionic Hamiltonian and its encoding into qubits. Moreover, we provide a fast, pre-computable procedure for creating linearly-sized commuting partitions by solving a round-robin scheduling problem via flow networks.

I. INTRODUCTION
Variational Quantum Eigensolver (VQE) [1] is a quantum algorithm that is a leading contender, if not the top contender, for demonstrating a practical quantum advantage on near-term machines. Unlike traditional quantum algorithms, which have extremely high quantum requirements in terms of gate counts and qubit lifetimes, VQE is feasible with modest quantum resources that are already available on current quantum computers. It attains a lower quantum resource cost in part by structuring computation over a large number of subproblems, each of which can be performed on a quantum computer with modest capabilities.
While the low quantum resource requirements per subproblem are appealing, the number of subproblems is an issue for practical application of VQE. Consider molecular ground state estimation, a classically-hard problem that is considered the canonical application of VQE. Within the framework of VQE, molecular energy estimation is performed by applying linearity of expectation to the Hamiltonian $H$, an observable that captures a molecule's energy configuration. Under the second quantization and expressed in fermionic form, we have [2]:
$$H = \sum_{p,q} h_{pq}\, a_p^\dagger a_q + \sum_{p,q,r,s} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s \tag{1}$$
Applying linearity of expectation, we see that measuring $H$ reduces to measuring the individual $a_p^\dagger a_q$ and $a_p^\dagger a_q^\dagger a_r a_s$ terms. Each of these $O(N^4)$ terms is transformed via a fermion-to-qubit encoding into a sum over a constant number of Pauli strings ($N$-fold tensor products of Pauli matrices). Measurement of each of these resulting $O(N^4)$ Pauli strings constitutes a subproblem. Although the measurement for each subproblem is simple, requiring only single-qubit rotations, the $O(N^4)$ scaling of subproblems poses a practical challenge towards applying VQE to molecules of interest such as caffeine and cholesterol, which appear to require $N$ in the hundreds of qubits [3].
* pranavgokhale@uchicago.edu
Recently, however, several research groups observed that this $O(N^4)$ scaling may be reducible to $O(N^3)$ [4][5][6][7][8]. The core principle underlying these papers is that commuting Pauli strings can be measured simultaneously. The $O(N^4) \to O(N^3)$ improvement is conjectured based on extrapolation of results across a range of molecules.
Here, we confirm this observation of linearly-reduced measurement cost for molecular Hamiltonians encoded under Jordan-Wigner, the most widely used encoding [9]. Our general approach is to demonstrate that the molecular Hamiltonians can always be partitioned into pairwise-commuting families where each family contains $O(N)$ terms. Since the terms in each such family can be measured simultaneously, this constitutes our reduction in the measurement cost of VQE from $O(N^4)$ to $O(N^3)$.
In addition to proving the existence of such a partition, we explicitly demonstrate how to construct it. Our construction is efficient, computable in $O(N^5 \log N)$ time. Moreover, the construction is independent of the specific molecular Hamiltonian of interest and instead depends only on $N$. This means that the partitioning can be pre-computed once for each $N$. The efficiency of our approach is critical: proposals for simultaneous measurement in recent prior work have involved algorithms with runtimes as high as $O(N^{12})$, which may be slow enough to undermine the advantage of simultaneous measurement.

II. PRIOR WORK
The empirical results in [4][5][6][7][8] all suggest that the advantage due to simultaneous measurement appears to increase for larger molecules. The specificity of this claim varies across the papers: [4] explicitly extrapolates linear scaling for molecular Hamiltonians over a range of encodings, molecules, and active space sizes; [5] formulates it as an explicit conjecture for "almost all" sets of Pauli strings; [6,7] observe this scaling via least-squares fitting for molecular Hamiltonians under the Jordan-Wigner and Bravyi-Kitaev qubit encodings; and [8] makes note of increasing partition size with increasing $N$.
arXiv:1908.11857v1 [quant-ph] 30 Aug 2019
Moreover, [4, Section 5.1] provides two encouraging examples of an asymptotic gain from simultaneous measurement for specific types of contrived Hamiltonians. First, it is observed that simultaneous measurement can yield an exponential gain: the $2^N$ Pauli strings with the same underlying measurement basis across all qubits can be simultaneously measured with a single measurement. Second, in the case of measuring all $4^N$ Pauli strings on $N$ qubits, a square-root ($2^N$-fold) reduction is achievable by Mutually Unbiased Bases.
However, as suggested by [4, Appendix A], an asymptotic gain from simultaneous measurement is not guaranteed. For example, consider the set of $2N$ Pauli strings matching the pattern Z*(X|Y)I*, where * matches 0 or more occurrences and | is a Boolean OR. For $N = 3$, this set is [XII, YII, ZXI, ZYI, ZZX, ZZY]. It can be shown that no pair in this set commutes. Thus, simultaneous measurement offers no advantage for this set of Pauli strings. More generally, we see that simultaneous measurement does not automatically confer any advantage.
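The claim that no pair in this set commutes can be checked by brute force. The following is a minimal sketch, not code from the cited works: the helper names `anticommuting_family` and `commute` are hypothetical, and Pauli strings are represented as plain Python strings over the characters I, X, Y, Z.

```python
from itertools import combinations

def anticommuting_family(n):
    """Return the 2n Pauli strings matching Z*(X|Y)I* on n qubits."""
    strings = []
    for k in range(n):
        for pauli in "XY":
            strings.append("Z" * k + pauli + "I" * (n - k - 1))
    return strings

def commute(a, b):
    """Two Pauli strings commute iff they anti-commute on an even number of positions."""
    clashes = sum(1 for x, y in zip(a, b) if x != "I" and y != "I" and x != y)
    return clashes % 2 == 0

family = anticommuting_family(3)
print(family)
# Report whether any pair in the family commutes
print(any(commute(a, b) for a, b in combinations(family, 2)))
```

Each pair clashes on exactly one position (the position of the X or Y in the string with the shorter run of Z's), which is why the parity is always odd.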
During the preparation of this manuscript, we became aware of very recent work by [10] that also proves the $O(N^3)$ measurement cost for molecular Hamiltonians. Their work approaches the problem via Majorana operators, which leads to a proof agnostic of the underlying fermion-to-qubit encoding.

III. COMMUTATIVITY OF INDEX-DISJOINT TERMS
Our top-level goal is to partition the molecular Hamiltonian into commuting families, such that the number of families is minimized. This problem is termed MIN-COMMUTING-PARTITION and is NP-Hard in general [4]. We instead seek to approximate a good partitioning. Our approach is to address this problem at the level of the fermionic Hamiltonian in Equation 1. By contrast, past work, except for [4, Section 6], has focused on this problem at the qubit Hamiltonian stage, after the fermionic Hamiltonian has been encoded into a summation over Pauli strings.
We focus on the $O(N^4)$ terms with $p \neq q \neq r \neq s$ (all four indices distinct) in the second sum of Equation 1, because these terms are asymptotically dominant; the number of other terms is only $O(N^3)$. Without loss of generality, let us suppose that $p > q > r > s$, and likewise $i > j > k > l$. We denote the set of Pauli strings in the Jordan-Wigner encoding of $a_p^\dagger a_q^\dagger a_r a_s$ as $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$. Our core observation is that if two $a^\dagger a^\dagger a a$ terms have disjoint indices, then the terms in their qubit encodings commute. In particular:
Theorem 1.
$$\{p, q, r, s\} \cap \{i, j, k, l\} = \emptyset \implies \left[\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW},\; \{a_i^\dagger a_j^\dagger a_k a_l\}_{JW}\right] = 0$$
where the commutator is taken to apply between all pairs of elements between the two sets.
Theorem 1 can be verified by inspecting the form of the Pauli string terms in $\{a^\dagger a^\dagger a a\}_{JW}$. Under the Jordan-Wigner encoding [11], we perform the transformations:
$$a_p^\dagger \to \tfrac{1}{2}(X_p - iY_p) \bigotimes_{t<p} Z_t, \qquad a_p \to \tfrac{1}{2}(X_p + iY_p) \bigotimes_{t<p} Z_t$$
Carrying out the transformation for $a_p^\dagger a_q^\dagger a_r a_s$ yields the 16 Pauli strings matching the regular expression:
$$(X_p|Y_p)\; Z_{p:q}\; (X_q|Y_q)\; (X_r|Y_r)\; Z_{r:s}\; (X_s|Y_s)$$
where $Z_{p:q}$ denotes $Z$ on each index between $p$ and $q$, exclusive of endpoints. Figure 1 shows this pattern as a pictorial representation: the repeating Z's are blue rectangles and the $\{p, q, r, s\}$ indices are the black vertical bars demarcating the blue and white rectangles. To evaluate the commutativity between a term in $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ and a term in $\{a_i^\dagger a_j^\dagger a_k a_l\}_{JW}$, we simply need to count the number of indices that anti-commute, as explained in [4, Section 3]. If the number of anti-commuting indices is even, then the two Pauli strings commute. For all indices other than $p, q, r, s, i, j, k, l$, the Pauli matrices at the indices commute, because $[I, I] = [I, Z] = [Z, Z] = 0$. On the remaining 8 indices, the commutation depends on whether the (X|Y) is matched to an I (commutes) or a Z (anti-commutes). Figure 2 depicts this: when one of the black bars (X|Y) is vertically aligned with a blue rectangle (Z), the index does not commute, as marked by the red cross. When the black bar is vertically aligned with a white rectangle (I), the index commutes.
The commutativity between $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ and $\{a_i^\dagger a_j^\dagger a_k a_l\}_{JW}$ terms can be verified by considering all possible interleaved orderings of the 8 indices, subject to the constraint that $p > q > r > s$ and $i > j > k > l$. There are $\binom{8}{4} = 70$ such cases that can be explicitly checked (or 35 cases, accounting for symmetry) to prove Theorem 1. Figure 3 demonstrates four representative cases, which provide useful intuition for the general case. In particular, when sliding one of the $\{p, q, r, s\}$ indices while keeping $\{i, j, k, l\}$ fixed, the parity of the number of anti-commuting indices is invariant. Thus, this parity is always even, and two $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ and $\{a_i^\dagger a_j^\dagger a_k a_l\}_{JW}$ terms with disjoint indices always commute, as claimed in Theorem 1.
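The case analysis can also be checked mechanically. The sketch below is illustrative only and rests on two assumptions that are not taken from the source: qubits are indexed from 0, and the function names `jw_strings`, `commute`, and `theorem_holds` are hypothetical. It builds the 16 Pauli strings described by the Figure 1 pattern (X or Y at the four indices, Z strictly between $p$ and $q$ and strictly between $r$ and $s$, I elsewhere) and tests every pair of index-disjoint terms on 8 qubits.

```python
from itertools import combinations, product

def jw_strings(n, p, q, r, s):
    """The 16 Pauli strings in the Jordan-Wigner set for indices p > q > r > s."""
    assert n > p > q > r > s >= 0
    strings = []
    for choice in product("XY", repeat=4):
        chars = ["I"] * n
        for index, pauli in zip((p, q, r, s), choice):
            chars[index] = pauli
        # Z runs strictly between p and q, and strictly between r and s
        for t in list(range(q + 1, p)) + list(range(s + 1, r)):
            chars[t] = "Z"
        strings.append("".join(chars))
    return strings

def commute(a, b):
    """Two Pauli strings commute iff they anti-commute on an even number of positions."""
    clashes = sum(1 for x, y in zip(a, b) if x != "I" and y != "I" and x != y)
    return clashes % 2 == 0

def theorem_holds(n):
    """Check that all index-disjoint pairs of terms on n qubits commute."""
    quads = [tuple(sorted(c, reverse=True)) for c in combinations(range(n), 4)]
    for u, v in combinations(quads, 2):
        if set(u) & set(v):
            continue  # Theorem 1 only concerns disjoint index sets
        if not all(commute(a, b) for a in jw_strings(n, *u) for b in jw_strings(n, *v)):
            return False
    return True

print(theorem_holds(8))
```

For 8 qubits, every disjoint pair uses all 8 indices, so this enumerates exactly the interleavings discussed above (35 after accounting for symmetry).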

FIG. 3. Four representative examples illustrating why $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ and $\{a_i^\dagger a_j^\dagger a_k a_l\}_{JW}$ terms always commute (have an even number of anti-commuting indices) when $\{p, q, r, s\} \cap \{i, j, k, l\} = \emptyset$. At the top, no black bars align with blue rectangles, so there are 0 anti-commuting indices. Below, $r > i > s > j$, so there are 2 anti-commuting indices: $i$ and $s$. Below that, observe that sliding the $i$ endpoint into the interval between $q$ and $r$ does not change the parity of the number of anti-commuting indices. The bottom example shows a case with the maximal number of anti-commuting indices, 6.

IV. EXISTENCE OF LINEARLY-SIZED PARTITIONS
Consider the set of Pauli strings contained in
$$\{a_4^\dagger a_3^\dagger a_2 a_1\}_{JW} \cup \{a_8^\dagger a_7^\dagger a_6 a_5\}_{JW} \cup \cdots \cup \{a_N^\dagger a_{N-1}^\dagger a_{N-2} a_{N-3}\}_{JW}$$
for $N$ divisible by 4. There are $16 \cdot \frac{N}{4} = 4N \in O(N)$ Pauli strings in this set. However, since the indices are disjoint, Pauli strings from each of the $\frac{N}{4}$ subsets can be measured simultaneously by Theorem 1. In particular, the Pauli strings can be partitioned into $16 \in O(1)$ measurement families. In fact, they can even be partitioned into just 2 measurement families by noting that the MIN-COMMUTING-PARTITION within each $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ term is 2, as described in [4, Section 6].
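One way to see the 2-family split is sketched below. This is an illustration under stated assumptions rather than the construction of [4]: the rule used here (grouping strings by the parity of their number of Y's) is our own choice of a valid split. Within a single Jordan-Wigner set the Z runs are identical across all 16 strings, so only the four (X|Y) positions matter and the strings can be abbreviated to 4 characters.

```python
from itertools import combinations, product

def commute(a, b):
    """Two Pauli strings commute iff they anti-commute on an even number of positions."""
    clashes = sum(1 for x, y in zip(a, b) if x != "I" and y != "I" and x != y)
    return clashes % 2 == 0

# The 16 strings of one term, restricted to the four indices p, q, r, s
strings = ["".join(choice) for choice in product("XY", repeat=4)]

# Assumed grouping rule: split by the parity of the number of Y's
even_y = [s for s in strings if s.count("Y") % 2 == 0]
odd_y = [s for s in strings if s.count("Y") % 2 == 1]

for family in (even_y, odd_y):
    # Strings of equal Y-parity differ on an even number of positions
    assert all(commute(a, b) for a, b in combinations(family, 2))

# A single family cannot suffice, since some pairs anti-commute
print(commute("XXXX", "XXXY"))
print(len(even_y), len(odd_y))
```

Because index-disjoint terms commute by Theorem 1, taking the even-parity family from every term gives one commuting family, and the odd-parity families give the other.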
A natural question is whether all $\binom{N}{4}$ terms with $p > q > r > s$ in Equation 1 can be partitioned in such a fashion. If so, then this constitutes a partitioning of the $O(N^4)$ Pauli strings into $O(N^3)$ commuting families. Intuitively, this is the same problem as trying to schedule a round-robin tournament of $N$ players with 4 players per game into $\binom{N-1}{3}$ rounds. We can think of each index as a player, and 4-player games can be scheduled simultaneously if they don't share players. Equivalently, these problems can be mapped to a graph theory problem: does the 4-uniform complete hypergraph on $N$ vertices admit a 1-factorization?
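The answer to all of these questions is affirmative, per Baranyai's Theorem [12]. In our case, it means that for $N$ divisible by 4, the $\binom{N}{4} \in O(N^4)$ terms can be partitioned into $\binom{N-1}{3} \in O(N^3)$ sets, such that the $\frac{N}{4}$ terms within each set have disjoint indices. Table I demonstrates such a partitioning for $N = 8$ qubits. Each of the $\binom{8-1}{3} = 35$ rows has two fermionic terms with disjoint indices; thus, their corresponding Jordan-Wigner qubit encodings can be measured simultaneously.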
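For $N = 8$ a valid partitioning is easy to write down by hand, because each 4-element index set can simply be paired with its complement. The sketch below illustrates only this special case; it is not the general Baranyai construction, it does not claim to reproduce the rows of Table I, and the 0-based indexing and the name `pair_with_complements` are assumptions made for the example.

```python
from itertools import combinations
from math import comb

def pair_with_complements():
    """Pair every 4-subset of {0,...,7} with its complement, giving 35 rounds."""
    indices = set(range(8))
    rounds = []
    for subset in combinations(range(8), 4):
        if 0 in subset:  # list each pair once, keyed by the subset containing 0
            other = tuple(sorted(indices - set(subset)))
            rounds.append((subset, other))
    return rounds

rounds = pair_with_complements()
print(len(rounds))   # 35
print(rounds[:3])

# Counting identity behind the number of rounds: C(N, 4) = (N / 4) * C(N - 1, 3)
for n in (8, 12, 16):
    assert comb(n, 4) == (n // 4) * comb(n - 1, 3)
```

For larger $N$ the complement trick no longer applies, since each round must contain $\frac{N}{4} > 2$ disjoint subsets, which is where the flow-based construction becomes necessary.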

V. CONSTRUCTION OF LINEARLY-SIZED PARTITIONS
Prior literature refers to Baranyai's original proof as either being non-constructive [13,14] or providing an exponential-time construction [15] (prior literature varies in what exactly is considered Baranyai's proof). In order for Baranyai's proof to be useful to us, we need a fast polynomial-time algorithm for partitioning the $\binom{N}{4}$ 4-element subsets of the $N$ indices into $\binom{N-1}{3}$ groups, each containing $\frac{N}{4}$ disjoint subsets. Fortunately, later work by [16] provided a proof that leads to an efficient construction [17]. The proof is based on maximum flows in network flow graphs.
TABLE I. Partitioning of the $\binom{8}{4} = 70$ $a_p^\dagger a_q^\dagger a_r a_s$ terms into $\binom{7}{3} = 35$ subsets, with disjoint indices between the two terms in each subset. Such a partitioning is guaranteed to exist for all $N$ divisible by 4, per Baranyai's Theorem [12].
We refer readers to [18] for a lucid explanation and to [19] for an implementation in code. This implementation was used to generate Table I. The pseudocode is given in Algorithm 1. An outer loop is called $N$ times, and each iteration solves for maximum flow on a network with $O(N^3)$ vertices and $O(N^4)$ directed edges. Since the maximum flow in the proof construction has a value of $O(N^3)$, solving for it with the Ford-Fulkerson algorithm would incur a cost of $O(N^7)$ per loop iteration [20]. However, due to work on flow-rounding [21][22][23][24], this runtime is reduced to $O(N^4 \log N)$. This is because for each flow network, a fractional solution is known that can be rounded to an integral solution faster than computing an integral solution from scratch. Thus, the total runtime of the Baranyai constructive proof is $O(N^5 \log N)$.
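The runtime figures can be traced through a short calculation. It assumes the standard Ford-Fulkerson bound of $O(E \cdot f)$ for a network with $E$ edges and integral maximum flow value $f$; the flow-rounding cost is taken directly from the text rather than derived.

```latex
\begin{align*}
T_{\text{Ford-Fulkerson, per iteration}} &= O(E \cdot f) = O(N^4 \cdot N^3) = O(N^7), \\
T_{\text{flow rounding, per iteration}} &= O(N^4 \log N), \\
T_{\text{total}} &= N \cdot O(N^4 \log N) = O(N^5 \log N).
\end{align*}
```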
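A useful aspect of the Baranyai-based approach to molecular Hamiltonian partitioning is that it depends only on $N$ and not on the $h_{pq}$ and $h_{pqrs}$ coefficients in Equation 1. In this sense, it is pre-computable: for instance, the $N = 8$ partitioning in Table I will apply to all 8-qubit Hamiltonians. By contrast, MIN-COMMUTING-PARTITION techniques in prior work operate on the specific molecular Hamiltonians of interest. Thus, the partitionings are not pre-computable and the classical cost of partitioning must be accounted for in time-to-solution.

Algorithm 1
for ... in [..., N] do
    Create flow network with two layers: $O(N^3)$ partition nodes and $O(N^3)$ subset nodes;
    Set capacities for $O(N^4)$ edges per [16,18] construction;
    Set fractional maxflow of value ...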

VI. DISCUSSION
We have demonstrated that Jordan-Wigner encoded molecular Hamiltonians can be partitioned into $O(N^3)$ commuting families, each containing $O(N)$ Pauli strings. Our proof stems from Baranyai's Theorem, which has a constructive form that efficiently yields partitionings, per Algorithm 1. Since commuting families can be measured simultaneously, this constitutes a reduction in the measurement cost of VQE from $O(N^4)$ naively to $O(N^3)$ with these partitions. The simultaneous measurement circuits are efficient too, requiring only $O(N)$ gates, since the shared eigenbasis of the commuting partitions can be expressed as a tensor product over 4-qubit chunks.
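The size of the reduction can be made concrete with a rough count. The sketch below is a back-of-the-envelope illustration, not a result from the paper: it counts only the dominant terms with four distinct indices, ignores the $O(N^3)$ lower-order terms and any coefficient symmetries, and assumes 2 commuting families per round as in Section IV. The function name `measurement_counts` is hypothetical.

```python
from math import comb

def measurement_counts(n):
    """Return (Pauli strings measured separately, commuting families) for n divisible by 4."""
    assert n % 4 == 0
    pauli_strings = 16 * comb(n, 4)   # 16 strings per term with p > q > r > s
    families = 2 * comb(n - 1, 3)     # 2 commuting families per round of N/4 terms
    return pauli_strings, families

for n in (8, 16, 32, 64):
    strings, families = measurement_counts(n)
    print(n, strings, families, strings // families)
```

Under these assumptions the ratio of the two counts is exactly $2N$, which is the linear reduction in measurement cost.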
An advantage of our technique is that it only depends on $N$ and is pre-computable for all $N$-qubit molecular Hamiltonians. Further optimizations may be possible by analyzing the $h_{pqrs}$ coefficients in Equation 1. For example, for molecular Hamiltonians, we expect the $h_{pqrs} = h_{srpq}$ symmetry [25], which reduces the number of relevant Pauli strings in each $\{a_p^\dagger a_q^\dagger a_r a_s\}_{JW}$ set from 16 to 8. Recent work [9] has gone deeper in this direction, by factoring molecular Hamiltonians into a form that empirically seems to have $O(N)$ partitions. Moreover, the simultaneous measurements only appear to require $O(N^2)$ gates, even with linear qubit connectivity. It would be informative to benchmark this recent work against our strategy, which produces $O(N^3)$ partitions but requires only $O(N)$ gates under full connectivity.
Beyond VQE, our technique may be useful in other quantum computational chemistry applications. For example, the simulation of Hamiltonian dynamics could be improved by partitioning into commuting families. Naively, Hamiltonian evolution is performed by Trotterization that requires fine time slicing to account for non-commuting terms [26, Section 4.7]. However, by ordering Pauli strings in a Hamiltonian such that large commuting sets are consecutive, the Trotterization cost could be diminished. This approach seems promising since our work proves an asymptotic gain for partitioning. Moreover, simultaneous measurement circuits would not be needed, so this re-ordering of a Trotterization would have essentially no quantum cost.

FIG. 1. Pictorial representation of the Jordan-Wigner encoding of $a_p^\dagger a_q^\dagger a_r a_s$. Repeating Z's span the blue rectangles between $p$ and $q$ and between $r$ and $s$. The other three ranges have repeating I's. At indices $p$, $q$, $r$, and $s$, which are denoted by the black vertical bars between the blue and white rectangles, we can have either X or Y. Thus, there are $2^4 = 16$ Pauli strings involved in the Jordan-Wigner encoding.

FIG. 2. Pictorial representation of the commutation on each index between two $\{a^\dagger a^\dagger a a\}_{JW}$ rectangles. All indices commute except possibly the 8 indices with black bars: these indices anti-commute when the black bar (X or Y) is vertically aligned with a blue rectangle (Z). In this example, there is an even number (4) of anti-commuting indices, so the two patterns commute.
