Subdivided Phase Oracle for NISQ Search Algorithms

Because noisy, intermediate-scale quantum (NISQ) machines accumulate errors quickly, we need new approaches to designing NISQ-aware algorithms and assessing their performance. Algorithms with characteristics that appear less desirable under ideal circumstances, such as lower success probability, may in fact outperform their ideal counterparts on existing hardware. We propose an adaptation of Grover's algorithm, subdividing the phase flip into segments to replace a digital counter and complex phase flip decision logic. We applied this approach to obtaining the best solution of the MAX-CUT problem in sparse graphs, utilizing multi-control, Toffoli-like gates with residual phase shifts. We implemented this algorithm on IBM Q processors and succeeded in solving a 5-node MAX-CUT problem, demonstrating amplitude amplification on four qubits. This approach will be useful for a range of problems, and may shorten the time to reaching quantum advantage.


I. INTRODUCTION
With the advent of NISQ (Noisy Intermediate-Scale Quantum [1]) processors, implementation of various NISQ-friendly algorithms, such as VQE [2], is in progress. On the other hand, many algorithms whose theoretical computational complexity guarantees quantum acceleration require large-scale quantum circuits. Practical-scale implementation of these algorithms will be difficult with NISQ devices and will require future quantum computers with error correction capabilities.
Cross et al. proposed Quantum Volume (QV) as a quantitative indicator of the computing power of quantum processors [3]. QV might double every year due to improvements in quantum processor performance [4]. Determining the relationship between the QV of a processor and the size of the quantum circuit it can execute is essential for predicting when a future quantum processor will be able to solve a particular problem.
FIG. 1 shows an abstract diagram of the relationship between classical and quantum computers. Hardware improvements and error mitigation reduce the effect of decoherence, and the resulting increase in QV moves us to the upper right along this line. Improvements in algorithms, compilation, and structural connectivity both shift this line downward and change its slope.
Focusing on the algorithm aspect, we describe the following contributions in this paper: 1) replacing the combination of the digital accumulator plus the binary (0 or π) phase flip with the subdivided oracle phase, and 2) an implementation method for the n-controlled Toffoli gate suitable for processors with low connectivity. As an application of the first technique, we present an implementation for the MAX-CUT problem. The second technique addresses a fundamental need and may become an essential component of many algorithms.

* {satoh,rum,rdv}@sfc.wide.ad.jp

FIG. 1. The significance of software development. The solid, straight lines indicate the quantum computing power achieved to date, and the dashed line is the performance that will be realized assuming continuing increases in QV. Through the combined improvement of software and hardware, the aim is to reach the intersection with the curve of the ability of classical computers. Thus, software advances have the potential to shorten the time to the achievement of quantum advantage.
Using these approaches, we have attempted to clarify the relationship between Grover's algorithm [5] (Sec. II A) and QV. As a preliminary step, we designed an algorithm to obtain an exact solution to the MAX-CUT problem (Secs. II B and III). In this algorithm, when the input length exceeds 4 qubits, the total number of Controlled-NOT (CX) gates exceeds 100, and present-day quantum processors cannot obtain a useful answer. To miniaturize the algorithm as much as possible, we reduced the weight of the $C^{\otimes n}X$ gate used in the diffusion operator (Sec. IV B) and applied phase information fragmentation in the oracle (Sec. IV A). Although this makes it possible to realize a smaller quantum circuit than the above algorithm, this approach cannot transform a given problem into a decision problem, so our solution cannot be called NP-complete. The correctness of the solution obtained depends on the average degree of the graph.
We executed our proposed algorithm on two IBM transmon systems, ibmq_ourense with QV = 8 and ibmq_valencia with QV = 16, and evaluated the success probability and KL divergence. The 3-data-qubit Grover algorithm for the $K_{1,3}$ MAX-CUT found the correct answer over 29% (theoretical 34.7%) of the time on both processors (Sec. IV C). The 4-data-qubit Grover algorithm for the $K_{1,4}$ MAX-CUT found the correct answer more than 11% (theoretical 21.2%) of the time on both processors. In the second experiment, the average KL divergence of ibmq_valencia was 0.457, while that of ibmq_ourense was 0.831, both substantially better than the completely mixed state value of 1.149.
These results indicate that probability amplification using Grover on a 4-qubit problem, which has conventionally been considered difficult [6,7], is possible using current processors. For this particular problem, differences in the decoherence characteristics of the two processors cause the off-answer elements of the superposition to decay more rapidly than the correct answer, resulting in an unexpectedly small decrease in overall success probability on the processor with the smaller QV. However, we expect that in more general cases, the success probability will more closely track the KL divergence. Also, our algorithm scales reasonably well on processor topologies whose qubits have degree 3. Therefore, as processors with higher QV appear in the future, we can use our algorithm to benchmark the maximum executable size of the Grover algorithm.

A. Grover's Algorithm
Grover's algorithm is a quantum search algorithm that finds the index of the target element $x \in \{0, 1, \dots, 2^n - 1\}$ such that $f(x) = y$, given $f$ and $y$, in $O(\sqrt{N})$ operations with high probability, where $n$ is the number of qubits and $N = 2^n$ is the size of the list [5]. A key feature of this algorithm is that even if the database is unordered, a quadratic speedup is guaranteed with respect to classical search, which requires an average of $N/2$ operations [8].
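The procedure of Grover's algorithm is as follows.

Initialization
Apply $H^{\otimes n}$ to the data register prepared in $|0\rangle^{\otimes n}$ to create the uniform superposition $|s\rangle = \frac{1}{\sqrt{N}}\sum_{x=0}^{N-1}|x\rangle$.

Oracle
Apply the oracle $O$, which inverts the sign of the target element:
$$O|x\rangle = (-1)^{f(x)}|x\rangle.$$
Here, $f(x) = 1$ if $x$ is the target element, and $f(x) = 0$ otherwise.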

Diffusion
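Apply the diffusion operator $D$ to amplify the probability amplitude of the target element:
$$D = H^{\otimes n} X^{\otimes n} H_T\, C^{\otimes n-1}X\, H_T\, X^{\otimes n} H^{\otimes n} = -\left(2|s\rangle\langle s| - I\right), \quad (2)$$
where $|s\rangle = H^{\otimes n}|0\rangle^{\otimes n}$ is the uniform superposition. Here, $C^{\otimes n-1}X$ and $H_T$ denote the $(n-1)$-controlled X gate and the H gate applied to the target qubit of $C^{\otimes n-1}X$, respectively. $H^{\otimes n}$ corresponds to the gates for initialization.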

Iteration
Repeat $O$ and $D$. The optimal number of iterations is approximately $\frac{\pi}{4}\sqrt{N}$ when there is a single target.

Measurements
Measure all qubits to read the target data.In general, Grover's algorithm uses an n-qubit data register and work space qubits for oracle execution, as in FIG. 2.
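To make the procedure concrete, here is a minimal statevector sketch in plain Python of textbook Grover search with a single marked item. It is an illustration, not the circuits used in this paper: the oracle is applied as a direct sign flip, and the diffusion is applied as the reflection $2|s\rangle\langle s| - I$, which equals $D$ up to a global phase. The function name and the choice of target are our own.

```python
import math


def grover_success_probability(n_qubits, target, iterations):
    """Statevector sketch of textbook Grover search with one marked item.

    All amplitudes stay real, so a plain list of floats is enough.
    """
    n_states = 2 ** n_qubits
    amps = [1 / math.sqrt(n_states)] * n_states      # H^{⊗n}|0...0>
    for _ in range(iterations):
        amps[target] = -amps[target]                  # oracle: (-1)^{f(x)}
        mean = sum(amps) / n_states
        amps = [2 * mean - a for a in amps]           # diffusion 2|s><s| - I
    return amps[target] ** 2


n = 4
k_opt = math.floor(math.pi / 4 * math.sqrt(2 ** n))   # floor(pi/4 * sqrt(16)) = 3
print(k_opt, round(grover_success_probability(n, target=5, iterations=k_opt), 4))
```

Running it prints `3 0.9613`: three iterations, as given by $\lfloor (\pi/4)\sqrt{16} \rfloor$, raise the success probability for n = 4 from 1/16 to about 96%.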

B. The MAX-CUT problem
MAX-CUT is the graph theory problem of finding the maximum cut of a given graph $G(V, E)$. MAX-CUT can be viewed as a vertex coloring problem with two colors: some of the vertices are filled with one color, and the rest with the other color. We then count the edges between vertices of different colors as cut. To solve this puzzle, we need to find, among the $2^{|V|-1}$ colorings that are distinct up to swapping the two colors, the one with the highest number of edges connecting vertices of different colors. On a general graph, MAX-CUT is known to be NP-hard [9].

C. Current quantum processors
In recent years, NISQ (Noisy Intermediate Scale Quantum [1]) devices that can perform quantum computation with a short circuit length have appeared, although the scale and accuracy are insufficient to perform continuous, effective error correction.Various physical systems such as superconductors, ion traps, quantum dots, NV centres, and optics are used in NISQ devices [10,11].
The early 20-qubit superconducting processors from IBM had high connectivity with a maximum degree of 6, while the latest processors have high gate accuracy but a maximum degree of 3 (FIG. 3).

Quantum Volume and KQ
Quantum Volume (QV) is a measure proposed by IBM to quantify the performance of NISQ devices [3]. Quantum Volume is defined as
$$\log_2 \mathrm{QV} = \max_{m} \min\left(m, d(m)\right),$$
where $m$ denotes the circuit width (number of qubits) and $d(m)$ denotes the achievable circuit SU(4) depth at that width. The QV of each processor is determined by single- and two-qubit gate errors, connectivity, measurement errors, etc. The computation fails with high probability when a given circuit satisfies
$$m\, d \gtrsim \frac{1}{\epsilon_{\text{eff}}}.$$
Here, $\epsilon_{\text{eff}}$ is an effective CX gate error value that depends on the qubit connectivity.
In this paper, we experimented with two 5-qubit processors, ibmq_ourense with QV = 8 and ibmq_valencia with QV = 16.
QV is a measure of the capabilities of the machine, independent of the algorithm. In 2003, Steane proposed a similar measure, KQ, focusing on the algorithm's needs and on error correction [12]. For an algorithm using Q qubits and requiring K time steps on those qubits (in suitable units), the space-time product KQ is a guideline to the required error rate, which should be below 1/(KQ).
Open Quantum Assembly Language (QASM)
The IBM Q processors accept gates written in the QASM language [13]. All circuits are decomposed into four types of gates. We describe those gates and the pulses they require on the IBM Q superconducting processors in Tab. I.

TABLE I. Gate set for QASM
U1: no pulse
U2: one π/2 pulse
U3: two π/2 pulses
CX: cross-resonance pulses

Since no pulse is required, we can perform U1 at zero cost. The error level of U3 is twice that of U2 and approximately an order of magnitude lower than that of the CX gate [4]. The performance of ibmq_ourense and ibmq_valencia is shown in Tables IV and V in Appendix F.

III. GROVER ALGORITHM TO SOLVE MAX-CUT PROBLEM
We propose applying Grover's algorithm to find the MAX-CUT of a given graph G. The following simple coloring approach is an exhaustive classical search:
Step 1. Color all vertices black or white.
Step 2. Count the number of edges with different color vertices at both ends.
Step 3. Color the vertices with a different pattern from the existing one and return to Step 2.
Step 4. After testing all possible coloring patterns, the pattern with the largest number of edges counted corresponds to the MAX-CUT.
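The classical Steps 1 to 4 can be written down directly. The sketch below is our own illustration (the edge-list representation and function names are assumptions, not from the paper); it enumerates all black (0) and white (1) colorings, so its cost grows as $2^{|V|}$, which is exactly the exhaustive search that Grover's algorithm is meant to accelerate.

```python
from itertools import product


def cut_size(edges, coloring):
    """Step 2: count edges whose two endpoints have different colors."""
    return sum(coloring[a] != coloring[b] for a, b in edges)


def exhaustive_max_cut(n_vertices, edges):
    """Steps 1-4: try every black(0)/white(1) coloring and keep the best ones."""
    best, best_colorings = -1, []
    for coloring in product((0, 1), repeat=n_vertices):
        k = cut_size(edges, coloring)
        if k > best:
            best, best_colorings = k, [coloring]
        elif k == best:
            best_colorings.append(coloring)
    return best, best_colorings


# Star graph K_{1,4}: vertex 0 is the centre, vertices 1-4 are leaves.
k14 = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(exhaustive_max_cut(5, k14))
```

For $K_{1,4}$ it returns a cut size of 4 together with the two complementary optimal colorings, (0, 1, 1, 1, 1) and (1, 0, 0, 0, 0).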
We can apply Grover's algorithm by assigning black to |0⟩ and white to |1⟩ in this procedure [14]. To illustrate this correspondence, we show a simple example using the star graph $K_{1,2}$ in FIG. 4. The MAX-CUT for a graph with m edges and n vertices can be found by the following procedure.
Step 2. Flip the sign of the inputs for which the number of cut edges exceeds t. (the oracle)
Step 3. Amplify the probability of any input whose sign is inverted. (diffusion)
Step 5. Increase t if the output is legal for the graph, and decrease it if the output is illegal. If t returns to a value taken in a prior iteration, that value is the MAX-CUT, and the algorithm ends. Otherwise, the process returns to Step 1.
The number of iterations can be optimized by the quantum counting algorithm [15]. In addition, if t is set so low that the signs of the majority of inputs are inverted, the probability of the inputs whose signs are not inverted is amplified instead. Since a binary search can be performed by appropriately increasing and decreasing t, we can obtain the exact MAX-CUT in about $\log_2 m$ rounds.
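The threshold update in Step 5 amounts to a binary search over t. Below is a minimal sketch in which the quantum part (oracle, diffusion, and measurement) is replaced by a hypothetical classical stand-in, `exists_cut_at_least`, that answers whether any coloring cuts at least t edges; on hardware, this answer would instead come from checking whether the measured coloring is legal. The ≥ t convention follows the threshold flag used for $K_{1,4}$ in Sec. III A, and all names are our own.

```python
from itertools import product


def exists_cut_at_least(n_vertices, edges, t):
    """Hypothetical stand-in for one Grover round: is any cut of size >= t present?"""
    return any(
        sum(c[a] != c[b] for a, b in edges) >= t
        for c in product((0, 1), repeat=n_vertices)
    )


def max_cut_by_threshold_search(n_vertices, edges):
    """Binary search for the largest t in [0, m] that still yields a legal output."""
    lo, hi = 0, len(edges)          # t = 0 is always satisfiable
    rounds = 0
    while lo < hi:
        t = (lo + hi + 1) // 2      # round up so that the loop always makes progress
        rounds += 1
        if exists_cut_at_least(n_vertices, edges, t):
            lo = t                  # legal output: increase t
        else:
            hi = t - 1              # illegal output: decrease t
    return lo, rounds


print(max_cut_by_threshold_search(5, [(0, 1), (0, 2), (0, 3), (0, 4)]))
```

For $K_{1,4}$ (m = 4) the search settles on t = 4 after 3 rounds, that is, $\lceil \log_2(m + 1) \rceil$ queries.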
The most straightforward way to implement an oracle for a counting problem is by using a binary accumulator register.We describe the oracle's construction below.

A. Oracle circuit design
We discuss how to apply the above procedure to the star graph $K_{1,4}$ (FIG. 5a). First, we prepare 5 data qubits to describe the states of the nodes. For each edge between nodes A and B, we introduce the following sub-oracle $O_{S(A,B)}$ as a cut checker [14]:
$$O_{S(A,B)}\,|A\rangle|B\rangle|\psi_S\rangle = |A\rangle|B\rangle|\psi_S + (A \oplus B)\rangle.$$
Here, S is an accumulator register large enough to store the number of cut edges; for this problem, $\lceil \log_2(|E| + 1) \rceil = 3$ qubits are enough. When the states of A and B are different, the edge between A and B is cut, and the count of cut edges in S is incremented. We can implement $O_{S(A,B)}$ using a quantum increment circuit, as shown in FIG. 5b. After executing $O_S$ for all edges, we set the threshold value t and use the flag qubit to invert the phase of inputs whose count equals or exceeds t. (In this problem, the t corresponding to the MAX-CUT is obviously 4.) We show the circuit corresponding to these operations in FIG. 6, and describe in detail how to configure the phase shift (Pshift) operation in Appendix A.
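The effect of the accumulator-based oracle on computational basis states can be checked classically. The sketch below is our own illustration, not the circuit of FIG. 5b or FIG. 6: it applies each sub-oracle as a modular increment of a w-qubit register with $w = \lceil \log_2(|E|+1) \rceil$, then marks inputs whose count reaches t. Because the sub-oracles and their uncomputation $O^\dagger$ only permute basis states, only this sign pattern on the data register survives.

```python
import math
from itertools import product


def accumulator_width(n_edges):
    """Qubits needed to hold any cut count from 0 to |E|."""
    return math.ceil(math.log2(n_edges + 1))


def marked_inputs(n_vertices, edges, t):
    """Data-register inputs whose sign the threshold oracle would flip."""
    width = accumulator_width(len(edges))
    marked = []
    for data in product((0, 1), repeat=n_vertices):
        s = 0                                      # accumulator |psi_S> starts at 0
        for a, b in edges:                         # sub-oracle O_S(A,B): add A xor B
            s = (s + (data[a] ^ data[b])) % 2 ** width
        if s >= t:                                 # Pshift: flag counts >= t
            marked.append("".join(map(str, data)))
    return width, marked


print(marked_inputs(5, [(0, 1), (0, 2), (0, 3), (0, 4)], t=4))
```

With t = 4 it reports a 3-qubit accumulator and exactly the two inputs 01111 and 10000.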

B. Complete circuit implementation
When t = 4, we obtain |01111⟩ and |10000⟩ as solutions by combining the above oracle and diffusion and repeating them the appropriate number of times. When implemented on a processor with the current QV, however, the proposed circuit is too large in both number of qubits and depth.
For example, the half adder contains a Toffoli gate that requires 6 CX gates on IBM Q devices. From the discussion in Sec. II, the upper limit of CX gates that can be used to obtain valid results is understood to be around 10. Taking into account the need to uncompute portions of the circuit, we cannot include multiple sub-oracles and still expect successful execution.
We have already proposed a method to reduce CX gates by eliminating adders and increasing the number of ancilla qubits [14]. That method still needs 36 CX gates per iteration to solve MAX-CUT for the smaller graph $K_{1,3}$. There is certainly room for improvement in our proposed oracles; however, solving MAX-CUT with Grover's algorithm on a real processor in the near future requires drastic improvement. Therefore, we next propose a new data structure that does not store the number of cut edges as binary data.

IV. APPROXIMATED GROVER SEARCH FOR MAX-CUT
In this section, we describe Grover's algorithm using subdivided-phase oracle operators instead of the conventional 0 and π. This method removes the adders used in the previous section and reduces the circuit length significantly. We also propose a diffusion operator implementation that requires fewer CX gates on an actual processor by using relative phase Toffoli gates [16,17]. We describe these methods and verify their effectiveness on the MAX-CUT problem below.

A. Oracle circuit using subdivided phases
In Sec. III, storing the evaluation value k (the number of cut edges) and computing it with adders led to a large increase in the number of CX gates, which occupied the largest portion of the whole circuit.
Therefore, we propose a method that expresses the evaluation value as a number of subdivided phases. For the MAX-CUT problem, we use the same data structure for node color as in Sec. III, and the unit phase
$$\theta_0 = \frac{\pi}{|E|},$$
where |E| denotes the number of edges in the graph G.
For the cut-edge determination, we introduce the following sub-oracle $O_s$ using the subdivided phase $\theta_0$. If an input $|\psi_a\rangle$ has a cut edge between vertices A and B, we add $\theta_0$ to its phase:
$$O_{s(A,B)}\,|\psi_a\rangle = e^{i\theta_0}\,|\psi_a\rangle. \quad (8)$$
Similarly, under the whole oracle operation O, the best answer input $|\psi_b\rangle$, which cuts k edges (k being the MAX-CUT value), becomes
$$O\,|\psi_b\rangle = e^{ik\theta_0}\,|\psi_b\rangle,$$
where $k\theta_0$ does not exceed π. We next discuss the validity of $\theta_0$ and how to find the optimal subdivided phase $\theta_{opt}$.
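As a worked check of the subdivided-phase oracle, the sketch below assigns each basis state the phase $e^{ik\theta_0}$, where k is its number of cut edges. It assumes the unit phase $\theta_0 = \pi/|E|$, which is consistent with the $\theta_0 = \pi/4$ used for $K_{1,4}$ in FIG. 7; the function name is our own. With this choice, only inputs that cut all |E| edges receive the full phase π, so the ordinary sign flip reappears as the special case $k = |E|$.

```python
import cmath
import math
from itertools import product


def subdivided_phase_oracle(n_vertices, edges, theta0=None):
    """Diagonal of the oracle O: each input gets e^{i k theta0}, k = number of cut edges."""
    if theta0 is None:
        theta0 = math.pi / len(edges)          # assumed unit phase pi/|E|
    diag = {}
    for data in product((0, 1), repeat=n_vertices):
        k = sum(data[a] != data[b] for a, b in edges)
        # One factor e^{i theta0} per cut edge, i.e. one per sub-oracle O_s(A,B).
        diag["".join(map(str, data))] = cmath.exp(1j * k * theta0)
    return diag


oracle = subdivided_phase_oracle(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
print(oracle["01111"], oracle["00111"])
```

Here the MAX-CUT input picks up $e^{i\pi} = -1$ (up to floating-point rounding), while an input cutting three edges picks up $e^{i3\pi/4}$.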

Optimal subdivided phase
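When the given graph is a tree ($|V| = |E| + 1$ for a connected graph), the edges are cut independently, each in half of the inputs, so the average value of the added phase after applying the above oracle O is
$$\alpha(\theta) = \frac{1}{N}\sum_{x} e^{ik_x\theta} = \left(\frac{1 + e^{i\theta}}{2}\right)^{|E|} = \cos^{|E|}\!\left(\frac{\theta}{2}\right) e^{i|E|\theta/2},$$
where $k_x$ is the number of edges cut by input x and N is the number of inputs. From Eq. (2), the probability amplitude of input x after diffusion becomes, up to a global phase,
$$a_x(\theta) = \frac{2\alpha(\theta) - e^{ik_x\theta}}{\sqrt{N}}.$$
If |V| = 5, the oracle adds the phase $e^{i4\theta}$ to the input corresponding to the MAX-CUT. With the virtual vertex introduced in this section (N = 16), the probability of finding the MAX-CUT is
$$p(\theta) = \frac{\left|2\alpha(\theta) - e^{i4\theta}\right|^2}{16} = \frac{4\cos^8(\theta/2) - 4\cos^4(\theta/2)\cos 2\theta + 1}{16},$$
which gives $p(\theta_0) \approx 0.195$ at $\theta = \theta_0 = \pi/4$. We can maximize the amplification factor by adjusting the subdivided phase; for $0 < \theta < \pi$,
$$\frac{dp}{d\theta} = 0 \;\Longleftrightarrow\; 7\cos^2\theta + 2\cos\theta - 3 = 0.$$
Then the optimal subdivided phase and the maximized p(θ) are
$$\theta_{opt} = \arccos\frac{\sqrt{22} - 1}{7} \approx 0.323\pi, \qquad p(\theta_{opt}) \approx 0.212.$$
The amount of amplification depends on the difference between the phase added to each input and the average added phase α(θ). Therefore, the probability of the worst solution, which does not cut any edges, is amplified similarly to that of the proper MAX-CUT solution.
On the other hand, since the average value increases as the graph becomes denser, the probability of the worst case becomes larger than that of the MAX-CUT. Despite such drawbacks, this algorithm requires far fewer gates than searching for an exact solution. Next, we show a specific implementation method.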
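The one-round amplification can be cross-checked numerically. The sketch below simulates one subdivided-phase oracle plus one diffusion on the $2^{|V|-1}$ data-qubit states of a star graph $K_{1,n}$, with the centre treated as a virtual vertex fixed to |0⟩, so the cut count of an input is its Hamming weight; it then grid-searches θ for the largest MAX-CUT probability. The grid search is our own choice of method, not necessarily how the authors obtained $\theta_{opt}$.

```python
import cmath
import math


def one_round_probabilities(n_leaves, theta):
    """Probabilities after one subdivided-phase oracle and one diffusion on K_{1,n}.

    The centre is the virtual vertex |0>, so a leaf bit of 1 means its edge is cut.
    """
    n_states = 2 ** n_leaves
    amps = [cmath.exp(1j * bin(x).count("1") * theta) / math.sqrt(n_states)
            for x in range(n_states)]                 # oracle: e^{i k theta}
    mean = sum(amps) / n_states                        # alpha(theta) / sqrt(N)
    return [abs(2 * mean - a) ** 2 for a in amps]      # diffusion 2|s><s| - I


def best_theta(n_leaves, steps=20000):
    """Grid search over theta in [0, pi] for the largest MAX-CUT probability."""
    target = 2 ** n_leaves - 1                         # all leaves cut
    return max(
        (one_round_probabilities(n_leaves, math.pi * s / steps)[target],
         math.pi * s / steps)
        for s in range(steps + 1)
    )


p0 = one_round_probabilities(4, math.pi / 4)[15]
p_opt, theta_opt = best_theta(4)
print(f"K_1,4: p(pi/4) = {p0:.3f}, p_max = {p_opt:.3f} at {theta_opt / math.pi:.3f} pi")
```

This prints about 0.195, 0.212 and 0.323π for $K_{1,4}$, in line with the 21.2% and 0.323π quoted in Sec. IV C; `best_theta(3)` similarly gives about 0.347 for $K_{1,3}$.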

Implementation of oracle
When θ is neither 0 nor π, the sub-oracle in Eq. (8) is no longer a simple sign flip and must be built from controlled rotations. Due to the limitations of the current IBM Q processors within the framework of QASM [13], we need two CX gates and single-qubit gates to execute one $CR_Z^{A,B}(\theta)$ exactly.
Here, the error values of single-qubit gates are one order of magnitude smaller than those of two-qubit (CX) gates [4]. Therefore, we focused on reducing the number of CX gates; the number of single-qubit gates, such as $U_3$ gates, is basically not a problem. Hence, we approximate the whole sub-oracle with two CX gates and six $U_3$ gates by KAK decomposition [18,19], as shown in FIG. 7. The error level of a CX gate on the latest IBM Q processors used in this paper is about 1% at best [4], which exceeds the approximation error; hence, approximating this oracle circuit with two CX gates [3] is a favorable trade.

FIG. 7. Approximation of the sub-oracle circuit using KAK decomposition at $\theta_0 = \pi/4$. The approximation accuracy is over 99%, and the average error of the CX gate of the Q processor as of January 2020 is about 1%. Until the CX gate error is halved, the total error will be dominated by the two-qubit gates.
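For reference, if the sub-oracle of Eq. (8) is taken to be exactly the diagonal phase $e^{i\theta}$ on the two-qubit states |01⟩ and |10⟩ (our reading of the text), it can also be written with two CX gates around a single $U_1(\theta)$ on the second qubit: the first CX writes A ⊕ B onto qubit B, $U_1$ adds the phase when that bit is 1, and the second CX restores B. The pure-Python check below multiplies the 4 × 4 matrices; it is an illustrative construction under that assumption, not the KAK circuit of FIG. 7.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


# Basis order |AB> = |00>, |01>, |10>, |11>; CX with control A and target B.
CX = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]


def u1_on_b(theta):
    """I ⊗ U1(theta): phase e^{i theta} whenever qubit B is |1>."""
    p = cmath.exp(1j * theta)
    return [[1, 0, 0, 0], [0, p, 0, 0], [0, 0, 1, 0], [0, 0, 0, p]]


def cut_phase(theta):
    """Target sub-oracle: e^{i theta} on |01> and |10> (edge cut), 1 otherwise."""
    p = cmath.exp(1j * theta)
    return [[1, 0, 0, 0], [0, p, 0, 0], [0, 0, p, 0], [0, 0, 0, 1]]


theta0 = math.pi / 4
circuit = matmul(CX, matmul(u1_on_b(theta0), CX))   # CX, then U1, then CX
err = max(abs(circuit[i][j] - cut_phase(theta0)[i][j]) for i in range(4) for j in range(4))
print(err < 1e-12)
```

It prints `True`, and the same identity holds for any θ.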

Introduction of virtual vertex
The output of the approach in Sec. III has redundancy due to the symmetry of the problem. To eliminate this redundancy and double the effective solution space for a given number of qubits, we introduce a virtual vertex whose state is fixed at $|0\rangle_V$.
The oracle for an edge connected to this virtual vertex can be replaced by a single-qubit operation $R_Z(\theta_0)$ on the other vertex. To reduce the number of CX gates in the oracle part, it is effective to virtualize the highest-degree vertex. For example, when the given graph is $K_{1,4}$, we can implement the oracle circuit without any CX gates, as shown in FIG. 8.
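The virtual-vertex reduction can be checked by brute force. In the sketch below (our own illustration, with hypothetical function names), fixing one vertex v to 0 keeps exactly one coloring from each complementary pair, so the number of inputs halves while the multiset of achievable cut sizes is unchanged. Every edge (v, w) then depends only on w, which is why its sub-oracle reduces to a single-qubit $R_Z(\theta_0)$ on w, up to a global phase.

```python
from itertools import product


def cut_sizes(n_vertices, edges, virtual=None):
    """Sorted cut sizes over all colorings; optionally fix vertex `virtual` to 0."""
    sizes = []
    for c in product((0, 1), repeat=n_vertices):
        if virtual is not None and c[virtual] != 0:
            continue                                   # virtual vertex fixed at |0>_V
        sizes.append(sum(c[a] != c[b] for a, b in edges))
    return sorted(sizes)


k14 = [(0, 1), (0, 2), (0, 3), (0, 4)]
full = cut_sizes(5, k14)
reduced = cut_sizes(5, k14, virtual=0)
# Each cut size appears twice in `full`: once for a coloring, once for its complement.
print(len(full), len(reduced), full == sorted(reduced * 2))
```

It prints `32 16 True`; the same check passes for other trees, such as a path graph with an interior vertex made virtual.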

B. Implementation of diffusion
After executing the oracle in Sec. IV A, we perform the normal diffusion operator of Grover's algorithm. As described in Sec. II, the diffusion circuit for n + 1 data qubits requires one n-controlled NOT ($C^{\otimes n}X$) gate. Therefore, we discuss how to implement a $C^{\otimes n}X$ gate under the constraints of the IBM Q processors.
FIG. 8. Implementation of the oracle circuit O using the subdivided phase for the star graph $K_{1,4}$. All sub-oracles $O_{s(V,k)}$ can be replaced with $R_Z(\theta)$ by assigning the highest-degree vertex to the virtual qubit.

C ⊗n X gate implementation
To construct $C^{\otimes n}X$, we introduce relative phase Toffoli gates (RTOF). A Toffoli gate is known to require 5 controlled unitary gates or 6 CX gates [20], but an RTOF works almost like a Toffoli gate while requiring only 3 CX gates. In exchange for the reduced number of CX gates, an undesired phase is added to the target, which must be compensated for later. We adopt the two types of RTOF shown in FIG. 9. Both of these RTOFs can be implemented on a system with only a one-dimensional qubit layout. Although the number of CX gates is equal, $RTOF_{iX}$ does not require $U_3$ gates, which reduces single-qubit rotation errors.

FIG. 9. Two RTOF implementations adopted for $C^{\otimes n}X$. The controlled-controlled-iX gate [21] ($RTOF_{iX}$) uses four $U_1$ gates and two $U_2$ gates (see Tab. I). The Margolus gate ($RTOF_M$) inverts the sign of |101⟩ in addition to the normal Toffoli operation; it uses four $U_3$ gates.

We also introduce a Toffoli gate with a built-in SWAP operation. A Toffoli gate implementation with the minimal 6 CX gates requires three qubits interconnected in a triangle, but recent IBM Q devices after ibm_tokyo do not have a structure that can embed triangles. To deal with this situation, we propose a Toffoli circuit suitable for a one-dimensional layout, as shown in FIG. 10. This circuit requires one additional CX, the minimum overhead. However, since SWAP is built in, it is necessary to keep track of the locations of the qubits in the output state.

FIG. 10. Toffoli with SWAP circuit. By adding the CX gates surrounded by a broken line to the general Toffoli gate decomposition, SWAP is built in, and the circuit can be performed with qubits connected in a straight line.
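Why the relative phases are harmless can be checked on basis states. The sketch below builds a $C^{\otimes 3}X$ from a compute RTOF onto a clean ancilla, an exact Toffoli onto the target, and an uncompute RTOF; the gate actions follow the descriptions in FIG. 9, while the qubit layout, helper names, and the use of a plain Toffoli (rather than the Toffoli with SWAP) in the middle are our own simplifications. The Margolus gate is its own inverse, whereas the controlled-controlled-iX must be undone by its inverse, the controlled-controlled-(−iX); repeating the forward gate instead leaves a −1 phase.

```python
import itertools


def apply(state, gate):
    """Apply a gate given as basis -> (new_basis, phase) to a dict of amplitudes."""
    out = {}
    for basis, amp in state.items():
        new_basis, phase = gate(basis)
        out[new_basis] = out.get(new_basis, 0) + amp * phase
    return out


def toffoli(c1, c2, t):
    def gate(b):
        b = list(b)
        if b[c1] and b[c2]:
            b[t] ^= 1
        return tuple(b), 1
    return gate


def margolus(c1, c2, t):
    """RTOF_M: Toffoli plus a -1 sign on |c1 c2 t> = |101> (a fixed point of Toffoli)."""
    def gate(b):
        sign = -1 if (b[c1], b[c2], b[t]) == (1, 0, 1) else 1
        return toffoli(c1, c2, t)(b)[0], sign
    return gate


def ccix(c1, c2, t, dagger=False):
    """RTOF_iX: applies iX to t when both controls are 1 (-iX for the inverse)."""
    def gate(b):
        if b[c1] and b[c2]:
            return toffoli(c1, c2, t)(b)[0], (-1j if dagger else 1j)
        return b, 1
    return gate


def exact_c3x(compute, uncompute):
    """Qubits (c0, c1, c2, a, t): check compute -> Toffoli(a, c2 -> t) -> uncompute."""
    for c0, c1, c2, t in itertools.product((0, 1), repeat=4):
        state = {(c0, c1, c2, 0, t): 1}                # ancilla a starts clean
        for gate in (compute, toffoli(3, 2, 4), uncompute):
            state = apply(state, gate)
        nonzero = {b: amp for b, amp in state.items() if abs(amp) > 1e-12}
        expected = (c0, c1, c2, 0, t ^ (c0 & c1 & c2))
        if set(nonzero) != {expected} or abs(nonzero[expected] - 1) > 1e-12:
            return False
    return True


print(exact_c3x(margolus(0, 1, 3), margolus(0, 1, 3)))         # self-inverse
print(exact_c3x(ccix(0, 1, 3), ccix(0, 1, 3, dagger=True)))    # undone by its inverse
print(exact_c3x(ccix(0, 1, 3), ccix(0, 1, 3)))                 # leftover -1 phase
```

The first two checks print `True` and the last prints `False`, which is why each compute RTOF must be paired with its exact inverse.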
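By using those components, we can configure a $C^{\otimes n}X$ gate for recent IBM Q devices using 6n − 5 CX gates. It is known that a $C^{\otimes n}X$ gate can consist of 2n − 3 Toffoli gates with n − 2 ancillary qubits (initialized to |0⟩) [16]. A Toffoli gate contains at least 6 CX gates. Replacing all but the central Toffoli with RTOFs (3 CX gates each) and implementing the central one as a Toffoli with SWAP (7 CX gates) gives 3(2n − 4) + 7 = 6n − 5 CX gates. Using the implementations shown in FIG. 11, we give the procedure for $C^{\otimes n}X$ in Algorithm 1.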

Algorithm 1 $C^{\otimes n}X$ gate implementation
Input: n + 1 data qubits d, n − 2 ancillary qubits a.
Output: Data qubits on which the $C^{\otimes n}X$ gate is performed, with a SWAP gate between the last two data qubits.
procedure CnX(d, a)
    RTOF(d0, d1, a0)
    for k = 0; k < n-3; k++ do
        RTOF(a_k, d_{k+2}, a_{k+1})
    end for
    ToffoliWithSWAP(a_{n-3}, d_{n-1}, d_n)
    for k = n-4; k >= 0; k-- do
        RTOF†(a_k, d_{k+2}, a_{k+1})
    end for
    RTOF†(d0, d1, a0)
    return Target states
end procedure

If the processor can embed the structure shown in FIG. 12, the procedure can be executed without additional SWAPs. When n = 3, we can embed this structure in all recent processors, including the 5-qubit processors ibmq_vigo (ourense).
Similarly, when n ≤ 8, we can embed it in the 20-qubit processors ibmq_boeblingen (singapore).
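The gate counts can be checked with a classical simulation of the ancilla ladder of FIG. 11. The sketch below is our own reading of that structure (n controls, n − 2 clean ancillas, one target; the qubit labels are assumptions and may differ from Algorithm 1): it lists the Toffoli-type gates, verifies $C^{\otimes n}X$ on every basis input, and counts CX gates assuming 3 per RTOF and 7 for the single Toffoli with SWAP. Relative phases are ignored here, since only the basis-state action is simulated.

```python
from itertools import product


def cnx_gates(n):
    """Toffoli-type gates (c1, c2, target) for C^nX with n - 2 clean ancillas, n >= 3.

    Qubits 0..n-1 are controls, n..2n-3 are ancillas, and 2n-2 is the target.
    """
    anc = list(range(n, 2 * n - 2))
    compute = [(0, 1, anc[0])]
    for k in range(n - 3):                       # mirrors "for k = 0; k < n-3; k++"
        compute.append((anc[k], k + 2, anc[k + 1]))
    middle = [(anc[-1], n - 1, 2 * n - 2)]       # the only gate that must be exact
    return compute + middle + compute[::-1]      # uncompute in reverse order


def is_cnx(n):
    """Target flips iff all controls are 1, and every ancilla returns to 0."""
    gates = cnx_gates(n)
    for controls in product((0, 1), repeat=n):
        for t in (0, 1):
            bits = list(controls) + [0] * (n - 2) + [t]
            for c1, c2, tgt in gates:
                if bits[c1] and bits[c2]:
                    bits[tgt] ^= 1
            expected = list(controls) + [0] * (n - 2) + [t ^ int(all(controls))]
            if bits != expected:
                return False
    return True


def cx_count(n, rtof_cx=3, toffoli_swap_cx=7):
    """Total CX gates if every gate except the middle one is an RTOF."""
    return toffoli_swap_cx + rtof_cx * (len(cnx_gates(n)) - 1)


for n in range(3, 8):
    print(n, len(cnx_gates(n)), is_cnx(n), cx_count(n))
```

For n = 3 this gives 3 gates and 13 CX gates, matching the 13-CX $K_{1,4}$ circuit, and in general 2n − 3 gates and 6n − 5 CX gates.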

C. Experiments on IBM Q devices
We evaluate our proposed algorithm by finding the MAX-CUT of $K_{1,3}$ and $K_{1,4}$ on current processors. If the given graph is $K_{1,4}$, our algorithm requires 5 physical qubits (4 data, 1 ancillary) and 1 virtual qubit. FIG. 13 illustrates the correspondence between the given graphs and qubits. We investigate the performance of each component and of the whole algorithm.
$C^{\otimes n}X$ gate performance
The CX gate error rates of ibmq_ourense and ibmq_valencia are around 1%, an order of magnitude higher than the errors of single-qubit gates (see Tables IV and V in Appendix F). We performed several experiments to verify the performance of $U_3$ gates and measurement error mitigation [22]. The results in FIG. 20 (Appendix B 1) show that the single-qubit gate error and the mitigated measurement error are much smaller than the CX error.
Using Algorithm 1, we can assemble a $C^{\otimes 3}X$ gate from a Toffoli with SWAP gate and RTOF gates of either type. To evaluate the performance of these gates, we reconstructed their output states and calculated the fidelities shown in FIG. 14. Additionally, we confirm the outputs of the RTOF gates and $C^{\otimes 3}X$ gates in the computational basis in FIGs. 21 and 22 (Appendix B 2).
In terms of both average fidelity and variance, these results show that ibmq_valencia is a better device than ibmq_ourense, in accordance with their QV values.

Whole circuit performance on real processors
To evaluate the performance of our algorithm, we first execute the whole circuit (see FIG. 23) with 7 CX gates for $K_{1,3}$. In this experiment, we adopt two subdivided phases, $\theta_0$ and $\theta_{opt}$, obtained from Eq. (11). FIG. 15 shows the execution results on the two processors. In all experiments, the output probability of the correct answer |111⟩ is about 28%, which is a good result even when compared to the ideal values of 33.4% with $\theta_0$ and 34.7% with $\theta_{opt}$. For a more quantitative evaluation, we show the KL divergence in FIG. 16. That ibmq_ourense shows a better value suggests that the circuit is small enough for both processors.
We next execute the whole circuit (see FIG. 24) with 13 CX gates for $K_{1,4}$. As discussed in Sec. IV A, we adopted both 0.25π and 0.323π as the angle of the divided phase oracle. We also tested both $RTOF_{iX}$ and $RTOF_M$ in the $C^{\otimes 3}X$ gate. We show the results on the two processors in FIG. 17. In these experiments, the probabilities of the correct answer |1111⟩ increase. These probabilities are highest when $\theta_{opt}$ is used on either processor, reaching about 11%, about half the theoretical probability of 21.2%. On the other hand, there is a significant difference between the processors in the amplification of the correct answer and the suppression of incorrect answers. This may be due in part to the |1111⟩ output being susceptible to relaxation errors. We show the difference in performance between the two processors using the KL divergence in FIG. 18.
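The text does not spell out the KL convention, so the sketch below makes an explicit assumption: it computes $D_{KL}(Q\|P) = \sum_x Q(x)\ln(Q(x)/P(x))$ in nats, with Q the measured distribution and P the ideal one-round distribution for $K_{1,4}$ under the idealized model, evaluated at $\theta_{opt} = \arccos((\sqrt{22}-1)/7) \approx 0.323\pi$. Under this convention a uniform (completely mixed) Q scores about 1.13, the same order as the 1.149 quoted in Sec. I; the quoted value may rest on a different ideal distribution or convention. The function names are hypothetical.

```python
import cmath
import math


def ideal_k14_distribution(theta):
    """Ideal one-round output distribution for K_{1,4} (virtual centre), 16 outcomes."""
    amps = [cmath.exp(1j * bin(x).count("1") * theta) / 4 for x in range(16)]
    mean = sum(amps) / 16
    return [abs(2 * mean - a) ** 2 for a in amps]    # oracle, then 2|s><s| - I


def kl_divergence(q, p):
    """D_KL(q || p) in nats; outcomes with q(x) = 0 contribute nothing."""
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)


theta_opt = math.acos((math.sqrt(22) - 1) / 7)      # about 0.323 pi
ideal = ideal_k14_distribution(theta_opt)
uniform = [1 / 16] * 16                             # completely mixed state
print(round(kl_divergence(uniform, ideal), 3), kl_divergence(ideal, ideal))
```

In practice, Q would be the mitigated measurement histogram normalized to probabilities; because the ideal distribution here has no zero entries, the divergence stays finite.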
Due to the symmetry of the problem, the probabilities of |1111⟩, which is the MAX-CUT solution, and |0000⟩, where no edge is cut, should be amplified the most. Nevertheless, only one of them is greatly amplified. In the circuit used in this experiment, the oracle includes no CX gates, and the diffusion includes the theoretically minimum number of CX gates for current IBM Q processors. That we could not achieve the ideal probability amplification even with such a circuit seems to indicate that the number of qubits and the circuit depth exceed current processor capability. Further, considering the effect of relaxation, an increase in the probability of solutions containing more 0s would seem natural. However, depending on the qubit mapping, the probability of solutions containing 1s clearly increases. This may be due to an unknown difference between the data structure in the development environment Qiskit and the data structure on the actual IBM Q system.

V. CONCLUSION
As of this writing, there has been no report that any problem has been solved using 4-qubit unmodified Grover search on a solid-state quantum computer. As shown in Sec. III, the scale of the circuit required for the algorithm exceeds the limit that existing quantum processors can handle. Thus, we investigated alternative solutions appropriate for the NISQ era, reducing the number of qubits and gates required by over one order of magnitude via the subdivided phase oracle. This oracle, rather than the normal 0/π phase flip of ordinary Grover, applies a smaller phase shift to less desirable outcomes and a larger phase shift to more desirable ones. While this initially appears less favorable, the dramatic reduction in required fidelity makes it a good tradeoff for small problems, as shown by our experimental results demonstrating effective amplitude amplification for 4-qubit search problems, as exemplified by solving the MAX-CUT problem. Further work will help to determine the range of problem sizes and characteristics for which this technique can be applied.
With our current modest circuit depths, overall performance is still strongly affected by measurement errors, but it is worth comparing the KQ of our algorithms with the reported QV of the processors. We found that the $K_{1,3}$ solution using 7 CNOTs on 3 qubits (KQ = 7 × 3 = 21) works well on quantum volume QV = 8, and very similarly on QV = 16. The $K_{1,4}$ solution using 13 CNOTs on 4 qubits (KQ = 13 × 4 = 52) works, although not well, on QV = 8; it performs much better, but still with limited effectiveness, on QV = 16. This circuit has one of the largest KQ values reported to have been run successfully on a solid-state quantum computer to date. KQ and QV are similar measures, and it will be interesting to continue tracking their relationship and predictive value for execution success over the coming generations of computers.
In addition, we designed a diffusion operator using the minimum number of CX gates within the constraints of recent IBM Q processors, by incorporating Toffoli gate variants with phase shifts that we compensate for later in the algorithm.This technique is exact, and will benefit a broad range of algorithms beyond the NISQ era.

ACKNOWLEDGMENTS
This research was supported by the Q-LEAP program of Japan Science and Technology Agency (JST). The results presented in this paper were obtained in part using an IBM Q quantum computing system as part of the IBM Q Network. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team. We thank Miguel Sozinho Ramalho and Lakshmi Prakash for working with TS and YO on the project that inspired this paper at Qiskit Camp Vermont 2019. TS would like to thank Yuri Kobayashi, Atsushi Matsuo, and Shin Nishio for their collaborative activities for the Quantum Challenge, which helped refine the ideas in this paper. We are grateful for meaningful discussions with Shota Nagayama at Mercari, Inc.

We mitigate measurement errors by applying the inverse of M to the raw data matrix R from the actual circuit (e.g., FIG. 21):
$$R_{\text{mitigated}} = M^{-1} R.$$
However, in general, M is not invertible; instead, the corresponding Qiskit filter object derived from M applies a least-squares fit. All of the real-device data figures in this paper utilize this approach.
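For a single qubit, the inversion step can be sketched directly. The example below assumes a calibration matrix M whose entry M[i][j] is the probability of reading i when j was prepared, so that raw = M · true; it solves that 2 × 2 system exactly and then clips negative entries and renormalizes. This clip-and-renormalize step is a simpler stand-in for the constrained least-squares fit applied by the Qiskit filter, not a reimplementation of it, and the calibration numbers are made up for illustration.

```python
def solve_2x2(m, r):
    """Solve m @ x = r for a 2x2 matrix by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("calibration matrix is singular")
    x0 = (r[0] * m[1][1] - m[0][1] * r[1]) / det
    x1 = (m[0][0] * r[1] - r[0] * m[1][0]) / det
    return [x0, x1]


def mitigate(m, raw_probs):
    """Invert the readout model, then drop unphysical negatives and renormalize."""
    x = [max(v, 0.0) for v in solve_2x2(m, raw_probs)]
    total = sum(x)
    return [v / total for v in x]


M = [[0.9, 0.2],     # column j: outcome distribution observed when |j> was prepared
     [0.1, 0.8]]
raw = [0.41, 0.59]   # equals M @ [0.3, 0.7]
print(mitigate(M, raw))
```

Here the true distribution [0.3, 0.7] is recovered up to rounding; for raw data such as [0.95, 0.05], the exact inverse goes slightly negative and the clipping step returns [1.0, 0.0].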
Appendix C: The circuits for the MAX-CUT problem
We show the circuit to find the MAX-CUT of $K_{1,3}$ in FIG. 23. Unlike Eq. (2), we adopted ZH (HZ) in place of HX (XH), and the Toffoli with SWAP gate in place of the Toffoli gate. The former change allows us to reduce the number of $U_3$ gates, thereby reducing gate errors (in the case where a series of single-qubit gates is not merged into one $U_3$ gate). The latter change avoids connectivity constraints with minimal overhead.
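The substitution is exact because $HXH = Z$, so $HX = ZH$ and $XH = HZ$ as operators. A quick numerical check with 2 × 2 matrices in plain Python:

```python
import math

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


print(close(matmul(H, X), matmul(Z, H)), close(matmul(X, H), matmul(H, Z)))
```

It prints `True True`. Since Z is a $U_1$ gate (no pulse) while X needs a $U_3$, each replacement removes one $U_3$ from the circuit.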
We show the circuit to find the MAX-CUT of $K_{1,4}$ in FIG. 24. The gates of the diffusion part, except ZH and HZ, constitute one $C^{\otimes 3}X$ gate.

Appendix D: Qiskit Versions
The versions of the Qiskit packages we used are listed in Table II.

FIG. 24. The circuit of the MAX-CUT solver for $K_{1,4}$. Each $U_k$ gate is determined by the type of RTOF gate adopted.


FIG. 3. Qubit topology of IBM Q processors. Early devices (left side) had a dense structure, while recent devices (right side) have relatively sparse qubit connections.

FIG. 4. Data structure for MAX-CUT. We can find the MAX-CUT |010⟩₀₁₂ (or |101⟩₀₁₂) by counting the cases where the states of the qubits corresponding to both ends of an edge are different.

FIG. 5. (a) A star graph $K_{1,4}$. Each node number denotes the corresponding data qubit. (b) If the states of qubits A and B are different, the accumulator register $|\psi_S\rangle$ becomes $|\psi_S + 1\rangle$.

FIG. 6. Oracle circuit. O denotes the sequence of all sub-oracles $O_S$. After executing Pshift, we must apply $O^\dagger$ to uncompute the accumulator, leaving the sign reversal only on inputs equal to or exceeding the threshold value t.

FIG. 11. Configuration of $C^{\otimes n}X$ using n − 2 ancillary qubits and 2n − 3 Toffoli gates. All Toffoli gates other than the one enclosed in the dashed box can be replaced with relative phase gates, thereby reducing the number of CX gates.

FIG. 12. Qubit connections for Algorithm 1. Data and ancilla qubits are denoted by $d_k$ and $a_k$, respectively. (a) shows the interactions required by the algorithm; (b) shows an example mapping for n = 8 onto one of the 20-qubit machines.

FIG. 13. Correspondence between qubits and star graphs $K_{1,n}$. (a), (b) We assign the virtual qubit V to the highest-degree node and the other nodes to physical qubits. (c) Mapping of variables to the machine for $K_{1,4}$ on both processors.

FIG. 14. Gate fidelities of various Toffoli gates on real devices. Light blue points are the gate fidelities on ibmq_ourense (QV = 8), and dark blue points are those on ibmq_valencia (QV = 16). For each gate type, we tested all possible mappings to the processor topology, collecting the results of 8192 shots for each pattern. The top and bottom of each bar mark the maximum and minimum values of the experimental results.

FIG. 15. Results of the complete subdivided oracle search for $K_{1,3}$ MAX-CUT. On each processor, we created eight different qubit mappings and executed each circuit 819200 times with measurement error mitigation. Error bars represent the standard error (1σ).

FIG. 17. Results from execution of the complete subdivided oracle search for $K_{1,4}$ MAX-CUT. Two qubit mappings were tested for each circuit. Each circuit was executed 819200 times with measurement error mitigation. Error bars represent the standard error (1σ).

FIG. 21. Execution of three types of Toffoli gate on the real devices. To generate the results for each row, 8192 trials were performed for each input. Entries are output probabilities, with each row summing to approximately 1. Each row denotes the input value, and each column the output value.

FIG. 22. Execution of the $C^{\otimes 3}X$ gate on the real devices. To generate the results for each row, 8192 trials were performed for each input. Entries are output probabilities, with each row summing to approximately 1. Each row denotes the input value, and each column the output value.

FIG. 23. The circuit of the MAX-CUT solver for $K_{1,3}$. The angle θ of the RZ gates sets the amplification rate of the correct answer.

TABLE II. Qiskit package versions

Appendix E: Date-time
Each experiment was performed on the dates listed in Table III.

TABLE III. Date and time when experimental data were taken
Performance of RTOF_iX gate, RTOF_M gate and Toffoli with SWAP gate on ibmq_ourense: 2019/12/24
Performance of C^{⊗3}X with RTOF_iX gate and C^{⊗3}X with RTOF_M gate on ibmq_ourense
Performance of RTOF_iX gate, RTOF_M gate and Toffoli with SWAP gate on ibmq_valencia: 2020/1/6
Performance of C^{⊗3}X with RTOF_iX gate and C^{⊗3}X with RTOF_M gate on ibmq_valencia: 2020/1/6
Performance of RY gate on ibmq_ourense: 2020/1/8
Performance of RY gate on ibmq_valencia: 2020/1/8

Appendix F: Performance of IBM Q processors
We show the single-qubit gate and readout performance of the IBM Q processors in Table IV, and the two-qubit gate performance in Table V.

TABLE IV. Qubit performance on Jan 1, 2020.

TABLE V. CX gate performance on Jan 1, 2020.