Pulse-engineered Controlled-V gate and its applications on superconducting quantum device

In this paper, we demonstrate that, by employing the OpenPulse design kit for IBM superconducting quantum devices, the controlled-V gate (CV gate) can be implemented in about half the gate time of the controlled-X gate (CX or CNOT gate), which amounts to a 65.5% reduction in gate time compared to the CX-based implementation of CV. Then, based on the theory of Cartan decomposition, we characterize the set of all two-qubit gates that can be implemented with only two or three CV gates; using pulse-engineered CV gates enables us to implement these gates with shorter gate time and possibly better gate fidelity than the CX-based implementation, as demonstrated experimentally in two examples. Moreover, we show that implementing a three-qubit Toffoli gate on linearly coupled qubits with the pulse-engineered CV gate improves both the gate time and the averaged output-state fidelity. These results indicate the value of our CV gate implementation technique: as an additional option in basis gate set design, it may shorten the overall computation time and consequently improve the precision of several quantum algorithms executed on a real device.


I. INTRODUCTION
There are several types of platforms for implementing a quantum computer, such as superconducting, trapped-ion, and optical devices. In this paper, we study the problem of reducing the circuit depth (total gate time) on the superconducting quantum devices provided by IBM (called IBM Quantum), for which Qiskit serves as the software development environment. Qiskit offers two representation languages for designing quantum programs: OpenPulse [1] and QASM.
OpenPulse is a language for specifying and physically controlling a target quantum gate at the pulse level, which gives a large degree of freedom in circuit design. As a result, OpenPulse can reduce the execution time through optimal pulse design for various types of quantum gates [2], [3]; it can also be used to generate new gates specific to a particular physical simulation [4]. Recently, a computational framework has been proposed to aid such synthesis problems [5].
QASM is a language for designing circuits out of quantum gates. Physically, each gate is decomposed into precisely calibrated gates chosen from a universal quantum gate set [6], [7]. The universal gate set used in IBM Quantum is composed of single-qubit gates and the controlled-X (CX, often called CNOT) gate [8]. The point of using this fixed gate set is that, because it contains only one two-qubit interaction gate (the CX gate), calibration is relatively easy. In particular, the CX gate can be implemented precisely via the cross resonance (CR) Hamiltonian [9]-[13], with the help of the echo scheme and the cancellation pulse technique [14]. However, the error rate of the CX gate is still much higher than that of single-qubit gates [15], due to its longer pulse length (gate time) and the effects of cross-talk [16]-[18]. Hence, if a quantum algorithm must be realized by a circuit with unnecessarily many CX gates because of the QASM constraint, the accuracy of the circuit will decrease significantly.
This issue may be resolved by adding some two-qubit gates to the default universal gate set composed of single-qubit gates and the CX gate. In this work, we take the controlled-V (CV) gate, whose matrix representation in the computational basis is given by

$$\mathrm{CV} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1+i}{2} & \frac{1-i}{2} \\ 0 & 0 & \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix},$$

which readily leads to the relation $\mathrm{CV}^2 = \mathrm{CX}$. Note that, in the default CX-based implementation of the CV gate in QASM, one needs 2 CX gates to create the two-qubit interaction, as shown in Fig. 1.

VOLUME X, 2020. arXiv:2102.06117v3 [quant-ph] 27 Apr 2022
The main reason for choosing the CV gate is its potential to reduce the gate time of several QASM-based quantum algorithms. The point is that, by using OpenPulse, we can implement the CV gate simply by halving the length of the CR pulse used to generate the CX gate, as suggested by the relation $\mathrm{CV}^2 = \mathrm{CX}$. That is, the gate time of the pulse-engineered CV gate is about half that of the CX gate, whereas the QASM-based CV gate shown in Fig. 1 needs at least twice the gate time of CX. Therefore, if some CX gates in a quantum circuit can be replaced with the same or a smaller number of CV gates, the total gate time of the circuit is reduced, and the accuracy of the circuit is thereby expected to improve. A typical example is the Toffoli gate: it needs at least 6 CX gates if only CX is available, but it can be implemented using 2 CX and 3 CV gates if the CV gate is also available [6], [19], [20].

This paper is organized as follows. In Section II, we describe how to implement the CV gate using OpenPulse and show the experimental results; compared to the default QASM-based implementation, the gate time of the pulse-engineered CV gate is shortened by 65.5% and the gate fidelity is improved by 0.66%. In Section III, we first use the theory of Cartan decomposition to characterize the set of all two-qubit gates that can be implemented with only two or three CV gates; because the pulse-engineered CV gate has a shorter gate time, those two-qubit gates can also be implemented with shorter gate time and possibly better gate fidelity than the default CX-based ones. We then experimentally generate √iSWAP and √SWAP using pulse-engineered CV gates and confirm that, in both cases, the gate fidelity is improved, which we attribute to the shorter gate time. In Section IV, we present an efficient method for implementing a three-qubit Toffoli gate on linearly coupled qubits using the pulse-engineered CV gate.

A. CROSS RESONANCE INTERACTION
On IBM Quantum devices, the cross resonance (CR) interaction is used to couple two qubits [10], by irradiating the control qubit with a microwave pulse at the transition frequency of the target qubit. In the default setup, the microwave pulse has a Gaussian-square envelope; see Appendix A. Under some approximations, we obtain the following model CR Hamiltonian [11], [12], [14], [20]:

$$H_{\mathrm{CR}} = \sum_{P \in \{I, X, Y, Z\}} \frac{\omega_{ZP}(A,\phi)}{2}\, Z \otimes P + \sum_{Q \in \{X, Y, Z\}} \frac{\omega_{IQ}(A,\phi)}{2}\, I \otimes Q, \qquad (2)$$

where the qubit ordering is control ⊗ target. The coefficients $\omega_{ZP}$ and $\omega_{IQ}$ represent interaction strengths, which are functions of the amplitude $A$ and the phase $\phi$ of the microwave pulse. Note that this CR Hamiltonian is valid only under the condition that the microwave pulse at the transition frequency of the target qubit is applied to the control qubit. In the absence of noise, the two qubits are driven by the unitary operator

$$U_{\mathrm{CR}}(t) = \exp(-i t H_{\mathrm{CR}}).$$

B. PULSE-ENGINEERED CX AND CV GATES
Let us define the general two-qubit unitary operator

$$[DE]_\theta = \exp\left(-\frac{i\pi\theta}{2}\, D \otimes E\right), \qquad (4)$$

where $D$ and $E$ are arbitrary single-qubit operators. With this notation, the CX gate is represented as [10]

$$\mathrm{CX} = e^{i\pi/4}\, [ZI]_{1/2}\, [IX]_{1/2}\, [ZX]_{-1/2}. \qquad (5)$$

That is, the two-qubit interaction required to form the CX gate is provided solely by the $Z \otimes X$ Hamiltonian. However, the CR Hamiltonian (2) contains terms other than $Z \otimes X$, which must therefore be eliminated to implement the CX gate via the CR Hamiltonian. This can be achieved by using the echo sequence pulse scheme and applying a direct cancellation pulse on the target qubit, as illustrated in Fig. 2; in other words, these techniques effectively generate the unitary evolution driven by the effective Hamiltonian $\tilde H_{ZX}$, composed of only the $Z \otimes X$ term [12], [21]. In general, one can implement the unitary operator $[ZX]_\theta$ driven by $\tilde H_{ZX}$ by setting the interaction strength in terms of the pulse duration $t$ as $\theta = \omega_{ZX}(A, \phi)\, t/\pi$; $[ZX]_\theta = \tilde U_{ZX} = \exp(-i\pi t \tilde H_{ZX})$. For the CX gate, the two-qubit interaction time should be $t_{CX} = \pi/(2\omega_{ZX}(A, \phi))$ to realize $\theta = -1/2$. Next, from Eq. (5) and the fact that $IX$, $ZX$, and $ZI$ commute with each other, one can see that the CV gate is decomposed as

$$\mathrm{CV} = e^{i\pi/8}\, [ZI]_{1/4}\, [IX]_{1/4}\, [ZX]_{-1/4}, \qquad (6)$$

so the required two-qubit interaction time is $t_{CV} = t_{CX}/2$, which is half the CR pulse duration of the calibrated CX gate. The CR pulse envelope is a GaussianSquare pulse, i.e., a square pulse with Gaussian-shaped rising and falling edges [21] (see also Appendix A). Note that, in all experimental demonstrations in this paper, we keep the basic structure of the pulse schedule and the amplitude parameters of the combined CR and cancellation pulses in Fig. 2.
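The CX and CV decompositions above can be checked numerically. The sketch below is illustrative only: it assumes the convention $[DE]_\theta = \exp(-i\pi\theta\, D\otimes E/2)$, which is the one consistent with $\theta = -1/2$ for CX and $\theta = -1/4$ for CV, takes $V = \frac{1}{2}\begin{pmatrix}1+i & 1-i\\ 1-i & 1+i\end{pmatrix}$ as the square root of $X$, and uses hypothetical helper names (`rot`, `cr_based_gate`). The global phases $e^{i\pi/4}$ and $e^{i\pi/8}$ only make the matrices agree entry by entry; they have no physical effect.

```python
import numpy as np

# Single-qubit Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rot(d, e, theta):
    """[DE]_theta = exp(-i*pi*theta*(D kron E)/2) for Pauli D, E.

    Qubit order is control kron target. Because (D kron E)^2 = I,
    the exponential reduces to a cos/sin combination.
    """
    p = np.kron(d, e)
    return np.cos(np.pi * theta / 2) * np.eye(4) - 1j * np.sin(np.pi * theta / 2) * p


def cr_based_gate(theta_zx, local_angle, phase):
    """phase * [ZI]_a [IX]_a [ZX]_theta_zx (the three factors commute)."""
    return phase * rot(Z, I2, local_angle) @ rot(I2, X, local_angle) @ rot(Z, X, theta_zx)


# Ideal gates in the computational basis (control = first qubit)
CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
V = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])
CV = np.block([[I2, np.zeros((2, 2))], [np.zeros((2, 2)), V]])

cx_from_cr = cr_based_gate(-1 / 2, 1 / 2, np.exp(1j * np.pi / 4))
cv_from_cr = cr_based_gate(-1 / 4, 1 / 4, np.exp(1j * np.pi / 8))

print(np.allclose(cx_from_cr, CX))  # True
print(np.allclose(cv_from_cr, CV))  # True
print(np.allclose(CV @ CV, CX))     # True

# The ZX angle (and hence the CR interaction time) of CV is half that of CX
t_cx_ns = 196.0
t_cv_ns = t_cx_ns * (1 / 4) / (1 / 2)
print(t_cv_ns)  # 98.0
```

Because the three factors commute, only the $ZX$ angle distinguishes CV from CX apart from local rotations, so halving that angle at fixed amplitude halves the CR interaction time; with the 196 ns CR duration of the calibrated CX gate this gives 98 ns.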

C. EXPERIMENTAL ENVIRONMENT
In the present work, we used qubits 0, 1, and 4 of ibmq_toronto, as shown in Fig. 3. Single-qubit gate operations on qubits 0, 1, and 4 are realized by microwave irradiation through the drive channels d0, d1, and d4, respectively, whereas the CR pulses for the two-qubit interactions between qubits 0 and 1 and between qubits 1 and 4 are applied through the control channels u0 and u3. Each experiment in this paper was conducted with 8192 shots (i.e., 8192 measurements were performed for each circuit). Measurement errors occasionally flip the detected bit; we applied the readout error mitigation technique [22] to correct for them. We list the single-qubit gate errors and the readout errors of the device in Table 1. The two-qubit CX gate errors are 1.065% for the qubit pair 0-1 and 1.5969% for the pair 1-4.
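Readout error mitigation of the kind cited above is commonly built on a calibration (assignment) matrix whose column $j$ is the measured outcome distribution when basis state $j$ is prepared. The single-qubit sketch below simply inverts such a matrix; the calibration numbers and raw counts are hypothetical, and library implementations often use a constrained least-squares fit instead of a plain inverse, so this illustrates the idea rather than the exact routine used in the experiments.

```python
import numpy as np

SHOTS = 8192

# Hypothetical calibration matrix: A[i, j] = P(measure i | prepared j)
A = np.array([[0.97, 0.05],
              [0.03, 0.95]])

# Hypothetical raw counts for outcomes '0' and '1' of some circuit
raw_counts = np.array([5000, 3192])
raw_probs = raw_counts / SHOTS

# Plain matrix inversion: solve A @ p_true = p_raw
mitigated = np.linalg.solve(A, raw_probs)

# Inversion can produce small negative entries; clip and renormalize
mitigated = np.clip(mitigated, 0.0, None)
mitigated /= mitigated.sum()
print(mitigated)
```

For $n$ qubits the same construction uses a $2^n \times 2^n$ matrix, which is why tensored (per-qubit) calibration is often preferred on larger registers.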

D. EXPERIMENTAL RESULTS
We implemented the gate (6) with several values of the pulse duration $\tau_d$ of the two CR pulses, which correspond to CR− and CR+ shown in Fig. 2, ranging from 45.5 ns to 161 ns. For each $\tau_d$ we test the following trial CV gate:

$$\mathrm{CV}_{\mathrm{trial}}(\tau_d) = e^{i\pi/8}\, [ZI]_{1/4}\, [IX]_{1/4}\, [ZX]_{\theta(\tau_d)}, \qquad (9)$$

where $\theta(\tau_d) = -\tau_d/(4t_{CV})$. Note that the CR pulse duration for realizing the CX gate is 196 ns (see Appendix B for details); hence, from the relation $\mathrm{CV}^2 = \mathrm{CX}$, ideally $\tau_d$ would equal $\tau_{CV} = 196/2 = 98$ ns to realize the CV gate. We adjust only the flat-top part of the pulse, keeping the Gaussian flanks fixed. We applied quantum process tomography (QPT) to reconstruct the trial CV gate and evaluated its gate fidelity $F_p$ to the ideal CV gate [23]-[25]. Alternatively, the gate fidelity could be estimated with interleaved randomized benchmarking [20] or with the randomized_benchmarking function in the Qiskit libraries [22]. Figure 4 shows the gate fidelity of the trial CV gate (9) as a function of the CR pulse duration $\tau_d$, with and without readout error mitigation; each value is the average of experiments conducted on three different days. The black line represents the theoretically calculated gate fidelity between the exact CV gate and the trial CV gate (9) as a function of the CR duration; in the latter, $[ZX]_\theta$ can be calculated analytically using Eq. (4), and the magnitude of $\theta(\tau_d)$ increases linearly with $\tau_d$. For reference, the gate fidelity of the CV gate implemented in the QASM format (denoted QASM CV) is also shown. First, note that readout error mitigation works well and gives better fidelity values than the raw (unmitigated) results. The mitigated fidelity of the CV gate implemented with OpenPulse (denoted Pulse CV) reaches its maximum value of 99.23% (averaged over three different days) at the CR duration $\tau_d = 101.5$ ns, which is close to the expected value $\tau_{CV} = 98$ ns, i.e., half the CR pulse duration of the calibrated CX gate. In all three experiments, the maximum is attained at 101.5 ns, which indicates that the optimal pulse duration is robust against calibration changes. Another important finding is that the maximum value of 99.23% is 0.66% higher than the fidelity of the CV gate obtained via the default QASM-based implementation using 2 CX gates.

FIGURE 4. Gate fidelity of the trial CV gate (9) to the ideal CV gate, as a function of the duration of the CR pulse (curves: fidelity with readout error mitigation, fidelity from raw data, theoretical value, and fidelity of QASM CV with and without mitigation). The red vertical line denotes half the duration of the CR pulse in the CX pulse schedule. The black line represents the theoretically calculated gate fidelity between the exact CV gate and the trial CV gate (9). Here, the physical control and target qubits are qubits 0 and 1 depicted in Fig. 3, respectively.

Figure 5 shows the actual pulse sequence of the CV gate implemented (a) in the default QASM format with 2 CX gates (see Fig. 1) and (b) with OpenPulse using the optimal pulse duration of 101.5 ns. The total gate time of the CV gate is 994 ns for the former and 343 ns for the latter. Hence, the present OpenPulse-based implementation achieves a 65.5% reduction in the total gate time of the CV gate compared to the default implementation (a), in addition to a 0.66% improvement in gate fidelity.
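The shape of the theoretical curve in Fig. 4 can be sketched in a few lines under stated assumptions: the trial gate differs from the ideal CV only in its $ZX$ angle $\theta(\tau_d) = -\tau_d/(4t_{CV})$, using the convention $[DE]_\theta = \exp(-i\pi\theta\, D\otimes E/2)$, and "gate fidelity" is taken here as the process fidelity $|\mathrm{Tr}(U^\dagger V)|^2/16$ (the QPT-based definition used in the experiments may differ, e.g. an average gate fidelity). Since $ZI$, $IX$ and $ZX$ commute, $U_{\mathrm{ideal}}^\dagger U_{\mathrm{trial}} = [ZX]_\delta$ with $\delta = \theta(\tau_d) + 1/4$, which gives $\cos^2(\pi\delta/2)$. The helper name `trial_cv_process_fidelity` is hypothetical.

```python
import numpy as np


def trial_cv_process_fidelity(tau_d, t_cv=98.0):
    """Process fidelity between the ideal CV and a trial CV whose ZX angle is
    theta(tau_d) = -tau_d / (4 t_cv), all other factors being exact (assumption).

    The ideal CV has theta = -1/4, so the residual rotation is [ZX]_delta with
    delta = theta + 1/4, and |Tr([ZX]_delta)|^2 / 16 = cos^2(pi * delta / 2).
    """
    delta = -np.asarray(tau_d, dtype=float) / (4.0 * t_cv) + 0.25
    return np.cos(np.pi * delta / 2.0) ** 2


durations_ns = np.array([45.5, 98.0, 101.5, 161.0])
for tau, f in zip(durations_ns, trial_cv_process_fidelity(durations_ns)):
    print(f"tau_d = {tau:6.1f} ns  ->  F_p = {f:.4f}")

# Gate-time reduction reported for the pulse-engineered CV (994 ns -> 343 ns)
reduction = (994 - 343) / 994
print(f"{100 * reduction:.1f}% shorter")  # 65.5% shorter
```

In this idealized model the fidelity peaks at exactly $\tau_d = t_{CV} = 98$ ns, whereas the measured optimum sits at 101.5 ns; the small offset is a property of the real device, not of the model.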

III. TWO-QUBIT GATE DESIGN WITH CV GATES
Arbitrary two-qubit gates can be implemented with three CX gates [26], [27]. However, generating two-qubit interactions only with CX gates can unnecessarily prolong the gate time.
In this section, based on the theory of Cartan decomposition, we study the set of two-qubit gates that can be constructed with up to three CV gates instead of the same number of CX gates. In particular, we consider the √SWAP and √iSWAP gates as examples; they can be implemented with three and two CV gates, respectively, so the resulting gate time is clearly shorter than that of the default CX-based implementations. We have also experimentally confirmed that the gate fidelity of these CV-based gates is higher than that of the CX-based ones.

A. CARTAN DECOMPOSITION
The Cartan decomposition shows that an arbitrary two-qubit unitary operation $U \in SU(4)$ can be represented in the form

$$U = k_1 \exp\left[\frac{i}{2}\left(a\, X\otimes X + b\, Y\otimes Y + c\, Z\otimes Z\right)\right] k_2, \qquad (10)$$

where $k_1, k_2 \in SU(2) \otimes SU(2)$ are local single-qubit operations. When two two-qubit unitaries $U$ and $V$ are related by $U = k_1 V k_2$, we say that $U$ and $V$ are locally equivalent. The Cartan decomposition is directly used to construct the Weyl chamber, which provides a clear view of the geometric structure of the set of all nonlocal two-qubit gates. The Weyl chamber is illustrated as the tetrahedron $OA_1A_2A_3$ in Fig. 6(a); each point $[a, b, c]$ represents a local equivalence class of two-qubit gates [26], [28]. Shown in Fig. 6(b) are particularly important points corresponding to familiar two-qubit gates: $L = [\pi/2, 0, 0]$ for {CX, CY, CZ}, $A_2 = [\pi/2, \pi/2, 0]$ for {DCX, iSWAP}, $A_3 = [\pi/2, \pi/2, \pi/2]$ for SWAP, and $B_3 = [\pi/4, \pi/4, \pi/4]$ for √SWAP. By using $n$ gates locally equivalent to $[\gamma, 0, 0]$, we can create an arbitrary two-qubit gate $[a, b, c]$ that satisfies condition (11). This condition implies that $n = 3$ applications of CX (or of any gate locally equivalent to $[\pi/2, 0, 0]$), combined with appropriate local gates, can span the entire Weyl chamber, i.e., the tetrahedron $OA_1A_2A_3$; that is, as is well known, 3 CX gates can generate arbitrary two-qubit unitary gates. Similarly, by using two $[\gamma, 0, 0]$ gates, we can create an arbitrary two-qubit gate $[a, b, 0]$ that satisfies condition (12). Thus, two CX gates can generate any two-qubit gate represented by a point inside the triangle $OA_1A_2$, which corresponds to the base of the Weyl chamber (see Fig. 7).
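Local equivalence can also be tested numerically without extracting $[a, b, c]$ explicitly, by comparing the Makhlin local invariants $G_1, G_2$ computed in the magic (Bell-type) basis. This is a standard technique swapped in here for illustration, not the procedure used in the text. The sketch assumes that the point $[a, b, c]$ denotes $\exp[\frac{i}{2}(a\,XX + b\,YY + c\,ZZ)]$, which reproduces CX at $[\pi/2, 0, 0]$, CV at $[\pi/4, 0, 0]$ (the point $C_1$ of Section III-B) and √iSWAP at $[\pi/4, \pi/4, 0]$; the function names are hypothetical.

```python
import numpy as np

# Magic basis: columns are Bell states up to phases
Q = np.array([[1, 0, 0, 1j],
              [0, 1j, 1, 0],
              [0, 1j, -1, 0],
              [1, 0, 0, -1j]]) / np.sqrt(2)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)


def canonical(a, b, c):
    """exp[(i/2)(a XX + b YY + c ZZ)]; XX, YY, ZZ commute, so exponentiate each."""
    out = np.eye(4, dtype=complex)
    for angle, p in ((a, X), (b, Y), (c, Z)):
        pp = np.kron(p, p)
        out = out @ (np.cos(angle / 2) * np.eye(4) + 1j * np.sin(angle / 2) * pp)
    return out


def makhlin_invariants(u):
    """Return (G1, G2); two-qubit gates are locally equivalent iff these agree."""
    ub = Q.conj().T @ u @ Q
    m = ub.T @ ub
    det = np.linalg.det(u)
    tr = np.trace(m)
    g1 = tr**2 / (16 * det)
    g2 = (tr**2 - np.trace(m @ m)) / (4 * det)
    return np.array([g1, g2])


def locally_equivalent(u, v):
    return np.allclose(makhlin_invariants(u), makhlin_invariants(v))


CX = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
CV = np.eye(4, dtype=complex)
CV[2:, 2:] = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])
s = 1 / np.sqrt(2)
SQRT_ISWAP = np.array([[1, 0, 0, 0], [0, s, 1j * s, 0], [0, 1j * s, s, 0], [0, 0, 0, 1]])

print(locally_equivalent(CX, canonical(np.pi / 2, 0, 0)))                  # True
print(locally_equivalent(CV, canonical(np.pi / 4, 0, 0)))                  # True
print(locally_equivalent(SQRT_ISWAP, canonical(np.pi / 4, np.pi / 4, 0)))  # True
print(locally_equivalent(CV, CX))                                          # False
```

Dividing by $\det U$ makes the invariants insensitive to global phase, which is why CV (with $\det = i$) can be compared directly with the canonical gate.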

B. CONFIGURABLE CV-BASED TWO-QUBIT GATES
We can now characterize the set of two-qubit gates generated by two or three applications of the CV gate, represented by $C_1 = [\pi/4, 0, 0]$. First, Eq. (12) with $\gamma = \pi/4$ indicates that 2 CV gates can generate any unitary gate represented by a point in the locally equivalent areas $OLB$ and $A_1LC$ illustrated in Fig. 7 [28]. These areas are contained in the triangle $OA_1A_2$. Hence, there exist gates that 2 CX gates can generate but 2 CV gates cannot, such as DCX (double-CX gate, i.e., a two-qubit gate composed of two back-to-back CX gates with alternating controls) or, equivalently, iSWAP, represented by $A_2 = [\pi/2, \pi/2, 0]$. However, there are still many useful two-qubit gates in $OLB$ and $A_1LC$, and it is thus important to have the pulse-engineered CV gate for generating those gates in significantly shorter time, and possibly with better gate fidelity, than the default QASM-based implementation using only CX. For example, the controlled-U gate plays an essential role in several quantum algorithms such as the quantum Fourier transform; fortunately, an arbitrary controlled-U gate is specified by a point $[\gamma, 0, 0]$ on the line $OL$ or $A_1L$ and thus can be generated using two CV gates.
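The statement that two CV gates reach exactly the triangles $OLB$ and $A_1LC$ on the base of the Weyl chamber can be turned into a simple membership test. This sketch only encodes the regions as described above, not Eq. (12) itself: it uses $O = [0,0,0]$, $L = [\pi/2,0,0]$, $B = [\pi/4,\pi/4,0]$ and $A_1 = [\pi,0,0]$, and assumes $C = [3\pi/4,\pi/4,0]$, the mirror image of $B$ under $a \to \pi - a$, since the coordinates of $C$ are not stated here. The function names are hypothetical.

```python
import math

PI = math.pi
# Vertices on the base (c = 0) of the Weyl chamber, as (a, b)
O, L, B, A1 = (0.0, 0.0), (PI / 2, 0.0), (PI / 4, PI / 4), (PI, 0.0)
C = (3 * PI / 4, PI / 4)  # assumption: mirror image of B under a -> pi - a


def _in_triangle(p, t1, t2, t3, eps=1e-9):
    """True if point p lies inside or on the boundary of triangle t1 t2 t3."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(t1, t2, p), cross(t2, t3, p), cross(t3, t1, p)
    has_neg = min(d1, d2, d3) < -eps
    has_pos = max(d1, d2, d3) > eps
    return not (has_neg and has_pos)


def reachable_with_two_cv(a, b, c=0.0):
    """Point [a, b, c] lies in OLB or A1LC (both require c = 0)."""
    if abs(c) > 1e-9:
        return False
    p = (a, b)
    return _in_triangle(p, O, L, B) or _in_triangle(p, A1, L, C)


print(reachable_with_two_cv(PI / 3, 0.0))     # controlled-U on OL: True
print(reachable_with_two_cv(PI / 4, PI / 4))  # sqrt(iSWAP), point B: True
print(reachable_with_two_cv(PI / 2, PI / 2))  # iSWAP / DCX, point A2: False
```

The checks mirror the text: controlled-U gates on $OL$ or $A_1L$ and √iSWAP at $B$ are reachable with two CV gates, while iSWAP/DCX at $A_2$ and any point with $c \neq 0$ (such as √SWAP at $B_3$) are not.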
Second, Eq. (11) with $n = 3$ and $\gamma = \pi/4$ specifies the set of two-qubit gates that can be generated with 3 CV gates, depicted as the colored area in Fig. 6(c). We can expect the same advantage as in the 2-CV case when implementing two-qubit gates contained in this area via three pulse-engineered CV gates.

C. EFFICIENT IMPLEMENTATION OF √iSWAP AND √SWAP VIA PULSE-ENGINEERED CV GATES
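Here we experimentally demonstrate the implementation of the following two two-qubit gates via pulse-engineered CV gates: the √iSWAP gate, represented by the point $B = [\pi/4, \pi/4, 0]$ in Fig. 7,

$$\sqrt{\mathrm{iSWAP}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt2} & \frac{i}{\sqrt2} & 0 \\ 0 & \frac{i}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

and the √SWAP gate, represented by the point $B_3 = [\pi/4, \pi/4, \pi/4]$ in Fig. 6,

$$\sqrt{\mathrm{SWAP}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1+i}{2} & \frac{1-i}{2} & 0 \\ 0 & \frac{1-i}{2} & \frac{1+i}{2} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Each of these gates, together with single-qubit gates, forms a universal gate set. Recall that the Cartan decomposition (10) of a given two-qubit unitary $U$ is not unique. We therefore used the decomposition algorithm 'TwoQubitBasisDecomposer' implemented in Qiskit [22]. Figure 8 shows two decomposed gate layouts of √iSWAP, based on CX (middle) and on CV (lower), which we call √iSWAP$_{CX}$ and √iSWAP$_{CV}$, respectively. The case of √SWAP is shown in Fig. 9, where the CX- and CV-based decompositions are called √SWAP$_{CX}$ and √SWAP$_{CV}$, respectively. Here, $U_2(\phi, \lambda)$ and $U_3(\theta, \phi, \lambda)$ are the single-qubit gates of the QASM language [29], defined as

$$U_3(\theta,\phi,\lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda}\sin\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} & e^{i(\phi+\lambda)}\cos\frac{\theta}{2} \end{pmatrix}, \qquad U_2(\phi,\lambda) = U_3\left(\frac{\pi}{2},\phi,\lambda\right) = \frac{1}{\sqrt2}\begin{pmatrix} 1 & -e^{i\lambda} \\ e^{i\phi} & e^{i(\phi+\lambda)} \end{pmatrix}.$$

The pulse schedules corresponding to these four decomposed circuits are shown in Figs. 10 and 11, where the CX and CV pulses were implemented with the optimized CR durations identified in Section II.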
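The single-qubit gates $U_2$, $U_3$ and the two target gates can be written out and sanity-checked as below. The matrices follow the standard OpenQASM 2 and textbook definitions; the checks confirm $U_2(\phi,\lambda) = U_3(\pi/2,\phi,\lambda)$ and that √iSWAP and √SWAP square to iSWAP and SWAP. This is a consistency check on the definitions only, not the decomposition performed by `TwoQubitBasisDecomposer`.

```python
import numpy as np


def u3(theta, phi, lam):
    """OpenQASM 2 U3 gate."""
    return np.array([
        [np.cos(theta / 2), -np.exp(1j * lam) * np.sin(theta / 2)],
        [np.exp(1j * phi) * np.sin(theta / 2), np.exp(1j * (phi + lam)) * np.cos(theta / 2)],
    ])


def u2(phi, lam):
    """U2(phi, lam) = U3(pi/2, phi, lam)."""
    return np.array([[1, -np.exp(1j * lam)],
                     [np.exp(1j * phi), np.exp(1j * (phi + lam))]]) / np.sqrt(2)


s = 1 / np.sqrt(2)
SQRT_ISWAP = np.array([[1, 0, 0, 0], [0, s, 1j * s, 0], [0, 1j * s, s, 0], [0, 0, 0, 1]])
h = 0.5 * (1 + 1j)
SQRT_SWAP = np.array([[1, 0, 0, 0], [0, h, h.conjugate(), 0],
                      [0, h.conjugate(), h, 0], [0, 0, 0, 1]])
ISWAP = np.array([[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])

print(np.allclose(u2(0.3, 1.1), u3(np.pi / 2, 0.3, 1.1)))  # True
print(np.allclose(SQRT_ISWAP @ SQRT_ISWAP, ISWAP))         # True
print(np.allclose(SQRT_SWAP @ SQRT_SWAP, SWAP))            # True
```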

The total gate time of √SWAP$_{CV}$ is 532 ns shorter than that of √SWAP$_{CX}$. We computed the gate fidelities of these four gates with respect to their ideal counterparts using QPT. The results are summarized in Table 2, together with the gate times; the √iSWAP (√SWAP) gate implemented with CV gates achieves a fidelity 0.87% (2.14%) higher than the default CX-based implementation. This improvement may be due to the shorter gate time realized by the pulse-engineered CV gate. Note that, when √iSWAP or √SWAP is part of a larger quantum circuit, the gate-time advantage of the CV-based implementation may lead to a significant improvement in the fidelity of that circuit.

IV. HIGH-SPEED AND HIGH-PRECISION TOFFOLI GATE WITH CV GATES
The pulse-engineered CV gate can also be applied to improve the speed and precision of larger gates beyond the two-qubit case. As a demonstration, we study here the three-qubit Toffoli gate (or controlled-controlled-X gate). The idea presented here is applicable to the general multi-qubit Toffoli gate appearing in many long-term algorithms, such as QRAM databases [30] and the diffusion operator in Grover's search algorithm [31].

FIGURE 12. ... [32]. Note that the former ($\mathrm{TOF}_{CV}$) exchanges $q_0$ and $q_1$, while maintaining the functionality of the Toffoli gate.

A. GATE IMPLEMENTATION FOR LINEARLY-COUPLED THREE QUBITS
If the three qubits are fully connected, the Toffoli gate can be constructed using 6 CX gates (and some single-qubit gates), or alternatively using 3 CV and 2 CX gates; hence, the pulse-engineered CV gate enables a shorter total gate time. However, the qubits of current IBM Quantum devices are generally not fully connected, and for three linearly coupled qubits the number of necessary gates increases.
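The fully connected construction with two CX gates and three controlled-V-type gates can be checked directly. The sketch below uses the standard Barenco-type sequence CV($q_1 \to q_2$), CX($q_0 \to q_1$), CV$^\dagger$($q_1 \to q_2$), CX($q_0 \to q_1$), CV($q_0 \to q_2$), in which one of the three controlled-V gates is CV$^\dagger$ (locally equivalent to CV). It illustrates only the fully connected case; the linearly coupled $\mathrm{TOF}_{CV}$ circuit of Fig. 12 is different, and the helper `controlled` is hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
V = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])  # V @ V == X
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)


def controlled(u, control, target, n=3):
    """Controlled-u on n qubits; qubit 0 is the leftmost (most significant) factor."""
    idle = [I2] * n
    off = idle.copy()
    off[control] = P0
    on = idle.copy()
    on[control] = P1
    on[target] = u
    return reduce(np.kron, off) + reduce(np.kron, on)


# Gates in time order; the circuit unitary is their product in reverse order
sequence = [
    controlled(V, 1, 2),
    controlled(X, 0, 1),
    controlled(V.conj().T, 1, 2),
    controlled(X, 0, 1),
    controlled(V, 0, 2),
]
circuit = reduce(lambda acc, g: g @ acc, sequence, np.eye(8, dtype=complex))

toffoli = np.eye(8, dtype=complex)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>
print(np.allclose(circuit, toffoli))  # True
```

Tracing the four control settings by hand gives the same result: the target receives $VV^\dagger = I$ unless both controls are 1, in which case it receives $V^2 = X$.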
Here we consider two constructions of the Toffoli gate, with and without CV gates, namely the $\mathrm{TOF}_{CV}$ and $\mathrm{TOF}_{CX}$ gates shown in Fig. 12; note that $q_j$ ($j = 0, 1, 4$) represents the $j$-th qubit of the ibmq_toronto device shown in Fig. 3, so $q_0$ and $q_4$ are not directly connected. The $\mathrm{TOF}_{CV}$ gate has 3 CX and 3 CV gates; hence, with the pulse-engineered CV gate, the total gate time of $\mathrm{TOF}_{CV}$ is shorter than that of both the textbook Toffoli with 6 CX gates and $\mathrm{TOF}_{CX}$. Note that a SWAP gate is built into $\mathrm{TOF}_{CV}$ to connect $q_0$ and $q_4$; consequently, $\mathrm{TOF}_{CV}$ exchanges $q_0$ and $q_1$ while maintaining the functionality of the Toffoli gate. Even the pure Toffoli gate, composed of $\mathrm{TOF}_{CV}$ followed by a SWAP gate, needs only 6 CX and 3 CV gates, meaning that it can still be realized with a shorter gate time than $\mathrm{TOF}_{CX}$ thanks to the pulse engineering of CV.

B. EXPERIMENTAL RESULTS
We conducted an experiment on the ibmq_toronto processor shown in Fig. 3 to compare the actual performance of $\mathrm{TOF}_{CV}$ (3 CX and 3 CV) with that of $\mathrm{TOF}_{CX}$ (8 CX), where the pulse-engineered CV is used in the former. The pulse sequences corresponding to these Toffoli gates are depicted in Fig. 13. As expected, the total gate times are 1778 ns and 2835 ns for $\mathrm{TOF}_{CV}$ and $\mathrm{TOF}_{CX}$, respectively, suggesting that $\mathrm{TOF}_{CV}$ would be more precise than $\mathrm{TOF}_{CX}$. Since QPT requires an excessive number of experiments, we instead adopted quantum state tomography and calculated the state fidelity $F_s(\rho_{exp}, \rho_{ide})$ [33], where $\rho_{ide}$ denotes the ideal target density matrix and $\rho_{exp}$ denotes the density matrix reconstructed by state tomography in the experiment. As input states to the Toffoli gate, we prepared the 12 states listed in Table 3, where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$. We performed 8192 shots (measurements) for each input state and calculated $F_s(\rho_{exp}, \rho_{ide})$. Table 3 summarizes the results, showing the superiority of $\mathrm{TOF}_{CV}$ for all input states except $|111\rangle$. As a result, $\mathrm{TOF}_{CV}$ has a 4.06% higher average fidelity than $\mathrm{TOF}_{CX}$. This advantage of the CV-based gate over the conventional CX-based one is larger than in the two-qubit case shown in Table 2, presumably because the circuit is longer, so the gate-time savings accumulate.
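A minimal sketch of the state fidelity computation, assuming the common squared (Uhlmann-Jozsa) definition $F(\rho, \sigma) = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2$; whether [33] uses the squared or unsquared form is not stated here. For a pure ideal state $|\psi\rangle$ it reduces to $\langle\psi|\rho_{exp}|\psi\rangle$. The function names and the "measured" state below are hypothetical.

```python
import numpy as np


def _psd_sqrt(rho):
    """Matrix square root of a Hermitian positive-semidefinite matrix."""
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues caused by noise
    return (v * np.sqrt(w)) @ v.conj().T


def state_fidelity(rho_exp, rho_ide):
    """F = (Tr sqrt(sqrt(rho_ide) rho_exp sqrt(rho_ide)))^2."""
    s = _psd_sqrt(rho_ide)
    inner = s @ rho_exp @ s
    eig = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(eig)) ** 2)


# Hypothetical example: ideal three-qubit output |111>, partly depolarized "measured" state
psi = np.zeros(8)
psi[7] = 1.0
rho_ide = np.outer(psi, psi.conj())
rho_exp = 0.9 * rho_ide + 0.1 * np.eye(8) / 8
print(round(state_fidelity(rho_exp, rho_ide), 4))  # 0.9125
```

Because the ideal state here is pure, the result can be cross-checked as $0.9 + 0.1/8 = 0.9125$.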

V. CONCLUSION
Using only CX gates to entangle qubits is now a de facto standard in quantum computation, and IBM Quantum is no exception. While this approach keeps the calibration burden low, it has the disadvantage that some gate and circuit structures become redundant. To resolve this issue, in this paper we proposed using CV gates in addition to the default gate set; OpenPulse allows us to realize the CV gate with a shorter gate time than both the CX gate and the default CV gate composed of 2 CX gates. The parameters of the CR Hamiltonian for realizing this pulse-engineered CV gate are the same as those of the CX gate, except for the pulse length and some local gate parameters, meaning that the additional calibration burden is small. In particular, the result of Section II (Fig. 4) indicates that the optimal pulse length did not change across the calibrations on three different days. The gate-time reduction in circuit design, which in turn improves gate fidelity, has been demonstrated with the √SWAP, √iSWAP, and Toffoli gates. Note that the gate fidelity improvements were modest (0.66% for the CV gate, 0.87% for √iSWAP, and 2.14% for √SWAP); this may be due to ZZ interactions that cannot be canceled by the echo scheme [14] employed in our method. Suppressing the ZZ interactions [34]-[37] would allow us to further improve the gate performance.
In summary, from the viewpoint of practicality and feasibility, we believe that a gate set containing the proposed pulse-engineered CV gate can effectively reduce the redundancy of several quantum circuits, thereby shortening the total gate time and eventually improving several quantum algorithms. To investigate a wider range of applications, we plan to carry out a comparative verification of the proposed method on larger circuits and near-term quantum algorithms.

A. GAUSSIAN SQUARE PULSE ENVELOPE
In all experiments we employed the Gaussian-square pulse, composed of a constant-amplitude part of length (width) $\tau_w$ and Gaussian-shaped rising and falling edges of length $\tau_r$. The overall pulse waveform $f(t)$, as a function of time $t$, is thus given by

$$f(t) = \begin{cases} A \exp\left[-\frac{(t-\tau_r)^2}{2\sigma^2}\right], & 0 \le t < \tau_r, \\ A, & \tau_r \le t \le \tau_r + \tau_w, \\ A \exp\left[-\frac{(t-\tau_r-\tau_w)^2}{2\sigma^2}\right], & \tau_r + \tau_w < t \le 2\tau_r + \tau_w, \end{cases}$$

where $A$ is the maximum amplitude and $\sigma^2$ is the variance of the Gaussian part. Note that the overall pulse length, or duration, is defined as

$$\tau_d = \tau_w + 2\tau_r.$$

B. OPTIMAL PULSE DURATION OF CX GATE
Figure 1 shows the pulse schedule for implementing the CX gate, where the CR pulse duration is 196 ns and accordingly the total gate time is 462 ns; this duration is the one that achieves the maximum gate fidelity. Here we describe the OpenPulse experiment used to identify this optimal duration. The experiment was conducted in the same setting as described in Section II-C, using qubits 1 and 4. We evaluated the following trial CX gate while varying the duration $\tau_d \in [144, 259]$ ns:

$$\mathrm{CX}_{\mathrm{trial}}(\tau_d) = e^{i\pi/4}\, [ZI]_{1/2}\, [IX]_{1/2}\, [ZX]_{\theta(\tau_d)}, \qquad (17)$$

where $\theta(\tau_d) = -\tau_d/(2\tau_{CX})$ with the nominal value $\tau_{CX} = 196$ ns. The yellow and blue lines in Fig. 14 depict the gate fidelity between the pulse-engineered trial $\mathrm{CX}(\tau_d)$ and the ideal CX gate, with and without readout error mitigation, respectively. The black dotted line depicts the gate fidelity between the theoretical trial $\mathrm{CX}(\tau_d)$ and the ideal CX gate. The figure shows that the optimal duration is exactly the nominal value, $\tau_d = \tau_{CX} = 196$ ns, which is also where the theoretical curve reaches perfect (unit) fidelity.
FIGURE 14. Gate fidelity of the trial CX gate (17) to the ideal CX gate, as a function of the duration of the CR pulse (curves: fidelity with readout error mitigation, fidelity from raw data, and theoretical value).
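The Gaussian-square envelope of Appendix A, together with the rule used in Section II-D of changing only the flat-top part while keeping the Gaussian flanks fixed, can be sampled as below. This is a sketch of the piecewise waveform as written in Appendix A (unlifted Gaussian edges); Qiskit's built-in `GaussianSquare` pulse may treat the edges slightly differently (e.g., lifting them to zero at the ends), and the values of $\tau_r$, $\sigma$ and $A$ below are hypothetical, not the device calibration.

```python
import numpy as np


def gaussian_square(t, amp, tau_r, tau_w, sigma):
    """Gaussian rise of length tau_r, flat top of length tau_w, Gaussian fall of
    length tau_r; total duration tau_d = tau_w + 2 * tau_r."""
    t = np.asarray(t, dtype=float)
    rise = amp * np.exp(-((t - tau_r) ** 2) / (2 * sigma**2))
    fall = amp * np.exp(-((t - tau_r - tau_w) ** 2) / (2 * sigma**2))
    return np.where(t < tau_r, rise, np.where(t <= tau_r + tau_w, amp, fall))


def flat_top_for_duration(tau_d, tau_r):
    """Keep the Gaussian flanks fixed and adjust only the flat top."""
    tau_w = tau_d - 2 * tau_r
    if tau_w < 0:
        raise ValueError("duration too short for the fixed Gaussian flanks")
    return tau_w


# Hypothetical parameters in ns; only the structure comes from the text
tau_r, sigma, amp = 20.0, 10.0, 0.5
tau_w = flat_top_for_duration(101.5, tau_r)
times = np.linspace(0.0, tau_w + 2 * tau_r, 1001)
env = gaussian_square(times, amp, tau_r, tau_w, sigma)
print(tau_w, env.max())  # 61.5 0.5
```

Since the flanks are fixed, sweeping $\tau_d$ in the CV and CX experiments only changes $\tau_w$, so the integrated pulse area, and hence the accumulated $ZX$ angle, grows linearly with $\tau_d$ once the flanks are accounted for.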