Optimal Multi-Bit Toffoli Gate Synthesis

Multi-bit Toffoli gates form an essential quantum gate class for quantum algorithms. They should be efficiently decomposed into elementary single- or multi-qubit quantum gates, such as CNOT, T, and Hadamard, for a scalable implementation of a quantum algorithm. We propose an engineering method for the practical synthesis of a multi-bit Toffoli gate. Two optimization models and their closed-form solutions are presented for optimal decomposition of the multi-bit Toffoli gate. These models are based on linearized multi-objective integer programming with parameters such as the number of target ancillae, ancillae states, and basis gates. The proposed method supports the systematic handling of quantum circuit constraints, including the total number of available qubits and maximum circuit depth, which depend on various quantum hardware specifications. Our approach exhibits promise in the noisy intermediate-scale quantum environment by providing a rapid and optimal method for synthesizing multi-bit Toffoli gates under diverse and unpredictable quantum hardware specifications.


I. INTRODUCTION
Quantum computing is an emerging research area that has achieved promising results in various fields. Successful quantum algorithms, including Grover's [1] and Shor's [2] algorithms, theoretically provide quadratic or exponential acceleration in the computation time compared to classical algorithms. Quantum algorithms require a quantum circuit implementation [3]. The circuit of a quantum algorithm is composed of quantum gates (single-/two-qubit or qudit/qutrit gates) and is evaluated by certain criteria, including the number of two-qubit gates, circuit depth, or number of ancilla qubits.
The multi-bit Toffoli gate is one of the best-known multi-qubit gates [4], [5], [6] and is essential for complex quantum algorithm implementation [1], [2], [7], [8], [9] and quantum error/fault tolerance studies [10], [11], [12], [13]. For the Toffoli gate (controlled-controlled-NOT), it has been reported that six CNOT gates are required for an optimal-cost implementation [14] and that at least five two-qubit gates are required when non-CNOT gates are allowed [15], [16]. The extended N-bit Toffoli gate uses N − 1 control qubits to invert the quantum state of a target qubit when the conditions of the control qubits are met [14], [17]. In the implementation of the N-bit Toffoli gate, the number of two-qubit gates, which is a commonly used cost measure, generally increases, and subsequently, the quantum algorithm performance deteriorates owing to the low computing fidelity and limited coherence time that are inherent in quantum computing systems [18]. Several decomposition methods for the N-bit Toffoli gate have been proposed [19], [20], [21], [22], [23] to mitigate the performance deterioration.
The associate editor coordinating the review of this manuscript and approving it for publication was Christos Anagnostopoulos.
The implementation of the multi-bit Toffoli gate is not unique; it varies in the single- and multi-qubit gates of the gate library that it uses and in the number and states of ancillary qubits. For an N-qubit universal gate without ancillae, the theoretical lower bound on the number of two-qubit gates has been demonstrated to be (4^N − 3N − 1)/4 [24]. Several studies have been conducted on efficient circuit implementation [18] to approach the theoretical result. The methods proposed in [25], [26], [27], [28], and [29] decompose the multi-bit Toffoli gate into single-/two-qubit gates using ancilla qubits, whereas those presented in [18], [30], [31], [32], [33], [34], [35], and [36] use the quantum gates of higher-dimensional Hilbert space, such as qudit/qutrit gates.
Our study focuses on the multi-bit Toffoli gate decomposition method with ancillary qubits, which is applicable to currently accessible quantum hardware. In a previous study [20], a framework that uses a single ancilla or N − 3 ancillae was introduced to decompose the N-bit Toffoli gate into single-qubit gates and CNOT gates. Moreover, Maslov [26] employed the relative-phase Toffoli gates to reduce the decomposition circuit depth and number of ancillae to ⌈(N − 3)/2⌉, where ⌈ξ⌉ denotes the smallest integer that is greater than or equal to ξ. The best-known N-bit Toffoli decomposition method that was presented in [27] requires N − 1 ancillae, O(log N) depth, and N − 3 measurement operations. All previous studies considered only a fixed number of ancilla qubits and had a single objective to minimize, e.g., the T-depth [21], [29], [37], [38], [39]. Therefore, these methods have difficulty accommodating changes in quantum hardware specifications, such as varying quantum costs [40].
The objective of this study is to develop an efficient scheme to synthesize the multi-bit Toffoli gate with the minimum total quantum resource usage, while accommodating the varying input requirements. Studies that address such synthesis involving flexible user-specified input requirements are rare. An N -bit Toffoli gate implementation method with an arbitrary number of ancilla qubits has previously been presented [28]. Our study proposes an optimal N -bit Toffoli gate implementation that minimizes the weighted quantum cost with parameters including the targeting ancillae number, ancillae states, gate library, and building block gates. Our method decomposes the N -bit Toffoli gate into scalable multi-bit building block gates using M determined/arbitrary state ancillae. Moreover, we consider scalable relative-phase Toffoli gates as an instance of a building block that employs the Clifford+T gate library, and derive the optimal face and closed-form solutions of the corresponding model.
The contribution of this study is threefold. First, we present a new optimal N -bit Toffoli gate implementation method that can handle user-specified parameters, such as the state and number of ancillae, gate library class, gate quantum cost, and weight of each gate. Second, we propose a scalable multi-bit relative-phase Toffoli gate implementation that does not require additional ancilla qubit resources. Third, our study provides a rigorous analysis on the optimal face and closed-form solutions to the proposed mathematical models. Our approach exhibits potential usefulness in the Noisy Intermediate Scale Quantum (NISQ) era, in which the rapid changes in quantum hardware and software make parameters such as the quantum cost of the gate library unpredictable. The optimal models based on linearized multi-objective integer programming accommodate parameters such as the circuit size, gate library, quantum cost, and building blocks. As quantum hardware and software are altered, our models can substitute the new gate library, quantum cost, and building blocks for the Clifford+T library, proposed quantum cost, and relative-phase Toffoli building blocks, respectively. Moreover, our approach guarantees optimality and provides the closed-form solutions that can be used in determining the optimal number of ancillae for the N -bit Toffoli gate implementation with respect to the quantum cost structure.
The remainder of this paper is organized as follows. In Section II, the optimal Toffoli gate implementation model for the target number and states of ancillae is presented. Section III describes the multi-bit relative-phase Toffoli gate that employs the Clifford+T library. Section IV presents the optimal face and closed-form solutions to the optimization models. In Section V, we outline our computational experiments and their results. Finally, Section VI presents the conclusions and directions for future research.

II. OPTIMAL N -BIT TOFFOLI GATE SYNTHESIS FRAMEWORK
The Toffoli gate synthesis problem (TGSP) involves determining the number of elementary building blocks that produces an N-bit Toffoli gate synthesis circuit with the minimum weighted quantum cost sum for the given parameters. We assume that the multi-bit building block gates are implementable with no ancillae. The mathematical models of the TGSP are labeled by the states of the ancillae as TGSP|0⟩ and TGSP|χ⟩, which correspond to fixed (determined) and arbitrary ancilla states, respectively.
First, the synthesis framework suggests the circuit structure of the TGSP|0⟩(N, M) using M pairs of multi-bit building blocks and a Toffoli gate. The TGSP|0⟩ and TGSP|χ⟩ are modeled as integer programs that minimize the weighted sum of the quantum costs r_nj ∈ R, with weights p_j ∈ P, of the gates j ∈ J, for the target N-bit Toffoli gate and M ancillae. The final two constraints of the second model read

\[ x_n - y_n \ge 0 \quad \text{for } n \in N, \tag{9} \]
\[ x_n \in \mathbb{Z}_{0+} \quad \text{for } n \in N. \tag{10} \]

The first integer optimization model, TGSP|0⟩, pertains to the determined ancillae, and the second, TGSP|χ⟩, relates to the arbitrary ancillae. In Equations (1) and (5), the parameters indicate the weights and costs of the gates in the building blocks and a Toffoli gate. The constraints in Equations (2) and (6) denote the total number of building block pairs/quads, which is the number of ancilla qubits M. Equations (3) and (7) represent the total number of qubits of the pairs/quads, which is the sum of N target qubits, M overlapping control qubits, M qubits that are used for ancillae, and the three qubits of a Toffoli gate. In the TGSP|χ⟩, Equation (8) indicates that the n-bit building blocks are used twice, whereas the remaining gates are used four times.
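As a minimal sketch of the feasible set of the TGSP|0⟩, the following Python snippet enumerates the multisets of M building-block sizes. It assumes that the block-count constraint reads Σ x_n = M and that the qubit-count constraint reads Σ n·x_n = N + 2M − 3 with block sizes 3 ≤ n ≤ N − 1. Both forms are assumptions inferred from the description above and from Example 1 in Section IV, and the function name `feasible_block_sizes` is hypothetical.

```python
from itertools import combinations_with_replacement


def feasible_block_sizes(N, M):
    """List multisets of M block sizes n (3 <= n <= N-1) summing to N + 2M - 3.

    Assumption: the right-hand side N + 2M - 3 is inferred from the text,
    not quoted from the paper's Equation (3).
    """
    total = N + 2 * M - 3
    sizes = range(3, N)  # candidate n-bit building blocks, n = 3, ..., N-1
    return [c for c in combinations_with_replacement(sizes, M) if sum(c) == total]


# Instance of Example 1: an 11-bit Toffoli gate with two ancillae.
print(feasible_block_sizes(11, 2))  # [(3, 9), (4, 8), (5, 7), (6, 6)]
```

Each tuple lists the sizes of the M building-block pairs; the integer program then selects among these candidates by weighted quantum cost.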

III. BUILDING BLOCK: MULTI-BIT RELATIVE PHASE TOFFOLI GATE SYNTHESIS
As an illustrative example of the practical implementation of the TGSP's solutions, we provide an instance of the gate library and building blocks. First, we refer to previous research [26], providing a three-bit Toffoli gate implementation using the Clifford+T gate library. Next, based on the three-/four-bit Toffoli gate implementation in the same study, we present a scalable n-bit relative-phase Toffoli gate implementation without ancillae.

A. Clifford+T LIBRARY
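The Clifford+T gate library is an instance of the parameter J, the gate library of the TGSP. The Clifford+T library forms a universal gate set that consists of the Clifford gates (NOT, CNOT, H, Z, S, and S†) together with the T and T† gates [41]. Definitions of each gate are given below.

\[
\mathrm{NOT} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad
H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad
Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\quad
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\]
\[
S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},\quad
S^{\dagger} = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix},\quad
T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix},\quad
T^{\dagger} = \begin{pmatrix} 1 & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix}.
\]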

B. THREE-BIT TOFFOLI GATE SYNTHESIS
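The matrix representation of the three-bit Toffoli gate, a building block in our analysis, is as follows:

\[
\mathrm{TOF}(3) = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]

Among other options, the three-bit Toffoli gate can be decomposed with the Clifford+T gate library [26], as shown in Figure 1.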
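The following sketch checks numerically that a Clifford+T circuit reproduces the three-bit Toffoli matrix and counts its gates. The circuit used here is the standard textbook decomposition (seven T/T† gates, six CNOT gates, two H gates), which is assumed for illustration and is not necessarily identical to the circuit of Figure 1. The helper names `on_qubit` and `cnot` are hypothetical.

```python
from collections import Counter

import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
TDG = T.conj().T
I2 = np.eye(2, dtype=complex)


def on_qubit(gate, q, n=3):
    """Embed a single-qubit gate on qubit q (qubit 0 is the most significant bit)."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, gate if k == q else I2)
    return out


def cnot(control, target, n=3):
    """Build the permutation matrix of a CNOT acting on n qubits."""
    dim = 2 ** n
    U = np.zeros((dim, dim), dtype=complex)
    for basis in range(dim):
        bits = [(basis >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        image = sum(bit << (n - 1 - k) for k, bit in enumerate(bits))
        U[image, basis] = 1
    return U


a, b, c = 0, 1, 2  # two controls and the target
circuit = [  # gates in time order; T and T-dagger share the label "T"
    ("H", on_qubit(H, c)), ("CNOT", cnot(b, c)), ("T", on_qubit(TDG, c)),
    ("CNOT", cnot(a, c)), ("T", on_qubit(T, c)), ("CNOT", cnot(b, c)),
    ("T", on_qubit(TDG, c)), ("CNOT", cnot(a, c)), ("T", on_qubit(T, b)),
    ("T", on_qubit(T, c)), ("H", on_qubit(H, c)), ("CNOT", cnot(a, b)),
    ("T", on_qubit(T, a)), ("T", on_qubit(TDG, b)), ("CNOT", cnot(a, b)),
]

unitary = np.eye(8, dtype=complex)
for _, gate in circuit:
    unitary = gate @ unitary  # later gates multiply from the left

toffoli = np.eye(8, dtype=complex)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>

gate_counts = Counter(label for label, _ in circuit)
print(np.allclose(unitary, toffoli), dict(gate_counts))
```

Under this assumed circuit, the gate counts in the order (T, CNOT, H) are (7, 6, 2).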
Hereafter, we use the number of gates as the quantum cost, as an example.

This section presents the n-bit relative-phase Toffoli gate RTOF(n) implementation and its quantum cost. The RTOF(n) gate is a relative-phase-shifted version of the n-bit Toffoli gate TOF(n). The advantage of this implementation is that no auxiliary qubits are required, which enables an N-bit Toffoli gate synthesis method that accommodates an arbitrary number of auxiliary qubits. Let I′ and X′ be relative-phase versions of I and X, respectively; RTOF(n) is then defined in terms of I′ and X′ as in [42]. Based on this definition, we propose an RTOF(n) implementation (n ≥ 5, n = n_1 + n_2 + n_3 − 2), given RTOF(n_m) (m = 1, 2, 3) gates, to utilize RTOF(n) as the building block.
The correctness of the above circuit follows from equations that hold for arbitrary relative-phase gates I′_m, X′_m (m = 1, 2, 3) and I′, X′.

To implement RTOF(n) for n < 5, we replace RTOF(2) with CNOT and construct the circuits of RTOF(3) and RTOF(4), as shown in Figures 3 and 4. The quantum cost r_n = (r_{n,T}, r_{n,CNOT}, r_{n,H}) of the RTOF(n) circuit is defined by a recurrence relation. As n increases, two RTOF(3 + ⌊(n − 5)/3⌋) gates replace two RTOF(2 + ⌊(n − 5)/3⌋) gates, and the quantum cost increases exponentially (Figure 5).

This section presented the relative-phase Toffoli gate implementation as a practical instance of the n-bit (n < N) building block that constructs the N-bit Toffoli gate. With the proposed RTOF(n), the optimal solution to the TGSP yields a physically implementable quantum circuit.
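The size relation n = n_1 + n_2 + n_3 − 2 and the two sub-gate sizes 2 + ⌊(n − 5)/3⌋ and 3 + ⌊(n − 5)/3⌋ can be illustrated with a short sketch. It assumes that only these two sizes occur in the split of RTOF(n), which is an inference from the text rather than a statement of the paper's recurrence. The function name `rtof_split` is hypothetical, and the sketch does not model how often each sub-gate appears in the circuit or what it costs.

```python
def rtof_split(n):
    """Return three sub-gate sizes (n1, n2, n3) with n1 + n2 + n3 - 2 == n.

    Assumption: the sizes are as balanced as possible, i.e., each is either
    2 + floor((n - 5) / 3) or 3 + floor((n - 5) / 3).
    """
    if n < 5:
        raise ValueError("the recursive construction is stated for n >= 5")
    q, r = divmod(n - 5, 3)
    # (2 - r) gates of the smaller size and (r + 1) gates of the larger size
    return tuple([2 + q] * (2 - r) + [3 + q] * (r + 1))


for n in range(5, 12):
    print(n, rtof_split(n))
```

For n = 5, 6, 7 the sketch returns (2, 2, 3), (2, 3, 3), and (3, 3, 3), so the two RTOF(2) gates are replaced one by one with RTOF(3) gates as n grows.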

IV. CLOSED-FORM SOLUTIONS
We obtain the optimal face of the TGSP and provide a closed-form solution to this problem. The following theorems and lemmas provide the optimal faces of the TGSPs, and the corollaries of each theorem and lemma provide optimal solutions for the TGSPs.
Theorem (Jensen's Inequality [43]): For any convex function f, numbers m_1, m_2, ..., m_K in its domain, and positive weights u_k,

\[
f\!\left( \frac{\sum_{k=1}^{K} u_k m_k}{\sum_{k=1}^{K} u_k} \right) \le \frac{\sum_{k=1}^{K} u_k f(m_k)}{\sum_{k=1}^{K} u_k}.
\]

Equality holds if and only if m_1 = m_2 = ... = m_K or if f is linear in a domain containing m_1, m_2, ..., m_K.
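Lemma 1: Let the integer programming problem (P1) be formulated as

\[
\text{(P1)} \quad \min \sum_{k=1}^{K} \alpha_k u_k \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = M, \quad \sum_{k=1}^{K} k\,u_k = N, \quad u_k \in \mathbb{Z}_{0+} \ \text{for } k = 1, \dots, K.
\]

We assume that (α_{k+1} − α_k) ≥ (α_k − α_{k−1}). Let h = ⌊N/M⌋ and ξ = N/M − h. The optimal solutions u* = (u*_1, ..., u*_K) to (P1) lie on the optimal face:

\[
\left\{ u^* : \ \sum_{k=1}^{K} u^*_k = M, \quad \sum_{k=1}^{K} k\,u^*_k = N, \quad \sum_{k=1}^{K} \alpha_k u^*_k = M\left[(1-\xi)\alpha_h + \xi\alpha_{h+1}\right], \quad u^*_k \in \mathbb{Z}_{0+} \right\}.
\]

Proof: We show that the optimal solutions u* are feasible solutions with the lower bound value of the objective function. First, the first and second terms of the optimal face are the constraints of (P1), so u* are feasible solutions.

To obtain the lower bound value, define a function f with the same function value as the objective function at discrete positive integers. Let f : R → R be a piecewise linear function defined on intervals of integer numbers, where

\[
f(m) = \alpha_k + (\alpha_{k+1} - \alpha_k)(m - k) \quad \text{for } k \le m \le k+1.
\]

Note that f(m) is a convex function because of the assumption that (α_{k+1} − α_k) ≥ (α_k − α_{k−1}). Because f(k) = α_k, the objective function can be written as

\[
\sum_{k=1}^{K} \alpha_k u_k = \sum_{k=1}^{K} u_k f(k).
\]

We use Jensen's inequality [43] to find the lower bound of the objective function. Because f(m) is a convex function, Jensen's inequality [43] states that

\[
\frac{\sum_{k=1}^{K} u_k f(k)}{\sum_{k=1}^{K} u_k} \ge f\!\left( \frac{\sum_{k=1}^{K} k\,u_k}{\sum_{k=1}^{K} u_k} \right) = f\!\left(\frac{N}{M}\right).
\]

Thus, the lower bound on the objective value for all feasible solutions in (P1) can be obtained as

\[
\sum_{k=1}^{K} \alpha_k u_k \ge M f\!\left(\frac{N}{M}\right) = M\left[(1-\xi)\alpha_h + \xi\alpha_{h+1}\right].
\]

A feasible solution u* is optimal if it yields this lower bound of the objective value of (P1). Therefore, the optimal solutions to (P1) are the non-negative integer solutions to the following linear system:

\[
\sum_{k=1}^{K} u_k = M, \qquad \sum_{k=1}^{K} k\,u_k = N, \qquad \sum_{k=1}^{K} \alpha_k u_k = M\left[(1-\xi)\alpha_h + \xi\alpha_{h+1}\right]. \qquad \square
\]

Corollary 1: A closed-form solution u* for the optimal face of (P1) is

\[
u^*_h = M(1-\xi), \qquad u^*_{h+1} = M\xi, \qquad u^*_k = 0 \ \text{for } k \notin \{h, h+1\}.
\]

Proof: The optimal solutions to (P1) are the non-negative integer solutions to the linear system of Lemma 1. Initially, the solution u* is a non-negative integer solution because N/M is greater than or equal to h and Mξ is N − Mh, which is an integer value.

We show that the optimal solution satisfies the first and second equations by

\[
\sum_{k=1}^{K} u^*_k = M(1-\xi) + M\xi = M, \qquad \sum_{k=1}^{K} k\,u^*_k = hM(1-\xi) + (h+1)M\xi = M(h + \xi) = N.
\]

The right-hand side of the third equation is

\[
M\left[(1-\xi)\alpha_h + \xi\alpha_{h+1}\right] = \alpha_h u^*_h + \alpha_{h+1} u^*_{h+1} = \sum_{k=1}^{K} \alpha_k u^*_k.
\]

The non-negative integer solution u* is the optimal solution to (P1). □

Theorem 1: The optimal solutions x* = (x*_3, ..., x*_{N−1}) to the TGSP|0⟩ lie on the optimal face. □

Corollary 2: A closed-form solution on the optimal face of the TGSP|0⟩ is, with h = ⌊(N − 3)/M⌋ and ξ = (N − 3)/M − h,

\[
x^*_{h+2} = M(1-\xi), \qquad x^*_{h+3} = M\xi, \qquad x^*_n = 0 \ \text{otherwise}.
\]

Proof: Using Corollary 1, the non-negative integer solution x* is an optimal solution to the TGSP|0⟩. □

Lemma 2: Consider the integer programming problem (P2) in the variables u and v. Let k* be an integer such that v_{k*} = 1. Additionally, let l and φ be l = ⌊(N − k*)/(M − 1)⌋ and φ = (N − k*)/(M − 1) − l, respectively. Then, the optimal solutions u*, v* = (v*_1, ..., v*_K) to (P2) lie on the optimal face.

Proof: We show that the optimal solutions u*, v* are feasible solutions with the lower bound value of the objective function. First, as the first to third terms of the optimal face are the constraints of (P2), u*, v* are feasible solutions.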
To obtain the lower bound value, let f : R → R be a piecewise linear function defined on intervals of integer numbers with α_k = f(k), so that the objective function can be written in terms of f. As per Jensen's inequality [43], the objective function is bounded from below. Because the right-hand side of this inequality is the lower bound of the objective function, u*, v* are the optimal solutions of (P2). □

Corollary 3: A closed-form solution u*, v* on the optimal face of (P2) is determined by k*, where k* is the solution to Eq. (12).

Proof: First, u* satisfies the first to third equations of the optimal face, which means that u* is a feasible solution. Next, we show that the fourth equation is a function of k* and is minimal when Eq. (12) is satisfied. Given M and N, the right-hand side of the fourth equation is a function of k*.

Theorem 2: The optimal solutions x*, y* to the TGSP|χ⟩ lie on the optimal face.

Proof: Let f : R → R be a piecewise linear function defined on intervals of integer numbers. Substituting k := n − 2, u_k := x_{k+2}, v_k := y_{k+2}, α_k := 2 Σ_{j∈J} 2 p_j r_{nj}, K := N − 3, M := M, and N := N − 3, Lemma 2 induces the optimal face of the TGSP|χ⟩. □

Corollary 4: A closed-form solution x*, y* on the optimal face of the TGSP|χ⟩ is determined by n*, where n* is the solution to a condition on Σ_{j∈J} p_j (r_{n*+1,j} − r_{n*,j}).

Proof: Using Corollary 3, the non-negative integer solution lies on the optimal face of the TGSP|χ⟩. □

Example 1: We provide the optimal solutions to the instances of the TGSP with N = 11, M = 2, P = (1, 0, 0), and determined/arbitrary ancillae. The optimal faces in Theorems 1 and 2 represent the optimal solutions to the TGSP|0⟩ and TGSP|χ⟩, and Corollaries 2 and 4 provide closed-form solutions.
The optimal solutions to the example with the determined ancillae (TGSP|0⟩) are

\[
\begin{cases}
x^*_6 = 2, & \text{and the others are zero,} \\
x^*_4 = x^*_8 = 1, & \text{and the others are zero,} \\
x^*_5 = x^*_7 = 1, & \text{and the others are zero.}
\end{cases}
\]

The first solution is obtained by Corollary 2. The optimal solutions attain the lower bound of the objective value, which is 103.

For the example with the arbitrary ancillae (TGSP|χ⟩), three optimal solutions are obtained likewise, each with all remaining variables equal to zero. The first solution is obtained from Corollary 4. The solutions attain the lower bound of the objective value, which is 112.
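The closed-form solution for a convex-cost problem of the type treated in Lemma 1 can be checked against exhaustive enumeration. The sketch below assumes that (P1) minimizes Σ α_k u_k subject to Σ u_k = M and Σ k·u_k = N, a form inferred from the proof, and it uses the hypothetical strictly convex cost α_k = k² rather than the paper's cost values. The function names are illustrative only.

```python
from itertools import combinations_with_replacement


def closed_form(N, M):
    """Return {k: u_k} with u_h = M(1 - xi) and u_{h+1} = M*xi, h = floor(N/M)."""
    h, rem = divmod(N, M)  # rem equals M*xi = N - M*h
    u = {h: M - rem}
    if rem:
        u[h + 1] = rem
    return u


def cost(u, alpha):
    """Objective value sum_k alpha(k) * u_k."""
    return sum(alpha(k) * count for k, count in u.items())


def brute_force_min(N, M, K, alpha):
    """Minimum cost over all multisets of M indices in 1..K that sum to N."""
    return min(
        sum(alpha(k) for k in ks)
        for ks in combinations_with_replacement(range(1, K + 1), M)
        if sum(ks) == N
    )


def to_block_sizes(u):
    """Map k back to the block size n = k + 2 (substitution k := n - 2)."""
    return {k + 2: count for k, count in u.items()}


alpha = lambda k: k * k  # hypothetical convex cost, not the paper's values

for N, M in [(8, 2), (8, 3), (7, 4), (10, 3)]:
    u = closed_form(N, M)
    assert cost(u, alpha) == brute_force_min(N, M, K=N, alpha=alpha)

# Example 1 with determined ancillae: N = 11 and M = 2 reduce to N - 3 = 8.
print(to_block_sizes(closed_form(11 - 3, 2)))  # {6: 2}
```

With a strictly convex cost only the balanced solution is optimal; additional optimal solutions such as those listed in Example 1 can occur only where the cost is linear over the relevant range, in line with the equality condition of Jensen's inequality.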

V. EXPERIMENTAL ANALYSIS
This section describes the experiments conducted to evaluate the TGSP model and its closed-form solution on a quantum simulator and a real quantum computer. We observe the objective value of the synthesized circuit in the quantum simulator (IBMQ), which has an all-to-all coupling network and infinite precision, to evaluate performance without interference from quantum hardware errors. The Toffoli gate implementation circuit, obtained from the closed-form solution of the model, was executed on the simulator, and the objective value was calculated as the weighted sum of the counted gates of the circuit. The comparison experiment between a recent toolkit (Qiskit) and our method was performed on a NISQ system (Rigetti) to evaluate the accuracy of the implemented Toffoli gate. The accuracy was measured as the fraction of shots observed in the desired computational basis state (CBS), i.e., the resulting CBS obtained from the simulator, among the total shots.
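The accuracy measure described above can be sketched as a small function over a dictionary of measurement counts. The function name `accuracy`, the bitstring format, and the sample counts are hypothetical and serve only as an illustration; they are not measured data.

```python
def accuracy(counts, desired_state):
    """Fraction of shots observed in the desired computational basis state."""
    total_shots = sum(counts.values())
    if total_shots == 0:
        raise ValueError("no shots recorded")
    return counts.get(desired_state, 0) / total_shots


# Hypothetical counts for one experiment with one thousand shots.
sample_counts = {"11111": 812, "11110": 95, "01111": 93}
print(accuracy(sample_counts, "11111"))
```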
To evaluate the performances of the models between previous studies and ours, we compared the objective values of each synthesis for N = 5, 11, M = 1 ∼ N − 3, and fixed/arbitrary ancillae. Figures 6 and 7 show the results of the five methods. Each column in the figures represents the weighted sum of quantum cost, which is the number of gates in this experiment. If the synthesis with given parameters N, M, and P is not available, we leave that column blank. Particularly, Barenco's study [20] utilized the NCV library instead of the Clifford+T library; thus, we record only the number of CNOT gates and the objective value when the weight P = (0, 1, 0). Additionally, He et al.'s study [27] included measurement operations within the Toffoli gate synthesis circuit. Although this method does not guarantee optimality, we adopted the best objective value of three trials with Baker et al.'s model [28] for each M , N because it can deal with an arbitrary number of ancillae and yield a single solution.
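The weighted sum of gate counts used as the objective value can be illustrated as follows. The sketch assumes that the weight vector P is ordered as (T, CNOT, H), matching the ordering of r_n in Section III, and it uses the gate counts (7, 6, 2) of the textbook three-bit Toffoli decomposition as a hypothetical input. The function name `weighted_cost` is illustrative.

```python
GATES = ("T", "CNOT", "H")  # assumed ordering of the weight vector P


def weighted_cost(gate_counts, weights):
    """Weighted sum of gate counts: sum_j p_j * (number of gates of type j)."""
    return sum(w * gate_counts.get(g, 0) for g, w in zip(GATES, weights))


toffoli_counts = {"T": 7, "CNOT": 6, "H": 2}  # hypothetical example input

print(weighted_cost(toffoli_counts, (1, 1, 1)))  # all gates counted equally
print(weighted_cost(toffoli_counts, (0, 1, 0)))  # CNOT gates only
print(weighted_cost(toffoli_counts, (1, 0, 0)))  # T gates only
```

Setting P = (0, 1, 0) reduces the objective to the CNOT count, which is the setting used for the comparison with the NCV-library results.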
Based on the results, our method determines the optimal number of ancillae for a given ancilla resource by the closed-form optimal solution for each number of ancillae.  With the increasing use of ancilla resources, the objective value decreases to a minimum at a specific point. Thus, there is no need to use the surplus resources after the minimum. As an example, in the following table, we present the optimal number of ancillae for each method for a given ancilla resource in the 11-bit Toffoli gate synthesis to minimize the number of multi-qubit gates, where P = (0, 1, 0). Additionally, the synthesis for arbitrary state ancillae can be implemented for the determined state. Each cell in the table below represents the optimal number of ancilla bits, and the values in parentheses indicate the corresponding numbers of gates. Our method provides the minimum cost of 11-bit Toffoli gate synthesis, except when ten ancilla qubits are available.
To check the potential of quantum error reduction by the proposed models, we executed the N-bit Toffoli gate synthesis circuits on a NISQ device. We compared our results with the best result among the four modes of the Qiskit multi-control gate implementation provided by IBMQ, using the average accuracy of five experiments with one thousand shots each. Each experiment was executed on a Rigetti quantum system for the N-bit (N = 5 to 10) Toffoli gate synthesis with M (1 to N − 3) determined/arbitrary ancillae. Most of the results showed a higher accuracy and lower gate depth than Qiskit's implementation (Figures 9, 10, and 11).

VI. CONCLUSION
We presented a pragmatic approach to synthesizing a multi-bit Toffoli gate for various user-specified input parameters, including the number of available ancillae, initial states of ancillae, gate libraries to represent building blocks, relative costs of quantum resources, and quantum costs of basis quantum gates. The core of our approach lies in the two combinatorial optimization models and their closed-form solutions. The proposed approach provides users with the flexibility to accommodate various quantitative design factors in creating a context-dependent synthesis circuit for multi-bit Toffoli gates.
The proposed optimization models can yield the optimal cost circuit for a multi-bit Toffoli gate with alternative RTOF(n) and TOF(3) implementations (e.g., an improved future version). Moreover, the quantum basis gates for Toffoli gate implementation are replaceable. As a result of these properties, the proposed framework provides flexible Toffoli gate synthesis in the NISQ environment in which the precise setting of the quantum circuit parameters is difficult to predict.
In future studies, we plan to use the proposed synthesis framework for the implementation of a quantum circuit with a large number of multi-bit Toffoli gates. The restricted size and length of the quantum circuit will result in different implementations of each generalized Toffoli gate (e.g., different numbers of ancillae). Furthermore, the choice of qubits, used as ancillae for each Toffoli gate, affects the depth and error of the quantum algorithm circuit. Therefore, we seek the optimal basis gate network to minimize the overall cost of a circuit, which requires a large number of multi-bit Toffoli gates.