The Optimization and Application of 3-Bit Hermitian Gates and Multiple Control Toffoli Gates

The well-known 3-bit Hermitian gate, the Toffoli gate, has been implemented with Clifford+T circuits. Compared with the Peres gate, its implementation circuit requires more controlled-not (cnot) gates; however, the Peres gate is not Hermitian. This article reports four 3-bit Hermitian gates, named LI gates, whose circuits have the same T-count, T-depth, and cnot-count as the Peres gate. Furthermore, two decomposition methods of a multiple-control Toffoli (MCT) gate are proposed for different primary optimization goals. Then, using the proposed Hermitian gates and optimized MCT gates, we design equality, less-than, and full comparators with the minimum circuit width. A fault-tolerant circuit is required for robust quantum computing, and Clifford+T circuits are accepted solutions for fault-tolerant implementation. Considering T-count, T-depth, cnot-count, and circuit width as the primary optimization goals, we design optimized Clifford+T circuits for the three comparators using LI gates and optimized MCT gates. Comparison and analysis show that the proposed comparators have better overall performance in T-count, T-depth, cnot-count, and circuit width than the best-known comparators without quantum measurements.

An efficient decomposition of multiple control gates in the circuit model is crucial for reducing execution time and errors [15]. Barenco et al. [16] proposed the well-known multiple control Toffoli (MCT) decomposition, which constructs an MCT gate from one-qubit gates and two-qubit controlled-not (cnot) gates. Liu and Long [15] gave two analytic expressions for building general n-qubit controlled unitary gates.
Fault-tolerant implementation of quantum gates is needed for robust quantum computing in the presence of noise [17], and Clifford+T circuits are widely accepted solutions for it [18], [19]. T gates are more expensive than the other gates in space and time because their fault-tolerant implementation is costly [20], [21], [22]. However, neglecting the cost of cnot gates may lead to a significant underestimate of the overall cost [23]. Therefore, the number of T gates (T-count), the maximum number of T gates along any circuit path (T-depth), and the number of cnot gates (cnot-count) are the main performance indicators of Clifford+T circuits.
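The three metrics can be made concrete with a small counting routine. The Python sketch below is illustrative only: it assumes a hypothetical circuit representation (a list of (gate name, qubit tuple) pairs) and computes T-depth as the largest number of T or T† gates on any path through the circuit, following the definition above, by letting each cnot merge the running counts of its two qubits. The example circuit is the standard textbook seven-T decomposition of the Toffoli gate, not necessarily the circuit of Fig. 1; it has T-count 7 and cnot-count 6, but its T-depth is 4, one more than that of the T-depth-3 circuits of [22], [26], and [27].

```python
T_GATES = {"t", "tdg"}


def clifford_t_costs(circuit, num_qubits):
    """Return (T-count, T-depth, cnot-count) of a gate list.

    circuit: list of (name, qubits) pairs; "cx" qubits are (control, target).
    """
    t_count = sum(1 for name, _ in circuit if name in T_GATES)
    cnot_count = sum(1 for name, _ in circuit if name == "cx")
    # depth[q] = largest number of T gates on any path ending at qubit q so far
    depth = [0] * num_qubits
    for name, qubits in circuit:
        if name == "cx":
            # a cnot joins the paths of its two qubits
            joined = max(depth[q] for q in qubits)
            for q in qubits:
                depth[q] = joined
        elif name in T_GATES:
            depth[qubits[0]] += 1
        # other single-qubit Clifford gates leave the T-depth unchanged
    return t_count, max(depth), cnot_count


# Textbook seven-T Toffoli decomposition; qubits 0 and 1 are controls, 2 is the target.
TOFFOLI = [
    ("h", (2,)), ("cx", (1, 2)), ("tdg", (2,)), ("cx", (0, 2)),
    ("t", (2,)), ("cx", (1, 2)), ("tdg", (2,)), ("cx", (0, 2)),
    ("t", (1,)), ("t", (2,)), ("h", (2,)), ("cx", (0, 1)),
    ("t", (0,)), ("tdg", (1,)), ("cx", (0, 1)),
]

print(clifford_t_costs(TOFFOLI, 3))  # (7, 4, 6)
```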
Multiple control gates, such as the Toffoli, Peres, Fredkin, TR [24], and MCT gates, are the staple of quantum arithmetic circuits [25]. The ancilla-free Clifford+T circuits for the Toffoli gate proposed in [22], [26], and [27] have T-count 7 and T-depth 3. A T-depth-one representation of the Toffoli gate that uses four ancillae is presented in [22], [28]. Jones utilized quantum measurement and an ancilla to implement the Toffoli gate with T-depth 1 [29]. Various decomposition methods of the MCT gate exist [22], [23], [25], [29], [30]. Amy et al. [22] proposed the T-par algorithm, which reduces T-count significantly but does not effectively reduce the cnot-count. Compared with the T-par algorithm, an automated optimization algorithm [23] reduces the cnot-count while achieving the same T-count. Maslov [25] replaced suitable pairs of MCT gates with their relative-phase implementations and obtained a smaller T-count and cnot-count than both the T-par algorithm and the automated optimization algorithm. However, Maslov's approach does not optimize T-depth. The two n-qubit MCT decompositions presented in [29] and [30] achieve the smallest T-count by using n quantum measurements. A few alternative approaches to MCT/Toffoli gate decomposition have also been explored recently. For instance, Philipp et al. [31] explored the mapping of reversible MCT circuits to IBM quantum computers, and Gokhale et al. [32] used three-level qutrits to optimize quantum circuits.
Quantum algorithms, such as those for quantum chemistry, may be implemented on noisy intermediate-scale quantum (NISQ) devices [39]. Excessive ancillary qubits make the circuit width (the number of qubits in a circuit) too large, which hinders running such algorithms on NISQ devices. Therefore, in this work, we consider T-count, T-depth, circuit width, and cnot-count as the main optimization goals for comparators based on Clifford+T circuits. First, we present four 3-bit Hermitian gates named LI gates. Next, we propose two MCT decomposition methods based on LI gates with different optimization goals. Then, we design the equality, less-than, and full comparators using the proposed Hermitian gates and (n + 1)-b MCT gates. Finally, we present optimized Clifford+T circuits for the three comparators. The contributions of this article are summarized as follows.
1) This article presents two decompositions of the MCT gate with lower T-depth and T-count than the best-known optimization methods without quantum measurements. A significant advantage of the proposed MCT gates is that they are convenient for implementing full comparators (see Table 8).
2) This article proposes equality, less-than, and full comparators with the minimum circuit width.
3) Considering T-count, T-depth, circuit width, and cnot-count, this article gives optimized Clifford+T circuits for the proposed comparators.
The rest of this article is organized as follows. Section II introduces background knowledge. Section III presents the four LI gates. In Section IV, we implement and optimize the MCT gate. Section V describes the design of the three comparators. Comparison and analysis are given in Section VI, and conclusions are drawn in Section VII.

II. BACKGROUND
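The Pauli matrices I, X, and Z, the Hadamard gate H, the phase gate S, the cnot gate, and the non-Clifford gate T are elements of the Clifford+T set, where I, X, Z, H, S, and T are defined by

$$
I=\begin{pmatrix}1&0\\0&1\end{pmatrix},\quad
X=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad
Z=\begin{pmatrix}1&0\\0&-1\end{pmatrix},\quad
H=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix},\quad
S=\begin{pmatrix}1&0\\0&i\end{pmatrix},\quad
T=\begin{pmatrix}1&0\\0&e^{i\pi/4}\end{pmatrix}.
$$

The gate set {H, S, S†, cnot, T, T†} is universal for quantum computation [26]. Clifford+T circuits for the Toffoli, Peres, and inverse-Peres gates are presented in Fig. 1 [26], [27]. An MCT gate with n qubits can be implemented by 4(n − 3) Toffoli gates [16]; a 6-qubit example is given in Fig. 2.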
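These definitions can be checked numerically. The following pure-Python sketch (the helper names and the convention that qubit 0 is the most significant bit are assumptions made for illustration) encodes the matrices above, confirms that T² = S, and multiplies out the textbook seven-T Toffoli decomposition to confirm that it equals the Toffoli unitary exactly, with no global phase. This decomposition may differ from the circuit in Fig. 1.

```python
import cmath
import math

R = 1 / math.sqrt(2)
W = cmath.exp(1j * math.pi / 4)  # e^{i*pi/4}

# 2x2 matrices of the single-qubit gates used below
GATES = {
    "h": [[R, R], [R, -R]],
    "s": [[1, 0], [0, 1j]],
    "t": [[1, 0], [0, W]],
    "tdg": [[1, 0], [0, W.conjugate()]],
}

# Textbook seven-T Toffoli decomposition: qubits 0, 1 are controls, qubit 2 is the target
TOFFOLI = [
    ("h", (2,)), ("cx", (1, 2)), ("tdg", (2,)), ("cx", (0, 2)),
    ("t", (2,)), ("cx", (1, 2)), ("tdg", (2,)), ("cx", (0, 2)),
    ("t", (1,)), ("t", (2,)), ("h", (2,)), ("cx", (0, 1)),
    ("t", (0,)), ("tdg", (1,)), ("cx", (0, 1)),
]


def apply(state, name, qubits, n):
    """Apply one gate to an n-qubit state vector (qubit 0 = most significant bit)."""
    new = [0j] * len(state)
    for idx, amp in enumerate(state):
        if amp == 0:
            continue
        if name == "cx":
            c, t = qubits
            ctrl_bit = (idx >> (n - 1 - c)) & 1
            new[idx ^ (ctrl_bit << (n - 1 - t))] += amp
        else:
            shift = n - 1 - qubits[0]
            bit = (idx >> shift) & 1
            m = GATES[name]
            for out in (0, 1):
                new[(idx & ~(1 << shift)) | (out << shift)] += m[out][bit] * amp
    return new


def unitary(circuit, n):
    """Matrix whose column j is the circuit applied to basis state |j>."""
    cols = []
    for j in range(2 ** n):
        state = [0j] * (2 ** n)
        state[j] = 1
        for name, qubits in circuit:
            state = apply(state, name, qubits, n)
        cols.append(state)
    return [[cols[j][i] for j in range(2 ** n)] for i in range(2 ** n)]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def toffoli_perm(j):
    # Toffoli swaps |110> and |111> (indices 6 and 7)
    return j ^ 1 if j >= 6 else j


U = unitary(TOFFOLI, 3)
TARGET = [[1 if i == toffoli_perm(j) else 0 for j in range(8)] for i in range(8)]
print(close(U, TARGET))                                # True
print(close(mul(GATES["t"], GATES["t"]), GATES["s"]))  # True: T^2 = S
```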

III. 3-BIT HERMITIAN GATES
The TR2 gate shown in Fig. 3 was presented in our previous work [27], where it was considered a variant of the TR gate. We find that its adjoint is equal to itself; that is, the inverse of the TR2 gate is neither the Peres gate nor a variant of the Peres gate. The TR2 gate is therefore fundamentally different from the TR gate, which is the inverse of the Peres gate, and should not be seen as a variant of it. Based on this analysis, we revise the symbol and name of the TR2 gate in Fig. 4(b). The other three Hermitian gates are designed in Fig. 4(a), (c), and (d). Equation (1) and Fig. 5 show that the proposed 3-b gates are Hermitian.
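For reversible gates without relative phases, the Hermitian property reduces to a truth-table test: such a gate is a permutation matrix P, and since P† = Pᵀ = P⁻¹, it is Hermitian exactly when applying it twice gives the identity. The sketch below applies this test to the Toffoli gate, the Peres gate (a, b, c) → (a, a ⊕ b, c ⊕ ab), and the TR gate taken here as (a, b, c) → (a, a ⊕ b, c ⊕ a·b̄), which is the inverse of the Peres gate as stated above. It covers only the classical behavior of these three gates; a Clifford+T circuit that introduces relative phases would need a full unitary check instead. The function names are illustrative.

```python
from itertools import product


def toffoli(a, b, c):
    return a, b, c ^ (a & b)


def peres(a, b, c):
    return a, a ^ b, c ^ (a & b)


def tr(a, b, c):
    # target flipped by a AND (NOT b); this undoes the Peres gate
    return a, a ^ b, c ^ (a & (1 - b))


def is_hermitian_permutation(gate):
    """A permutation matrix P is Hermitian iff P * P = I, i.e. the gate is an involution."""
    return all(gate(*gate(*x)) == x for x in product((0, 1), repeat=3))


def is_inverse(f, g):
    """True if applying f and then g restores every 3-bit input."""
    return all(g(*f(*x)) == x for x in product((0, 1), repeat=3))


print(is_hermitian_permutation(toffoli))  # True
print(is_hermitian_permutation(peres))    # False
print(is_inverse(peres, tr))              # True
```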
The proposed four 3-b gates in Fig. 4 can implement where the symbols "." and "⊕" are multiplication and exclusive-or operators, respectively. A and B are equal to Analyzing the circuits in Figs. 1 and 4, we give the parameters of the Toffoli, Peres, TR, and LI gates in Table 1. Compared with the Toffoli gate, each proposed LI gate uses one fewer cnot gate, which benefits some applications. For instance, the two 2-b less-than comparators realized with Toffoli gates and with LI2 gates in Fig. 6 have cnot-counts of 28 and 22, respectively; that is, the latter uses six fewer cnot gates than the former.
On the other hand, since the TR gate is not Hermitian, its inverse (i.e., the Peres gate) must be added to implement a less-than comparator [27]. The proposed LI gates therefore provide an alternative way to design comparators. For instance, we realize a less-than comparator in Fig. 14 using only LI2 and cnot gates. Furthermore, owing to the symmetry of Hermitian gates, the proposed LI gates are convenient for implementing full comparators (see Fig. 18 and Table 8).

IV. IMPLEMENTATION AND OPTIMIZATION OF THE MCT GATE

A. IMPLEMENTATION OF THE MCT GATE
An MCT gate is labeled TOF^n_Not in Fig. 7(a). Inspired by the method in Fig. 2 [16], we present the implementation of TOF^n_Not in Fig. 7(b) with 4(n − 3) LI gates, where n − 3 qubits in an unknown state |xx...x⟩ serve as ancillae. For clarity, the implementation of TOF^6_Not is given in Fig. 8 as an example.
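Because Toffoli and NOT gates only permute basis states, the Barenco-style construction behind Fig. 7(b) can be checked classically. The sketch below makes several assumptions: ordinary Toffoli gates stand in for LI gates (the LI-gate circuit may order and orient gates differently), the qubit layout is hypothetical, and TOF^n_Not is read as an MCT gate with negated controls, consistent with building it from a TOF^n gate and 2n − 2 NOT gates as noted in Section VI-A. It builds the n − 1 control gate from 4(n − 3) Toffoli gates and n − 3 ancillae in arbitrary states, then exhaustively checks that the target flips exactly when all controls are 0 and that the ancillae return to their initial values.

```python
from itertools import product


def mct_with_dirty_ancillae(ctrl, anc, tgt):
    """Toffoli gates (c1, c2, target) for an MCT gate with m = len(ctrl) >= 3 controls
    and m - 2 ancillae in arbitrary states; returns 4(m - 2) gates."""
    m = len(ctrl)
    assert m >= 3 and len(anc) == m - 2
    top = (ctrl[m - 1], anc[m - 3], tgt)
    mid = [(ctrl[k], anc[k - 2], anc[k - 1]) for k in range(m - 2, 1, -1)]
    base = (ctrl[0], ctrl[1], anc[0])
    compute = [top] + mid + [base] + mid[::-1] + [top]
    restore = mid + [base] + mid[::-1]  # returns the ancillae to their initial values
    return compute + restore


def tof_not(n):
    """Hypothetical layout for TOF^n_Not: bits 0..n-2 are the negated controls,
    bit n-1 is the target, and bits n..2n-4 are the n - 3 ancillae."""
    ctrl, tgt = list(range(n - 1)), n - 1
    anc = list(range(n, 2 * n - 3))
    return ctrl, mct_with_dirty_ancillae(ctrl, anc, tgt)


def run(bits, nots, toffolis):
    bits = list(bits)
    for q in nots:                 # NOT gates before the MCT gate
        bits[q] ^= 1
    for a, b, t in toffolis:
        bits[t] ^= bits[a] & bits[b]
    for q in nots:                 # NOT gates after the MCT gate
        bits[q] ^= 1
    return bits


def check(n):
    nots, toffolis = tof_not(n)
    assert len(toffolis) == 4 * (n - 3) and 2 * len(nots) == 2 * n - 2
    for bits in product((0, 1), repeat=2 * n - 3):
        expected = list(bits)
        expected[n - 1] ^= int(not any(bits[: n - 1]))  # flip iff all controls are 0
        if run(bits, nots, toffolis) != expected:
            return False
    return True


print(all(check(n) for n in range(4, 8)))  # True
```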

B. OPTIMIZATION OF THE MCT GATE
In this section, two optimization methods of the MCT gate are proposed. The first method preferentially reduces T-count and T-depth. The primary purpose of the second method is to optimize T-count and cnot-count. For clarity, optimization rules 1-9 for the first method are presented in Appendix A-A. The optimization results of these rules are summarized in Table 2.
The optimized circuits for TOF^4_Not and TOF^5_Not obtained with the above rules are presented in Fig. 9. Both TOF^4_Not and TOF^5_Not employ one ancillary qubit |x⟩. TOF^4_Not has T-count 16, T-depth 8, and cnot-count 17, and TOF^5_Not can be realized with T-count 24, T-depth 12, and cnot-count 28.

For an even number n with n > 5, TOF^n_Not is optimized with one rule 6, (n/2 − 3) applications of rule 7, and one rule 9. Therefore, the number of ancillary qubits is 1 + (n/2 − 3) + 1 = n/2 − 1. For instance, the optimized circuit for TOF^6_Not in Fig. 10 uses one rule 6 and one rule 9 with two ancillary qubits. From Table 2, the resulting T-count, T-depth, and cnot-count are 8n − 16, 4n − 8, and 12n − 32, respectively. Meanwhile, TOF^n_Not with an odd number n (n ≥ 7) can be optimized with (n − 7)/2 applications of rule 7, one rule 8, and one rule 9, so the number of its ancillary qubits is (n − 7)/2 + 1 + 1 = (n − 3)/2.

Optimization rules n1-n9 for the second method are presented in Appendix A-B, and their optimization results are summarized in Table 3. Using rule ni instead of rule i (i ∈ {5, 6, 7, 8, 9}), we obtain alternative optimized circuits for TOF^n_Not. For instance, the alternative optimized circuits for TOF^4_Not and TOF^5_Not are presented in Figs. 11 and 12. Tables 2 and 3 show that the T-count of TOF^n_Not (n > 5) is 8n − 16 for both methods, and for the second method the T-depth is also 8n − 16. The cnot-count of TOF^n_Not for an even number n is 16 + 16(n/2 − 3) + 12 = 8n − 20, and the cnot-count for an odd number n, calculated from Table 3, is likewise 8n − 20.

For convenience, the optimized circuits for TOF^n_Not obtained with the first and second methods are labeled ¹TOF^n_Not and ²TOF^n_Not, respectively. In summary, ¹TOF^n_Not is implemented with T-count 8n − 16, T-depth 4n − 8, and cnot-count 12n − 32, and ²TOF^n_Not with T-count 8n − 16, T-depth 8n − 16, and cnot-count 8n − 20.
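The closed forms above can be tabulated side by side. The sketch below is a plain transcription of the stated formulas (valid for n > 5), together with the ancilla count of the first method, where n/2 − 1 for even n and (n − 7)/2 + 2 = (n − 3)/2 for odd n both equal ⌊n/2⌋ − 1, and, for reference, the T-count of the unoptimized expansion into 4(n − 3) Toffoli gates of T-count 7 each. The function names are illustrative.

```python
def mct_costs(n, method):
    """(T-count, T-depth, cnot-count) of the optimized TOF^n_Not for n > 5,
    transcribed from the closed forms for the first and second methods."""
    if n <= 5:
        raise ValueError("closed forms are stated for n > 5")
    if method == 1:
        return 8 * n - 16, 4 * n - 8, 12 * n - 32
    if method == 2:
        return 8 * n - 16, 8 * n - 16, 8 * n - 20
    raise ValueError("method must be 1 or 2")


def ancillae_first_method(n):
    # n/2 - 1 for even n and (n - 3)/2 for odd n; both equal n // 2 - 1
    return n // 2 - 1


def naive_t_count(n):
    # 4(n - 3) Toffoli gates at T-count 7 each, without any optimization
    return 7 * 4 * (n - 3)


for n in range(6, 12):
    print(n, mct_costs(n, 1), mct_costs(n, 2), ancillae_first_method(n), naive_t_count(n))
```

The comparison makes the trade-off explicit: both methods share the T-count, while the first halves the T-depth and the second lowers the cnot-count.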

V. QUANTUM COMPARATORS
In this section, we design the equality, less-than, and full comparators for the following relationships between two integers a and b: a = b or a ≠ b; a < b or a ≥ b; and a > b, a = b, or a < b.

A. EQUALITY COMPARATOR
We select the MCT gate TOF^{n+1}_Not to realize an n-bit equality comparator; a 5-b example is presented in Fig. 13. The comparison result is |1⟩ for a = b. Fig. 13 shows that the n-bit equality comparator consists of 4(n − 2) LI gates and 2n cnot gates, and that |a_2⟩, |a_3⟩, and |a_4⟩ serve as ancillary qubits for implementing TOF^6_Not. Using the two methods in Section IV-B, we obtain optimized circuits for TOF^6_Not with only two ancillary qubits, |a_3⟩|a_4⟩. Since qubits storing an operand also serve as ancillae, the proposed optimization methods do not increase the width of the equality comparator. Thus, the n-bit equality comparator based on 1
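The point that operand qubits can double as ancillae can be illustrated classically. In the following sketch, which uses a hypothetical qubit layout and ordinary Toffoli gates in place of LI gates (so it is not a transcription of Fig. 13), n cnot gates overwrite b_i with a_i ⊕ b_i, an MCT gate with negated controls on the b register flips the result bit exactly when every b_i is 0, and n further cnot gates restore b. The a qubits are not controls of the MCT gate, so n − 2 of them act as the unknown-state ancillae |xx...x⟩ and come back unchanged. The circuit uses 2n cnot gates, 4(n − 2) Toffoli gates, 2n NOT gates, and width 2n + 1.

```python
from itertools import product


def mct_toffolis(ctrl, anc, tgt):
    """4(m - 2) Toffoli gates for m = len(ctrl) >= 3 controls and m - 2 ancillae in unknown states."""
    m = len(ctrl)
    top = (ctrl[m - 1], anc[m - 3], tgt)
    mid = [(ctrl[k], anc[k - 2], anc[k - 1]) for k in range(m - 2, 1, -1)]
    base = (ctrl[0], ctrl[1], anc[0])
    half = mid + [base] + mid[::-1]
    return [top] + half + [top] + half


def equality_comparator(a, b):
    """Run the sketched n-bit comparator (n >= 3) on bit tuples a and b.
    Hypothetical layout: a -> bits 0..n-1, b -> bits n..2n-1, result c -> bit 2n."""
    n = len(a)
    bits = list(a) + list(b) + [0]
    A, B, C = list(range(n)), list(range(n, 2 * n)), 2 * n
    for i in range(n):        # n cnots: b_i <- a_i XOR b_i, which is 0 iff a_i == b_i
        bits[B[i]] ^= bits[A[i]]
    for q in B:               # NOTs turn the MCT controls into negated controls
        bits[q] ^= 1
    for x, y, t in mct_toffolis(B, A[: n - 2], C):  # a_0..a_{n-3} are the unknown-state ancillae
        bits[t] ^= bits[x] & bits[y]
    for q in B:
        bits[q] ^= 1
    for i in range(n):        # n more cnots restore b
        bits[B[i]] ^= bits[A[i]]
    return bits[:n], bits[n:2 * n], bits[C]


def check(n):
    for a in product((0, 1), repeat=n):
        for b in product((0, 1), repeat=n):
            ra, rb, c = equality_comparator(a, b)
            if ra != list(a) or rb != list(b) or c != int(a == b):
                return False
    return True


print(all(check(n) for n in (3, 4, 5)))  # True
```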

B. LESS-THAN COMPARATOR
Modifying the comparator proposed in [27], we design the less-than comparator in Fig. 14(a) using only LI2 and cnot gates; a 5-b example is presented in Fig. 14(b). Here, |c⟩ stores the comparison result of the two numbers: if b ≥ a, |c⟩ = |0⟩; otherwise, |c⟩ = |1⟩. Fig. 14 shows that the less-than comparator consists of 2n − 1 LI2 gates and 4(n − 1) cnot gates.
In Appendix B-A, we give rules 10 and 11 with primary optimization goals {T-count, T-depth, width} and rules n10 and n11 with primary optimization goals {T-count, cnot-count, width}. Their optimization results are summarized in Table 4; for clarity, rules 1 and n1 are also included there. Table 4 shows that rule 11 reduces the T-depth of two applications of rule 1 (Appendix A-A) from 8 to 6, so each additional rule 1 applied in the manner of rule 11 increases the T-depth by only 2. The n-bit less-than comparator (n ≥ 3) with primary optimization goals {T-count, T-depth} is optimized by one rule 10 and (n − 2) applications of rule 1; for instance, the optimized circuit of the 3-b less-than comparator is given in Fig. 15. Its T-count, T-depth, and cnot-count are 12 + 8(n − 2) = 8n − 4, 4 + 2(n − 2) = 2n, and (4n − 5) + 12 + 8(n − 2) = 12n − 9, respectively.
Since the rules in Table 4 do not use ancillary qubits, both optimized circuits of the n-bit less-than comparator keep the width 2n + 1.
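Two small checks accompany this subsection. First, the sketch recomputes the stated cost totals term by term and confirms that they simplify to 8n − 4, 2n, and 12n − 9. Second, as background only, it models the Boolean function a ripple less-than comparator must produce (|c⟩ = |1⟩ exactly when b < a) with the textbook carry recurrence c_{i+1} = MAJ(a_i, b̄_i, c_i), c_0 = 0, whose final carry is the carry out of a + (2ⁿ − 1 − b). This is a standard reference model, not the gate-level behavior of the LI2-based circuit in Fig. 14; the function names are illustrative.

```python
from itertools import product


def less_than_costs(n):
    """Cost totals stated for the n-bit less-than comparator (n >= 3):
    one rule 10 plus (n - 2) applications of rule 1."""
    t_count = 12 + 8 * (n - 2)
    t_depth = 4 + 2 * (n - 2)
    cnot_count = (4 * n - 5) + 12 + 8 * (n - 2)
    return t_count, t_depth, cnot_count


def maj(x, y, z):
    """Majority of three bits."""
    return (x & y) ^ (x & z) ^ (y & z)


def b_less_than_a(a_bits, b_bits):
    """Reference model: 1 iff b < a; bit lists are little-endian.
    The final carry of a + (2^n - 1 - b) is 1 exactly when a > b."""
    c = 0
    for a_i, b_i in zip(a_bits, b_bits):
        c = maj(a_i, 1 - b_i, c)
    return c


def to_bits(v, n):
    return [(v >> i) & 1 for i in range(n)]


n = 4
ok = all(
    b_less_than_a(to_bits(a, n), to_bits(b, n)) == int(b < a)
    for a, b in product(range(2 ** n), repeat=2)
)
print(ok)  # True
print(all(less_than_costs(k) == (8 * k - 4, 2 * k, 12 * k - 9) for k in range(3, 50)))  # True
```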

C. FULL COMPARATOR
The equivalent circuits of the equality comparator are presented in Fig. 17. Since the LI2 gate is Hermitian, the circuit in the dashed box in Fig. 17 does not change the integers a and b. Therefore, the circuit at the top right also implements the equality comparator. Applying the LI2 gate to the state |a_1⟩|b_0⟩|a_0⟩, we obtain |a_1 ⊕ b_0a_0⟩|b_0⟩|a_0 ⊕ b_0⟩. Thus, by eliminating two cnot gates, we obtain another equivalent circuit of the equality comparator, shown at the bottom right of Fig. 17.
Substituting the implementation of TOF^6_Not into Fig. 17, we obtain the circuit in dashed box 2 of Fig. 18, which implements the equality comparator. Comparing dashed box 1 in Fig. 18 with the circuit in Fig. 14(b), we infer that the state |c_1⟩ stores the result of the less-than comparator. That is, Fig. 18(a) implements a 5-b full comparator. We then propose an n-bit full comparator in Fig. 18(b). The results are stored in |c_1⟩|c_0⟩: |c_1⟩|c_0⟩ = |0⟩|0⟩ for b > a, |c_1⟩|c_0⟩ = |0⟩|1⟩ for b = a, and |c_1⟩|c_0⟩ = |1⟩|0⟩ for b < a.
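The output encoding can be pinned down with a short reference model. The sketch below is a specification, not a circuit: it computes c_1 as the less-than result (1 exactly when b < a, as for |c⟩ in Fig. 14) and c_0 as the equality result, then checks that every pair (a, b) maps to one of the three stated codes, so the code |1⟩|1⟩ never occurs. The function names are hypothetical.

```python
from itertools import product


def full_comparator_spec(a, b):
    """Reference outputs (c1, c0): (0, 0) for b > a, (0, 1) for b = a, (1, 0) for b < a."""
    c1 = int(b < a)   # less-than part
    c0 = int(a == b)  # equality part
    return c1, c0


def decode(c1, c0):
    return {(0, 0): "b > a", (0, 1): "b = a", (1, 0): "b < a"}[(c1, c0)]


n = 3
for a, b in product(range(2 ** n), repeat=2):
    label = decode(*full_comparator_spec(a, b))
    assert label == ("b > a" if b > a else "b = a" if b == a else "b < a")

print(decode(*full_comparator_spec(5, 2)))  # b < a
```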

TABLE 5. Optimization Results of Rules for the Full Comparator
Fig. 18(b) illustrates that the n-bit full comparator consists of 4(n − 1) LI gates and 4(n − 1) cnot gates.
In Appendix B-B, we present rules 12, 13, and 14 with primary optimization goals {T-count, T-depth}, and rules n12, n13, and n14 with primary optimization goals {T-count, cnot-count}. The optimization results of these rules are summarized in Table 5, where rules 4 and n4 are also included.
We optimize the n-bit full comparator (n ≥ 3) for the primary optimization goals {T-count, T-depth} using one rule 12, one rule 13, and (n − 3) applications of rule 4. The optimized circuit of the 3-b full comparator, with T-depth 12, is given in Fig. 19(a). From Table 5, each additional rule 4 applied in the manner provided by rule 14 increases the T-depth by 4. Therefore, the n-bit full comparator has a T-depth of 12 + 4(n − 3) = 4n. A 4-b example is presented in Fig. 19(b).

VI. COMPARISON ANALYSIS

A. COMPARISON ANALYSIS OF THE MCT GATE DECOMPOSITIONS
The best-known realizations of the MCT gate TOF^n are given in [22], [23], [25], [29], and [30]. The gate TOF^n_Not can be built from a TOF^n gate and 2n − 2 NOT gates. We summarize the decomposition results of TOF^n_Not for the primary optimization goals {T-count, T-depth} or {T-count, cnot-count} in Table 6 and compare them against the best-known results. Table 6 shows that the MCT gates in [29] and [30] have the best T-count, 4n − 8. However, they are realized with n − 2 quantum measurements, whose cost cannot be directly compared with the T-count. In addition, these two MCT gates require at least n − 2 ancillae in the state |00...0⟩. When TOF^n_Not is used to implement an equality comparator, such |00...0⟩ ancillae increase the circuit width. In contrast, the ancillae |xx...x⟩ can be qubits that store an operand, so they do not increase the circuit width of the equality comparator (see Fig. 13). This is an advantage of TOF^n_Not with the ancillae |xx...x⟩.
Results presented in Table 6 for n = 4, 5, 6, and 11 reveal that the proposed methods and Maslov's approach have an advantage over other works without quantum measurements. The proposed MCT gate for the primary optimization goals {T-count, T-depth}, i.e., ¹TOF^{n+1}_Not, is more suitable for designing an n-bit full comparator than Maslov's work (see Table 8).

B. COMPARISON ANALYSIS OF COMPARATORS
Considering {T-count, T-depth, width} and {T-count, cnot-count, width} as primary optimization goals, we summarize the results of the proposed less-than comparators in Table 7 and compare them with existing works [21], [27], [38]. The proposed less-than comparators keep the minimum width, which is also an advantage of our previous works [27], [38]. Furthermore, the proposed less-than comparators have a lower cnot-count than the comparators presented in [27] and [38]; specifically, compared with [38], the proposed less-than comparator with primary optimization goals {T-count, cnot-count, width} reduces the cnot-count by approximately 28%. Orts's first comparator has a T-count of 4n, a T-depth of 2n, a width of 3n, and an M-count of n, where M-count denotes the number of quantum measurements [21]. Since the proposed comparators have width 2n + 1, they reduce the width by approximately 33% for large n. Orts's first comparator has the best T-count, but it uses n quantum measurements, so it is not directly comparable in terms of T-count. Similarly, Orts's second comparator has the best T-depth, log(n), but the worst width, 6n − 2W(n) − 2 log(n), and the worst M-count, 3n.
Comparing the proposed full comparator with the two n-bit full comparators in [36] and [37], we give the results in Table 8. The two full comparators in [36] and [37] consist of 5n − 3 Toffoli gates. Therefore, using the Clifford+T circuit for the Toffoli gate in Fig. 1, both have T-count 7(5n − 3) = 35n − 21. Although the detailed circuit of the full comparator is not given in [36], by observing its 3-b full comparator we conclude that the T-depth of the n-bit version should be greater than 9n + 3. The results in Table 8 show that the proposed full comparator for primary optimization goals {T-count, T-depth, width} achieves approximately 55% T-depth and 66% T-count reductions relative to the existing works in [36] and [37]. Furthermore, compared with [36] and [37], the proposed full comparator for primary optimization goals {T-count, cnot-count, width} reduces the cnot-count by approximately 60%.

IEEE Transactions on Quantum Engineering
A straightforward way to implement a full comparator is to combine a less-than comparator with an MCT gate; for example, a full comparator can be realized by the proposed less-than comparator plus TOF^{n+1}_Not from [25]. A 5-b example is presented in Fig. 21. Table 8 reveals that the proposed TOF^n_Not is convenient for building a full comparator: we only add four LI2 gates and some cnot gates to TOF^{n+1}_Not to create an n-bit full comparator. As a result, the proposed full comparators are superior to the earlier works in Table 8.
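Some of the approximate percentages quoted in this section follow directly from closed forms stated in the text. The sketch below evaluates them with exact fractions: the width reduction of 2n + 1 versus 3n qubits relative to Orts's first comparator (which tends to 1/3), the lower bound on the T-depth reduction of 4n versus more than 9n + 3 for the full comparator of [36] (which tends to 5/9, about 55%), and the T-count 7(5n − 3) = 35n − 21 of the Toffoli-based full comparators. Only quantities with closed forms given in this section are evaluated; the function names are illustrative.

```python
from fractions import Fraction


def width_reduction(n):
    # proposed width 2n + 1 versus 3n for Orts's first comparator
    return 1 - Fraction(2 * n + 1, 3 * n)


def t_depth_reduction_bound(n):
    # proposed full comparator T-depth 4n versus a T-depth greater than 9n + 3 in [36];
    # the true reduction is at least this value
    return 1 - Fraction(4 * n, 9 * n + 3)


def toffoli_based_t_count(n):
    # 5n - 3 Toffoli gates with T-count 7 each
    return 7 * (5 * n - 3)


for n in (4, 8, 16, 64):
    print(n, float(width_reduction(n)), float(t_depth_reduction_bound(n)), toffoli_based_t_count(n))
```

Both reductions approach their quoted values only as n grows; for small n the width saving is smaller and the T-depth bound is larger.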

VII. CONCLUSION
This article has proposed four 3-bit Hermitian gates, labeled LI gates, whose implementation circuits use fewer cnot gates than that of the Toffoli gate. The equality, less-than, and full comparators designed with LI gates have the minimum circuit width. Two approaches, with primary optimization goals {T-count, T-depth} and {T-count, cnot-count}, have been proposed to optimize MCT gates. Furthermore, we have shown that the method for {T-count, T-depth} reduces T-depth by 50% compared with the best-known implementation of the MCT gate without quantum measurements. Selecting {T-count, T-depth, width} and {T-count, cnot-count, width} as primary optimization goals, we have designed Clifford+T circuits of the proposed comparators using the optimized MCT circuits. Comparison results show that the proposed comparators have an overall advantage over the known comparators without quantum measurements in T-count, T-depth, cnot-count, and circuit width.
Several algorithms have been proposed for NISQ devices [40], [41]. For instance, a quantum convolutional neural network (QCNN) on NISQ devices is implemented with multiple control gates [41]. Future work will extend and apply the proposed MCT gate decomposition methods and comparators to realize and optimize QCNNs on NISQ devices.

APPENDIX A TWO OPTIMIZATION METHODS OF THE MCT GATE
Because T and cnot gates are more costly to implement than the other gates in the Clifford+T set [20], [21], [22], two optimization methods of the MCT gate are proposed for different primary optimization goals. The rules of the first optimization method are built for the primary optimization goals {T-count, T-depth}, and the second optimization method adopts the primary optimization goals {T-count, cnot-count}.

A. RULES OF THE FIRST OPTIMIZATION METHOD FOR THE MCT GATE
The optimizing rules for pairs of LI gates are presented in Fig. 22, where U, V, and W are combinations of LI gates and Clifford gates. Eliminating the gates in the dashed boxes in Fig. 22(a), we obtain rules 1 and 2, respectively. When the unitary gate V is the inverse of U, i.e., V = U^{−1}, rule 4 can be simplified into rule 5 in Fig. 23(a). Then, combining rules 2, 3, and 4, we present the other rules for TOF^n_Not in Fig. 23.

B. RULES OF THE SECOND OPTIMIZATION METHOD FOR THE MCT GATE
The primary purpose of the MCT decomposition is to optimize T-count and cnot-count. Therefore, the optimization

APPENDIX B OPTIMIZATION RULES FOR COMPARATORS
In this section, the optimization rules for comparators are given by primary optimization goals {T-count, T-depth, width} and {T-count, cnot-count, width}.

A. OPTIMIZATION RULES FOR THE LESS-THAN COMPARATOR
Using rule 1 in Fig. 22 and the primary optimization goals {T-count, T-depth, width}, we provide optimization rules 10 and 11 for the less-than comparator in Fig. 26. When the primary optimization goals {T-count, cnot-count, width} are adopted, rules 10 and 11 are modified into the corresponding rules n10 and n11 in Fig. 27.

B. OPTIMIZATION RULES FOR THE FULL COMPARATOR
Using rules in Fig. 22, we propose rules 12, 13, and 14 for primary optimization goals {T-count, T-depth, width} in Fig. 28. The corresponding rules n12, n13, and n14 for primary optimization goals {T-count, cnot-count, width} are presented in Fig. 29.