Emulation of Quantum Algorithms Using CMOS Analog Circuits

Quantum computers are regarded as the future of computing, as they are believed to be capable of solving extremely complex problems that are intractable on conventional digital computers. However, near-term quantum computers are prone to a plethora of noise sources that are difficult to mitigate, possibly limiting their scalability and precluding us from running any useful algorithms. Quantum emulation is an alternative approach that uses classical analog hardware to emulate the properties of superposition and entanglement, thereby mimicking quantum parallelism to attain similar speeds. By contrast, classical digital hardware, such as field-programmable gate arrays (FPGAs), is less efficient at emulating a quantum computer, as it does not take advantage of the fundamentally analog nature of quantum states. Consequently, the digital approach adds an inherent hardware overhead that also prevents scaling. In this work, an energy-efficient quantum emulator based on analog circuits realized in UMC 180-nm CMOS technology is proposed along with the design methodologies for a scalable computing architecture. A sixfold improvement in power consumption was observed over the FPGA-based approach for a ten-qubit emulation of Grover's search algorithm (GSA). The proposed emulator is also about 400 times faster than a Ryzen 5600x six-core processor performing a simulation of a six-qubit Grover's search algorithm.


I. INTRODUCTION
Quantum computers provide a new computational paradigm that is fundamentally different from classical computers [1]. This revolutionary idea reconciles the theory of computation with quantum physics and has shown great promise in solving a broad variety of problems that are not believed to be efficiently solvable on traditional digital computers [3]. Following Moore's law, digital computers are getting increasingly dense and are steadily approaching the quantum domain. This makes quantum computers an attractive candidate for replacing classical digital computers, as they fundamentally exploit quantum phenomena. Google has already claimed to have shown quantum supremacy in its seminal paper, as have others [4], [5], [6].
There are various ways to realize a quantum computer, such as using superconducting circuits [2], trapped ions [7], photons [8], or neutral atoms [9]. In all these approaches, maintaining coherence between the states is a difficult task, as the slightest interaction with the surroundings can cause the system to lose its coherence and perform poorly. At present, we are far from implementing quantum algorithms fault tolerantly and on a useful scale. Quantum emulation offers an alternative approach that can be of practical utility. Unlike simulations, which perform numerical computations on classical digital processors, emulation exploits the natural parallelism of a quantum computer. However, due to their classical nature, emulators require resources that scale exponentially with the number of quantum bits (qubits). This may be seen as a consequence of the Strong Church-Turing hypothesis, which states that any realistic model of computation can be efficiently simulated on a Turing machine [10]. (Quantum computers are believed to violate this hypothesis.) This establishes a tradeoff between time complexity and hardware resources, which can be exploited to build faster emulators at the expense of larger circuits. Consequently, numerous field-programmable gate array (FPGA)-based quantum emulators have been proposed in recent years owing to their huge parallelization capabilities. Khalid et al. [17] studied an efficient implementation of quantum gates in Very High-Speed Integrated Circuit Hardware Description Language (VHDL), but it lacked the programmability of gates after synthesis. Lee et al. [19] proposed a combination of parallel and serial processing on an FPGA for efficient computation, but it was not focused on emulating the full parallelization of quantum systems. Similarly, in [21], efficient implementation of only one- and two-qubit gates was considered. Aminian et al. [18] proposed universality by efficiently implementing the necessary quantum gates, but this approach also lacks the programmability of gates after synthesis. Pilch and Długopolski [20] developed a programmable set of gate operations, emulating full parallelization of quantum systems. However, they could only emulate a two-qubit Deutsch-Jozsa algorithm [11] due to the higher hardware complexity of their approach. To the best of the authors' knowledge, this is the largest FPGA-based emulation that is programmable after synthesis. Mahmud and El-Araby [34] proposed a scalable architecture using pipelining for a four-qubit emulation of Grover's search algorithm (GSA) and the quantum Fourier transform (QFT). Although Kish proposed an emulator using analog circuits, no actual implementation of such a device was presented [22]. In [12] and [13], a theoretical framework for analog-signal-based quantum emulation was proposed with a numerical simulation of the Deutsch-Jozsa algorithm [11], and Cour et al. [14] proposed techniques for improving hardware resource utilization but without a hardware demonstration.
Existing works on FPGA-based quantum emulation can be broadly categorized into the following two types: 1) efficient hardware implementation of a preselected set of gates or algorithms, losing either full parallelization or programmability of gates after synthesis; 2) fully parallelized implementation reflecting the physical behavior of quantum computers but losing efficiency and, in turn, scalability.
There also exist numerous works on optimizing the HDL packages and software for efficient integration with the hardware, which is out of the scope of this work. There are two major challenges faced by an FPGA-based quantum emulator. The first and foremost challenge is that FPGAs are digital processing blocks, whereas quantum systems are analog in nature. As there is no provision for representing real numbers on FPGAs natively, they use either fixed-point or floating-point representations, introducing a tradeoff between accuracy and hardware resources. The second major challenge is the efficient use of hardware resources to design a quantum emulator that uses full parallelization.
Our work offers an analog alternative to digital emulation. Analog circuits can truly emulate the physical behavior of a quantum computer, as they can represent real numbers natively. This eliminates the need for designing efficient encoding schemes to represent quantum states in classical form. In addition, with a careful design methodology, they can be fully parallelized and consume fewer hardware resources than FPGAs, which makes them suitable for designing scalable emulators. On the other hand, analog devices are practically limited in resolution and can be more susceptible to noise.
There exist several large-scale emulations of quantum circuits, such as a 32-qubit emulation of GSA [33]. However, the runtime of this system is a staggering $79.2 \times 10^{9}$ s, as given in Table 8. A 32-qubit emulation was also achieved using full-state vector simulation [35]. However, this approach is inexact and does not truly emulate a quantum system, as it incorporates data compression to circumvent the huge memory requirement of the 32-qubit circuit. In contrast, we propose a method that is an exact emulation without compromising on accuracy. A 64-qubit emulation was performed using circuit splitting [36], but the methodology used is circuit-dependent and requires supercomputers with several petabytes of storage for full simulation. Furthermore, their approach becomes exponentially harder to implement for circuits with larger depth, whereas we propose an approach that is not restricted by depth. FPGA-based approaches use enormous hardware resources to emulate quantum circuits, whereas we provide an alternate strategy using CMOS analog circuits that are efficient to implement. While FPGA-based emulations provide convenience, they cannot match the power and compactness of an analog emulator in terms of computational efficiency. Furthermore, this approach could lead to the early integration of quantum emulation algorithms into current CMOS-based system-on-chips (SoCs), similar to the path that led to the development of GPUs from CPUs. Initially, graphic computations were performed on CPUs with specialized hardware for graphic operations, such as matrix multiplications. Over time, this progressed into the development of dedicated GPUs that greatly augmented CPU performance.
Our contributions include the following: 1) representing qubits in analog circuits and developing a general framework for implementing quantum algorithms; 2) addressing various nonidealities that prevent scaling and presenting our mitigation techniques; 3) efficiently implementing a six-qubit GSA using analog circuits in 180-nm CMOS technology.
To the best of the authors' knowledge, six qubits is the largest implementation of a quantum emulator in analog circuits.
The rest of this article is organized as follows. Section II briefly summarizes the necessary quantum computing terminologies and concepts, Section III presents the framework for representing qubits and their operations, Sections IV and V extend this framework for GSA and the QFT, respectively, Section VI discusses the implementation details while addressing scalability issues, Section VII presents numerical simulation results, and Section VIII contains the conclusion and presents future directions and prospects.

II. BACKGROUND
In this section, we introduce a few terms that are extensively used in the following sections.

A. UNIVERSALITY
In classical computers, a NAND gate is considered universal because any function on classical bits can be computed from a composition of NAND gates. A quantum analogue of a NAND gate is a set of all single-qubit gates and CNOT gates, which are collectively known as a universal set of quantum gates [23]. According to the Solovay-Kitaev theorem, any multiqubit gate operation can be efficiently approximated by a composition of CNOT gates and single-qubit gates [24].

B. SIMULATION
Quantum simulation involves numerically representing a multiqubit state and applying the necessary gate operations. For an n-qubit system, there are $2^n$ complex coefficients to store and process, so performing quantum simulations on a digital computer demands exponentially more memory than a quantum computer. These digital simulators do not exploit the parallelism; they simply unroll the parallel quantum transformations and perform them sequentially. Thus, the time complexity is also exponentially higher than that of a quantum computer. Several techniques [25], [26] and software packages [27], [28] exist to efficiently implement quantum algorithms, but the scalability issue is a fundamental limit and cannot be bypassed.
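To make the memory scaling concrete, the following minimal sketch counts the bytes needed to hold a full state vector. The function name and the choice of 16 bytes per amplitude (two 64-bit floats) are assumptions made for illustration, not figures taken from the cited simulators.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store all 2**n complex amplitudes.

    bytes_per_amplitude = 16 assumes one double each for the real and
    imaginary parts (an illustrative choice).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (6, 10, 32):
    print(f"{n:2d} qubits -> {statevector_bytes(n)} bytes")
```

Every additional qubit doubles the storage, which is the exponential cost referred to above.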

C. EMULATION
Unlike digital simulations, emulation on special hardware, such as FPGAs, takes advantage of the inherent parallelization capabilities to perform parallel quantum transformations, ideally completing any gate operation in a single clock tick regardless of the number of qubits. While emulators do achieve the original time complexity of a quantum computer, they still need exponentially more resources than their quantum counterparts. This is because there is no classical equivalent of superposition that allows us to store $2^n$ states in n bits.

III. EMULATION IN ANALOG CIRCUITS
Having summarized the necessary terminologies, in this section, we discuss the mapping of qubits and gate operations onto analog circuits and their implementation.

A. QUBITS IN ANALOG CIRCUITS
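A single qubit can be represented by a vector of two complex numbers, $\alpha$ and $\beta$, as shown in the following:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}. \tag{1}$$

Mapping from qubits to the analog domain is achieved by storing these complex numbers as amplitudes of sinusoidal signals of a fixed frequency $\omega$. For example, a single-qubit mapping to the analog domain is given by

$$|\psi_a\rangle = \begin{bmatrix} \alpha \sin(\omega t) \\ \beta \sin(\omega t) \end{bmatrix} \tag{2}$$

where $\alpha = \alpha_r + j\alpha_i$ and $\beta = \beta_r + j\beta_i$ are complex numbers with real parts $\alpha_r$ and $\beta_r$ and imaginary parts $\alpha_i$ and $\beta_i$.
Similarly, the mapping for a two-qubit state is given by

$$|\psi_a\rangle = \begin{bmatrix} \alpha \sin(\omega t) \\ \beta \sin(\omega t) \\ \gamma \sin(\omega t) \\ \delta \sin(\omega t) \end{bmatrix} \tag{3}$$

where $\gamma$ and $\delta$ are also complex. By extension, an n-qubit system with $2^n$ states is represented by a pair of $2^n$ real sinusoidal signals, one set for the real parts and one for the imaginary parts.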
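The mapping described above can be sketched numerically. In this illustration each complex amplitude becomes two real signals, one carrying the real part and one the imaginary part, all sharing the carrier sin(ωt). The function names, the 1 kHz carrier, and the sampling instant are hypothetical choices, not values from this work.

```python
import math


def analog_signals(amplitudes, omega, t):
    """Sample the real-part and imaginary-part signals a_k * sin(omega * t)."""
    carrier = math.sin(omega * t)
    real = [a.real * carrier for a in amplitudes]
    imag = [a.imag * carrier for a in amplitudes]
    return real, imag


def recover_amplitudes(real, imag, omega, t):
    """Invert the mapping at an instant where the carrier is nonzero."""
    carrier = math.sin(omega * t)
    return [complex(r / carrier, i / carrier) for r, i in zip(real, imag)]


# Single qubit with alpha = 0.6 and beta = 0.8j (|alpha|^2 + |beta|^2 = 1).
psi = [complex(0.6, 0.0), complex(0.0, 0.8)]
omega = 2 * math.pi * 1e3      # assumed 1 kHz carrier
t = 0.25e-3                    # quarter period, where sin(omega * t) is 1
real, imag = analog_signals(psi, omega, t)
print(real, imag)
```

An n-qubit state needs 2 × 2^n such signals, which is why the wire count grows exponentially with the number of qubits.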

B. GATE-LEVEL VERSUS PULSE-LEVEL QUANTUM CIRCUITS
Gate-level and pulse-level quantum circuits are the two most common paradigms in use today for quantum computation. Gate-level quantum circuits are built by composing sequences of quantum gates. These gates, such as the Hadamard gate or the CNOT gate, operate on individual qubits or pairs of qubits and can be used to perform a variety of quantum operations [31]. Pulse-level quantum circuits, on the other hand, use a sequence of precisely timed electromagnetic pulses to manipulate qubits [32]. However, a pulse-level implementation requires a qubit to hold its state until the next pulse, which is harder to implement than a gate-level circuit. For instance, emulating an X gate operation in a gate-level circuit requires an inverting amplifier, whereas emulating the same gate in a pulse-level circuit requires a latch to remember the qubit's state, as well as a pulse detection and classification circuit to distinguish between different gates. In addition, pulse-level circuits can be very sensitive to noise and other environmental factors because they rely on the precise timing and shape of the electromagnetic pulses. Consequently, pulse-level circuits are not suitable for larger circuits, as coordinating the precise timing and shape of pulses across multiple qubits can be challenging, especially when many pulses and qubits are involved. Furthermore, pulse-level circuits are typically designed to implement specific quantum gates or operations and may be less suited than gate-level circuits to implementing general quantum algorithms. They do offer flexibility of a different kind, as the sequence of pulses in a circuit can be adjusted on the fly to implement different quantum operations. However, gate-level circuits are typically more modular because each gate operates on one or a few qubits and can be easily combined with other gates to create more complex circuits. For example, a Hadamard gate or a CNOT gate can be combined with other gates to create more complex operations, such as entangling gates, which can be used to implement quantum algorithms. These gates can then be further combined to create larger circuits, or subcircuits, which can be optimized and tested independently. This modularity allows for easier design and optimization of quantum circuits, which can be a crucial factor in designing CMOS analog circuits. Therefore, we adopt the gate-level paradigm in this article to design quantum circuits.

C. GATE OPERATIONS IN ANALOG CIRCUITS
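All the gates are just matrix operations on the state vectors. Consider a single-qubit Hadamard gate acting on the qubit, defined in the following:

$$H|\psi_a\rangle = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \alpha_r \sin(\omega t) \\ \beta_r \sin(\omega t) \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} (\alpha_r + \beta_r) \sin(\omega t) \\ (\alpha_r - \beta_r) \sin(\omega t) \end{bmatrix} \tag{4}$$

where the subscript a in $|\psi_a\rangle$ indicates that the qubit is in the analog domain. For simplicity, the imaginary components are not shown, as the numerical expressions are the same for both real and imaginary parts. The Hadamard operation on a single qubit is shown in Fig. 1.
Similarly, consider a single-qubit X gate and a Z gate acting on $|\psi_a\rangle$:

$$X|\psi_a\rangle = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha_r \sin(\omega t) \\ \beta_r \sin(\omega t) \end{bmatrix} = \begin{bmatrix} \beta_r \sin(\omega t) \\ \alpha_r \sin(\omega t) \end{bmatrix} \tag{5}$$

$$Z|\psi_a\rangle = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} \alpha_r \sin(\omega t) \\ \beta_r \sin(\omega t) \end{bmatrix} = \begin{bmatrix} \alpha_r \sin(\omega t) \\ -\beta_r \sin(\omega t) \end{bmatrix}. \tag{6}$$

The circuit implementations of the X and Z gates are shown in Figs. 2 and 3, respectively. Now, consider a two-qubit CNOT gate acting on the qubit defined in (3). The matrix operation is shown in the following, and the circuit-level implementation is presented in Fig. 4:

$$\mathrm{CNOT}|\psi_a\rangle = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha_r \sin(\omega t) \\ \beta_r \sin(\omega t) \\ \gamma_r \sin(\omega t) \\ \delta_r \sin(\omega t) \end{bmatrix} = \begin{bmatrix} \alpha_r \sin(\omega t) \\ \beta_r \sin(\omega t) \\ \delta_r \sin(\omega t) \\ \gamma_r \sin(\omega t) \end{bmatrix}. \tag{7}$$

Now, consider a generic single-qubit gate G acting on the qubit defined in (2):

$$G|\psi_a\rangle = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} \alpha \sin(\omega t) \\ \beta \sin(\omega t) \end{bmatrix} = \begin{bmatrix} e \sin(\omega t) \\ f \sin(\omega t) \end{bmatrix} \tag{8}$$

$$e = a\alpha + b\beta, \qquad f = c\alpha + d\beta \tag{9}$$

where a, b, c, d, e, and f are all complex numbers. Here, we used $\alpha$ and $\beta$ instead of $\alpha_r$ and $\beta_r$, as the real and imaginary calculations are not separable. The circuit implementation of e from (9) is shown in Fig. 5. The circuit for f is similar and omitted for simplicity. Now that a CNOT gate and a generic single-qubit gate have been implemented, any multiqubit gate can be implemented, due to the universality of these gates. Thus, any qubit operation or algorithm can be implemented from just adders and multipliers. However, from Fig. 1, we can notice that after the initialization of the state $|\psi_a\rangle$, the Hadamard gate is implemented by just using adders, and, from Figs. 2 and 4, the X and CNOT gates are implemented by swapping wires. Also, from Fig. 3, the Z gate is implemented by using an inverter. These gates do not need any multipliers, which makes them very efficient to implement at a large scale. We will exploit this fact by implementing an algorithm that only uses these gates, namely, GSA.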
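The observation that these gates reduce to additions, wire swaps, and sign inversions can be illustrated with a short sketch acting on lists of real amplitudes (the common sin(ωt) carrier is factored out). The function names are hypothetical, and the CNOT assumes the basis order |00⟩, |01⟩, |10⟩, |11⟩ with the first qubit as control.

```python
import math


def hadamard(a):
    """Sum and difference of the two inputs (adders), with a fixed 1/sqrt(2) scale."""
    s = 1 / math.sqrt(2)
    return [s * (a[0] + a[1]), s * (a[0] - a[1])]


def x_gate(a):
    """Swap the two wires."""
    return [a[1], a[0]]


def z_gate(a):
    """Invert the second wire."""
    return [a[0], -a[1]]


def cnot(a):
    """Swap the last two wires of a two-qubit state."""
    return [a[0], a[1], a[3], a[2]]


print(hadamard([1.0, 0.0]))
print(x_gate([1.0, 0.0]), z_gate([0.6, 0.8]), cnot([1, 2, 3, 4]))
```

None of these functions multiplies two signals together, which mirrors why the corresponding circuits need no analog multipliers.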

IV. GROVER'S SEARCH ALGORITHM
In this section, we map the entire GSA to analog circuits for a six-qubit system. For that, let us define the uniformly superposed state of a six-qubit system as

$$|\psi_u\rangle = H^{\otimes 6}|000000\rangle = \frac{1}{\sqrt{64}} \sum_{x=0}^{63} |x\rangle \tag{10}$$

where all the states have the same amplitude of $1/\sqrt{64}$. We get this state by acting on the $|000000\rangle$ state with Hadamard gates, as shown in the INIT part of Fig. 6. Thus, after initialization, we are left with a uniformly superposed state, which is then used to perform the algorithm.

A. ALGORITHM
The mathematical details of this algorithm are beyond the scope of this work.GSA for a six-qubit system is shown in Fig. 6.Measurement is not shown in the figure for simplicity.
There are a few things to note from Fig. 6. First, the oracle in Fig. 6 is designed to amplify the state $|000000\rangle$. To amplify any other state, a different oracle must be used. A general oracle can be obtained by removing the X gate wherever the corresponding qubit of the target state is $|1\rangle$. For example, to amplify the state $|000001\rangle$, we need to remove the X gate from the first row in the oracle from Fig. 6 and keep the remaining X gates intact. Lastly, as evident from Fig. 6, we need to implement only three types of gates here: six Hadamard gates, six X gates, and a CCCCCZ gate, which is a controlled gate where the control qubits are the first five qubits and the target qubit is the last (right-most) qubit. The matrix forms of these three gates are $H^{\otimes 6}$, $X^{\otimes 6}$, and

$$\mathrm{CCCCCZ} = \begin{bmatrix} I_{32} & 0 \\ 0 & \mathrm{CCCCZ} \end{bmatrix} \tag{11}$$

where $I_{32}$ is the 32 × 32 identity matrix, and CCCCZ can be found using (11) recursively. Given all the matrix forms, we discuss the mapping of these gates into the analog domain in the next section.

B. SIX-QUBIT GATES IN ANALOG CIRCUITS
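A six-qubit system has $2^6 = 64$ basis states and can be represented as

$$|\psi_a\rangle = \begin{bmatrix} a_0 \sin(\omega t) \\ a_1 \sin(\omega t) \\ \vdots \\ a_{63} \sin(\omega t) \end{bmatrix} \tag{13}$$

where $a_0$ to $a_{63}$ are complex with $\sum_{i=0}^{63} |a_i|^2 = 1$. The realization of the six-qubit gates $H^{\otimes 6}$, $X^{\otimes 6}$, and CCCCCZ is discussed as follows.

1) $H^{\otimes 6}$-GATE
The output of the Hadamard gates is

$$H^{\otimes 6}|\psi_a\rangle = \frac{1}{\sqrt{64}} \begin{bmatrix} b_0 \sin(\omega t) \\ b_1 \sin(\omega t) \\ \vdots \\ b_{63} \sin(\omega t) \end{bmatrix} \tag{14}$$

where

$$b_x = \sum_{i=0}^{63} (-1)^{x \cdot i}\, a_i \tag{15}$$

and $x \cdot i$ denotes the bitwise inner product of the binary representations of x and i. Ignoring the overall scale factor, we do not need multipliers for implementing the operation, and inverters may be used for obtaining $-a_i$. With this, the six-qubit Hadamard gate can be implemented by using adders and inverters alone.
To obtain each $b_x$, 63 additions are required, which can be achieved by using an opamp adder with 64 inputs to perform all the additions at once, as shown in Fig. 7.

2) $X^{\otimes 6}$-GATE
In matrix form, the $X^{\otimes 6}$ gate is simply a swap gate that reverses the order of all the states as

$$X^{\otimes 6}|\psi_a\rangle = \begin{bmatrix} a_{63} \sin(\omega t) \\ a_{62} \sin(\omega t) \\ \vdots \\ a_0 \sin(\omega t) \end{bmatrix}. \tag{16}$$

So this requires no multiplications or additions, as simple rewiring can perform the gate operation, as shown in Fig. 8.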

3) CCCCC-Z-GATE
The CCCCC-Z gate operates only on the states for which the first five qubits are $|1\rangle$, which is only the last two out of the 64 states, and of these only the last one changes sign, as shown in the following:

$$\mathrm{CCCCCZ}|\psi_a\rangle = \begin{bmatrix} a_0 \sin(\omega t) \\ \vdots \\ a_{62} \sin(\omega t) \\ -a_{63} \sin(\omega t) \end{bmatrix}. \tag{17}$$

Thus, this gate only needs an inverter to operate, as shown in (17) and Fig. 9. Consequently, these gates make GSA very efficient to implement. In addition, if the $a_i$'s are real in (13), then all the subsequent operations in GSA are also real, as they include only additions, swapping, and negation. The input to GSA, $|000000\rangle$, is a real state with the analog-domain representation

$$|000000\rangle_a = \begin{bmatrix} \sin(\omega t) \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{18}$$

Therefore, the entire Grover's algorithm has only real calculations, which further simplifies its circuit. The entire block diagram of GSA in the analog domain with the operation of each block on the analog signals is shown in Fig. 10. Initialization produces a superposition, the oracle flips the sign of the selected state, and amplification boosts the amplitude of that state. In this case, only one iteration is needed, as it meets the required accuracy for evaluating GSA.
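As a sanity check on the gate descriptions in this section, the sketch below runs initialization, the oracle for |000000⟩, and one amplification stage on 64 real amplitudes using only sums, list reversal, and a single sign flip. It is a numerical illustration under the assumption that the amplification stage is the usual H, X, CCCCCZ, X, H sequence; it is not a model of the actual circuit, and all names are hypothetical.

```python
import math

N_QUBITS = 6
N = 2 ** N_QUBITS


def hadamard_all(a):
    """b_x = sum_i (-1)^(x.i) a_i, scaled by 1/sqrt(N); x.i is the bitwise inner product."""
    scale = 1 / math.sqrt(N)
    return [
        scale * sum(-v if bin(x & i).count("1") % 2 else v for i, v in enumerate(a))
        for x in range(N)
    ]


def x_all(a):
    """X on every qubit: reverse the order of the wires."""
    return a[::-1]


def cccccz(a):
    """Flip the sign of the last amplitude only."""
    return a[:-1] + [-a[-1]]


def flip_first(a):
    """X, CCCCCZ, X in sequence: a sign flip on the first amplitude."""
    return x_all(cccccz(x_all(a)))


state = hadamard_all([1.0] + [0.0] * (N - 1))   # initialization
state = flip_first(state)                        # oracle for |000000>
state = hadamard_all(flip_first(hadamard_all(state)))  # amplification
probs = [v * v for v in state]
print(probs[0], probs[1])
```

After this single iteration the marked state carries a visibly larger squared amplitude than any other state, which is the property the emulator relies on.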

C. GENERALIZATION
So far, the oracle was designed to amplify the state $|000000\rangle$. However, we can make the oracle programmable to amplify any arbitrary state by noticing two properties of the circuit in Fig. 10. First, as the oracle consists of an $X^{\otimes 6}$ gate followed by a CCCCCZ gate and another $X^{\otimes 6}$ gate, the combined operation is

$$X^{\otimes 6}\,\mathrm{CCCCCZ}\,X^{\otimes 6}|\psi_a\rangle = \begin{bmatrix} -a_0 \sin(\omega t) \\ a_1 \sin(\omega t) \\ \vdots \\ a_{63} \sin(\omega t) \end{bmatrix} \tag{19}$$

which has the same form as the operation of an isolated CCCCCZ gate, namely, a sign flip of a single state. With this, the entire oracle operation can be implemented using an inverter. Second, the initialization in Fig. 10 can be removed if the input is itself a uniformly superposed state [$|\psi_u\rangle$ from (10)] instead of $|000000\rangle$. We exploit these two properties to propose the architecture shown in Fig. 11. In Fig. 11, the oracle is replaced with an analog multiplexer (MUX) that flips the sign of any arbitrary state chosen by the digital select pins. Consequently, we achieve runtime programmability of the GSA circuit, which allows us to search for any arbitrary planted state. Designing the MUX is outside the scope of this article, as any standard analog multiplexer works.
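The equivalence between a gate-built oracle and a single selectable sign flip can be checked numerically. In the sketch below, `gate_oracle` applies X only on the qubits where the target bit is 0, then CCCCCZ, then the same X gates again, while `mux_oracle` models the multiplexer as a sign flip addressed by a digital select value. Both function names and the bit-mask formulation are illustrative assumptions.

```python
N = 64  # six qubits


def x_on(a, mask):
    """Apply X to the qubits set in mask: wire k is swapped with wire k XOR mask."""
    return [a[k ^ mask] for k in range(N)]


def gate_oracle(a, target):
    """X where the target bit is 0, CCCCCZ, then the same X gates again."""
    mask = ~target & (N - 1)
    a = x_on(a, mask)
    a = a[:-1] + [-a[-1]]        # CCCCCZ: flip the sign of the last wire
    return x_on(a, mask)


def mux_oracle(a, select):
    """Sign flip of the single wire addressed by the select value."""
    return [-v if k == select else v for k, v in enumerate(a)]


amps = [float(k + 1) for k in range(N)]
print(gate_oracle(amps, 5) == mux_oracle(amps, 5))
```

Because the two agree for every target, changing the select value at run time is enough to retarget the search.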

V. QUANTUM FOURIER TRANSFORM

A. ALGORITHM
The circuit for the QFT [16] for six qubits is shown in Fig. 12. To implement this circuit, we need to find the matrix representation of each gate. The two types of gates used in Fig. 12 are the Hadamard gate and the rotation gate $R_k$:

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad R_k = \begin{bmatrix} 1 & 0 \\ 0 & e^{j2\pi/2^k} \end{bmatrix}. \tag{20}$$

Using these single-qubit gates, we can then form the six-qubit gates, whose matrix forms are listed in Table 1 in the order they are applied in the circuit in Fig. 12.
The first Hadamard gate can be written as $H \otimes I^{\otimes 5}$, as only the first qubit is acted upon by a Hadamard gate, which is represented in Table 1 as {H, I, I, I, I, I}. The first controlled $R_2$ gate can be written as $I \otimes I_1 \otimes I^{\otimes 4} + R_2 \otimes I_2 \otimes I^{\otimes 4}$, where $I_1 = |0\rangle\langle 0|$ and $I_2 = |1\rangle\langle 1|$ are the projectors onto the two states of the control qubit. This gate performs an $R_2$ operation on the first qubit controlled by the second qubit. Note that $I_1$ occupies the second place in the first term as it acts on the control qubit, and $R_2$ occupies the first place in the second term as it acts on the target qubit. This gate is represented in Table 1 as {$R_2$, C, I, I, I, I}, where $R_2$ denotes the rotation operation on the first qubit and C denotes the control operation on the second qubit. The rest of the gates from Fig. 12 can be obtained similarly.
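The Kronecker-product form of the controlled rotation can be checked with a short sketch. Since every factor involved is diagonal, only the diagonals are tracked. The sketch assumes the standard QFT rotation R_k = diag(1, exp(2πj/2^k)) and takes the two control terms as the projectors diag(1, 0) and diag(0, 1); the helper names are hypothetical.

```python
import cmath


def kron_diag(*diags):
    """Diagonal of a Kronecker product of diagonal matrices (first factor most significant)."""
    out = [1]
    for d in diags:
        out = [x * y for x in out for y in d]
    return out


def rot(k):
    """Diagonal of R_k, assuming R_k = diag(1, exp(2*pi*j / 2**k))."""
    return [1, cmath.exp(2j * cmath.pi / 2 ** k)]


I = [1, 1]    # identity
P0 = [1, 0]   # projector onto |0> of the control qubit
P1 = [0, 1]   # projector onto |1> of the control qubit


def controlled_r2():
    """Diagonal of I (x) P0 (x) I^4  +  R2 (x) P1 (x) I^4, i.e. {R2, C, I, I, I, I}."""
    first = kron_diag(I, P0, I, I, I, I)
    second = kron_diag(rot(2), P1, I, I, I, I)
    return [a + b for a, b in zip(first, second)]


diag = controlled_r2()
print(len(diag), diag[0], diag[48])
```

Only the entries whose first two qubits are both 1 (indices 48 to 63) pick up the phase; every other diagonal entry stays at 1.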

B. SIX-QUBIT GATES IN ANALOG CIRCUITS 1) HADAMARD GATES
As explained in the previous section, Hadamard gates can be implemented by only using adders.For the QFT, all the Hadamard gates are single-qubit gates so two-input adders can be used to implement the operation, unlike the 64-input adders in GSA.

2) C-ROT GATES
Every qubit in Fig. 12 is acted upon by a single-qubit Hadamard gate and then by a series of controlled rotation gates whose matrix forms are listed in Table 1. Note that all the C-ROT operations are diagonal matrices of size 64 × 64, so the xth row simply multiplies the xth amplitude by the xth diagonal entry. Writing the diagonal entry as $d_x$, the input amplitude as $a_x$, and the output amplitude as $b_x$, each row computes

$$b_{x,r} = d_{x,r}\,a_{x,r} - d_{x,i}\,a_{x,i}, \qquad b_{x,i} = d_{x,r}\,a_{x,i} + d_{x,i}\,a_{x,r} \tag{21}$$

where subscripts (x, r) and (x, i) indicate the real and imaginary components of the respective entry in the xth row. Thus, every C-ROT operation can be implemented as a complex multiplication, as also shown in Fig. 13, where the resistance values are fixed for a particular gate as the diagonal values are fixed. For example, the diagonal entries of various two-qubit C-ROT gates are tabulated in Table 2. Consequently, every row of (21) requires two opamp adders to implement. However, the entire QFT circuit can be further simplified by combining all the C-ROT gates acting on a qubit into one gate. For example, the $R_2$ to $R_6$ gates acting on the first qubit can be combined into a single gate operation, and as the product of diagonal matrices is also a diagonal matrix, the combined rotation operation is also a diagonal matrix. Now we have all the necessary circuit components to implement both GSA and QFT. In the next section, we discuss the general nuances in designing these circuit elements for a scalable architecture.
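The statement that each row of a diagonal gate needs two weighted adders can be illustrated as follows. The fixed diagonal entry plays the role of the resistor-set weights, and each output is a weighted sum of the two input signals. The function name and the example values are hypothetical.

```python
def crot_row(d_r, d_i, a_r, a_i):
    """One row of a diagonal gate as two weighted sums with fixed weights d_r, d_i.

    Returns the real and imaginary parts of (d_r + j*d_i) * (a_r + j*a_i).
    """
    out_r = d_r * a_r - d_i * a_i   # first adder
    out_i = d_r * a_i + d_i * a_r   # second adder
    return out_r, out_i


# A diagonal entry of j (a quarter-turn phase) applied to the amplitude 0.6 + 0.8j.
print(crot_row(0.0, 1.0, 0.6, 0.8))
# Cross-check against Python's built-in complex multiplication.
print(1j * complex(0.6, 0.8))
```

Cascaded diagonal gates can be merged by multiplying their diagonal entries once, offline, so the hardware still needs only one such row per state.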

VI. IMPLEMENTATION
The necessary circuit blocks required to implement GSA and QFT are described in Sections IV and V. These blocks are inverters and adders, which can be implemented from an opamp alone. In this section, we discuss the design considerations for an opamp that can not only efficiently implement any algorithm but also be suitable for scaling.

A. OPAMP 1) OPEN-LOOP GAIN
In Fig. 14, the equivalent circuit of an opamp adder is shown. Ideally, the output should be $v_{out} = -(v_0 + v_1 + \dots + v_{63})$, but due to the finite gain of the opamp, the output becomes

$$v_{out} = -\frac{v_0 + v_1 + \dots + v_{N-1}}{1 + \dfrac{N+1}{A}} \tag{23}$$

where N is the number of inputs to the opamp, in this case, 64, and A is the open-loop gain. From (23), it is evident that to reduce the degradation caused by adding a large number of inputs, a very high voltage gain is needed. In this case, as N = 64, A should be at least on the order of $10^3$, i.e., 60 dB. As we scale any algorithm for more qubits, the gain should also be increased accordingly. For example, the signal degradation of using an opamp with a gain of 60 dB, which is designed for a six-qubit system, for more qubits is shown in Table 3.

TABLE 3. Signal Degradation for Larger Quantum Systems by Using an Opamp With an Open-Loop Gain of 60 dB

As evident from Table 3, an opamp designed for a six-qubit system fails to implement any quantum system with more than nine qubits due to a low noise margin. This can be prevented by either increasing the open-loop gain of the opamp or using another opamp to restore the signal strength.
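To get a feel for how finite gain limits scaling, the sketch below evaluates the textbook closed-loop factor of an inverting summing amplifier with N equal input resistors and an equal feedback resistor, 1 / (1 + (N + 1)/A). Both the equal-resistor assumption and the use of this formula are illustrative; the printed numbers are not the entries of Table 3.

```python
def gain_factor(n_qubits: int, gain_db: float = 60.0) -> float:
    """Fraction of the ideal sum that survives with finite open-loop gain.

    Assumes an inverting summing amplifier with N = 2**n_qubits equal input
    resistors and an equal feedback resistor (textbook model).
    """
    n_inputs = 2 ** n_qubits
    open_loop_gain = 10 ** (gain_db / 20)
    return 1 / (1 + (n_inputs + 1) / open_loop_gain)


for n in range(6, 11):
    print(f"{n} qubits: output is {gain_factor(n):.3f} of the ideal sum")
```

Under this model the surviving fraction drops steadily as the input count doubles with each added qubit, which is consistent with the need to raise the gain or restore the signal as the system grows.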

2) OUTPUT IMPEDANCE
Nonzero output impedance also degrades the output of the adder. As there are a large number of inputs to the opamp, a huge current ($i_{out}$) flows into the output, as shown in Fig. 14. For a high input impedance, this current is the sum of the currents through all N input branches, which degrades the output in (23) by $i_{out} R_o$. So we need an opamp with very low output impedance.

3) VOLTAGE SWING
As a huge number of signals are added, we need an opamp with a large voltage swing to avoid saturating the output. Therefore, the desired qualities of the opamp needed to implement any algorithm are a large voltage swing, low output resistance, and high gain. We are not concerned with input common-mode range and slew rate, as we are dealing with small input signals and the frequency of operation does not affect the functionality. Considering all these parameters, we choose to use a two-stage opamp because a single-stage opamp does not provide the necessary gain and also has a relatively higher output impedance. In addition, employing a cascode or folded-cascode structure to increase the gain of the single-stage opamp decreases the voltage swing [30]. A two-stage opamp better suits our requirement, as it decouples the gain and voltage swing requirements while providing a lower output impedance. Determining specific circuit parameter values, such as W/L ratios, bias currents, compensation resistance, and capacitance, is outside the scope of this work, as there exist many resources for it [29], [30]. Now that we have the opamp adder, we discuss the nonidealities that prevent scaling as well as methods to mitigate them.

B. CROSSTALK
Crosstalk is the phenomenon by which the components in an integrated circuit interfere and couple, degrading the signal integrity. This could be due to inductive or capacitive coupling between the components. So for scalable architectures, reducing the crosstalk is necessary. Capacitive coupling can be reduced by using shorter wires, reducing the number of vias, and increasing the clearance between the wires. Inductive coupling can be mitigated by employing a differential design paradigm, as shown in Fig. 15, where we combine two single-ended opamps by applying a signal to the first opamp and its 180° phase-shifted signal to the second opamp. The output is taken as the difference between the individual opamp outputs. There are a couple of advantages to using this approach.

1) Differential signals reduce the common-mode noise and increase the voltage swing, thereby increasing the signal-to-noise ratio.
2) Differential signals produce less electromagnetic interference (EMI) due to the opposing magnetic fields. They are also immune to external EMI.
Due to the mentioned reasons, differential design is extremely advantageous for high-frequency and scalable designs, although it doubles the circuit size.In addition, differential design has both positive and negative signals, so we do not have to design an inverter, as all the negative signals are readily available in the circuit.Thus, inverting is achieved by a simple swapping of the positive and negative signals.

C. OFFSET
Offset in opamps is caused by various factors, such as input offset voltage, input bias current, temperature drift, and flicker noise. Offset has a detrimental effect in an opamp circuit because it can saturate the output. As the GSA is implemented with a series of 64-input opamp adders, if a single adder picks up any stray signal, it could saturate all the subsequent adders and eventually the entire circuit. So it is essential to eliminate the offset to increase the circuit depth and, in turn, the scalability of the circuit to implement larger algorithms.
To mitigate the offset, we use a technique called chopping, which employs modulation to eliminate not just the dc offset but also any low-frequency noise, such as the flicker (1/f) noise. Chopping modulates the input signal by a high-frequency square wave before it is processed by the main amplifier. So any low-frequency noise introduced by the amplifier does not alias with the signal. At the amplifier's output, the signal is demodulated by the same square wave to restore the original signal. However, the low-frequency noise is subject to only one modulation, so it stays at higher frequencies and can be filtered out by a low-pass filter. This entire procedure is shown in Fig. 16.
To perform chopping, mixers and low-pass filters are needed. As one of the inputs to the mixer is a square wave, a switch can be used to multiply the two signals, as shown in Fig. 17. Here, a transmission gate circuit with its gate terminal tied to the square wave and its source terminal tied to the input is used as a mixer. Instead of mixing the differential signals separately, we combine them and use a low-complexity switch to operate on them. Now the differential opamp is chopped using this mixer, as shown in Fig. 18. The inputs are chopped for each opamp as well as the outputs, which are then low-pass filtered to obtain offset-free signals. The design of the low-pass filter is omitted here, but any precision filter can be used.
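The chopping procedure can be illustrated with a small discrete-time sketch: the input is multiplied by a ±1 square wave, an amplifier with a dc offset processes it, the output is multiplied by the same square wave, and an averaging filter stands in for the low-pass filter. The constant input, the gain and offset values, and the use of a plain average are all simplifying assumptions for illustration.

```python
def chopped_amplifier(samples, gain, offset, chop_period=2):
    """Modulate, amplify (with an added dc offset), then demodulate each sample."""
    out = []
    for k, v in enumerate(samples):
        chop = 1.0 if (k % chop_period) < chop_period // 2 else -1.0
        amplified = gain * (chop * v) + offset   # offset enters after modulation
        out.append(chop * amplified)             # demodulate with the same square wave
    return out


def low_pass(samples):
    """Crude low-pass filter: average over a whole number of chop periods."""
    return sum(samples) / len(samples)


signal = [0.01] * 1000   # slowly varying input, held constant for simplicity
plain = low_pass([10.0 * v + 0.5 for v in signal])
chopped = low_pass(chopped_amplifier(signal, gain=10.0, offset=0.5))
print(plain, chopped)
```

The signal is modulated twice and returns to baseband, while the offset is modulated only once and averages out, so the chopped result keeps the amplified signal without the offset.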

VII. RESULTS
This section discusses the complexity, power, and area of the circuits used for realizing Grover's algorithm and QFT in the UMC 180-nm CMOS process. Simulation results of both algorithms are also presented, followed by a comparison of the earlier works on quantum emulation with the proposed work.

A. COMPLEXITY
The main constituent for both GSA and QFT is an opamp. So we quantified the complexity of the circuits by the number of opamps utilized by the respective algorithm, as shown in Table 4. Each n-qubit gate operation requires $2^{n+2}$ opamps, but for GSA the gate operations are all real, reducing the required opamps by half, i.e., to $2^{n+1}$ per gate operation. However, as there are two such gate operations in GSA, we need $2^{n+2}$ opamps in total. Similarly, for QFT, we need $2^{n+2}$ opamps for every gate operation, as the QFT is complex, and there are n operations in an n-qubit QFT. Thus, the required opamps for QFT are $n2^{n+2}$.
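The opamp counts stated above can be tabulated directly. The helper names are hypothetical; the formulas are the ones given in the text.

```python
def opamps_gsa(n: int) -> int:
    """Two real-valued gate operations at 2**(n+1) opamps each."""
    return 2 * 2 ** (n + 1)


def opamps_qft(n: int) -> int:
    """n complex-valued gate operations at 2**(n+2) opamps each."""
    return n * 2 ** (n + 2)


for n in (2, 4, 6, 8, 10):
    print(f"n = {n:2d}: GSA {opamps_gsa(n):6d} opamps, QFT {opamps_qft(n):6d} opamps")
```

Both counts grow exponentially with n, with the QFT carrying an extra factor of n.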

B. POWER
Power consumption is an indicator of the degree of scalability of an architecture, as it directly measures the hardware resource utilization. To compare the proposed analog emulation with the FPGA-based emulation in the context of total power consumption, we performed an extensive analysis on an Artix 7 AC701 board by running GSA for varying numbers of qubits with an 8-bit mantissa. We also implemented the same using analog circuits in UMC 180-nm CMOS technology to obtain the power consumption for the respective methods. The results are plotted in Fig. 19. Although both methods consume exponentially more power with an increasing number of qubits, the analog implementation is substantially more energy efficient than the FPGA-based implementation. For example, for ten qubits, FPGA-based emulation consumed 53.5 W, whereas analog emulation consumed only 8.3 W, which is a sixfold improvement over FPGA-based emulation.

The hardware resources utilized by analog emulation, in terms of opamps, and by FPGA-based emulation, in terms of the number of lookup tables (LUTs), are given in Table 5. However, it should be noted that the power consumption values and the number of LUTs for emulations of seven qubits and higher are extrapolated values, owing to the exhaustion of I/O ports on the Artix 7. From Table 5, it is clear that the digital nature of FPGA-based emulation demands more components, which consume more area and power.

C. AREA
For integrated circuits, the area consumed by the circuit is another indicator of the degree of scalability of an architecture. To quantify this, we implemented a six-qubit GSA in UMC 180-nm CMOS technology. The primary component of the circuit is a two-stage opamp, so we designed one with the characteristics given in Table 6.
Using this opamp, we designed a 64-input adder, and, as discussed earlier, we employed a differential design along with chopping to remove offsets and interference. The layout of the differential adder with chopping (see Fig. 18) is shown in Fig. 20. This adder is the building block of the rest of the circuit, as all the six-qubit gates needed for implementing GSA can be built from it. The layout of GSA built from these adders is shown in Fig. 20. This circuit occupies a total area of 850 μm × 1000 μm. Considering that modern ICs can go up to 10 mm × 10 mm, we can safely implement a circuit that is 10 times the size of the circuit in Fig. 20 in 180 nm, which is sufficient to emulate a nine-qubit GSA. Furthermore, using a 5-nm process within the same area, we can implement up to 15-17 qubits.
However, in practice, a large portion of the area is taken up by wires, as shown in Fig. 20. This is because of the fundamental limitation of classical systems in representing quantum states efficiently. In our methodology, states are represented by wires, and a six-qubit system needs 64 wires to represent every state. In addition, routing these wires to a large number of opamps increases the crosstalk, degrading signal integrity. Additional care must therefore be taken to reduce the parasitic capacitances as much as possible.

D. EMULATION
TABLE 7. Runtimes of Analog Emulation of GSA

Using the circuit from Fig. 20, we ran an emulation of GSA on six qubits with a single amplification stage and an oracle whose planted solution is reconfigurable by an additional multiplexing circuit (the MUX in Fig. 20). Note that the results presented hereafter are obtained from a postlayout simulation of the parasitics-extracted model of the circuit. We set the oracle to amplify the states |000000⟩, |001111⟩, and |100000⟩. We collected the output signals, measured all the amplitudes, and squared them to obtain the probabilities of the respective states. These probabilities are plotted in Fig. 21. We can see that the states |000000⟩, |001111⟩, and |100000⟩ have the highest probability, as expected in their respective cases. We also present the schematic-level emulation results of the QFT in Fig. 22 for three different input sequences.
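For reference, the ideal outcome of this experiment can be reproduced with a short numerical sketch of one Grover iteration on six qubits. This is a textbook state-vector calculation, not a model of the analog circuit: it assumes a uniform initial superposition, a sign-flipping oracle, and inversion about the mean as the amplification stage. The function name `grover_single_stage` is hypothetical.

```python
import math


def grover_single_stage(n_qubits, marked):
    """Probabilities after one oracle + amplification stage (ideal, noiseless)."""
    n_states = 2 ** n_qubits
    amps = [1 / math.sqrt(n_states)] * n_states   # uniform superposition
    amps[marked] = -amps[marked]                  # oracle flips the sign of the marked state
    mean = sum(amps) / n_states
    amps = [2 * mean - a for a in amps]           # amplification: inversion about the mean
    return [a * a for a in amps]                  # squared amplitudes give probabilities


for label in ("000000", "001111", "100000"):
    probs = grover_single_stage(6, int(label, 2))
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"oracle |{label}>: most likely |{best:06b}>, p = {probs[best]:.4f}")
```

In this ideal model, the marked state ends with probability $(47/128)^2 \approx 0.135$ after the single stage, while each of the other 63 states has probability of about 0.014, so the marked state stands out even though it is far from certain.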
We compared our method with the existing methods in Table 8, from which it is evident that building a fully parallel and reconfigurable architecture in FPGAs is difficult. We can also see that, to implement more qubits, serialization becomes unavoidable in FPGAs. By contrast, the proposed analog-circuit-based architecture achieved full parallelization along with reconfigurability while consuming significantly less power. To gain a better understanding of the proposed method's scalability, we estimated the runtimes for a greater number of qubits through extrapolation and present the values in Table 7. Our observations reveal an exponential increase in runtime with the number of qubits. In addition, crosstalk and noise also increase with the number of qubits, which together limit the practical application of our method to large-scale quantum emulations. However, the runtime and the power consumption of our approach are an order of magnitude lower than those of other existing methods for six-qubit systems, as evident from Table 8 and Fig. 19. Thus, analog emulation of quantum algorithms presents an efficient strategy for moderate-scale quantum emulations.
TABLE 8. Comparison With Earlier Works on Quantum Emulation

We also benchmarked our results against software simulations based on an open-source quantum library, libquantum [28]. The simulation of a six-qubit GSA was performed on a Ryzen 5600x six-core processor with a 3.7 GHz clock frequency and took 404 μs on average to run the algorithm. By contrast, the emulation of GSA on an analog integrated circuit consuming only 152 mW of power took 1.05 μs on average after parasitic extraction. That is an improvement by a factor of roughly 400 for six qubits.
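The headline ratios can be recomputed directly from the figures quoted in this section. The sketch below uses only the reported numbers; the energy per run is a derived quantity (power times runtime) that is not reported in the text and is shown purely as an illustration. Variable names are hypothetical.

```python
# Reported values from the text.
cpu_runtime_s = 404e-6        # libquantum simulation, six-qubit GSA
analog_runtime_s = 1.05e-6    # analog emulation, post-layout
analog_power_w = 152e-3       # six-qubit analog circuit
fpga_power_w = 53.5           # ten-qubit GSA on FPGA
analog_power_10q_w = 8.3      # ten-qubit GSA, analog

speedup = cpu_runtime_s / analog_runtime_s
power_ratio = fpga_power_w / analog_power_10q_w
# Derived, not reported: energy drawn by the analog circuit during one run.
energy_per_run_j = analog_power_w * analog_runtime_s

print(f"speedup over CPU simulation: {speedup:.0f}x")
print(f"FPGA-to-analog power ratio (ten qubits): {power_ratio:.1f}x")
print(f"analog energy per run: {energy_per_run_j * 1e9:.1f} nJ")
```

The exact quotients are about 385 and 6.4, which the text rounds to "about 400" and "sixfold", respectively.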

E. DISCUSSION
To the best of the authors' knowledge, six qubits is the largest implementation of a quantum emulator in analog circuits. Note that the qubits designed are physical qubits that serve as both the physical and logical qubits. No hardware or qubit combining is necessary, as these physical qubits inherently represent the logical qubits required for the computations.
Although IBM announced its Osprey processor with 433 qubits, we want to highlight that not all of the qubits in it are currently usable [37]. Since we are still in the era of noisy intermediate-scale quantum (NISQ) computing, the accepted method for measuring the number of qubits in a processor is through quantum volume [37]. Although IBM has not disclosed the quantum volume of its Osprey processor, we do have the quantum volume for its previous Eagle processor, which had a total of 127 qubits, of which only six were usable (refer to [37, Table 2]). With Osprey, IBM tripled the number of qubits, which, if we extrapolate the quantum volume, would yield around 18-20 usable qubits, representing the current state of the art. On the other hand, using analog emulation allows us to achieve nearly 100% quantum volume, as there are no entanglement or quantum coherence issues to contend with. Our six-qubit circuit can emulate six-qubit quantum algorithms, which can be called moderate-scale when compared with the state of the art of 18-20 qubits.
It should be noted that IBM claims a quantum volume of 32 for its Eagle processor [37], but it is important to emphasize that the quantum volume of a device relies on several factors, including the quantum-volume protocol (such as the number of circuit runs), circuit compiler, qubit layout, and fidelity [37]. Hence, a higher quantum volume does not necessarily indicate a greater capability of the device; it could also reflect superior compilation techniques or a more rigorous quantum-volume protocol. To evaluate the device's true capability while minimizing the impact of other factors, a subgraph compilation technique is popularly used [37]. This approach enables us to identify the optimal set of physical qubits that can be combined to form the logical qubit set required for executing the circuit effectively, thus eliminating the dependency of quantum volume on other factors. Second, we would like to compare our approach's hardware capabilities with IBM's backend devices. In this context, it is important to note that an analog emulator is a deterministic device, meaning it produces similar results consistently for the same input. To ensure a fair comparison, it is crucial to employ a quantum-volume protocol that exhibits deterministic behavior. However, using a generic quantum-volume protocol that considers all the factors involved does not guarantee deterministic behavior. This is because such a protocol selects a different set of physical qubits each time and may yield less consistent results, as all qubits are different. Consequently, we have opted for subgraph compilation, which identifies the set of qubits with the highest fidelity. This allows the given circuit to be executed with similar efficiency every single time, ensuring the desired level of consistency for a reliable comparison with analog emulation. The quantum-volume value of six for the Eagle processor accurately represents the number of physical qubits that consistently demonstrate high fidelity over an extended period of time. This value becomes particularly significant when compared with the physical qubits of analog emulation, as they share similar characteristics of high fidelity and consistency.
In addition, we implemented the circuit on a UMC 180-nm node, which is an older node and has only six metal layers for interconnections, as opposed to the latest technology nodes, which have 10-12 metal layers to facilitate efficient interconnections with minimal crosstalk. Thus, using a recent technology node, such as 5 nm with a 12-metal-layer stack, we can achieve an implementation of up to 15-17 qubits, which is comparable to the Osprey processor [37] in terms of quantum volume.

VIII. CONCLUSION
In this article, we proposed a general framework for implementing moderate-scale quantum algorithms using CMOS analog circuits, and then studied GSA and the QFT in detail. We also presented and addressed the challenges in designing a scalable architecture using analog circuits. Using the proposed method for a ten-qubit emulation of Grover's search algorithm, a sixfold improvement in power consumption was observed over FPGA-based emulation. We also demonstrated the advantage of analog emulation over software simulation on digital computers. The analog emulation of six-qubit GSA is about 400 times faster than the simulation performed on a Ryzen 5600x six-core processor running at 3.7 GHz clock frequency.
There exist a few applications that already demonstrate the utility of quantum emulation on a CMOS platform. For instance, in [38], [39], and the references therein, custom CMOS circuits are utilized to address combinatorial optimization (CO) problems. Aminian et al. [38] demonstrated superior overall performance compared with prior works employing Ising machines (i.e., spin states) for CO problem solving. While the systems in [38] and [39] are not universal quantum computers, they exemplify the advantages of using dedicated classical hardware to emulate quantum systems over other conventional approaches. The variational quantum eigensolver (VQE) [40] is another algorithm that benefits from a dedicated emulator. This algorithm forms the foundation of quantum simulations, leveraging both classical and quantum computers to determine the ground state of a given physical system. As VQE involves a classical-quantum interface, employing dedicated hardware for the quantum component enhances the algorithm's efficiency. For FPGA-based emulators, there exists a compromise between postsynthesis programmability and the number of emulated qubits. An FPGA-based implementation of a quantum emulator that is programmable after synthesis for more than two qubits is yet to be demonstrated. On the other hand, programmability in analog circuits can be achieved relatively easily, as analog circuits natively support analog computation, without compromising on the number of qubits. This allows analog ASICs to achieve a higher programmable qubit count, which would benefit the VQE in simulating larger qubit systems. In addition, ASICs consume less power and are significantly faster than their FPGA counterparts, as given in Table 8 and Fig. 19.
However, the fundamental limitation of classical systems in emulating larger quantum algorithms cannot be bypassed, owing to the exponentially large hardware utilization for storing and processing all the states. Unlike in FPGA-based quantum emulators, where the states are stored in memory, in analog-circuit-based emulators the states are the wires themselves. Thus, to increase the number of qubits, we need to route exponentially more wires, which increases the crosstalk and the parasitic effects in the circuit, which in turn increase the runtime and decrease the performance. An efficient routing protocol is therefore essential for emulating large algorithms. Analog circuits have found extensive use as hardware accelerators in deep learning and neuromorphic computing applications, particularly for intensive multiplication operations [41]. Efficient multiplication architectures based on crossbar arrays, such as memristors and memory arrays, have been employed [41]. In [41], two crossbar arrays of size 1024 × 1024 were implemented without experiencing crosstalk issues, as they were implemented as differential circuits that are resilient to crosstalk. Consequently, the same circuit corresponds to a total of 11 qubits when utilized in analog quantum emulation, which is promising. In addition, qubits can be designed not only in the spatial domain but also in the temporal and spectral domains [14], reducing circuit size and facilitating an increase in the number of qubits. Furthermore, the continuous progress in technology is leading to advancements in 3-nm technologies, which would allow us to design more qubits in the same space, achieving a qubit count and quantum volume as high as 20-25.
On the other hand, the number of input and output pads is limited in any technology, and they quickly become insufficient even for an emulator of modest size. This could be another area for future work, i.e., to implement efficient multiplexing strategies. With these, a quantum emulator built from analog circuits can represent a more efficient strategy for implementing quantum algorithms, which is further enhanced by advancements in memory architectures and large-scale integration of MAC operations. This, coupled with the integration of qubits in temporal and spectral domains, makes analog quantum emulation a competitive approach compared with FPGA-based emulators.

FIGURE 1. Implementation of Hadamard gate in the analog domain, where inputs are sinusoidal with amplitudes $\alpha_r$ and $\beta_r$. The circuit corresponding to the imaginary parts of $\alpha$ and $\beta$ is omitted for simplicity.

FIGURE 2. Implementation of X gate in the analog domain, where inputs are sinusoidal with amplitudes $\alpha_r$ and $\beta_r$. The circuit corresponding to the imaginary parts of $\alpha$ and $\beta$ is omitted for simplicity.

FIGURE 3. Implementation of Z gate in the analog domain, where inputs are sinusoidal with amplitudes $\alpha_r$ and $\beta_r$. The circuit corresponding to the imaginary parts of $\alpha$ and $\beta$ is omitted for simplicity.

FIGURE 4. Implementation of CNOT gate in the analog domain, where inputs are sinusoidal with amplitudes $\alpha_r$, $\beta_r$, $\gamma_r$, and $\delta_r$. The circuit corresponding to the imaginary parts of $\alpha$, $\beta$, $\gamma$, and $\delta$ is omitted for simplicity.

FIGURE 5. Circuit for $e$ in (9) showing the implementation of a generic single-qubit gate in the analog domain, where inputs are sinusoidal with amplitudes $\alpha$ and $\beta$. The circuit for $f$ is similar and omitted for simplicity.

FIGURE 6. GSA for a six-qubit system, showing initialization, oracle, and amplification. Measurement is omitted for simplicity.

FIGURE 7. Opamp adder with 64 inputs, where $R$ is the feedback resistance.

FIGURE 9. Implementation of CCCCCZ gate in the analog domain, where the inputs are sinusoidal.

FIGURE 11. Block diagram of generalized GSA in the analog domain, with the initialization block removed by changing the input itself to a uniform superposed state and the oracle replaced with a multiplexer (MUX) circuit that inverts an arbitrary input signal chosen by the select pins.

FIGURE 12. Circuit for a six-qubit QFT. Each qubit is acted upon by a single-qubit Hadamard gate followed by a series of two-qubit controlled rotation gates. In the end, all the qubits are reversed using SWAP gates.

FIGURE 13. Circuit for complex multiplication, where all the resistor values are fixed for a given gate operation. Negative signs in the output can be neglected, as they do not affect the functionality of the circuit.

FIGURE 14. Equivalent circuit of an opamp adder. $R_i$ and $R_o$ are input and output impedances. $R$ is the feedback resistance. $A$ is the open-loop voltage gain of the opamp.

FIGURE 15. Pair of opamps with single-ended input and single-ended output combined to get a differential opamp. $v_{in}$ is the difference between the two inputs and $v_{out}$ is the difference between the two outputs. $v_{icm}$ is the common-mode voltage, which is usually $V_{DD}/2$.

FIGURE 16. Chopping an amplifier in five steps. 1) Input to be amplified (shown in black). 2) Input is modulated with a square wave at $f_{ch}$. 3) Offset is added (shown in red). 4) Signal is demodulated with the same square wave. 5) Low-pass filtering. At the output, the original signal is restored (with gain) and low-frequency noise is filtered out.

FIGURE 17. Mixer built from transmission gates. Here, $v_{in}$ is modulated by a square wave (clk), but the output is reduced by half, as it is a single-ended output whereas the input is differential. When paired with its differential half, the factor of two vanishes.

FIGURE 18. Chopping a differential opamp. First, the inputs are modulated at the clk frequency and then amplified by the opamp pair to get $v_1$ and $v_2$. These signals are then demodulated at the same clk frequency and low-pass filtered to remove the offset and flicker noise.

FIGURE 19. Total power consumption of FPGA-based emulation versus analog emulation for GSA.

FIGURE 20. Layout of six-qubit GSA in UMC 180-nm CMOS technology showing the respective components. The opamp adder is the unit block of the circuit, whereas the MUX allows the oracle to be modified.

TABLE 1. Matrix Forms of Various Gates in QFT

TABLE 5. Resource Utilization of Analog Versus FPGA-Based Emulation of GSA

TABLE 6. Opamp Characteristics