Automated Quantum Circuit Design with Nested Monte Carlo Tree Search

Quantum algorithms based on variational approaches are among the most promising methods for constructing quantum solutions and have found a myriad of applications in the last few years. Despite their adaptability and simplicity, their scalability and the selection of suitable ansätze remain key challenges. In this work, we report an algorithmic framework based on nested Monte-Carlo Tree Search (MCTS) coupled with the combinatorial multi-armed bandit (CMAB) model for the automated design of quantum circuits. Through numerical experiments, we demonstrate our algorithm on various kinds of problems, including the ground energy problem in quantum chemistry, quantum optimisation on a graph, solving systems of linear equations, and finding encoding circuits for quantum error detection codes. Compared to existing approaches, the results indicate that our circuit design algorithm can explore larger search spaces and optimise quantum circuits for larger systems, showing both versatility and scalability.


Introduction
The variational quantum circuit (VQC, also known as parameterised quantum circuit, PQC) approach, first proposed for solving the ground state energy of molecules [1], has been extended to many open research problems, including quantum machine learning [2], quantum chemistry [3], option pricing [4] and quantum error correction [5,6]. The performance of VQC methods largely depends on the choice of a suitable ansatz, which is not an easy task because the search space is generally very large and it is not well established whether there is a common principle for designing such ansätze. For problems involving physical systems, such as in quantum chemistry, we can rely on the well-defined properties of molecular systems for ansatz design, with examples including the hardware-efficient ansatz [7] and physically inspired ansätze such as k-UpCCGSD [8]. However, this cannot be generalised to other areas such as designing variational error correction circuits or quantum optimisation problems. For example, in [6], when developing a variational circuit that can encode logical states for the 5-qubit quantum error correction code, the authors adopted an expensive approach, randomly searching over a large number (of the order of 10000) of circuits. It is anticipated that, with the increasing number of application areas for VQCs and the need for scalability to tackle large problem sizes without relying on fundamental physical properties, such random search methods or methods based purely on human heuristics will struggle to find suitable ansätze. Therefore, it is important to develop efficient methods for the automated design of variational quantum circuits. Here we focus on the development of algorithms for the automated design of VQCs by leveraging the power of artificial intelligence (AI), which can be deployed for a wide range of applications.
Although modern AI research often focuses on applications in image and natural language processing, the power of AI can also bring new knowledge in many areas, especially scientific discovery. AlphaFold2 managed to discover a new mechanism for the bonding region of the protein and inhibitors [9], with competitive accuracy in predicting the three-dimensional structure of proteins in the 14th Critical Assessment of protein Structure Prediction (CASP) competition. In 2021, machine learning algorithms helped mathematicians discover new mathematical relationships in two different areas of mathematics [10]. Like variational quantum circuits, modern deep neural networks (DNN) also face a design problem when composing the network for certain tasks. With the help of AI algorithms, researchers have developed techniques to efficiently search for suitable network architectures in a large search space. Well-known algorithms for neural architecture search (NAS) include the DARTS algorithm [11], which models the choice of operations placed in different layers as an independent categorical probabilistic model that can be optimised via gradient descent methods, and the PNAS algorithm [12], which models the search process with a sequential model-based optimisation (SMBO) strategy. Tree-based algorithms have also been proposed for NAS, such as AlphaX [13], which models the search process similarly to the search stage of AlphaGo [14]. Recently, a new NAS algorithm based on tree search and combinatorial multi-armed bandits, proposed in [15], was shown to outperform other NAS algorithms, including those mentioned above.
Based on progress in neural architecture search algorithms, efforts have been made to develop similar approaches for Quantum Ansatz (Architecture) Search (QAS) problems. Zhang et al. [16] adapted the DARTS algorithm [11] from NAS for QAS, modelling the distribution of different operations within a single layer with an independent categorical probabilistic model. The search algorithm updates the parameters in the VQC as well as the probabilistic model. However, it has been shown in the NAS literature that DARTS tends to assign high probability to fast-converging architectures during sampling [17,18]. Also, the off-the-shelf probabilistic distributions for modelling the architecture space tend to have difficulties when the search space is large. Later, the same group of authors developed a neural network to evaluate the performance of parameterised quantum circuits without actually training the circuits, and incorporated this neural network into quantum architecture search [19]. While NAS algorithms often focus on image-related tasks, and many experiments have shown that one neural network architecture can act as a backbone feature extractor for many downstream tasks, the structures of variational quantum circuits often vary a great deal between problems, casting some doubt on the generalisation abilities of such neural-predictor-based QAS algorithms. Kuo et al. [20] proposed a deep reinforcement learning based method for tackling QAS. The reinforcement learning agent is optimised by the advantage actor-critic and proximal policy optimisation algorithms. However, NAS algorithms based on policy gradient reinforcement learning have been shown to get stuck easily in local minima, producing sub-optimal solutions [21,22]. Also, the amount of data needed for training a reinforcement learning agent explodes when the number of actions the agent can choose from is large. He et al. [23] applied meta-learning techniques to learn good heuristics of both the architecture and the parameters. Du et al. [24] proposed a QAS algorithm based on one-shot neural architecture search, where all possible quantum circuits are represented by a supernet with a weight-sharing strategy and the circuits are sampled uniformly during the training stage. After the training stage, all circuits in the supernet are ranked and the best-performing circuit is chosen for further optimisation. Later, Linghu et al. [25] applied similar techniques to search for a classification circuit on a physical quantum processor. Meng et al. [26] applied Monte-Carlo tree search to ansatz optimisation for problems in quantum chemistry and condensed matter physics. However, these studies often restrict their demonstrations to one or two types of problems and to small systems.
In order to develop a search technique that can be applied to larger search spaces and different variational quantum problems, we introduce an algorithm for QAS problems based on the combinatorial multi-armed bandit (CMAB) model as well as Monte-Carlo Tree Search (MCTS). To explore search spaces that are extremely large compared to previous work in the literature, our strategy is underpinned by a reward scheme which dictates the choices of the quantum operations at each step of the algorithm under the naïve assumption [27]. This enables our strategy to work on larger systems of more than 7 qubits, whereas the existing examples [16,19,20,23,24] are typically restricted to 3 or 4 qubits, with the largest being 6 qubits. To demonstrate the working of our method, we show its application to a variety of problems, including encoding the logical states for the [[4,2,2]] quantum error detection code, solving the ground energy problem for different molecules as well as linear systems of equations, and searching for the ansatz for solving optimisation problems. Our work confirms that automated quantum architecture search based on the MCTS+CMAB approach exhibits great versatility and scalability, and therefore should provide an efficient solution and new insights to the problem of designing variational quantum circuits.
This paper is organised as follows: Section 2 introduces the basic notion of Monte-Carlo tree search, as well as other techniques required for our algorithm, including nested MCTS and the naïve assumption from the CMAB model. Section 3 reports the results of applying our search algorithm to various problems, including searching for encoding circuits for the [[4,2,2]] quantum error detection code, the ansatz circuit for finding the ground state energy of different molecules, as well as circuits for solving linear systems of equations and optimisation. In Section 4 we discuss the results and conclusions.

Problem Formulation
In this paper, we formulate the quantum ansatz search problem, which aims to automatically design variational quantum circuits to perform various tasks, as a tree structure. We slice a quantum circuit into layers, and for each layer there is a pool of candidate operations. Starting with an empty circuit, we fill the layers with operations chosen by the search algorithm, from the first to the final layer. After that, we formulate the combinations of different choices of operations at different layer positions in the circuit (d) as a search tree (e). In (f), we evaluate our circuit on a quantum processor or quantum simulator to get the value of the loss or reward function, and according to this value we update the parameters on a classical computer, then use MCTS to search for the current best circuit. We then send the updated circuit structure together with the updated parameters to the quantum processor/simulator to obtain a new set of loss/reward values. The process depicted in (f) repeats until a circuit that meets the stopping criteria is found. Then, as shown in (g), we follow the usual process to optimise the parameters in the searched variational quantum circuit by classical-quantum hybrid computing.
A quantum circuit is represented as an (ordered) list, $P$, of operations of length $p$ chosen from the operation pool. The length of this list is fixed within the problem. The operation pool $C$ is a set with $|C| = c$ elements. Each element $U_i$ is a possible choice for a certain layer of the quantum circuit. Such operations can be parameterised (e.g. the $R_Z(\theta)$ gate) or non-parameterised (e.g. the Pauli gates). A quantum circuit with four layers could, for instance, be represented as $P = [U_0, U_1, U_2, U_1]$, where, according to the search algorithm, the operations chosen for the first, second, third and fourth layer are $U_0$, $U_1$, $U_2$, $U_1$. In this case, $p = 4$ and the size of the operation pool is $|C| = c$. The search tree is shown in Fig. 3. In this paper, we will only deal with unitary operations or unitary channels. The output state of such a quantum circuit can then be written as $U_1 U_2 U_1 U_0 |\phi_{init}\rangle$, where $|\phi_{init}\rangle$ is the initial state of the quantum circuit. For simplicity, we will use integers to denote the chosen operations (such operations can be whole-layer unitaries, like the mixing Hamiltonians often seen in typical QAOA circuits, or just single- and two-qubit gates). For example, the quantum circuit from Eqn. 3 can be written as $P = [0, 1, 2, 1]$, and the operation at the $i$-th layer is referred to as $k_i$. For example, in the quantum circuit above, we have $k_2 = 1$.
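The integer encoding described above can be illustrated with a minimal sketch. The pool contents and the helper names `decode` and `operation_at_layer` are hypothetical and chosen only for illustration; the layer index follows the 1-based convention of the text.

```python
# Hypothetical operation pool: the names are placeholders, not the paper's gate set.
OPERATION_POOL = ["U0", "U1", "U2"]

def decode(structure, pool=OPERATION_POOL):
    """Map an integer-encoded circuit P = [k_1, ..., k_p] to operation names."""
    for k in structure:
        if not 0 <= k < len(pool):
            raise ValueError(f"operation index {k} is outside the pool")
    return [pool[k] for k in structure]

def operation_at_layer(structure, i):
    """Return k_i, using the 1-based layer index of the text."""
    return structure[i - 1]

P = [0, 1, 2, 1]
print(decode(P))                 # ['U0', 'U1', 'U2', 'U1']
print(operation_at_layer(P, 2))  # 1
```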
Figure 3: The tree representation (along the arc with blue-shaded circles) of the unitary described in Eqns. 3 and 4 as well as Fig. 2. The circle with $s_0$ is the root of the tree, which represents an empty circuit. Other circles labelled $s_i^j$ denote the $j$-th node at the $i$-th level of the tree. The index $i$ also indicates the number of layers currently in the circuit at state $s_i^j$. For example, on the leftmost branch of the tree, there is a node labelled $s_2^0$, indicating that it is the 0th node at level 2. At $s_2^0$, the circuit would be $P_{s_2^0} = [U_0, U_0]$, which has only 2 layers. We can also see that some of the possible branches along the blue-node path are pruned, so that the size of the operation pool at some nodes is smaller than the total number of possible choices $c = |C|$.
The performance of the quantum circuit can be evaluated from the loss $\mathcal{L}$ or reward $\mathcal{R}$, where the reward is just the negative of the loss. Both are functions of $P$ and of the parameters $\theta$ of the chosen operations: $\mathcal{L}(P, \theta) = L(P, \theta) + \lambda$ and $\mathcal{R}(P, \theta) = R(P, \theta) - \lambda$, where $\lambda$ is a penalty function that may only appear when certain circuit structures appear, and may also include other kinds of penalty terms, such as a penalty on the sum of absolute values of the weights or on the number of gates of a certain type in the circuit; $L$ and $R$ are the loss and reward before applying the penalty. The purpose of the penalty term $\lambda$ is to steer the search algorithm away from structures we do not desire. Instead of storing all the operation parameters for each different quantum circuit, we share the parameters for a single operation at a certain location. That is, we have a multidimensional array of shape $(p, c, l)$, where $l$ is the maximum number of parameters for the operations in the operation pool. If all the operations in the pool are just the $U_3$ gate [28], $U_3(\theta, \phi, \delta) = \begin{pmatrix} \cos(\theta/2) & -e^{i\delta}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\delta)}\cos(\theta/2) \end{pmatrix}$, as well as its controlled version, the $CU_3$ gate, on different (pairs of) qubits, then $l = 3$.
To reduce the space required to store the parameters of all possible quantum circuits, the parameters of operation $k$ at layer $i$ are the same for every circuit that has this operation at this location, which means that each circuit shares the parameters of the unitaries in the operation pool with other circuits. For example, in Fig. 3, besides the blue-node arc there are also other paths. If another path $P'$ has the same first two operations as $P$, then we share the parameters of $U_0$ and $U_1$ between these two circuits by setting the parameters of $U_0$ and of $U_1$ to be the same in both circuits. Such a strategy is often called "parameter-sharing" or "weight-sharing" in the neural architecture search literature.
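A minimal sketch of this parameter-sharing scheme, assuming the shared parameters are held in a nested list of shape (p, c, l) and looked up by (layer, operation). The function names and the random initialisation are illustrative assumptions, not the authors' implementation.

```python
import random

def make_shared_parameters(p, c, l, seed=0):
    """Create a (p, c, l) nested list: one parameter vector per (layer, operation)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(l)] for _ in range(c)]
            for _ in range(p)]

def parameters_for(structure, shared):
    """Look up the parameter vector of every layer of an integer-encoded circuit.

    The vectors are returned by reference, so two circuits that place the same
    operation at the same layer read and update the same numbers.
    """
    return [shared[layer][k] for layer, k in enumerate(structure)]

shared = make_shared_parameters(p=4, c=3, l=3)
circuit_a = [0, 1, 2, 1]
circuit_b = [0, 1, 0, 0]   # same first two operations as circuit_a

params_a = parameters_for(circuit_a, shared)
params_b = parameters_for(circuit_b, shared)
params_a[0][0] += 0.5      # an update made while training circuit_a ...
print(params_b[0][0] == params_a[0][0])  # ... is visible to circuit_b: True
print(params_a[2] is params_b[2])        # different operation at layer 3: False
```

Storing one vector per (layer, operation) keeps the memory cost at p × c × l numbers, independent of how many of the c^p possible circuits are visited.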
As shown in Fig. 3 and mentioned earlier, the process of composing or searching for a circuit can be formulated as a tree structure. For example, if we start from an empty list $P = [\,]$ with maximal length four and an operation pool with three elements $C = \{U_0, U_1, U_2\}$, then the state of the root node of our search tree will be the empty list $[\,]$. The root node will have three possible actions (if there are no restrictions on what kind of operations can be chosen), which lead to three child nodes with states $[U_0]$, $[U_1]$ and $[U_2]$. For each of these nodes, there will be a certain number of different operations that can be appended to the end of the list, depending on the specific restrictions. There will always be a "placeholder" operation that can be chosen if all other operations fail to meet the restrictions. The penalty resulting from the number of "placeholder" operations will only be reflected in the loss (or reward) of the circuit. The nodes can always be expanded with different actions, leading to different children, until the maximum length of the quantum circuit has been reached, which gives us a leaf node of the search tree.
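The expansion rule described above can be sketched as follows. The pool, the layer limit and the CNOT restriction are hypothetical values chosen only to show how a restriction removes actions while the placeholder remains available.

```python
PLACEHOLDER = "Placeholder"
# Hypothetical pool and restrictions, chosen only for illustration.
POOL = ["Rot", "CNOT", PLACEHOLDER]
MAX_LAYERS = 4
MAX_CNOTS = 1

def allowed_actions(state):
    """Operations that may be appended to `state` (a list of operation names)."""
    if len(state) >= MAX_LAYERS:
        return []                      # leaf node: the circuit is complete
    actions = []
    for op in POOL:
        if op == "CNOT" and state.count("CNOT") >= MAX_CNOTS:
            continue                   # restriction violated, skip this action
        actions.append(op)
    if PLACEHOLDER not in actions:
        actions.append(PLACEHOLDER)    # the placeholder is always available
    return actions

def children(state):
    """Child states reached by appending one allowed operation."""
    return [state + [op] for op in allowed_actions(state)]

print(children([]))                   # [['Rot'], ['CNOT'], ['Placeholder']]
print(allowed_actions(["CNOT"]))      # ['Rot', 'Placeholder']
print(allowed_actions(["Rot"] * 4))   # []
```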
The process of choosing operations at each layer can be viewed as both a local and a global multi-armed bandit (MAB). A multi-armed bandit, as its name indicates, is similar to a bandit, or slot machine (in a casino), but has multiple levers, or arms, that can be pulled. Equivalently, it can be viewed as someone who has multiple arms (maybe Squidward) and can pull the levers on different slot machines. In both cases, the rewards obtained from pulling different arms follow different (often unknown) distributions. The person pulling these arms needs to develop a strategy that maximises the rewards from the machine(s). If we consider the whole circuit search problem as an MAB (the global MAB, $MAB_g$), then the "arms" are different circuit configurations. Although the rewards of these circuits are relatively easy to obtain from the value of their cost functions after training of the circuits is finished (which still requires a fair amount of training time), the number of possible circuit configurations explodes as the size of the operation pool and the number of layers increase, which makes it impossible to perform an informed search for suitable solutions while training every circuit encountered during the search. Since our circuit is basically a combination of different choices of layer unitaries, we can decompose the whole problem into the choices of unitaries at each layer, which gives the local MABs, $MAB_i$, where $i$ denotes the MAB problem of choosing a suitable unitary at layer $i$. In the local MAB for a single layer, the "arms" are no longer circuit configurations but the (permitted) unitary operations from the operation pool $C$. Although the number of choices for the local MABs is considerably smaller than for the global MAB, the reward for each arm is not directly observable. In the next section, we will introduce the naïve assumption [27] to approximate the rewards of the local MABs from the global MAB, which will help us determine the rewards of the actions on each node (state) of the search tree for MCTS.
• Local MAB: The choice of unitary operations at each layer can be considered a local MAB. That is, different unitary operations can be treated as different "arms" of the bandit;
• Global MAB: We can also treat the composition of the entire quantum circuit as a global MAB. That is, different quantum circuits can be viewed as different "arms" of the global bandit.

Monte Carlo tree search (MCTS), nested MCTS and the naïve assumption
Monte Carlo tree search (MCTS) is a heuristic search algorithm for sequential decision processes. It has achieved great success in other areas, including defeating the 18-time world champion Lee Sedol in the game of Go [14,29]. Generally, there are four stages in a single iteration of MCTS (see Fig. 4) [30]:
• Selection (Fig. 4(a)): In the selection stage, the algorithm will, starting from the root of the tree, find a node at the end of an arc (a path from the root of the tree to a leaf node, marked by bold arrows and blue circles in Fig. 4). The nodes along the arc are selected according to some policy, often referred to as the "selection policy", until a node that is not fully expanded or a leaf node is reached. If the node is a leaf node, i.e. the operation for the last layer of the quantum circuit has been selected, we can jump directly to the simulation stage to get the reward of the corresponding arc. If the node is not a leaf node, i.e. the node is not fully expanded, then we progress to the next stage;
• Expansion (Fig. 4(b)): In the expansion stage, at the node selected in the previous stage, we choose a previously unvisited child by choosing a previously unperformed action. We can see from the upper right tree in Fig. 4 that a new node has been expanded at the end of the arc;
• Simulation (Fig. 4(c)): In the simulation stage, if the node obtained from the previous stages is not a leaf node, we continue down the tree until we have reached a leaf node, i.e. until the operation for the last layer has been chosen. Once we have the leaf node, we simulate the circuit and obtain the loss $\mathcal{L}$ (or reward $\mathcal{R}$). Usually, the loss $\mathcal{L}$ is required to update the parameters in the circuit;
• Backpropagation (Fig. 4(d)): In this stage, the reward information obtained from the simulation stage is back-propagated through the arc leading from the root of the tree to the leaf node, and the number of visits as well as the (average) reward for each node along the arc is updated.
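The four stages above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the greedy average-reward selection rule (standing in for the selection policy), the random rollout and the toy reward are all hypothetical.

```python
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # list of operation indices chosen so far
        self.parent = parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.total_reward = 0.0

def mcts_iteration(root, pool_size, max_layers, reward_fn, rng):
    node = root
    # Selection: follow visited children while the node is fully expanded.
    while len(node.state) < max_layers and len(node.children) == pool_size:
        node = max(node.children.values(),
                   key=lambda ch: ch.total_reward / ch.visits)
    # Expansion: add one previously unvisited child, unless we are at a leaf.
    if len(node.state) < max_layers:
        untried = [a for a in range(pool_size) if a not in node.children]
        action = rng.choice(untried)
        child = Node(node.state + [action], parent=node)
        node.children[action] = child
        node = child
    # Simulation: complete the circuit with random choices and evaluate it.
    rollout = list(node.state)
    while len(rollout) < max_layers:
        rollout.append(rng.randrange(pool_size))
    reward = reward_fn(rollout)
    # Backpropagation: update visit counts and reward sums along the arc.
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
    return rollout, reward

def toy_reward(circuit):
    """Hypothetical reward: fraction of layers matching a fixed target circuit."""
    target = [0, 1, 2, 1]
    return sum(a == b for a, b in zip(circuit, target)) / len(target)

rng = random.Random(0)
root = Node([])
for _ in range(200):
    mcts_iteration(root, pool_size=3, max_layers=4, reward_fn=toy_reward, rng=rng)
print(root.visits)  # 200
```

In a circuit search, `reward_fn` would be replaced by a simulation of the circuit, and the greedy selection rule by a policy that balances exploration and exploitation.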
The nested MCTS algorithm [31] is based on the vanilla MCTS algorithm. However, before selecting the best child according to the selection policy, a nested MCTS is performed on the sub-trees with each child as the root node. The best child is then selected according to the selection policy with the updated reward information (see Fig. 5).
Figure 5: Nested Monte Carlo tree search. Left: The root node has three possible actions, which in this case are initially unselected. We perform MCTS on all three child nodes (generated by the three possible actions) to update their reward information. After one iteration of MCTS with each child as the root node of its own search tree, the rewards of the three actions leading to the three child nodes are 10, 10 and 20, respectively. In this case, the right child has the highest reward. Middle: After selecting the right child node, we perform the same MCTS on all three possible child nodes as before, which gives updated reward information. In this case the middle child node has the highest reward, meaning that at this level we expand the middle child node. Right: Similar operations as before. If we only perform nested MCTS at the root node level, then it is a level-1 nested MCTS.
We denote a quantum circuit with $p$ layers as $P = [k_1, \cdots, k_p]$, with each layer $k_i$ having a search space no greater than $|C| = c$ (where $c$ is the number of possible unitary operations, as defined earlier). Each choice for layer $k_i$ is then a local arm of the local MAB, $MAB_i$. The set of these choices is also denoted as $k_i$. The combination of all $p$ layers in $P$ forms a valid quantum circuit, which is called a global arm of the global MAB, $MAB_g$.
Since the global arm can be formed from the combination of the local arms, if we use the naïve assumption [27], the global reward $R_{global}$ for $MAB_g$ can be approximated by the sum of the rewards of the local MABs, and each local reward only depends on the choice made in its local MAB. This also means that, if the global reward is more easily accessed than the local rewards, then the local rewards can be approximated from the global reward. With the naïve assumption, we have a linear relationship between the global reward and the local rewards: $R_{global} = \sum_{i=1}^{p} R_i$, where $R_i$ is the reward for pulling an arm at the local $MAB_i$ and $R_{global}$ is the reward for the global arm. When searching for quantum circuits, we have no access to the reward distribution of individual unitary operations; however, we can apply the naïve assumption to approximate those rewards (the "local rewards") from the global reward. Also, if we use the naïve assumption, we do not need to optimise directly over the large space of global arms as in traditional MABs. Instead, we can apply MCTS on the local MABs to find the best combination of local arms.
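A minimal sketch of how local rewards might be estimated from observed global rewards under the naïve assumption. It assumes, purely for illustration, that each global reward is split evenly over the p local arms that were pulled; the class `LocalBandits` and its methods are hypothetical names, not part of the original algorithm description.

```python
class LocalBandits:
    """Running average reward for every (layer, operation) local arm."""

    def __init__(self):
        self.pulls = {}
        self.mean_reward = {}

    def update(self, circuit, global_reward):
        # Assumption: the global reward is split evenly over the p local arms.
        share = global_reward / len(circuit)
        for arm in enumerate(circuit):          # arm = (layer index, operation)
            n = self.pulls.get(arm, 0) + 1
            old = self.mean_reward.get(arm, 0.0)
            self.pulls[arm] = n
            self.mean_reward[arm] = old + (share - old) / n  # incremental mean

    def estimate_global(self, circuit):
        """Naive estimate: sum of the local averages along the circuit."""
        return sum(self.mean_reward.get(arm, 0.0) for arm in enumerate(circuit))

bandits = LocalBandits()
bandits.update([0, 1, 2, 1], global_reward=0.8)
bandits.update([0, 1, 0, 0], global_reward=0.4)
print(bandits.pulls[(0, 0)])                  # 2
print(bandits.estimate_global([0, 1, 2, 1]))
```

Only the local averages are stored, so the table grows with p × c rather than with the c^p global arms.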
In the original work on nested MCTS [31], a random policy was adopted for sampling. In this paper we instead use the well-known UCB policy [32]. Given a local $MAB_i$ with the set of all possible choices $k_i$, the UCB policy can be defined as $\mathrm{UCB}(k_i) = \arg\max_{arm_j \in k_i} \left[ \bar{R}(k_i, arm_j) + \alpha \sqrt{\frac{2 \ln n_i}{n_j}} \right]$, where $\bar{R}(k_i, arm_j)$ is the average reward for $arm_j$ (i.e. the reward for operation choice $U_j$ for layer $k_i$) in the local $MAB_i$, $n_i$ is the number of times that $MAB_i$ has been used and $n_j$ is the number of times that $arm_j$ has been pulled. The parameter $\alpha$ provides a balance between exploration ($\sqrt{2 \ln n_i / n_j}$) and exploitation ($\bar{R}(k_i, arm_j)$). The UCB policy modifies the reward on which the selection of an action is based.
For small $\alpha$, the actual reward from the bandit plays a more important role in the UCB-modified rewards, which leads to selecting actions with previously observed high rewards. When $\alpha$ is large enough, the second term, which is relatively large if $MAB_i$ has been visited many times but $arm_j$ of $MAB_i$ has only been pulled a small number of times, has more impact on the modified reward, leading to a selection that favours previously less visited actions.
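The effect of α can be illustrated with a small sketch of UCB selection over one local bandit. It assumes that n_i equals the total number of pulls over all arms of the bandit and that unvisited arms are tried first; the function name and the example numbers are hypothetical.

```python
import math

def ucb_select(mean_rewards, pulls, alpha):
    """Return the index j maximising mean_j + alpha * sqrt(2 ln(n_i) / n_j)."""
    n_i = sum(pulls)                       # assumption: n_i = total pulls so far
    best_arm, best_score = None, -math.inf
    for j, (mean, n_j) in enumerate(zip(mean_rewards, pulls)):
        if n_j == 0:
            return j                       # try every arm once before scoring
        score = mean + alpha * math.sqrt(2.0 * math.log(n_i) / n_j)
        if score > best_score:
            best_arm, best_score = j, score
    return best_arm

means = [0.9, 0.5, 0.4]
pulls = [50, 5, 1]
print(ucb_select(means, pulls, alpha=0.0))  # 0: pure exploitation
print(ucb_select(means, pulls, alpha=1.0))  # 2: the rarely pulled arm wins
```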

QAS with Nested Naïve MCTS
Generally, a single iteration of the search algorithm includes two steps for non-parameterised circuits, and two more parameter-related steps for parameterised quantum circuits. The set of parameters, which will be referred to as the parameters of the super circuit, or just parameters, in the following algorithms, follows the same parameter-sharing strategy as described in Section 2.1. That is, if the same unitary operation (say, $U_2$) appears in the same location (say, layer 5) across different quantum circuits, then the parameters are the same, even for different circuits. Also, with parameterised quantum circuits (PQC), it is common practice to "warm up" the parameters by randomly sampling a batch of quantum circuits, calculating the averaged gradient, and updating the parameters according to the averaged gradient, to get a better starting point for the parameters during the search process. During one iteration of the search algorithm, we:
1. Sample a batch of quantum circuits from the super circuit with Algorithm 1;
2. (For PQCs) Calculate the averaged gradients of the sampled batch, adding noise to the gradient to guide the optimiser to a "flatter" minimum if needed;
3. (For PQCs) Update the super circuit parameters according to the averaged gradients;
4. Find the best circuit with Algorithm 2.
We can also set up an early-stopping criterion for the search. That is, when the reward of the circuit obtained with Algorithm 2 meets a pre-set standard, we stop the search algorithm and return the circuit that meets this standard (and further fine-tune the circuit parameters, if there are any).
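The iteration steps and the early-stopping criterion can be sketched as a generic loop. The four callables are hypothetical stand-ins for the sampling, gradient, update and best-circuit steps (Algorithms 1 and 2 are not reproduced here), and the toy problem in the usage part is chosen only so that the loop can be run end to end.

```python
def run_search(sample_batch, average_gradient, apply_update, find_best,
               max_iterations, reward_threshold):
    """Generic search loop; the four callables are hypothetical stand-ins."""
    best_circuit, best_reward = None, float("-inf")
    for iteration in range(1, max_iterations + 1):
        batch = sample_batch()                 # step 1: sample circuits
        gradient = average_gradient(batch)     # step 2 (PQCs only)
        apply_update(gradient)                 # step 3 (PQCs only)
        circuit, reward = find_best()          # step 4: current best circuit
        if reward > best_reward:
            best_circuit, best_reward = circuit, reward
        if reward >= reward_threshold:         # early stopping
            return circuit, reward, iteration
    return best_circuit, best_reward, max_iterations

# Toy usage: one shared parameter x, reward -(x - 1)^2, plain gradient descent.
state = {"x": 0.0}
result = run_search(
    sample_batch=lambda: [[0, 1], [1, 0]],
    average_gradient=lambda batch: 2.0 * (state["x"] - 1.0),
    apply_update=lambda g: state.update(x=state["x"] - 0.25 * g),
    find_best=lambda: ([0, 1], -(state["x"] - 1.0) ** 2),
    max_iterations=50,
    reward_threshold=-1e-3,
)
print(result)  # ([0, 1], -0.0009765625, 5)
```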
With the naïve assumption, which means that the reward is evenly distributed over the local arms pulled for a global MAB, we can impose a prune ratio during the search. That is, given a node that has child nodes, if the average reward of a child node is smaller than a given ratio, or percentage, of the average reward of that node, then this child node is removed from the set of children, unless the number of children has already reached the required minimum.

Numerical Experiments and Results

Searching for the encoding circuit of [[4,2,2]] quantum error detection code
The [[4,2,2]] quantum error detection code is a simple quantum error detection code, which needs 4 physical qubits for 2 logical qubits and has code distance 2. It is the smallest stabilizer code that can detect X- and Z-errors [33]. One possible set of code words for the [[4,2,2]] error detection code is: $|00\rangle_L = \frac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle)$, $|01\rangle_L = \frac{1}{\sqrt{2}}(|0110\rangle + |1001\rangle)$, $|10\rangle_L = \frac{1}{\sqrt{2}}(|1010\rangle + |0101\rangle)$, $|11\rangle_L = \frac{1}{\sqrt{2}}(|1100\rangle + |0011\rangle)$. The corresponding encoding circuit is shown in Fig. 6.
Figure 6: Encoding circuit of the [[4,2,2]] code [33], which detects X- and Z-errors. In our setting, the number of layers equals the number of operations in the circuit. In this figure, the number of layers is 6.
Quantum error detection and correction are vital to large-scale fault-tolerant quantum computing. By searching for the encoding circuit of the [[4,2,2]] error detection code, we demonstrate that our algorithm has the potential to automatically find device-specific encoding circuits of quantum error detection and correction codes for future quantum processors.

Experiment Settings
When searching for the encoding circuit of the [[4,2,2]] quantum error detection code, we adopted an operation pool consisting only of non-parametric operations: the Hadamard gate on each of the four qubits and CNOT gates between any two qubits. The total size of the operation pool is $4 + \frac{4!}{2! \times 2!} \times 2 = 16$. With 6 layers in total, the overall size of the search space is $16^6 \approx 1.68 \times 10^7$.
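The pool size and search-space size quoted above can be checked by enumerating the operations explicitly. The tuple encoding of the gates is an assumption made for this sketch.

```python
from itertools import permutations

qubits = range(4)
hadamards = [("H", q) for q in qubits]
# Ordered pairs (control, target): 6 unordered pairs times 2 directions.
cnots = [("CNOT", control, target) for control, target in permutations(qubits, 2)]
pool = hadamards + cnots

layers = 6
print(len(pool))            # 16
print(len(pool) ** layers)  # 16777216
```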
The loss function for this task is based on the fidelity between the output state of the searched circuit and the output generated by the encoding circuit from Section 4.3 of [33] (also shown in Fig. 6), evaluated on input states taken from the set of Pauli operator eigenstates and the magic state $|T\rangle$. The input states are initialised on all four qubits. We denote the unitary on all four qubits shown in Fig. 6 as $U_{[[4,2,2]]}$, and the unitary from the searched circuit as $U^{Searched}_{[[4,2,2]]}$, which is a function of the structure $P^{Searched}_{[[4,2,2]]}$. The loss and reward functions are then expressed in terms of the fidelities between the outputs of these two unitaries on the input states. The circuit simulator used in this and the following numerical experiments is PennyLane [34].

Results
To verify whether the search algorithm always reaches the same solution, we ran the search algorithm twice. Both times the algorithm found an encoding circuit within a small number of iterations (Fig. 7), although the actual circuits differ from each other, as shown in Fig. 8. The search process that gave the circuit in Fig. 8a met the early-stopping criterion in four iterations, and the search process that gave the circuit in Fig. 8b met the early-stopping criterion in eight iterations, as shown in Fig. 7.
Figure 7: Search rewards for the encoding circuits of the [[4,2,2]] code. In both cases the algorithm was able to find an encoding circuit that generates the required code words in just a few iterations. 'Circuit a' refers to the search rewards for the circuit in Fig. 8a and 'Circuit b' refers to the search rewards for the circuit in Fig. 8b.

Solving linear equations
The variational quantum linear solver (VQLS), first proposed in [35], is designed to solve linear systems $Ax = b$ on near-term quantum devices. Instead of using quantum phase estimation like the HHL algorithm [36], which is infeasible on near-term devices due to the large circuit depth, VQLS adopts a variational circuit to prepare a state $|x\rangle$ such that $A|x\rangle \propto |b\rangle$. In this section, we task our algorithm with automatically searching for a variational circuit to prepare a state $|x\rangle$ that solves $Ax = b$ with $A$ in the form $A = \sum_l c_l A_l$, where the $A_l$ are unitaries, and $|b\rangle = H^{\otimes n}|0\rangle$. We also adopt the local cost function $C_L$ described in [35]: $C_L = 1 - \frac{\sum_{l,l'} c_l c_{l'}^{*} \langle 0|V^{\dagger} A_{l'}^{\dagger} U P U^{\dagger} A_l V|0\rangle}{\sum_{l,l'} c_l c_{l'}^{*} \langle 0|V^{\dagger} A_{l'}^{\dagger} A_l V|0\rangle}$, where $U = H^{\otimes n}$, $V$ is the (searched) variational circuit that produces the solution state $V|0\rangle = |x\rangle$, and $P = \frac{1}{2} + \frac{1}{2n}\sum_{j=0}^{n-1} Z_j$ [37].
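To see what the operator P in the local cost function measures, note that it is diagonal in the computational basis. The sketch below tabulates its diagonal for a given n, assuming the usual convention that Z_j has eigenvalue +1 on |0> and -1 on |1>; the function name is hypothetical.

```python
from itertools import product

def local_p_diagonal(n):
    """Diagonal of P = 1/2 + (1/2n) * sum_j Z_j in the computational basis."""
    diagonal = {}
    for bits in product((0, 1), repeat=n):
        z_sum = sum(1 - 2 * b for b in bits)   # Z_j gives +1 on 0 and -1 on 1
        diagonal[bits] = 0.5 + z_sum / (2 * n)
    return diagonal

diag = local_p_diagonal(4)
print(diag[(0, 0, 0, 0)])  # 1.0
print(diag[(1, 0, 0, 0)])  # 0.75
print(diag[(1, 1, 1, 1)])  # 0.0
```

A basis state with k ones therefore has eigenvalue 1 - k/n, so the expectation of P reaches 1 only on the all-zero state.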

Experiment Settings
The linear system to be solved in our demonstration is $A = \frac{1}{\zeta}\left(\sum_{j=1}^{n} X_j + J \sum_{j=1}^{n-1} Z_j Z_{j+1} + \eta I\right)$, with $J = 0.1$, $\zeta = 1$, $\eta = 0.2$. The loss function we adopted follows the local loss $C_L$ in Eqn. 19. However, since the starting point of the loss values often has a magnitude of $10^{-2} \sim 10^{-3}$, the loss is rescaled in the reward function, which also includes a penalty term $\lambda$ depending on the number of Placeholder gates in the circuit.
The operation pool consists of CNOT gates between neighbouring qubits as well as between the first and fourth qubits, the Placeholder and the single-qubit rotation gate Rot [28]: $\mathrm{Rot}(\phi, \theta, \omega) = \begin{pmatrix} e^{-i(\phi+\omega)/2}\cos(\theta/2) & -e^{i(\phi-\omega)/2}\sin(\theta/2) \\ e^{-i(\phi-\omega)/2}\sin(\theta/2) & e^{i(\phi+\omega)/2}\cos(\theta/2) \end{pmatrix}$. The size of the operation pool is $c = |C| = 16$ and the number of layers is $p = 10$, giving a search space of size $|S| = c^p = 16^{10} \approx 1.1 \times 10^{12}$. There is also an additional restriction on the maximum number of CNOT gates in the circuit, which is 8, the number of CNOT gates required to create two layers of circular entanglement.
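The Rot gate can be written out and checked numerically. The sketch below assumes the convention Rot(φ, θ, ω) = RZ(ω) RY(θ) RZ(φ), which reproduces the matrix entries quoted above, and verifies that the resulting 2×2 matrix is unitary; the helper names are hypothetical.

```python
import cmath
import math

def rot_matrix(phi, theta, omega):
    """2x2 matrix of Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [cmath.exp(-0.5j * (phi + omega)) * c, -cmath.exp(0.5j * (phi - omega)) * s],
        [cmath.exp(-0.5j * (phi - omega)) * s, cmath.exp(0.5j * (phi + omega)) * c],
    ]

def is_unitary(m, tol=1e-12):
    """Check M^dagger M = I for a 2x2 matrix given as nested lists."""
    for i in range(2):
        for j in range(2):
            entry = sum(m[k][i].conjugate() * m[k][j] for k in range(2))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

print(is_unitary(rot_matrix(0.3, 1.1, -0.7)))  # True
print(rot_matrix(0.0, 0.0, 0.0))               # the identity matrix
```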

Results
The search rewards as well as the fine-tune losses are shown in the accompanying figure. The four Hadamard gates at the beginning of the circuit put all qubits in an equal superposition and are not included when constructing the search tree, i.e. the composed circuits always start with four Hadamard gates placed on the four qubits. When drawing the circuit, the Placeholder gates, which are just identity gates, are removed from the searched $P$, although they were considered when constructing the search tree. The number of shots for measurement is $10^6$. We can see that the quantum results are very close to the classically obtained ones, showing that our algorithm can indeed be applied to finding variational ansätze for VQLS problems.

Search for quantum chemistry ansätze
Recently, there has been a lot of progress on finding the ground state energy of a molecule on a quantum computer with variational circuits, on both the theoretical [38,39,40] and experimental [1,41,42,43,44,45] fronts. Normally, when designing the ansatz for the ground energy problem, either a physically plausible or a hardware-efficient ansatz needs to be found. However, our algorithm provides an approach which can minimise the effort needed to carefully choose an ansatz and can automatically design the circuit according to the device gate set and topology.
Generally speaking, solving the ground energy problem with quantum computers is an application of the variational principle [46]:

$$E_0 \le \frac{\langle \tilde{0} | H | \tilde{0} \rangle}{\langle \tilde{0} | \tilde{0} \rangle}$$

where H is the system Hamiltonian and $|\tilde{0}\rangle$ is the "trial ket" [46], or ansätz, which tries to mimic the real wave function of the ground state with energy $E_0$, the smallest eigenvalue of H. Starting from $|0\rangle^{\otimes n}$ for an n-qubit system, the trial ket can be written as a function of a set of (real) parameters θ. Given an ansätz, the goal of optimisation is to find a set of parameters θ that minimises the right-hand side of Eqn 24. However, in our research, the form of the trial wave function is no longer fixed. We vary not only the parameters, but also the circuit structure that represents the ansätz.
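The variational principle can be illustrated numerically: for any trial vector, the energy expectation value is bounded from below by the smallest eigenvalue of the Hamiltonian. The sketch below uses a random Hermitian matrix as a hypothetical stand-in for a molecular Hamiltonian; the names `H`, `rayleigh` and `energies` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random 4x4 Hermitian matrix as a stand-in Hamiltonian (hypothetical).
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2

eigenvalues, eigenvectors = np.linalg.eigh(H)
E0 = eigenvalues[0]            # exact ground energy
ground = eigenvectors[:, 0]    # exact ground state

def rayleigh(H, psi):
    """Energy expectation <psi|H|psi> / <psi|psi> of an unnormalised trial ket."""
    return np.real(np.vdot(psi, H @ psi) / np.vdot(psi, psi))

# Energies of 1000 random trial kets: none can fall below E0.
energies = [
    rayleigh(H, rng.normal(size=4) + 1j * rng.normal(size=4))
    for _ in range(1000)
]
lowest_trial = min(energies)
```

The bound is saturated only when the trial ket coincides with the ground state, which is why both the parameters and the circuit structure are worth optimising.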

Experiment settings
Search an ansätz for finding the ground energy of H2
In this experiment, we adopted the 4-qubit Hamiltonian $H_{\text{hydrogen}}$ for the hydrogen molecule H2 generated by the Pennylane-QChem [34] package, with the two hydrogen atoms at coordinates (0, 0, −0.66140414) and (0, 0, 0.66140414), respectively, in atomic units. The goal of this experiment is to find an ansätz that can produce states similar to those of the four-qubit Givens rotations for single and double excitation. The unitary operator that performs single excitation on a subspace spanned by $\{|01\rangle, |10\rangle\}$ can be written as And the transformation of the double excitation on the subspace spanned by $\{|1100\rangle, |0011\rangle\}$ is: Following [47], we initialised the circuit with the 4-qubit vacuum state $|\psi_0\rangle = |0000\rangle$. We denote the unitary for the searched ansätz by $U_{\text{SearchedAnsatz}}$, which is a function of its structure $P_{\text{SearchedAnsatz}}$ and the corresponding parameters. Then the loss and reward functions can be written as: The operation pool consists of Placeholder gates, Rot gates and CNOT gates with a linear entanglement topology (nearest-neighbour interactions). The maximum number of layers is 30, with a maximum of 30/2 = 15 CNOT gates and no penalty term for the number of Placeholder gates. These settings of the operation pool and number of layers give an overall search space of size $14^{30} \approx 2.42 \times 10^{34}$. However, the imposed hard limits and gate limits drastically reduce the size of the search space.

Search an ansätz for finding the ground energy of LiH
The loss and reward functions for the LiH task are similar to those for H2: and the initial state is also the vacuum state $|\psi_0\rangle = |0\rangle^{\otimes 10}$. The Hamiltonian is obtained at bond length 2.969280527 Bohr, or 1.5712755873606 Angstrom, with 2 active electrons and 5 active orbitals. The size of the operation pool is c = |C| = 38, including Rot gates, Placeholder gates and CNOT gates operating on neighbouring qubits on a line topology. The maximum number of layers is 20, giving a search space of size $|S| = 38^{20} \approx 3.94 \times 10^{31}$.
The 'hard limit' on the number of CNOT gates in the circuit is 20/2 = 10.

Search an ansätz for finding the ground energy of H2O
The loss and reward functions of the water molecule are shown as follows: and the initial state is also the vacuum state $|\psi_0\rangle = |0\rangle^{\otimes 8}$. The Hamiltonian is obtained when the three atoms are positioned at the following coordinates (in Angstrom): H: (0., 0., 0.); O: (1.63234543, 0.86417176, 0.); H: (3.36087791, 0., 0.). The numbers of active electrons and active orbitals are both set to 4. The size of the operation pool is c = |C| = 30, including Rot gates, Placeholder gates and CNOT gates operating on neighbouring qubits on a line topology. The maximum number of layers is 50, giving a search space of size $|S| = 30^{50} \approx 7.18 \times 10^{73}$. The 'hard limit' on the number of CNOT gates in the circuit is 50/2 = 25.
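The quoted search-space sizes follow from $|S| = c^p$, the pool size raised to the number of layer positions. The sketch below recomputes them with exact integer arithmetic, using the bases and exponents exactly as they appear in the size formulas ($14^{30}$, $38^{20}$, $30^{50}$); the function names are illustrative, and the count ignores the reduction caused by the hard limits on CNOT gates.

```python
def search_space_size(pool_size, num_positions):
    """Number of distinct layer-by-layer gate assignments, |S| = c^p."""
    return pool_size ** num_positions

def scientific(n):
    """Return (mantissa, exponent) of a positive integer in base 10."""
    exponent = len(str(n)) - 1
    mantissa = n / 10 ** exponent
    return mantissa, exponent

# (pool size c, exponent p) as quoted in the size formulas of the text.
settings = {"H2": (14, 30), "LiH": (38, 20), "H2O": (30, 50)}

sizes = {
    name: scientific(search_space_size(c, p))
    for name, (c, p) in settings.items()
}

for name, (mantissa, exponent) in sizes.items():
    print(f"{name}: |S| ~ {mantissa:.2f} x 10^{exponent}")
```

These figures are upper bounds: every circuit that exceeds the CNOT limit is pruned from the tree, so the space actually explored is smaller.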

H2 Results
The search reward when finding the suitable circuit structure is shown in Fig 12a. In the decomposition of the $R_{ZZ}$ Ising coupling gate, $\mathrm{CNOT}_{1,2}$ is the CNOT gate controlled by the first qubit and targeting the second qubit, and $RZ_2(\theta)$ is a Z-rotation gate on the second qubit. However, other parts of the circuit are not familiar, which indicates that the search algorithm can go beyond human intuition. The total number of gates in the circuit is 22, including 13 local CNOT gates.
Figure 12 panels: (a) Search rewards for the H2 ansätz. For most of the 50 iterations, the reward for the best circuit sampled from the search tree stays over 0.7. (b) Fine-tune loss for the searched H2 circuit. At the last iteration of optimisation, the energy is around -1.1359 Ha. The classically computed full configuration interaction result with PySCF [48,49] is around -1.132 Ha and is marked by the red horizontal dashed line. The difference between the energy achieved by the searched circuit and PySCF is close to chemical accuracy.
Figure 16 panel: (a) Search rewards for the H2O ansätz. For most of the 50 iterations, the reward for the best circuit sampled from the search tree stays over 74.9.
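The Ising coupling structure mentioned for the H2 circuit can be checked numerically. The sketch below assumes the standard decomposition $R_{ZZ}(\theta) = \mathrm{CNOT}_{1,2}\, RZ_2(\theta)\, \mathrm{CNOT}_{1,2}$ with $R_{ZZ}(\theta) = e^{-i\theta Z \otimes Z/2}$, and takes the first qubit as the most significant one in the tensor product; the function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0])

# CNOT controlled by the first qubit, targeting the second.
CNOT_12 = np.array([[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=complex)

def rz(theta):
    """Single-qubit Z rotation exp(-i theta Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def rzz_from_decomposition(theta):
    """CNOT_12, then RZ on the second qubit, then CNOT_12 again."""
    return CNOT_12 @ np.kron(I2, rz(theta)) @ CNOT_12

def rzz_exact(theta):
    """exp(-i theta Z(x)Z / 2), which is diagonal in the computational basis."""
    zz_eigenvalues = np.diag(np.kron(Z, Z))
    return np.diag(np.exp(-1j * theta / 2 * zz_eigenvalues))

theta = 0.73  # arbitrary test angle
decomposition_ok = np.allclose(rzz_from_decomposition(theta), rzz_exact(theta))
```

Spotting this three-gate pattern in a searched circuit is therefore equivalent to spotting a single two-qubit ZZ interaction.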

Solving the MaxCut problem
As a classic and well-known optimisation problem, the MaxCut problem plays an important role in network science, circuit design, as well as physics [50]. The objective of the MaxCut problem is to find a partition z of the vertices in a graph G = (V, E) that maximises the number of edges connecting the vertices in two disjoint sets A and B:

$$C(z) = \sum_{a} C_a(z)$$

where $C_a(z) = 1$ if the a-th edge connects one vertex in set A and one vertex in set B, and $C_a(z) = 0$ otherwise. To perform the optimisation on a quantum computer, we need to transform the cost function into the Ising formulation: where $Z_i$ is the Pauli Z operator on the i-th qubit and $w_{ij}$ is the weight of edge (i, j) ∈ E for the weighted MaxCut problem. For unweighted problems, $w_{ij} = 1$. In this formulation, vertices are represented by qubits in the computational basis. By finding the wave function that minimises the cost Hamiltonian $H_C$, we can find the solution that maximises C(z).
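The correspondence between the cut value and the Ising energy can be made concrete with a small brute-force sketch. It assumes the common convention $H_C = \frac{1}{2}\sum_{(i,j)\in E} w_{ij}(Z_i Z_j - 1)$, which is consistent with the losses near -7 reported for a cut of size 7 but is not confirmed by the text; the four-node graph and all function names are hypothetical.

```python
from itertools import product

def cut_value(z, edges, weights=None):
    """Total weight of edges whose endpoints lie in different sets."""
    weights = weights or {e: 1.0 for e in edges}
    return sum(weights[(i, j)] for (i, j) in edges if z[i] != z[j])

def ising_energy(z, edges, weights=None):
    """Energy of bitstring z under the assumed H_C = 1/2 sum w_ij (Z_i Z_j - 1)."""
    weights = weights or {e: 1.0 for e in edges}
    spins = [1 - 2 * bit for bit in z]  # Z eigenvalue: +1 for bit 0, -1 for bit 1
    return sum(0.5 * weights[(i, j)] * (spins[i] * spins[j] - 1)
               for (i, j) in edges)

def brute_force_maxcut(num_vertices, edges, weights=None):
    """Enumerate all partitions; feasible only for small graphs."""
    best = max(product((0, 1), repeat=num_vertices),
               key=lambda z: cut_value(z, edges, weights))
    return best, cut_value(best, edges, weights)

# Hypothetical graph: a 4-cycle with one chord (not the graph of Fig. 18).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
best_partition, best_cut = brute_force_maxcut(4, edges)

# Under the assumed convention, the energy is exactly minus the cut value.
energies_match = all(
    abs(ising_energy(z, edges) + cut_value(z, edges)) < 1e-12
    for z in product((0, 1), repeat=4)
)
```

A bitstring and its complement describe the same partition, which is why optimal solutions always come in pairs.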
Conventionally, the major components of the QAOA (quantum approximate optimisation algorithm) ansätz are the cost Hamiltonian encoded by the cost unitary and the mixing Hamiltonians encoded by the mixing unitaries [51]. Although this ansätz can find all the solutions in an equal superposition form, it is not always effective when the number of layers is small. Also, when the number of qubits (vertices) grows, the required number of layers and the number of shots needed during measurement to extract all of the solutions also grow. Since we already have a Hamiltonian as our cost function in Sec. 3.3, we follow an approach similar to that used for quantum chemistry to find one of the solutions when the number of vertices is large.
The reward function is simply the negative of the loss function. We ran the search algorithm twice with the same basic settings, including the operation pool and the maximum number of layers. Since there is a random sampling process during the warm-up stage, the final solutions found by the algorithm are expected to be different.
The operation pool consists of CNOT gates between every two qubits, the Placeholder and the single-qubit rotation gate Rot [28].

Unweighted MaxCut
The two runs of the search algorithm gave us two circuits (Fig. 20), leading to two of the six optimal solutions (Fig. 21). The search rewards and fine-tune losses for both circuits are shown in Figure 22. During the search stage, since we already know that the maximum reward is 7 and that the reward can only take integer values, we set the early-stopping limit to 6.5 to reduce the time spent on searching: the algorithm stops searching and proceeds to fine-tuning the parameters in the circuit once the reward exceeds 6.5. In a real-world application, we could let the search algorithm run through all of the pre-set iterations and record the best circuit structure as well as the corresponding reward at each iteration. After the search stage finishes, we can then choose the best circuit (or the top-k circuits) in the search history to fine-tune, increasing our chance of finding the optimal solution.
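The two strategies described above, early stopping at a reward threshold and selecting the top-k circuits from the full search history, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `run_search`, `top_k`, the fake reward sequence and the circuit labels are all hypothetical.

```python
import heapq

def run_search(sample_fn, num_iterations, early_stop=None):
    """Record (reward, iteration, circuit) per iteration; optionally stop early."""
    history = []
    for iteration in range(num_iterations):
        circuit, reward = sample_fn(iteration)
        history.append((reward, iteration, circuit))
        if early_stop is not None and reward > early_stop:
            break  # threshold exceeded: hand over to fine-tuning
    return history

def top_k(history, k):
    """Return the k records with the highest reward."""
    return heapq.nlargest(k, history, key=lambda record: record[0])

# Hypothetical stand-in for one search iteration: fixed integer rewards.
fake_rewards = [3, 5, 4, 7, 6, 7]

def sample_fn(iteration):
    return f"circuit_{iteration}", fake_rewards[iteration]

stopped_early = run_search(sample_fn, len(fake_rewards), early_stop=6.5)
full_history = run_search(sample_fn, len(fake_rewards))
candidates = top_k(full_history, k=2)
```

Early stopping saves search time, whereas keeping the full history yields several candidate structures to fine-tune, at the cost of running every iteration.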
Figure 22 panels:
(a) The change of rewards w.r.t. search iteration during the search for the ansätz (in Fig. 20a) that gives the solution 0110010 (Fig. 21a). To reduce the amount of time for searching, we stopped the algorithm after the search reward exceeded 6.5.
(b) The change of loss w.r.t. optimisation iteration during the fine-tune for the ansätz (in Fig. 20a) that gives the solution 0110010 (Fig. 21a). The final loss is very close to -7, indicating that the circuit we found can produce an optimal solution.
(c) The change of rewards w.r.t. search iteration during the search for the ansätz (in Fig. 20b) that gives the solution 0111010 (Fig. 21b). To reduce the amount of time for searching, we stopped the algorithm after the search reward exceeded 6.5.
(d) The change of loss w.r.t. optimisation iteration during the fine-tune for the ansätz (in Fig. 20b) that gives the solution 0111010 (Fig. 21b). The final loss is very close to -7, indicating that the circuit we found can produce an optimal solution.
Axis label: Loss (Energy). Legend: Objective at optimal solution.

Discussion
In this paper, we first formulated the circuit search problem as a search on a tree structure. A sampled circuit is represented as an arc (a path from the root to a leaf) on the tree. We also introduced the combinatorial multi-armed bandit model and the naïve assumption to model the selection of unitary operators for each layer in the circuit, approximating the rewards of different unitaries with the reward of a fully constructed circuit. The search itself is carried out with the Monte Carlo tree search (MCTS) algorithm. We demonstrated the effectiveness of our algorithmic framework with various examples, including finding the encoding circuit of the [[4,2,2]] quantum error detection code, developing the ansätz for variationally solving systems of linear equations, searching for circuits that solve the ground state energy problem of different molecules, and finding circuits for optimisation problems on a graph. To the best of our knowledge, we are the first to propose such a versatile framework for the automated discovery of quantum circuits with MCTS and combinatorial multi-armed bandits. The results show that our framework can be applied to many different areas, especially those with problems that can be formulated as finding the ground state energy of a certain Hamiltonian.
From the experiments and results shown in the previous sections, we can see that, by formulating quantum ansätz search as a tree-based search problem, one can easily impose various kinds of restrictions ('hard limits') on the circuit structure, which prunes the search tree and the search space. Also, by introducing Placeholders, one can explore smaller circuit sizes. Current deep reinforcement learning algorithms struggle when the state space is large but the number of reward states is small. Other research work in quantum ansätz search, including the differentiable quantum ansätz algorithm proposed in [16] and other QAS algorithms based on meta-learning [23] or reinforcement learning [20], only investigates small-scale problems, such as the 3- or 4-qubit quantum Fourier transform in [16], and the 3-qubit classification task and 4-qubit H2 ground state energy problem in [24]. A larger example can be seen in [19], which is a 6-qubit transverse field Ising model. In our research, we looked not only into 4-qubit systems like the H2 molecule, but also into larger systems like the LiH and H2O molecules as well as MaxCut on 7-node graphs. Our circuit depth is also often larger than in the previously mentioned research. Moreover, our operation pool consists of single-qubit gates on each qubit and CNOT gates either on neighbouring qubits or between every two qubits in the system, resulting in a much larger operation pool than in other research. With these two factors combined, our search space is generally larger than in other QAS research.
However, there are still several hyper-parameters that need to be tuned before the search algorithm can produce satisfying results, which leaves room for improving the automation level of the algorithm. In the future, we would like to investigate the performance of our algorithm under noise, as well as improve its scalability by parallelising the tree search when using a quantum simulator. We would also like to introduce more flexible value and/or policy functions into the algorithm.
Overall, our research has shown that MCTS enhanced with combinatorial multi-armed bandits is an efficient approach to searching for quantum circuits for a variety of problems, even when the search space is large. It is therefore an important step towards the widespread application of variational quantum algorithms to many problems.

Figure 1: An overview of the algorithmic framework proposed in this paper. The operation pool (c) is obtained by tailoring the basic operations (a) with respect to the device topology (b). After that, we formulate the combinations of different choices of operations at different layer positions in the circuit (d) as a search tree (e). In (f), we evaluate our circuit on a quantum processor or quantum simulator to get the value of the loss or reward function, update the parameters on a classical computer according to that value, and then use MCTS to search for the current best circuit. We then send the updated circuit structure together with the updated parameters to the quantum processor/simulator to obtain a new set of loss/reward values. The process depicted in (f) repeats until a circuit that meets the stopping criteria is found. Then, as shown in (g), we follow the usual process to optimise the parameters in the searched variational quantum circuit by classical-quantum hybrid computing.

Figure 2: An example of the circuit corresponding to the series of unitaries applied to $|\phi_{\text{init}}\rangle$ in Eqn. 3.

Figure 4: Four stages of Monte Carlo tree search. From left to right, top to bottom: Selection: go down from the root node to a leaf node that is not fully expanded; Expansion: expand the selected node by taking an action; Simulation: simulate the game, which in our case is the quantum circuit, to obtain reward information R; Backpropagation: propagate the reward information back along the path (arc) taken.

Figure 7: Rewards when searching for encoding circuits of the [[4,2,2]] code. We can see that in both cases the algorithm was able to find an encoding circuit that generated the required code words in just a few iterations. 'Circuit a' refers to the search rewards for the circuit in Fig. 8a and 'Circuit b' refers to the search rewards for the circuit in Fig. 8b.

As shown in Fig 9, the search algorithm can quickly produce a circuit with high reward (exceeding the threshold), and the loss after optimising the parameters reaches close to 0. Although facing a large search space, our algorithm can still find a circuit (shown in Fig 10) that minimises the loss function (Fig 9b) and leads us to results close to the classical solution. A comparison of the results obtained by directly solving the linear equation Ax = b and the results obtained by sampling the state $|x\rangle$ produced by the searched circuit is shown in Fig 11.
(a) Search rewards for VQLS. The change of rewards with respect to the iterations is shown. We can see that the reward quickly reached the early-stopping threshold at iteration 10. In the VQLS case, the reward is scaled since the initial reward with randomly sampled circuit structure and parameters is already at the magnitude of $10^{-2}$.
(b) Fine-tune loss for the VQLS circuit. After the search stopped at iteration 10, as shown in Fig 9a, the structure of the circuit is left unchanged and its parameters are optimised to achieve smaller losses. The final loss of the optimised parameters is very close to 0.

Figure 9: The search rewards and fine-tune loss for the VQLS experiment.

Figure 11: Comparison between the classical probabilities of the normalised solution vector $x/\lVert x \rVert$ for Ax = b, obtained by solving the matrix equation classically, i.e. $x = A^{-1}b$ (left), and the probabilities obtained by sampling the state $|x\rangle$ produced by the trained circuit in Fig 10 (right). The number of shots for measurement is $10^6$. The quantum results are very close to the classically obtained ones, showing that our algorithm can indeed be applied to finding a variational ansätz for VQLS problems.
The search reward for the H2 ansätz is shown in Fig 12a, and the training process for the circuit produced by the search algorithm is shown in Fig 12b. The ansätz is presented in Fig 13. We can see from Fig 13 that the unitaries are not randomly placed on the four wires; instead, familiar structures appear, such as the decomposition of the SWAP gate and Ising coupling gates. An example of an Ising coupling gate (which often appears in quantum optimisation problems) is the $R_{ZZ}$ gate:

$$R_{ZZ}(\theta) = \mathrm{CNOT}_{1,2}\, RZ_2(\theta)\, \mathrm{CNOT}_{1,2}$$

Figure 12: The search rewards and fine-tune loss for the H2 experiment.

Figure 14: The search rewards and fine-tune loss for the LiH experiment.
(b) Fine-tune loss for the searched H2O circuit. At the last iteration of optimisation, the energy is around -75.4220 Ha, close to chemical accuracy compared to the classically computed full configuration interaction energy with PySCF [48,49], which is around -75.4917 Ha. Legend: $E_{\text{FCI}} = -75.49$ Ha.

Figure 16: The search rewards and fine-tune loss for the H2O circuit.

Figure 17: Circuit for H2O produced by the search algorithm.

Figure 18: Problem graph for the unweighted MaxCut experiment

The size of the operation pool is c = |C| = 28 and the number of layers is p = 15, leading to a search space of size $|S| = 28^{15} \approx 5 \times 10^{21}$. The 'hard' restriction on the maximum number of CNOT gates in a circuit, which is 7, helps reduce the size of the search space.

Weighted MaxCut
For weighted MaxCut, we have a five-node graph, which is shown in Fig 19. The solution for this problem, 00011 (11100), is simpler than that of the unweighted version. The reward and loss functions follow the same principle as in the unweighted problem. The size of the operation pool is c = |C| = 20 and the number of layers is p = 10, leading to a search space of size $|S| = 20^{10} \approx 1.02 \times 10^{13}$. The 'hard' restriction on the maximum number of CNOTs in the circuit is 5.

Figure 20: Two different circuits finding two different solutions of the MaxCut problem shown in Fig. 18. Fig. 20a gives the solution 0110010 (see Fig. 21a) and Fig. 20b gives the solution 0111010 (see Fig. 21b).

Figure 21: Two different optimal solutions found by the circuits in Fig. 20a and Fig. 20b, respectively.
Figure 22(b): The change of loss w.r.t. optimisation iteration during the fine-tune for the ansätz (in Fig. 20a) that gives the solution 0110010 (Fig. 21a).

Figure 22: Search rewards and fine-tune losses for the circuits in Fig 20.

Figure 23: The search rewards and fine-tune losses for the five-node MaxCut problem.
The solution 00011, sampled from the circuit shown on the left.

Figure 24: The searched circuit and sampled solution for the five-node MaxCut problem.

Algorithm 1 SampleArc
Input: sample policy Policy, parameters of the super circuit param, number of rounds in sampling N

ExploitArc
Input: exploit policy Policy, parameters of the super circuit param, number of rounds in exploitation N
Output: list representation P of quantum circuit
curr ← GetRoot(Tr)    (starting from the root node of the tree Tr)
while curr is not leaf node do

ExecuteSingleRound
Input: current node n, selection policy Policy, parameters of the super circuit param
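Since the algorithm listing above is incomplete, the following sketch illustrates the four MCTS stages (selection, expansion, simulation, backpropagation) on a tree whose arcs are lists of gate indices, one per layer. It is a plain single-level UCB tree search under stated assumptions, not the nested MCTS with combinatorial multi-armed bandits used in the paper; `Node`, `ucb_score`, `sample_arc` and the toy reward are all hypothetical.

```python
import math
import random

class Node:
    """Tree node; the arc from the root to this node fixes the first `depth` layers."""
    def __init__(self, depth, action=None, parent=None):
        self.depth, self.action, self.parent = depth, action, parent
        self.children = {}          # action index -> child Node
        self.visits = 0
        self.total_reward = 0.0

def ucb_score(child, parent_visits, c=1.4):
    """Standard UCB1 score (an assumed selection policy)."""
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def sample_arc(root, pool_size, num_layers, reward_fn, rng):
    node = root
    # Selection: descend while the node is fully expanded and not a leaf.
    while node.depth < num_layers and len(node.children) == pool_size:
        node = max(node.children.values(),
                   key=lambda ch: ucb_score(ch, node.visits))
    # Expansion: add one untried action, unless we are already at a leaf.
    if node.depth < num_layers:
        untried = [a for a in range(pool_size) if a not in node.children]
        action = rng.choice(untried)
        node.children[action] = Node(node.depth + 1, action, node)
        node = node.children[action]
    # Simulation: read the partial arc, then fill remaining layers at random.
    arc, cursor = [], node
    while cursor.parent is not None:
        arc.append(cursor.action)
        cursor = cursor.parent
    arc.reverse()
    arc += [rng.randrange(pool_size) for _ in range(num_layers - len(arc))]
    reward = reward_fn(arc)
    # Backpropagation: update statistics along the path back to the root.
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
    return arc, reward

# Toy reward: fraction of layers matching a hidden target structure.
target = [2, 0, 1, 2]
def toy_reward(arc):
    return sum(a == t for a, t in zip(arc, target)) / len(target)

rng = random.Random(0)
root = Node(depth=0)
results = [sample_arc(root, 3, 4, toy_reward, rng) for _ in range(300)]
best_arc, best_reward = max(results, key=lambda item: item[1])
```

In a real setting the toy reward would be replaced by an evaluation of the circuit on a simulator or quantum processor, and hard limits such as the CNOT cap could be enforced by excluding actions during expansion.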