Bayesian Optimization for QAOA

The quantum approximate optimization algorithm (QAOA) adopts a hybrid quantum-classical approach to find approximate solutions to variational optimization problems. It relies on a classical subroutine to optimize the parameters of a quantum circuit. In this article, we present a Bayesian optimization procedure to fulfill this optimization task, and we investigate its performance in comparison with other global optimizers. We show that our approach allows for a significant reduction in the number of calls to the quantum circuit, which is typically the most expensive part of the QAOA. We demonstrate that our method works well also in the regime of slow circuit repetition rates, and that a few measurements of the quantum ansatz already suffice to achieve a good estimate of the energy. In addition, we study the performance of our method in the presence of noise at the gate level, and we find that for low circuit depths it is robust against noise. Our results suggest that the method proposed here is a promising framework to leverage the hybrid nature of QAOA on noisy intermediate-scale quantum devices.

I. INTRODUCTION
Hybrid quantum-classical variational algorithms [1][2][3] play a central role in current research on noisy intermediate-scale quantum (NISQ) devices [4]. In a hybrid variational setting, a classical computer is entrusted with the non-trivial task of optimizing the parameters of a quantum state. These algorithms implement a heuristic protocol to approximately solve variational problems, including combinatorial optimization tasks, which are ubiquitous, are of great practical importance [5], and are indeed one of the main drivers of industry interest in quantum computing applications. Unfortunately, the problems belonging to this class are hard to solve with classical methods [6]. In this paper we focus on the Max-Cut and Max Independent Set (MIS) problems defined on specific graph instances.
Among the hybrid variational algorithms, the Quantum Approximate Optimization Algorithm (QAOA) [7] is extensively studied [8] as a promising algorithm to investigate quantum speedups on NISQ devices. It has been implemented on several experimental platforms, such as Rydberg atom arrays [9], superconducting processors [10] and trapped-ion simulators [11], as well as simulated on classical devices [12].
Similarly to other hybrid variational algorithms, QAOA consists of a sequence of parametrized quantum gates applied to a wavefunction, on which the expectation value of some operator, typically the Hamiltonian, is reconstructed from measurements. The task of the classical subroutine is to optimize the gate parameters so as to minimize this expectation value. Every variational quantum algorithm therefore requires estimating properties of the wavefunction, a task that in general requires resources growing exponentially with the system size [13]. Moreover, the interplay between the classical and quantum parts of the algorithm entails running the quantum circuit a large number of times, which is expensive in terms of resources. Finally, there is the notorious problem of barren plateaus (BPs): large portions of the optimization landscape in which the gradient becomes exponentially small with the number of qubits and layers [14]. This phenomenon has been shown to also be caused by the presence of noise [15] or by the use of a cost function depending on global observables [16].
To overcome these issues, an efficient classical optimization routine is crucial. Different techniques have been proposed for optimizing variational quantum circuits, e.g., Nelder-Mead [17], machine learning [18], gradient descent [19], iterative schemes [8,20,21], Gaussian processes [22,23] and Bayesian methods [15,24-27]. In particular, to tackle the problem of BPs it might seem logical to avoid the calculation of the gradient. However, in [28] it was shown that gradient-free optimizers such as COBYLA, Powell and Nelder-Mead suffer from BPs too. Here, we focus on a Bayesian optimization framework, which is suitable for gradient-free global optimization of black-box functions [29,30]. We explore its behaviour in comparison with other global optimizers and show that its convergence rate to a minimum is faster. We demonstrate that the Bayesian approach is efficient in terms of the number of circuit runs and is robust against noise sources.
The paper is structured as follows. In Section II we introduce the QAOA. In Section III we give a detailed presentation of the Bayesian algorithm. In Section IV we present the results of applying this method to the QAOA, compare it to other global optimization methods, and evaluate its performance with a limited number of circuit runs and against simulated quantum noise. Finally, in Section V we draw our conclusions.

II. QAOA FOR COMBINATORIAL PROBLEMS
The QAOA is a variational quantum algorithm that performs hybrid quantum-classical optimization [7]. Given a cost function C(z) with z = (z_1, ..., z_i, ..., z_N) and z_i ∈ {0, 1}, QAOA aims at finding the bitstring z⋆ that minimizes the cost. In order to do so, the cost function is translated into a quantum operator H_C that is diagonal in the computational basis |z⟩ = |z_1 ... z_i ... z_N⟩ of N qubits, i.e. H_C |z⟩ = C(z) |z⟩.
The QAOA circuit consists in preparing an initial state of N qubits, usually |+⟩ = Σ_z |z⟩/√(2^N), and then alternately applying two unitary operators, one generated by H_C and the other generated by the mixing Hamiltonian H_M = Σ_i X_i, where X_i is the flip (NOT) operator on the i-th qubit, for a number of layers p called the depth of the circuit. In this way the QAOA circuit prepares the state |θ⟩ = Π_{l=1}^{p} e^{-iβ_l H_M} e^{-iγ_l H_C} |+⟩, where θ = (γ, β) are 2p parameters. By measuring the state |θ⟩ in the computational basis, an estimate of the energy E(θ) = ⟨θ|H_C|θ⟩ is obtained. This energy is fed to a classical routine which looks for the set of angles θ⋆ = (γ⋆, β⋆) that minimizes E(θ). Several strategies have been proposed for finding the optimal parameters θ⋆. In this work we rely on Bayesian optimization.
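The circuit structure just described can be sketched with a minimal statevector simulation. The 4-node ring graph and the Max-Cut-style diagonal cost below are illustrative assumptions, not the instances studied in this paper; the point is only that H_C acts as a phase on the computational basis, while H_M mixes individual qubits.

```python
import numpy as np

# Hypothetical toy instance: 4-node ring graph, Max-Cut-style diagonal cost.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Diagonal of H_C in the computational basis: +1 per uncut edge, -1 per cut edge.
bits = (np.arange(2**n)[:, None] >> np.arange(n)) & 1
cost = np.array([sum(1 if z[i] == z[j] else -1 for i, j in edges) for z in bits], float)

def qaoa_state(gammas, betas):
    """Apply p layers of e^{-i beta H_M} e^{-i gamma H_C} to |+>^N."""
    psi = np.full(2**n, 2**(-n / 2), dtype=complex)   # |+>^N
    idx = np.arange(2**n)
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * cost) * psi            # H_C is diagonal: a phase
        for q in range(n):                            # e^{-i b X_q} mixes qubit q
            psi = np.cos(b) * psi - 1j * np.sin(b) * psi[idx ^ (1 << q)]
    return psi

def energy(gammas, betas):
    """Exact E(theta) = <theta|H_C|theta> from the statevector."""
    psi = qaoa_state(gammas, betas)
    return float(np.sum(np.abs(psi)**2 * cost).real)
```

At zero angles the state remains |+⟩ and the energy is the uniform average of the cost, which vanishes for this toy graph.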

III. BAYESIAN OPTIMIZATION
Bayesian optimization is a global optimization strategy which finds, within relatively few evaluations, the minimum of a noisy, black-box objective function f(θ) that is in general expensive to evaluate [31]. The algorithm can be summarized as follows: (i) it treats the objective function f as a random function by choosing a prior (also called a surrogate model) for f. Several choices for the surrogate model are possible [29]; in this work we adopt the so-called Gaussian process [32]. (ii) The prior is then updated through the likelihood function by gathering observations of f, thereby forming the posterior distribution. (iii) The posterior distribution is finally used to construct an auxiliary function, called the acquisition function, that is in general cheap to evaluate. The point where the acquisition function is maximized gives the next point where f will be evaluated [30]. See Appendix A 1 for an overview of Bayesian terminology.
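Steps (i)-(iii) can be sketched end-to-end on a 1D toy objective. For brevity a squared-exponential surrogate and a grid search stand in for the Matérn kernel and the differential-evolution acquisition maximizer used in this paper; the objective and all constants are invented for illustration.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(0)
def f(x):                                    # noisy black-box objective (toy)
    return np.sin(3.0 * x) + 0.1 * rng.normal()

X = rng.uniform(0.0, 2.0, size=4)            # (i) warm-up evaluations
y = np.array([f(x) for x in X])

def kern(a, b, ell=0.3):                     # squared-exponential surrogate
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

grid = np.linspace(0.0, 2.0, 200)
for _ in range(15):
    K = kern(X, X) + 1e-6 * np.eye(len(X))
    kx = kern(grid, X)
    mu = kx @ np.linalg.solve(K, y)                              # (ii) posterior mean
    var = 1.0 - np.einsum("ij,ji->i", kx, np.linalg.solve(K, kx.T))
    sd = np.sqrt(np.maximum(var, 1e-12))
    z = (y.min() - mu) / sd                                      # (iii) expected improvement
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))
    phi = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    ei = (y.min() - mu) * Phi + sd * phi
    x_next = grid[np.argmax(ei)]                                 # next query point
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))
```

Each pass through the loop conditions the surrogate on all data gathered so far and queries the objective where the acquisition function peaks.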
Since Bayesian optimization requires no prior knowledge of f, it is a well-suited technique for optimizing the parameters of a variational circuit running on NISQ devices.
In the following sections we describe the Gaussian process, the optimization routine and the acquisition function in detail.

A. Gaussian process
Since the function f(θ) (θ ∈ A ⊂ R^d) to be optimized is unknown, we may think of it as belonging to a random process, i.e. an infinite collection of random variables defined for every point θ ∈ A. A random process is called Gaussian if the joint distribution of any finite collection of those random variables is a multivariate normal distribution, defined by a mean function μ(θ) and a covariance (or kernel) function k(θ, θ′) [32]. The mean function is related to the expected value of the function f, while the kernel estimates the deviations from the mean, i.e. μ(θ) = E[f(θ)] and k(θ, θ′) = E[(f(θ) − μ(θ))(f(θ′) − μ(θ′))], where E denotes the expectation with respect to the infinite collection of functions belonging to the random process. Conceptually, the mean encodes the knowledge of the function f to reconstruct, while k represents the uncertainty we have on such a reconstruction.
Since we assume f to be smooth, we choose for k the Matérn kernel, a stationary kernel [32] that depends only on the distance r = ∥θ − θ′∥_2 between the points θ and θ′, where ∥·∥_2 is the 2-norm, and on two hyperparameters σ² and ℓ characterizing the Gaussian process. The hyperparameter σ² sets the variance of the random variables, whereas ℓ is a characteristic length scale which regulates the decay of the correlation between points: in the limit ℓ → ∞ all points are equally correlated, while for ℓ → 0 all points are uncorrelated.
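A concrete instance of such a kernel can be written in a few lines. The Matérn family is parametrized by a smoothness exponent ν that the text does not specify; ν = 5/2, a common default, is assumed here purely for illustration.

```python
import numpy as np

def matern52(T1, T2, sigma2=1.0, ell=1.0):
    """Matern nu=5/2 kernel between row-wise points T1 (n x d) and T2 (m x d)."""
    r = np.linalg.norm(T1[:, None, :] - T2[None, :, :], axis=-1)  # pairwise 2-norms
    s = np.sqrt(5.0) * r / ell
    # sigma^2 (1 + sqrt(5) r / ell + 5 r^2 / (3 ell^2)) exp(-sqrt(5) r / ell)
    return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

The two limits described above are easy to check: for very large ℓ all entries approach σ², while for very small ℓ the off-diagonal correlations vanish.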

B. Bayesian optimization algorithm
The main steps of the algorithm for Bayesian optimization can be summarized in the pseudocode in Algorithm 1 (see also Appendix A 2 for details).
The optimization starts with a warm-up phase in which a number N_W of evaluations of the objective function f is performed. These evaluations take place at randomly chosen points θ_i and are collected in the training set D = {(θ_i, y_i)}_{i=1}^{N_W} of the optimization. Given the set D, we define the design matrix Θ = (θ_1, ..., θ_{N_W}) with the points, and the vector y ∈ R^{N_W} with the observations, y = (y_1, ..., y_{N_W}).
We form the covariance matrix K ∈ R^{N_W × N_W} by evaluating the covariance function in Eq. (3) for each pair of points θ_i, θ_j ∈ Θ, i.e. K_{i,j} = k(θ_i, θ_j), where K_{i,j} denotes the (i, j) element of the matrix K. The hyperparameters entering the kernel function (Eq. (4)) are optimized at this step, as explained in Sec. III D.
The training set is used at each step of the optimization to incorporate the acquired knowledge into the Gaussian process. This happens in two steps. First, the Gaussian process prior is conditioned on the observations in D [32]. Conditioning is equivalent to a Bayesian step in which we multiply the prior by the likelihood, thus obtaining a posterior distribution (see Appendix A 1). Thanks to the properties of Gaussian distributions, the posterior is still described by a Gaussian process, but one characterized by a posterior mean μ′ and covariance k′ given by μ′(θ) = κ^T K^{-1} y and k′(θ) = k(θ, θ) − κ^T K^{-1} κ. Here, θ is a generic point in A and κ is a column vector formed by evaluating the covariance function k between the generic point θ and all the points in Θ, i.e. its j-th element is κ_j = k(θ, θ_j). Eq. (6) shows that the new mean is a linear combination of the observations y.
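The conditioning step can be made concrete as follows. A squared-exponential kernel and a toy 1D data set are assumed for brevity; the structure of the posterior formulas is the same for any kernel.

```python
import numpy as np

def k(a, b, ell=0.5):
    """Squared-exponential kernel (illustrative stand-in for the Matern)."""
    return np.exp(-0.5 * np.sum((a - b)**2) / ell**2)

Theta = np.array([[0.0], [1.0], [2.0]])            # design matrix (toy data)
y = np.array([0.0, 1.0, 0.0])                      # observations
K = np.array([[k(a, b) for b in Theta] for a in Theta]) + 1e-9 * np.eye(3)

def posterior(theta):
    """Posterior mean and variance at a generic query point theta."""
    kappa = np.array([k(theta, tj) for tj in Theta])   # kappa_j = k(theta, theta_j)
    mu = kappa @ np.linalg.solve(K, y)                 # mu': linear in the observations
    var = k(theta, theta) - kappa @ np.linalg.solve(K, kappa)
    return mu, var
```

At a training point the posterior mean reproduces the observation and the variance collapses; far from all data, the mean reverts to the prior and the variance returns to the prior variance.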

C. Acquisition Function
The next step in the Bayesian optimization is computing the acquisition function, whose maximum gives the next point at which to evaluate the objective function. A common choice of acquisition function is the Expected Improvement (EI): this function suggests which points, on average, improve the most on the current minimum f_m [30]. This choice corresponds to defining the acquisition function EI(θ) = E[u(θ)] as the average value of the utility function u(θ) = max[0, f_m − f(θ)], such that the lower f(θ) is with respect to the current minimum, the larger the utility u(θ).
Since f(θ) is a Gaussian process, we can obtain an analytical expression for EI(θ), namely EI(θ) = (f_m − μ′) Φ(z) + k′ φ(z), where μ′ and k′ are obtained for the point θ by using Eqs. (6) and (7); Φ(·) and φ(·) are respectively the cumulative distribution function and the probability density function of the standard normal distribution, and the quantity z is defined as z = (f_m − μ′)/k′. The two terms in Eq. (8) capture the trade-off between exploitation and exploration: the first term, being proportional to the difference between the current minimum and the mean value of the posterior, drives the optimization towards points with lower μ′, whereas the second promotes points with larger k′, i.e. with higher uncertainty. The point θ that maximizes the acquisition function is then added to the training set D and the algorithm's loop is repeated (see Algorithm 1). This point is found using the differential evolution algorithm [33], a population-based metaheuristic search algorithm (see Appendix C for details).
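The closed form of Eq. (8) is straightforward to implement; following the text, k′ plays the role of the posterior standard deviation in z = (f_m − μ′)/k′.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_improvement(mu_p, k_p, f_m):
    """EI = (f_m - mu') Phi(z) + k' phi(z), with z = (f_m - mu') / k'."""
    if k_p <= 0.0:                        # degenerate posterior: no uncertainty left
        return max(0.0, f_m - mu_p)
    z = (f_m - mu_p) / k_p
    return (f_m - mu_p) * norm_cdf(z) + k_p * norm_pdf(z)
```

The first term rewards points with a low posterior mean (exploitation), the second rewards points with a large posterior spread (exploration), mirroring the discussion above.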

D. Hyperparameters
We are now left with the task of picking the best hyperparameters σ, ℓ for the Matérn kernel. This is typically done by considering the marginal likelihood p(y|Θ) = ∫ p(y|f, Θ) p(f|Θ) df [32] (see also Appendix A 1), where the prior p(f|Θ) and the likelihood p(y|f, Θ) are Gaussian and the marginalization is done over the function values f. Given the Gaussian nature of the likelihood and the prior, a closed form of the log marginal likelihood can be obtained (for the standard derivation of this formula see for example [32]): log p(y|Θ) = −(1/2) y^T K^{-1} y − (1/2) log |K| − (N/2) log 2π, where N is the number of observations in the design matrix Θ.
In Eq. (10), the first term specifies how well the process fits the data, while the second acts as a regularization factor on the elements of the kernel matrix. When fitting the Gaussian process to a new set of points, the best hyperparameters (σ², ℓ) are found by maximizing the log marginal likelihood in Eq. (10). For the optimization of log p(y|Θ), we use the quasi-Newton method L-BFGS [34] with multiple restarting points, which proved to be efficient on the flat landscape of the likelihood (see Appendix A 2 for details).
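The fitting procedure can be sketched as follows. A squared-exponential kernel replaces the Matérn for brevity, the data are an invented toy set, and optimization runs over (log σ², log ℓ) with L-BFGS-B restarts, as in the text.

```python
import numpy as np
from scipy.optimize import minimize

X = np.linspace(0.0, 3.0, 8)[:, None]      # toy design matrix
y = np.sin(X).ravel()                      # toy observations

def log_marginal(params):
    """Eq. (10): -0.5 y^T K^-1 y - 0.5 log|K| - (N/2) log 2 pi."""
    s2, ell = np.exp(params)
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    K = s2 * np.exp(-0.5 * d2 / ell**2) + 1e-6 * np.eye(len(X))   # jitter for stability
    L = np.linalg.cholesky(K)                                      # stable K^-1, log|K|
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                           # = -0.5 log|K|
            - 0.5 * len(X) * np.log(2.0 * np.pi))

# Multiple restarting points, keeping the best L-BFGS result.
starts = [np.zeros(2), np.array([0.5, -1.0]), np.array([-0.5, 0.5])]
best = min((minimize(lambda p: -log_marginal(p), x0,
                     method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 2)
            for x0 in starts), key=lambda r: r.fun)
sigma2_opt, ell_opt = np.exp(best.x)
```

Working in log-space keeps both hyperparameters positive, and the Cholesky factor gives the quadratic form and the log-determinant without ever forming K⁻¹ explicitly.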

IV. RESULTS
In this section we apply Bayesian optimization to the QAOA parameters. We consider two well-known combinatorial problems defined on graphs, Max-Cut and the Max Independent Set.
Max-Cut - Given a graph G = (V, E), where V is the set of nodes and E the set of edges, the Max-Cut problem consists in finding a partition P = {V_0, V_1} of the graph's vertices V such that the number of edges between V_0 and its complement V_1 is as large as possible. It is known to be an NP-hard problem [35]. We can define the assignment of the nodes to the sets V_0 and V_1 by labelling with "0" the nodes v ∈ V_0 and with "1" the nodes v ∈ V_1. In these terms, Max-Cut consists in finding the largest number of edges connecting the bits labelled "0" to the bits labelled "1". On a quantum computer, the labels 0 and 1 are replaced by the computational basis states |0⟩ and |1⟩, and the cost Hamiltonian can be written as H_C = Σ_{(i,j)∈E} Z_i Z_j. This cost function attains its minimum eigenvalue on the states |z_i z_j⟩ for which z_i ≠ z_j, which represent separate partitions. MIS - The Max Independent Set problem consists in finding the largest set of mutually non-adjacent graph nodes, and the corresponding cost Hamiltonian is H_C = −Σ_{i∈V} n_i + ω Σ_{(i,j)∈E} n_i n_j, with n_i = (1 − Z_i)/2, and with ω a parameter that balances the effect of the first term (which maximizes the number of qubits in |1⟩) and the second (which penalizes neighbouring qubits both being in |1⟩). We set ω = 2 for the rest of the work.
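Since both Hamiltonians are diagonal, the two cost functions can be evaluated directly on bitstrings. The 4-node path graph below is an invented toy instance; ω = 2 follows the text.

```python
# Hypothetical toy instance: a 4-node path graph.
edges = [(0, 1), (1, 2), (2, 3)]

def maxcut_cost(z):
    # +1 for every uncut edge, -1 for every cut edge (the Z_i Z_j eigenvalues):
    # lower is better, and the minimum is attained by a maximum cut.
    return sum(1 if z[i] == z[j] else -1 for i, j in edges)

def mis_cost(z, omega=2):
    # -(size of the selected set) plus a penalty omega for every edge whose
    # endpoints are both selected (n_i = 1 means qubit i is in |1>).
    return -sum(z) + omega * sum(z[i] * z[j] for i, j in edges)
```

On the path graph, the alternating bitstring cuts every edge and is a Max-Cut optimum, while selecting the two endpoints of the path with one middle node gives a maximum independent set.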
During the optimization process we monitor the approximation ratio R = E(θ)/E_GS [7], where E_GS is the ground-state energy. QAOA at low vs. large depth - We start by looking at the QAOA at depth p = 1 for the 6-node graph. It corresponds to a shallow circuit that depends only on two parameters θ_1 = (γ_1, β_1). We consider the MIS problem, and we plot the landscapes of both the energy E(θ_1) and the fidelity F(θ_1) (Fig. 2(a) and (b), respectively) for values of γ_1, β_1 ∈ [0, π], due to the symmetry of the problem. We see that the landscape of the energy, which is the function to minimize, is rather flat, with two global maxima and two global minima, the latter corresponding to the best solutions possible at p = 1.
Interestingly, we find that the QAOA state |θ_E⟩ = Σ_z α_{z,E} |z⟩, corresponding to the parameters which minimize the energy, is not the state |θ_F⟩ = Σ_z α_{z,F} |z⟩ with the largest fidelity. To see how they differ, we plot the squared amplitudes |α_{z,E}|² and |α_{z,F}|² of both states as histograms in Fig. 2(c). The fidelity of |θ_F⟩ with respect to the solution |z⋆⟩ is, as expected, much larger than that of |θ_E⟩, yet the latter has a lower energy because it has many nonzero amplitudes on low-lying excited states. This reveals a problem with optimizing the QAOA parameters by looking only at the energy E(θ): there is a large concentration of excited states with energy comparable to that of the ground state, as shown in the histograms of panels (a) and (b) of Fig. 1, and it is difficult to increase the amplitude of the solution when many other states can contribute with low values of the energy.
The discrepancy between lowest energy and highest fidelity is guaranteed to disappear in the limit p → ∞. For this reason, we apply Bayesian optimization at increasing depth and show in Fig. 3 that the approximation ratio and fidelity both tend to 1 for p ∼ 12. We already see good performance at p = 4, where R ∼ 0.7 and F ∼ 0.5, meaning roughly a 50% chance of measuring the solution on the state obtained with QAOA.
Comparing resources - Increasing the depth of a variational circuit increases the number of parameters that must be optimized. In turn, this is expected to increase the number of calls to the quantum circuit needed to reach a good approximate solution, which is a problem in the current NISQ era, since running a quantum circuit can be costly due to both state preparation routines and recalibrations of the device.
Bayesian optimization mitigates this problem and achieves a good approximate solution with relatively few calls to the circuit compared to other global optimization methods. To show this, we ran differential evolution, basin-hopping and dual annealing (see Appendices C and D for details) on the 10-node graph of Fig. 1(b) for the Max-Cut problem at different depths. Fig. 4(a) shows the average number of calls to the circuit that the other optimizers need in order to reach the same approximation ratio as Bayesian optimization. To gain more insight, we plot in panels (b) and (c) of Fig. 4 the average approximation ratio and fidelity during the run of the algorithm for each method at p = 7. We see that Bayesian optimization stops at a lower R than basin-hopping and dual annealing, but it reaches a value of R = 95% within ∼ 500 runs of the circuit, compared to the other two methods which take, respectively, 1400 and 10800.
For the noiseless circuit, it is very clear (see Fig. 4) that the Bayesian approach mitigates this problem better than any of the other tested techniques, from several points of view (number of calls, number of steps, number of measurements). Let us stress that this approach gives a fidelity that is always higher than that of the other methods at a fixed number of steps (and fixed p, see Fig. 4(c)).
Slow measurements - The energy E(θ) is obtained by measuring the QAOA state after running the circuit; we refer to these two operations combined as a "shot". By measuring in the Z basis, each shot yields a bitstring, for which we calculate the classical energy associated with the combinatorial problem. The precision in the reconstruction of E(θ) depends on the number of shots N_S. Since this is a multinomial sampling problem, we expect the variance of the reconstructed energy to scale as N_S^{-1}. In many NISQ scenarios it is necessary to balance N_S against the desired standard deviation. For this reason, we compare the average approximation ratio obtained with the exact (simulated) energy with that obtained from the energy reconstructed with a limited number of shots. We show this comparison in Fig. 5 for N_S equal to 1024, 128, 64, 16 and 4. Looking at the approximation ratio R (Fig. 5(a)), we see that taking N_S = 128 shots reduces R by 5% with respect to N_S = 1024, and going to N_S = 64 reduces it by a further 5%. This behaviour then stops and even reverses: we see an average increase going from N_S = 16 to N_S = 4. This is understandable, since the energy reconstructed with as few as 4 shots is not indicative of the real energy of the state. Specifically, from a final QAOA state we might sample the solution bitstring 2, 3 or even 4 times out of 4, and the energy expectations of these samplings would be very different. This behaviour is confirmed by the fidelity in Fig. 5(a), which follows the same trend as the approximation ratio.
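The shot model just described can be sketched as multinomial sampling. The outcome distribution and costs below are invented stand-ins for the measured |⟨z|θ⟩|² and classical energies; the point is the 1/N_S shrinkage of the estimator variance.

```python
import numpy as np

rng = np.random.default_rng(1)
probs = np.array([0.5, 0.3, 0.15, 0.05])   # assumed outcome probabilities |<z|theta>|^2
costs = np.array([-2.0, -1.0, 0.0, 1.0])   # classical cost of each outcome (toy)

def estimate_energy(n_shots):
    """Average the classical costs of n_shots sampled bitstrings."""
    counts = rng.multinomial(n_shots, probs)
    return counts @ costs / n_shots

exact = probs @ costs                      # infinite-shot limit of the estimator
var_many = np.var([estimate_energy(1024) for _ in range(200)])
var_few = np.var([estimate_energy(16) for _ in range(200)])
```

Repeating the estimate many times shows the spread at 16 shots exceeding the spread at 1024 shots by roughly the expected factor of 64.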
To better understand how the algorithm adapts to the sampling noise, we look at the kernel noise parameter σ²_N, which is learned by the Gaussian process during the fitting at each step of the optimization (see Appendix B for details on the noise hyperparameter). The plot in Fig. 5(b) shows that, after an initial phase, the kernel noise settles at a specific value after around 400 steps. Moreover, the lower the number of shots, the larger the learned noise parameter. In fact, by fitting the average kernel noise found at the end of the training (Fig. 5(c)), we obtain that σ²_N follows a power law ∼ N_S^{-1.1}. This trend is comparable to the expected N_S^{-1} scaling of the variance of the reconstructed energy, showing that the Gaussian process adapts to sampling noise.
Quantum noise - Another relevant issue in state-of-the-art NISQ devices is the presence of quantum noise sources, which can interfere with the quantum circuit. Every device has different sources of noise depending on the underlying technology. In order to simulate noise without specifying the device technology, we add random noise to every QAOA parameter. In this way, the circuit of Eq. (1) for the Max-Cut problem is modified so that the angles β_l^i and γ_l^{(i,j)} act differently on every qubit/edge of the graph at every layer, being perturbed by Gaussian random noise with mean zero and standard deviation σ_QN. In Fig. 6 we plot R and F as functions of the depth, at different values of σ_QN, on the 10-node graph for the Max-Cut problem.
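This noise model amounts to replacing each layer angle by an independently perturbed copy per qubit and per edge; the function and argument names below are ours, introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def noisy_angles(gammas, betas, n_qubits, n_edges, sigma_qn=0.01):
    """Per-edge gamma_l^(i,j) and per-qubit beta_l^i angles: the ideal layer
    angle plus independent zero-mean Gaussian noise of std sigma_qn."""
    g = [g_l + sigma_qn * rng.normal(size=n_edges) for g_l in gammas]
    b = [b_l + sigma_qn * rng.normal(size=n_qubits) for b_l in betas]
    return g, b
```

Setting sigma_qn = 0 recovers the ideal circuit, while increasing it spreads each layer's angles around their nominal values, mimicking accumulated gate-level errors.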
By increasing σ_QN and p, we expect a worse approximation ratio R, because the error accumulates along the circuit as the number of parameters grows. Indeed, as the figure shows, from p ≥ 5 the obtained R decreases with respect to the noiseless case, dropping by as much as 20% for p = 9 with σ_QN = 0.1. For σ_QN = 0.001 and 0.01, both R and F grow or remain stable up to p = 7, which indicates that, for shallow circuits, Bayesian optimization is robust against noise.
We stress that the case σ_QN = 0.1 was considered in order to show the effect of a large machine noise. Realistically, Gaussian white noise of this magnitude affecting each of the parameters (which lie in [0, π], see Appendix A 2) would completely destroy the state preparation; in fact, at p = 9 the fidelity F is the same as at p = 1 (Fig. 6(a)).
We compare the results of our algorithm with the second-best performing algorithm of Fig. 4, basin-hopping. It is clear that, when subjected to noise, this algorithm performs poorly. Considering the approximation ratio, basin-hopping shows barely any improvement with the depth of the circuit, with the results starting to plummet from p = 3. Most importantly, the fidelity peaks at p = 3 with F ≃ 0.15 (Fig. 6(b)) and then remains below this value. The poor performance in terms of fidelity confirms that basin-hopping, while being an effective algorithm in the noise-free scenario, is not apt for optimization in the presence of noise. This is probably due to the fact that basin-hopping is a global optimizer that exploits a local gradient-based optimization routine (see Appendix D). Calculating gradients in the presence of noise is non-optimal, since even a small variation of the parameters can greatly affect the evaluation of the function, returning a gradient that does not represent the local structure of the landscape.

V. CONCLUSIONS
In this work we have presented the Bayesian optimization algorithm as a subroutine to optimize the variational parameters of the QAOA. We have applied it to find the solutions of two combinatorial optimization problems, Max-Cut and MIS, on two graph instances.
After introducing the QAOA and the details of the Bayesian optimization algorithm, we have focused on its capability to adapt to the data and to predict new candidate optimal points by exploiting both the knowledge accumulated from previous observations and the uncertainty about the optimization landscape.
We have analyzed some details of the QAOA at low circuit depth with the purpose of presenting some of the issues related to the optimization of a variational quantum algorithm. These include the flatness of the energy landscape and the limited information that the energy of the QAOA state carries compared to its overlap with the ground state. We have then compared Bayesian optimization with other global optimization methods and shown that, on the instances studied here, they require substantially more calls to the quantum circuit than the Bayesian optimizer, up to roughly a factor of twenty. This is a first sign that this method responds more efficiently to the requirements of the quantum part of the QAOA.
Secondly, we have analyzed the effects of a finite number of measurements on the reconstruction of the energy landscape. We have shown that the results are only slightly altered, with a 5% decrease in the approximation ratio when using 1024 measurements compared to the optimization with the exact energy, and that a lower number of measurements results in a further decrease of the approximation ratio. We have also shown that the Gaussian process learns a noise hyperparameter proportional to the variance expected from the reconstruction of the energy. This can be seen as a further example of the adaptation of the Bayesian algorithm to the data.
Finally, we have simulated a noisy algorithm and shown that for shallow circuits, with depth p ∈ {1, 3, 5, 7}, approximation ratio and fidelity still improve for reasonable values of the noise. For deeper circuits, from p ≥ 9, the intensity of the noise sensibly affects the final approximation ratio, as expected. We have also compared our method to basin-hopping, which uses a local gradient-based optimizer, and shown that the latter performs very poorly under noise, suggesting that a gradient-free optimizer is indeed the better choice.
These findings show that Bayesian optimization is a robust method which can account for both quantum and sampling noise. For this reason, it represents a valid tool for solving optimization problems via hybrid algorithms on NISQ devices.

Algorithm 1:
Pseudo-code for Bayesian optimization. Set the prior on f as a Gaussian process; evaluate f at N_W different points θ_i; define the initial training set D; then, at each step, fit the Gaussian process to D, maximize the acquisition function to select the next evaluation point, evaluate f there, and add the new observation to D.

FIG. 1. Energy distributions of graphs. (a) MIS: distribution of the energies of the possible bitstrings for the 6-node graph (shown in the inset; nodes in red correspond to one solution). The red bar on the left highlights the two solution bitstrings of the MIS problem on this graph. (b) Max-Cut: distribution of the energies of the possible bitstrings for the 10-node graph (shown in the inset; nodes in red correspond to one solution). The red bar highlights the two solution bitstrings of the Max-Cut problem on this graph.

FIG. 2. QAOA at p = 1. (a) Landscape of the energy E(θ_1) obtained on the 6-node graph solving the MIS problem. The red cross indicates the angles corresponding to the final state |θ_F⟩ with the largest fidelity. (b) Landscape of the fidelity F(θ_1). (c) Squared amplitudes of the two states |θ_E⟩, |θ_F⟩. The solution bitstrings are highlighted in red. Values of the qubits are given in the order shown in the inset of Fig. 1(a).

FIG. 3. Results with increasing depth. Average approximation ratio (plotted as 1 − R) and fidelity F for circuit depths from 1 to 12, averaged over 50 runs. Shaded areas correspond to one standard deviation. Results were obtained on the 6-node graph of Fig. 1(a).

FIG. 4. Comparison among optimizers. (a) Average number of calls N_C to the quantum circuit that each optimizer needs in order to obtain the same approximation ratio as Bayesian optimization. (b)-(c) Average approximation ratio (plotted as 1 − R) and fidelity during the optimization with the different methods at p = 7, over 30 runs. Shaded areas correspond to one standard deviation. Results are obtained on the 10-node graph of Fig. 1(b).

FIG. 5. Slow measurements. (a) Average approximation ratio (plotted as 1 − R) and fidelity F as functions of the number of shots N_S. The 1/N_S = 0 point indicates the exact evaluation of the energy E(θ). Shaded areas correspond to one standard deviation. (b) Kernel noise σ²_N learned by fitting the data at each step of the optimization, for different numbers of shots N_S. (c) Average kernel noise learned by the Gaussian process as a function of N_S (blue circles). The plot also shows a linear fit (∼ 1/N_S^{1.1}, orange line) of the logarithm of the data.

FIG. 6. Approximation ratio R for different values of the quantum noise σ_QN. The noise is simulated by adding random Gaussian noise with mean zero and standard deviation σ_QN to the variational parameters (γ, β). The left (right) plot shows the effect of the noise on the final approximation ratio R (fidelity F) as a function of the QAOA depth p, for different σ_QN. Shaded areas correspond to one standard deviation. Results are obtained on the 10-node graph of Fig. 1(b).

FIG. 8. Parameters of Bayesian optimization at p = 2, 7. Plots of the parameters changing during Bayesian optimization for two runs with N_BAYES = 600 steps. (a) Approximation ratio R. (b) Fidelity F. (c) Kernel constant σ². (d) Kernel correlation length ℓ. (e) Standard deviation of the expected improvement and (f) average distance of the points of differential evolution at the last generation N_T.