Variational Learning for Quantum Artificial Neural Networks

In the past few years, quantum computing and machine learning fostered rapid developments in their respective areas of application, introducing new perspectives on how information processing systems can be realized and programmed. The rapidly growing field of quantum machine learning aims at bringing together these two ongoing revolutions. Here, we first review a series of recent works describing the implementation of artificial neurons and feedforward neural networks on quantum processors. We then present an original realization of efficient individual quantum nodes based on variational unsampling protocols. We investigate different learning strategies involving global and local layerwise cost functions, and we also assess their performance in the presence of statistical measurement noise. While keeping full compatibility with the overall memory-efficient feedforward architecture, our constructions effectively reduce the quantum circuit depth required to determine the activation probability of single neurons upon input of the relevant data-encoding quantum states. This suggests a viable approach toward the use of quantum neural networks for pattern classification on near-term quantum hardware.


I. INTRODUCTION
In classical machine learning, artificial neurons and neural networks were originally proposed, more than half a century ago, as trainable algorithms for classification and pattern recognition [1], [2]. A few milestone results obtained in subsequent years, such as the backpropagation algorithm [3] and the universal approximation theorem [4], [5], established the potential of deep feedforward neural networks as a computational model, which nowadays constitutes the cornerstone of many artificial intelligence protocols [6], [7].
In recent years, several attempts have been made to link these powerful but computationally intensive applications to the rapidly growing field of quantum computing; see also [8] for a useful review. The latter holds the promise of achieving relevant advantages with respect to classical machines already in the near term, at least on selected tasks including, e.g., chemistry calculations [9], [10], classification, and optimization problems [11]. Among the most relevant results obtained in quantum machine learning, it is worth mentioning the use of trainable parameterized digital and continuous-variable quantum circuits as a model for quantum neural networks [12]–[21], the realization of quantum support vector machines [22] working in quantum-enhanced feature spaces [23], [24], and the introduction of quantum versions of artificial neuron models [25]–[32]. However, very few clear statements have been made so far concerning the concrete and quantitative achievement of quantum advantage in machine learning applications, and many challenges still need to be addressed [8], [33], [34].
In this article, we review a recently proposed quantum algorithm implementing the activity of binary-valued artificial neurons for classification purposes. Although formally exact, this algorithm in general requires quite a large circuit depth for the analysis of the input classical data. To mitigate this effect, we introduce a variational learning procedure, based on quantum unsampling techniques, aimed at significantly reducing the quantum resources required for its realization. By combining memory-efficient encoding schemes and low-depth quantum circuits for the manipulation and analysis of quantum states, the proposed methods, currently at an early stage of investigation, suggest a practical route toward problem-specific instances of quantum computational advantage in machine learning applications.

II. MODEL OF QUANTUM ARTIFICIAL NEURONS
The simplest formalization of an artificial neuron can be given following the classical model proposed by McCulloch and Pitts [1]. In this scheme, a single node receives a set of binary inputs $(i_0, \ldots, i_{m-1}) \in \{-1, 1\}^m$, which can either be signals from other neurons in the network or external data. The computational operation carried out by the artificial neuron consists of first weighting each input by a synapse coefficient $w_j \in \{-1, 1\}$ and then providing a binary output $O \in \{-1, 1\}$, denoting either an active or a rest state of the node, determined by an integrate-and-fire response

$$
O = \begin{cases} 1 & \text{if } \sum_j w_j i_j \geq \theta \\ -1 & \text{otherwise} \end{cases} \tag{1}
$$

where $\theta$ represents some predefined threshold.
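As a quick illustration of Eq. (1), the following minimal Python sketch evaluates a classical binary McCulloch-Pitts node. The function name `mcculloch_pitts` and the example threshold are illustrative choices, not part of the original model description.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary McCulloch-Pitts neuron of Eq. (1): returns +1 (active) or -1 (rest)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    for x in list(inputs) + list(weights):
        if x not in (-1, 1):
            raise ValueError("entries must be -1 or +1")
    # Integrate: weighted sum of the binary inputs.
    activation = sum(w * i for i, w in zip(inputs, weights))
    # Fire: compare with the predefined threshold.
    return 1 if activation >= threshold else -1

print(mcculloch_pitts([1, -1, 1, 1], [1, -1, -1, 1], threshold=2))  # 1
```

Here the weighted sum equals 2, which reaches the threshold, so the node is active.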
A quantum procedure closely mimicking the functionality of a binary-valued McCulloch-Pitts artificial neuron can be designed by exploiting, on the one hand, the superposition of computational basis states in quantum registers and, on the other hand, the natural nonlinear activation behavior provided by quantum measurements. In this section, we briefly outline a device-independent algorithmic procedure [28] designed to implement such a computational model on a gate-based quantum processor. More explicitly, we show how classical input and weight vectors of size $m$ can be encoded on quantum hardware by using only $N = \log_2 m$ qubits [28], [35], [36]. For loading and manipulation of data, we describe a protocol based on the generation of quantum hypergraph states [37]. This exact approach to artificial neuron operations will be used in the main body of this article as a benchmark to assess the performance of approximate variational techniques designed to achieve more favorable scaling properties in the number of logical operations with respect to classical counterparts.
Let $\mathbf{i}$ and $\mathbf{w}$ be binary input and weight vectors of the form

$$
\mathbf{i} = (i_0, i_1, \ldots, i_{m-1}), \qquad \mathbf{w} = (w_0, w_1, \ldots, w_{m-1}) \tag{2}
$$

with $i_j, w_j \in \{-1, 1\}$ and $m = 2^N$. A simple and qubit-efficient way of encoding such collections of classical data can be given by making use of the relative quantum phases (i.e., factors $\pm 1$ in our binary case) in equally weighted superpositions of computational basis states. We then define the states

$$
|\psi_i\rangle = \frac{1}{\sqrt{m}} \sum_{j=0}^{m-1} i_j |j\rangle, \qquad |\psi_w\rangle = \frac{1}{\sqrt{m}} \sum_{j=0}^{m-1} w_j |j\rangle \tag{3}
$$

where, as usual, we label computational basis states with integers $j \in \{0, \ldots, m-1\}$ corresponding to the decimal representation of the respective binary string. The set of all possible states that can be expressed in the form above is known as the class of hypergraph states [37].
According to (1), the quantum algorithm must first perform the inner product $\mathbf{i} \cdot \mathbf{w}$. It is not difficult to see that, under the encoding scheme of (3), the inner product between inputs and weights is contained in the overlap [28]

$$
\langle \psi_w | \psi_i \rangle = \frac{\mathbf{w} \cdot \mathbf{i}}{m} \tag{4}
$$

We can explicitly compute such an overlap on a quantum register through a sequence of $\mathbf{i}$- and $\mathbf{w}$-controlled unitary operations. First, assuming that we operate on an $N$-qubit quantum register starting in the blank state $|0\rangle^{\otimes N}$, we can load the input-encoding quantum state $|\psi_i\rangle$ by performing a unitary transformation $U_i$ such that

$$
U_i |0\rangle^{\otimes N} = |\psi_i\rangle \tag{5}
$$

It is important to mention that this preparation step would most effectively be replaced by, e.g., a direct call to a quantum memory [38] or by the supply of data-encoding states readily generated in quantum form by quantum sensing devices, to be analyzed or classified. It is indeed well known that the interface between classical data and their representation on quantum registers currently constitutes one of the major bottlenecks for quantum machine learning applications [8].
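The relation in Eq. (4) can be checked classically with a few lines of Python. The sketch below uses hypothetical helper names `encode` and `overlap`. It builds the amplitude lists of Eq. (3) and compares their overlap with $\mathbf{w}\cdot\mathbf{i}/m$. It only reproduces the arithmetic and says nothing about how the states are prepared on hardware.

```python
import math

def encode(vector):
    """Amplitudes of the equal-weight state in Eq. (3): entries (+/-1)/sqrt(m)."""
    m = len(vector)
    if m == 0 or m & (m - 1):
        raise ValueError("length must be a power of two, m = 2**N")
    return [x / math.sqrt(m) for x in vector]

def overlap(psi_a, psi_b):
    """Inner product <psi_a|psi_b> for real amplitude lists."""
    return sum(a * b for a, b in zip(psi_a, psi_b))

i_vec = [1, -1, 1, 1, -1, 1, 1, 1]
w_vec = [1, 1, 1, -1, -1, 1, 1, 1]
m = len(i_vec)
dot = sum(a * b for a, b in zip(i_vec, w_vec))  # w . i = 4
print(overlap(encode(w_vec), encode(i_vec)), dot / m)  # both ≈ 0.5
```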
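Let now $U_w$ be a unitary operator such that

$$
U_w |\psi_w\rangle = |1\rangle^{\otimes N} = |m-1\rangle \tag{6}
$$

In principle, any $m \times m$ unitary matrix having the normalized entries $w_j/\sqrt{m}$ (i.e., the components of $\langle\psi_w|$) in its last row satisfies this condition. If we apply $U_w$ after $U_i$, the overall $N$-qubit quantum state becomes

$$
U_w |\psi_i\rangle = \sum_{j=0}^{m-1} c_j |j\rangle \equiv |\phi_{i,w}\rangle \tag{7}
$$

Using (6), we then have

$$
\langle m-1 | \phi_{i,w} \rangle = c_{m-1} = \langle \psi_w | U_w^\dagger U_w | \psi_i \rangle = \langle \psi_w | \psi_i \rangle \tag{8}
$$

IEEE Transactions on Quantum Engineering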
We see that, as a consequence of the constraints imposed on $U_i$ and $U_w$, the desired result $\mathbf{i} \cdot \mathbf{w} \propto \langle \psi_w | \psi_i \rangle$ is contained, up to a normalization factor, in the coefficient $c_{m-1}$ of the final state $|\phi_{i,w}\rangle$.
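The constraint in Eq. (6) leaves a lot of freedom in the choice of $U_w$. As one concrete and purely classical check, the sketch below builds a real orthogonal $U_w$ as a Householder reflection, whose last row is $\langle\psi_w|$. This is not the hypergraph construction used in this article. The sketch then verifies that $c_{m-1}$ reproduces $\mathbf{w}\cdot\mathbf{i}/m$. The helper names `householder_uw` and `matvec` are hypothetical.

```python
import math

def householder_uw(psi_w):
    """Real orthogonal matrix U_w with U_w |psi_w> = |m-1>.

    Householder reflection H = I - 2 v v^T / (v^T v), with v = psi_w - e_{m-1}.
    H is symmetric and maps psi_w to e_{m-1}, so its last row equals psi_w.
    Assumes psi_w is real and normalized, as in the encoding of Eq. (3).
    """
    m = len(psi_w)
    v = list(psi_w)
    v[-1] -= 1.0
    vv = sum(x * x for x in v)
    if vv < 1e-15:  # psi_w already equals |m-1>: the identity does the job
        return [[float(r == c) for c in range(m)] for r in range(m)]
    return [[float(r == c) - 2.0 * v[r] * v[c] / vv for c in range(m)]
            for r in range(m)]

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

i_vec = [1, -1, 1, 1, -1, 1, 1, 1]
w_vec = [1, 1, 1, -1, -1, 1, 1, 1]
m = len(w_vec)
psi_i = [x / math.sqrt(m) for x in i_vec]
psi_w = [x / math.sqrt(m) for x in w_vec]

U_w = householder_uw(psi_w)
phi = matvec(U_w, psi_i)  # coefficients c_j of |phi_{i,w}>
dot = sum(a * b for a, b in zip(i_vec, w_vec))
print(phi[-1], dot / m)  # both ≈ 0.5
```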
The final step of the algorithm must access the computed input-weight scalar product and determine the activation state of the artificial neuron. In view of constructing a general architecture for feedforward neural networks [30], it is useful to introduce an ancilla qubit $a$, initially set in the state $|0\rangle$, on which the $c_{m-1} \propto \langle \psi_w | \psi_i \rangle$ coefficient can be written through a multicontrolled NOT gate, where the role of controls is assigned to the $N$ encoding qubits [28]:

$$
|\phi_{i,w}\rangle |0\rangle_a \rightarrow \sum_{j=0}^{m-2} c_j |j\rangle |0\rangle_a + c_{m-1} |m-1\rangle |1\rangle_a \tag{9}
$$

At this stage, a measurement of qubit $a$ in the computational basis provides a probabilistic nonlinear threshold activation behavior, producing the output state $|1\rangle_a$, interpreted as an active state of the neuron, with probability $|c_{m-1}|^2$. Although this form of the activation function is already sufficient to carry out elementary classification tasks and to realize a logical XOR operation [28], more complex threshold behaviors can, in principle, be engineered once the information about the inner product is stored on the ancilla [27], [29]. Equivalently, the ancilla can be used, via quantum controlled operations, to pass on the information to other quantum registers encoding successive layers in a feedforward network architecture [30]. It is worth noticing that directing all the relevant information into the state of a single qubit, besides enabling effective quantum synapses, can be advantageous when implementing the procedure on real hardware on which readout errors constitute a major source of inaccuracy. Nevertheless, multicontrolled NOT operations, which are inherently nonlocal, can lead to complex decompositions into hardware-native gates, especially in the presence of constraints in qubit-qubit connectivity. When operating a single node to carry out simple classification tasks or, as we will do in the following sections, to assess the performance of individual portions of the proposed algorithm, the activation probability of the artificial neuron can then equivalently be extracted directly from the register of $N$ encoding qubits by performing a measurement of $|\phi_{i,w}\rangle$ targeting the $|m-1\rangle \equiv |1\rangle^{\otimes N}$ computational basis state.
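To make the probabilistic activation concrete, here is a small classical sketch. It assumes the ideal, noise-free probabilities $|c_{m-1}|^2 = (\mathbf{w}\cdot\mathbf{i}/m)^2$ that follow from Eqs. (4) and (8). It uses $m = 2$, the bit encoding $i_j = (-1)^{x_j}$, and $\mathbf{w} = (1, -1)$, which is our illustrative choice. With these settings the activation probability reproduces the XOR truth table. The hypothetical `sample_neuron` mimics finite-shot readout of the ancilla.

```python
import random

def activation_probability(inputs, weights):
    """|c_{m-1}|^2 = (w . i / m)^2: probability of reading |1>_a after Eq. (9)."""
    m = len(inputs)
    return (sum(a * b for a, b in zip(inputs, weights)) / m) ** 2

def sample_neuron(inputs, weights, shots, rng):
    """Simulate repeated ancilla readouts; returns the observed fraction of |1>_a."""
    p = activation_probability(inputs, weights)
    return sum(rng.random() < p for _ in range(shots)) / shots

# XOR with m = 2 (N = 1): encode bits x_j as i_j = (-1)**x_j and use w = (1, -1).
w = (1, -1)
for x0 in (0, 1):
    for x1 in (0, 1):
        i = ((-1) ** x0, (-1) ** x1)
        print(x0, x1, activation_probability(i, w))  # 0.0, 1.0, 1.0, 0.0

rng = random.Random(7)
print(sample_neuron((1, -1), w, shots=1024, rng=rng))  # 1.0, since p = 1 here
```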

A. EXACT IMPLEMENTATION WITH QUANTUM HYPERGRAPH STATES
A general and exact realization of the unitary transformations $U_i$ and $U_w$ can be designed by using the generation algorithm for quantum hypergraph states [28]. The latter have been extensively studied as useful quantum resources [37], [39] and are formally defined as follows. Given a collection of $N$ vertices $V$, we call a $k$-hyperedge any subset of exactly $k$ vertices. A hypergraph $g_{\leq N} = \{V, E\}$ is then composed of a set $V$ of vertices together with a set $E$ of hyperedges of any order $k$, not necessarily uniform. Notice that this definition includes the usual notion of a mathematical graph if $k = 2$ for all (hyper)edges. To any hypergraph $g_{\leq N}$ we associate an $N$-qubit quantum hypergraph state via the definition

$$
|g_{\leq N}\rangle = \prod_{k=1}^{N} \prod_{\{q_{v_1}, \ldots, q_{v_k}\} \in E} C^k Z_{q_{v_1}, \ldots, q_{v_k}} |+\rangle^{\otimes N} \tag{10}
$$

where $q_{v_1}, \ldots, q_{v_k}$ are the qubits connected by a $k$-hyperedge in $E$ and, with a little abuse of notation, we assume $C^2 Z \equiv CZ$ and $C^1 Z \equiv Z$ (equal to $R_z(\pi)$ up to a global phase). For $N$ qubits, there are exactly $2^{2^N - 1}$ different hypergraph states. We can make use of well-known preparation strategies for hypergraph states to realize the unitaries $U_i$ and $U_w$ with at most a single $C^N Z$ gate acting on all $N$ qubits and a collection of $C^p Z$ gates with $p < N$. It is worth pointing out already here that such an approach, while optimizing the number of multiqubit logic gates to be employed, implies a circuit depth that scales linearly in the size of the classical input, i.e., $O(m) \equiv O(2^N)$, in the worst case corresponding to a fully connected hypergraph [28].
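The counting $2^{2^N-1}$ can be checked by brute force for small registers. The sketch below uses plain Python and hypothetical helper names. It evaluates the sign pattern produced by Eq. (10) for every possible set of hyperedges and counts the distinct states. Identifying qubit $q$ with bit $q$ of the basis label is an assumption made for illustration.

```python
from itertools import combinations

def hypergraph_signs(n_qubits, hyperedges):
    """Signs of the amplitudes of |g> in Eq. (10), up to the 1/sqrt(2**N) factor.

    Each hyperedge is a tuple of qubit indices; qubit q is read as bit q of the
    basis label k (illustrative convention). C^k Z on a hyperedge flips the
    sign of every |k> in which all of its qubits are in |1>.
    """
    signs = []
    for k in range(2 ** n_qubits):
        s = 1
        for edge in hyperedges:
            if all((k >> q) & 1 for q in edge):
                s = -s
        signs.append(s)
    return signs

def all_hypergraphs(n_qubits):
    """Yield every set of non-empty hyperedges on n_qubits vertices."""
    candidates = [e for k in range(1, n_qubits + 1)
                  for e in combinations(range(n_qubits), k)]
    for mask in range(2 ** len(candidates)):
        yield [e for b, e in enumerate(candidates) if (mask >> b) & 1]

N = 2
states = {tuple(hypergraph_signs(N, E)) for E in all_hypergraphs(N)}
print(len(states), 2 ** (2 ** N - 1))  # 8 8
```

Every such state has a $+1$ amplitude on $|0\rangle^{\otimes N}$. This is why only half of the $2^{2^N}$ sign patterns appear, the other half differing by a global phase.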
To describe a possible implementation of $U_i$, assume once again that the quantum register of $N$ encoding qubits is initially in the blank state $|0\rangle^{\otimes N}$. By applying parallel Hadamard gates ($H^{\otimes N}$), we obtain the state $|+\rangle^{\otimes N}$, corresponding to a hypergraph with no edges. We can then use the target collection of classical inputs $\mathbf{i}$ as a control for the following iterative procedure. If $i_0 = -1$, the whole vector can first be multiplied by $-1$, which only changes the global phase of $|\psi_i\rangle$.

    for P = 1 to N do
        for j = 0 to m − 1 do
            if (|j⟩ has exactly P qubits in |1⟩ and i_j = −1) then
                Apply C^P Z to those qubits
                Flip the sign of i_k in i, ∀k such that |k⟩ has the same P qubits in |1⟩
            end if
        end for
    end for

Similarly, $U_w$ can be obtained by first performing the routine outlined above (without the initial parallel Hadamard gates) tailored according to the classical control $\mathbf{w}$. Since all the gates involved in the construction are their own inverses and commute with each other, this step produces a unitary transformation bringing $|\psi_w\rangle$ back to $|+\rangle^{\otimes N}$. The desired transformation $U_w$ is completed by adding parallel $H^{\otimes N}$ and $\mathrm{NOT}^{\otimes N}$ gates [28].
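The iterative procedure can be transcribed almost line by line into Python. In this sketch the function names are hypothetical, and qubit $q$ is identified with bit $q$ of the basis label, which is an assumed convention. The returned list contains the qubit sets on which $C^P Z$ gates act. `apply_gates` then checks that applying them to $|+\rangle^{\otimes N}$ reproduces the target signs.

```python
def hypergraph_gates(signs):
    """Iterative procedure above: list of C^P Z gates (as qubit tuples) producing signs.

    Assumptions: len(signs) = 2**N with +/-1 entries, and qubit q is bit q of
    the basis label j. If signs[0] == -1 the vector is negated first, which
    only changes the global phase of the target state.
    """
    m = len(signs)
    n_qubits = m.bit_length() - 1
    work = list(signs) if signs[0] == 1 else [-s for s in signs]
    gates = []
    for p in range(1, n_qubits + 1):
        for j in range(m):
            if bin(j).count("1") == p and work[j] == -1:
                gates.append(tuple(q for q in range(n_qubits) if (j >> q) & 1))
                # Flip every k whose |k> has (at least) these qubits in |1>.
                for k in range(m):
                    if k & j == j:
                        work[k] = -work[k]
    assert all(s == 1 for s in work)  # every sign has been accounted for
    return gates

def apply_gates(n_qubits, gates):
    """Sign pattern obtained by applying the C^P Z gates to |+>^N."""
    out = []
    for k in range(2 ** n_qubits):
        s = 1
        for g in gates:
            if all((k >> q) & 1 for q in g):
                s = -s
        out.append(s)
    return out

w = [1, -1, 1, 1]
gates = hypergraph_gates(w)
print(gates)                        # [(0,), (0, 1)]: a Z followed by a CZ
print(apply_gates(2, gates) == w)   # True
```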

III. VARIATIONAL REALIZATION OF A QUANTUM ARTIFICIAL NEURON
Although the implementation of the unitary transformations $U_i$ and $U_w$ outlined above is formally exact and optimizes the number of multiqubit operations to be performed by leveraging the correlations between the $\pm 1$ phase factors, the overall requirements in terms of circuit depth pose, in general, severe limitations to their applicability in non-error-corrected quantum devices. Moreover, although with such an approach the encoding and manipulation of classical data is performed in an efficient way with respect to memory resources, the computational cost needed to control the execution of the unitary transformations and to actually perform the sequences of quantum logic gates remains bounded by the corresponding classical limits. Therefore, the aim of this section is to explore conditions under which some of the operations introduced in our quantum model of artificial neurons can be obtained in more efficient ways by exploiting the natural capabilities of quantum processors.

Tacchino et al.: VARIATIONAL LEARNING FOR QUANTUM ARTIFICIAL NEURAL NETWORKS
In the following, we will mostly concentrate on the task of realizing approximate versions of the weight unitary $U_w$ with significantly lower implementation requirements in terms of circuit depth. Although most of the techniques that we will introduce below could, in principle, work equally well for the preparation of encoding states $|\psi_i\rangle$, it is important to stress already at this stage that such approaches cannot be interpreted as a way of solving the long-standing issue represented by the loading of classical data into a quantum register. Instead, they are pursued here as an efficient way of analyzing classical or quantum data presented in the form of a quantum state. Indeed, the variational approach proposed here requires ad hoc training for every choice of the target vector $\mathbf{w}$ whose $U_w$ needs to be realized. To this end, we require access to many copies of the desired $|\psi_w\rangle$ state, essentially representing a quantum training set for our artificial neuron. As, in our formulation, a single node characterized by weight connections $\mathbf{w}$ can be used as an elementary classifier recognizing input data sufficiently close to $\mathbf{w}$ itself [28], the variational procedure presented here serves the double purpose of training the classifier upon input of positive examples $|\psi_w\rangle$ and of finding an efficient quantum realization of such a state analyzer.

A. GLOBAL VARIATIONAL TRAINING
According to (6), the purpose of the transformation $U_w$ within the quantum artificial neuron implementation is essentially to reverse the preparation of a nontrivial quantum state $|\psi_w\rangle$ back to the relatively simple product state $|1\rangle^{\otimes N}$. Notice that, in general, the qubits in the state $|\psi_w\rangle$ share multipartite entanglement [39]. Here, we discuss a promising strategy, based on variational techniques, for efficiently approximating the desired transformation while satisfying the necessary constraints. Inspired by the well-known variational quantum eigensolver (VQE) algorithm [40], and in line with a recently introduced unsampling protocol [41], we define the following optimization problem. Given access to independent copies of $|\psi_w\rangle$ and to a variational quantum circuit, characterized by a unitary operation $V(\boldsymbol{\theta})$ and parameterized by a set of angles $\boldsymbol{\theta}$, we wish to find a set of values $\boldsymbol{\theta}_{\mathrm{opt}}$ that guarantees a good approximation of $U_w$. The heuristic circuit implementation typically consists of sequential blocks of single-qubit rotations followed by entangling gates. These blocks are repeated enough times to give the ansatz sufficient freedom to converge to the desired unitary [9].
Once the solution $V(\boldsymbol{\theta}_{\mathrm{opt}})$ is found, which in our setup corresponds to a fully trained artificial neuron, it would then provide a form of quantum advantage in the analysis of arbitrary input states $|\psi_i\rangle$. This holds as long as the circuit depth for the implementation of the variational ansatz is sublinear in the dimension $m = 2^N$ of the classical data, i.e., grows more slowly than $2^N$ with the number $N$ of qubits in the register. As is customarily done in near-term VQE applications, the optimization landscape is explored by combining executions of quantum circuits with classical feedback mechanisms for the update of the $\boldsymbol{\theta}$ angles. In the most general scenario, and according to (6), a cost function can be defined as

$$
C(\boldsymbol{\theta}) = 1 - F(\boldsymbol{\theta}), \qquad F(\boldsymbol{\theta}) = \left| \langle 1|^{\otimes N} V(\boldsymbol{\theta}) |\psi_w\rangle \right|^2 \tag{11}
$$

The solution $\boldsymbol{\theta}_{\mathrm{opt}}$ is then represented by

$$
\boldsymbol{\theta}_{\mathrm{opt}} = \arg\min_{\boldsymbol{\theta}} C(\boldsymbol{\theta}) \tag{12}
$$

and leads to $V(\boldsymbol{\theta}_{\mathrm{opt}}) \simeq U_w$. We call this approach a global variational unsampling, as the cost function in (11) requires all qubits to be simultaneously found as close as possible to their respective target state $|1\rangle$, without making explicit use of the product structure of the desired output state $|1\rangle^{\otimes N}$. It is indeed well known that VQE can lead, in general, to exponentially difficult optimization problems [14]. However, the characteristic features of the problem under evaluation may actually allow for a less complex implementation of VQE for unsampling purposes [41], as outlined in the following section. A schematic representation of the global variational training is provided in Fig. 1(a).
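To make the global cost function concrete, the following self-contained sketch simulates a small variational circuit with plain Python lists and evaluates $1 - |\langle 1|^{\otimes N} V(\boldsymbol{\theta})|\psi_w\rangle|^2$. It assumes, as in the case study of Section III-C, cycles of $R_y$ rotations and CNOT entanglers, with qubit $q$ stored as bit $q$ of the amplitude index. The function names and the layout `thetas[c][q]` are hypothetical. This is an illustrative classical simulation, not the Qiskit code used in this article.

```python
import math

def apply_ry(state, q, theta):
    """Apply R_y(theta) = exp(-i theta sigma_y / 2) to qubit q (bit q of the index).

    R_y and CNOT have real matrices, so real amplitude lists are enough here.
    """
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = list(state)
    for j in range(len(state)):
        if not (j >> q) & 1:
            k = j | (1 << q)
            out[j] = c * state[j] - s * state[k]
            out[k] = s * state[j] + c * state[k]
    return out

def apply_cnot(state, control, target):
    """Flip the target bit on every basis state whose control bit is 1."""
    out = list(state)
    for j in range(len(state)):
        if (j >> control) & 1:
            out[j ^ (1 << target)] = state[j]
    return out

def all_to_all(n_qubits):
    return [(q, p) for q in range(n_qubits) for p in range(q + 1, n_qubits)]

def run_ansatz(state, thetas, entangler):
    """Apply R(thetas[0]), then (E, R(thetas[c])) for c = 1..n; thetas[c][q] per qubit."""
    for q, t in enumerate(thetas[0]):
        state = apply_ry(state, q, t)
    for layer in thetas[1:]:
        for control, target in entangler:
            state = apply_cnot(state, control, target)
        for q, t in enumerate(layer):
            state = apply_ry(state, q, t)
    return state

def global_cost(thetas, psi_w, entangler):
    """Global cost 1 - |<1...1| V(theta) |psi_w>|^2; |1...1> is the last index."""
    out = run_ansatz(psi_w, thetas, entangler)
    return 1.0 - out[-1] ** 2

N = 2
plus = [0.5] * 4  # |+>|+>, the equal-weight state with all w_j = +1
print(global_cost([[math.pi / 2, math.pi / 2]], plus, all_to_all(N)))  # ≈ 0
```

Here $R_y(\pi/2)$ maps $|+\rangle$ to $|1\rangle$, so a single rotation layer already solves this trivial instance. Nontrivial targets $|\psi_w\rangle$ require the entangling cycles and a classical optimizer acting on `global_cost`.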

B. LOCAL VARIATIONAL TRAINING
An alternative approach to the global unsampling task makes use of a local qubit-by-qubit procedure. It is particularly suited to the present case, in which the desired final state of the quantum register is fully unentangled. This technique, which has recently been proposed and tested on a photonic platform as a route toward efficient certification of quantum processors [41], is highlighted here as an additional useful tool within a general quantum machine learning setting.
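In the local variational unsampling scheme, the global transformation $V(\boldsymbol{\theta})$ is divided into successive layers $V_j(\boldsymbol{\theta}_j)$ of decreasing complexity and size. Each layer is trained separately, in a serial fashion, according to a cost function that only involves the fidelity of a single qubit to its desired final state. More explicitly, every $V_j(\boldsymbol{\theta}_j)$ operates on qubits $j, \ldots, N$ (acting as the identity on qubits $1, \ldots, j-1$) and has an associated cost function

$$
C_j(\boldsymbol{\theta}_j) = 1 - F_j(\boldsymbol{\theta}_j), \qquad F_j(\boldsymbol{\theta}_j) = \langle 1 | \mathrm{Tr}_{\bar{j}} \left[ V_j(\boldsymbol{\theta}_j)\, \rho_w^{(j)}\, V_j^\dagger(\boldsymbol{\theta}_j) \right] | 1 \rangle \tag{13}
$$

where the partial trace $\mathrm{Tr}_{\bar{j}}$ leaves only the degrees of freedom associated with the $j$th qubit and, recursively, we define

$$
\rho_w^{(1)} = |\psi_w\rangle\langle\psi_w|, \qquad \rho_w^{(j+1)} = V_j(\boldsymbol{\theta}_j)\, \rho_w^{(j)}\, V_j^\dagger(\boldsymbol{\theta}_j) \tag{14}
$$

At step $j$, it is implicitly assumed that all the parameters $\boldsymbol{\theta}_k$ for $k = 1, \ldots, j-1$ are fixed to the optimal values obtained by the minimization of the cost functions in the previous steps. Notice that, operationally, the fidelity $F_j$ can be evaluated directly by measuring the $j$th qubit in the computational basis while ignoring the rest of the quantum register, as shown in Fig. 1(b).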
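For a pure register state, the single-qubit fidelity $F_j$ reduces to the probability of finding the $j$th qubit in $|1\rangle$. This is why it can be estimated by measuring that qubit alone. The short sketch below uses a hypothetical helper name and stores qubit $q$ as bit $q$ of the amplitude index. It computes this marginal probability and shows that any equal-weight state such as $|\psi_w\rangle$ starts with $F_j = 1/2$ on every qubit. It also shows that a disentangled qubit can reach $F_j = 1$ while the global fidelity is still low.

```python
import math

def local_fidelity(state, q):
    """Single-qubit fidelity to |1> for a pure state: P(qubit q reads 1).

    Tracing out the other qubits and projecting onto |1> equals summing
    |amplitude|^2 over basis labels whose bit q is 1 (qubit q = bit q, assumed).
    """
    return sum(abs(a) ** 2 for j, a in enumerate(state) if (j >> q) & 1)

m = 16
psi_w = [(-1) ** bin(j).count("1") / math.sqrt(m) for j in range(m)]  # a +/-1 pattern
print([round(local_fidelity(psi_w, q), 12) for q in range(4)])  # [0.5, 0.5, 0.5, 0.5]

# Qubit 0 already in |1>, qubits 1 and 2 in a Bell state.
bell_times_one = [0.0] * 8
bell_times_one[0b001] = bell_times_one[0b111] = 1 / math.sqrt(2)
print(local_fidelity(bell_times_one, 0), bell_times_one[-1] ** 2)  # ≈ 1.0 and ≈ 0.5
```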
The benefits of local variational unsampling with respect to the global strategy are mainly associated with the reduced complexity of the optimization landscape per step. Indeed, the local version always operates on the overlap between single-qubit states, at the relatively modest cost of adding $N - 1$ progressively smaller variational ansatzes. In the specific problem under study, we thus expect the local approach to become particularly effective, and more advantageous than the global one, in the limit of a large enough number of qubits. This is the most interesting regime, where the size of the quantum register, and therefore of the quantum computation, exceeds current classical simulation capabilities.

C. CASE STUDY: PATTERN RECOGNITION
To show an explicit example of the proposed construction, let us fix $m = 16$ and $N = 4$. Following [28], we can visualize a 16-bit binary vector $\mathbf{b}$ [see (2)] as a $4 \times 4$ binary pattern of black ($b_j = -1$) and white ($b_j = 1$) pixels. Moreover, we can assign to every possible pattern an integer label $k_b$ corresponding to the decimal conversion of the binary string $\bar{b}_0 \ldots \bar{b}_{15}$, where $b_j = (-1)^{\bar{b}_j}$. We choose as our target $\mathbf{w}$ the vector corresponding to $k_w = 20032$, which represents a black cross on a white background at the north-west corner of the 16-bit image (see Fig. 2).
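The labeling convention can be checked directly. The sketch below uses hypothetical helper names and reads $\bar b_0$ as the most significant bit. That reading is our assumption, but the decoded picture supports it. The code converts $k_w = 20032$ into the vector $\mathbf{w}$ and prints the $4 \times 4$ pattern row by row, with `#` for black and `.` for white pixels.

```python
def pattern_from_label(k, n_pixels=16):
    """Decode k_b into b_j = (-1)**bbar_j, with bbar_0 the most significant bit."""
    bits = format(k, "0{}b".format(n_pixels))
    if len(bits) != n_pixels:
        raise ValueError("label does not fit in the requested number of pixels")
    return [(-1) ** int(c) for c in bits]

def show(vector, width=4):
    """Render black pixels (-1) as '#' and white pixels (+1) as '.'."""
    rows = [vector[r:r + width] for r in range(0, len(vector), width)]
    return "\n".join("".join("#" if x == -1 else "." for x in row) for row in rows)

w = pattern_from_label(20032)  # binary string 0100 1110 0100 0000
print(show(w))
# .#..
# ###.
# .#..
# ....
```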
Starting from the global variational strategy, we construct a parameterized ansatz for $V(\boldsymbol{\theta})$ as a series of entangling ($E$) and rotation ($R(\boldsymbol{\theta})$) cycles

$$
V(\boldsymbol{\theta}) = R(\boldsymbol{\theta}_n)\, E \cdots R(\boldsymbol{\theta}_1)\, E\, R(\boldsymbol{\theta}_0) \tag{15}
$$

where $n$ is the total number of cycles, which, in principle, can be varied to increase the expressibility of the ansatz by increasing its total depth. Rotations are assumed to act independently on the $N = 4$ qubits according to

$$
R(\boldsymbol{\theta}_c) = \bigotimes_{q=1}^{4} \exp\left( -i \frac{\theta_{c,q}}{2} \sigma_y^{(q)} \right) \tag{16}
$$

where $\sigma_y^{(q)}$ is the Pauli $y$-matrix acting on qubit $q$. At the same time, the entangling parts promote all-to-all interactions between the qubits according to

$$
E = \prod_{q=1}^{3} \prod_{q'=q+1}^{4} \mathrm{CNOT}_{qq'} \tag{17}
$$

where $\mathrm{CNOT}_{qq'}$ is the usual controlled NOT operation between control qubit $q$ and target $q'$, acting on the space of all four qubits. For $n$ cycles, the total number $n_\theta$ of $\theta$-parameters, including the initial rotation layer $R(\theta_{0,1}, \ldots, \theta_{0,4})$, is therefore

$$
n_\theta = 4(n + 1)
$$

A qubit-by-qubit version of the ansatz can be constructed in a similar way by using the same structure of entangling and rotation cycles, decreasing the total number of qubits by one after each layer of the optimization. Here, we choose a uniform number $n$ of cycles per qubit (this condition will be relaxed afterward; see Section III-D), thus setting, $\forall j \neq 4$,

$$
V_j(\boldsymbol{\theta}_j) = R(\boldsymbol{\theta}_{j,n})\, E \cdots R(\boldsymbol{\theta}_{j,1})\, E\, R(\boldsymbol{\theta}_{j,0}) \tag{18}
$$

where rotations and entanglers now act only on qubits $j, \ldots, 4$. For $j = 4$, we add a single general single-qubit rotation with three parameters

$$
V_4(\boldsymbol{\theta}_4) = \exp\left( -i\, \boldsymbol{\theta}_4 \cdot \boldsymbol{\sigma} \right) \tag{19}
$$

where $\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$ are again the usual Pauli matrices. We implemented both versions of the variational training in Qiskit [42], combining exact simulation of the quantum circuits required to evaluate the cost function with the classical Nelder-Mead [43] and COBYLA [44] optimizers from the SciPy Python library. We find that the values $n = 3$ and $n = 2$ (for the global and qubit-by-qubit strategies, respectively) allow the routine to reach total fidelities to the target state $|1\rangle^{\otimes N}$ well above 99.99%. As shown in Fig. 2, this, in turn, guarantees a correct reproduction of the exact activation probabilities of the quantum artificial neuron. The required quantum circuit depth is 19 (29) for the global (qubit-by-qubit) strategy, compared with a total depth of 49 for the exact implementation of $U_w$ using hypergraph states. This counting does not include the gate operations required to prepare the input state. It only compares the different realizations of $U_w$, assuming that each $|\psi_i\rangle$ is already provided in the form of a wavefunction. Moreover, the multicontrolled $C^P Z$ operations appearing in the exact version were decomposed into single-qubit rotations and CNOTs without the use of additional working qubits. Notice that these conditions are the ones usually met in real near-term superconducting hardware endowed with a fixed set of universal operations.
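The parameter and CNOT bookkeeping for the all-to-all ansatz described above can be sketched as follows. This is an illustrative gate-list model, not Qiskit code. Qubits are indexed from 0, and the gate tuples and function names are hypothetical. No attempt is made to reproduce the circuit depths quoted in the text, since those depend on the compiler and the native gate set.

```python
def all_to_all(qubits):
    """Entangler E: CNOT(q, q') for every pair q < q' of the given qubits."""
    return [(q, p) for i, q in enumerate(qubits) for p in qubits[i + 1:]]

def ansatz_gates(qubits, n_cycles, entangler=all_to_all):
    """Gate list for R(theta_n) E ... R(theta_1) E R(theta_0) on the given qubits.

    Rotations are ("ry", qubit) and CNOTs are ("cx", control, target), listed in
    the order they are applied. Each "ry" carries one free parameter.
    """
    gates = [("ry", q) for q in qubits]
    for _ in range(n_cycles):
        gates += [("cx", c, t) for c, t in entangler(qubits)]
        gates += [("ry", q) for q in qubits]
    return gates

def counts(gates):
    n_params = sum(1 for g in gates if g[0] == "ry")
    n_cnots = sum(1 for g in gates if g[0] == "cx")
    return n_params, n_cnots

N = 4
print(counts(ansatz_gates(list(range(N)), n_cycles=3)))  # (16, 18): n_theta = 4(n + 1)

# Qubit-by-qubit version with n = 2 cycles per layer: layers on 4, 3, 2 qubits,
# plus a final three-parameter single-qubit rotation (added by hand).
local = [counts(ansatz_gates(list(range(j, N)), n_cycles=2)) for j in range(N - 1)]
print(local, sum(p for p, _ in local) + 3, sum(c for _, c in local))
# [(12, 12), (9, 6), (6, 2)] 30 20
```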

D. STRUCTURE OF THE ANSATZ AND SCALING PROPERTIES
In many practical applications, the implementation of the entangling block $E$ could prove technically challenging. This is particularly true for near-term quantum devices based, e.g., on superconducting wiring technology, for which the available connectivity between qubits is limited. For this reason, it is useful to consider a more hardware-friendly entangling scheme, which we refer to as nearest neighbors. In this case, each qubit is entangled with at most two other qubits, essentially assuming the topology of a linear chain

$$
E_{\mathrm{nn}} = \prod_{q=1}^{N-1} \mathrm{CNOT}_{q,q+1} \tag{20}
$$

This scheme requires only $N - 1$ CNOTs per entangling block, instead of the $N(N-1)/2$ of the all-to-all scheme presented above. Moreover, this entangling unitary fits well on quantum processors consisting of linear chains of qubits or heavy-hexagonal layouts. We implemented both global and local variational learning procedures with nearest neighbor entanglers in Qiskit [42], using exact simulation of the quantum circuits with classical optimizers to drive the learning procedure. In the following, we report an extensive analysis of the performance and a comparison with the all-to-all strategy introduced in Section III-C. All the simulations assume the same cross-shaped target weight vector $\mathbf{w}$ depicted in Fig. 2.
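A small sketch makes the difference in two-qubit gate count explicit. It also checks which scheme maps onto a linear chain without routing. Qubits are indexed from 0 along a hypothetical 1D chain, and the function names are illustrative.

```python
def all_to_all(n_qubits):
    """CNOT pairs of the all-to-all entangler E (qubits indexed from 0)."""
    return [(q, p) for q in range(n_qubits) for p in range(q + 1, n_qubits)]

def nearest_neighbors(n_qubits):
    """CNOT pairs of the linear-chain entangler E_nn."""
    return [(q, q + 1) for q in range(n_qubits - 1)]

def fits_linear_chain(pairs):
    """True if every CNOT acts on physically adjacent qubits of a 1D chain."""
    return all(abs(c - t) == 1 for c, t in pairs)

for n in range(4, 8):
    a, b = all_to_all(n), nearest_neighbors(n)
    print(n, len(a), len(b), fits_linear_chain(a), fits_linear_chain(b))
# 4 6 3 False True
# 5 10 4 False True
# 6 15 5 False True
# 7 21 6 False True
```

On a linear chain, the all-to-all pairs would additionally need SWAP routing. This overhead is not counted above.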
In Fig. 3, we show an example of the typical optimization procedure for three different choices of the ansatz depth (i.e., number of entangling cycles) $n = 1, 2, 3$, assuming a global cost function. Here, we find that $n = 3$ allows the routine to reach a fidelity $F(\boldsymbol{\theta})$ to the target state $|1\rangle^{\otimes N}$ above 99%.
In the local qubit-by-qubit variational scheme, we can actually introduce an additional degree of freedom by allowing the number of cycles per qubit, $n_j$, to vary between successive layers corresponding to the different stages of the optimization procedure. For example, we may want to use a deeper ansatz for the first unitary acting on all the qubits and shallower ones for smaller subsystems. We thus introduce a different $n_j$ for each $V_j(\boldsymbol{\theta}_j)$ in (18), and we call "structure" the string "$n_1 n_2 n_3$." The latter denotes a learning model consisting of three optimization layers: $V_1(\boldsymbol{\theta}_1)$ with $n_1$ entangling cycles, $V_2(\boldsymbol{\theta}_2)$ with $n_2$ cycles, and $V_3(\boldsymbol{\theta}_3)$ with $n_3$ cycles. In the last step of the local optimization procedure, i.e., when a single qubit is involved, we always assume a single three-parameter rotation [see (19)]. A similar notation will also be applied in the following when scaling up to $N > 4$ qubits.

FIG. 4. Final fidelity obtained for the local variational training using both the all-to-all entangler $E$ (17) and the nearest neighbor entangler $E_{\mathrm{nn}}$ (20). On top of each rectangle, in light blue, we report the depth of the quantum circuit implementing the given structure with that particular entangling scheme. For clarity, a structure "211" corresponds to a variational model having two repetitions ($n_1 = 2$) for the first layer acting on all four qubits, and one cycle ($n_2 = n_3 = 1$) for each of the remaining two layers, which act on three and two qubits, respectively. Each bar was obtained by executing the optimization process ten times and then evaluating the means and standard deviations (shown as error bars). The optimization procedure was performed using COBYLA [44].
The effectiveness of different structures is explored in Fig. 4. We see that, while the all-to-all entangling scheme typically performs better than the nearest neighbor one, this increase in performance comes at the cost of deeper circuits. Moreover, the stepwise decreasing structure "321" for the nearest neighbor entangler proves to be an effective solution to the problem, achieving a good final accuracy (above 99%) with a low circuit depth. This trend is also confirmed for the higher dimensional case of $N = 5$ qubits, which we report in Fig. 5. Here, the dimension of the underlying pattern recognition task is increased by extending the original 16-bit weight vector $\mathbf{w}$ with extra 0s in front of the binary representation of $k_w$. Indeed, Fig. 5 shows that, with nearest neighbor entangling blocks, the decreasing structure "4321" gives the best performance-depth tradeoff. Such an empirical fact, namely that the most efficient structure is typically the one consisting of decreasing depths, can be heuristically interpreted by recalling again that, in general, the optimization of a function depending on the state of a large number of qubits is a hard training problem [14]. Although we employ local cost functions, each variational layer still needs to successfully disentangle a single qubit from all the others remaining in the register in order to complete our particular task. It is therefore not totally surprising that the optimization carried out in larger subsystems requires more repetitions and parameters (i.e., larger $n_j$) in order to make the ansatz more expressive.
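The structure notation can be unpacked mechanically. In the sketch below, which uses hypothetical helper names, each digit of the string becomes a layer acting on one fewer qubit than the previous one. The number of CNOTs is then counted assuming the nearest neighbor entangler, with $q - 1$ gates per cycle on a layer of $q$ qubits. These are gate counts, not the transpiled depths reported in Figs. 4 and 5.

```python
def parse_structure(structure, n_qubits):
    """Turn a structure string such as "321" into (qubits, cycles) per layer.

    Layer j (counting from 1) acts on n_qubits - j + 1 qubits. The final
    single-qubit layer is always a three-parameter rotation and is not encoded.
    """
    cycles = [int(ch) for ch in structure]
    if len(cycles) != n_qubits - 1:
        raise ValueError("expected one digit per multi-qubit layer")
    return [(n_qubits - j, n) for j, n in enumerate(cycles)]

def nn_cnot_count(layers):
    """CNOTs with the nearest neighbor entangler: (q - 1) per cycle on q qubits."""
    return sum(n * (q - 1) for q, n in layers)

for structure, n_qubits in [("211", 4), ("321", 4), ("4321", 5)]:
    layers = parse_structure(structure, n_qubits)
    print(structure, layers, nn_cnot_count(layers))
# 211 [(4, 2), (3, 1), (2, 1)] 9
# 321 [(4, 3), (3, 2), (2, 1)] 14
# 4321 [(5, 4), (4, 3), (3, 2), (2, 1)] 30
```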
By assuming that the stepwise decreasing structure remains sufficiently good also for a larger number of qubits, we studied the optimization landscape of the global [see (11)] and local [see (13)] cost functions by investigating how the hardness of the training procedure scales with increasing $N$. As commented above for $N = 5$, we keep the same underlying target $\mathbf{w}$, which we expand by adding extra leading 0s to its binary representation. To account for the stochastic nature of the optimization procedure, we run many simulations of the same learning task and report the mean number of iterations needed for the classical optimizer to reach a given target fidelity $F = 95\%$. Results are shown in Fig. 6. The most significant message is that the local cost function seems to require more classical resources than the global one to reach a given target fidelity as the number of qubits increases. This should not come as a surprise, since the number of parameters to be optimized in the two cases is different. In fact, in the global scenario there are $N + N \cdot n$ parameters to be optimized (the first $N$ is due to the initial layer of rotations). In the local case, instead, there are $N + N \cdot n_1$ for the first layer, $(N-1) + (N-1) \cdot n_2$ for the second, and so on, for a total of

$$
\#_{\mathrm{local}} = \sum_{j=1}^{N-1} (N - j + 1)(1 + n_j) + 3 \tag{21}
$$

where the final 3 is due to the fact that the last layer always consists of a rotation on the Bloch sphere with three parameters; see (19). Using the stepwise decreasing structure, that is, $n_j = N - j$ (i.e., $q - 1$ cycles for a layer acting on $q$ qubits), we eventually obtain $\#_{\mathrm{local}} = \sum_{q=2}^{N} [q + q(q-1)] + 3 = \sum_{q=2}^{N} q^2 + 3 \sim O(N^3)$, compared to $\#_{\mathrm{global}} = N + N(N-1) = N^2 \sim O(N^2)$. Here, we are assuming a number of cycles $n = N - 1$ for the global ansatz, consistently with the $N = 4$ qubits case (see Fig. 3). In the global case, the optimization makes full use of the available parameters to globally optimize the state toward $|1\rangle^{\otimes N}$. The local unitary, by contrast, has to go through multiple disentangling stages, requiring (at least for the cases presented here) more classical iteration steps. At the same time, it would be interesting to investigate other examples in which the two alternative schemes have the same number of parameters, as this would likely narrow the differences and provide a more direct comparison.
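The parameter counts compared above can be tabulated with a few lines of Python. This sketch uses hypothetical function names. It evaluates the global count $N(n+1)$ with $n = N - 1$ and the local count for the stepwise decreasing structure, then checks the latter against the closed form $\sum_{q=2}^{N} q^2 + 3$.

```python
def n_params_global(n_qubits, n_cycles):
    """Global ansatz: initial rotation layer plus one rotation layer per cycle."""
    return n_qubits * (n_cycles + 1)

def n_params_local(n_qubits, cycles_per_layer):
    """Layer j (from 0) acts on n_qubits - j qubits; the last layer adds 3 angles."""
    assert len(cycles_per_layer) == n_qubits - 1
    return sum((n_qubits - j) * (1 + n) for j, n in enumerate(cycles_per_layer)) + 3

for N in range(4, 8):
    stepwise = [N - j for j in range(1, N)]  # n_j = N - j, e.g. [3, 2, 1] for N = 4
    glob = n_params_global(N, N - 1)
    loc = n_params_local(N, stepwise)
    closed_form = sum(q * q for q in range(2, N + 1)) + 3
    print(N, glob, loc, loc == closed_form)
# 4 16 32 True
# 5 25 57 True
# 6 36 93 True
# 7 49 142 True
```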
In agreement with similar investigations [45], we can conclude that only modest differences between global and local layerwise optimization approaches are present when dealing with exact simulations (i.e., free from statistical and hardware noise) of the quantum circuit. Indeed, both strategies achieve good results and a final fidelity $F(\boldsymbol{\theta}) > 99\%$. At the same time, it becomes interesting to investigate how the different approaches behave in the presence of noise, and specifically of statistical noise coming from measurement operations. For this reason, we implemented the measurement sampling using the Qiskit qasm_simulator. For the classical optimization, we employed the simultaneous perturbation stochastic approximation (SPSA) method, a stochastic gradient-descent optimizer. Each benchmark circuit is executed $n_{\mathrm{shots}} = 1024$ times in order to reconstruct the statistics of the outcomes. Moreover, we repeat the stochastic optimization routine multiple times to analyze the average behavior of the cost function. In Fig. 7, we show the optimization procedure for the local and global cost functions in the presence of measurement noise. Both reach acceptable and comparable final fidelities, $F_{\mathrm{local}} = 0.87 \pm 0.02$ and $F_{\mathrm{global}} = 0.89 \pm 0.02$. Notice that for the local case [see Fig. 7(a)], each colored line indicates the optimization of a $V_j(\boldsymbol{\theta}_j)$ from (13). We observe that the training for the local model generally requires fewer iterations, with an effective optimization of each single layer. In contrast, in the presence of measurement noise, the global variational training struggles to find a good direction for the optimization and eventually follows a slowly decreasing path to the minimum. These findings appear to be in agreement, e.g., with results from [45] and [46]. With the introduction of statistical shot noise, the performance of the global model is heavily affected, while the local approach proves to be more resilient and capable of finding a good gradient direction in the parameter space [46]. In all these simulations, the parameters in the global unitary and in the first layer of the local unitary were initialized with a random distribution in $[0, 2\pi)$. All subsequent layers in the local model were initialized with all parameters set to zero in order to allow for smooth transitions from one optimization layer to the following. This strategy was suggested as a possible way to mitigate the occurrence of barren plateaus [45], [47].
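The role of shot noise and of SPSA can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden toy rather than the Qiskit workflow used for Fig. 7. A single qubit rotated by $R_y(\theta)$ must reach $|1\rangle$, and the cost is estimated from $n_{\mathrm{shots}} = 1024$ simulated measurements. A textbook SPSA loop then updates the angle using only two noisy cost evaluations per step. It uses the standard gain exponents 0.602 and 0.101, while the other constants and all names are illustrative.

```python
import math
import random

def sampled_cost(theta, shots, rng):
    """Shot-noise estimate of 1 - P(|1>) for R_y(theta)|0> (toy stand-in circuit).

    Each shot returns |1> with probability sin^2(theta / 2); shots=None gives
    the exact value instead of a sampled one.
    """
    p_one = math.sin(theta / 2) ** 2
    if shots is None:
        return 1.0 - p_one
    hits = sum(rng.random() < p_one for _ in range(shots))
    return 1.0 - hits / shots

def spsa(cost, params, iterations, rng, a=1.0, c=0.2, alpha=0.602, gamma=0.101):
    """Minimal SPSA loop: two cost evaluations per step, whatever the dimension."""
    params = list(params)
    for k in range(iterations):
        a_k = a / (k + 1) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma   # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in params]  # random +/-1 directions
        plus = cost([p + c_k * d for p, d in zip(params, delta)])
        minus = cost([p - c_k * d for p, d in zip(params, delta)])
        scale = (plus - minus) / (2 * c_k)
        params = [p - a_k * scale / d for p, d in zip(params, delta)]
    return params

rng = random.Random(1)
exact = spsa(lambda t: sampled_cost(t[0], None, rng), [0.3], 200, rng)
noisy = spsa(lambda t: sampled_cost(t[0], 1024, rng), [0.3], 200, rng)
# Final costs of the noise-free and shot-noise runs, both evaluated exactly.
print(sampled_cost(exact[0], None, rng), sampled_cost(noisy[0], None, rng))
```

With a single parameter, the random perturbation only fixes the sign of a finite difference, so the noise-free run is effectively deterministic. The practical appeal of SPSA, two cost evaluations per step regardless of the number of angles, only shows up for multiparameter ansatzes such as those studied here.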
We conclude the scaling analysis by reporting in Fig. 8 a summary of the quantum circuit depths required to implement the target unitary transformation with different strategies, for qubit registers of increasing size up to $N = 7$. All the variational approaches scale much better than the exact implementation of the target $U_w$, with the global ones requiring shallower circuits in the cases considered here. In addition, we recall that an all-to-all entangling scheme requires longer circuits per cycle because of the larger number of CNOTs, but generally needs fewer ansatz cycles (see Fig. 4). Finally, while the global procedures seem to provide a better alternative to the local ones in terms of circuit depth, they might be more prone to classical optimization issues [14], [45] when trained and executed on real hardware, as suggested by the data reported in Fig. 7. Overall, these promising results confirm the significant advantage brought by variational strategies compared to the exponential increase of complexity required by the exact formulation of the algorithm.
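The per-cycle trade-off between entangling schemes can be made concrete with an idealized gate count. The sketch below assumes a hypothetical ansatz in which every cycle consists of one layer of single-qubit rotations followed by CNOTs on either nearest-neighbor pairs (nn) or all pairs (a2a), and it schedules each gate as early as possible. It ignores transpilation, native gate sets, and qubit connectivity, so it does not reproduce the Qiskit depths of Fig. 8.

```python
def entangler_pairs(n_qubits, scheme):
    """CNOT (control, target) pairs for one entangling block (hypothetical layout)."""
    if scheme == "nn":   # nearest-neighbor chain
        return [(q, q + 1) for q in range(n_qubits - 1)]
    if scheme == "a2a":  # every pair of qubits
        return [(i, j) for i in range(n_qubits) for j in range(i + 1, n_qubits)]
    raise ValueError(f"unknown entangling scheme: {scheme!r}")

def ansatz_depth(n_qubits, n_cycles, scheme):
    """Depth and CNOT count with as-soon-as-possible scheduling.

    Each cycle: one single-qubit rotation on every qubit, then the CNOTs of the
    entangling block in list order. Ignores transpilation and connectivity.
    """
    busy_until = [0] * n_qubits  # last time step occupied on each qubit
    n_cnots = 0
    for _ in range(n_cycles):
        for q in range(n_qubits):
            busy_until[q] += 1   # a rotation occupies one time step
        for a, b in entangler_pairs(n_qubits, scheme):
            t = max(busy_until[a], busy_until[b]) + 1  # wait for both qubits
            busy_until[a] = busy_until[b] = t
            n_cnots += 1
    return max(busy_until), n_cnots

if __name__ == "__main__":
    for n in range(2, 8):
        print(n, "nn:", ansatz_depth(n, 1, "nn"), "a2a:", ansatz_depth(n, 1, "a2a"))
```

For $N = 4$ and a single cycle, this count gives depth 4 with 3 CNOTs for nn and depth 6 with 6 CNOTs for a2a. More generally, one a2a block uses $N(N-1)/2$ CNOTs against $N-1$ for nn, so the all-to-all scheme pays off only when it reaches the target fidelity with fewer cycles.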

IV. CONCLUSION
In this article, we reviewed an exact model for the implementation of artificial neurons on a quantum processor, and we introduced variational training methods for efficiently handling the manipulation of classical and quantum input data. Through extensive numerical analysis, we compared the effectiveness of different circuit structures and learning strategies, highlighting potential benefits brought by hardware-compatible entangling operations and by layerwise training routines. This article suggests that quantum unsampling techniques represent a useful resource, upon input of quantum training sets, to be integrated in quantum machine learning applications. From a theoretical perspective, our proposed procedure allows for an explicit and direct quantification of possible quantum computational advantages for classification tasks. It is also worth pointing out that such a scheme remains fully compatible with recently introduced architectures for quantum feedforward neural networks [30], which are needed, in general, to deploy, e.g., complex convolutional filters. Moreover, although the interpretation of quantum hypergraph states as memory-efficient carriers of classical information guarantees an optimal use of the available dimension of an $N$-qubit Hilbert space, the variational techniques introduced here can, in principle, be used to learn different encoding schemes designed, e.g., to include continuous-valued features or to improve the separability of the data to be classified [23], [24], [48]. In all envisioned applications, our proposed protocols are intended as an effective method for the analysis of quantum states as provided, e.g., by external devices or sensors, while it is worth stressing that the general problem of efficiently loading classical data into quantum registers remains open. Finally, on a more practical level, a successful implementation of the variational learning algorithm introduced in this article on near-term quantum hardware will necessarily rely on a deeper analysis of the impact of realistic noise effects, both on the training procedure and on the final optimized circuit. In particular, we anticipate that the reduced circuit depth produced by the proposed method could substantially relax the quality requirements on quantum hardware, eventually enabling meaningful implementations of quantum neural networks in the near-term regime.

FIG. 1. Variational learning via unsampling. (a) Global strategy, with optimization targeting all qubits simultaneously. (b) Local qubit-by-qubit approach, in which each layer is used to optimize the operation for one qubit at a time.

VOLUME 2, 2021
FIG. 2. Comparison of the output activation $p_{\mathrm{out}} = |\langle \psi_w | \psi_i \rangle|^2$ among the exact (hypergraph states routine), global ($n = 3$), and local ($n = 2$) approximate implementations of $U_w$. The inset shows the general mapping of any 16-dimensional binary vector $\mathbf{b}$ onto the $4 \times 4$ binary image (b) and the cross-shaped $\mathbf{w}$ used in this example (c). The selected inputs on which the approximations are tested were chosen to cover all the possible cases for $p_{\mathrm{out}}$ and are labeled with their corresponding integer $k_i$ (see the main text).
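The activation compared in Fig. 2 can be computed classically for small inputs. The sketch below assumes the sign encoding of the reviewed exact model, in which each binary entry $b_j$ becomes an amplitude $(-1)^{b_j}/\sqrt{m}$ of an equally weighted real state, and it assumes a most-significant-bit-first convention for turning the integer labels $k_i$ into 16-bit vectors; the bit ordering, the helper names, and the example weight pattern are hypothetical. Under these assumptions, $p_{\mathrm{out}} = |\langle \psi_w|\psi_i\rangle|^2 = (\mathbf{w}\cdot\mathbf{i})^2/m^2$ with $m = 16$.

```python
import numpy as np

M = 16  # number of binary entries, displayed as a 4 x 4 image

def bits_from_label(k, m=M):
    """Hypothetical convention: binary digits of integer k, most significant first."""
    return np.array([(k >> (m - 1 - j)) & 1 for j in range(m)])

def encode(bits):
    """Sign encoding: b_j in {0, 1} -> amplitude (-1)**b_j / sqrt(m)."""
    return (-1.0) ** bits / np.sqrt(len(bits))

def activation(k_i, k_w, m=M):
    """p_out = |<psi_w|psi_i>|^2 for input and weight vectors given as integer labels."""
    psi_i = encode(bits_from_label(k_i, m))
    psi_w = encode(bits_from_label(k_w, m))
    return float(np.dot(psi_w, psi_i) ** 2)

if __name__ == "__main__":
    k_w = 0b0100_1111_0100_0100  # hypothetical cross-like pattern, not necessarily Fig. 2's w
    print(bits_from_label(k_w).reshape(4, 4))
    for k_i in (k_w, k_w ^ 0xFFFF, 0, 12345):
        print(k_i, activation(k_i, k_w))
```

An input equal to the weight pattern, or to its bitwise complement, gives $p_{\mathrm{out}} = 1$, because the global sign of the state does not affect the squared overlap.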

FIG. 3. Optimization of the global unitary with nearest neighbor entanglement for three different structures differing in the number of entangling blocks $n$. The cost function is $1 - |\langle 11\ldots1 | V(\boldsymbol{\theta}) | \psi_w \rangle|^2 = 1 - F(\boldsymbol{\theta})$ [see (11)]. Only for $n = 3$ does the learning model have enough expressibility to reach a good final fidelity. The classical optimizer used in this case was COBYLA [44].

FIG. 5. Final fidelities for different structures of the local variational learning model with a nearest neighbor entangler, for the case of $N = 5$ qubits. Similarly to the case with $N = 4$ qubits portrayed in Fig. 4, the most depth-efficient structure is the one with a constantly decreasing number of cycles.

FIG. 6. Number of iterations of the classical optimizer needed to reach a fidelity of $F = 95\%$. Each point in the plot is obtained by running the optimization procedure ten times and then evaluating the mean and standard deviation (shown as error bars). All results refer to exact simulations of the quantum circuits, without statistical measurement sampling or device noise, performed with the Qiskit statevector_simulator.

FIG. 7. Optimization of the cost functions for the (a) local and (b) global case in the presence of measurement noise for $N = 5$ qubits. In each panel, we plot the mean values averaged over five runs of the simulation; the shaded colored areas denote one standard deviation. The number of measurement repetitions in each simulation was 1024. The final fidelities at the end of the training procedure were $F_{\mathrm{local}} = 0.87 \pm 0.02$ and $F_{\mathrm{global}} = 0.89 \pm 0.02$. Note the different ranges of the horizontal axes. (a) Optimization of the local cost functions associated with $V_j(\boldsymbol{\theta}_j)$ [see (13)], plotted with different colors for clarity. The vertical dashed lines denote the end of the optimization of one layer and the start of the optimization of the following one. (b) Optimization of the global cost function associated with $V(\boldsymbol{\theta})$ in (11).

FIG. 8. Scaling of the circuit depth for the implementation of $U_w$, computed with Qiskit. The labels local and global refer to the local and global variational approaches, while a2a and nn refer to the all-to-all and nearest neighbor entangling schemes, respectively. The numbers of ansatz cycles used for the global ($n$) and local/qubit-by-qubit (n ) variational constructions, for each entangling structure, are increased with the number of qubits up to the minimum value guaranteeing an approximation fidelity above 98%.