Experimental Evaluation of Quantum Machine Learning Algorithms

Machine learning and quantum computing are both areas that have made considerable progress in recent years. The combination of these disciplines holds great promise for both research and practical applications. Recently, there have also been many theoretical contributions on quantum machine learning algorithms, with experiments performed on quantum simulators. However, most questions concerning the potential of machine learning on quantum computers remain unanswered, for example: How well do current quantum machine learning algorithms work in practice? How do they compare with classical approaches? Moreover, most experiments use different datasets, and hence it is currently not possible to systematically compare different approaches. In this paper we analyze how quantum machine learning can be used for solving small, yet practical problems. In particular, we perform an experimental analysis of kernel-based quantum support vector machines and quantum neural networks. We evaluate these algorithms on five different datasets using different combinations of quantum feature maps. Our experimental results show that quantum support vector machines outperform their classical counterparts on average by 3 to 4% in accuracy, both on a quantum simulator and on a real quantum computer. Moreover, quantum neural networks executed on a quantum computer further outperform quantum support vector machines on average by up to 5% and classical neural networks by 7%.


I. INTRODUCTION
Hardly any other field of research in computer science has made such rapid progress in recent years as machine learning. It is used successfully in various areas both in research and in industry [18]. However, there are also limits and unsolved problems in practical applications due to the enormous computing resource requirements of large machine learning algorithms such as transformer-based language models [29]. Moreover, machine learning methods are often complex and based on large amounts of data. Therefore, depending on the task, the algorithms can become extremely computationally intensive.
A novel type of computer hardware, quantum computers, promises considerable speed-ups so that these algorithms become useful for a broad class of users [5], [34]. Moreover, companies such as IBM and Amazon already provide public access to quantum computers via Python interfaces [1], [11]. This allows active research in quantum computing also for small to medium-sized research institutions or companies that do not have the computing resources of large corporations.
The associate editor coordinating the review of this manuscript and approving it for publication was Li He.
The field of quantum machine learning has gained considerable attention in the last years [4], [7], [10], [14]. However, it is still relatively unclear what kind of problems can be solved practically today and which ones remain only of theoretical nature.
In this paper we will perform an experimental evaluation of quantum support vector machines (QSVM) as well as quantum neural networks (QNN) and compare them against their classical counterparts. In our first set of experiments we will evaluate kernel-based SVMs [14]. Classical kernel-based SVMs have been studied well and have been widely applied. However, the classical approaches suffer in situations where the feature space becomes large and the kernel functions become computationally expensive to estimate. These limitations can be overcome by using quantum algorithms that enable the exploitation of an exponentially large quantum state space through controllable entanglement and interference.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition, we study various implementations of QNNs based on different quantum circuits. One of the open research questions is how to design optimal quantum circuits, both for QSVMs and QNNs, such that the quantum algorithm shows the best learning behavior for practical machine learning problems. In order to address this question, we will perform an experimental evaluation of kernel-based QSVMs as well as QNNs for classification problems using five different datasets.
This paper makes the following contributions:
• We perform a detailed experimental evaluation of classical kernel-based SVMs and compare the accuracy against QSVMs running both on a quantum simulator and a real, publicly available quantum computer. We also compare the performance of classical neural networks against quantum neural networks.
• Our experimental evaluation on five different datasets shows that QSVMs outperform their classical counterparts on average by 3 to 4%. Moreover, QNNs further outperformed QSVMs by up to 5%.
• Further comparisons with classical neural networks demonstrate that QNNs also outperform those whilst using far fewer parameters, which has also been confirmed in comparable experiments [9].
• These results demonstrate that quantum computing can be successfully applied for small-scale machine learning problems in practice already today.

II. QUANTUM COMPUTING FOUNDATIONS
Quantum computing is a new computational paradigm based on the fundamental principles of quantum mechanics. The major concepts that can be leveraged for performing calculations are superposition and entanglement, which basically means that quantum computation does to information what traditional quantum mechanics does to elementary particles and photons: it characterizes these fundamental entities by wave- and particle-like aspects. We will briefly explain these concepts below.
In a classical computer, information is stored in bits whose states can either be 0 or 1. In a quantum computer, the information is stored as qubits (quantum bits), which can represent either the values 0 and 1 or a linear combination of both. Expressed in mathematical terms, a qubit is a vector of length 1 in a two-dimensional complex Hilbert space H_2 with basis |0⟩, |1⟩. A general qubit |q⟩ is given by |q⟩ = c_0|0⟩ + c_1|1⟩ with c_0, c_1 ∈ ℂ and |c_0|² + |c_1|² = 1. Writing c_n = a_n + i·b_n gives four real parameters; the normalization condition removes one of them, so a qubit is characterized by three real numbers.
Classical bits can be combined into registers, which contain bit sequences. In contrast, the state of a quantum register of length n can be a linear combination of all possible bit sequences of length n. Expressed in mathematical terms, a quantum register is a state in the tensor product of n two-dimensional Hilbert spaces, i.e. a Hilbert space H_{2^n} of dimension 2^n. The basis vectors can be written as |b_1 b_2 … b_n⟩ with b_k ∈ {0, 1} (Equation 1). A general vector (or state, as it is called in quantum mechanics) is a linear combination of these basis states: |Q⟩ = Σ c_{b_1…b_n} |b_1 b_2 … b_n⟩, where the sum runs over all bit sequences (Equation 2). Quantum mechanics requires that the state |Q⟩ is normalized: Σ |c_{b_1…b_n}|² = 1 (Equation 3). These linear combinations are called superpositions. There are two types of processes one can apply on such quantum registers:
• Quantum dynamics, which are unitary transformations (rotations and reflections) of |Q⟩. These unitary transformations are reversible and fully deterministic.
• Measurements: These are projections combined with normalizations. For our purposes this means that a measurement M maps the state |Q⟩ of a quantum register stochastically onto one of the basis states |b 1 b 2 . . . b n ⟩.
The probability for this to happen is given by P(b_1 … b_n) = |c_{b_1…b_n}|² (Equation 4). A measurement is irreversible and not injective, which implies that in a measurement one loses information. It helps to imagine a quantum computation as a series of unitary operations (true quantum operations) finalized with one measurement (more involved schemes are used, though). Importantly, rotating |Q⟩ affects ''all basis states at once'', i.e. all basis vectors in Equation 2 are manipulated in parallel. Thus, a quantum computer can be viewed as a highly parallel computing device.
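To make the normalization and measurement rules above concrete, here is a small pure-Python sketch (our own illustration, not from the paper) that builds a two-qubit register as an amplitude vector and checks that the measurement probabilities |c|² sum to one:

```python
import math

# Single qubit |q> = c0|0> + c1|1>, stored as a pair of complex amplitudes.
c0, c1 = 1 / math.sqrt(2), 1j / math.sqrt(2)
assert abs(abs(c0) ** 2 + abs(c1) ** 2 - 1.0) < 1e-12   # normalization (Equation 3)

def tensor(a, b):
    """Kronecker product of two amplitude vectors: combine qubits into a register."""
    return [x * y for x in a for y in b]

# Two-qubit register: (c0|0> + c1|1>) tensor |0>, i.e. 2**2 = 4 amplitudes.
reg = tensor([c0, c1], [1.0, 0.0])
probs = [abs(c) ** 2 for c in reg]    # measurement probability per basis state
assert abs(sum(probs) - 1.0) < 1e-12
```

A measurement of `reg` yields basis state |00⟩ or |10⟩, each with probability 0.5, matching the |c|² rule.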
The concept of entanglement implies that the combined state of qubits contains more information than the qubits have independently. This is easy and worthwhile to understand. We explain it in some detail, because a widespread misconception of quantum computing sees its advantages solely with respect to the mentioned ''quantum parallelism''.
We start with the observation that a single qubit is determined by three real numbers. A collection of n independent qubits is therefore characterized by 3n real values. A superposition is given by 2 n complex numbers being subject to the normalization condition in Equation 3. This implies that the state of |Q⟩ is determined by 2 · 2 n − 1 real numbers. For the case of n = 2, this means that two independent qubits are characterized by six parameters, whereas the state of a quantum register of length n = 2 is determined by seven numbers.
Mathematically, this means that in general the state of a quantum register |Q⟩ cannot be written as a tensor product of two independent qubits q_A, q_B. The tensor product of two independent qubits is |q_A⟩ ⊗ |q_B⟩ = (a_0|0⟩ + a_1|1⟩) ⊗ (b_0|0⟩ + b_1|1⟩) = a_0 b_0|00⟩ + a_0 b_1|01⟩ + a_1 b_0|10⟩ + a_1 b_1|11⟩ (Equation 5). The state of a general two-qubit register, in contrast, is given by |Q⟩ = c_00|00⟩ + c_01|01⟩ + c_10|10⟩ + c_11|11⟩ (Equation 6). For a general choice of c_00, c_01, c_10, c_11 with |c_00|² + |c_01|² + |c_10|² + |c_11|² = 1, no such factorization exists. Physically, there is in general no way to interpret the entangled state of a quantum register in terms of a collection of individual qubits.
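The factorizability of a two-qubit state can be checked numerically: c_00|00⟩ + c_01|01⟩ + c_10|10⟩ + c_11|11⟩ is a product of two independent qubits exactly when c_00·c_11 − c_01·c_10 = 0 (the 2×2 coefficient matrix has rank 1). A short illustrative sketch:

```python
import math

def is_product_state(c00, c01, c10, c11, tol=1e-12):
    """True iff the two-qubit state factors as a tensor product of single qubits."""
    return abs(c00 * c11 - c01 * c10) < tol

s = 1 / math.sqrt(2)
print(is_product_state(1, 0, 0, 0))   # |00> is a product state
print(is_product_state(s, 0, 0, s))   # the Bell state (|00> + |11>)/sqrt(2) is entangled
```

The Bell state thus carries correlations that no pair of individually described qubits can reproduce.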
In order to manipulate qubits, quantum circuits are used. These circuits are similar to their classical counterparts but they contain additional logical operators and gates. One of the gates is the Hadamard gate which brings qubits in a superposition.
Another important type of operator is the controlled Pauli-gate. Single qubits can be visualized as points on a two-dimensional sphere, the so-called Bloch sphere. The Bloch sphere is embedded into a three-dimensional space with coordinate axes x, y, z. Note that these coordinates have no direct relation to the actual physical space, but are primarily a consequence of a specific representation of qubits (a more detailed historical analysis of the origin of the Bloch sphere would tell a somewhat different story, which is, however, not of relevance in the context here).
Controlled manipulation of qubits can then be understood as rotations around the x, y, z-axis, and consequently, these gates are also called controlled X-, Y-and Z-gates. As it turns out, since these rotations of one qubit depend on the state of another qubit, the application of such a controlled gate leads to quantum entanglement. Mathematically, all quantum gates can be considered as unitary matrix operations.
An example of a simple quantum circuit is given in Figure 1. The circuit consists of three qubits q_0, q_1 and q_2. First, all qubits are initialized with the ground state |0⟩. Then, the Hadamard gate is applied on qubit q_1, followed by a controlled X-gate operation between qubits q_1 and q_2 and a controlled X-gate operation between qubits q_0 and q_1, followed by a Hadamard gate applied on qubit q_0. Finally, the qubits q_0 and q_1 are measured.
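To illustrate how these gates act on a register, the following pure-Python sketch (a simplified two-qubit example, not a reproduction of Figure 1) applies a Hadamard gate and then a controlled X-gate, producing an entangled Bell state:

```python
import math

def apply_h(state, q, n):
    """Apply a Hadamard gate to qubit q (0-indexed) of an n-qubit amplitude list."""
    s, mask, out = 1 / math.sqrt(2), 1 << (n - 1 - q), state[:]
    for i in range(len(state)):
        if not i & mask:
            a0, a1 = state[i], state[i | mask]
            out[i], out[i | mask] = s * (a0 + a1), s * (a0 - a1)
    return out

def apply_cx(state, ctrl, tgt, n):
    """Apply a controlled X-gate: flip qubit tgt wherever qubit ctrl is 1."""
    cm, tm, out = 1 << (n - 1 - ctrl), 1 << (n - 1 - tgt), state[:]
    for i in range(len(state)):
        if i & cm and not i & tm:
            out[i], out[i | tm] = state[i | tm], state[i]
    return out

# Hadamard on q0 creates a superposition; the controlled X then entangles q0 and q1.
state = [1.0, 0.0, 0.0, 0.0]                    # two qubits initialized to |00>
state = apply_cx(apply_h(state, 0, 2), 0, 1, 2)
# state is now the Bell state (|00> + |11>)/sqrt(2)
```

The same bit-masking pattern extends to three qubits, so the Figure 1 circuit could be simulated the same way.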
III. RELATED WORK
Let us start with analyzing QSVMs in more detail. [14] propose a variational quantum circuit to classify data similarly to SVMs. This approach uses a variational circuit that generates a separating hyperplane in the quantum feature space. A further approach proposed by the same authors is called quantum kernel estimator, which is used to estimate the kernel function and optimize a classifier. To evaluate the approach, a synthetic data set containing 20 data points per label is used. In our experiments we also use this data set.
One of the first studies to demonstrate that quantum neural networks show an advantage over their classical counterparts is presented in [2]. The authors evaluate two different feature maps for the quantum neural network. One feature map is based on the circuit introduced in [14]. The second circuit uses parameterized RY-gates, which are followed by CNOT-gates that are applied between every pair of qubits in the circuit. Finally, another set of parameterized RY-gates is used. The QNN is evaluated both on a quantum simulator and on a real quantum computer using the Iris data set that we also use in our experiments.
A quanvolutional neural network architecture is proposed in [15]. The basic idea is to replace a convolutional filter of a CNN with a quanvolutional layer that transforms input data with a random quantum circuit. The approach is evaluated with image data on a quantum simulator. Our approach, however, is evaluated on various numerical data sets both on a quantum simulator as well as on real quantum hardware.
[8] introduce a hybrid classical-quantum approach called Quantum Long Short-Term Memory. The idea is to replace parts of a classical RNN with a variational quantum circuit. The approach is evaluated on a quantum simulator but not on real quantum hardware.
In this paper we evaluate existing approaches based on QSVMs and QNNs. Previous algorithms have mostly been evaluated either only on a quantum simulator or on a single data set. Hence, comparability of the algorithms as well as generalizability of the approaches to other data sets has not been demonstrated in depth. In our paper, we evaluate various quantum machine learning approaches on 5 different data sets both on a quantum simulator as well as on real quantum hardware.
The major question we study in this paper is how well these algorithms perform on small, yet real machine learning problems using publicly available quantum hardware.

IV. MACHINE LEARNING APPROACHES
A. KERNEL-BASED SVMs: CLASSICAL APPROACH
A feature function φ(⃗x) is a mapping of a data point ⃗x into a feature space of higher dimension. This is advantageous for classification because it opens up more possibilities for a hyperplane to separate data points of different classes.
The so-called kernel trick allows re-writing the linear decision function used by SVMs in terms of dot products between data points. In combination with a feature function, the dot product can be substituted with a kernel function k(⃗x, ⃗x^(i)) = φ(⃗x) · φ(⃗x^(i)) for a given training data point ⃗x^(i) and a data point ⃗x for which the decision is made [13]. The kernel function k(⃗x, ⃗x^(i)) introduces a shortcut to the explicit calculation of the dot product between feature vectors, which can be of infinite dimension. Furthermore, the resulting decision function is linear in the feature space.
The term Σ_i α_i k(⃗x, ⃗x^(i)) of the decision function sums over kernel evaluations; the values k(⃗x^(i), ⃗x^(j)) for all pairs of training data points form the kernel matrix, which represents the similarity between the training data points.
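As a concrete illustration of the kernel trick (a generic textbook example, not one of the paper's kernels): the quadratic kernel k(x, y) = (1 + x·y)² in two dimensions corresponds to an explicit six-dimensional feature map φ, and both routes give the same value without ever materializing the feature vectors:

```python
import math

def phi(x):
    """Explicit feature map for the quadratic kernel k(x, y) = (1 + x.y)**2 in 2D."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]

def kernel(x, y):
    """The kernel trick: evaluate the feature-space dot product implicitly."""
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

x, y = [0.3, -1.2], [2.0, 0.5]
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))   # dot product in feature space
implicit = kernel(x, y)                                  # same value, no feature vectors
```

For higher-degree or rbf kernels the explicit feature space grows huge or infinite, which is exactly why the implicit route matters.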

B. KERNEL-BASED SVMs: QUANTUM APPROACH
Let us now discuss how classification can be implemented on a quantum computer. In principle, we need the following steps:
• Transform the classical data points into quantum data points with a quantum circuit.
• Use a parameterized quantum circuit to classify the data.
• Measure the output.
• Send the results of the quantum kernel to a classical SVM for final classification.
These steps comprise a so-called variational quantum classifier [31] leveraging parameterized quantum circuits. Since current quantum computers are still quite error-prone, a common approach is to implement one part of the end-to-end process on a quantum computer and the remaining parts on a classical computer. In particular, [14] suggest a quantum kernel estimator, where the kernel function is implemented as a quantum kernel, i.e. a quantum circuit, which translates classical data into quantum states via a quantum feature map, and then builds the inner product of these quantum states.
The inner product is used for further processing by the classical SVM. As a final step, the classification is performed by a kernel-based SVM on a conventional computer using the calculated kernel. In summary, the calculation of the kernel matrix is performed by a quantum algorithm, whereas the classical SVM algorithm is executed on a conventional computer.
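This quantum kernel estimator pipeline can be sketched classically: angle-encode each data point into a quantum state and take the squared inner product of the states as the kernel value. The encoding below is a simplifying assumption on our part (single RY rotations per feature, no entanglement), not the feature maps evaluated in the paper:

```python
import math

def angle_encode(x):
    """Angle encoding: RY(x_i)|0> per qubit gives (cos(x_i/2), sin(x_i/2));
    the register state is the tensor product of the single-qubit states."""
    state = [1.0]
    for xi in x:
        state = [a * b for a in state for b in (math.cos(xi / 2), math.sin(xi / 2))]
    return state

def quantum_kernel(x, y):
    """Fidelity kernel k(x, y) = |<psi(x)|psi(y)>|**2, computed classically here."""
    sx, sy = angle_encode(x), angle_encode(y)
    return sum(a * b for a, b in zip(sx, sy)) ** 2

X = [[0.1, 0.5], [0.4, 0.2], [1.0, 0.9]]
gram = [[quantum_kernel(a, b) for b in X] for a in X]   # the kernel matrix
```

The resulting Gram matrix is exactly what gets handed to the classical SVM, e.g. via scikit-learn's `SVC(kernel='precomputed')`; on real hardware the inner products would come from circuit executions instead.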
Let us describe the QSVM approach more formally. According to Thomsen [30], given an already classified training data point ⃗x^(i) and a data point ⃗x to be classified, the corresponding decision function uses the kernel function to classify ⃗x:
f(⃗x) = sign( Σ_{i=1}^{M} a_i k(⃗x, ⃗x^(i)) + b ),
where the a_i and b are training parameters and M refers to the size of the data set. The quantum analogue of this kernel function, with an exponentially large space of density matrices S(2^q) spanned across q qubits as the feature space, is k(⃗x, ⃗x') = |⟨φ(⃗x)|φ(⃗x')⟩|² (Equation 7) [30].
As previously stated, the quantum circuit, which performs the transformation into the quantum space, is called quantum feature map. Typical quantum feature maps are the Z-featuremap, the ZZ-feature-map and the Pauli-feature-map [14].
An example of the Pauli-feature-map, which is the most generic feature map, is shown in Figure 2. It consists of two different quantum gates, namely the Hadamard gate, which puts qubits in superposition, and a parameterized P-gate (phase gate). In addition, we can see the controlled X-gate (''+'') which enables entanglement between qubits.
The circuit can also be stacked, i.e. repeated, in order to design even more complex feature maps, resulting in a quantum circuit with a larger depth. However, due to the limitations of current quantum devices, larger quantum circuits often lead to a higher error rate. Hence, designing optimal quantum kernels for SVMs is still an unsolved research problem. The goal of this paper is to evaluate various feature maps for solving small, yet practical machine learning problems using QSVMs.

C. QUANTUM NEURAL NETWORK
The design of the quantum neural network is inspired by previous work of Havlicek et al. [14] and Thomsen [30]. The general architecture of the quantum circuit is shown in Figure 3a and consists of three parts. The first part is the feature map U(⃗x), which is used to encode the input features of the used dataset into quantum states. The second part is the variational model W(θ), which evolves the quantum states of the system using trainable parameters θ. The final part consists of the measurement of the final states. Figure 3b shows that the variational model can be repeated n times, which is similar to the layers of a classical neural network. The larger the quantum circuit, the better a function can be approximated and hence the better the generalization of the machine learning algorithm should be. At the same time, current quantum hardware is limited in its size and stability: available systems do not allow for the creation of longer circuits with repeated variational models, and the length of a circuit should be minimized because noise affecting the quantum system during calculations leads to instabilities and influences the resulting measurements, which can falsify results. With this in mind, the circuits in this paper are limited to a single instance of the variational model.
The parameters θ of the variational model are optimized using classical optimizers leveraging classical hardware.
Note that there is a fundamental difference between the QSVM we discussed previously and the QNN we described here. In the case of the QSVM, only the kernel is implemented using a quantum circuit while the SVM itself is implemented classically. As for the QNN, the whole neural network is implemented using a quantum circuit and only the optimization of the parameters is implemented classically.
We will now describe the implementation of the feature map and the variational model in more detail.

1) FEATURE MAP
The main goal of the feature map is to encode the classical features of our dataset into the Hilbert space H in which the quantum system acts. We apply the circuit U(⃗x) to the zero state |0⟩, which defines the feature map as described in Equation 8: |ψ(⃗x)⟩ = U(⃗x)|0⟩, where ⃗x is defined according to Equation 7 previously introduced in Section IV-B.
Among a multitude of embedding techniques [25], we have chosen angle encoding to encode the classical data into quantum states. Whilst angle encoding is not optimal as it requires n qubits to represent n-dimensional data, it is efficient regarding operations and directly useful for processing data in quantum neural networks [19]. Weigold et al. [32] state that only single-qubit rotations are needed for the state preparation routine, which is highly efficient and can be done in parallel for each qubit. Figure 4 shows a circuit with one qubit per feature. For instance, qubit q 0 represents feature x 0 which is encoded with a rotation around the y-axis where the angle is proportional to the value of feature x 0 .

2) VARIATIONAL MODEL
After the classical features are encoded as quantum states, these quantum states can be further evolved in the variational model according to Equation 9: |ψ(⃗x, θ)⟩ = W(θ)|ψ(⃗x)⟩.
The trainable weights are embedded in the variational model W (θ) which can be grouped into n layers, where each layer consists of RY -, RX -, and RZ -rotation gates.
An example of a variational model with three parameters is shown in Figure 5. Note that the first qubit q_0 is rotated by angle 2θ_1 around the y-axis. Next, qubit q_0 and qubit q_1 are entangled with a controlled R_y-gate with the angle parameter 2θ_2, followed by a rotation around the y-axis with the angle parameter 2θ_3.
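A pure-Python sketch of such a variational model, loosely following the Figure 5 description (the gate ordering and parameter convention are assumptions on our part, not the paper's exact circuit):

```python
import math

def ry_apply(state, q, n, theta):
    """Apply RY(theta) to qubit q of an n-qubit amplitude list (real amplitudes)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask, out = 1 << (n - 1 - q), state[:]
    for i in range(len(state)):
        if not i & mask:
            a0, a1 = state[i], state[i | mask]
            out[i], out[i | mask] = c * a0 - s * a1, s * a0 + c * a1
    return out

def cry_apply(state, ctrl, tgt, n, theta):
    """Controlled RY: rotate qubit tgt only where qubit ctrl is 1."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    cm, tm, out = 1 << (n - 1 - ctrl), 1 << (n - 1 - tgt), state[:]
    for i in range(len(state)):
        if i & cm and not i & tm:
            a0, a1 = state[i], state[i | tm]
            out[i], out[i | tm] = c * a0 - s * a1, s * a0 + c * a1
    return out

# W(theta) on two qubits: RY(2*theta_1) on q0, CRY(2*theta_2) from q0 to q1,
# then RY(2*theta_3) on q1. The theta values here are arbitrary placeholders.
theta = [0.3, 1.1, 0.7]
state = [1.0, 0.0, 0.0, 0.0]
state = ry_apply(state, 0, 2, 2 * theta[0])
state = cry_apply(state, 0, 1, 2, 2 * theta[1])
state = ry_apply(state, 1, 2, 2 * theta[2])
```

Because every gate is a rotation, the state stays normalized; training then amounts to adjusting `theta` against a classical loss.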

3) DECISION FUNCTION
Next, the resulting state of Equation 9 needs to be measured. Since we use our quantum circuit as a binary classifier, a bitstring z ∈ {0, 1}^q is measured, which is associated with a class membership via a Boolean function f: {0, 1}^q → {−1, +1}, typically the parity of z.
The classification is re-run multiple times (R shots), where R is the number of re-runs or shots. The resulting measurement outcome z is thus probabilistic, and we assign the label of the outcome with the largest probability. The probability of measuring label y ∈ {+1, −1} is given by P(y) = Σ_{z: f(z)=y} |⟨z|ψ(⃗x, θ)⟩|², where F is the diagonal operator with entries F_{zz} = f(z). Since F only has eigenvalues of −1 and +1, the expectation value is ⟨F⟩ = P(y = +1) − P(y = −1).
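The shot-based label assignment can be sketched as follows; the parity mapping and the simulated uniform counts are illustrative assumptions, not measurements from the paper's circuits:

```python
import random
from collections import Counter

def parity_label(bitstring):
    """Map a measured bitstring z to a class label via the parity of z."""
    return +1 if bitstring.count("1") % 2 == 0 else -1

# Hypothetical measurement outcomes over R = 1000 shots (uniform noise here,
# standing in for the real distribution |<z|psi(x, theta)>|^2).
random.seed(0)
shots = [random.choice(["00", "01", "10", "11"]) for _ in range(1000)]
votes = Counter(parity_label(z) for z in shots)
prediction = max(votes, key=votes.get)   # the label with the largest probability
```

With a trained circuit the distribution over bitstrings is skewed toward one parity class, so the majority vote converges to the label with higher probability as R grows.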

V. EXPERIMENTS
In our first set of experiments we evaluate the performance of classical kernel-based support vector machines and compare them against QSVMs. First, we execute the QSVMs on qasm_simulator, a Python-based quantum simulator of IBM Qiskit [3] accessed via BasicAer (https://qiskit.org/documentation/apidoc/providers_basicaer.html). Afterwards, we execute the experiments on ibmq_belem, a real quantum system providing 5 qubits [16], accessed via IBMQ (https://qiskit.org/documentation/apidoc/ibmq_provider.html). In our second set of experiments we compare the accuracy of classical neural networks against quantum neural networks.
The major questions we address with these experiments are as follows:
• Which quantum circuit yields the best performance for a given dataset?
• Can we establish a clear strategy for designing quantum circuits?
• Does the quantum implementation of the algorithm have an advantage over the classical counterpart?

A. DATASETS
We will now describe the datasets that we used for our experiments. In particular, we used five datasets with varying degrees of difficulty, which we estimated from the order of the separating hyperplane in the original space. Each of these datasets has one hundred data points and two classes containing the same number of data points. The chosen encoding strategy assumes one qubit per feature. Since our quantum computer provides a maximum of five qubits, we reduced the number of features to a maximum of five where necessary. For training and testing we performed an 80:20 split. Moreover, we performed a 10-fold cross validation for all experiments and report the average results. An overview of the datasets is given in Table 1.
Iris dataset. This widely used flower dataset was loaded via the Python library scikit-learn. For the experiment, the data points were selected from the Iris-Setosa and Iris-Virginica classes. A data point has four numerical features.
Rain dataset. This dataset is taken from kaggle.com and contains about ten years of daily weather observations from many locations in Australia. Incomplete data entries were removed, and the following five features were selected: MinTemp, Humidity9am, WindSpeed3pm, Pressure9am, WindDir9am. The attribute RainTomorrow serves as the class label. Its categorical values No and Yes were mapped to the numbers 0 and 1, respectively.
Vlds dataset. This dataset was generated using a dataset generator provided by scikit-learn. The characteristics of the features are shown in Figure 6.
Custom dataset. The dataset consists of data points with two features, generated using the function numpy.random.default_rng. 7 In a 2D representation, the data points are part of a square. The points on the diagonals are labelled as 0.0 (see Figure 7).
Adhoc dataset. This dataset is artificially generated and described in [14]. It provides a complete classification of data points using the feature map configurations of the quantum kernel estimator approach chosen in [14]. Qiskit provides an Adhoc dataset generator, 8 allowing the generation of data points with three features. The characteristics of the features are shown in Figure 8.
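The evaluation protocol described above (100 points per dataset, 10-fold cross validation) can be sketched with a plain-Python index generator; this is an illustration, not the paper's exact implementation:

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k disjoint folds covering all indices
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 100 data points, as in the datasets above: each fold holds out 10 points.
splits = list(kfold_indices(100, k=10))
```

Each of the 10 splits trains on 90 points and tests on 10, and the reported accuracy is the average over the splits.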

B. CLASSICAL KERNEL-BASED SUPPORT VECTOR MACHINES
We first evaluate the accuracy of classical kernel-based SVMs provided by sklearn.svm.SVC with default hyperparameter configurations, using four different kernels, namely linear, polynomial, radial basis function (rbf) and sigmoid. We use these kernels of different complexity to be able to adapt to the datasets, which are also of different complexity.
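This setup can be sketched with scikit-learn as follows; the synthetic data from `make_classification` is a stand-in, since the paper's datasets are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data: 100 points, 4 features, 2 balanced classes.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# One SVC per kernel, default hyperparameters, scored with 10-fold cross validation.
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel)
    scores[kernel] = cross_val_score(clf, X, y, cv=10).mean()
```

Swapping in a precomputed quantum kernel later only requires `SVC(kernel='precomputed')` and a Gram matrix in place of `X`.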
The results are shown in Table 2. As we can see, for the Iris dataset, all kernels have a perfect accuracy score of 1.00. For the Rain dataset, the rbf kernel performs best with an accuracy of 0.77. For the Vlds dataset, all kernels behave similarly with a slight advantage for the linear kernel. For the custom dataset, again the rbf kernel performs best. Finally, for the Adhoc dataset, which is the most complex one, the linear kernel performs best with an accuracy score of 0.56 followed by sigmoid and rbf. In general, the rbf kernel appears to be the most robust one across all five datasets.
7 https://numpy.org/doc/stable/reference/random/generator.html
8 https://qiskit.org/documentation/stable/0.26/_modules/qiskit/ml/datasets/ad_hoc.html#ad_hoc_data

C. QUANTUM SUPPORT VECTOR MACHINES ON QUANTUM SIMULATOR
Let us now analyze the performance of QSVMs on a quantum simulator. Figure 9 shows the accuracy of QSVMs using three different feature maps and four different entanglement strategies (none, linear, circular, full). To analyze the effect of the circuit depth on the accuracy, we set the circuit depth to either 1, 2, 4 or 8.
We first consider the results for the Iris dataset. Figure 9a shows that with the Z-feature-map of depth 1 and 4, as well as with the ZZ-feature-map and the Pauli-feature-map of depth 1, an accuracy of 100% can be achieved. For all other feature maps with a depth above 2, we can observe a relatively high variance, which might be due to the characteristics of the different data samples in the 10-fold cross validation. We also notice that introducing entanglement harms the performance of the algorithm.
For the Rain dataset (Figure 9b) we can clearly see that the Z-feature-map of depth 1 outperforms all other feature maps. The same pattern is observed with the Vlds dataset.
For the Custom and the Adhoc datasets, which are considered the most complex datasets, we can again see that the Z-feature-map performs best. Moreover, it can be seen that increasing the depth slightly improves the accuracy.
In summary, we can observe that the Z-feature-map performs best across all five datasets. We can also see that a greater depth of the feature map circuits can have a positive effect on the accuracy for more complex datasets. Finally, the more complex ZZ-feature-map and Pauli-feature-map have a negative impact on the accuracy. The latter also suggests that these algorithms cannot take advantage of entanglement. Hence, additional studies are required to understand these phenomena in more detail.
FIGURE 9. Accuracy of different quantum support vector machines using three different feature maps (quantum kernels) with four different entanglement strategies on a quantum simulator for five different datasets.

D. QUANTUM SUPPORT VECTOR MACHINES ON QUANTUM COMPUTER
Let us now evaluate the performance of quantum support vector machines on a real, publicly available quantum computer. The major question is if the algorithms still perform well on a quantum device or if the failure rates of the underlying quantum computer render these types of quantum machine learning algorithms impractical.
Our experimental results on a real quantum computer were similar to the ones on a quantum simulator. These results are extremely promising since current quantum computers are still very error-prone, especially for circuits with a large number of quantum gates. Hence, being able to run quantum machine learning algorithms on real quantum computers that outperform their classical counterparts is a very promising step for practical quantum computing.
In Table 3 we compare the best results of the classical kernel-based SVMs with the QSVMs on the quantum simulator as well as on the real quantum computer. As we can see, the QSVMs have a higher average accuracy over all five datasets than the classical counterparts and thus outperform them by 4% and 3%, respectively.

E. QUANTUM NEURAL NETWORKS
We will now evaluate the performance of quantum neural networks. Recall that we use a hybrid approach where the neural network is implemented as a variational quantum circuit and the optimizer is implemented using classical hardware.
In our experiments we use the following five quantum circuits. To increase expressiveness in different ways [28], all circuits use at least one of, or a combination of, the RY-, RX-, and RZ-gates. The only circuit without entanglement is q_circuit_04. All other circuits use entanglement by including either a CX-, CZ- or CRZ-gate in their variational model, inspired by [26].

1) q_circuit_01
The circuit in Figure 10 is built using circular entanglement with RY-gates followed by parameterized entangled CRY-gates, consisting of 1 layer.

2) q_circuit_02
The circuit in Figure 11 is built using RY-gates followed by circular entangled parameterized CRY-gates depicted with 1 layer.

3) q_circuit_03
The circuit in Figure 12 is built using circular entanglement with RY-gates followed by entangled CZ-gates and is depicted with 1 layer.

4) q_circuit_04
The circuit in Figure 13 is built using RX-gates followed by RY-gates and final RZ-gates. This circuit is without entanglement.

5) q_circuit_05
The circuit in Figure 14 is built using circular entanglement with RX-gates followed by RY-gates and final RZ-gates entangled by CX-gates and is depicted with only 1 layer.

6) CLASSICAL OPTIMIZERS
Inspired by the work of Pellow-Jarman et al. [23], we selected the same four optimizers: AMSGRAD, SPSA, BFGS and COBYLA. For all optimizers we tuned the parameter settings ranging from 100 to 1500 iterations.
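To illustrate how such a gradient-free optimizer works, here is a minimal pure-Python SPSA sketch applied to a toy quadratic cost standing in for the QNN loss; the step-size constants are generic choices, not the paper's settings:

```python
import random

def spsa_minimize(cost, theta, iters=200, a=0.2, c=0.1, seed=0):
    """Minimal SPSA: perturb all parameters simultaneously with random +/-1 signs
    and estimate the gradient from only two cost evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iters + 1):
        ak, ck = a / k ** 0.602, c / k ** 0.101   # standard SPSA decay schedules
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        g = (cost(plus) - cost(minus)) / (2 * ck)
        theta = [t - ak * g * d for t, d in zip(theta, delta)]
    return theta

# Toy quadratic cost standing in for the QNN loss; minimum at theta = 0.
best = spsa_minimize(lambda t: sum(x * x for x in t), [1.0, -1.5, 0.5])
```

The two-evaluation gradient estimate is why SPSA is popular for quantum circuits, where every cost evaluation means re-running the circuit for many shots.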

7) RESULTS OF NEURAL NETWORKS
For the classical approach we implemented various fully-connected neural networks with 1, 2 and 3 hidden layers using PyTorch [22]. These neural networks were then passed to Ray Tune [20], which facilitates the search for good hyperparameters when optimizing neural networks. The best validation accuracy was selected out of 10 runs with the best hyperparameters. The input data was normalized in the same way as for the quantum neural network, so as to eliminate normalization as a deciding factor. As we can see in Table 4, the average accuracy of the best classical neural networks over all 5 datasets is 78%, which is similar to the performance of the classical SVM shown in Table 3.
We will now evaluate the performance of the quantum neural networks. Figure 15 shows the accuracy of the quantum neural networks with different classical optimizers on 5 datasets. Let us first analyze the performance on the Iris dataset. On the simulator we can see that except for QNNs using the SPSA-optimizers, all others receive a perfect score of 100% accuracy. On the quantum computer we can see a similar behavior.
For the rain dataset, the highest accuracy of 82% can be achieved with the AMSGRAD-optimizer on the quantum simulator. On the quantum hardware, the SPSA-optimizer shows a slight advantage followed by AMSGRAD.
For the Vlds dataset, the BFGS-optimizer shows the highest accuracy. For the custom dataset and the adhoc dataset the winner is COBYLA. In short, for all the different experiments there is no clear winning optimizer.
When looking at the quantum circuits, it also turns out that there is no clear winner, and there is a relatively high variance between the different circuits. The reason might be the variations of the dataset and the relatively small number of data records. Table 4 shows the performance of the best combination of quantum circuits and optimizers per dataset. On average, the accuracy over all 5 datasets is 85.8% on the quantum simulator and 84.7% on the quantum computer. The results of the QNN are 5% better than the results of the QSVM, which demonstrates the advantage of a fully quantum model over a hybrid quantum-classical implementation. Moreover, the quantum neural network executed on the quantum computer outperforms the classical neural network by 7%, even though the classical neural network is vastly more complex. In the case of the vlds dataset, hyperparameter optimization resulted in a neural network with 69,402 parameters, whereas the biggest quantum neural network has 15 parameters.

VI. CONCLUSION
In this paper we performed a detailed experimental evaluation of quantum support vector machines and quantum neural networks. Our experimental evaluation showed that QSVMs outperform their classical counterparts on average by 3 to 4% in terms of accuracy. We could also show that the quantum neural networks further outperformed the QSVMs by up to 5%.
Even though our experiments were only performed on relatively small datasets, these results demonstrate that quantum computing can already be successfully applied to small-scale machine learning problems in practice today. Given the tremendous progress in the development of quantum hardware, we expect that larger problem sizes can also be tackled in the near future. Whilst the quantum approaches are currently only usable for problems of a limited size, they outperform classical solutions on the same problems whilst being comparatively less complex.
Our current experiments showed that the best quantum kernel is based on the Z-feature-map, which does not use quantum entanglement. One of the open research questions is how to design quantum circuits such that they can take advantage of entanglement and thus harness the full power of quantum computing. Another open research question is how the analyzed algorithms perform on larger datasets. Larger, less error-prone quantum hardware might give more insights.