Quantum Embedding Search for Quantum Machine Learning

This paper introduces a novel quantum embedding search algorithm (QES, pronounced as "quest"), which searches for the optimal quantum embedding design for a specific dataset of interest. First, we establish the connection between the structures of quantum embeddings and the representations of directed multi-graphs, which yields a well-defined search space. Second, we introduce the entanglement level to reduce the cardinality of the search space to a feasible size for practical implementations. Finally, we mitigate the cost of evaluating the true loss function by using surrogate models via sequential model-based optimization. We demonstrate the feasibility of our approach on synthesized and Iris datasets; the experiments show empirically that the quantum embedding architectures found by QES outperform manual designs while achieving performance comparable to classical machine learning models.


INTRODUCTION
Quantum machine learning is viewed as a potential advance of quantum computing in the Noisy Intermediate-Scale Quantum (NISQ) era. With the availability of near-term quantum devices, quantum machine learning offers exciting potential advantages over its classical counterparts. The potential quantum advantage can be assessed by a geometric test over the input data space, followed by a complexity test for specific functions [1]. Although quantum machine learning models are often referred to as quantum neural networks, this terminology can be misleading to some extent. Classical neural networks can transform the original data space into a higher- or lower-dimensional space depending on the neural architecture. For example, state-of-the-art neural architectures tend to transform high-dimensional inputs such as images into lower-dimensional latent vectors. In contrast, quantum neural networks share the mathematical structure of kernel methods, in which input data is embedded into a high-dimensional quantum Hilbert space [2], [3], [4]. The quantum representations of the input data are the outcome of quantum embedding, which plays a crucial role in the performance of quantum classifiers [5]. Such quantum embeddings are quantum model functions, referred to as parameterized quantum circuits [6], quantum neural networks [7], [8], or variational circuits [9], [10]. Quantum embeddings are often designed manually for specific use cases, which requires extensive expert knowledge and computational resources. From the perspective of deep learning, classical embeddings transform inputs into deep representations in a latent space that commonly has lower dimensionality. For example, convolutional neural networks (CNNs) embed input images (high-dimensional inputs) into deep features (lower-dimensional representations in the latent space), which enables machine learning tasks such as classification or object detection. Variational quantum embeddings, on the other hand, also transform inputs into feature maps; however, their latent space is a high-dimensional Hilbert space. [5] shows that decision boundaries established in the Hilbert space correspond to complex decision boundaries in the input space.
This paper introduces an automated search algorithm that derives the optimal entangling layout for supervised quantum machine learning. First, the proposed work directly addresses ansatz optimization for emerging quantum machine learning by searching for the optimal entanglement layout of ansatz architectures. Encoding entanglement as genotype vectors allows us to leverage ML-based search algorithms for ansatz optimization, which results in well-performing quantum neural networks. Second, we introduce the entanglement level to reduce the cardinality of the search space to a feasible size for practical implementations. Finally, we mitigate the cost of evaluating the true loss function by using surrogate models via sequential model-based optimization. We demonstrate the feasibility of our approach on simulated and benchmark datasets, including the Iris, Wine and Breast Cancer datasets; the experiments show empirically that the quantum embedding architectures found by QES outperform manual entanglement designs in terms of predictive performance. Our main contributions are as follows.
1) We introduce an efficient encoding scheme that represents quantum embedding architectures as directed multi-graphs, which gives the quantum embedding search problem a well-defined search space. Moreover, we constrain the search space by the quantum entanglement level, which reduces its cardinality to a reasonable size for practical implementations. Formulating quantum entanglement as genotype vectors allows classical ML algorithms to address ansatz optimization problems efficiently.
2) We adopt sequential model-based optimization with the tree Parzen estimator (SMBO-TPE) [11] as the search strategy, in which (1) surrogate models approximate the actual loss function, which significantly reduces the computational cost of the optimization, and (2) the non-parametric densities in TPE allow us to draw multiple architecture candidates for evaluating the expected improvement of the surrogates, which is more computationally efficient.
3) The quantum embedding architectures discovered by QES outperform manual designs on both the synthesized and Iris datasets [12] while achieving results comparable to classical machine learning models.
The paper is organized as follows. Section 2 summarizes related work, Section 3 discusses the proposed QES algorithm in depth, and Section 4 reports experimental results. Finally, Section 5 discusses the implications of QES and the threats to its validity.

Automated Architecture Search
Automated architecture search has drawn significant attention from the ML/DL research community. Its motivation is simple and practical: there is no universal network design that suits all datasets. The main objective of such an algorithm is to find an optimal design of the model's architecture according to predefined selection criteria. An automated architecture search algorithm starts by defining the configuration of the search space. The basic structure is the flat search space, which corresponds to hyper-parameter optimization. For example, a flat search space for neural architecture search covers the depth (number of layers), the width (number of initial channels), and the kernel size. A more elaborate search space consists of cell-based neural architectures, where each neural candidate can be encoded as a directed acyclic graph [13], [14].
The search space of our proposed QES is motivated by the latter configuration, as discussed in Section 3.
Many frameworks have been proposed to tackle automated architecture search. An early solution is random search [15], which is often used as a baseline for comparison. The next development was sequential model-based optimization, which mitigates the expensive cost of evaluating the actual loss function by using surrogate models [11], [16]. More advanced search strategies include reinforcement learning [13], evolutionary algorithms [17], gradient-based search with continuous relaxation and bilevel optimization [18], heuristic search with performance prediction [19] and SMBO-TPE [20]. The biggest challenge in automated architecture search is the computational cost of the search phase. Search strategies such as reinforcement learning and evolutionary algorithms take 2000–3150 GPU days to find an optimal architecture for the CIFAR-10 dataset [13], [14]. Although later advances have shortened the search to reasonable time and hardware budgets, the expense is inherited from the costly evaluation of the cost function. The same issue appears when training quantum machine learning models on near-term quantum computers and quantum simulators. Recent work solves quantum circuit optimization problems by reinforcement learning with circuit transformations [21] and achieves remarkable results. However, such an algorithm still relies on evaluating the true loss function to maximize the cumulative reward. We find that sequential model-based optimization is another promising solution for quantum embedding search and quantum circuit optimization, since approximating the true loss function with surrogates reduces the computational expense of searching for the optimal quantum embedding architecture.

Quantum Machine Learning
Quantum machine learning has become an emerging technology of quantum computing because of its potential on near-term intermediate-scale quantum hardware. The literature has reported advantages of quantum machine learning over its classical counterparts on various learning tasks [22], [23], [24], [25], [26], [27]. The primary approach to quantum machine learning is circuit-based models, referred to as variational quantum classifiers [9], [10], [28]. Different classification strategies in the quantum Hilbert space have been proposed, including linear classifiers [29], bitstring parity-to-binary mapping [30], and Helstrom and fidelity classifiers [5]. Moreover, a strong connection between quantum machine learning and kernel methods has been established in [2], [3], [4]. The core component of circuit-based quantum machine learning models is the variational (parameterized) circuit called an ansatz (plural ansaetze). An ansatz is constructed by stacking multiple identical sub-layers, similar to cell-based neural architecture designs [13]. Although many variational ansaetze have been proposed in the literature [4], [7], [28], [31], [32], [33], [34], there is no general framework for designing the optimal ansatz for a specific dataset or use case. This is the main motivation for our QES algorithm, which directly tackles the problem of discovering optimal quantum embedding architectures for given datasets of interest.

Ansatz Optimization
Optimizing the ansatz circuit plays an indispensable role in designing quantum algorithms for specific tasks in practice. Ansatz optimization problems fall into two main categories. In the first category, circuit simplification reduces the computation required on quantum hardware; that is, local or global structures of the ansatz are replaced with equivalent but more computationally efficient architectures [21], [35], [36]. The second category aims to find the ansatz that yields the best performance on a given task; that is, a heuristic search looks for well-performing ansaetze on specific tasks rather than reducing computation. Our proposed work belongs to the second category, as it aims to find the optimal ansatz for quantum machine learning problems. Several studies share this objective. [37] uses classical neural networks as surrogates to approximate the optimal parameters for tasks such as the Quantum Approximate Optimization Algorithm (QAOA) for MaxCut and the Sherrington-Kirkpatrick Ising model, or VQE for the Hubbard model. [38] leverages student-teacher learning (knowledge distillation) to tune the circuit parameters so that the ansatz reproduces a pre-chosen output. [39] optimizes VQE to discover the ground states of lithium hydride and the Heisenberg model.

METHODOLOGY
Our proposed QES aims to find the optimal quantum embedding for supervised quantum machine learning on classification tasks. We follow the common ansatz design used for supervised ML [2], discussed in Section 3.1. In this formalism, the quantum embedding is decomposed into sub-components: a feature-dependent block, an entanglement structure between qubits, and parameterized rotations (considered as model weights). This layered structure is referred to as a quantum neural network (QNN), which produces feature maps in Hilbert space. Although these designs are very restricted compared with universal ansatz designs, they show promising results for applications of quantum machine learning in the NISQ era. Moreover, optimizing such embeddings remains challenging even in the restricted designs, since all combinations of choices and numbers of rotation or CNOT gates form a massive search space. Hence, QES reduces the search space by assuming that the rotation gates are static components of the ansatz; in other words, our scheme searches for the optimal entanglement layout of quantum embeddings. In Section 3.2 we show theoretically that, even in this restricted setting, the search space of entanglement layouts is massive and expands exponentially as the number of qubits increases. Thus, using SMBO-TPE as the search strategy for ansatz optimization offers several advantages over competitors such as genetic algorithms or reinforcement learning. First, SMBO uses surrogates to approximate the true value of the fitness function, which significantly reduces the cost of training QNNs in the current implementation. Second, the chosen search strategy searches efficiently by leveraging prior knowledge. Finally, with sorted queries in the history, SMBO using TPE scales efficiently as the search space expands (Section 3.3).

Quantum Embeddings
Let $x \in \mathcal{X}$ be a feature vector in the classical data space. A quantum embedding resembles a kernel method in that the input feature space is mapped to a high-dimensional Hilbert space [2]. Mathematically, the mapping is given by
$$\phi : \mathcal{X} \rightarrow \mathcal{H}, \quad x \mapsto |\phi(x)\rangle,$$
where $|\phi(x)\rangle$ is the quantum representation of the original input data and $\mathcal{H}$ is the quantum Hilbert space. In particular, a system of $n$ qubits corresponds to the vector space $\mathbb{C}^{2^n}$. Measurements on these quantum states yield the embedded outputs, that is, the representation of the observable in the latent space. In general, the intermediate representation of a quantum state can be written as
$$\langle Z \rangle_x = \langle \phi(x) | Z | \phi(x) \rangle,$$
where $Z$ is the measurement associated with the observable. Our choice for $Z$ is the expectation values in the Z basis over all qubits, $Z = \bigotimes_{i=1}^{n} \sigma_z$. State-of-the-art quantum machine learning models [2], [5] transform the expectation in the continuous domain into categorical labels by thresholding the outcome. In contrast, we leverage the continuous latent representation of the measurements. The decision boundary of the quantum machine learning model in Figure 1 is created by a single-layer linear classifier. We will show the power of representation learning from quantum embeddings over classical counterparts with five-time complexity in Section 4.
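To make the readout concrete, the NumPy sketch below simulates an unentangled embedding and computes the Z-basis expectation values. It assumes single-qubit RY angle encoding for $|\phi(x)\rangle$ (the encoding is not fixed at this point in the text) and reports both the per-qubit expectations $\langle \sigma_z^{(i)} \rangle$ and the expectation of the full tensor product $\bigotimes_{i=1}^{n} \sigma_z$; the helper names `embed`, `z_expectations` and `parity_expectation` are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def embed(x):
    """Assumed angle encoding: |phi(x)> = RY(x_1)|0> ⊗ ... ⊗ RY(x_n)|0>, a vector in C^(2^n)."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, ry(xi) @ np.array([1.0, 0.0]))
    return state  # qubit 0 is the leftmost (most significant) tensor factor

def z_expectations(state, n):
    """Per-qubit expectation values <sigma_z^(i)> from the measurement probabilities."""
    probs = np.abs(state.reshape([2] * n)) ** 2
    out = []
    for q in range(n):
        marginal = probs.sum(axis=tuple(a for a in range(n) if a != q))
        out.append(marginal[0] - marginal[1])
    return np.array(out)

def parity_expectation(state, n):
    """<phi| sigma_z ⊗ ... ⊗ sigma_z |phi>: each basis state contributes (-1)^(number of 1 bits)."""
    signs = np.array([(-1) ** bin(b).count("1") for b in range(2 ** n)])
    return float(np.sum(signs * np.abs(state) ** 2))

x = np.array([0.3, 1.1, 2.0, 0.7])
phi = embed(x)
print(phi.shape)                   # (16,), i.e. C^(2^4)
print(z_expectations(phi, 4))      # equals cos(x) for this product state
print(parity_expectation(phi, 4))  # equals prod(cos(x)) for this product state
```

For an unentangled state the per-qubit readouts reduce to $\cos x_i$ and the tensor-product observable to their product; the entangling block is what makes the readout a non-trivial function of the inputs.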
The layered gate architecture of a quantum embedding consists of a stack of multiple circuit ansaetze (Figure 1), which results in intractable latent representations for universal quantum computing [40]. The embedding circuit ansatz, which transforms classical input data into quantum representations, is inspired by the Quantum Approximate Optimization Algorithm (QAOA) [28]. Figure 1 shows our assumed design of a partially parameterized circuit ansatz, which includes an input-dependent unitary block $U_1(x)$, a fixed unitary block $U_2$ and a learnable unitary block $U_3(w)$. The primitive gates of the first block are rotation gates $R_{\sigma_1}(x)$, $\sigma_1 \in \{X, Y, Z\}$, parameterized by the input data $x$. The intermediate quantum representations $|\phi(x)\rangle$ are then fed forward into the fixed unitary block, which contains entangling patterns acting on a certain number of qubits. The primitive gate for establishing entanglement is the CNOT gate, which can create highly entangled states over all qubits in the system. Finally, the last block of the ansatz consists of rotation gates $R_{\sigma_2}(w)$, $\sigma_2 \in \{X, Y, Z\}$, parameterized by learnable weights $w$. Mathematically, the ansatz acts on an $n$-qubit system as
$$\Psi(x, w) = U_3(w)\, U_2\, U_1(x),$$
where $\Psi(x, w)$ is a unitary transformation parameterized by $w$ with realizations $x$. Through backpropagation, the weights $w$ are learned to minimize the cost function during training. Figure 2 depicts several manual designs of strongly entangled patterns for quantum embeddings, which serve as baselines for comparison in Section 4.
Fig. 2: Directed multi-graph representation of given entangling structures. The corresponding graphs are also represented as adjacency matrices, where diagonal entries are 0 and off-diagonal entries take the value 0 (absence of a CNOT gate) or 1 (presence of a CNOT gate). These hand-crafted designs are referred to as baseline 1 and baseline 2, respectively.
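The three-block structure $\Psi(x, w) = U_3(w)\,U_2\,U_1(x)$ can be sketched with a small state-vector simulator. This is a minimal illustration, not the implementation used in the experiments: it assumes RY rotations in both parameterized blocks (the axis later fixed by the encoding scheme, $\sigma_1 = \sigma_2 = Y$), represents the fixed block $U_2$ as an ordered list of hypothetical (control, target) CNOT pairs, applies a single layer, and returns per-qubit $\langle \sigma_z \rangle$ values. All function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to `qubit` of an n-qubit state vector (qubit 0 = leftmost factor)."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip `target` on the branch where `control` is |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t_axis = target if target < control else target - 1  # indexing drops the control axis
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def z_readout(state, n):
    """Per-qubit expectation values <sigma_z^(i)>."""
    probs = np.abs(state.reshape([2] * n)) ** 2
    return np.array([probs.sum(axis=tuple(a for a in range(n) if a != q)) @ [1.0, -1.0]
                     for q in range(n)])

def ansatz(x, w, cnots):
    """One layer of U3(w) U2 U1(x) applied to |0...0>, followed by a Z readout."""
    n = len(x)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for q in range(n):             # U1(x): input-dependent RY(x_q)
        state = apply_1q(state, ry(x[q]), q, n)
    for control, target in cnots:  # U2: fixed entangling layout, applied in order
        state = apply_cnot(state, control, target, n)
    for q in range(n):             # U3(w): learnable RY(w_q)
        state = apply_1q(state, ry(w[q]), q, n)
    return z_readout(state, n)

x = np.array([0.3, 1.1, 2.0, 0.7])
w = np.array([0.1, -0.4, 0.25, 0.9])
print(ansatz(x, w, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

With an empty CNOT list the two RY blocks merge, so each readout is simply $\cos(x_i + w_i)$; the entangling block is the only part that couples qubits, which is why the search concentrates on it.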

Observations:
Before further exploration, we summarize several findings from our preliminary experiments (Figure 4):
1) Different entangling structures result in different loss values on the validation set.
2) Entangling structures are not permutation invariant; that is, the order of CNOT gates over the qubits significantly impacts the overall performance.
3) A larger number of CNOT gates does not guarantee higher predictive power of the resulting architecture.
4) Repetitions of the same entangling connection are possible.
Details of the preliminary experiments on the Iris dataset are given in Appendix C, where several statistical tests provide evidence for these observations.
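Observation 2 can be checked at the level of unitaries. A CNOT-only layer maps computational basis states to basis states, so two layouts implement the same operator exactly when they send every bitstring to the same output. The sketch below (hypothetical helper `cnot_layer_permutation`) shows that swapping two CNOTs where the target of one is the control of the other changes the operator, that CNOTs sharing only a control commute, and that an immediately repeated CNOT cancels. It says nothing about predictive performance, which the observations above assess empirically.

```python
from itertools import product

def cnot_layer_permutation(cnots, n):
    """Basis-state permutation implemented by an ordered list of (control, target) CNOTs."""
    perm = {}
    for bits in product((0, 1), repeat=n):
        out = list(bits)
        for control, target in cnots:
            if out[control] == 1:
                out[target] ^= 1  # CNOT flips the target when the control is 1
        perm[bits] = tuple(out)
    return perm

a = [(0, 1), (1, 2)]
b = [(1, 2), (0, 1)]
print(cnot_layer_permutation(a, 3) == cnot_layer_permutation(b, 3))  # False: order matters here
```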

Encoding Scheme
We propose representing entangling layouts as directed multi-graphs, in which vertices represent qubits and edges correspond to CNOT gates. The main objective of our work is to find the optimal structure of entangling patterns for a given dataset. Hence, we fix the rotation axis in the first and third parameterized unitary blocks at $\sigma_1 = \sigma_2 = Y$ [5]; in other words, we only consider parameterized rotations about the Y axis on all qubits. Moreover, the encoded graph representation of each candidate circuit layout in the search space is associated with an asymmetric adjacency matrix (Figure 2), which simplifies the complexity analysis.
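A directed multigraph can be stored as an integer adjacency matrix whose entry $(c, t)$ counts the CNOTs with control $c$ and target $t$; with at most one CNOT per pair it reduces to the 0/1 matrices of Figure 2. The sketch below illustrates this under that reading (the helper name and the ring layout are hypothetical). Note that the matrix discards gate order, which is one reason an ordered genotype is needed for the search.

```python
import numpy as np

def layout_to_adjacency(cnots, n_qubits):
    """Adjacency matrix of the directed multigraph of an entangling layout.
    Entry [c, t] counts CNOT gates with control qubit c and target qubit t."""
    A = np.zeros((n_qubits, n_qubits), dtype=int)
    for control, target in cnots:
        if control == target:
            raise ValueError("a CNOT needs distinct control and target qubits")
        A[control, target] += 1
    return A

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # illustrative ring-shaped layout
A = layout_to_adjacency(ring, 4)
print(A)  # zero diagonal, asymmetric
```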

Complexity Analysis
Given a set of $N$ qubits, the number of off-diagonal entries of the adjacency matrix is
$$E = N(N-1).$$
Taking the order of CNOT gates in candidate circuit layouts into account, the total number of possible candidates in the search space of an $N$-qubit embedding is
$$\sum_{k=1}^{E} \frac{E!}{(E-k)!}.$$
As a result, the search space of all possible circuit layouts is extremely large ($\approx 1.3 \times 10^{9}$) even for a small system of 4 qubits. Moreover, the search space grows exponentially with the number of input qubits, which makes finding the optimal circuit layout for the entanglement block tremendously hard. We therefore propose an additional search-space parameter, called the entanglement level, which fixes the number of CNOT gates within the entangling layer. Given a predefined entanglement level $k$, the total number of possible circuit candidates in the reduced search space is:
By imposing the entanglement level, the search space is reduced to a reasonable cardinality for finding the optimal circuit architecture. Together with the proposed encoding scheme (Figure 3), each circuit candidate in the search space of $N$ qubits constrained by entanglement level $k$ is represented by an encoded genotype vector $\alpha$ of length $k$, each component of which corresponds to an element of the ordered set of all possible connections $\mathcal{E} = \{e_i\}_{i \in \{0, \dots, E-1\}}$.
Fig. 5: General framework of automated model search. After the search space is defined, the search intelligence draws an architecture candidate according to the search strategy. The drawn architecture is evaluated with the selection criteria, which produce a response for updating the search intelligence. With the SMBO-TPE search strategy, the prior knowledge of the search intelligence is updated according to the response score during the search phase.
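The counting and the genotype encoding can be checked with a few lines of Python. This sketch assumes that the unconstrained space consists of ordered sequences of distinct connections of length 1 to $E$ (the reading under which 4 qubits give about $1.3 \times 10^9$ layouts) and that $\mathcal{E}$ is ordered lexicographically by (control, target); both the ordering and the helper names are hypothetical choices for illustration.

```python
import math
from itertools import permutations

def connections(n_qubits):
    """Ordered set E of all directed (control, target) pairs with control != target."""
    return list(permutations(range(n_qubits), 2))

def total_ordered_layouts(n_qubits):
    """Sum over k = 1..E of E!/(E-k)!: ordered sequences of k distinct connections."""
    e = n_qubits * (n_qubits - 1)
    return sum(math.perm(e, k) for k in range(1, e + 1))

def decode_genotype(alpha, n_qubits):
    """Map a genotype vector alpha (indices into E) to an ordered list of CNOT gates."""
    edges = connections(n_qubits)
    return [edges[i] for i in alpha]

print(len(connections(4)))                # 12 = N(N-1)
print(total_ordered_layouts(4))           # 1302061344, about 1.3e9
print(decode_genotype([0, 4, 8, 9], 4))   # [(0, 1), (1, 2), (2, 3), (3, 0)]
```

`math.perm` requires Python 3.8 or later.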

Sequential Model-based Optimization
Figure 5 illustrates the framework for automatically searching for optimal configurations of the entanglement design. This procedure is motivated by hyper-parameter optimization, which has been well studied in classical machine learning but has not been utilized in quantum machine learning. For example, such optimization is commonly used to find the optimal configuration of a neural network (number of layers, initial channels or kernel size) or the optimal training setting (learning rate or type of optimizer). Notably, sequential model-based optimization (SMBO) searches efficiently in highly complex search spaces (hundreds of dimensions [11]), which makes it a promising search strategy for the search space defined in Section 3.2. Specifically, SMBO does not require computing the true fitness function (true loss) to propose the most promising candidates for the next trial. Instead, SMBO uses surrogate models to approximate the true cost function and samples the most promising candidates according to selection criteria such as the conditional entropy of the minimizer, a bandit-based criterion, or expected improvement [11]. In this work, we use expected improvement (EI) as the sampler's criterion because of its versatility across hyper-parameter settings [11]. This characteristic is absent from other search strategies such as reinforcement learning or gradient-based search, since those methods require computing the true cost function to update their samplers.
We employ sequential model-based optimization (SMBO) with the tree Parzen estimator (TPE) [11] as the search strategy for the optimal circuit candidate, which can be stated mathematically as
$$\alpha^{*} = \arg\min_{\alpha} \mathcal{L}^{(\alpha)}_{\mathrm{val}}(y, \hat{y}),$$
where $\hat{y}$ is the prediction of the model. The cost function $\mathcal{L}^{(\alpha)}_{\mathrm{val}}(y, \hat{y})$ is the validation loss of circuit candidate $\alpha$, which SMBO models with a less computationally expensive surrogate $S(\alpha)$. In other words, we approximate the true loss function with surrogates that are much cheaper to compute, which lets the sampler propose the most promising candidate for the next trial based on prior knowledge. In the inner loop of Algorithm 1, we optimize the Expected Improvement (EI) under the surrogate $S(\alpha)$ using numerical optimization. The EI is the expectation, under a statistical model $M$ from the model space, that the validation loss $\mathcal{L} = \mathcal{L}^{(\alpha)}_{\mathrm{val}}(y, \hat{y})$ will improve on (fall below) a given threshold $t$. Mathematically, we have
$$\mathrm{EI}_{t}(\alpha) = \int_{-\infty}^{\infty} \max(t - \mathcal{L}, 0)\, p_M(\mathcal{L} \mid \alpha)\, d\mathcal{L},$$
where $p_M(\mathcal{L} \mid \alpha)$ is the conditional density of outputs for the candidate model parameterized by genotype vector $\alpha$. The TPE estimator instead models $p(\alpha \mid \mathcal{L})$ with two densities,
$$p(\alpha \mid \mathcal{L}) = \begin{cases} l(\alpha) & \text{if } \mathcal{L} < t, \\ g(\alpha) & \text{if } \mathcal{L} \ge t, \end{cases}$$
which turns the EI in Equation 8 into
$$\mathrm{EI}_{t}(\alpha) \propto \left( \gamma + \frac{g(\alpha)}{l(\alpha)} (1 - \gamma) \right)^{-1}, \qquad \gamma = p(\mathcal{L} < t).$$
Because of this decomposition, the TPE estimator can sample multiple candidates from the density $l(\cdot)$, allowing more efficient estimation of the EI. Furthermore, TPE estimators differ from Gaussian-process (GP) based estimators in the choice of the threshold $t$: GP-based approaches favour a threshold below the best evaluation in the history, whereas TPE favours a $t$ above it. Specifically, TPE sets $t$ to a quantile $\gamma$ of the observed outputs, which enables more efficient sampling. In addition, SMBO-TPE is initialized with prior distributions $p^{\mathrm{prior}}_i$ over discrete variables, one for each component of the genotype vector. The posterior is then proportional to $L\, p^{\mathrm{prior}}_i + C_i$, where $L$ is the length of the genotype vector and $C_i$ counts the occurrences of choice $i$ among the observations [11]. As a result, with the observations in the history $H$ kept sorted, the search time of each SMBO-TPE trial scales linearly with the length of the genotype vector. This property is highly desirable because the search space of entangling layouts expands exponentially with the number of input qubits or primitive gates. The final evaluation metric for comparing derived architectures is the validation loss computed on the validation dataset, which reflects the generalization of the embeddings.
Fig. 6: Results from the search phase with increasing entanglement level k on low-dimensional datasets, including the synthesized and Iris datasets. The best validation score on the synthesized data is 0.4746 ± 0.011 using k = 8. On the Iris dataset, however, there is not enough statistical evidence to show that a larger k leads to a better entanglement layout.
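The TPE mechanics described above can be illustrated with a compact pure-Python sketch for categorical genotype vectors. It is not the implementation used in the experiments: it assumes each genotype position is an independent categorical variable over the $E$ connections, places the threshold $t$ at the $\gamma$-quantile of observed losses, smooths each density with a fixed prior weight (our assumption), draws candidates from $l(\cdot)$, and keeps the one with the largest $l(\alpha)/g(\alpha)$, which maximizes EI under the TPE form. The objective `toy_loss` is a hypothetical stand-in for the validation loss of a trained QNN, and all names are illustrative.

```python
import math
import random

def categorical_density(genotypes, position, n_choices, prior_weight=1.0):
    """Smoothed categorical distribution for one genotype position:
    prior_weight * uniform prior + occurrence counts, then normalized."""
    weights = [prior_weight / n_choices] * n_choices
    for genotype in genotypes:
        weights[genotype[position]] += 1.0
    total = sum(weights)
    return [wt / total for wt in weights]

def tpe_propose(history, length, n_choices, rng, gamma=0.25, n_candidates=24):
    """Propose the next genotype from (genotype, loss) pairs; lower loss is better."""
    ranked = sorted(history, key=lambda item: item[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))  # threshold t at the gamma-quantile
    good = [g for g, _ in ranked[:n_good]]             # losses below t define l(.)
    bad = [g for g, _ in ranked[n_good:]]              # losses above t define g(.)
    l = [categorical_density(good, i, n_choices) for i in range(length)]
    g = [categorical_density(bad, i, n_choices) for i in range(length)]

    def log_ratio(alpha):
        # Positions are modelled independently, so log(l/g) is a sum over positions.
        return sum(math.log(l[i][a]) - math.log(g[i][a]) for i, a in enumerate(alpha))

    # Draw candidates from l(.) and keep the one maximizing l/g (the TPE form of EI).
    candidates = [[rng.choices(range(n_choices), weights=l[i])[0] for i in range(length)]
                  for _ in range(n_candidates)]
    return max(candidates, key=log_ratio)

def smbo_tpe(objective, length, n_choices, n_trials=60, n_startup=10, seed=0):
    rng = random.Random(seed)
    history = []
    for trial in range(n_trials):
        if trial < n_startup:  # random warm-up before the densities are fitted
            alpha = [rng.randrange(n_choices) for _ in range(length)]
        else:
            alpha = tpe_propose(history, length, n_choices, rng)
        history.append((alpha, objective(alpha)))
    return min(history, key=lambda item: item[1])

target = [0, 4, 8, 9]  # hypothetical "ideal" genotype for the toy objective

def toy_loss(alpha):
    """Hypothetical stand-in for validation loss: number of positions differing from target."""
    return sum(a != t for a, t in zip(alpha, target))

best_alpha, best_loss = smbo_tpe(toy_loss, length=4, n_choices=12)
print(best_alpha, best_loss)
```

Each TPE proposal only requires sorting the history and counting choices per position, which is consistent with the linear scaling in genotype length discussed above.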

Procedure
This section summarizes the general procedure for implementing our proposed framework.
Step 1 - Initialization: Current quantum computers and quantum simulators support only a very limited number of noisy qubits, which restricts quantum embeddings to low-dimensional datasets. However, we can mitigate the curse of dimensionality for quantum embeddings by adopting hybrid classical-quantum neural architectures [32]. In these hybrid architectures, the classical component acts as an autoencoder (feature extractor) that transforms the input space $\mathbb{R}^p \to \mathbb{R}^q$, where $q < p$. Thus, if the given dataset is low-dimensional (the number of features does not exceed the number of available qubits), the original features are used directly as input to the quantum ansatz. Otherwise, a classical autoencoder first reduces the number of input features, and the resulting feature maps are used as input to the quantum embedding. Section 4 gives more details about the hybrid classical-quantum architecture with use cases.
Step 2 - Search Phase: Given q input features, we create an ansatz (illustrated in Figure 1) with q qubits for the quantum embedding. We then initialize the first search space of Section 3.2 by selecting k = q in the first iteration. This initial guess for k is equivalent to baseline manual entanglement layouts such as full entanglement (Figure 2). We continue searching with increasing values of k until the improvement diminishes. Consequently, our framework is only guaranteed to find a locally optimal entanglement layout, since we only investigate a limited number of search spaces associated with the entanglement level. However, finding the global solution remains an extremely difficult challenge for architecture optimization in machine learning. The greedy heuristic for selecting k therefore yields a locally optimal solution while requiring reasonable computational resources in terms of hardware and search time.
Step 3 - Evaluation Phase: After discovering the entanglement structure, we compare the derived quantum ansatz with classical counterparts using predictive performance as the evaluation metric.
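The greedy rule in Step 2 can be sketched as follows. The callable `best_loss_for_k` is hypothetical: it stands for running one full search phase at entanglement level k and returning the best validation loss found. The loss values in the example are made up for illustration only and are not experimental results; the tolerance `tol` that defines "diminishing improvement" is also an assumption.

```python
def greedy_entanglement_level(best_loss_for_k, k_init, tol=1e-3, k_max=None):
    """Increase k from k_init (= number of qubits) while the best validation loss
    still improves by more than tol; returns a locally optimal (k, loss) pair."""
    best_k, best_loss = k_init, best_loss_for_k(k_init)
    k = k_init
    while k_max is None or k < k_max:
        k += 1
        loss = best_loss_for_k(k)
        if best_loss - loss <= tol:  # improvement has diminished: stop searching
            break
        best_k, best_loss = k, loss
    return best_k, best_loss

# Made-up losses for illustration only (not experimental results).
toy_losses = {4: 0.90, 5: 0.80, 6: 0.75, 7: 0.7495, 8: 0.70}
print(greedy_entanglement_level(toy_losses.__getitem__, k_init=4, tol=0.01))  # (6, 0.75)
```

The toy values include a better loss at k = 8 that the greedy rule never reaches, which is exactly the local-optimality caveat stated in Step 2.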

NUMERICAL EXPERIMENTS
In this section, we report experiments that justify the effectiveness of the proposed QES under different data scenarios. Following Section 3.4, we first evaluate QES on low-dimensional datasets using stand-alone quantum embeddings, namely a synthesized dataset and the Iris dataset. The generated dataset includes 400 observations in a 4-dimensional feature space and involves classification into three classes. We use a small hypercube-size factor to obtain a synthesized dataset that is hard to separate. Second, we investigate the validity of QES on more challenging datasets, including the Breast Cancer and Digits datasets. Although these datasets are not highly complex for classical machine learning, they can be considered high-dimensional for current quantum computers and simulators. Hence, we use the hybrid classical-quantum neural architecture for these experiments. The detailed hyper-parameter settings for training the quantum embeddings are given in Appendix A.
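A dataset with this description can be generated with scikit-learn's `make_classification`, whose `class_sep` parameter is documented as the factor multiplying the hypercube size. The sketch below is only an assumption about how such data could be produced: the values `class_sep=0.5` and `random_state=0` are hypothetical, not the settings used in the experiments, and `n_informative=4, n_redundant=0` are needed because the library defaults are incompatible with three classes in four features.

```python
import numpy as np
from sklearn.datasets import make_classification

# 400 observations, 4 features, 3 classes; a small class_sep (hypercube-size factor)
# makes the classes harder to separate. class_sep=0.5 is an assumed value.
X, y = make_classification(
    n_samples=400,
    n_features=4,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    class_sep=0.5,
    random_state=0,
)
print(X.shape)        # (400, 4)
print(np.unique(y))   # [0 1 2]
```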

Synthesized Dataset
On the synthesized dataset, we consider search spaces corresponding to increasing values of k. The depth of each architecture candidate is set to two layers during both the search phase and the evaluation phase. Figure 7 illustrates the architectures found by SMBO-TPE and by the baseline random search. Notably, the model complexity of the two settings is equivalent: the derived embedding architectures have the same number of parameters, because the primitive CNOT gates contain no learnable weights. For both search procedures, the score function is the loss on the (independent) validation set. We compare the found architectures with the two common baseline entangling structures (strongly entangled layers) in Figure 2 and with classical machine learning counterparts. As Table 1 shows, our discovered quantum embedding circuit gains a significant improvement over the baseline structures, while nearly matching the performance of SVM and XGBoost with a minimal gap of 0.5%. Moreover, it improves validation accuracy by over 10% compared with a neural network with the same number of parameters. Expanding the search space also yields the locally optimal value k = 8, which enables discovering circuit structures with higher predictive power, consistently for both search strategies.
Fig. 7: Discovered quantum circuit architectures from different search space configurations and search strategies on the synthesized dataset. Each found architecture contains a sequence of CNOT gates that establishes entanglement over all qubits.
We further analyze the effectiveness of SMBO-TPE compared with the baseline random search. Figure 8 shows the intermediate values over trials for the two optimization approaches. Overall, the validation loss converges after 50 epochs for both search-space settings. Moreover, SMBO-TPE finds optimal architectures in early trials, whereas random search discovers such architectures only in late trials of the search phase. Another advantage of SMBO-TPE over random search appears in the parallel coordinate plot, which shows that the TPE sampler uses the knowledge gathered during the search phase to update its prior. Samples from TPE concentrate on edges that potentially yield higher predictive performance, while the edges sampled by random search are spread widely throughout the configuration space.
Finally, the search cost for quantum embeddings is significantly higher than for classical neural architectures because of the computational limitations of near-term quantum simulators. For example, candidates in neural architecture search are convolutional neural networks with up to millions of parameters, which recent state-of-the-art NAS algorithms find in only 0.25–8 GPU days [18], [20]. In contrast, training even a small quantum embedding with very few qubits is far more expensive: our experimental setting takes 2–4 GPU days to search for an architecture with only 23 parameters on quantum simulators. Fortunately, most of this cost comes from training the quantum embeddings. In other words, advances in quantum computing and quantum machine learning on near-term devices that accelerate the training of quantum embeddings will directly reduce the search time of QES.

Results on Iris Dataset
In this experiment, we use the original Iris dataset without any classical pre-processing; it has 4 input features. The Iris dataset contains well-represented observations that are much easier to separate than those of the synthesized dataset (Figure 15). Hence, differences in validation loss across values of k are hard to capture. From the left panel of Figure 6, there is not enough statistical evidence to show that increasing the number of primitive CNOT gates results in better predictive performance. Thus, we only investigate the search space initialized by the baseline k = 4. Figure 9 presents the found entanglement structure of the quantum ansatz, which stacks two identical copies of the architecture found in the search phase. As on the simulated dataset, the TPE sampler uses the knowledge learned from response scores to update its prior distribution, which yields better architectures. The discovered architecture achieves 95.33 ± 0.0125 validation accuracy (over ten independent runs), outperforming the two baseline designs by close to 2%. Figure 10 presents the convergence of the model weights of the found quantum embedding, which indicates a stable solution.
Table 1 (synthesized dataset): Neural Network (fair) 71.00; SVM (ovo) [41] 84.00; XGBoost [42] 83.50; Entangling Baseline 1
We further compare the proposed quantum embedding with classical embeddings using fair neural networks. In the quantum classifier, there are 8 learnable parameters for the variational rotation gates in the embedding, followed by classical post-processing with a fully connected layer of 15 parameters. Thus, the quantum classifier has 23 parameters in total, including both the quantum embedding and the post-processing layer. We construct fair autoencoders, each involving a single hidden layer of 6 to 10 nodes, followed by the same kind of fully connected layer for classification. Since training small networks can yield varying classification performance, we train each neural network 100 times and report the mean accuracy. As a result, the proposed quantum embedding outperforms a fair classical neural network with 7 hidden nodes by about 0.3 percentage points (95.33% vs. 95%), while using approximately 2.5× fewer parameters (Table 2).
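The parameter budget can be checked with simple counting. The sketch assumes that the post-processing layer maps the 4 qubit read-outs to the 3 Iris classes with biases (4·3 + 3 = 15, consistent with the text) and that the fair classical network maps the 4 input features to h hidden nodes and then to 3 classes, also with biases. The exact classical layout is not given in the text, so that part is an assumption, and the helper names are hypothetical.

```python
def dense_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus (optionally) biases of one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


N_FEATURES, N_QUBITS, N_CLASSES, N_LAYERS = 4, 4, 3, 2

# Quantum classifier: one trainable rotation per qubit per layer, then a dense read-out.
quantum_params = N_LAYERS * N_QUBITS + dense_params(N_QUBITS, N_CLASSES)


def fair_nn_params(hidden: int) -> int:
    """Assumed single-hidden-layer classical network: features -> hidden -> classes."""
    return dense_params(N_FEATURES, hidden) + dense_params(hidden, N_CLASSES)


print(quantum_params)      # 23 (8 rotations + 15 read-out weights)
print(fair_nn_params(7))   # 59 (35 hidden-layer + 24 read-out weights)
print(round(fair_nn_params(7) / quantum_params, 2))
```

Under these assumptions, the 7-node network has about 2.6 times as many parameters as the quantum classifier, in line with the approximate 2.5× figure.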

Experimental Results of Hybrid Classical-Quantum on Higher Dimensional Datasets
In this section, we evaluate the proposed QES on more challenging data scenarios. These datasets have more original features (p) than available qubits (q) on the quantum ansatz, as summarized in Table 3. We leverage a classical autoencoder, which includes a stack of 3 fully connected layers, to reduce the dimensionality of the input data for the quantum ansatz (Figure 11). The use of hybrid neural architectures may raise the concern of whether the effectiveness of learning comes from the classical or the quantum components. Therefore, we also analyze the representation learning ability of each component by comparing pre- and post-quantum feature maps. A detailed comparison is given within each case study hereafter. The left panel of Figure 13 shows the optimal values of the entanglement level k obtained with the procedure in Section 3.4. On the wine dataset, the optimal quantum circuit is identified in the search space generated by k = 5, which is visualized in the left panel of Figure 12. On the other hand, k = 7 is found for the breast cancer dataset, resulting in the CNOT structure depicted in the right panel of Figure 12. We analyze the performance of the found ansatz and compare it to fair classical neural networks. It is important to emphasize that we only replace the quantum ansatz with a fully connected layer. In other words, the classical autoencoder (Figure 11) is used in all experiments and contributes a static number of parameters to the model complexity. We report the configuration of the fair classical neural networks (NN) in the right panel of Figure 13, in which each NN has a single depth d = 1 with h hidden nodes. In addition, the total number of weights of the whole hybrid architecture is reported for comparison. As a result, the proposed quantum ansatz outperforms the classical NNs on the wine dataset (based on 100 independent runs) while using fewer parameters. On the Breast Cancer dataset, the performance of the discovered ansatz is higher than that of the NNs with 4 and 8 hidden nodes, but slightly lower than that of the NN with h = 16, despite having the fewest parameters. We are aware that the use of hybrid architectures may raise a critical concern about the effectiveness of the found ansatz. In particular, one may suspect that the learning has already been done by the classical autoencoders, since they have relatively large model complexity for relatively simple problems such as the Wine and Breast Cancer datasets. Thus, we decompose the neural architectures and analyze the representations before and after the quantum ansatz (red and blue vectors in Figure 11). Representation learning can be evaluated with T-SNE [43] visualizations of the feature vectors, in which we can observe the distance between class clusters, as depicted in Figure 14. In the T-SNE visualization of the wine dataset, the decision boundary between clusters is unclear for the pre-ansatz features. In contrast, the T-SNE of the post-ansatz features shows clearer boundaries between classes, resulting in three separated clusters. This is consistent with the predictive performance of the hybrid architecture with the ansatz, which achieves a test accuracy of 98.34 ± 0.0021. The same observations hold for the Breast Cancer dataset, where the T-SNE of the pre-quantum representations is difficult to separate with a linear classifier, while that of the post-ansatz features is well-separated. These experiments provide insight into the effectiveness of the found ansatz, which leads to more efficient representation learning in terms of predictive performance.
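T-SNE gives a qualitative picture. A complementary quantitative check, which is a substitute technique and not the procedure used in our experiments, is a linear probe: fit the same linear classifier to the pre-ansatz and post-ansatz features and compare the accuracy it reaches. The sketch uses a binary perceptron, which extends to the three wine classes via one-vs-rest. The 2-D feature sets are hypothetical toy data: an XOR pattern stands in for entangled pre-ansatz features, and a separable pattern stands in for post-ansatz features.

```python
from typing import List, Sequence


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed distance-like score w.x + b of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def perceptron_probe(features: Sequence[Sequence[float]], labels: Sequence[int],
                     epochs: int = 100, lr: float = 1.0) -> float:
    """Fit a binary perceptron (labels in {-1, +1}) and return its training accuracy."""
    w: List[float] = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(features, labels):
            if y * linear_score(w, b, x) <= 0:  # misclassified or on the boundary
                for i, xi in enumerate(x):
                    w[i] += lr * y * xi
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a separating hyperplane has been found
            break
    correct = sum(1 for x, y in zip(features, labels) if y * linear_score(w, b, x) > 0)
    return correct / len(labels)


labels = [-1, -1, 1, 1]
pre_ansatz = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]        # hypothetical XOR layout
post_ansatz = [(-1.0, -1.0), (-0.8, -1.2), (1.0, 1.0), (1.2, 0.9)]   # hypothetical separable layout
print(perceptron_probe(pre_ansatz, labels), perceptron_probe(post_ansatz, labels))
```

A probe accuracy that rises after the ansatz, with the classical autoencoder held fixed, would indicate that the quantum component contributes to the representation rather than merely passing it through.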

Implication
Beyond the numerical experiments, we would like to address several general principles of our QES. Our proposed approach provides automated search intelligence that can find an optimal architecture of a quantum embedding circuit for a given dataset.
It is reasonable to believe that there is no universal embedding structure that suits every dataset; instead, we can derive optimal architectures that perform well on a given dataset.

Threats to Validity
Threats to the internal validity of QES concern the reproducibility of the algorithm, which is the most challenging factor in automated machine learning [44], [45]. In the noisy intermediate-scale quantum era, such issues are amplified compared to classical computational hardware. Moreover, our proposed QES relies on the assumption of an entanglement level that tremendously reduces the cardinality of the search space. Hence, we lack empirical evidence of the effectiveness of QES on an expanded search space, especially when the number of qubits is scaled up. We defer the investigation of this problem to future study. Nevertheless, with a small number of qubits, QES achieves results similar to those of classical machine learning counterparts and outperforms basic entangling structures. Threats to external validity include the generalization of QES to different data scenarios, the number of qubits in the system, and the noise inherent in actual quantum computing hardware. Although the experimental results from simulated quantum hardware are robust and stable, the picture may change when QES is implemented on near-term noisy quantum computers. Another threat is the computational limitation of near-term quantum computers and quantum simulators. The cost of training a single quantum embedding in quantum simulation is remarkably higher than that of training classical encoders, which leads to a very high computational expense for the search phase. These threats indicate future research opportunities in quantum embeddings, including the implementation of different search strategies.
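To illustrate why the entanglement level matters as the number of qubits grows, the sketch counts candidate layouts under a simplifying assumption that is not necessarily the exact search-space definition used by QES: a candidate is an ordered sequence of exactly k CNOT gates, each a directed (control, target) pair with control ≠ target, and repeated pairs are allowed, so every candidate is a directed multigraph with k edges. The function name is hypothetical.

```python
def num_layouts(n_qubits: int, k: int) -> int:
    """Ordered sequences of k CNOTs, each an arbitrary (control, target) pair, control != target."""
    if n_qubits < 2 or k < 0:
        raise ValueError("need at least 2 qubits and a non-negative k")
    directed_edges = n_qubits * (n_qubits - 1)  # edges of the complete directed graph
    return directed_edges ** k


for n in (4, 6, 8):
    print(n, num_layouts(n, 4))  # search-space size at a fixed entanglement level k = 4
```

Even with k held fixed, the count grows polynomially in the number of qubits with degree 2k, and it grows exponentially in k, which is why an expanded search space remains largely untested.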

Conclusion
This paper proposes an automated procedure for finding optimal quantum embedding architectures that lead to high representation learning ability in the quantum Hilbert space. The algorithm is accessible, performs promisingly compared to classical machine learning models, and can be implemented on near-term quantum computers. Although QES cannot guarantee finding the globally optimal design of the quantum embedding architecture in the full search space, it can discover high-performing architectures under the constraint of the entanglement level.

Fig. 1: (Top) Quantum embeddings for supervised quantum machine learning. Similar to kernel methods, a quantum embedding transforms observations in the classical data space into a quantum Hilbert space of quantum states, where similarity between observations is represented by the inner product of their quantum representations. (Bottom) Architecture of a quantum machine learning model formed by a selected set of quantum circuits. The ansatz circuit plays a crucial role in the circuit model, enabling the model's weights w to be learned according to the input data x.

Fig. 4: Preliminary results of quantum embeddings with different entangling structures on the Iris dataset. Permuting the CNOT gates leads to a significant improvement in validation loss. Moreover, increasing the number of CNOT gates may reduce the performance of the embedding.

Fig. 8: Training response of all trials under each search space configuration and search strategy on the synthesized dataset. Each architecture candidate is trained for 100 epochs in each trial. The validation loss converges within 50 epochs (as shown). The best score is observed in the search space with k = 8 under SMBO-TPE.

Fig. 9: Found architecture of the quantum embedding and its optimization history. The same pattern is observed as on the simulated dataset. The full architecture of the quantum embedding includes a stack of two identical layers with the entanglement layout reported in the left panel.

Fig. 12: Discovered entanglement layouts using hybrid classical-quantum neural architectures.The final ansatz includes a stack of two layers.

Fig. 13: Results from the search phase using hybrid classical-quantum neural architectures on the breast cancer and wine datasets. (Left) The locally optimal value of the entangling level for the breast cancer data is k = 7, while that for the wine data is k = 5. (Right) Evaluation results of the found ansatz in comparison to fair classical neural networks.

Fig. 14: Analysis of representation learning before and after the quantum embeddings using T-SNE: (Top) Wine dataset, (Bottom) Breast Cancer dataset. The T-SNE features from pre-ansatz representations are hard to separate using a linear classifier, while those from post-ansatz representations can be well-separated using the same classifier.
Algorithm 1 Sequential Model-based Optimization via TPE estimator. Given: search space Ω initialized by k, cost function L(·), initial model candidate M_0, number of iterations T, surrogates S(·), and history H.
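A minimal sketch of sequential model-based optimization with a Tree-structured Parzen Estimator over CNOT layouts is given below. It is one possible reading of Algorithm 1, not the exact QES implementation: Ω is assumed to be all ordered sequences of k CNOTs on n qubits, M_0 is a batch of uniformly random start-up layouts, T corresponds to `n_trials`, the surrogates S(·) are two smoothed categorical densities l(x) (good trials) and g(x) (remaining trials), and H is the list of (layout, loss) pairs. Each CNOT slot is modelled independently, as in a univariate TPE. Because training a quantum embedding is expensive, the cost L(·) is replaced by a hypothetical toy function; all names are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Edge = Tuple[int, int]                 # (control, target) of one CNOT
Layout = Tuple[Edge, ...]              # ordered CNOT sequence of length k
History = List[Tuple[Layout, float]]   # H: evaluated candidates and their losses


def smoothed_density(observed: List[Edge], choices: Sequence[Edge]) -> Dict[Edge, float]:
    """Categorical density estimate with add-one smoothing."""
    counts = {c: 1.0 for c in choices}
    for edge in observed:
        counts[edge] += 1.0
    total = sum(counts.values())
    return {c: v / total for c, v in counts.items()}


def smbo_tpe(cost: Callable[[Layout], float], n_qubits: int, k: int, n_trials: int = 40,
             n_startup: int = 10, gamma: float = 0.25, n_candidates: int = 24,
             seed: int = 0) -> History:
    """Univariate TPE over CNOT layouts: each of the k slots is modelled independently."""
    rng = random.Random(seed)
    choices = [(c, t) for c in range(n_qubits) for t in range(n_qubits) if c != t]
    history: History = []
    for _ in range(n_trials):
        if len(history) < n_startup:
            # Initial candidates (M_0): uniformly random layouts.
            layout = tuple(rng.choice(choices) for _ in range(k))
        else:
            ranked = sorted(history, key=lambda item: item[1])
            n_good = max(1, int(gamma * len(ranked)))
            good, bad = ranked[:n_good], ranked[n_good:]
            slots = []
            for pos in range(k):
                p_good = smoothed_density([lay[pos] for lay, _ in good], choices)  # l(x)
                p_bad = smoothed_density([lay[pos] for lay, _ in bad], choices)    # g(x)
                # Sample candidates from l(x) and keep the one maximizing l(x) / g(x).
                draws = rng.choices(choices, weights=[p_good[c] for c in choices],
                                    k=n_candidates)
                slots.append(max(draws, key=lambda c: p_good[c] / p_bad[c]))
            layout = tuple(slots)
        history.append((layout, cost(layout)))  # in QES this step trains and validates
    return history


# Hypothetical stand-in for L(.): mismatches against a hidden "ideal" layout.
TARGET: Layout = ((0, 1), (1, 2), (2, 3), (3, 0))


def toy_cost(layout: Layout) -> float:
    return float(sum(a != b for a, b in zip(layout, TARGET)))


history = smbo_tpe(toy_cost, n_qubits=4, k=4)
best_layout, best_cost = min(history, key=lambda item: item[1])
print(best_layout, best_cost)
```

Library implementations such as Optuna's `TPESampler` follow the same good/bad split idea with more refined density estimators; the hand-written version above only aims to make the loop in Algorithm 1 explicit.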

TABLE 1 :
Comparison of found architectures with classical machine learning models and baseline hand-crafted entangling structures on the synthesized dataset. The evaluation metric is validation accuracy based on 5 independent runs. The discovered quantum embedding outperforms baseline designs while achieving performance comparable to classical machine learning models.

TABLE 2 :
Comparison between quantum and classical networks on the Iris dataset. Results are the mean and standard deviation of test accuracy based on 100 independent runs.

TABLE 3 :
Description of datasets used in experiments of hybrid classical-quantum neural architecture.