Model-Predictive Quantum Control via Hamiltonian Learning

This article proposes an end-to-end framework for the learning-enabled control of closed quantum systems. The proposed learning technique is the first of its kind to utilize a hierarchical design, which layers probing control, quantum state tomography, quantum process tomography, and Hamiltonian learning to identify both the internal and control Hamiltonians. Within this context, a novel quantum process tomography algorithm is presented, which involves optimization on the unitary group, i.e., the space of unitary operators, to ensure physically meaningful predictions. Our scalable Hamiltonian learning algorithms have low memory requirements and tunable computational complexity. Once the Hamiltonians are learned, we formalize data-driven model-predictive quantum control (MPQC). This technique utilizes the learned model to compute quantum control parameters in a closed-loop simulation. Then, the optimized control input is given to a physical quantum system in an open-loop fashion. Simulations show model-predictive quantum control to be more efficient than the current state-of-the-art, quantum optimal control, when sequential quadratic programming (SQP) is used to solve each control problem.


I. INTRODUCTION
Quantum information science is a rapidly growing field that seeks to utilize quantum dynamical systems to perform sensing, communication, or computation [1]- [6]. The success of many quantum devices, such as those employing quantum bits (qubits), is dependent on the precise control of their dynamics [7]- [17]. In fact, quantum control techniques can be used to realize quantum gates [18]- [20], and quantum gates have numerous applications in quantum information science, including localization, synchronization, communication, and computing [21]- [24].
Quantum optimal control (QOC) has been of significant research interest for the first part of the 21st century [25]- [34]. However, much like classical systems, the dynamics of quantum systems are not always known, which limits the breadth of control strategies that may be used. A promising means for overcoming this uncertainty is via quantum Hamiltonian learning (QHL). Learning quantum dynamics from data is difficult due to the fundamental limitations of observing quantum phenomena via classical measurement. Measurement devices interact with quantum states and can result in probabilistic outcomes. Quantum learning-based system identification strategies must work with incomplete or imperfect information about quantum states.
To overcome the obstacle of incomplete information about quantum states, methods such as quantum state tomography (QST) have been developed to infer quantum states via repeated measurements of identically prepared states [35]- [38]. Quantum process tomography (QPT) has been developed to infer a quantum process, which maps an input state to an output state [6]. A survey on resource consumption and measurement requirements for various QPT algorithms is given in [39].
Both QST and QPT have been used to infer the evolution of closed quantum systems and ultimately learn the system Hamiltonian in [40]. The method in [40] utilizes a large number of initial states to perform QPT. The work in [40] gives an explicit upper bound on the estimation error for the learned Hamiltonian. Conventional system identification approaches have been used to estimate the Hamiltonian [41]- [44]. In particular, control signals are used in [43] and [44] to infer unknown parts of the Hamiltonian. Alternatively, learning-based approaches have been used to estimate the Hamiltonian [45]- [47].
Inspired by the learning-based control of classical systems [48]- [51], the learning-based control of quantum systems has also been proposed [52]- [59]. These methods work in an iterative fashion to improve the control policy used in each quantum experiment. However, these techniques are model-free and do not come with theoretical guarantees. On the other hand, if a model of the quantum system is established, one may perform model-based control. For several decades, QOC has been the most popular approach in the model-based quantum control literature. In addition to some early works in this area [60]- [62], recent results have reiterated the strength of QOC and expanded upon earlier possibilities [63]- [65]. An influential QOC technique is the so-called gradient ascent pulse engineering (GRAPE) algorithm for computing optimal nuclear magnetic resonance (NMR) pulses [61], which was expanded upon in [66]- [69]. The principle of the GRAPE algorithm is to use gradient-based solvers to compute optimal control parameters for quantum systems. Other highly successful QOC optimization schemes include the chopped random basis (CRAB) [70], [71] and gradient optimization of analytic controls (GOAT) [72] algorithms, which use ansatzes of basis functions to formulate control pulses. The purpose of these algorithms is to identify optimal tuning parameters within the ansatz.
In the classical control literature, layering model-based and model-free control has been shown to produce desirable results [73]. This has been done recently in the so-called C³ technique for controlling quantum systems [74]. The term C³ stands for "control, calibrate, and characterize." As the name implies, this work meshes the ideas of controlling and identifying the dynamics of the system. The first phase utilizes a control ansatz based on the derivative removal by adiabatic gate (DRAG) method. The tuning parameters in the DRAG control pulse are then optimized in a closed-loop experimental setting with a physical quantum system in the calibration phase. Finally, the characterization phase uses the data generated in the previous phase to update an ansatz model of the quantum system.
Model-predictive control (MPC) for infinite-dimensional quantum systems has been considered in the case where the model of the system is fully known [75]- [77]. These works only investigate systems with pure states governed by the Schrödinger equation, and their control designs only consider quadratic objectives. Recently, an optimization procedure known as sequential quadratic programming (SQP) has shown great promise to design robust quantum gates in the context of QOC [78]. We note that SQP can also be employed to solve MPC problems. Hence, investigating SQP's use in MPC for quantum systems is of interest.
Most of the methods outlined above are based on the optimal control of closed quantum systems, and additional calibration and refinement may be needed for addressing open quantum systems. Real-world applications of controlling closed quantum systems include, but are not limited to, control of spin systems in NMR [79]- [82], control of molecular systems in physical chemistry [83]- [85], and forming baseline control policies for manipulating superconducting qubits [86]- [91].
Learning and control of quantum dynamics poses challenges not yet solved by the literature. There is a need to learn both internal and control Hamiltonians. The internal Hamiltonian describes the free evolution of the quantum system, and the control Hamiltonian describes the system's interaction with external fields. While successful, the quantum tomography (QT)-based QHL algorithm proposed in [40] requires the practitioner to readily and repeatedly prepare d² (where d is the dimension of the system) unique quantum states to identify a system's internal Hamiltonian and does not address learning the effect of control on the system. In terms of control, QOC is often computationally expensive, and closed-form solutions are only known in specific cases with simple models or restrictive assumptions. For instance, time-optimal control of a qubit is known to be of bang-bang type [26]; however, such discontinuous control signals may excite unwanted energy levels in the qubit. The workhorse GRAPE algorithm, which has become the standard for computing optimal quantum control pulses, still takes a significant amount of computational time [92]. Moreover, data-driven techniques are desirable when no model of the quantum system is known. While the C³ method [74] has shown promising results, like [70]-[72], it utilizes a heuristic ansatz for its control policy and can only improve the efficacy of the control by tuning parameters (such as the amplitude and frequency of the actuation) within this policy. In summary, there is a need for scalable and efficient quantum learning and control algorithms. The fundamental questions related to quantum system identification are: Is it possible to reduce the number of quantum experiments required, the computational complexity demanded, and the memory storage needed in order to properly identify a quantum process? How can the error of various Hamiltonian identification methods be bounded in terms of the number of quantum experiments performed?
A reduction in experimental, computational, and memory complexity will enable the identification of higher dimensional quantum systems, which is paramount to the future of quantum computing. Improved error bounds on this process will allow practitioners to design robust model-predictive quantum control (MPQC) strategies. The goal of this article is to provide an entirely data-driven end-to-end solution to the quantum control problem.
In this article, we attempt to bridge the gap between learning and control for quantum systems. By not relying on  manually designed models, which can be brittle, our method will account for the complexity of real-world applications. The contributions of this article are twofold. First, a new QT-based QHL algorithm for identifying both internal and control Hamiltonians for closed quantum systems is proposed. This technique is visually summarized in Fig. 1. Given a physical system under the influence of an external control field, our method utilizes QST and QPT to learn the Hamiltonians of interest. Within this algorithm, a new optimization-based QPT procedure is presented. This optimization procedure is defined over the unitary group, i.e., the set of all unitary operators, which is physically motivated by the dynamics of closed quantum systems. An efficient iterative method for solving this problem is presented, and, in a special case, a closed-form solution and error bound are given. The proposed QT-based QHL algorithm requires less memory than the existing state-of-the-art and, in some cases, greatly reduces the computational complexity.
The second contribution is that, for the first time, we formulate data-driven MPQC, which uses the proposed QT-based QHL algorithm to control quantum systems end-to-end with unknown dynamics. MPQC leverages the ideas from MPC for classical systems to produce an optimization-based approach to controlling quantum systems. In our experiments, when an SQP solver is used, MPQC is two orders of magnitude faster at computing control sequences than QOC, yet provides similar control performance. The MPQC formulation is summarized in Fig. 2. First, MPQC generates a control sequence offline in a closed-loop simulation using a model of the quantum system [see Fig. 2(a)]. In this simulation, the state of the quantum system can be fully known, unlike that of a physical quantum system. Once a control sequence has been optimized offline, it may be provided to a physical quantum system in an open-loop fashion so as to preserve the coherence of the physical system [see Fig. 2(b)]. The key contributions of this article are as follows: 1) develop a novel data-driven QHL algorithm to estimate both internal and control Hamiltonians; 2) prove error bounds on the proposed QHL methods, under appropriate assumptions; 3) formalize an MPQC framework for the learning-enabled open-loop control of quantum systems; 4) illustrate the scalability and efficacy of the proposed data-driven MPQC via numerical experiments.
The rest of this article is organized as follows. Section II describes QPT and our novel approach to QPT. Section III utilizes the proposed QPT method to perform QHL and identify both the internal and control Hamiltonians of a closed quantum system. This is followed by Section IV, which proposes data-driven MPQC. Numerical results are given in Section V. Finally, Section VI concludes this article.
Notations: Sets are denoted by calligraphic font, e.g., X. In particular, Z_{m:m+M} is defined to be the set of integers from m to m + M. Random variables are displayed in sans serif upright fonts; their realizations in serif italic fonts. Vectors and matrices are denoted by bold lowercase and uppercase letters, respectively. For example, a random vector and its realization are denoted by x and x; a random matrix and its realization are denoted by X and X, respectively. In the context of quantum states, complex vectors are denoted using bra-ket notation, i.e., |ψ⟩ and ⟨ψ| denote a complex vector and its conjugate transpose, respectively. The d-by-d identity matrix is denoted by I_d; the subscript is removed when the dimension of the matrix is clear from the context. The Frobenius norm and spectral norm of X are denoted by ‖X‖_F and ‖X‖, respectively. The Euclidean and infinity norms of the vector x are denoted ‖x‖ and ‖x‖_∞, respectively. The notation diag(A_1, A_2, . . . , A_n) represents a block diagonal matrix with the arguments on its main diagonal. The trace, transpose, conjugate, and conjugate transpose of the matrix X are denoted by tr{X}, X^T, X^*, and X^†, respectively. [A, B]_− denotes the commutator of matrices A and B. The Kronecker product is denoted by the symbol ⊗, and we will denote the Kronecker product of multiple state vectors as |ψ_1 ψ_2 · · · ψ_n⟩ ≜ |ψ_1⟩ ⊗ |ψ_2⟩ ⊗ · · · ⊗ |ψ_n⟩. The notations ℜ{·} and ℑ{·} denote the real and imaginary parts of the given argument, respectively. If f, g : C^{p×d} → R are arbitrary functionals, f is said to grow on the order O(g(X)) if there exists some numerical constant C ∈ R such that f(X) ≤ C g(X) for all X ∈ C^{p×d}. The magnitude of a complex number z = a + ıb ∈ C is denoted by |z| = √(a² + b²), where ı = √−1. Notations and definitions for important quantities used in this article are summarized in Table 1.
Important acronyms and optimization problems are also summarized in Tables 2 and 3, respectively.

II. SCALABLE QPT ALGORITHM
In this section, we present a new and scalable QPT algorithm. For ease of exposition, Section II-A considers systems that start from pure states and Section II-B addresses those that start from mixed states. If one's experiments start from both pure and mixed states, the method provided for mixed states will suffice since it is possible to represent any pure quantum state as a mixed state. The proposed QPT formulations rely on solving an optimization problem on the unitary group, which may be achieved via efficient iterative algorithms. Section II-C gives a method for computing iterative solutions to the QPT problem. Section II-D presents a closed-form solution to the QPT problem in a special case, which provides further insight into the nature of the proposed method.

A. LEARNING FROM PURE STATES
We begin by considering a closed quantum system and assume no explicit external control influence. A pure state vector, or wave function, |ψ(t)⟩ ∈ C^d evolves according to the Schrödinger equation

ıℏ (d/dt)|ψ(t)⟩ = H|ψ(t)⟩ (1)

where ℏ is the reduced Planck constant and H ∈ C^{d×d} is the system Hamiltonian, which is an unknown Hermitian operator. For now, H is a time-independent Hamiltonian. In this article, we elect to use atomic units such that ℏ = 1. At any time, |ψ(t)⟩ is of unit length such that ‖|ψ(t)⟩‖ = 1.
Given the initial state |ψ(0)⟩ ∈ C^d, it is possible to calculate the state at any arbitrary time in the future by directly solving the Schrödinger equation:

|ψ(t)⟩ = U(t)|ψ(0)⟩ (2)

where we have defined the operator U(t) ≜ e^{−ıHt}. The matrix exponential of any skew-Hermitian matrix is unitary. Hence, U(t)†U(t) = I_d. The relationship (2) highlights the importance of closed quantum systems: since the evolution of the state vector is unitary, the system preserves state coherence.
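As a quick illustrative sketch (not from the article), the unitary propagator U(t) = e^{−ıHt} and its coherence-preserving property can be checked numerically; the Hamiltonian and state below are random placeholders:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 4

# Random Hermitian Hamiltonian (atomic units, hbar = 1).
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2

t_f = 0.7
U = expm(-1j * H * t_f)  # unitary propagator U(t_f) = e^{-i H t_f}

# U is unitary, so the evolution preserves the norm (coherence) of the state.
assert np.allclose(U.conj().T @ U, np.eye(d), atol=1e-10)

psi0 = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi0 /= np.linalg.norm(psi0)  # unit-length initial state
psi_f = U @ psi0
assert np.isclose(np.linalg.norm(psi_f), 1.0)
```

The same propagator reappears throughout the section, so this check is a useful sanity baseline before introducing estimation error.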
The set of all unitary operators defined on the space C^{d×d} is called the unitary group, which is denoted and defined by

U(d) ≜ {X ∈ C^{d×d} : X†X = XX† = I_d}.

Throughout this article, every problem for learning a quantum system's dynamics will be formalized as an optimization problem defined on the unitary group. Because closed quantum systems evolve unitarily, U(t) ∈ U(d) for all t.
We are interested in a system identification problem of the following form. First, let t_f ∈ (0, ∞) be some final time at which we perform a measurement and end the experiment.
The matrices

Ψ^e_{1:N_i}(t_0) ≜ [|ψ^e_1(t_0)⟩ · · · |ψ^e_{N_i}(t_0)⟩] ∈ C^{d×N_i}, Ψ^e_{1:N_i}(t_f) ≜ [|ψ^e_1(t_f)⟩ · · · |ψ^e_{N_i}(t_f)⟩] ∈ C^{d×N_i} (3)

collect the N_i initial states, whose nth column is a state vector |ψ^e_n(t_0)⟩, and the N_i final states, whose nth column is a state vector |ψ^e_n(t_f)⟩, respectively. The system identification goal is to learn the mapping U(t_f), which will be accomplished via QPT. As shown in Section III, it is possible to estimate the system Hamiltonian H from U(t_f).
It is assumed that the system is initiated in a known state. However, it is only possible to estimate the final state matrix Ψ^e_{1:N_i}(t_f), which can be accomplished via QST.¹ The associated state estimation error can be written as

E ≜ Ψ̂^e_{1:N_i}(t_f) − Ψ^e_{1:N_i}(t_f) (4)

where Ψ̂^e_{1:N_i}(t_f) is the estimate of the final states matrix, with the nth column denoted by |ψ̂^e_n(t_f)⟩. The QPT methods proposed in this article are agnostic to the underlying QST algorithms used to estimate the final states.
To identify U(t_f), one needs a measure of distance between two quantum states. An appropriate measure is the infidelity [93]. For two pure quantum states |ψ⟩ and |φ⟩, the infidelity between them is denoted and defined by

I(|ψ⟩, |φ⟩) ≜ 1 − |⟨ψ|φ⟩|².

This function takes on values in the range of 0 to 1. An infidelity of one corresponds to perfectly orthogonal states, and an infidelity of zero indicates that |ψ⟩ and |φ⟩ are identical up to a possible global phase difference. In fact, the selection of the global phase is arbitrary; namely, e^{ıθ}|ψ⟩ and |ψ⟩ are functionally equivalent in quantum dynamics, and they represent the same physical state for any global phase θ ∈ R. An important interpretation of the infidelity is as follows. The term |⟨ψ|φ⟩|² is the probability that |ψ⟩ will be measured to be |φ⟩. Hence, minimizing the infidelity between two pure quantum states translates into maximizing the probability that one will be observed as the other.
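The infidelity and its global-phase invariance can be sketched in a few lines; the helper name `infidelity` is our own, not from the article:

```python
import numpy as np

def infidelity(psi, phi):
    """Infidelity between pure states: 1 - |<psi|phi>|^2."""
    # np.vdot conjugates its first argument, matching <psi|phi>.
    return 1.0 - abs(np.vdot(psi, phi)) ** 2

rng = np.random.default_rng(1)
d = 3
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)

# Identical states (even up to a global phase e^{i theta}) have infidelity 0 ...
assert np.isclose(infidelity(psi, np.exp(1j * 0.9) * psi), 0.0)

# ... while perfectly orthogonal states have infidelity 1.
e0, e1 = np.eye(d)[0], np.eye(d)[1]
assert np.isclose(infidelity(e0, e1), 1.0)
```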
Since U(t_f) : |ψ^e(t_0)⟩ ↦ |ψ^e(t_f)⟩, we wish to find an estimate Û(t_f) ∈ U(d) that minimizes the infidelity I(Û(t_f)|ψ^e(t_0)⟩, |ψ̂^e(t_f)⟩), where |ψ̂^e(t_f)⟩ is an estimate of the final state obtained by using a QST algorithm. This is the distance between the estimated final state and the one predicted by Û(t_f)|ψ^e(t_0)⟩. Hereafter, we will omit the explicit dependence of Û(t_f) on t_f when it is clear from context. Since there are many initial states and final states, we seek to minimize the sum of the infidelities between all the pairs of initial conditions and final conditions. This gives rise to the following optimization problem, which is the basis of our proposed QPT algorithm:

P1 : minimize over X ∈ U(d) the cost Σ_{n=1}^{N_i} (1 − |⟨ψ̂^e_n(t_f)| X |ψ^e_n(t_0)⟩|²)

where the objective function sums over all state pairs in (3) and the estimate Ψ̂^e_{1:N_i}(t_f) of (4). Let Û denote the optimal solution of P1. It should be emphasized that Problem 1 is an optimization problem defined over a constraint manifold, which is a distinct feature of our QPT formulation.
Next, Problem 1 is transformed into a simpler problem, which can be solved efficiently. The new problem takes a similar form to the well-studied Brockett cost function from the controls and optimization literature [94]. The Brockett cost function considered here is

f_B(X) ≜ Σ_{n=1}^{N_i} tr{X A_n X† B_n} (6)

where A_n ∈ C^{d×d} and B_n ∈ C^{d×d} are Hermitian positive-semidefinite matrices for all n.

Remark 1: Brockett's original article considered a slightly different functional defined over the real orthogonal group. Since then, the literature has widely referred to functions of the form f(X) = tr{N X A X†}, where N is diagonal and A is real symmetric, as the "Brockett cost function." Our definition, with the summation, is closer to Brockett's definition in [94].
Define the density operators Ξ^e_n(t_0) ≜ |ψ^e_n(t_0)⟩⟨ψ^e_n(t_0)| and Ξ̂^e_n(t_f) ≜ |ψ̂^e_n(t_f)⟩⟨ψ̂^e_n(t_f)| for all n ∈ Z_{1:N_i}.

Proposition 1: Solving Problem 1 is equivalent to solving the following optimization problem:

P2 : maximize over X ∈ U(d) the cost Σ_{n=1}^{N_i} tr{X Ξ^e_n(t_0) X† Ξ̂^e_n(t_f)}.

Proof: We prove the case for N_i = 1; the case for N_i > 1 follows directly. First, note that to minimize the objective function in Problem 1, we need only maximize |⟨ψ̂^e(t_f)| X |ψ^e(t_0)⟩|². Expanding this expression, we obtain

|⟨ψ̂^e(t_f)| X |ψ^e(t_0)⟩|² = ⟨ψ̂^e(t_f)| X |ψ^e(t_0)⟩⟨ψ^e(t_0)| X† |ψ̂^e(t_f)⟩ = tr{⟨ψ̂^e(t_f)| X |ψ^e(t_0)⟩⟨ψ^e(t_0)| X† |ψ̂^e(t_f)⟩} = tr{X |ψ^e(t_0)⟩⟨ψ^e(t_0)| X† |ψ̂^e(t_f)⟩⟨ψ̂^e(t_f)|} = tr{X Ξ^e(t_0) X† Ξ̂^e(t_f)}.

The second equality holds since the trace of a scalar is equivalent to the scalar itself. The third equality is due to the cyclic property of the trace. The final equality follows from the definition of the density operator.
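The key identity behind Proposition 1 is easy to verify numerically. The sketch below checks, for random states and a random unitary, that |⟨φ|X|ψ⟩|² equals tr{X Ξ_0 X† Ξ_f} with Ξ_0 = |ψ⟩⟨ψ| and Ξ_f = |φ⟩⟨φ| (all instance data are our own placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def random_state(rng, d):
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return v / np.linalg.norm(v)

psi, phi = random_state(rng, d), random_state(rng, d)

# Random unitary from the QR decomposition of a complex Gaussian matrix.
X, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

Xi0 = np.outer(psi, psi.conj())  # density operator |psi><psi|
Xif = np.outer(phi, phi.conj())  # density operator |phi><phi|

lhs = abs(phi.conj() @ X @ psi) ** 2                    # |<phi| X |psi>|^2
rhs = np.trace(X @ Xi0 @ X.conj().T @ Xif).real         # tr{X Xi0 X^dag Xif}
assert np.isclose(lhs, rhs)
```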
In the general case, we propose an iterative gradient-based algorithm for solving Problem 2. This solver and its efficiency will be described in Section II-C. A closed-form solution in the special case where N i = 1 is given in Section II-D. In the next section, learning from mixed states is addressed.

B. LEARNING FROM MIXED STATES
Recall from Section II-A that the density operator for a pure state |ψ⟩ is denoted and defined by Ξ = |ψ⟩⟨ψ|. In order to properly represent mixed states, one should use the density operator Ξ instead of the state vector |ψ⟩. A mixed state is a probabilistic mixture, or classical ensemble, of pure states. In other words, a mixed state is characterized by the density operator

Ξ = Σ_i p_i |ψ_i⟩⟨ψ_i|

where p_i ≥ 0, Σ_i p_i = 1, and |ψ_i⟩ is a pure state for all i. A mixed state cannot be described by a single state vector alone; however, both the pure and mixed states can be fully characterized by density operators. The density operator Ξ is a Hermitian positive semidefinite matrix satisfying tr{Ξ} = 1. Mixed states in a closed quantum system evolve according to the Liouville-von Neumann equation

(d/dt)Ξ(t) = −ı[H, Ξ(t)]_−.

The solution to this equation is given by

Ξ(t) = U(t) Ξ(0) U(t)†

where U(t) = e^{−ıHt} as usual. This solution can also be derived from the fact that each state in the classical ensemble Ξ is governed by (1). As for the case where the system starts from a pure state, QPT for mixed states starts by fixing some sampling time t_f. For each initial state, quantum experiments are run, and the final states are measured at time t_f. Similar to the case of pure states, an estimate of the final state matrices is denoted by Ξ̂^e_{1:N_i}(t_f). As before, the state estimation error for the final states is given by

E ≜ Ξ̂^e_{1:N_i}(t_f) − Ξ^e_{1:N_i}(t_f). (9)

The nth block column of Ξ̂^e_{1:N_i}(t_f) (i.e., the estimate of the nth final state) is simply Ξ̂^e_n(t_f). Like their pure counterparts, estimates of mixed states are given by QST, and our method remains agnostic to what QST method is employed.
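The mixed-state evolution Ξ(t) = U(t)Ξ(0)U(t)† can be sketched directly; the ensemble below is a random placeholder, and the checks confirm that the defining density-operator properties (Hermitian, positive semidefinite, unit trace) are preserved:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
d = 4

A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2  # Hermitian Hamiltonian

# Mixed state: a probabilistic mixture of three random pure states.
p = rng.random(3)
p /= p.sum()
Xi0 = np.zeros((d, d), dtype=complex)
for pi in p:
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    v /= np.linalg.norm(v)
    Xi0 += pi * np.outer(v, v.conj())

U = expm(-1j * H * 0.5)
Xif = U @ Xi0 @ U.conj().T  # Liouville-von Neumann solution at t = 0.5

assert np.isclose(np.trace(Xif).real, 1.0)        # unit trace preserved
assert np.allclose(Xif, Xif.conj().T)             # Hermitian
assert np.all(np.linalg.eigvalsh(Xif) > -1e-10)   # positive semidefinite
```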
Mixed states are defined on the Hilbert space C^{d×d} endowed with the canonical Frobenius inner product, which induces the Frobenius norm. This leads to a natural metric, or measure of distance, between two matrices X, Y ∈ C^{d×d}, namely, d_F(X, Y) = ‖X − Y‖_F. In the case of mixed states, we propose the following optimization problem, which minimizes the sum of the squared distances between Ξ̂^e_n(t_f) and X Ξ^e_n(t_0) X† to produce an estimate of the unitary dynamics:

P3 : minimize over X ∈ U(d) the cost Σ_{n=1}^{N_i} ‖Ξ̂^e_n(t_f) − X Ξ^e_n(t_0) X†‖²_F.

Let Û denote the optimal solution of P3, which is an optimization problem defined on the unitary group. This problem minimizes the Frobenius norm of the difference between the predicted final states, Û Ξ^e_n(t_0) Û†, and the QST estimate of the final states, Ξ̂^e_n(t_f). Like the proposed QPT formulation for pure states, Problem 3 may be simplified into an easier problem.
Proposition 2: Using the density representation for mixed states, solving Problem 3 is equivalent to solving Problem 2.
Proof: We prove the case where N_i = 1, which is easily extended to the case for any finite N_i. To begin, we observe that

‖Ξ̂^e(t_f) − X Ξ^e(t_0) X†‖²_F = tr{[Ξ̂^e(t_f) − X Ξ^e(t_0) X†]† [Ξ̂^e(t_f) − X Ξ^e(t_0) X†]} = tr{[Ξ̂^e(t_f) − X Ξ^e(t_0) X†]²}.

The first equality is the definition of the Frobenius norm. The second equality is due to the fact that Ξ̂^e(t_f) − X Ξ^e(t_0) X† is Hermitian. Expanding the argument of the prior expression, we obtain

[Ξ̂^e(t_f) − X Ξ^e(t_0) X†]² = [Ξ̂^e(t_f)]² − Ξ̂^e(t_f) X Ξ^e(t_0) X† − X Ξ^e(t_0) X† Ξ̂^e(t_f) + X [Ξ^e(t_0)]² X†.

The cyclic property of the trace function gives

tr{X Ξ^e(t_0) X† Ξ̂^e(t_f)} = tr{Ξ̂^e(t_f) X Ξ^e(t_0) X†}.

Combining the prior two expressions, the constraint that X is unitary, and the linearity property of the trace function, the following is obtained:

‖Ξ̂^e(t_f) − X Ξ^e(t_0) X†‖²_F = tr{[Ξ̂^e(t_f)]²} + tr{[Ξ^e(t_0)]²} − 2 tr{Ξ̂^e(t_f) X Ξ^e(t_0) X†}.

Since Ξ^e(t_0) and Ξ̂^e(t_f) are fixed at the time of optimization and are not functions of the variable X, only the term tr{Ξ̂^e(t_f) X Ξ^e(t_0) X†} can be optimized. To solve the original optimization problem, we wish to maximize this term subject to the constraint that X is unitary, and the proposition is proved.

Remark 2: It is a pleasing result that Problem 1 (designed for pure states) and Problem 3 (designed for mixed states) both simplify to Problem 2. For the rest of this article, we will refer to "solving Problem 2" unambiguously, as it treats both pure and mixed states equally.
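The trace expansion at the heart of Proposition 2 can be checked numerically: for unitary X, the Frobenius objective equals a constant minus twice the Brockett-type trace term. The density matrices below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5

def random_density(rng, d):
    """Random Hermitian PSD matrix with unit trace (a valid mixed state)."""
    B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    M = B @ B.conj().T
    return M / np.trace(M).real

Xi0, Xif = random_density(rng, d), random_density(rng, d)
X, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

# ||Xif - X Xi0 X^dag||_F^2 = tr{Xif^2} + tr{Xi0^2} - 2 tr{Xif X Xi0 X^dag}
lhs = np.linalg.norm(Xif - X @ Xi0 @ X.conj().T, "fro") ** 2
rhs = (np.trace(Xif @ Xif) + np.trace(Xi0 @ Xi0)
       - 2 * np.trace(Xif @ X @ Xi0 @ X.conj().T)).real
assert np.isclose(lhs, rhs)
```

Since the first two trace terms do not depend on X, minimizing the left side over the unitary group is the same as maximizing the final trace term, exactly as the proof argues.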

C. GENERAL CASE: N_i ≥ 1
In the case where N_i ≥ 1, we provide a computationally efficient algorithm, which iteratively maximizes the objective function (i.e., a Brockett cost function) and solves Problem 2. This process is equivalent to performing steepest ascent on the unitary group. Our methodology is inspired by that of [95], which was introduced in the context of minimizing the Brockett cost function on the Stiefel manifold [96], [97] when N_i = 1. We now proceed by first introducing the preliminaries required to implement the proposed iterative method. For generality, we initially adopt the same general notation as in (6) for the Brockett cost and prove the main result. This is done to emphasize the potential application of said result to other domains and to simplify the notation. At the end of this section, the general result is used to solve Problem 2 using the appropriate notation.
The unitary group is an example of a Lie group, which is also a differentiable manifold [96]. The tangent space of U(d) at any point X ∈ U(d) is denoted and defined as

T_X ≜ {Z ∈ C^{d×d} : Z†X + X†Z = 0}.

The projection of an arbitrary matrix Z ∈ C^{d×d} onto the unitary group is denoted and defined by

P_U(Z) ≜ arg min_{X ∈ U(d)} ‖X − Z‖_F.

In fact, assuming that the singular value decomposition of Z is Z = WΣV†, the projection is given by P_U(Z) = WV† [98]. For any X ∈ U(d), the canonical inner product on the tangent space T_X is defined as follows:

⟨Z_1, Z_2⟩_T ≜ ℜ{tr{Z_1† Z_2}}.

Under such a parameterization, the tangent space should be sufficient to locally characterize points on the manifold. In rigorous terms, this translates to: for any point Y ∈ U(d) sufficiently close to X, there must exist some Z ∈ T_X such that Y = h(Z). This article uses the parameterization h(Z) ≜ P_U(X + Z). Associated with the local parameterization is a local cost function g : T_X → R. Steepest ascent on the unitary group selects the steepest ascent direction to maximize the local cost g(Z). The following proposition provides an analytic expression for this direction in the general case and is a direct corollary of [95, Th. 14], which computes the direction of steepest descent for the local cost g. Noting that the direction of steepest ascent for the cost g is the same as the direction of steepest descent for −g proves this proposition.

Proposition 3: Given an arbitrary cost function f : U(d) → R and the local cost function g(Z) = f(P_U(X + Z)) about any point X ∈ U(d), the direction of steepest ascent of the function g about the point Z = 0 under the inner product ⟨·, ·⟩_T is given by

G = D − X D† X

where D denotes the matrix of partial derivatives of f with respect to the elements of X*, evaluated at X.

[Algorithm 1 (Steepest Ascent on the Unitary Group): initialize X = I_d and j = 0; while the stopping criteria are not met, compute the steepest ascent direction G, select the step size γ, set X = P_U(X + γG), and set j = j + 1; return X.]

Using Proposition 3, we present a closed-form expression for the steepest ascent direction of the local cost function when the global cost is given by (6). This will be used in our iterative algorithm for solving Problem 2. The proof of the following proposition is presented in Appendix A.
Proposition 4: Let f_B : U(d) → R be the Brockett cost function (6). Moreover, for any X ∈ U(d), define the local cost g : T_X → R to be g(Z) ≜ f_B(P_U(X + Z)). Then, the direction of steepest ascent in the sense of Proposition 3 is

G = Σ_{n=1}^{N_i} (B_n X A_n − X A_n X† B_n X). (10)

Exploiting this result, an iterative method for maximizing (6) may be formalized. The strategy is fully characterized by Algorithm 1. The program starts with the initial guess X = I_d and a step size γ ≤ 1. At each iteration, the direction of steepest ascent is computed according to (10). The algorithm is stopped if one of two criteria is met: 1) if, for some (small) user-defined threshold ε > 0, the inequality ⟨G, G⟩_T < ε holds, i.e., the direction of steepest ascent is sufficiently small, which is the case when X approaches a stationary point; or 2) if the number of iterations exceeds some maximum threshold C ∈ N. Steps 5-8 of the algorithm adaptively update the step size according to the Armijo criteria, which ensures that the algorithm converges to a stationary point [95]. In step 9, the variable X is updated according to the direction of steepest ascent and the step size. As mentioned at the beginning of this section, Algorithm 1 can be used to solve Problem 2 by simply replacing A_n ≜ Ξ^e_n(t_0) and B_n ≜ Ξ̂^e_n(t_f). Recall that N_i is the number of state pairs used in QPT and d is the dimension of the quantum system. Ignoring the complexity of computing the Armijo step length, which may be replaced with a constant step size, the worst-case computational complexity of our optimization procedure is O(C N_i d³). This is due to the matrix multiplications and additions performed in Step 4 and the maximum number of iterations C. Tuning the hyperparameters N_i and C gives the practitioner control over the computational complexity of our approach at the potential cost of accuracy. The current state-of-the-art QPT method for closed systems (see [40]) has computational complexity O(d¹²) (or O(d⁶) in a special case).
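The iterative scheme can be sketched as follows, under stated assumptions: a fixed step size replaces the Armijo rule, the ascent direction follows Proposition 4, and the projection P_U is computed via the SVD. The synthetic test data (a random "true" unitary and random density matrices) are our own placeholders:

```python
import numpy as np

def project_unitary(Z):
    """P_U(Z) = W V^dag, where Z = W S V^dag is the SVD of Z."""
    W, _, Vh = np.linalg.svd(Z)
    return W @ Vh

def brockett(X, As, Bs):
    """Brockett cost f_B(X) = sum_n tr{X A_n X^dag B_n}."""
    return sum(np.trace(X @ A @ X.conj().T @ B).real for A, B in zip(As, Bs))

def steepest_ascent(As, Bs, gamma=0.1, tol=1e-10, max_iter=2000):
    d = As[0].shape[0]
    X = np.eye(d, dtype=complex)  # initial guess X = I_d
    for _ in range(max_iter):
        # Ascent direction (10): sum_n (B_n X A_n - X A_n X^dag B_n X).
        G = sum(B @ X @ A - X @ A @ X.conj().T @ B @ X for A, B in zip(As, Bs))
        if np.trace(G.conj().T @ G).real < tol:  # near a stationary point
            break
        X = project_unitary(X + gamma * G)       # retract back onto U(d)
    return X

# Synthetic QPT instance: N_i = d state pairs generated by a "true" unitary.
rng = np.random.default_rng(5)
d = 3
U_true, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

As, Bs = [], []
for _ in range(d):
    B0 = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Xi0 = B0 @ B0.conj().T
    Xi0 /= np.trace(Xi0).real                  # A_n = Xi_n(t_0)
    As.append(Xi0)
    Bs.append(U_true @ Xi0 @ U_true.conj().T)  # B_n = Xi_n(t_f), noiseless "QST"

U_hat = steepest_ascent(As, Bs)
assert np.allclose(U_hat.conj().T @ U_hat, np.eye(d), atol=1e-8)         # stays unitary
assert brockett(U_hat, As, Bs) >= brockett(np.eye(d), As, Bs) - 1e-9     # objective improved
```

The projection step keeps every iterate exactly on the unitary group, which is the physically meaningful constraint set for closed-system dynamics.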
In terms of memory, our method has space complexity O(N_i d²), which is determined by the N_i matrices with d² entries that must be stored. The state-of-the-art (see [40]) has space complexity of O(d⁸) (or O(d⁴) in a special case). The number N_i of input-output state pairs used in our QPT method may be chosen by the designer. However, one can expect to produce better results with more samples. In our numerical experiments, we find that N_i = d is a good choice.
Remark 3: The choice of N i is influenced by both the theoretical and practical limitations. In [99], the use of "unitarily informationally complete (UIC)" quantum states for QPT was studied. It was shown that as few as d pure states or two mixed states are needed to uniquely identify a unitary process. In practice, one may wish to use more states to overcome noise and error in QST estimates. We have not studied the use of UIC quantum states in our QPT approach; however, this is an interesting research direction.
Remark 4: The computational complexity studied above is that of the QPT algorithm alone. In practical scenarios, one should also consider the complexity of QST since QPT is dependent on state estimates. Popular QST algorithms proposed in [36], [100], and [101] have complexity O(d⁴) and O(d³). Using such algorithms, the computational burden resides with QPT.

D. NOTE ON THE SPECIAL CASE WHEN N_i = 1
In the special case where QPT is performed using a single pair of initial and final states, i.e., N_i = 1, we provide a closed-form solution and error bounds on the recovered unitary operator Û(t_f). To begin, note that the spectral theorem states that the eigendecomposition of the Hermitian matrix Ξ^e(t_0) may be taken to be

Ξ^e(t_0) = V Λ V† (11)

where Λ ∈ R^{d×d} is the diagonal matrix of real eigenvalues and the columns of V ∈ C^{d×d} consist of orthonormal eigenvectors. Similarly, Ξ̂^e(t_f) is taken to be

Ξ̂^e(t_f) = Q Λ̂ Q† (12)

where Λ̂, Q ∈ C^{d×d} are matrices of eigenvalues and orthonormal eigenvectors, respectively. It is always assumed that eigenvalues are placed in nondecreasing order. With this setup, we are able to provide a closed-form solution for calculating the maximizer Û for Problem 2. The proof of the following proposition is postponed to Appendix B.

Proposition 5: When N_i = 1, Û = QV† is a solution to Problems 1, 2, and 3.

Now that we understand the closed-form solution to Problem 2 in this special case, it is possible to bound the error incurred by our approach. For an arbitrary vector of phases φ = [φ_1, . . . , φ_d]^T ∈ R^d, define the diagonal unitary matrix Δ_φ ≜ diag(e^{ıφ_1}, . . . , e^{ıφ_d}). The following proposition gives an upper bound on the unitary dynamics recovered in terms of the amount of error in the QST process to obtain the estimate Ξ̂^e(t_f). The proof of the following proposition is postponed to Appendix C.

Proposition 6: Let Ξ^e(t_0) have simple eigenvalues, let Û be calculated according to Proposition 5, and let E be the state estimation error defined in (9). Then, an error bound of the form

min_{φ ∈ R^d} ‖U(t_f) − Q Δ_φ V†‖ ≤ c(λ_1, . . . , λ_d) ‖E‖

holds, where λ_i denotes the ith eigenvalue of Ξ^e(t_0) and the constant c(λ_1, . . . , λ_d) depends on the separation of these eigenvalues.
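A sketch of the closed-form construction in Proposition 5, in the noiseless case and with placeholder data of our own: with Ξ^e(t_0) = VΛV† and Ξ̂^e(t_f) = QΛ̂Q† (eigenvalues sorted in the same order, here nondecreasing as `numpy.linalg.eigh` returns them), Û = QV† reproduces the action on the initial state, and the true unitary factors as U = QΔ_φV† for some diagonal phase matrix Δ_φ:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
d = 4

A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2
U_true = expm(-1j * H * 0.3)

# Mixed initial state with simple (distinct) eigenvalues, as Proposition 6 requires.
lam = np.array([0.4, 0.3, 0.2, 0.1])
V0, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
Xi0 = V0 @ np.diag(lam) @ V0.conj().T
Xif = U_true @ Xi0 @ U_true.conj().T  # noiseless "QST" estimate of the final state

# Eigendecompositions; eigh sorts eigenvalues in nondecreasing order for both.
w0, V = np.linalg.eigh(Xi0)
wf, Q = np.linalg.eigh(Xif)
U_hat = Q @ V.conj().T  # closed-form solution of Proposition 5

# U_hat reproduces the action on Xi0 exactly in the noiseless case ...
assert np.allclose(U_hat @ Xi0 @ U_hat.conj().T, Xif, atol=1e-8)

# ... and Q^dag U_true V is diagonal: U_true = Q Delta_phi V^dag (phase ambiguity).
M = Q.conj().T @ U_true @ V
assert np.linalg.norm(M - np.diag(np.diag(M))) < 1e-6
```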

Remark 5:
A matrix is said to have simple eigenvalues if none of its eigenvalues are repeated. For Ξ_e(t_0) to possess this property, it must necessarily be a mixed state. Interestingly, mixed quantum states with this property occur naturally in certain quantum systems, e.g., see the model in [74, Sec. IV.A.3]. Moreover, since Ξ_e(t_0) is chosen by the designer, it may be designed so that its eigenvalues tighten the bound stated in the previous proposition.

Proposition 6 shows that, from a single initial condition, it is possible to identify U only up to a diagonal unitary matrix Δ_φ. This is because the solution to Problem 2 given by Proposition 5 remains a solution when multiplied by any diagonal unitary matrix. In the identification setting studied in [43], which uses population measurements (important in various applications, including physical chemistry [27]), it is likewise only possible to identify unitary dynamics up to a diagonal unitary matrix. In general, however, it may be possible to uniquely identify a quantum system's dynamics up to a global phase using other measurements, such as positive operator-valued measures (POVMs) [102]. Nonetheless, since the closed-form solution presented in Proposition 5 is nonunique, more than one pair of experimental states is necessary to identify the quantum system's dynamics using our QPT formulation. The iterative solution given for the general case of N_i > 1 can achieve this in a scalable and efficient manner. Section V, which is devoted to numerical experimentation, gives examples of the number of states, N_i, needed to obtain "good" results.

III. HAMILTONIAN LEARNING
This section will discuss how to use the results of the proposed QPT algorithm (see Section II) to estimate both internal (see Section III-A) and control (see Section III-B) Hamiltonians of a closed quantum system.

A. RECOVERING THE INTERNAL HAMILTONIAN
Once Û(t_f) has been estimated using QPT, we seek to recover an estimate Ĥ of the internal Hamiltonian. Every unitary matrix is diagonalizable by a unitary transformation. Suppose that the eigendecomposition of Û(t_f) is Û(t_f) = V̂ Θ̂ V̂†. Here, Θ̂, V̂ ∈ C^{d×d} are the diagonal matrix of eigenvalues and the matrix of orthonormal eigenvectors, respectively. Then, according to the relationship Û(t) = e^{−ıĤt}, it follows that

Ĥ = (ı/t_f) V̂ log(Θ̂) V̂†,  (14)

where log(Θ̂) is just the usual logarithm applied entrywise to the diagonal matrix Θ̂. In order for Ĥ to be a unique estimate, we must limit the duration of the experiment. This well-known result is in the spirit of the Nyquist-Shannon theorem for digitally sampling classical signals and is due to the fact that the logarithm of a complex variable is nonunique.

Proposition 7 ([40] and [103]): To uniquely identify H from the unitary operator U(t_f), the duration t_f of the experiment should satisfy

t_f (μ_max − μ_min) < 2π,  (15)

where μ_max and μ_min are the maximum and minimum eigenvalues of H, respectively.

While, in general, the spectrum of H is unknown, choosing a sufficiently short sampling time will ensure that this condition is satisfied. If such a sampling time is employed and the system Hamiltonian is estimated using the approach presented above, it was shown in [40] that the bound (16) holds. Hence, bounding the error in the estimate of U bounds the error in the estimate of H.

With these results in hand, we can present the proposed QT-enabled QHL algorithm for identifying an internal Hamiltonian. This technique utilizes both QST and QPT; a visual depiction is given in Fig. 3, and the process is summarized in Algorithm 2, whose final steps are as follows: use Algorithm 1 to solve Problem 2 and produce Û(t_f); estimate Ĥ according to (14); and return Ĥ. A set of prepared states is given to the quantum system and measured at time t_f.
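The recovery step (14) can be sketched numerically (a hypothetical NumPy illustration with made-up values, not the paper's implementation): generate a Hermitian H, form U(t_f) = e^{−ıHt_f} with t_f short enough that the uniqueness condition of Proposition 7 holds, and recover H from the eigendecomposition of U(t_f).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# A random Hermitian "internal Hamiltonian" (units where hbar = 1 are assumed)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2

# Choose t_f so that every eigenvalue angle mu_i * t_f stays inside (-pi, pi),
# which in particular satisfies t_f * (mu_max - mu_min) < 2*pi (Proposition 7)
mu = np.linalg.eigvalsh(H)
t_f = 0.9 * np.pi / max(abs(mu[0]), abs(mu[-1]))

# Forward simulation: U(t_f) = expm(-i H t_f), built from H's eigendecomposition
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * t_f)) @ V.conj().T

# Hamiltonian learning step (14): H_hat = (i / t_f) V_hat log(Theta_hat) V_hat^{-1}
theta, V_hat = np.linalg.eig(U)          # U is normal, so eig diagonalizes it
H_hat = (1j / t_f) * V_hat @ np.diag(np.log(theta)) @ np.linalg.inv(V_hat)

assert np.allclose(H_hat, H, atol=1e-8)
```

If t_f were chosen too long, the principal branch of the logarithm would wrap some eigenvalue angles and the recovered Ĥ would differ from H, which is exactly the nonuniqueness that Proposition 7 rules out.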
Through repeated measurement across multiple experiments, QST estimates the state of the quantum system at the termination of each experiment and returns a matrix of estimates Ξ̂_e^{1:N_i}(t_f). QPT uses these state estimates to solve Problem 2 and infer a unitary operator Û(t_f) that best maps the prepared input states to the experimentally measured output states. Using the inferred unitary map, QHL reconstructs an estimate Ĥ of the system Hamiltonian.
Before concluding this section, we give an inequality relating the error in QST to the error in QHL. In particular, we consider the case where N_i = 1, i.e., a single pair of states is used in QPT. For any given φ, denote Û_φ ≜ Δ_φ Û and let Ĥ_φ be the Hamiltonian inferred from Û_φ according to (14). Then, Propositions 6 and 7 can be combined with (16) to produce the following result.
Proposition 8: Under the assumptions of Proposition 6 and when the sampling time t f satisfies (15), the following error bound holds:

B. LEARNING THE CONTROL HAMILTONIANS
In the case of a controlled quantum system, we consider a time-varying Hamiltonian, which may be decomposed as

H(t) = H_0 + H_c(t),

where H_0 is the internal Hamiltonian that appears in (1) and H_c(t) is the control Hamiltonian. Furthermore, we assume that the control Hamiltonian has the structure

H_c(t) = Σ_{l=1}^{N_c} u_l(t) H_l,

where N_c ∈ N, H_l is a Hermitian matrix for each l ∈ {1, 2, ..., N_c}, and u_l(t) ∈ R is a control input.³ Each H_l is known as an interaction Hamiltonian, and it describes the effect of the lth control field on the quantum system. We emphasize that each H_l is time independent; the only temporal dependence is in the real control input u_l(t). With this Hamiltonian, the Schrödinger equation becomes (18) and the von Neumann equation becomes (19).

³It should be noted that the control input u_l(t) and the unitary propagator U(t) are separate entities despite the notational similarity.

During the learning phase, i.e., for t ∈ [0, t_f], it is assumed that each control u_l(t) can be placed under a zero-order hold (ZOH), which removes the time dependence in (18) and (19) and allows Algorithm 2 to be applied. This is outlined in Algorithm 3, which we call QT-enabled CQHL. The premise is to learn the Hamiltonians one by one by selectively turning on the control u_l(t) and placing it under a ZOH for the duration of the sampling time. The value of u_l(t) under the ZOH is denoted c_l ∈ R. This parameter may be chosen by the designer and is called a probing control input. First, the internal Hamiltonian is learned according to Algorithm 1 while all the control inputs are set to zero. Then, the first control input is set to c_1 (the other inputs remain zero), and the composite (internal and control) Hamiltonian is estimated by repeating Algorithm 1. To recover the control Hamiltonian, the estimated internal Hamiltonian from the first step is subtracted from the composite Hamiltonian learned in the second step. This process is repeated for all l ∈ {1, 2, ..., N_c}. At each step l, the sampling time t_f can be chosen arbitrarily (and possibly differently across steps) as long as it satisfies Proposition 7 for the composite Hamiltonian H = H_0 + c_l H_l. The following proposition gives an error bound on the recovery of the control Hamiltonians when Algorithm 3 is used.

Proposition 9: Let Ĥ_0 and Ĥ_l, for l = 1, 2, ..., N_c, be estimates of the internal and lth control Hamiltonian produced by Algorithm 3. Furthermore, suppose that Û_0 is the unitary operator produced by QPT for the internal dynamics and Û is the unitary operator produced by QPT for the dynamics governed by the composite Hamiltonian H = H_0 + c_l H_l. Finally, let the sampling time t_f used by QPT satisfy Proposition 7 for both the internal and composite Hamiltonians. Then, the following error bound on the lth control Hamiltonian holds:

Remark 6: Consider the special case where QPT uses a single pair of states to estimate both Û and Û_0. A corollary to Propositions 8 and 9 could then be given, bounding the error in the estimates of the control Hamiltonians in terms of the error in QST.
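The probing procedure of Algorithm 3 can be sketched as follows (an illustrative mock-up with made-up two-level Hamiltonians and probing values; the QST/QPT layers are replaced by exact propagators for brevity). Each control is turned on alone under a ZOH, the composite Hamiltonian is learned via the matrix-logarithm step, and the internal part is subtracted out.

```python
import numpy as np

def expmh(H, t):
    # e^{-i H t} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

def learn_hamiltonian(U, t):
    # Inverse step (14): recover H from U = e^{-i H t}, t short enough for uniqueness
    theta, V = np.linalg.eig(U)
    return (1j / t) * V @ np.diag(np.log(theta)) @ np.linalg.inv(V)

# Made-up internal and interaction Hamiltonians for a single qubit
Z = np.diag([1.0, -1.0])
Y = np.array([[0, -1j], [1j, 0]])
H0_true, H_ctrl_true = 0.5 * Z, 1.0 * Y
t_f, c = 0.1, 1.0            # sampling time and probing amplitude (ZOH value)

# Step 1: all controls off -> learn the internal Hamiltonian
H0_hat = learn_hamiltonian(expmh(H0_true, t_f), t_f)

# Step 2: control l on at value c -> learn the composite Hamiltonian, then subtract
H_comp_hat = learn_hamiltonian(expmh(H0_true + c * H_ctrl_true, t_f), t_f)
H_ctrl_hat = (H_comp_hat - H0_hat) / c

assert np.allclose(H0_hat, H0_true, atol=1e-8)
assert np.allclose(H_ctrl_hat, H_ctrl_true, atol=1e-8)
```

In a real experiment, the propagators fed to `learn_hamiltonian` would come from QPT acting on QST estimates, so the subtraction compounds the two estimation errors, which is why Proposition 9 bounds the control-Hamiltonian error by both terms.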

IV. DATA-DRIVEN MPQC
Once the dynamics (Hamiltonians) of the closed quantum system are learned, one may turn attention to controlling the system's behavior. MPQC is inspired by MPC for classical systems. Traditionally, MPC works by computing optimal controls over a short prediction horizon based on a model of the system's dynamics [104]. It then implements only the first control in this optimal sequence. After executing this action, it measures the state of the system, recomputes an optimal control sequence based on its new knowledge of the state, executes the first control action in this sequence, and repeats. Unlike classical systems, state measurements of quantum systems are not always available as measurement may cause decoherence of their states. Hence, MPQC performs MPC in a closed-loop computer simulation of the quantum system where its state can be fully known [see Fig. 2(a)]. Once a control sequence is computed, it can be delivered to a physical quantum system in an open-loop manner so as to preserve the coherence of the quantum state [see Fig. 2(b)]. Here, for the first time, we propose the data-driven MPQC framework, where the model of the quantum system is inferred by the QHL algorithm proposed in the previous section.
This article is concerned with controlling a closed quantum system whose dynamics are of the form (18) or (19). All the systems studied in the rest of this article are assumed to be controllable [28], which ensures that any control problem considered is well posed. For ease of exposition, the theory of MPQC is presented in terms of pure states, with comments on the related formulation for mixed states when necessary.

The goal of the controller is to drive |ψ(t)⟩ to a desired state, |ψ_d⟩, via an appropriate control input while minimizing a cost function. MPQC works by placing the control vector u(t) under a ZOH on each time interval of length Δ_t. Let |ψ(t_k)⟩ denote the state of the system at time step k ∈ N, such that |ψ(t_0)⟩ is the initial state at time t_0 = 0. The lth element of u(t_k) is denoted by u_l(t_k). Finally, H(t_k) is the system Hamiltonian at time step k, i.e.,

H(t_k) = H_0 + Σ_{l=1}^{N_c} u_l(t_k) H_l.

The discrete-time dynamics that advance the state |ψ(t_k)⟩ to |ψ(t_{k+1})⟩ are given by the solution of the Schrödinger equation over the time interval [t_k, t_k + Δ_t):

|ψ(t_{k+1})⟩ = e^{−ıH(t_k)Δ_t} |ψ(t_k)⟩.  (21)

Define an arbitrary cost g(·). Examples of such a cost are given in Table 4 and discussed later. In many cases, the cost is chosen independently of time. Let K_p, K_c ∈ N denote the prediction horizon and control horizon, respectively, for the controller, such that K_c ≤ K_p. During closed-loop quantum simulation, at each time step t_k, MPQC attempts to minimize the total cost over the prediction horizon K_p by manipulating the control u(t_k) over the control horizon K_c. The optimization problem performed at time step t_k is given in the following. That is, MPQC optimizes controls only over the horizon K_c, after which the control remains constant, while considering the cost incurred over the longer horizon K_p. After the optimal sequence u(t_k), u(t_{k+1}), ..., u(t_{k+K_c}) is computed, only the control action u(t_k) is recorded and given to the simulator, which then returns the next state according to (21). This is repeated until a control sequence of the desired length is constructed or the cost g becomes smaller than a designed threshold. At the termination of this simulation, the sequence of recorded control actions is given to the physical quantum system in an open-loop manner.
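The receding-horizon loop can be sketched end to end for a single qubit. All values below are hypothetical (a toy drift H_0 = 0.5Z, one control Hamiltonian H_1 = Y, and SciPy's SLSQP standing in for an SQP solver); the sketch is ours, not the paper's implementation. At each step, the terminal infidelity is minimized over the horizon, only the first control is kept, and the simulation advances by (21).

```python
import numpy as np
from scipy.optimize import minimize

Z = np.diag([1.0, -1.0])
Y = np.array([[0, -1j], [1j, 0]])
H0, H1 = 0.5 * Z, Y                      # toy drift and control Hamiltonians
dt, Kp, u_max = 0.2, 6, 2.0              # ZOH interval, horizon (Kc = Kp here), bound

def step(psi, u):
    # One ZOH step of (21): psi <- exp(-i (H0 + u H1) dt) psi
    w, V = np.linalg.eigh(H0 + u * H1)
    return V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))

def terminal_infidelity(us, psi, psi_d):
    # Cost g2: infidelity at the end of the prediction horizon
    for u in us:
        psi = step(psi, u)
    return 1.0 - abs(np.vdot(psi_d, psi)) ** 2

rng = np.random.default_rng(2)
psi = np.array([0.0, 1.0], complex)      # start in |1>
psi_d = np.array([1.0, 0.0], complex)    # target |0>
u_plan = rng.uniform(-1.0, 1.0, Kp)      # warm start away from the flat point u = 0

controls = []
for k in range(30):                      # closed-loop simulation
    res = minimize(terminal_infidelity, u_plan, args=(psi, psi_d),
                   method="SLSQP", bounds=[(-u_max, u_max)] * Kp)
    u_plan = np.roll(res.x, -1)          # shift the plan for the next warm start
    controls.append(res.x[0])            # record and apply only the first control
    psi = step(psi, res.x[0])
    if 1.0 - abs(np.vdot(psi_d, psi)) ** 2 < 1e-3:
        break

final_infidelity = 1.0 - abs(np.vdot(psi_d, psi)) ** 2
assert final_infidelity < 0.05
```

Taking the horizons to be the full length of the control sequence and running the optimization once recovers the QOC setting discussed later in this section.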
The choice of cost function g determines the type of control generated by MPQC, and Table 4 outlines several different choices. In all cases, a summation over s in Table 4 is taken over all s ∈ Z_{k:k+K_p}. Cost g_1 penalizes the system's state for occupying a forbidden state |ψ_bad⟩. Costs g_2 and g_3 both reward the system's state for tending toward the desired state |ψ_d⟩; however, g_3 promotes quick convergence by considering the distance of the state |ψ(t_s)⟩ from the desired state over the entire prediction interval. Cost g_2 is the infidelity between the desired state and the system's state at the end of the prediction horizon. In g_3, {α_s}_{s=k}^{k+K_p} is a set of positive weighting parameters that can be designed to emphasize different parts of the trajectory. If one would like to account for the expense of control, g_4 and g_5 are a good starting place. A controller utilizing g_4 is called a β-minimum norm controller. Function g_4 adds a penalty for the control expended over the prediction horizon K_p to the state cost in g_2. The term β/(K_p u_max) is a scaling parameter, which ensures that the control cost is of the same order of magnitude as the state cost. Here, β ∈ (0, 1] may be chosen by the designer.

An (α, β)-minimum norm controller is given by optimizing with respect to g_5, which combines the α-weighted average state cost with the β-weighted control cost. This promotes quick convergence to the desired state while considering the cost of actuation.
Solving the classical MPC problem has been a subject of research interest for many years [105]. We would like to utilize these methods to solve Problem 4; however, this is not immediately possible since the state |ψ(t_k)⟩ is complex valued, and most of these methods are designed to handle real variables. To overcome this hurdle, we make use of the isomorphism between C^d and R^{2d}. The MPQC formulation (Problem 4) can then be cast, for the purpose of optimization, into one involving only real variables. In this article, we make use of SQP [106], which has been used to solve nonlinear programs for classical MPC [107] and, recently, to design robust quantum gates [78]. SQP solvers are part of many available numerical optimization packages, including the MPC toolbox in MATLAB [108].

Remark 7: Many nonlinear programming routines, including SQP, require gradients of the cost and constraint functions with respect to states and controls. These may be computed analytically or approximated numerically. The literature on QOC provides an exhaustive review of how these may be computed; the reader is referred to [61], [67], and [109] for examples of both analytic and approximate methods.
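The real embedding can be sketched as follows (an illustrative snippet; the helper names are ours): a complex vector z maps to [Re z; Im z], and multiplication by a complex matrix M = A + ıB becomes multiplication by the real block matrix [[A, −B], [B, A]].

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
z = rng.standard_normal(d) + 1j * rng.standard_normal(d)

def embed_vec(z):
    # C^d -> R^{2d}: stack real and imaginary parts
    return np.concatenate([z.real, z.imag])

def embed_mat(M):
    # Complex matrix action as a real 2d x 2d block matrix
    A, B = M.real, M.imag
    return np.block([[A, -B], [B, A]])

# The embedding intertwines the two matrix-vector products
assert np.allclose(embed_mat(M) @ embed_vec(z), embed_vec(M @ z))
```

Because the embedding preserves linear dynamics, a real-variable solver such as SQP can optimize the embedded problem without ever touching complex arithmetic.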
Suppose one wishes to generate a control sequence of length K ∈ N. If the prediction and control horizons are taken to be this entire length, i.e., K = K_p = K_c, then MPQC reduces to QOC. Hence, MPQC can be considered a generalization of QOC that allows the practitioner to simplify the optimization by breaking it into smaller pieces. This simplification is intrinsically dependent on the underlying optimization routine used. We use SQP to solve the optimal control problem. Suppose, for simplicity, that K_c = K_p in MPQC, i.e., the control horizon and prediction horizon are the same length. The number of decision variables used by the SQP program that solves Problem 4 is then K_p(d + N_c). At each iteration of the optimization routine, SQP solves a quadratic program, whose complexity is cubic in the number of decision variables.
Hence, for large values of K and K p < K, one can expect a speedup in the optimization problem performed at each iteration.
Remark 8: We note that there are several different notions of "QOC." The one referred to in the prior paragraph is similar to the setting in which GRAPE is applied: assuming a ZOH on the control input, optimize a given cost function g over the control horizon K. The differences are twofold. First, GRAPE attempts to solve this problem via gradient ascent, which differs from SQP. Second, the cost in GRAPE is a terminal cost: only the cost of the final state at step K is considered. QOC, more generally defined, could incur a cost at each time step, as is done in Problem 4.
Remark 9: The implementation of MPQC in physical scenarios is highly dependent on the application. In the case of superconducting qubits, high-fidelity arbitrary waveform generators (AWGs) have enabled the physical implementation of control signals generated by QOC algorithms such as GRAPE [68]. We, therefore, expect that MPQC could also be physically implemented by such methods. In general, we expect MPQC would perform well in any physical scenario in which QOC has been applied previously.
So far, MPQC has been formalized. Data-driven MPQC is the same procedure, except that the Hamiltonians learned in Section III are used as the simulation's model. Once a control sequence is generated using the learned model, it can be delivered to the physical system in an open-loop fashion. This method is validated on numerical experiments in Section V, where the effect of error in the learned Hamiltonians on the performance of the proposed data-driven MPQC is also analyzed.

A. GENERATING UNITARY GATES
In the previous section, MPQC was used to govern the evolution of the state of a closed quantum system. MPQC can also be used to generate control sequences for implementing arbitrary unitary gates. Consider the Schrödinger operator equation

dU(t)/dt = −ı H(t) U(t),

which describes the evolution of the propagator U(t). This allows one to ignore the initial state |ψ(0)⟩ or density operator Ξ(0) and consider only the system's net evolution over time. At time zero, U(0) = I. The objective of the control may be to generate any number of unitary gates, such as an X-gate, Y-gate, Z-gate, or Hadamard gate. In this scenario, the MPQC formulation becomes the following, where U(t_s) is the unitary operator at time step s. A commonly used cost function in this control scenario is the "infidelity" between the unitary U(t_s) and a desired unitary U_d. This infidelity is also referred to as the "gate error."
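As a concrete illustration, the snippet below computes one standard, global-phase-insensitive definition of gate infidelity, 1 − |tr{U_d† U}|²/d² (the paper's exact cost is the one given in its display equation; this form is an assumption of the sketch).

```python
import numpy as np

def gate_error(U, U_d):
    # Phase-insensitive gate infidelity: 1 - |tr(U_d^dagger U)|^2 / d^2
    d = U.shape[0]
    return 1.0 - abs(np.trace(U_d.conj().T @ U)) ** 2 / d**2

H_gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard target
X_gate = np.array([[0, 1], [1, 0]], dtype=float)

# A global phase does not change the error; a wrong gate gives a large error
assert np.isclose(gate_error(np.exp(1j * 0.7) * H_gate, H_gate), 0.0)
assert gate_error(X_gate, H_gate) > 0.4
```

Dividing by d² normalizes the error to [0, 1], so termination thresholds such as the 10⁻³ used in Experiment 4 are dimension independent.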

V. CASE STUDIES
In this section, various simulation scenarios are presented to validate both the proposed QT-based QHL algorithm and data-driven MPQC. In the first experiment, the accuracy of the QHL algorithm is assessed for estimating both the internal and control Hamiltonians. Then, data-driven MPQC is tested, and its performance is compared to QOC when the same underlying optimization routine is used. Throughout this section, the percent-error metric is used to evaluate the recoverability properties of the proposed QHL algorithm. All the experiments were performed on a basic laptop with 8 GB of RAM and an Intel Core i5-7300U CPU @ 2.60 GHz.

A. EXPERIMENT 1
In this experiment, the proposed QHL algorithm is used to infer the dynamics of a four-qubit network. For the remainder of Section V, let I denote the 2-by-2 identity matrix and let X, Y, and Z denote the Pauli-x, Pauli-y, and Pauli-z operators, respectively. The internal Hamiltonian of the network is

H_0 = ω_1 X^(1) + ω_2 X^(2) + ω_3 X^(3) + ω_4 X^(4) + ω_12 X^(12) + ω_13 X^(13) + ω_14 X^(14) + ω_23 X^(23) + ω_24 X^(24) + ω_34 X^(34).
Here, X^(1) = X ⊗ I ⊗ I ⊗ I represents the evolution of qubit 1, and ω_1 is the frequency of this evolution. The term X^(12) = X ⊗ X ⊗ I ⊗ I represents the coupling between qubits 1 and 2 in the network, and ω_12 denotes the strength of this coupling. In general, X^(i) represents the independent evolution of the ith qubit and is defined by a tensor product of four matrices with X at the ith position and I at all other positions. The coupling between the ith and jth qubits is specified by X^(ij), the tensor product of four matrices with X at the ith and jth positions and I elsewhere. The frequency of the ith qubit is denoted ω_i, and the coupling frequency between the ith and jth qubits is denoted ω_ij. In the following experiments, ω_1 = 0.1 GHz, ω_2 = 0.025 GHz, ω_3 = 0.075 GHz, and ω_4 = 0.13 GHz; moreover, all qubit coupling frequencies are ω_ij = 0.01 GHz. Each qubit is controlled by two external fields, so the control Hamiltonian consists of X^(l) and Z^(l) interaction terms, where Z^(l) is defined analogously to X^(l). In this experiment, d = 16 is the dimension of the quantum system; however, only N_i = 8 pairs of initial and final states are used to infer the system's Hamiltonians. Random initial states were generated using the quantum entanglement theory laboratory (QETLAB) random density matrix function [110], which generates a random density matrix uniformly according to the Hilbert-Schmidt measure. The final states used in this experiment were generated by evolving the initial states forward in time according to (19). In addition, the state estimation error (9) is assumed to be random (traceless) noise. The sampling time of the final states is chosen as t_f = 1 ns.
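The operators X^(i) and X^(ij) can be assembled with Kronecker products. The sketch below (our own helper names) builds the internal Hamiltonian of the four-qubit network with the frequencies listed above.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def op_on(sites, n=4):
    # Tensor product with X at the given sites and I elsewhere,
    # e.g., op_on({0, 1}) = X (x) X (x) I (x) I  (qubits indexed from 0)
    return reduce(np.kron, [X if i in sites else I2 for i in range(n)])

omega = [0.1, 0.025, 0.075, 0.13]       # single-qubit frequencies (GHz)
omega_c = 0.01                          # all coupling frequencies (GHz)

H0 = sum(omega[i] * op_on({i}) for i in range(4))
H0 = H0 + sum(omega_c * op_on({i, j}) for i, j in combinations(range(4), 2))

assert H0.shape == (16, 16)                  # d = 16
assert np.allclose(H0, H0.conj().T)          # Hermitian, as required
assert np.allclose(op_on({0, 1}),            # X^(12) = X (x) X (x) I (x) I
                   np.kron(np.kron(X, X), np.kron(I2, I2)))
```

The control terms X^(l) and Z^(l) follow the same pattern with the appropriate single-site operator substituted into `op_on`.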
The maximum number of iterations in Algorithm 1, which is called by Algorithms 2 and 3, is chosen to be C = 15.
One hundred numerical experiments were performed with various noise intensities. The results of using Algorithm 2 to infer the internal Hamiltonian are depicted in Fig. 4(a). The recovery error of the proposed QHL method is sublinear in the level of noise in the experimental data. Next, in each experiment, Algorithm 3 was used to infer the interaction Hamiltonians. For inferring each control Hamiltonian H_l, the corresponding control input was placed under a ZOH with value c_l = 1 GHz. Fig. 4(b) presents the error in recovering the internal and control Hamiltonians over the 100 experiments. The recovery error for the control Hamiltonians is usually higher than that of the internal Hamiltonian, which is expected from the design of Algorithm 3. In both figures, the y-axis denotes the percent error. In all the numerical experiments, the percent error in recovering any Hamiltonian was less than 1%.

B. EXPERIMENT 2
Now, the efficacy of data-driven MPQC is demonstrated using the same four-qubit network studied in the previous experiment. The Hamiltonians inferred in the previous experiment that correspond to the worst-case error were given to MPQC. The goal of the control is to drive the qubit network from the state |0011⟩ to the consensus state |++++⟩, where |+⟩ ≜ (1/√2)(|0⟩ + |1⟩). This task is known as achieving "consensus" in a quantum network, or driving the network to a "consensus state," i.e., a state in which all nodes (qubits) in the network share the same state [111]. An (α, β)-minimum norm controller with {α_s}_{s=k}^{k+K_p} = {1, 2, ..., K_p} and β = 0.3 was used to generate a control input offline in simulation. The control is limited by u_max = 1 GHz, the prediction horizon is K_p = 4, the control horizon is K_c = 4, and the discretization time is Δ_t = 0.05 ns. The optimized control input, which was calculated using the learned model, was given to the ground-truth model, and the outcome was recorded. As depicted in Fig. 5, data-driven MPQC quickly drove the qubit network to a consensus state.

C. EXPERIMENT 3
In this experiment, the performance of two different MPQC controllers is analyzed for a simple qubit whose Hamiltonians are perfectly known, and their efficacy is compared to QOC. The internal Hamiltonian of the system is H_0 = ωZ, the qubit frequency is ω = 5 GHz, and the interaction Hamiltonian is H_1 = Y. The control is limited by u_max = 1 GHz, and the discretization time is Δ_t = 0.01 ns. The system is initialized in the excited state |1⟩ and driven to the ground state |0⟩.
First, a β-minimum norm controller with β = 0.005 is considered; second, an (α, β)-minimum norm controller with {α_s}_{s=k}^{k+K_p} = {1, 2, ..., K_p} and β = 0.005. A prediction horizon of K_p = 5 and a control horizon of K_c = 2 are used for both of these controllers. If K_p and K_c are allowed to span the entire length of the control sequence, then MPQC is equivalent to the QOC problem on that interval. Using this approach, the third controller is an (α, β)-minimum norm quantum optimal controller. Fig. 6 depicts the results of this experiment: Fig. 6(a) plots the control input generated by each controller, and Fig. 6(b) depicts the evolution of the qubit's state along the Bloch sphere in response to each control input. Moreover, Fig. 7 shows the infidelity between the qubit's state and the desired state at each time step for the three scenarios.
For this simple experiment, the basic laptop described at the beginning of Section V took roughly 6 s to compute each MPQC control sequence and over 31 min to solve the QOC problem, despite using the same underlying optimization procedure. QOC achieved complete population transfer in the fewest time steps; however, the (α, β)-minimum norm MPQC achieved quite similar performance. The β-minimum norm MPQC took longer to achieve population transfer, which was expected. While the control sequence generated by this controller is longer than those of the others, it appears to be smoother. This may be of practical interest in cases where highly discontinuous changes in a control field excite unwanted energy levels in a quantum system.

D. EXPERIMENT 4
For the final experiment, the goal is to generate arbitrary unitary gates for the qubit system studied in Experiment 3. We compare the efficacy of MPQC and GRAPE for this task. Let K ∈ N be the total length of the control sequence. GRAPE generates a control signal by solving the optimization problem given in Problem 6.⁴ Notice that GRAPE considers only a terminal cost, as opposed to MPQC, which can consider a different cost at each step in the control sequence. For this comparison, the control and prediction horizons of MPQC were set to K_c = 4 and K_p = 12, respectively. Problem 5 optimizes the variable U(t_k), which is of dimension d², at each time step, as opposed to Problem 4, which optimized the variable |ψ(t_k)⟩ of dimension d at each time step. Thus, the number of decision variables optimized at each step of SQP in MPQC is greater than in Experiment 3, and longer computation times can be expected. Our GRAPE implementation follows that of [61], which uses a gradient-based procedure for solving Problem 6. For this comparison, the control is limited to u_max = 2 GHz. The iterative optimization procedures of both MPQC and GRAPE were set to terminate once the gate error reached a predefined threshold of 10^−3. The qubit dynamics are the same as in Experiment 3. Unlike MPQC, the number of control steps, K, must be defined prior to solving Problem 6 (GRAPE), and there are no established rules for choosing this parameter. MPQC, in contrast, is designed to iterate through time, solving Problem 5 at each step, until a desired gate error is achieved. Hence, for this experiment, MPQC was performed first and the resulting K was recorded; then, Problem 6 (GRAPE) was solved using the same value. Table 5 summarizes the results.
For all gates studied, we found that MPQC offered a significant speedup over GRAPE and that both the methods achieved low gate errors.
Next, the robustness of the control signals generated by MPQC and GRAPE is studied. Previously, control signals for three single-qubit gates were optimized under the assumption that the qubit's frequency was 5 GHz. These control signals were then provided to qubits whose frequencies differ from this 5-GHz value. Fig. 8 plots the resulting gate errors as a function of qubit frequency. In this figure, circles represent data points for MPQC and crosses represent data points for GRAPE, color-coded to indicate the corresponding unitary gate. We observe that, in almost all cases, MPQC generated gates that are more robust to uncertainty in the qubit frequency.

VI. CONCLUSION
This article introduced the concept of data-driven MPQC for controlling general closed quantum systems when no nominal model is provided. To this end, a novel and efficient approach to QHL was proposed. To learn the internal and control Hamiltonians, a novel QPT algorithm was developed, which involves optimization on the unitary group. The learned Hamiltonians were then used by MPQC to compute quantum control sequences. The proposed QHL algorithm allows the Hamiltonians of high-dimensional quantum systems to be inferred owing to its low memory requirements. In addition, the MPQC framework is flexible and accommodates a variety of control costs. In our experiments, when an SQP solver is used, MPQC computes control sequences much faster than the current state-of-the-art, QOC. The success of many quantum technologies depends on the ability to precisely characterize and control quantum systems, and our results offer a promising approach for controlling quantum systems under uncertainty. They can serve as guidelines for designing robust and efficient data-driven control policies for intervening in quantum dynamics, such as the interactions of atoms and molecules or superconducting qubits.

A. PROOF OF PROPOSITION 4
Proof: Let X, Z ∈ C^{d×d} be arbitrary matrices. By definition of the cost, we have (25). Using the linearity and cyclic properties of the trace, we obtain (26); the second equality there holds because, for any complex number c ∈ C, the relationship c + c* = 2 R{c} holds. Since each A_n and B_n is positive semidefinite, each admits a principal square root, P_n and L_n, respectively, such that A_n = P_n† P_n and B_n = L_n† L_n. This yields (27), which shows that f_B(Z) grows on the order O(‖Z‖_F²); the definition of the Frobenius norm and its submultiplicative property were used to obtain (27). Combining (25)-(27) gives an expression that holds for all X, Z ∈ C^{d×d}. Hence, this expression also holds for all Z ∈ T_X(d) ⊂ C^{d×d}, and using Proposition 3, the desired result follows.

B. PROOF OF PROPOSITION 5
Proof: Let Y ∈ U(d) and N, A ∈ C^{d×d}. When A is Hermitian and N is diagonal, it is well known that the maximizer of the Brockett optimization problem max_Y tr{N Y A Y†} is given by some Ỹ whose rows are orthonormal eigenvectors of A [95]. Hence, by the spectral theorem, Ỹ A Ỹ† is a diagonal matrix whose diagonal entries are the eigenvalues of A, so N Ỹ A Ỹ† is also a diagonal matrix. The Brockett function is therefore maximized by choosing the rows of Ỹ to be an appropriate permutation of the eigenvectors of A.
Next, we consider the implications of the previous discussion for Problem 2. Define the transformed variable Y ≜ Q̂†X, where Q̂ is the unitary matrix of eigenvectors defined in (12). In Problem 2, the variable X is unitary; hence, the variable Y is unitary as well. Using (12) and the cyclic property of the trace, Problem 2 becomes max_Y tr{Λ̂ Y Ξ_e(t_0) Y†}, which is an optimization problem of the form discussed at the beginning of the proof. Once a maximizer Ỹ of this problem is obtained, the unitary maximizer of Problem 2 is recovered as X̃ = Q̂Ỹ. Recalling the prior discussion, since Λ̂ is diagonal and Ξ_e(t_0) is Hermitian, the solution is obtained by choosing the rows of Y to be a permutation of the eigenvectors of Ξ_e(t_0) that maximizes tr{Λ̂ Y Ξ_e(t_0) Y†} = tr{Λ̂Λ}. Since the eigenvalues in the decompositions (11) and (12) are nonnegative and placed in nondecreasing order, Ỹ = V† is a solution. Therefore, Û = X̃ = Q̂V† is a solution to Problem 2. In the case of pure states, Problem 2 is equivalent to Problem 1, so Û = Q̂V† solves Problem 1 as well. In the case of mixed states, a similar conclusion is drawn for solutions of Problems 2 and 3.
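The Brockett step can be checked numerically (an illustrative sketch with random matrices): for diagonal N and Hermitian A, aligning the rows of Y with the eigenvectors of A (ordered consistently with N) attains tr{N Y A Y†} = Σ_i n_i a_i, and no random unitary does better.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4

def random_unitary(d, rng):
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Q, _ = np.linalg.qr(A)
    return Q

n = np.sort(rng.standard_normal(d))      # diagonal of N, nondecreasing
N = np.diag(n)
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A_h = (B + B.conj().T) / 2               # Hermitian A

a, V = np.linalg.eigh(A_h)               # eigenvalues a in nondecreasing order
Y_star = V.conj().T                      # rows of Y* are eigenvectors of A
val_star = np.trace(N @ Y_star @ A_h @ Y_star.conj().T).real

assert np.isclose(val_star, np.dot(n, a))   # tr{N Y* A Y*^dagger} = sum_i n_i a_i
for _ in range(200):                        # no random unitary exceeds the maximum
    Y = random_unitary(d, rng)
    val = np.trace(N @ Y @ A_h @ Y.conj().T).real
    assert val <= val_star + 1e-9
```

Matching both spectra in nondecreasing order is exactly the choice Ỹ = V† made in the proof.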

C. PROOF OF PROPOSITION 6
Prior to presenting the proof, we provide a few remarks.
Since Ξ e (t 0 ) and Ξ e (t f ) are related by a unitary transformation, they share the same eigenvalues. Hence, given the assumption that Ξ e (t 0 ) has simple eigenvalues, so does Ξ e (t f ).
The eigenspace associated with each simple eigenvalue is of dimension one. Therefore, the orthonormal eigenvectors of each matrix are unique up to a shift by a complex phase. For instance, let q_k represent an arbitrarily computed unit-length eigenvector of Ξ_e(t_f) corresponding to the eigenvalue λ_k.
Namely, q_k represents the kth column of Q, defined for Ξ_e(t_f) analogously to Q̂ in (12). Then, all possible unit-length eigenvectors of Ξ_e(t_f) corresponding to λ_k can be represented as e^{ıφ_k} q_k for an arbitrary phase φ_k ∈ R, since Ξ_e(t_f) e^{ıφ_k} q_k = λ_k e^{ıφ_k} q_k. It should also be noted that, using (8) and (11), Ξ_e(t_f) = U(t_f) V Λ V† U(t_f)† is a valid spectral decomposition of the final state; the orthonormal eigenvectors of Ξ_e(t_f) are the columns of the product U(t_f)V. Hence, if the error E is zero, the estimate Û of Proposition 5 exactly recovers U(t_f) up to a diagonal phase shift Δ_φ. The proof of Proposition 6 for a general estimation error E follows below.

Proof: Suppose, without loss of generality, that U(t_f) = QV†. Then,

‖U(t_f)Δ_φ − Û‖_F = ‖QΔ_φV† − Q̂V†‖_F = ‖QΔ_φ − Q̂‖_F,

where the second equality follows from the unitary invariance of the Frobenius norm. It follows that

‖QΔ_φ − Q̂‖_F² = Σ_{k=1}^{d} ‖e^{ıφ_k} q_k − q̂_k‖_2².  (30)

If, for each pair of columns q_k and q̂_k of Q and its estimate Q̂, respectively, the inequality (31) holds, then (30) proves the desired result. Hence, it only remains to show that (31) holds.
We will now prove (31) for an arbitrary k ∈ {1, 2, ..., d}. Here, q_k and q̂_k represent the kth columns of Q and its estimate Q̂, respectively. For an arbitrary angle ϑ ∈ R, note that (32) holds; the last equality there holds because q_k and q̂_k are unit vectors. Fix the angle ϑ to be ϑ_0 such that R{e^{−ıϑ_0} ⟨q_k|q̂_k⟩} = |⟨q_k|q̂_k⟩|. Since q_k and q̂_k are unit length, |⟨q_k|q̂_k⟩| ≤ 1 and

R{e^{−ıϑ_0} q̂_k† q_k} = |⟨q_k|q̂_k⟩| ≥ |⟨q_k|q̂_k⟩|²,

from which (32) provides the desired intermediate bound. Since the columns of Q form a basis for C^d, it is possible to expand q̂_k in this basis, as in (34). Next, using the orthonormality of the eigenvectors, observe the chain of equalities whose first step follows from (34), whose second step follows from the eigenrelation between Ξ_e(t_f) and its eigenvector q_k, and whose third step comes from applying the definition of the 2-norm and the orthogonality of the columns of Q; the final equality follows from the fact that q̂_k is unit length and that {q_m}_{m=1}^{d} forms an orthonormal basis. On the other hand, we have a matching upper bound. In deriving this inequality, we used the following facts: 1) the definition Ξ̂_e(t_f) = Ξ_e(t_f) + E; 2) the triangle inequality; 3) the absolute homogeneity of the 2-norm and the submultiplicative property of the operator norm; 4) the fact that q̂_k is unit length; and 5) Weyl's celebrated eigenvalue perturbation inequality, which, for the perturbed Hermitian matrix Ξ̂_e(t_f) = Ξ_e(t_f) + E, states that |λ̂_k − λ_k| ≤ ‖E‖ for all k.
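Weyl's inequality used in step 5 can be verified numerically (an illustrative sketch with random matrices): perturbing a Hermitian matrix by a Hermitian E moves each eigenvalue by at most the operator norm ‖E‖.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6

B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Xi = (B + B.conj().T) / 2                    # stand-in for Xi_e(t_f)
C = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
E = 0.01 * (C + C.conj().T)                  # small Hermitian perturbation

lam = np.linalg.eigvalsh(Xi)                 # sorted eigenvalues of Xi
lam_hat = np.linalg.eigvalsh(Xi + E)         # sorted eigenvalues of Xi + E
E_norm = np.linalg.norm(E, 2)                # operator (spectral) norm of E

# Weyl: |lam_hat_k - lam_k| <= ||E|| for every k
assert np.all(np.abs(lam_hat - lam) <= E_norm + 1e-12)
```

Because sorted eigenvalues pair off under this bound, the per-column argument above controls the recovery error uniformly in k.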

D. PROOF OF PROPOSITION 9
Proof: Algorithm 3 estimates the lth control Hamiltonian according to Ĥ_l = (1/c_l)(Ĥ − Ĥ_0). Applying (16) to the estimates Ĥ and Ĥ_0 and combining the resulting bounds via the triangle inequality proves the desired result.