Designing Monte Carlo Simulation and an Optimal Machine Learning to Optimize and Model Space Missions

This paper investigates applying artificial intelligence (AI) algorithms to attitude control system of satellites to optimally tune the controller using high performance computing. This methodology is applied to the Virtual Telescope for X-ray Observation mission, which is a precise formation of two separate spacecraft observing multiple objects in the space in the X-ray domain. The mission is divided into phases based on the instrumentation and the mission goal. To reach an stable precise formation robust to stochastic slew and slew rate (i.e., Euler angles and angular velocities) in a minimal constrained time <inline-formula> <tex-math notation="LaTeX">$T$ </tex-math></inline-formula>, consumed energy of the attitude control system, denoted as <inline-formula> <tex-math notation="LaTeX">$E$ </tex-math></inline-formula>, and root-mean-square state error of attitude control system, denoted as <inline-formula> <tex-math notation="LaTeX">$e$ </tex-math></inline-formula>, are minimized. Monte-Carlo simulation is used for the sensitivity analysis of optimization and designing a controller. Deep neural networks (DNN), Gaussian processes (GP), and support vector regression (SVR) learn this optimization as a surrogate model, while their hyperparameters are optimized in a novel approach. THETA supercomputer at Argonne Leadership Computing Facility (ALCF) is used for optimizing the hyperparameters of DNN. The surrogate model meets the requirements of the mission, and it shows a better performance over the optimization and Monte-Carlo. The optimal DNN can satisfy the mission requirements <inline-formula> <tex-math notation="LaTeX">$e$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$T$ </tex-math></inline-formula> while reducing <inline-formula> <tex-math notation="LaTeX">$E$ </tex-math></inline-formula> for 90% compared to the other given methods.


I. INTRODUCTION
Space missions require the spacecraft to be positioned and oriented in a certain way. The guidance, navigation, and control (GNC) oversee the attitude control system (ACS) and relative position control systems [1]. GNC is used in spacecraft formation flying to synchronize two or more spacecraft in an accurate alignment in space. Formation flying provides advantages over a single spacecraft including robustness, The associate editor coordinating the review of this manuscript and approving it for publication was Zhouyang Ren . redundancy, reconfiguration, and broader coverage. Formation flying, as a key factor in virtual spacecraft formation [2], and rendezvous [3], is investigated in missions including virtual telescope for X-ray observation (VTXO) [4], Proba 3 [5], MASSIM X-ray virtual telescope [6], PRISMA [7], and SIMBOL-X [8]. Making a formation of spacecraft for astrophysics observation and optimizing the science mission lifetime of VTXO by 50% is studied in [9]- [12]. Making a formation of distributed telescope to observe exoplanet AEgir and optimizing the energy consumption of the mission is studied in [13]. Formation has been studied as leader-follower VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ approach, behavior-based method, potential field approach, and generalized coordinates [14]. In VTXO, leader-follower approach is taken [12], [15], [16] where the leader holds the lens, known as the optics spacecraft (OS), and the follower holds the X-ray camera, known as the imaging spacecraft (IS). VTXO uses a lightweight Phased Fresnel Lens (PFL) to obtain near diffraction-limited angular resolution in the X-ray band [4], [10], [12], [15]- [18]. PFL provides the VTXO mission with nearly 55 milli-arcsecond (mas) (FWHM) imaging resolution of astrophysical sources in the X-ray band for a 1 km focal length. 55 milli-arcsecond (mas) (FWHM) imaging resolution provided by PFL requires VTXO to hold a minimum 55 milli-arcsecond (mas) (FWHM) 1 km precision formation of IS and OS with a sub millimeter transverse alignment accuracy. The imaging resolution of around 55 mas is approximately 10 fold higher than the state-of-the-art Chandra X-ray 0.5 arcsecond pointing resolution [12] and James Webb space telescope 0.5 arcsecond pointing resolution [19]. With the VTXO's high resolution, the search for exoplanets [20] and high-energy environments around space objects including black holes, neutron stars, and stellar systems at the smallest spatial details is possible for the first time.
The Navy Interferometric Star Tracker Experiment II (NISTEx-II) instrument [21] provides accurate formation navigation for VTXO. Using NISTEx-II star tracker on the optical bench of IS and laser beacons on OS, GNC keeps the formation within millimeter level transverse alignment and 55 milli-arcsecond attitude accuracy [9], [10], [12]. GNC of VTXO uses thrusters, reaction wheels, inertial navigation system sensor (IMU), GPS, NISTEx II precision star tracker on the optical bench of IS, laser beacons on the OS, and a radio ranging system that also serves as an inter-satellite communication link. The instrumentation of VTXO [12] imposes requirements on the ACS state error. Beside, consumed energy of ACS and the time of the mission are desired to be minimal as the requirements of the VTXO.
To meet the requirements of VTXO, the relative position control system and ACS consist of 4 phases and 3 phases, respectively. Trajectory optimization is used to determine the optimal phases, the optimal orbits, and the optimal controllers for the GNC [6], [10]- [12]. For the VTXO, 3 phases are assigned to the ACS, and each phase is designed to optimize the performance of the VTXO mission by increasing the accuracy and reducing the energy consumption [4], [15]- [18], [22]. The benefits of assigning phases to the mission are • Objective functions are defined for each phase based on the goal of the mission and the instrumentation.
• The AI and the controller for each phase are designed to satisfy the objective functions. Based on the instrumentation constraints and the accuracy requirement for the science observation, 2 accuracy requirements for e are assigned to the phase prior to science observation phase. and an objective function is designed to satisfy the mission requirements and keep E minimum.
The benefits of maintaining E minimum are longer mission lifetime since one of the limitations of a mission lifetime is depletion of energy resources, lower spacecraft weight since less resources are needed to provide energy, and less cost for a mission. The benefits of maintaining e and T minimum are high pointing resolution, satisfying the e constraint of the instrumentation, and satisfying the T requirement respectively. In the objective function, a global asymptotic stable Lyapunov controller function is applied as a constraint to the optimization problem for VTXO to guarantee an ideal global asymptotic stable response for all the initial conditions.
Sliding-mode control (SMC) can provide robustness and asymptotic stability in the presence of noise, disturbances with the disadvantages of chattering [1], [23]. Chattering has to stay minimal in VTXO since NISTEx-II requires the angular velocities to stay close to zero. Other approaches including robust control [24], [25] including SMC, adaptive control [23], [26], Deep neural networks (DNN) controllers for nonlinear system [27] have shown asymptotic stability in the presence of disturbances and noise. However, usually adaptive controllers, including model reference adaptive control (MRAC), are developed for linear systems [27]- [29] and they are relatively slow in convergence. The nonlinear adaptive controller version developed by State Dependent Riccati Equations (SDRE) doesn't provide global asymptotic stability [28], as it only provides local asymptotic stability. DNN controllers, on the other hand, are stable globally and they can estimate the nonlinear dynamics [27], but, still the slow convergence is a concern, and the process of designing the controller is complex [27]. In VTXO, the speed of controller convergence is important as it defines T for the transient phase, and the dynamics of the nonlinear system is known. Multiplicative extended Kalman filter (MEKF) [1], [3], [4], [16], [30] can provide the state estimation with noisy sensor measurements and having disturbances and uncertainties in the system. Extended state observer (ESO) [23] have shown promising solutions to the disturbance and uncertainty estimation and rejection while providing asymptotic stability. ESO estimates the time varying differentiable uncertainties and disturbances in the system, and the controller compensates for them in a single spacecraft [23] and multiple spacecraft formation control [31].
In this paper, Monte-Carlo simulation is used for control system robustness and stability demonstration, and it is used to prove control system robustness against external disturbances and noise measurement. Monte-Carlo simulation is used to show the control system robustness and stability for flexible robot arm [32] and for the formation control robustness of UAVs [33]. In the Monte-Carlo simulation, Quaternions are used for modeling the dynamics of spacecraft.
Quaternions are preferred for solving the equations of motion since they do not face singularities, they are easy to normalize, and they are computationally less expensive [1], [34]. Quaternions exist in attitude dynamics of spacecraft, fluid mechanics, and quantum mechanics [35]. Highly accurate space missions including VTXO [4], Hubble telescope and James Webb telescope are usually long missions, and the quaternion solution accuracy in the simulation decreases while the quaternion norm decreases during the long intervals of time. Classical approaches like third and fourth order Runge-Kutta can solve ordinary differential equations (ODE) including quaternion ODE of motion. More recent algorithms including multiple order Crouch-Grossman Lie group methods [34], [36] and multiple Runge-Kutta-Munthe-Kaas [34] have shown higher accuracy over the classical Runge-Kutta. However, the effect of the noise and the effect of the controller are not considered in these approaches. In this research, the effect of the a nonlinear Lyapunov controller's parameters is considered on the accuracy of the quaternion, and an appropriate adaptive time step is taken based on the controller's parameters.
Reaching a single robust solution to a problem with multiple objective functions requires two steps: Optimization and decision-making [37]. In the single-objective optimization algorithms, a prior preference is available, and the decision making is performed first. Next, the objective functions are accumulated, e.g. weighted-sum method, and optimized; or one objective is optimized, and the rest of the objective functions can be assigned as constraints to the problem. Having prior considerations requires a deep understanding of the problem. Besides, single-objective optimization algorithms provides the solution with only one solution without showing the trade-off between the other objective functions, i.e., Pareto front. Single-objective optimization algorithms and priors are discussed in chemical processes [38], [39] and relative position of spacecraft [40]. Multi-objective optimization algorithms, on the other hand, provide the solution with a Pareto front when a prior is not available, and, consequently, based on the posterior, the decision making is applied to the problem after the optimization is performed. Multi-objective optimization algorithms produce the Pareto front not based on a desired distribution of solutions with numerous iterations that take a long time, and some of these solutions might not be desired. The decision making and the posterior in multi-objective optimization algorithms are discussed in ACS [41], ACS of VTXO [4], [17], optimal sizing and sitting of power electronic interfaced [42], and water distribution system [43]. Algorithms including weighted-sum method, epsilon constraint method, and evolutionary algorithms [37] provide a Pareto front solution to multi-objective optimization problems considering the accuracy, diversity, spread requirements for the Pareto front. In addition, ML algorithms, including Gaussian process (GP) [44], [45], and DNN [46], [47] can estimate the Pareto front solution during the optimization and reduce the iterations. Due to the uncertainties in space, including disturbances in space and using control techniques to tackle them [48], [49], detumbling [47], [50], and available energy uncertainties [51], defining a prior does not provide an efficient and robust solution to the GNC solution for every mission. Besides, the space objects to observe and the mission keeps updating for VTXO. For the VTXO ACS solution, the weights in weighted-sum method, denoted as w, are produced with a desired distribution in the single-objective optimization algorithm offline using Monte-Carlo method, and later w is incorporated into ML to estimate Pareto front to find a single real-time solution in the space when a posterior is available. This increases the adaptability of VTXO. w is the prior in the optimization, and then w is posterior in real-time when using ML. Weighted-sum method is applicable to VTXO since the Pareto front in VTXO problem is convex [4]. Multi objective genetic algorithm, one of the algorithms in multi-objective optimization algorithms, is used to show the Pareto front for VTXO and its convexity, and simulated annealing (SA) is used as a single-objective optimization algorithm for the data production. Using ML for a real-time estimation is applicable to other problems including water distribution system [43] with convex Pareto front [37].
To find a solution to an optimization problem with constraints, MPC can be used to reach a Pareto front for a nonlinear dynamical system by finding the control inputs at each time step. The disadvantage of MPC is its high computation cost, lack of proof for converging to a Pareto front for all the cases, and lack of proof for stability. To reduce the problem of high computation cost, MPC can be first trained offline, second learned by a ML algorithm and obtain a surrogate model, and finally applied in real-time to the control problem. DNN and MPC are used for optimal real-time control landing of spacecraft [46], Boolean spacecraft attitude control [47], and surrogate modeling of MPC for optimal control of low-level RF control system [52]. ML and optimization have been implemented on the applications including VTXO [4], [16], [17], proportional-integral-derivative controller (PID) and MPC for particle accelerators [53]- [57], MPC in smart grids [58] optimal tuning of the controller's parameters in the relative position of formation control [40], trajectory optimization for VTXO [11], ML hyperparameter tuning [59]. The technologies using ML and optimization can be improved using the developed techniques in this research. For example, the controllers including linear quadratic regulator (LQR) and MPC can reach a general optimum using ML, and the performance of modelling complex systems and complex ML algorithms can be improved using high performance computing (HPC).
HPC can be used to tune the ML algorithms and make them reach their optimum performance by parallelizing and distributing the ML. Distributed ML algorithms, including distributed DNN [60] and deep GP [61] have been implemented on applications including sensor networks [62] and aerospace systems [61]. On the HPC platforms, the optimization algorithms including DeepHyper [63] and Hyperband [64], [65] packages optimize the ML's hyperparameters. The disadvantage of Hyperband is that the algorithm is not adaptive in learning [64], [65] to increase the speed of convergence. In these packages, setting the optimization based on the data is complex, and DeepHyper is a black-box optimization not giving insight into the ML's hyperparameters optimization and not initially tuned based on the complexity of the ML VOLUME 10, 2022 and data. For this project we have access to the THETA at Argonne Leadership Computing Facility (ALCF) for HPC. Randomized search (RS) and grid search (GS) are used to solve the distributed DNN's hyperparameters optimization problem. To reduce the cross-validation (CV) computation time, CV can be parallelized in ML algorithms [66] and a subset of data can be used for CV [67]. Hyperparameters optimization problem is designed based on the data and the computation resources so that the computation resources are more efficiently used in a less amount of time. In this research, hyperparameters optimization problem is a discrete Coarseto-fine optimization problem for DNN solved with GS first and next Randomized search RS. Coarse-to-fine optimization is used in applications including image processing [68] and Speech Enhancement [69]. HPC is used in THETA for parallelizing RS and parallelizing CV. The advantages of RS are it is easy to implement, it can be easily parallelized on HPC, and it converges with the probability one to the local optimum. The disadvantage of RS is it takes long to converge. To make the convergence faster, the iterations' span of RS are adapted to the speed of RS convergence. After the ML is designed and trained, the ML is implemented on the Field Programmable Gate Arrays (FPGAs).
FPGAs are integrated circuit that are based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. Embedded intelligence (EI) is incorporating AI and decision making into embedded systems [70]. ML is embedded into FPGA the for EI. Compared to central processing unit (CPU) and graphics processing unit (GPU), FPGAs can provide robustness toward space radiation [71], lower latency [71], connectivity for connecting to any input with a high bandwidth [70], [72], energy efficiency [70], [72], and parallelism [70]. These advantages have resulted in using EI for spacecraft communication systems [73] and spacecraft GNC [74], [75]. To form a charged particle counter in VTXO [12], Amptek A225 and A206 hybrid FPGA chips are used to provide the data to the computer.
The unique contributions of this research provide the VTXO mission with a simulation-based algorithms to increase the performance of VTXO by satisfying the following mission objectives • Accuracy requirements imposed by the instrumentation and the required science observation accuracy.
• Energy optimization of ACS.
• Constraining the time of the transient phase to a few minutes to increase the performance of the mission.
The VTXO objectives are satisfied by the 6 unique contributions as the following: • Introducing ML-based optimal controller and ACS phases to satisfy the designed constraints and objective functions imposed to VTXO because of the instrumentation and objectives of the VTXO mission.
• Robust Runge-Kutta solver for solving the quaternion equations. • Integration of the adaptive Runge-Kutta solver in the data production and optimization process for a time efficient data production to reduce the computation time while maintaining high accuracy for the ACS solver.
• Obtaining the optimal ACS parameters by Monte-Carlo simulation and comparing it with ML approach.
• Integration of the w in the ML with a desired distribution to provide an accurate, diverse, and spread Pareto front for the posterior to find a single solution in the ML.
• ML hyperparameters optimization with a systematic approach to the computation resources. For the rest of paper, first VTXO mission is described. Section II shows the elements of the simulation and the equations of GNC. Afterwards, the AI structure is given. Section IV shows the Monte Carlo simulation experiments and the corresponding simulated data. Next, In the discussion section, the results show that the objectives of the mission are satisfied. In the conclusion, the mythology and the future work are discussed.
The codes and the data in this paper are available at https://github.com/Rpirayesh.

II. VTXO MISSION CONCEPT OF OPERATION
VTXO mission has three main sections as shown in Fig. 1: commissioning, operation, and de-commissioning [10]. Commissioning is when the spacecraft perform maneuvers to reach their desired orbits and correct the drifts initiated from the release of the launch vehicle. The formation happens in the operation section for observing the desired objects. Commissioning introduces large slew and slew rate at the beginning of the operation phase. In decommissioning, spacecraft deorbit. In the operation section of VTXO, each orbit in the VTXO mission consists of three major phases for ACS [4] and four major phases for the position formation between OS and IS.

A. PHASES OF ACS
ACS has 3 phases as the following.
• The formation stabilization phase. • The transient phase.
• The science observation phase. In the formation stabilization phase, the spacecraft are stabilized while they pass the perigee to come into the next orbit phase. The Global Positioning System (GPS) navigation is used in this phase for orbit determination since the OS and IS reach the perigee and GPS provides higher accuracy closer to earth. During this phase, different techniques such as using gravity gradient torque [22], feed forward controller [48], [49], [76] detumbling [50] and attitude orientation for the solar panels [77] can be used before initiating the transient phase. As a result, maintaining this phase longer can help with the energy systems like obtaining energy through solar panels [77]. The controller is a feedforward controller compensating for the gravity gradient external torque.
In the transient phase, there are large-angle maneuver, and the E is desired to be minimized. A Lyapunov controller is used to have the states of the system converge to the equilibrium point, denoted as x x x e , given any initial condition [1]. Plus, the controller guarantees a minimal path between the initial quaternion and final quaternion to consume the least action [1]. In the transient phase, the large angle maneuver control is applied to provide enough attitude accuracy for beginning the science observation phase. OS and IS in this phase are controlled independent of each other, and the ACS does the formation in the science observation phase. Inertial measurement unit (IMU) and star trackers are used for the navigation. To increase the time of science observation phase and maintain the formation stabilization phase longer, a few minutes is considered for the transient phase as a requirement since reducing the time of the large angle maneuvers in the transient phase will increase the time of the formation stabilization phase. Due to the large-angle maneuvers and the possible chattering cause by SMC, SMC is not used in this phase since the chattering will increase e, E, and T in the transient phase.
After the spacecraft passes the transient state while being stabilized, the science observation phase begins the precise formation with the following specifications • 55 mas (FWHM) attitude accuracy between the formation of OS and IS.
• Sub millimeter transverse alignment accuracy between OS and IS.
• A few arcminute pointing accuracy for each spacecraft.
• The distance between OS and IS should remain 1 km with a meter accuracy.
A few arcminute pointing accuracy for OS and IS is already achieved in VTXO [4], [16], [17]. Science observation takes place at the apogee since the disturbances are minimum at the apogee. Half of the observation takes place before the apogee, and half after the apogee. For the navigation, IMU on both IS and OS, star trackers on both IS and OS, laser beacons on the IS, radio ranging on the IS are used for navigation. GPS doesn't provide high accuracy resolution at the high altitudes in the apogee. MEKF and SMC are already chosen for science observation phase in VTXO [4] since robustness to external disturbances and model uncertainties is required. In addition, the SMC is optimized to minimize E and e, and SMC is developed to guarantee a minimal path and least action is taken between the initial quaternion and final quaternion FIGURE 2. Orbits and ACS phases in the formation are shown. The pseudo orbit of IS keeps the formation for |r r r rel | = 1km with 1 meter accuracy. r r r rel = r r r IS − r r r OS . r r r OS and r r r IS show the distance vectors of the spacecraft.
to reduce E [4]. A saturation function is embedded in the controller to reduce the chattering.

B. PHASES OF POSITION FORMATION
Position formation has 4 phases as the following • The de-formation phase. • The tracking phase. • The formation phase. • The science observation phase. After the science observation phase, the 1 km formation breaks and the spacecraft formation drifts during the de-formation phase. Near the perigee, the tracking phase occurs and the formation is held for 20 m (for minimal energy consumption [10]). During this phase, GPS provides the orbit determination and radio ranging determine the relative distance between the OS and IS. In this phase, the data from the spacecraft including the images from the observation is sent to the earth by the communication system. During the formation phase, the relative distance between the OS and IS reaches 1 km with a few meter accuracy to begin the science observation phase. Meter accuracy in focal length can be tolerated during the science observation phase since this distance error in focal length corresponds to the energy error in the imaging observations of the space objects [12]. In this phase, the relative transverse alignment accuracy is sub millimeters.
During the formation, OS can stay in its natural orbit while IS changes its distance with the OS to hold the formation. Fig. 2 illustrates formation of OS and IS and the phases on each orbit.

C. ACCURACY REQUIREMENTS
NISTEx-II star tracker sensor enables the precise formation for the science observation phase. The field of view (FoV) of the NISTEx-II star tracker sensor is 5 deg [10]. Instrument alignment inaccuracies together with 41 mas (1-σ ) of the NISTEx-II star tracker while knowing the location of the laser beacons on the OS lead to the 55 mas (FWHM) formation accuracy. The slew rate should stay near zero for 41 mas VOLUME 10, 2022 (1-σ ) accuracy, with 206 mas accuracy when the slew rate is 1 deg/sec. FoV of VTXO is obtained by the division of the size of the X-ray camera and the focal length. ACS of OS and IS satisfy the accuracy requirements imposed by the FoV of VTXO equal to 0.18 deg [12]. The accuracy requirements at the end of the transient phase are imposed by • R1: NISTEx-II star tracker • R2: FoV of VTXO Satisfying the R2 accuracy requirement satisfies the few arcminute pointing accuracy requirement for the OS and IS in the science observation phase.
Since the noise and disturbances are minimally bounded, and since the nonlinear Lyapunov controller provides the ideal global asymptotic stable response, when the slew converges to zero the slew rate stays bounded close to zero. As a result, satisfying the accuracy requirements will lead to near zero slew rate required by the NISTEx-II star tracker.

III. VTXO SIMULATION
A high-fidelity, six-degree-of-freedom, nonlinear Monte Carlo simulation has been created using MATLAB to model the VTXO transient phase. The elements included in the simulation are orbital dynamics of VTXO, uncertainties in the model (variation in the inertial momentum), external disturbances (gravity-gradient torques, random accelerations, J2 gravity model, and torques to account for drag, solar pressure, higher-order-gravity terms, etc.), reaction wheels, the misalignment, noise, and bias associated with the reaction wheels, and GNC.
The following simplifications are applied to the model to reduce the computation and the simulation time T Q • To reduce the computation, measurement noise and state estimation observer are not included in the simulation as the sensors' accuracy are high. As a result, the true states x x x are equivalent tox x x.
• ACS and relative position control are decoupled from eachother as the states of relative position and attitude system can be controlled independently in the transient phase. However, for the precision formation during the science observation, ACS and relative precision control are coupled [6].
• In the transient phase, each spacecraft is controlled individually.
• Since the OS and IS are close to each other (max distance is 1km) compared to their altitude, and since OS and IS have the same GNC and characteristics, the rest of the paper is based on one spacecraft. The inclusion of noise in the dynamics is regularizing the ML, which reduces the chance of over fitting for the ML.
Not including these simplifications require higher computation resources which can be achieved by using HPC, which is left as the future work.

A. ACCURACY REQUIREMENT FORMULATION
Percentile is used to characterize the accuracy requirements. P 99 (e) is used to characterize the accuracy requirements for the slew error. Using P 99 , 99 percent of times the mission meets the accuracy requirements.

ATTITUDE REPRESENTATION AND NOTATION
The estimated value or the value in the flight computer for the true variable x is denoted asx, and the measured value for x is denoted asx. diag(x x x) represents the diagonal matrix with elements of vector x x x in the diagonal.
The Earth coordinate frame is the Earth-centered inertia (ECI) frame and the frames used for the satellites body frame F b are the Local-Vertical-Local-Horizontal (LVLH) frames shown in Fig. 3.
The spacecraft position in the body frame is denoted as r r r as shown in Fig. 2 and Fig. 3. Quaternions are used to model the attitude dynamics of equations [1]. Quaternion q q q is a four-element vector with a three-element vector q q q 1:3 and a scalar part q 4 . The quaternion unity constraint is The orientation of frame F a to frame F b through Euler angle θ and Euler axis υ υ υ is expressed through quaternion as The associated attitude matrix corresponding to the quaternion q q q ba is shown as R R R ba . ⊗ is quaternion multiplication operator defined by (3). Quaternion multiplication between reference frames are q q q ac = q q q bc ⊗ q q q ab (4) Equation (4) corresponds to the multiplication of attitude matrices as Small rotations θ can be written in terms of attitude matrix R R R as Small rotations θ can also be written in terms of quaternion q q q as where θ θ θ = θυ υ υ, and I I I is an identity matrix. The cross product The identity quaternion is

C. ORBITS AND THE DESIRED TRAJECTORIES
The baseline flight dynamics of VTXO uses highly-elliptical supersynchronous geostationary transfer orbit with a 32.5-hour period for providing a 10-hour observation in the apogee. The 5 Keplerian elements are the same for OS and IS except the eccentricity γ . The eccentricity of OS and IS are designed to include a few minutes of buffer between the time the OS and the IS pass the point where the orbits intersect, avoiding a collision between satellites. A larger difference between the eccentricities results in a lower risk of collision, since the satellites would have longer relative distances. However, this results in a higher energy consumption that is needed to keep the desired 1 km relative distance between the satellites. The list of objects to be observed by VTXO are given in Table 1. This table of desired objects for VTXO can be updated in the future.
The desired attitude trajectory to observe the desired objects given in Table 1 is denoted as q q q f . q q q f doesn't vary by time, and it just switches from one space object to the next one when the science observation satisfies the time of observation.

D. DYNAMICS
q q q bi represents the orientation of the spacecraft body frame with respect to the earth Earth-centered inertial (ECI) frame, and ω ω ω bi b corresponds to slew rate of the spacecraft body frame with respect to the inertial frame. For the rest of the paper, q q q bi and ω ω ω bi b are shown as q q q and ω ω ω respectively for simplicity. The states x x x defined for ACS are quaternion and angular velocities.
x x x at the beginning of the transient phase are the initial slew and initial slew rate for the transient phase denoted as x 0 x 0 x 0 , and x x x at the end of the transient phase is denoted as x f x f x f . A time variant modeling error δJ J J is considered in the nominal inertial momentumJ J J to model the changes in the true value of inertial momentum J J J as the following The total applied torque to the spacecraft due to control input, noise, and disturbances is denoted as τ τ τ . The equations of motion for the attitude dynamics of spacecraft arė where τ τ τ is τ τ τ = τ τ τ in + τ τ τ g + w w wω ω ω (14) τ τ τ in corresponds to the control input, w w wω ω ω corresponds to external disturbances from the space environment modeled as Gaussian white noise, and τ τ τ g corresponds to the time variant gravity gradient torque. τ τ τ in derives the quaternion q q q to the desired quaternion q q q f and ω ω ω to zero, which is equal to reaching x x x e .

E. EXTERNAL DISTURBANCES AND GRAVITY GRADIENT TORQUE
Gravity gradient torque τ τ τ g is derived from point mass gravity models [1], [78] as the following τ τ τ g = 3µ |r r r| 3 n n n × (J J Jn n n) where n n n is the nadir vector in the body frame F b , and |r r r| is the radial distance of spacecraft from earth. w w wω ω ω corresponds to the random torques, J2 gravity model, and torques to account for drag, solar pressure, higher-ordergravity terms, etc. w w wω ω ω is modeled as zero-mean Gaussian white noise process where the power of the noise is captured in the variance as σ 2 ω ω ω [79].
δ(t − t ) is the Dirac delta function defined as σ 2 ω ω ω are chosen based on the orbit and the size of spacecraft. If the spacecraft are bigger or closer to earth, σ 2 ω ω ω is chosen larger due to higher drag forces and other torques. Choosing the external disturbances on the spacecraft is shown by Lear [3], [79], where the estimated disturbance corresponds to the expected downrange and attitude error after one orbit.

F. ACTUATOR MODEL
In VTXO, reaction wheels are used for ACS and thrusters are used for relative position control as the primary actuators in GNC for both IS and OS. The torque generated by the reaction wheels τ τ τ in uses the commanded torqueτ τ τ in given by the control law. In τ τ τ in , Gaussian white noise w w w τ , bias b b b τ , scale factor f f f τ , and misalignment τ are included as the following The variance σ 2 w w w τ captures the power of the noise in the random noise w w w τ as

G. CONTROL LAW
A nonlinear global asymptotic stability Lyapunov controller [1], [80] is defined as the control law. Lyapunov stability theorem is given in the following Definition:Global Asymptotic Stability A point is considered to be an equilibrium point x x x e for the system ifẋ x x(t) = 0 for all t. x x x e for the system (15) is global asymptotic stable if the positive scalar function V (x x x) satisfies the following conditions is a Lyapunov function and the system is stable. LaSalle's theorem can prove the asymptotic stability.
The control lawτ τ τ in is proved to be global asymptotic stable using Lyapunov stability theory when the following simplification assumptions hold Which corresponds to the system (13) with no noise, disturbance, and uncertainty considered in the system. In the simulation, noise, disturbances, and uncertainty are considered and the stability for the closed loop system is shown using Monte Carlo simulation.
substituting (22) into (13) and using the simplifications (30) and (31), the closed loop system is obtained by (27)  The following function is chosen as the candidate Lyapunov function At the x x x e , V (x x x e ) = 0, which satisfies the first condition.
Since k 1 and k 2 are positive scalars, the function (33) SinceV (x x x) <= 0, V (x x x) is a Lyapunov function and the system is stable. Asymptotic stability is be proven using LaSalle's theorem.V (x x x) = 0 at ∂q 4 = 0 and ω ω ω = 0. At ∂q 4 = 0, V (x x x) is maximum with a 180 deg rotation (θ = π) (2). ∂q 4 = 0, is not an equilibrium point and ∂q 4 = 0 corresponds to ∂q ∂q ∂q 1:3 = 1 1 1 because of quaternion unity constraint (1). x x x e is an invariant set since it is the equilibrium point, and using LaSalle's theorem [1], the limit for ω ω ω is obtained as Using (36) in the closed loop dynamic equation (32), the following asymptotic condition for x x x e as an equilibrium point is obtained as lim t→∞ ∂q ∂q ∂q 1:3 = 0 As a result, the control law (22) is globally asymptotic stable without the consideration of external disturbances and uncertainties. In the stability section, Monte-Carlo simulation is used to show the stability of system. x 0 x 0 , w , and x f x f x f are given to the SA optimization with the optimization variable z z z, and SA minimizes e and E given the objective function χ . The data produced in this phase is used for training the ML, and the trained ML can estimate the optimalŷ ŷ y .

IV. ARTIFICIAL INTELLIGENCE FRAMEWORK
The goal of AI is to provide ACS with optimal adaptive values for k 1 , k 2 , and T based on x 0 x 0 x 0 , x f x f x f , and w to satisfy the mission requirements and maintain E, e, and T minimal. k 1 , k 2 , and T are optimization variables denoted as z z z. The optimal estimated value for z z z is denoted asẑ z z. Fig. 4 illustrates the stages involved in the AI.
The AI has 3 phases: • Optimization and data production phase • ML phase • Implementation phase In the optimization and data production phase, 7893 data are produced. For each data, the objective function χ, which is the weighted sum of E and e given by (38), is optimized with the optimization variables z z z for each set of x 0 x 0 x 0 , w, and x f x f x f . In the ML phase, the inputs to ML are w and x 0 x 0 x 0 , and the outputs of the ML are the E, e, and z z z defined as y y y = [E, e, z z z].
The optimization and data production phase and ML phase are implemented offline, whereas the implantation phase is implemented in the real-time in space using FPGAs. In the implantation phase, the values of x 0 x 0 x 0 obtained from attitude estimation algorithms [1], [4], denoted byx x x 0 , and the value of w gives the estimated optimal optimization variable z z z denoted byẑ ẑ z. The ML estimates E and e, denoted byÊ and e respectively. The estimated y y y used in the implementation phase is denoted byŷ ŷ y. It is shown later that DNN is chosen to estimateŷ ŷ y. The DNN is incoprated into FPGA, and the input to the FPGA isx x x 0 , w and the output of FPGA isŷ ŷ y.

A. PROBLEM FORMULATION AND OPTIMIZATION
The objective function χ for the optimization is (38) χ = E + we (38) where w weights e over E. e is obtained in the last D = 4 seconds (39) to minimize the steady state tracking error in the transient phase. The value for D is chosen to be low to reduce the computation cost and represent the steady state VOLUME 10, 2022 error. Variable e is defined as (39) where e e e(t) is the absolute difference between the desired attitude and the actual attitude. T end is the end of the transient phase when the science observation phase begins. Variable E is defined as

e e(t) e e e(t) dt
T 0 represents the time the transient phase begins. It is shown later that it takes maximum 129.26 s for the ML to perform the transient phase. After the transient phase, which takes place for the estimated T by the ML, the science observation begins. SA defines the constrained variable T , and T end is defined as The constraints in SA are the constraints related to the dynamics of the system, the unity norm of the q 0 q 0 q 0 , and the range of the variables. The optimization structure for SA is formulated as The upper and lower values for the optimization variables z z z and x 0 x 0 x 0 are x The lower values for k 1 and k 2 are greater than zero since in the Lyapunov function they are positive scalar for global asymptotic stabilty. N and U represent a Gaussian and a uniform distribution, respectively. The distribution parameters c c c l and c c c u represent the upper and lower values for the uniform distribution, and µ µ µ, σ σ σ are the mean and standard deviation given in In [4], it is shown for a 22.5-hour formation stabilization phase, ω 3 reaches 0.59 rad/sec. As a result, 0.6 is chosen as the 1-sigma of ω ω ω with zero mean distribution (46) for the data production in the transient phase. In the formation stabilization phase, q i , (i = 1, 2, 3, 4) varies between -1 and 1. As a result, the transient phase initial quaternions q q q 0 distribution is a uniform distribution between -1 and 1 (44).
w is considered to be more centered over 1 and 5 with 700 data points equal to 1, and the rest of data is produced by 50% probability with Gaussian distribution and 50% probability with uniform distribution. z z z upper and lower values depend on the ACS, and they are chosen to provide the ACS with with a minimal e and E for a less number of iterations in SA optimization. SA with the optimization variables z z z is used to produce the data shown in Algorithm 1.

Algorithm 1: Simulated Annealing Optimization for the Data Production
Result: Data matrix: x f x f and w with the distribution in (42); 2)Initialize z z z; 3)SA minimises χ and obtain y y y given the ACS equations and the constraints ; x f x f , y y y] to the data matrix; end SA (Algorithm 1) is used to produce optimal k 1 , k 2 , and T for each data. 7983 data are produced when the desired trajectory is the Crab Pulsar in Table 1, and the rest of the paper is based on that. 1000 data are produced for all the other desired trajectories, and they show similar results.

B. ML PHASE
DNN, GP, and support vector regression (SVR) ML algorithms are trained on 7893 data produced in the data production phase. Both mean absolute percentage error (MAPE) and mean absolute error (MSE) are used for showing the estimation accuracy of DNN, GP, and SVR. k-fold cross validation with k = 4 is used to measure MAPE and MSE. As a result, 25% of the data is used as the test data.

1) DEEP NEURAL NETWORK
DNN is composed of connected neurons in multiple layers. The inputs of the DNN belong to R 8 , and they are q q q 0 , ω ω ω 0 , and w. The outputs of the DNN belong to R 5 , and they are E, e, T , k 1 , and k 2 . The layers can vary between 1 to 4 layers, and the neurons can vary between 10 to 2560 with a uniform distribution and with logarithmic spacing as the following where n i shows number of neurons at the spacing i which varies between 0 and 8. The activation function for the hidden layers are chosen between rectified linear units (ReLU) and sigmoid. This choice is also made through cross validation. ReLU activation functions outputs the positive part of its argument as And the nonlinear sigmoid function sig(x) is given by where x is the input to a neuron. Since all the outputs are positive, the ReLU is chosen as the activation function for the output neurons. The hyperparameter of the DNN's structure are number of layers, number of neurons at each layer, and activation function. The parameters weight and bias of the DNN are optimized through the Adaptive Moment Estimation (Adam) [81] and Nesterov-accelerated Adaptive Moment Estimation (Nadam) [82] optimizer algorithms.
The optimization criterion consists of minimizing the MAPE through the Adam and Nadam algorithms. Adam optimization is a stochastic gradient descent based method that uses an adaptive estimation of first-order moment (the mean) and second-order moments (the uncentered variance). The parameters of ADAM are learning rate µ = 0.001, = 10 −7 for numerical stability, and β 1 = 0.9 and β 2 = 0.999 are the exponential decay rate for the 1st and 2nd moment estimates respectively.
The parameters of Nadam are learning rate µ = 0.001, = 10 −7 for numerical stability, β 1 = 0.9 and β 2 = 0.999. While training the DNN, 20% of the training data is used as a validation data set for the early stopping.
The parameter of the algorithm to be defined is patience, which is the number of epochs with no improvement in MAPE after which training will be stopped. Number of batches, number of epochs, l1 regularization, kernel constraint should be cross validated. For kernel constraint, MaxNorm(x) function is used which limits the maximum norm of the weight vector for the layer with x. As a result, the DNN's weights are bounded.
The initial values for the parameters weight and bias of DNN are chosen randomly using a Gaussian distribution of zero mean and variance 0.02. DNN multi-output and single-output structure are trained on the data. Single-output DNN provides the minimum MAPE compared to DNN multi-output. The hyperparameter optimization problem obtains the minimum MAPE during the validation phase. The hyperparameters of the DNN's structure and algorithm parameters are: number of layers, number of neurons at each layer, epochs, batches, weight initialization, l1 regularization, kernel constraint, activation function for the hidden layers, and patience. Adding other regularization methods including batch normalization and dropout doesn't reduce MAPE.
GS is used to find the optimal number of layers, number of neurons at each layer, epochs, batches, patience, and weight initialization. RS is used to find the optimal dropout value, l1 regularization, kernel constraint, number of layers, number of neurons and activation method. The coarse-to-fine optimization approach is to solve the hyperparameter optimization problem first with GS. Next, the optimal solution from GS is used to solve the hyperparameter optimization problem with RS on the 20 cores, and finally the RS's solution is validated on ALCF.
The permutation of the DNN's structure and algorithm parameters values are given in the list P as the search-space for RS with the index of the list i as the optimization variable. As a result, RS finds the optimal configuration P opt , which solves the hyperparameter optimization problem.
RS selects i at random with a uniform distribution. The termination criterion is when the number of iterates k reaches the value of the variable T Iter . At each iteration performed in the RS, DNN is trained and the value of MAPE is obtained.
When the difference in the reduction between the current MAPE and the previous MAPE is more than a threshold T Drop , T Iter increases with a constant and a variable denoted as k Iter and span respectively.
The value for k Iter , the initial value for T Iter , the initial value for MAPE, the initial value for span, and T Drop are defined so that convergence with a low number of iterations is obtained.
The advantage of this termination criterion is increasing the maximum number of search iterations as long as MAPE is decreasing more than T Drop . As a result, for each output in y y y, when MAPE converges to the optimal MAPE, MAPE doesn't decrease more than T Drop and the termination criterion is met. Algorithm 2 shows the RS.

2) HIGH PERFORMANCE COMPUTING FOR DEEP NEURAL NETWORK
Using HPC, RS and k-fold cross validation are both parallelized on THETA cores for DNN. The search-space P, which is the permutation of the DNN's structure and algorithm parameters values, is split into b groups equally. Each combination of these parameters values is a configuration. N ranks are assigned to solve hyperparameters optimization problem. N is divided into b groups, where each group has k-fold ranks. VOLUME 10, 2022 Algorithm 2: Randomized Search 1) Initialize the optimization parameters and iteration index k = 0; 2) Randomly choose the initial value i 0 and define MAPE(P i 0 ); 3) Generate the integer V k+1 ∈ [1, |P|] randomly with a uniform distribution ; End; 6) If k reaches T Iter , stop. Otherwise, increment k and return to Step 3.
For each group, the cross validated MAPE is obtained when all the ranks assigned to that configuration return MAPE. RS solves hyperparameters optimization problem for each group, and the minimum MAPE is chosen. As a result, the time of computation is decreased when b increases with the cost of more computation resources. When b reaches the size ofP, RS turns into a GS on ALCF.

3) GAUSSIAN PROCESS
The kernel function is chosen among the linear dot product (LDP), the exponential sine square (ESS), the Matérn, the radial basis function (RBF), and the rational quadratic (RQ) [83]. The GP covariance function includes a linear combination of a kernel function, a constant term and a white noise term. The kernel function is chosen so that minimum MAPE is obtained.

4) SUPPORT VECTOR REGRESSION
The hyperparameter to be optimized for the SVR are kernel, gamma, epsilon, and C. Each hyperparameter is defined in the following • Kernel function: The function to map the data into a higher dimension data.
• ε: The margin where no penalty is applied to the error in the loss function.
• C: Regularization coefficient. Lower C corresponds to higher regularization. The polynomial kernel with degrees more than 3 and using the polynomial kernel for estimating k 1 take more than 3 days to train for the GP. Thus the polynomial kernel is removed from the kernel search space.

V. MONTE CARLO SIMULATION EXPERIMENTS
The goal of the Monte Carlo simulation experiments for the VTXO transient phase are three folds as the following   • The Monte Carlo simulation is used to analyse the performance and stability of the closed loop system.
• Monte Carlo simulation produces the data required for the ML.
• Monte Carlo simulation is used to design controller gains and T using the mode of data set. For the simulation, the Keplerian elements are eccentricity γ , semi-major axis a, inclination i, right ascension of the ascending node , and argument of periapsis ω. The Keplerian elements are given in Table 2.
The IS and OS are 6U CubeSat with inertial mass 10.2kg and nominal inertial momentum matrixJ J J as The time variant modeling error δJ J J in J J J (11) for both OS and IS is modeled as The variance of external disturbances and reaction wheels modeling parameters for both OS and IS are given in Table 3.
The initial values in RS Algorithm 2 are given in Table 4.

A. ACS SOLVER
The solver plays a critical role in producing the data since it defines the accuracy of the data and the time it takes to produce the 7983 data. Compared to the other solvers such as ODE 45 or ODE 23s in MATLAB software, fourth-order Runge-Kutta method (RK4) with a fixed time step provides the ACS equations with more accurate solutions in less time. The time step in the RK4 solver is tuned so that the ACS solution is obtained in a short time while maintaining the quaternion unity norm constraint(1) (q q q T q q q = 1) for each k 1 and k 2 . The bigger the time step is, the less time it takes to solve the ACS ODE equations while making the solution unstable. As an example, the solution to the ACS equations are not stable for the controller gains with k 1 = 2, k 2 = 6 with the time step dt = 0.05. Reducing the dt = 0.05 to dt = 0.01 or setting K 1 = 1 makes the solution stable. dt is chosen so that 97% of data is stable in a reasonable time. Decreasing the value dt leads to all the data have stable solutions with the cost of high computation. The 300 iterations in SA solve the problem of 3% of unstable solutions. The reason is that when the ACS solution is not stable in a specific iteration of SA, that iteration produces large values for χ, which is not chosen as a minimum solution for the optimization.
The time to produce data is denoted as T Q . T Q of the data production phase is dependant on the 300 iterations of SA and dt of ACS solver. T Q of the data production phase is 0.51 h on average.
The goal in the data production is to maintain high accuracy in the data while reducing the computational expense. 300 iterations, which lead to the average and the variance of the data converge to its minimum, is chosen as a stopping iteration. 0.8% of the data with the highest values for χ are removed from the dataset since they corresponds to the data that are not optimized well and removing them increases the accuracy of ML significantly. ML estimates the removed data.

B. DATA ANALYSIS
The Pareto front obtained from multi objective genetic algorithm for minimizing E and e is given in Fig. 5 for a random x x x 0 . Fig. 5 illustrates the inverse relationship between e and E since spacecraft requires higher energy to reach lower error in a specific time.     6 shows the inverse relationship between e and E for all the data. Fig. 6 inherits similarity to to Fig. 5 since both optimization lead to the minimum inversely related e and E. Fig. 7 shows that the data are positive and bounded. The mode of the data set is between the maximum and minimum close to average except for T , which has its mode equal to its maximum. This data distribution shows that the T is constrained and not minimized as its mode is close to its maximum. Table 5 shows the statistics of the data. Max, min, and abs corresponds to maximum, minimum, and absolute value respectively. P k is the percentile of a variable such that k represents the percentage in the percentile. In P k (x) = Tr, k% of cases from the distribution of variable x are less than or equal to the value Tr. Table 5 shows that the maximum of e and E are not close to their P 99 , and it is caused by ACS solver, noise in the dynamical system, SA optimization Algorithm 1, and the characteristic of the data distribution.
When e converges to zero, ω f converges to zero with the distribution shown in Fig. 8. Table 5 shows that the upper P 99 and lower values P 1 for the ω f are less than 1 deg/sec close to zero. P 99 (e) of the data shows the requirement of the mission is met by the optimization data, where P 99 (e) = 0.067 deg < 0.09 deg. P 99 (E) = 0.37 J, which is the energy consumption for one orbit, shows the energy consumption is maintained low for the mission. With the current technology, this energy can be provided for numerous number of orbits. Similar to MPC, the optimization can be actively applied to the mission with the advantage of meeting the mission requirements. The disadvantage is the that it takes on average 30 mins to do the optimization.

VI. DISCUSSION
In this section the results of the given methods are shown and the bench marking section compares the results.
A. DEEP NEURAL NETWORK Table 6 and Table 7 show the global optimal solution to the hyperparameters optimization problem for GS and RS, respectively, when the cross validation is running in series in RS and GS. Table 6 and Table 7 show MAPE and MSE for the testing dataset. As Table 6 and Table 7 show, GS can not satisfy the accuracy requirement R2 B since for the GS P 99 (e) = 0.0932 deg > 0.0900 deg, while RS satisfy the accuracy requirements. Comparing Table 6 and Table 7 shows  the effectiveness of the coarse-to-fine hyperparameter optimization for the VTXO mission.
T Iter in Table 7 shows that the controller parameters converge to the optimum solution the fastest, and T converges to the optimum solution the latest due to the distribution of data.

B. HIGH PERFORMANCE COMPUTING FOR DEEP NEURAL NETWORK
Using HPC in ALCF for solving hyperparameters optimization problem for E with different values for b, MAPE = 15.007 is obtained as the minimum, which is close to MAPE = 15.07 given in Table 7. As a result, Table 7 is verified to provide the mission with relatively the lowest MAPE for DNN with the given search-space P. When b reaches the size of P, T Q reduces 25 times to solve the hyperparameters optimization problem for E with the cost of more computation resources. Fig. 9 illustrates the optimal MAPE with respect to each one of the kernel functions, and it shows the MAPE for each one of the estimators in y y y.

C. GAUSSIAN PROCESS RESULTS
The optimal MAPE and MSE for each kernel is given in Table 8.      Table 8 show that RQ and Matérn provides the least MAPE for the estimation of the outputs in y y y with a similar gamma distribution, which are e, E, k 1 , k 2 . T is estimated by LDP kernel. RBF kernel shows the least spread of MAPE for y y y among using other kernels. Table 9 shows the optimal SVR hyperparameters, minimum MAPE, and the optimization parameters for each output in y y y. Table 9 shows SVR provides the least MAPE using RBF kernel. Table 9 shows that the optimal value of the regularization parameter C in SVR is small 0 < C ≤ 1 for the estimation of the outputs in y y y with a similar gamma distribution, which are e, E, k 1 , k 2 . The optimal value of the regularization parameter C for estimating T is obtained as 1000.

D. SUPPORT VECTOR REGRESSION RESULTS
Similar to GP, the accuracy requirements are closely reached for e but not satisfied using SVR.

E. MACHINE LEARNING RESULTS COMPARISON
As Fig. 10 and Table 10 compares MAPE for different ML methods, DNN provides the lowest MAPE and GP relatively   the highest MAPE for y y y estimation. As a result, DNN is chosen to find the optimal quantity of data.

F. OPTIMAL DATA QUANTITY
For each portion of 7893 pieces of data, the optimal DNN models given in Table 7 are used to obtain MAPE. Fig. 11 shows the convergence of MAPE with respect to each portion of data, which shows that 7893 pieces of data are enough for training the ML. The MAPE plateaus when it reaches 0.8 of the whole data. VOLUME 10, 2022 Table 11 compares P 99 of the y y y's values for DNN, SVR, and GP. Table 11 shows that only DNN can provide e < 0.09, and so the accuracy requirement R2 B is not satisfied with SVR and GP. As a result, DNN is chosen for the VTXO mission. P 99 of y y y for DNN, GP, and SVR are close to each other. It shows that the 3 ML algorithms provide similar results for y y y estimation since the amount of data is not high.
G. BENCH MARKING AND ACCURACY REQUIREMENT CHECKLIST 5 approaches to the mission and designing the control system, denoted as A i , are compared in Table 12.
• A 1 : There is no optimization and ML implemented. k 1 , k 2 , and T are chosen randomly with a uniform distribution. The upper and lower value for k 1 , k 2 , and T are the same range given in (43). Monte-Carlo simulation is used to produce 10000 data for defining P 99 (e), P 99 (E), and T Q .
Monte-Carlo simulation is used to produce 10000 data for defining P 99 (e), P 99 (E), and T Q .
• A 3 : In this approach, SA optimization is applied directly to the mission to obtain the optimalẑ z z.
• A 4 : Based on the histogram of the data given in Table 5, the optimal value for Monte-Carlo based controller and optimal T are estimated to be the mode of the data set in the histogram (53) to keep e and E minimal. P 99 (e) = 0.15 deg satisfies R1 and R2 A , but doesn't satisfy R2 B . As a result, the camera is not stabilized enough for starting the observation phase. Fig. 12 illustrates the histograms of e and E when k 1 = 0.085, k 2 = 0.25, T = 72s. P 99 (E) = 1.95 J is 5 times higher than doing the optimization on board for each spacecraft. Fig. 12 shows that the average of gamma distribution e is higher than the accuracy requirement.
• A 5 : In this approach, the data from the Monte-Carlo simulation using SA optimization is used to train the ML, and ML does the estimation ofẑ z z. For each A i , Table 12 Table 12, no represents the accuracy requirement is not satisfied, and yes represents the accuracy requirement is satisfied. Table 12 shows that the mission approaches A 3 and A 5 satisfy the accuracy requirement. e, E, T , and T Q in Table 12 are more discussed in the following 1) e ANALYSIS A 1:2 show that choosing k 1 and k 2 and T with a uniform distribution or constant as T = 72 s with a uniform distribution leads to P 99 (e) that don't satisfy the accuracy requirement. However, P 99 (e) of A 2 compared to A 1 is marginally lower since T is marginally higher for A 2 with the cost of higher  P 99 (E) for A 2 . Increasing T reduces e and increases E since more E is used in a longer T to reduce e. Using DNN, the summation of each spacecraft's P 99 (e) corresponds to 0.1734 deg, which satisfies the requirements of the mission. As a result, ML meets the requirement of the VTXO mission. A 5 provides the mission with the lowest e.

2) E ANALYSIS FOR THE VTXO MISSION
A 1:2 shows that choosing k 1 and k 2 and T with a uniform distribution or constant as T = 72 s leads to a higher P 99 (E) compared to A 3:5 . P 99 (E) of A 2 is marginally higher than A 1 since P 99 (T ) is marginally higher for A 2 . A 5 provides the mission with the lowest E. In the scientific phase, sliding mode controller (SMC) provides the mission with E = 0.0066 J/hr while maintaining the desired e [4], [16], [17]. In the formation stabilization phase, E is negligible [4], [15], [16] compared to the other two phases as it is in the order of e − 5 J. Since For each orbit a 10-hour formation is maintained in the scientific phase, E is roughly 0.066 J for the science observation phase. L i is P 99 (E) for each orbit defined as L i = P 99 (E)ofA i + 0.066, (i = 1, 2, 3, 4, 5) (54) Fig. 13 compares the P 99 (E) of each orbit for each A i . This comparison show that for each orbit, the ML and optimization approaches use similar E, and the ML approach use relatively 90% less E.
To show the ratio of L i to the maximum L i of all A i s, Table 13 compares L i /L 2 for each A i . Table 13 shows that the ML approach L 5 uses relatively P 99 (E) 90% less than A 2 for the orbit. This shows using real-time optimization A 3 only provides roughly 2% less L compared to using ML A 5 . Using Monte-Carlo simulation approach A 4 , L 4 is relatively 64% of L 2 . A 1 and A 2 shows similar values for L. As a result, choosing ML approach A 5 increases the lifetime of ACS for 90% compared to using A 2 .

3) T ANALYSIS
ML meets the requirement of the VTXO mission transient phase with P 99 (E) = 0.47 J in a constrained T , where P 99 (T ) = 129.26 s. Since P 99 (T ) = 129.26 s, the transient phase is set to begin 130 s before the observation phase. ML approach satisfies the requirement of a few minutes for T in the transient phase.

4) T Q ANALYSIS AND THE TIME OF OBTAINING z
It takes P 99 (T Q ) = 22.02s for the A 1 and A 2 to provide the mission with the estimation of e and E. However, it takes only a fraction of second to defineẑ z z. A 3 provides the mission with P 99 (T Q ) = 1.04h using real-time optimization, which makes it impractical for the mission to defineẑ z z. For A 4 , it takes only a fraction of second to estimate the optimalŷ y y using the Monte-Carlo method. For the implementation of ML approach A 5 , FPGA is used which has the latency in the order of microseconds [70], which is acceptable since T is in the range of seconds.
Comparing A 5 and A 4 , ML approach provides the mission with lower P 99 (e) and P 99 (E) which meets the mission requirements. However, A 5 requires offline computation for training ML and requires implementing ML on the hardware of spacecraft. A 5 provides the mission with the lowest e, E, and T with P 99 (T Q ) = 1.04h which makes it impractical for the mission. As a result, A 5 is chosen for the mission which satisfies the mission requirements.

H. TRANSIENT PHASE ACS STABILITY ANALYSIS
In the ML approach, Using ReLU as the activation function for the output layer forces the controller gains k 1 and k 2 to be positive scalars (48). Since ML defines the positive Lyapunov controller gains at the beginning of the transient phase, the controller is globally asymptotic stable without considering the noise, disturbances, and model uncertainties. However, the closed loop system is shown to be stable using Monte-Carlo simulation for all the approaches A 1:4 since P 99 (e) and P 99 (E) are bounded. Using A 3 , A 4 , and A 5 approaches, the closed loop system is stable and P 99 (e) < 0.2 deg and P 99 (E) < 2 J. Having the kernel constraint enforces the weights to be bounded so the DNN outputs are bounded when the inputs are bounded. Besides, the ML gives P 99 (ê) < 0.09 deg and P 99 (Ê) < 0.5 J using Monte Carlo simulation for all the data, which shows the stability of closed loop system using DNN.

VII. CONCLUSION
VTXO mission is divided into different phases for the relative transverse alignment position control and ACS. Instrumentation requirements of VTXO are satisfied with a ML-based ACS. SA optimization provides the mission with a minimal VOLUME 10, 2022 E and e in a minimal constrained T to increase the lifetime of the mission and the duration of observing the space objects. Monte-Carlo simulation is used to design a controller and do sensitivity analysis. Since it takes on average 0.51 h to run the optimization, ML is used to learn the optimum behaviour of the optimization as a surrogate model. The hyperparameters of ML are optimized, and DNN shows the highest accuracy for the estimation. However, there is a confidence interval in the measured performance of GP and SVR that must be estimated in order to be more precise regarding the validity of the systems. Noise and disturbances in the control system is used as regularization in the ML. To reduce e and chattering, ESO can be used in the implementation phase to estimate and compensate for the disturbances and model uncertainties. For a smooth transition between the phases of VTXO and to ensure reliability and stability, an intelligent hybrid dynamic control system will be implemented on the VTXO mission in the future. In the future design of VTXO with the same methodology, the mass and inertial momentum will be upgraded to the design of the mission and the propellant each spacecraft will be carrying. to mentor students. She leads many research projects and recently served as the Deputy Lead Engineer for the Integration and Test of an Innovative Naval Prototype through a Boeing Contract. Formerly, she was the Department of Defense Project Office Director and a Physicist with the Argonne National Laboratory. She was an Associate Director of the Argonne Accelerator Institute. She served as a Technical and Management Consultant on the successful FERMI free-electron laser project at Sincrotrone Trieste. Her research interests include particle accelerator systems, laser systems, the use of artificial intelligence in controls, modeling, and prediction of complex systems, sensors and detectors, and applications of these technologies in science, security, and defense. She recently served as a Co-Lead for the Department of Energy's report-Basic Research Needs for Compact Accelerators for Security and Medicine for the Computing, Controls and Design Technical Group. She is a fellow of the American Physical Society (APS), a Senior Member of the Optical Society of America (OSA), a fellow of the SPIE, and a member of the Italian Optical Society (SIOF). She serves as a reviewer for several journals and served on the Editorial Board for the American Physical Society's Physical Review Accelerators and Beams and IEEE ACCESS. She recently served as a Senior Guest Editor for IEEE TRANSACTIONS ON NUCLEAR SCIENCE and as an Associate Editor of IEEE PHOTONICS. She serves as a technical reviewer on projects worldwide, including as the Chair of the Program Advisory Committee (APAC) of the Brookhaven National Laboratory's Accelerator Test Facility (ATF). She has myriad archival and conference papers and technical documents, holds a U.S. patent, has an international patent pending, and holds an international trademark. Further, she served on a NATO panel for sensors and electronics. In 2010, she was presented a Letter of Commendation by the Chief of Naval Research for her technical efforts. In 2018, she received the IEEE Nuclear and Plasma Sciences Society's Particle Accelerator Science and Technology Award.
JORGE ALBERTO DÍAZ CRUZ (Member, IEEE) received the B.S. degree in electrical engineering from the National University of Colombia, Bogota, Colombia, in 2013, and the M.S. degree in electrical engineering from Colorado State University, Fort Collins, CO, USA, in 2019. He is currently pursuing the Ph.D. degree with the Electrical and Computing Engineering Department, The University of New Mexico.
In 2017, he joined the LCLS-II Low-Level RF Team as a Student with the Stanford Linear Accelerator Center, SLAC, Menlo Park, CA, USA. His research interests include control for RF systems for particle accelerators using artificial intelligence and machine learning. He has attended numerous advanced classes of the United States Particle Accelerator School both as a student and as a grader. He is also a user of the Theta supercomputer with the Argonne National Laboratory.