Heterogeneous Traffic Offloading in Space-Air-Ground Integrated Networks

While the fifth generation of communication networks (5G) is taking its first deployment steps, serious doubts remain about whether the stringent requirements of its slices can be fulfilled. To meet these complex requirements, robust access networks should support the current 5G air interface. In this regard, space and air networks, namely satellites and unmanned aerial vehicles (UAVs), are expected to play a key role owing to their wide coverage and flexible deployment, respectively. Currently, however, the integration of these platforms into terrestrial networks is weak. Therefore, we suggest bridging this gap by designing a heterogeneous traffic offloading approach in the space-air-ground integrated network (SAGIN). Our innovative offloading approach covers the co-existing requirements of two heterogeneous slices of 5G by smartly offloading the traffic to the appropriate segment of SAGIN. Specifically, the ultra-reliable low-latency communications (URLLC) traffic is offloaded to the UAV link and to the terrestrial link to satisfy its stringent latency requirements. In contrast, the enhanced mobile broadband (eMBB) traffic is offloaded to the UAV link, to the terrestrial link and to the satellite link because it is less sensitive to delay but needs high data rates. Our offloading approach boosts the network's availability and reduces the latency experienced in SAGIN through an efficient resource allocation and an optimized design of the UAVs' trajectories. Our findings highlight the key role that concrete integration between SAGIN segments plays in achieving a better quality of service (QoS) for different slices with heterogeneous requirements.


I. INTRODUCTION
THE fifth generation of communication networks (5G) and beyond is envisioned to address various slices with heterogeneous requirements [1]. Among these slices, ultra-reliable low-latency communication (URLLC) is receiving tremendous attention in academia and industry because it enables new applications and emerging services such as autonomous factories, inter-vehicular communications and e-health. URLLC addresses intermittent transmissions that impose stringent requirements in terms of reliability (the acceptable dropping rate is less than $10^{-5}$) and latency (1 ms) [2], [3]. These requirements make URLLC a challenging scenario to implement, especially since 5G should simultaneously answer the different needs imposed by the other slices. For instance, the enhanced mobile broadband (eMBB) slice of 5G, which addresses stable connections with large payloads over extended time intervals, demands high data rates but is less sensitive to reliability [1].
To face these challenges, various technologies such as Terahertz communications, optical wireless communications, free-space optical communication, and dynamic network slicing should support the 5G networks [4]. Among these technologies, the space-air-ground integrated network (SAGIN) is expected to stand out as a key player [4], [5]. SAGIN consists of three network segments: the space segment, which includes the satellites; the air segment, which includes the unmanned aerial vehicles (UAVs); and the ground segment, which includes the terrestrial communication networks [5]-[7]. These segments can cooperate and complement each other through their respective advantages [6], [7]. For instance, satellites offer wide coverage and wide bandwidth, whereas UAVs offer scalable deployment, mobility, low latency and reliability [8], [9]. Hence, space and air segments can support the terrestrial network to increase the users' connectivity and thus improve the service continuity, especially in under-served and rural areas. Moreover, space and air segments can support the terrestrial network to increase the backhaul capacity and thus improve the network's availability, especially in dense areas where a large volume of data is exchanged constantly.
Therefore, we observe that a concrete and self-adaptive integration of space and air segments into the terrestrial network could help fulfill the complex and heterogeneous requirements of the 5G slices while considerably reducing the service costs [5], [7]. The main integration approach in the literature is traffic offloading in SAGIN [9]-[14]. For instance, in [9], a sensor offloading framework was developed to design the UAVs' trajectories while maximizing the data collected from different sensors and considering the limited on-board energy of UAVs. In [10], an offloading scheme, which schedules the offloaded tasks of the edge users between the ground base stations and the UAVs, was presented. The authors investigated the role of UAVs in improving the average throughput and the spectral efficiency during task offloading. In [11], the authors optimized the locations of the ground base stations and the UAVs in mobile edge computing (MEC). An online offloading algorithm was developed to minimize the energy consumed by the user equipment (UE) through appropriate user association and resource allocation. In [12], service provisioning to the Internet of things (IoT) devices was facilitated by using a collaborative offloading scheme between UAVs and edge servers in MEC. The offloading scheme reduces the service delay experienced by IoT devices and the energy consumed by UAVs. In [13], a traffic offloading scheme in integrated satellite-terrestrial networks (ISTN) was proposed to maximize the number of accommodated users and their sum-rate under a dynamic backhaul capacity constraint. The proposed scheme schedules the resources available in both networks to optimize users' association and to increase the dynamic backhaul capacity. In [14], a SAGIN architecture was studied where the air network nodes served as flying edge servers in the air segment, whereas the low earth orbit (LEO) satellites in the space segment connected the IoT devices with the cloud servers.
The proposed architecture helps the remote IoT applications to decide on the offloading location, namely locally, at the air segment, or at the space segment. Although considerable effort has been invested in traffic offloading in SAGIN, a few shortcomings can be highlighted. Indeed, most of the current research studies air platforms [9]-[12], space platforms [13] and terrestrial platforms [15] separately. Moreover, energy efficiency is the main focus in the literature [11], [12], [14], [16] and the UAV trajectory is usually predefined in SAGIN [14]. However, the high traffic assignment and the dynamic network conditions and uncertainties in 5G were widely neglected. Besides, the 5G slices, namely eMBB and URLLC, and their specific and heterogeneous requirements in terms of delay and reliability were largely overlooked during offloading. Indeed, offloading the optimal amount of each traffic type while meeting its needs is challenging. Specifically, insufficient offloading results in a high dropping ratio for eMBB, whereas excessive offloading increases the load of the SAGIN segments and leads to additional delays for URLLC. Hence, it is crucial to design traffic offloading in SAGIN wisely and to assign the necessary communication resources adaptively and efficiently.
To overcome the previously discussed limitations, we propose in this paper a traffic offloading approach in SAGIN, which hinges on concrete cooperation between satellites, UAVs, and terrestrial networks. Our offloading approach smartly steers the traffic towards the suitable network segment through an efficient resource allocation and an optimized design of the UAVs' trajectories. Specifically, the URLLC traffic is offloaded to the UAV segment and to the terrestrial segment to satisfy its stringent latency requirements. In contrast, the eMBB traffic is offloaded to the UAV segment, to the terrestrial segment and to the satellite segment because it is less sensitive to delay but needs rather high data rates. The amount of traffic offloaded to the different segments of SAGIN is intelligently programmed. Additionally, the trajectories of the UAVs and the resource allocation are designed to improve the availability and reduce the latency in SAGIN. To the best of our knowledge, our paper is the first to consider offloading different traffic types and to answer their respective requirements in SAGIN. We propose an action-refined deep reinforcement learning (DRL) approach to adjust the offloading strategy by dynamically observing the information of the system environment. Specifically, we apply the deep deterministic policy gradient (DDPG) algorithm, which embeds the actor-critic method [17]. The main contributions of this paper can be summarized as follows:
• We propose an innovative offloading approach in SAGIN that covers various aspects of the quality of service (QoS) required by heterogeneous traffic types and imposed by the use cases of 5G. Specifically, our offloading approach answers the challenging and co-existing requirements imposed by the eMBB slice and the URLLC slice in terms of high data rates, high reliability and low latency.
• We take advantage of the concrete cooperation between SAGIN segments by smartly offloading the traffic to the appropriate segments of the SAGIN system while optimizing the resource allocation and efficiently designing the trajectories of the UAVs. We formulate our sequential offloading approach as a multi-objective optimization problem and we solve this complex and dynamic problem by applying the action-refined DRL approach.
• Our findings highlight that the proposed offloading approach significantly enhances the network's availability and reduces the experienced delay, and underline the pivotal role that space and air segments play in boosting the QoS in terrestrial networks.
The remainder of this paper is organized as follows. Section II introduces the system model. In Section III, the offloading problem is formulated. In Section IV, we present our DRL algorithm to solve this problem. In Section V, we evaluate our proposed approach and we analyse the obtained results. Section VI concludes the paper.
Notations: Lower case boldface letters denote vectors while upper case boldface letters denote matrices. The transpose of matrix $\mathbf{A}$ is denoted by $\mathbf{A}^T$. $\|\cdot\|$ denotes the Euclidean norm. $\mathbb{E}(\cdot)$ denotes the expectation over the time sequence.

II. SYSTEM MODEL
We consider a space-air-ground integrated network (SAGIN) architecture that consists of a satellite, $N$ UAVs, $W$ macro base stations, and $V$ micro base stations. The network architecture is depicted in Fig. 1. The satellite is characterized by a link capacity per beam $C^{\mathrm{sat}}$ and a visibility period $\Delta t$. During $\Delta t$, the satellite is fixed. Let us decompose $\Delta t$ into $T$ time slots of duration $\delta t$ such that $\Delta t = T \times \delta t$. The UAVs are in motion during $\Delta t$. The location set of the UAVs is denoted as $L[t] = \{\ell_1[t], \cdots, \ell_n[t], \cdots, \ell_N[t]\}$, where $\ell_n[t] = (x_n[t], y_n[t], z_n[t])^T$ represents the coordinates of the n-th UAV. The UAV's altitude $z_n$ is fixed. Each UAV $n$ is characterized by the data rate $R_{v,n}^{\mathrm{UAV}}$ of its link with micro base station $v$. The macro base station $w \in \{1, \cdots, W\}$ serves $V_w$ micro base stations and each micro base station $v \in \{1, \cdots, V_w\}$ serves $I_v$ users. The terrestrial backhaul between the micro base station and the macro base station is characterized by a link capacity $C_v^{\mathrm{ter}}$. The serving region of each micro base station is confined to a small cell. The $I_v$ users can generate two types of traffic, namely eMBB and URLLC, which are sent to the micro base stations $v \in \{1, \cdots, V_w\}$. The generated traffic can be offloaded by the micro base stations $v \in \{1, \cdots, V_w\}$ to the macro base station $w \in \{1, \cdots, W\}$, to UAV $n \in \{1, \cdots, N\}$ and to the satellite based on the allocated link capacities in the different segments of SAGIN. Therefore, we study the traffic model and the allocated data rates in the following sub-sections.

A. TRAFFIC MODEL
In each small cell, a user $U_i$ sends either eMBB traffic or URLLC traffic to its associated micro base station. We define $U^e[t] = \{U^e_1, \cdots, U^e_i, \cdots, U^e_{N_e[t]}\}$ as the set of $N_e[t]$ eMBB users and $U^u[t] = \{U^u_1, \cdots, U^u_i, \cdots, U^u_{N_u[t]}\}$ as the set of $N_u[t]$ URLLC users during time slot $t$. The users are randomly positioned and their traffic is independent in each time slot $t$. The total user set is defined as $U[t] = U^u[t] \cup U^e[t]$, such that $U[t] = \{U_1, \cdots, U_i, \cdots, U_{I_v[t]}\}$ and $I_v[t]$ is the total number of users served by micro base station $v$. In each time slot $t \in \{1, \cdots, T\}$, the eMBB flows outgoing from all eMBB users to micro base station $v$ follow a Poisson process with an arrival rate $\lambda^e_v[t]$ because eMBB traffic is generated by applications that exchange large payloads over an extended time interval [18]. Therefore, the inter-arrival time can be modeled with an exponential distribution whose events occur continuously and independently. In contrast, the URLLC flows outgoing from all URLLC users to micro base station $v$ follow a Pareto distribution with arrival rate $\lambda^u_v[t]$ because URLLC traffic is generated by applications with intermittent transmissions that exchange payloads of varying importance during short periods of time [18]. Therefore, it can be modeled with a power-law distribution, namely a Pareto distribution [18]-[22]. Flow sizes are independently and identically distributed with mean $1/\mu_e$ for the eMBB slice and $1/\mu_u$ for the URLLC slice, such that $1/\mu_e$ and $1/\mu_u$ represent the mean flow sizes of the two traffic types, respectively.
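As an illustration of the traffic model above, the following minimal sketch draws eMBB flow arrivals from a Poisson process (exponential inter-arrival times) and one per-slot URLLC arrival rate from a Pareto distribution. The function names, the exponential law used for the flow sizes and all numeric values are assumptions made for illustration only; the text fixes just the mean flow sizes $1/\mu_e$ and $1/\mu_u$.

```python
import random


def embb_arrival_times(rate, horizon, rng):
    """Arrival instants of a Poisson process of intensity `rate` on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)          # first exponential inter-arrival gap
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)     # next gap, independent of the past
    return times


def urllc_arrival_rate(shape, scale, rng):
    """One Pareto(shape, scale) draw; random.paretovariate has minimum value 1."""
    return scale * rng.paretovariate(shape)


def flow_sizes(mean_size, count, rng):
    """i.i.d. flow sizes with the given mean (the exponential law is an assumption)."""
    return [rng.expovariate(1.0 / mean_size) for _ in range(count)]


rng = random.Random(0)
arrivals = embb_arrival_times(rate=5.0, horizon=10.0, rng=rng)    # lambda_e: hypothetical value
lam_u = urllc_arrival_rate(shape=2.5, scale=1.0, rng=rng)         # a, x_m: hypothetical values
sizes = flow_sizes(mean_size=2.0, count=len(arrivals), rng=rng)   # 1/mu_e: hypothetical value
```

Seeding the generator keeps a simulated time slot reproducible, which is convenient when comparing offloading policies on identical traffic traces.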
The total loads of the eMBB traffic and the URLLC traffic in the v-th micro base station are denoted by $L_v^e[t]$ and $L_v^u[t]$, respectively, and are assessed as [19], [20], [23]: $L_v^u[t] = \ldots$ and $L_v^e[t] = \ldots$, where $b_i^v[t]$ denotes the frequency resources allocated to eMBB user $i$ by micro base station $v$ in time slot $t$, $f_i^v[t]$ denotes the frequency resources allocated by micro base station $v$ to URLLC user $i$ in time slot $t$, $N_0$ is the noise power density, $P_i^v[t]$ is the downlink transmit power of micro base station $v$, $g_i^v[t]$ is the channel gain between micro base station $v$ and user $i$ in time slot $t$, determined based on the Rayleigh channel model, and $\sum_{v' \neq v} P_i^{v'}[t]\, g_i^{v'}[t]$ is the interference caused by the other micro base stations. In our system, we assume that the macro base stations allocate orthogonal channels to the micro base stations. Therefore, the interference between the micro base stations is negligible (i.e. $\sum_{v' \neq v} P_i^{v'}[t]\, g_i^{v'}[t] \approx 0$), and the total loads of the URLLC traffic and the eMBB traffic in the v-th micro base station, $L_v^u[t]$ and $L_v^e[t]$, can be simplified as follows: $L_v^u[t] = \ldots$ and $L_v^e[t] = \ldots$

B. UAV DOWN-LINK DATA RATE
We consider the wireless communication between the hovering UAV $n$ and micro base station $v$. Both line-of-sight (LoS) propagation and non-line-of-sight (N-LoS) propagation are considered in the model of the average path loss. The LoS probability is formulated as [24]: $p^{\mathrm{LoS}}_{v,n}[t] = \ldots$, where $h_n$ and $d_{v,n}$ denote the height of the n-th UAV and the horizontal distance between the n-th UAV and the v-th micro base station, respectively; $b_1$ and $b_2$ are the S-curve parameters determined by the chosen environment, e.g. urban, sub-urban or dense urban. The signal propagated from the UAV first goes through free space and then through the urban environment. Therefore, the signal path loss mainly consists of two parts: the free space path loss (FSPL) $PL^{\mathrm{FSPL}}$ and the excessive path loss $PL^{\mathrm{ur}}$. Based on the models proposed in [24] and [25], the expression of the total path loss is given by: $PL_{v,n}[t] = PL^{\mathrm{FSPL}}[t] + PL^{\mathrm{ur}}[t]$, where $\xi_{\mathrm{LoS}}$ and $\xi_{\mathrm{NLoS}}$ represent the additional path loss corresponding to the LoS and N-LoS transmission, respectively, $c$ is the speed of light and $f_c$ is the carrier frequency. The channel gain of the link between the n-th UAV and the v-th micro base station is given by [24]: $g^{\mathrm{UAV}}_{v,n}[t] = \ldots$ Accordingly, the signal-to-interference-plus-noise ratio (SINR) of the link between the n-th UAV and the v-th micro base station is given by: $\Gamma_{v,n}[t] = \ldots$, where $P_{v,n}[t]$ is the power consumed due to the signal transmission between UAV $n$ and micro base station $v$; $b^{\mathrm{UAV}}_{v,n}[t]$ represents the bandwidth allocated by UAV $n$ to the v-th micro base station at time slot $t$; and $I_{v,n}$ is the total interference experienced by micro base station $v$, given by: $I_{v,n} = \ldots$ To provide efficient services and avoid overlap, the UAVs are usually placed far enough from each other such that their mutual interference can be overlooked (i.e. $\sum_{n' \neq n} P_{v,n'}[t]\, g^{\mathrm{UAV}}_{v,n'}[t] \approx 0$).
Furthermore, we assume that the UAVs assign orthogonal channels to the micro base stations, so that the co-channel interference becomes negligible (i.e. $\sum_{v' \neq v} P_{v',n}[t]\, g^{\mathrm{UAV}}_{v',n}[t] \approx 0$), since existing techniques such as cell planning, frequency reuse, and beam-forming are capable of significantly mitigating the interference [26]. We also note that the macro base station, the UAVs and the satellite operate in different frequency bands. Therefore, their mutual interference can be neglected, and we consider the signal-to-noise ratio (SNR) in SAGIN.
To satisfy the dynamic QoS requirements and to achieve energy-efficient communication, the serving process of the UAV should be well designed. Specifically, the communication with the micro base station is set up only if the SNR of the link between the n-th UAV and the v-th micro base station is higher than the predefined SNR threshold $\Gamma_{\mathrm{th}}$ for a given QoS, such that $\Gamma_{v,n} \geq \Gamma_{\mathrm{th}}$.
According to the Shannon capacity bound and after neglecting the interference effects, the instantaneous data rate served by the n-th UAV is given by: $R^{\mathrm{UAV}}_{v,n}[t] = \ldots$
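The UAV link budget described in this subsection can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the exact expressions of the paper: it assumes the widely used sigmoid (S-curve) form of the LoS probability in the elevation angle, a path loss equal to the FSPL plus the probability-weighted excess losses, an SNR of the form $P g / (N_0 b)$, and the Shannon rate $b \log_2(1 + \Gamma)$ gated by the threshold $\Gamma_{\mathrm{th}}$. All function names and numeric values are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def los_probability(height, horiz_dist, b1, b2):
    """Assumed S-curve LoS probability in the elevation angle (degrees)."""
    theta = math.degrees(math.atan2(height, horiz_dist))
    return 1.0 / (1.0 + b1 * math.exp(-b2 * (theta - b1)))


def path_loss_db(height, horiz_dist, fc, xi_los, xi_nlos, b1, b2):
    """FSPL plus the average excess loss, both in dB (assumed form)."""
    dist = math.hypot(height, horiz_dist)                       # 3-D link distance
    fspl = 20.0 * math.log10(4.0 * math.pi * fc * dist / SPEED_OF_LIGHT)
    p_los = los_probability(height, horiz_dist, b1, b2)
    return fspl + p_los * xi_los + (1.0 - p_los) * xi_nlos


def uav_rate(bandwidth, tx_power, noise_psd, pl_db, snr_threshold):
    """Shannon rate in bit/s; zero when the SNR is below the threshold."""
    gain = 10.0 ** (-pl_db / 10.0)                              # linear channel gain
    snr = tx_power * gain / (noise_psd * bandwidth)
    if snr < snr_threshold:
        return 0.0                                              # link is not set up
    return bandwidth * math.log2(1.0 + snr)


# Illustrative placeholder values (not taken from the paper).
pl = path_loss_db(height=100.0, horiz_dist=100.0, fc=2e9,
                  xi_los=1.0, xi_nlos=20.0, b1=9.61, b2=0.16)
rate = uav_rate(bandwidth=1e6, tx_power=1.0, noise_psd=4e-21,
                pl_db=pl, snr_threshold=1.0)
```

Returning a zero rate below the threshold mirrors the rule that a UAV link is only established when its SNR supports the required QoS.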

C. SATELLITE DOWN-LINK DATA RATE
The achievable data rate $R_v^{\mathrm{sat}}[t]$ between the v-th micro base station and the satellite at time slot $t$ is expressed as [13], [23]: $R_v^{\mathrm{sat}}[t] = \ldots$, where $b_v^{\mathrm{sat}}[t]$ and $P_v^{\mathrm{sat}}[t]$ are the bandwidth and power allocated by the satellite to the v-th micro base station at time slot $t$; $g_v^{\mathrm{sat}}[t]$ is the channel gain between the satellite and the v-th micro base station at time slot $t$, which mainly depends on the path loss due to the transmission distance in satellite links such that $g_v^{\mathrm{sat}}[t] \propto \frac{1}{d_{v,s}^{\eta}[t]}$, where $d_{v,s}[t]$ is the distance between the satellite and the v-th micro base station and $\eta$ is the path-loss exponent [27]. We should note that $R_v^{\mathrm{sat}} \leq C^{\mathrm{sat}}$, where $C^{\mathrm{sat}}$ is a metric fixed by the satellite manufacturer.
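A minimal sketch of the satellite rate follows, assuming a Shannon-type expression with SNR $P g / (N_0 b)$, a gain proportional to $d^{-\eta}$ and a hard cap at the per-beam capacity $C^{\mathrm{sat}}$. The proportionality constant `gain_const`, the function name and all numeric values are hypothetical and only illustrate how the cap interacts with the distance-dependent gain.

```python
import math


def satellite_rate(bandwidth, tx_power, noise_psd, distance, eta, c_sat,
                   gain_const=1.0):
    """Rate in bit/s of the satellite link, never exceeding c_sat."""
    gain = gain_const / distance ** eta            # g proportional to d^(-eta)
    snr = tx_power * gain / (noise_psd * bandwidth)
    shannon = bandwidth * math.log2(1.0 + snr)     # assumed Shannon-type rate
    return min(shannon, c_sat)                     # enforce R_sat <= C_sat


# Illustrative placeholder values (not taken from the paper).
uncapped = satellite_rate(1e6, 10.0, 4e-21, distance=1e6, eta=2.0, c_sat=1e9)
capped = satellite_rate(1e6, 10.0, 4e-21, distance=1e6, eta=2.0, c_sat=5e6)
```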

III. OFFLOADING STRATEGY IN SAGIN
Our goal is to design an offloading strategy in SAGIN that jointly schedules the UAVs' trajectories and efficiently allocates the resources in order to improve the network availability and to minimize the latency experienced by the URLLC traffic. According to our offloading strategy, URLLC traffic is offloaded to the UAV link and to the terrestrial link to satisfy its stringent latency requirements. In contrast, eMBB traffic is offloaded to the UAV link, to the terrestrial link and to the satellite link because it is less sensitive to delay but needs high data rates. We note that the traffic generated by the users and sent to the micro base stations $v \in \{1, \cdots, V_w\}$, as expressed in (3) and (4), is offloaded to the different segments of SAGIN based on the terrestrial link capacity $C_v^{\mathrm{ter}}$ and the available data rates developed in (10) and (11). In this section, we define and formulate our offloading problem to meet the aforementioned requirements and constraints.

A. OFFLOADED TRAFFIC IN SAGIN
We define $\boldsymbol{\alpha}_v[t] = \left(\alpha_v^{\mathrm{ter}}[t], (\boldsymbol{\alpha}_v^{\mathrm{UAV}}[t])^T, \alpha_v^{\mathrm{sat}}[t]\right)^T$ as the offloading vector of eMBB traffic from the v-th micro base station at time slot $t$, where $\alpha_v^{\mathrm{ter}}[t]$, $\boldsymbol{\alpha}_v^{\mathrm{UAV}}[t]$, and $\alpha_v^{\mathrm{sat}}[t]$ correspond to the offloading proportions of the eMBB traffic to the terrestrial network, the UAVs, and the satellite, respectively.
Similarly, we define $\boldsymbol{\beta}_v[t] = \left(\beta_v^{\mathrm{ter}}[t], (\boldsymbol{\beta}_v^{\mathrm{UAV}}[t])^T\right)^T$ as the offloading vector of URLLC traffic from the v-th micro base station at time slot $t$, where $\beta_v^{\mathrm{ter}}[t]$ and $\boldsymbol{\beta}_v^{\mathrm{UAV}}[t]$ correspond to the offloaded proportions of the URLLC traffic to the terrestrial network and to the UAVs, respectively. Specifically, $\alpha_{v,n}^{\mathrm{UAV}}[t]$ and $\beta_{v,n}^{\mathrm{UAV}}[t]$ represent the proportions of the eMBB traffic and the URLLC traffic offloaded by the v-th micro base station to the n-th UAV, respectively. Accordingly, the traffic that can be offloaded by micro base station $v$ to the satellite can be expressed as: $\ldots$ The traffic that can be offloaded by micro base station $v$ to UAV $n$ can be expressed as: $L_{v,n}^{\mathrm{UAV}}[t] = \ldots$, where $a_{v,n}^{\mathrm{UAV}}[t]$ is a binary variable that controls the establishment of the link with UAV $n$ and is defined as: $a_{v,n}^{\mathrm{UAV}}[t] = \ldots$ The traffic that can be offloaded by micro base station $v$ to the macro base station can be expressed as: $\ldots$ In a nutshell, the total traffic offloaded by micro base station $v$ at time slot $t$ is given by: $\ldots$

B. DROPPED TRAFFIC IN SAGIN
If the traffic to be offloaded to the network exceeds the channel capacity, the eMBB flows are preferentially dropped to satisfy the reliability requirement of the URLLC traffic because, unlike eMBB, URLLC has a stringent reliability requirement. In this regard, the dropped traffic per micro base station $v$ at time slot $t$ in SAGIN is calculated as: $f_v^{\mathrm{drop}}[t] = \ldots$ We note that $f_v^{\mathrm{drop}}[t] \geq 0$ since the offloaded traffic is always less than the incoming traffic.
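The interplay between the offloading proportions, the link capacities and the dropping rule can be illustrated with the sketch below. It is built on assumptions that are not stated in this form in the text: each link carries its requested URLLC share first and serves eMBB with whatever capacity remains, a UAV link contributes only when its binary indicator is set, and the dropped traffic is the incoming traffic minus the carried traffic. All names, the dictionary layout and the values are hypothetical.

```python
def carry_on_link(urllc, embb, capacity):
    """Serve URLLC first, then eMBB with the remaining capacity."""
    carried_u = min(urllc, capacity)
    carried_e = min(embb, capacity - carried_u)
    return carried_u, carried_e


def offload_and_drop(load_e, load_u, alpha, beta, link_up,
                     cap_ter, cap_uav, cap_sat):
    """Return (carried, dropped) traffic for one micro base station and slot."""
    # Terrestrial link: carries both URLLC and eMBB shares.
    cu, ce = carry_on_link(beta["ter"] * load_u, alpha["ter"] * load_e, cap_ter)
    carried = cu + ce
    # UAV links: only links whose indicator a_{v,n} is set carry traffic.
    for n, up in enumerate(link_up):
        if up:
            cu, ce = carry_on_link(beta["uav"][n] * load_u,
                                   alpha["uav"][n] * load_e, cap_uav[n])
            carried += cu + ce
    # Satellite link: eMBB only.
    _, ce = carry_on_link(0.0, alpha["sat"] * load_e, cap_sat)
    carried += ce
    dropped = max(load_e + load_u - carried, 0.0)
    return carried, dropped


alpha = {"ter": 0.3, "uav": [0.2, 0.1], "sat": 0.4}   # eMBB proportions (hypothetical)
beta = {"ter": 0.5, "uav": [0.3, 0.2]}                # URLLC proportions (hypothetical)
carried, dropped = offload_and_drop(10.0, 2.0, alpha, beta, [True, False],
                                    cap_ter=100.0, cap_uav=[100.0, 100.0],
                                    cap_sat=100.0)
```

In this example the second UAV link is down, so the traffic assigned to it (1.4 units) is counted as dropped even though every other link has spare capacity.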

C. URLLC DELAY IN SAGIN
In this section, we study the mean delay $f_v^{\mathrm{delay}}$ experienced by the URLLC packets offloaded by micro base station $v$ in time slot $t$ over the terrestrial link and the UAV link [23], [28], [29]. We assume that the scheduling discipline is first come first served (FCFS). For micro cell $v$, the waiting time $D_k$ of packet $k$, whether it is eMBB or URLLC, scheduled after $K$ packets during time slot $t$ over link $\kappa \in \{\mathrm{ter}, \mathrm{UAV}\}$ is given by: $D_k = \ldots$, where $X^j_{v,\kappa}$ is the service time of the j-th packet that arrived before the k-th packet, and $r^k_{v,\kappa}$ is the residual service time. Using Little's formula, we obtain the average waiting time $D_v^{\kappa}$ of any packet $k$ as: $D_v^{\kappa} = \ldots$, where $\rho_v^{\kappa}[t]$ is the load of the micro cell $v$ and is expressed in both links as follows:
$$\rho_v^{\kappa}[t] = \begin{cases} \ldots & \text{over the terrestrial link } (\kappa = \mathrm{ter}) \\ \dfrac{L_{v,n}^{\mathrm{UAV}}[t]}{R_{v,n}^{\mathrm{UAV}}[t]} & \text{over the UAV link } (\kappa = \mathrm{UAV}) \end{cases} \quad (20)$$
and the first moment of the residual service time can be developed as: $\ldots$ In order to evaluate the mean service time $X_v^{\kappa}$, we approximate our general distribution by an exponential distribution with a mean service rate $\mu_e$ that depends on the eMBB packet length. Therefore, the first moment of the residual service time can be simplified, based on (21), as: $\ldots$ Accordingly, the mean delay experienced by URLLC packets over both links, namely UAV and terrestrial, is given by: $f_v^{\mathrm{delay}}[t] = \ldots$
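The queueing argument above can be illustrated with the standard M/G/1 FCFS result (the Pollaczek-Khinchine formula): the mean waiting time equals the mean residual service time divided by $1 - \rho$, and the mean residual time is $\lambda\, \mathbb{E}[X^2] / 2$. The sketch below is a generic textbook computation, not the exact delay expression of this paper; the function names and values are hypothetical, and the service rate is expressed in packets per second.

```python
import math


def mean_residual_time(arrival_rate, second_moment):
    """Mean residual service time seen by an arriving packet: lambda * E[X^2] / 2."""
    return 0.5 * arrival_rate * second_moment


def mg1_waiting_time(arrival_rate, mean_service, second_moment):
    """Mean FCFS waiting time of an M/G/1 queue."""
    rho = arrival_rate * mean_service              # link load
    if rho >= 1.0:
        raise ValueError("unstable queue: load must be below 1")
    return mean_residual_time(arrival_rate, second_moment) / (1.0 - rho)


def mm1_waiting_time(arrival_rate, service_rate):
    """Special case with exponential service: E[X] = 1/mu, E[X^2] = 2/mu^2."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("unstable queue: load must be below 1")
    return rho / (service_rate * (1.0 - rho))


lam, mu = 2.0, 5.0                                 # hypothetical rates
general = mg1_waiting_time(lam, 1.0 / mu, 2.0 / mu ** 2)
exponential = mm1_waiting_time(lam, mu)
assert math.isclose(general, exponential)          # both reduce to rho / (mu (1 - rho))
mean_delay = exponential + 1.0 / mu                # waiting time plus own service time
```

The exponential approximation removes the need to know the second moment of the service time, which is why the residual term collapses to $\rho / \mu$.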

D. PROBLEM FORMULATION
We aim to adjust the association of the micro base stations with the satellite, the UAVs and the macro base stations by determining:
1) the offloaded eMBB and URLLC traffic vectors denoted by $\boldsymbol{\alpha}_v[t]$ and $\boldsymbol{\beta}_v[t]$, where $v \in \{1, \cdots, V\}$, $t \in \{1, \cdots, T\}$.
2) the bandwidth resources allocated to the micro base stations denoted by $\mathbf{b}_v[t] = \{b_v^{\mathrm{sat}}[t], b_{v,n}^{\mathrm{UAV}}[t]\}$, where $v \in \{1, \cdots, V\}$, $n \in \{1, \cdots, N\}$. We focus on the allocation of the bandwidth in particular because the frequency bands are limited resources.
3) the trajectory of the UAVs denoted by $L[t]$.
The problem can be posed as follows: $\ldots$ Problem (24) is formulated as a constrained multi-objective optimization problem with constraints C1-C5. C1 guarantees the reliable offloading of URLLC traffic, where $\epsilon$ is a small positive constant ($\epsilon \approx 1$) that keeps the outage probability of URLLC traffic below a negligible threshold. Meanwhile, C2 and C3 ensure that the offloaded eMBB and URLLC traffic are each less than the total traffic. C4 guarantees that the bandwidth allocated to the micro base stations does not exceed the total available resources in the air network $b_n^{\mathrm{UAV}}$ and in the satellite network $b^{\mathrm{sat}}$, respectively. Finally, C5 ensures that the next location of the UAV is within the maximum moving range during $\delta t$, where $v_{\max}$ is the maximum speed of the UAVs. We can simplify constraint C1 by using $F_D(\cdot)$, the cumulative distribution function (CDF) of $\lambda_v^u[t]$, as follows: $\ldots$ Since $\lambda_v^u$ follows a Pareto distribution with parameters $a[t]$ and $x_m[t]$, we express (25) as: $\ldots$ where $a[t]$ is a constant that determines the distribution shape and $x_m[t]$ is the positive scale parameter of the distribution. It is challenging to solve the optimization problem formulated in (24) by using standard optimization tools, since the instantaneous traffic types and amounts incoming to the different base stations cannot be determined beforehand.
Moreover, the channel condition related to the resource allocation cannot be easily predicted. Therefore, we resort to DRL to learn the network and its dynamics and to handle our multi-objective problem with its various constraints imposed by SAGIN [8].
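Two of the constraints discussed above lend themselves to a short numerical sketch. The first part uses the standard Pareto CDF, $F(x) = 1 - (x_m / x)^{a}$ for $x \geq x_m$, to turn a probabilistic requirement of the assumed form $\Pr(\lambda \leq x) \geq p$ into a deterministic bound on $x$; the exact form of C1 is an assumption here. The second part projects a proposed UAV move onto the disc of radius $v_{\max}\, \delta t$, which is one simple way to enforce a C5-type speed limit at fixed altitude. Function names and values are hypothetical.

```python
import math


def pareto_cdf(x, shape, scale):
    """CDF of a Pareto distribution with shape a and scale x_m."""
    if x < scale:
        return 0.0
    return 1.0 - (scale / x) ** shape


def pareto_quantile(p, shape, scale):
    """Smallest x such that Pr(X <= x) >= p, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return scale * (1.0 - p) ** (-1.0 / shape)


def clip_uav_step(current, proposed, v_max, dt):
    """Shrink a horizontal move so that its length is at most v_max * dt."""
    dx, dy = proposed[0] - current[0], proposed[1] - current[1]
    step = math.hypot(dx, dy)
    limit = v_max * dt
    if step <= limit:
        return proposed
    s = limit / step                                  # scaling factor in (0, 1)
    return (current[0] + s * dx, current[1] + s * dy)


bound = pareto_quantile(0.99, shape=2.5, scale=1.0)   # rate covered with probability 0.99
next_pos = clip_uav_step((0.0, 0.0), (30.0, 40.0), v_max=10.0, dt=1.0)
```

Clipping keeps the direction chosen by the learning agent and only shortens the step, so an infeasible action is mapped to the nearest feasible point along the same heading.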

IV. DEEP REINFORCEMENT LEARNING FOR TRAFFIC OFFLOADING IN SAGIN
In this section, we adopt a DRL-based method to solve problem (24). To this end, we define our Markov decision process and we design our algorithm based on a constrained DDPG approach.

A. MOTIVATIONS TO ADOPT DRL
Compared with conventional analysis and optimization-based approaches, which usually assume a static or perfectly characterized network model, the reinforcement learning (RL)-based approach we propose can refine the offloading strategy by dynamically observing the information from the system environment. We opted for reinforcement learning because the complete model of the environment's evolution is unknown in wireless networks. For instance, in the scope of this paper, the sequential load of the eMBB traffic and the URLLC traffic at each time slot, the instantaneous channel path loss, and the resource distribution are unknown to the agent in advance.
Specifically, we use DRL because it can efficiently handle the high-dimensional state of the network system we propose and circumvent the high computation and storage costs of tabular RL. For tabular RL, we need to store and traverse the whole table to update the Q-value. The space complexity of the tabular RL algorithm is $O(S \cdot A \cdot H)$, and the time complexity of the proposed schemes is $O(K \cdot H)$ [30], where $S$ is the number of states of the observation space $\mathcal{S}$, $A$ represents the number of actions of the action space $\mathcal{A}$, $K$ is the number of training episodes, and $H$ is the number of steps in each episode. For a continuous action space, the number of actions is infinite. Therefore, the tabular RL algorithm cannot converge. On the other hand, for the DRL algorithm, we store only the parameters of the neural networks. The space complexity is $O(\sum_{f=1}^{F} n_f)$ and the time complexity is $O(\ldots)$, where $n_f$ is the number of neural units in fully connected layer $f$, and $F$ is the total number of layers of the network. With the instantaneous observations as input, the well-trained deep neural network (DNN) model predicts the optimized UAV steps and the efficient resource allocation. The actions of the system considered in this paper are drawn from a continuous action space, which cannot be handled by conventional DRL. Therefore, we apply the DDPG algorithm, which is an advanced DRL algorithm that embeds the actor-critic method and combines value-based RL with policy-based RL [17]. The DDPG algorithm is extended from the deterministic policy gradient (DPG) algorithm [32] and takes advantage of experience replay and slow-learning target networks from the deep Q-network (DQN) [33].
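The two ingredients that DDPG borrows from DQN, experience replay and slowly updated target networks, can be sketched independently of any deep learning library. The following is a minimal sketch, assuming that network parameters are represented as flat lists of floats and that the target parameters follow the common soft-update rule $\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$; the class name, function names and the value of $\tau$ are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)      # oldest transitions are evicted first

    def push(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        """Uniformly sample a mini-batch without replacement."""
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau towards its online counterpart."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
for step in range(5):
    buffer.push(state=step, action=0.1 * step, reward=-1.0, next_state=step + 1)
batch = buffer.sample(batch_size=2, rng=rng)
target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)
```

Sampling past transitions at random breaks the temporal correlation between consecutive time slots, while a small $\tau$ keeps the learning targets stable during training.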
H p g 6 K t B w L n n n M v N / e 4 s e B K O 8 6 L l V l a X l l d y 6 7 n N j a 3 t n f s 3 b 2 q i h J J W Y V G I p J 1 l y g m e M g q m m v B 6 r F k J H A F q 7 m 9 q 5 F f 6 z O p e B S W 9 S B m r Y B 0 Q + 5 x S r S R 2 v Z N s 8 + o T p t E x D 4 Z t l N + g o e n a C q 6 T M 9 r l 7 O 6 P z P c 1 Y 0 5 l 9 + A P r + w e K p 6 2 7 < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 x M J 9 q U c + Z 7 u 9 L H q y 1 l v L P 7 n t W L V v X A S F k S x o g H J F n V j j l S I x l G h D h P 6 c D 7 S B B P B 9 F 8 R 6 W O B i d K B F n Q I 9 u z J 8 6 R + W r a t s n 1 r F y s W Z M j D P h z A E d h w D h W 4 g i r U g M A D P M E L v B q P x r P x Z r x n r T l j O r M L f 2 B 8 / g D P o q M q < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 0 P f C y D p L 1 0 V z W e V 6 E m 1 W K y y a K f 0 = " < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 7 / V 8 w + y n W L I / G q 6 y Z + / X E g E e c Y = " x M J 9 q U c + Z 7 u 9 L H q y 1 l v L P 7 n t W L V v X A S F k S x o g H J F n V j j l S I x l G h D h P 6 c D 7 S B B P B 9 F 8 R 6 W O B i d K B F n Q I 9 u z J 8 6 R + W r a t s n 1 r F y s W Z M j D P h z A E d h w D h W 4 g i r U g M A D P M E L v B q P x r P x Z r x n r T l j O r M L f 2 B 8 / g D P o q M q < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 0 P f C y D p L 1 0 V z W e V 6 E m 1 W K y y a K f 0 = "

l a t e x i t s h a 1 _ b a s e 6 4 = " W A X F K O l B z R x A W 6 l 9 5 A V y c 2 s Q K a w = " > A A A C D H i c b V C 7 S g N B F L 3 r M 8 b X q q X N Y h A j Q t i 1 0 U Y I 2 F g m Y B 6 Q x D A 7 m U 2 G z D 6 Y u S u E N R 9 g 4 6 / Y W C g i W P k B d v 6 F n + B s N o U m H h g 4 c 8 6 5 z N z j R o I r t O 0 v Y 2 F x a X l l N b e W X 9 / Y 3 N o 2 d 3 b r K o w l Z T U a i l A 2 X a K Y 4 A G r I U f B m p F k x H c F a 7 j D y 9 R v 3 D K p e B h c 4 y h i H Z / 0 A + 5 x S l B L X b N A u g k / c c Y X b T 8 + K q r s c t f G A U N y k 6 T i + F i n 7 J I 9 g T V P n C k p l M 3 q 9 z s A V L r m Z 7 s X 0 t h n A V J B l G o 5 d o S d h E j k V L B x v h 0 r F h E 6 J H 3 W 0 j Q g P l O d Z L L M 2 D r U S s / y Q q l P g N Z E / T 2 R E F + p k e / q p E 9 w o G a 9 V P z P a 8 X o n X c S H k Q x s o B m D 3 m x s D C 0 0 m a s H p e M o h h p Q q j k + q 8 W H R B J K O r + 8 r o E Z 3 b l e V I / L T l 2 y a k 6 h b I N G X K w D w d Q B A f O o A x X U I E a U L i H R 3 i G F + P B e D J e j b c s u m B M Z / b g D 4 y P H 7 K c n O 0 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " P p 5 D j J O R z M a 7 s q J s W c U q T Y U W E R A = " > A A A C D H i c b V D L S g M x F M 3 U V 6 2 v q k s 3 g 0 W s C G X G j W 6 E g h u X F e w D O m P J p J k 2 N M k M y R 2 h j P M B b v w V N y 4 U c e s H u P N v z L R d a O u B w M k 5 5 5 L c E 8 S c a X C c b 6 u w t L y y u l Z c L 2 1 s b m 3 v l H f 3 W j p K F K F N E v F I d Q K s K W e S N o E B p 5 1 Y U S w C T t v B 6 C r 3 2 / d U a R b J W x j H 1 B d 4 I F n I C A Y j 9 c o V 3 E v Z q Z t d e i I 5 r u r p 5 c G D I Q V 8 l + Z i d m J S T s 2 Z w F 4 k 7 o x U 0 A y N X v n L 6 0 c k E V Q C 4 V j r r u v E 4 K d Y A S O c Z i U v 0 T T G Z I Q H t G u o x I J q P 5 0 s k 9 l H R u n b Y a T M k W B P 1 N 8 T K R Z a j 0 V g k g L D U M 9 7 u f i f 1 0 0 g v P B T J u M E q C T T h 8 K E 2 x D Z e T N 2 n y l K g I 8 N w U Q x 8 1 e b D L H C B E x / J V O C O 7 / y I 
m m d 1 V y n 5 t 6 4 l b o z q 6 O I D t A h q i I X n a M 6 u k Y N 1 E Q E P a J n 9 I r e r C f r x X q 3 P q b R g j W b 2 U d / Y H 3 + A J L V m p w = < / l a t e x i t >
Q 0 (si+1, µ 0 (si+1|✓ µ 0 )|✓ Q 0 )

< l a t e x i t s h a 1 _ b a s e 6 4 = " J 9 r n m V N i Q 2 M M V m Q F z N 5 l n j o 3 q + w = " > A A A C H X i c b V D J S g N B E K 1 x N 2 6 j H r 0 0 B j G i h B k R 9 B j w 4 j E D Z o E k h p 5 O J 2 n s W e i u E c K Y H / H i r 3 j x o I g H L + J f + A l 2 F o I m F j S 8 p Y r q e n 4 s h U b H + b L m 5 h c W l 5 Z X V j N r 6 x u b W / b 2 T l l H i W K 8 x C I Z q a p P N Z c i 5 C U U K H k 1 V p w G v u Q V / / Z y 4 F f u u N I i C q + x F / N G Q D u h a A t G 0 U h N + 8 w 7 z O l m K o 7 d / g m p B 8 m E 3 d e x y 5 H e p A O x f z S h n i F N O + v k n W G R W e C O Q b Z g e 9 9 P A F B s 2 h / 1 V s S S g I f I J N W 6 5 j o x N l K q U D D J + 5 l 6 o n l M 2 S 3 t 8 J q B I Q 2 4 b q T D 6 / r k w C g t 0 o 6 U e S G S o f p 7 I q W B 1 r 3 A N 5 0 B x a 6 e 9 g b i f 1 4 t w f Z F I x V h n C A P 2 W h R O 5 E E I z K I i r S E 4 g x l z w D K l D B / J a x L F W V o A s 2 Y E N z p k 2 d B + T T v O n n X c 7 M F B 0 a 1 A n u w D z l w 4 R w K c A V F K A G D B 3 i C F 3 i 1 H q 1 n 6 8 1 6 H 7 X O W e O Z X f h T 1 u c P q g + j P Q = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " L t + Z e m c F 7 p N O K n f C k Z F F 5 o G g v E 0 = " > A A A C H X i c b V B L S w M x E M 7 W V 6 2 v V Y 9 e g k W s K G V X B D 0 W v H h s w T 6 g u 5 Z s m r a h 2 Q f J r F D W / h E v / h U v H h T x 4 E X 8 N 2 b b p W j r Q O B 7 z D C Z z 4 s E V 2 B Z 3 0 Z u a X l l d S 2 / X t j Y 3 N r e M X f 3 G i q M J W V 1 G o p Q t j y i m O A B q w M H w V q R Z M T 3 B G t 6 w + v U b 9 4 z q X g Y 3 M I o Y q 5 P + g H v c U p A S x 3 z o n Z c U p 2 E n 9 r j M + z 4 8 Y w 9 O D B g Q O 6 S V B y f z G h N k 4 5 Z t M r W p P A i s D N Q R F l V O + a n 0 w 1 p 7 L M A q C B K t W 0 r A j c h E j g V b F x w Y s U i Q o e k z 9 o a B s R n y k 0 m 1 4 3 x k V a 6 u B d K / Q L A E / X 3 R E J 8 p U a + p z t 9 A g M 1 7 6 X i f 1 4 7 h t 6 V m / A g i o E F d L q o F w s M I U 6 j w l 0 u G Q U x 0 o B Q y f V f M R 0 Q S S j o 
Q A s 6 B H v + 5 E X Q O C / b V t m u 2 c W K l c W R R w f o E J W Q j S 5 R B d 2 g K q o j i h 7 R M 3 p F b 8 a T 8 W K 8 G x / T 1 p y R z e y j P 2 V 8 / Q C K S K D s < / l a t e x i t >
Q 0 (si+1, µ 0 (si+1|✓ µ 0 )|✓ Q 0 )

TD error
< l a t e x i t s h a 1 _ b a s e 6 4 = " A Z D Y s Y z / q 8 P D 3 F K 6 E P i Z j X B S 5 k o = " > A A A C Y 3 i c b Z F L S w M x E M d n 1 1 e t r 7 V 6 E R G C R V C U s u t F L 0 L B i 0 c r V o W 2 l t k 0 b Y P Z B 0 l W K G u / m 5 / B m z c v 3 v 0 I T l s F t U 5 I + P P / z e Q x C V M l j f X 9 V 8 e d m Z 2 b X y g s F p e W V 1 b X v P X S j U k y z U W d J y r R d y E a o W Q s 6 l Z a J e 5 S L T A K l b g N H 8 5 H / P Z R a C O T + N o O U t G K s B f L r u R o y U q 8 M 7 i C N k g 4 h C b 0 A C G i g c C g x r Z g H w y x f E w D G M I R + U 3 i 2 T / s i Y i F P g h a E e 6 J f G c O 4 W C K j n Y / a H t l v + K P g 0 2 L 4 E u U q 1 7 t 4 x k A L t v e S 7 O T 8 C w S s e U K j W k E f m p b O W o r u R L D Y j M z I k X + g D 3 R I B l j J E w r H / d o y P b I 6 b B u o m n G l o 3 d n x U 5 R s Y M o p A y I 7 R 9 8 5 e N z P 9 Y I 7 P d 0 1 Y u 4 z S z I u a T g 7 q Z Y j Z h o 4 a z j t S C W z U g g V x L u i v j f d T I L X 1 L k Z o Q / H 3 y t L g 5 r g R + J a g F 5 a o P k y j A N u z S N w R w A l W 4 g E u o A 4 c 3 Z 9 5 Z c z z n 3 V 1 y S + 7 m J N V 1 v m o 2 4 F e 4 O 5 + E D q H n < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 L q b H P y g 8 h s G S d h K 8 i 6 T S 6 S j K M Q = " > A A A C Y 3 i c b Z F N S w M x E I Z n V 6 2 1 9 W O t X k S E Y B E s S t n 1 o h d B 8 O J R x W q h r W U 2 T d t g 9 o M k K 5 S 1 f 9 K b N y / + D 6 e 1 g l o n J L y 8 z 0 w + J m G q p L G + / + a 4 C 4 t L h e X i S q m 8 u r a + 4 W 1 W 7 k 2 S a S 4 a P F G H 5 L e J Z / + w F y I W h i B o R X g k 8 p 0 5 h t o c n e x e 6 3 p V v + 5 P g 8 2 L Y C a q M I v r r v f a 7 i U 8 i 0 R s u U J j W o G f 2 k 6 O 2 k q u x L j U z o x I k T / h Q L R I x h g J 0 8 m n P R q z A 3 J 6 r J 9 o m r F l U / d n R Y 6 R M a M o p M w I 7 d D 8 Z R P z P 9 b K b P + s k 8 s 4 z a y I + d d B / U w x m 7 B J w 1 l P a s G t G p F A r i X d l f E h a u S W v q V E T Q j + P n l e 3 J / U A 7 8 e 3 A T V C 3 / W j i L s w j 5 9 
Q w C n c A F X c A 0 N 4 P D u F J w N x 3 M + 3 L J b c b e / U l 1 n V r M F v 8 L d + w R k R 5 + W < / l a t e x i t > Ri + Q 0 (si+1, µ 0 (si+1|✓ µ 0 )|✓ Q0 ) < l a t e x i t s h a 1 _ b a s e 6 4 = " A Z D Y s Y z / q 8 P D 3 F K 6 E P i Z j X B S 5 k o = " > A A A C Y 3 i c b Z F L S w M x E M d n 1 1 e t r 7 V 6 E R G C R V C U s u t F L 0 L B i 0 c r V o W 2 l t k 0 b Y P Z B 0 l W K G u / m 5 / B m z c v 3 v 0 I T l s F t U 5 I + P P / z e Q x C V M l j f X 9 V 8 e d m Z 2 b X y g s F p e W V 1 b X v P X S j U k y z U W d J y r R d y E a o W Q s 6 l Z a J e 5 S L T A K l b g N H 8 5 H / P Z R a C O T + N o O U t G K s B f L r u R o y U q 8 M 7 i C N k g 4 h C b 0 A C G i g c C g x r Z g H w y x f E w D G M I R + U 3 i 2 T / s i Y i F P g h a E e 6 J f G c O 4 W C K j n Y / a H t l v + K P g 0 2 L 4 E u U q 1 7 t 4 x k A L t v e S 7 O T 8 C w S s e U K j W k E f m p b O W o r u R L D Y j M z I k X + g D 3 R I B l j J E w r H / d o y P b I 6 b B u o m n G l o 3 d n x U 5 R s Y M o p A y I 7 R 9 8 5 e N z P 9 Y I 7 P d 0 1 Y u 4 z S z I u a T g 7 q Z Y j Z h o 4 a z j t S C W z U g g V x L u i v j f d T I L X 1 L k Z o Q / H 3 y t L g 5 r g R + J a g F 5 a o P k y j A N u z S N w R w A l W 4 g E u o A 4 c 3 Z 9 5 Z c z z n 3 V 1 y S + 7 m J N V 1 v m o 2 4 F e 4 O 5 + E D q H n < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 L q b H P y g 8 h s G S d h K 8 i 6 T S 6 S j K M Q = " > A A A C Y 3 i c b Z F N S w M x E I Z n V 6 2 1 9 W O t X k S E Y B E s S t n 1 o h d B 8 O J R x W q h r W U 2 T d t g 9 o M k K 5 S 1 f 9 K b N y / + D 6 e 1 g l o n J L y 8 z 0 w + J m G q p L G + / + a 4 C 4 t L h e X i S q m 8 u r a + 4 W 1 W 7 k 2 S a S 4 a P F G t i Q a s r Q X q d s j + A t r 7 w K z f O a 5 9 a 8 G 6 / q u z B X C Y 7 h B M 7 A g w v w 4 R r q 0 A A G A 3 i A J 3 h 2 p P P o v D i v 8 9 K C s + g 5 g j 9 y 3 n 4 A K M i Q l g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 l Z S K / l b u M 4 U I V d E x g c Q D f f C 
U r E = " > A A A B 6 n i c b V A 9 S w N B E J 2 L X z F + R S 1 t F o N g F e 5 s t A z Y W E Y 0 H 5 A c Y W + z l y z Z 3 T t 2 5 4 R w 5 C f Y W C h i 6 y + y 8 9 + 4 S a 7 Q x A c D j / d m m J k X p V J Y 9 P 1 v r 7 S x u b W 9 U 9 6 t 7 O 0 f H B 5 V j 0 / a N s k M 4 y 2 W y M R 0 I 2 q 5 F J q 3 U K D k 3 d R w q i L J O 9 H k d u 5  t i Q a s r Q X q d s j + A t r 7 w K z f O a 5 9 a 8 G 6 / q u z B X C Y 7 h B M 7 A g w v w 4 R r q 0 A A G A 3 i A J 3 h 2 p P P o v D i v 8 9 K C s + g 5 g j 9 y 3 n 4 A K M i Q l g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 l Z S K / l b u M 4 U I V d E x g c Q D f f C U r E = " > A A A B 6 n i c b V A 9 S w N B E J 2 L X z F + R S 1 t F o N g F e 5 s t A z Y W E Y 0 H 5 A c Y W + z l y z Z 3 T t 2 5 4 R w 5 C f Y W C h i 6 y + y 8 9 + 4 S a 7 Q x A c D j / d m m J k X p V J Y 9 P 1 v r 7 S x u b W 9 U 9 6 t 7 O 0 f H B 5 V j 0 / a N s k M 4 y 2 W y M R 0 I 2 q 5 F J q 3 U K D k 3 d R w q i L J O 9 H k d u 5  o c h 7 Y y R j P U y 9 n U / C / r Z C a 6 D H I u 0 8 w w S e c f R Z k g J i H T v U m f K 0 a N G F t A q r i d l d A h K q T G X q d s j + A t r 7 w K z f O a 5 9 a 8 G 6 / q u z B X C Y 7 h B M 7 A g w v w 4 R r q 0 A A K A 3 i A J 3 h 2 h P P o v D i v 8 9 K C s + g 5 g j 9 y 3 n 4 A D V y Q h A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J b C o M z G g e z J P 5 V F S R u 0 C O A y u l p g = " > A A A B 6 n i c b V A 9 S w N B E J 2 L X z F + R S 1 t F o N g F e 5 s t A z Y W E Y 0 H 5 A c Y W + z l y z Z 3 T t 2 5 4 R w 5 C f Y W C h i 6 y + y 8 9 + 4 S a 7 Q x A c D j / d m m J k X p V J Y 9 P 1 v r 7 S x u b W 9 U 9 6 t 7 O 0 f H B 5 V j 0 / a N s k M 4 y 2 W y M R 0 I 2 q 5 F J q 3 U K D k 3 d R w q i L J O 9 H k d u 5  o c h 7 Y y R j P U y 9 n U / C / r Z C a 6 D H I u 0 8 w w S e c f R Z k g J i H T v U m f K 0 a N G F t A q r i d l d A h K q T G X q d s j + A t r 7 w K z f O a 5 9 a 8 G 6 / q u z B X C Y 7 h B M 7 A g w v w 4 R r q 0 A A K A 3 i A J 3 
h 2 h P P o v D i v 8 9 K C s + g 5 g j 9 y 3 n 4 A D V y Q h A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " J b C o M z G g e z J P 5 V F S R u 0 C O A y u l p g = " > A A A B 6 n i c b V A 9 S w N B E J 2 L X z F + R S 1 t F o N g F e 5 s t A z Y W E Y 0 H 5 A c Y W + z l y z Z 3 T t 2 5 4 R w 5 C f Y W C h i 6 y + y 8 9 + 4 S a 7 Q x A c D j / d m m J k X p V J Y 9 P 1 v r 7 S x u b W 9 U 9

B. MARKOV DECISION PROCESS FORMULATION
The first step is to reformulate the optimization problem (24) as a Markov decision process (MDP). Specifically, in each time slot, the DRL agent running on the central server collects information from the SAGIN environment and makes a decision based on its observations. We define the MDP as a 4-tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R} \rangle$, where $\mathcal{S}$ is the observation space, $\mathcal{A}$ is the action space, $\mathcal{T}$ is the transition matrix and $\mathcal{R}$ is the set of possible rewards that can be obtained from the SAGIN environment. Each element of the tuple is defined as follows:
• Observation space: $\mathcal{S} \triangleq \{\ell_n[t], L^u_v[t], L^e_v[t]\}$. The observation space includes continuous variables, such as the locations of the UAVs $\{\ell_n[t]\}$, the load set of eMBB traffic $\{L^u_v[t]\}$, and the load set of URLLC traffic $\{L^e_v[t]\}$.
• Action space: $\mathcal{A} \triangleq \{\alpha_v[t], \beta_v[t], b_v[t], M_n[t]\}$, where $M_n[t]$ denotes the moving action of the $n$-th UAV. $M_n[t]$ consists of the navigation speed $v_n[t] \in [0, v_{\max}]$ and the rotation angle $\Theta_n[t] \in [0, 2\pi]$, both of which are continuous variables. The moving action $M_n[t]$ is defined such that constraint C5 in problem (24) is satisfied.
• Transition matrix: $\mathcal{T} \triangleq \mathcal{S} \times \mathcal{A} \times \mathcal{S}$. The value of the transition matrix is binary and is determined by the constraints (C2-C5) in problem (24) such that:
$$\mathcal{T}(s[t], a[t], s[t+1]) = \begin{cases} 1, & \text{if constraints (C2-C5) are satisfied,} \\ 0, & \text{otherwise.} \end{cases}$$
In this regard, the transition matrix is sparse and most action explorations are wasted. To accelerate the training process, we develop a wrapper attached to the SAGIN environment. The actions are compressed beforehand with linear normalization and soft-max normalization, where the soft-max step reads:
$$K_v[t] \leftarrow \frac{e^{K_v[t]}}{\sum_{v \in \mathcal{V}} e^{K_v[t]}}, \quad K_v[t] \in \{B^{\mathrm{UAV}}_n[t], B^{\mathrm{sat}}[t]\}. \quad (30)$$
• Reward set $\mathcal{R}$: We design the reward function to jointly minimize the dropped traffic and the network latency while guaranteeing the availability of URLLC traffic. We note that, for a given amount of traffic in each time slot, minimizing the dropped traffic is equivalent to maximizing the offloaded traffic. Therefore, and to avoid a persistently negative reward, we use the offloaded traffic $f^{load}_v[t]$ rather than the dropped traffic $f^{drop}_v[t]$ in the definition of the reward function. Hence, the reward $R[t]$ in time slot $t$ is defined in (31) as a weighted combination of the offloaded traffic $f^{load}[t]$ (weight $\omega_1$), the latency $f^{delay}[t]$ (weight $\omega_2$) and a reliability penalty (weight $\omega_3$). The non-linear constraint C1 of problem (24) is accounted for in $R[t]$ by adding a sufficient penalty through $a^u_{av}[t]$, which is the indicator of the reliable offloading of URLLC traffic. If constraint C1 in problem (24) is satisfied, $a^u_{av}[t]$ is equal to zero. Otherwise, $a^u_{av}[t]$ is equal to 1. By choosing $\omega_3 \gg \omega_1$ and $\omega_3 \gg \omega_2$ in magnitude, the DRL agent receives a large penalty if the reliability is not guaranteed. We note that the offloading function $f^{load}[t]$ and the latency function $f^{delay}[t]$ are normalized to enable joint optimization, stable training and faster convergence. We control the trade-off between offloading and latency by tuning the parameters $\omega_1$ and $\omega_2$. In general, the total offloaded traffic increases at the expense of latency and vice versa. If we care more about the total offloaded traffic, then $\omega_1$ is set slightly larger than $\omega_2$. However, if the network is more sensitive to latency, then $\omega_2$ is set larger than $\omega_1$.
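The soft-max wrapper and the shape of the reward can be illustrated with a minimal sketch. The function names are hypothetical, and the exact reward expression is an assumption consistent with the description above (weighted offloaded traffic, weighted latency, and a penalty on the URLLC reliability indicator); it is not claimed to reproduce (31) term by term.

```python
import math

def softmax(values):
    """Soft-max normalization in the spirit of (30): outputs are positive and sum to one."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def reward(f_load, f_delay, urllc_violation, w1=1.0, w2=1.0, w3=-10.0):
    """Assumed reward form: offloaded traffic minus latency plus a reliability penalty.

    f_load and f_delay are assumed to be normalized already.
    urllc_violation plays the role of the indicator a^u_av
    (0 if constraint C1 holds, 1 otherwise).
    """
    return w1 * f_load - w2 * f_delay + w3 * urllc_violation

# Raw bandwidth scores become shares that sum to one.
shares = softmax([0.2, 1.5, -0.3])
```

With the default weights, a single reliability violation outweighs any achievable gain from normalized offloading or latency, which is the intended effect of a large penalty weight.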

C. CONSTRAINED DDPG ALGORITHM DESIGN
To design the sequential strategy, we optimize the long-term accumulative reward by evaluating the state-action value function, which is given by:
$$Q(s, a) = \mathbb{E}\left[\sum_{T_n \geq 0} \gamma^{T_n} R[t+T_n] \,\middle|\, s, a, \pi\right], \quad (32)$$
where $0 < \gamma < 1$ is the discount factor. The DDPG algorithm is implemented with an actor-critic approach. The actor policy network $\mu(s|\theta^{\mu})$ specifies the action $a$ given the state $s$ currently occupied by the agent, where $\theta^{\mu}$ denotes the weights of the actor network. The critic value network $Q(s, a|\theta^{Q})$ specifies the temporal difference (TD) error used to criticize the actions made by the actor, where $\theta^{Q}$ denotes the weights of the critic network. The network architecture is depicted in Fig. 2 and a detailed description of the training process is presented in Algorithm 1. To obtain stable training, we use a DNN rather than a table-based value estimation method because the observation space is high-dimensional and the action space is continuous. We also use the experience replay technique because it provides a buffer for mini-batch sampling. Moreover, we introduce target models, i.e., separate networks for both the critic and the actor training processes, to avoid harmful correlations.
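As a small illustration of the discounted accumulation behind the state-action value function, the following sketch computes the discounted return of one sampled reward sequence. The function name is hypothetical, and a finite reward list stands in for the expectation over trajectories.

```python
def discounted_return(rewards, gamma=0.95):
    """Return sum_k gamma**k * rewards[k], accumulated backwards in time."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Averaging such returns over many trajectories that start from the same state-action pair would give a Monte Carlo estimate of the value that the critic network approximates.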
During the training process, the critic value network $Q(s, a|\theta^{Q})$, the actor policy network $\mu(s|\theta^{\mu})$, the environment status $\mathcal{S}$ and the replay buffer are first randomly initialized (Line 2). Then, the target networks $Q'(s, a|\theta^{Q'})$ and $\mu'(s|\theta^{\mu'})$ are initialized by copying the weights of the networks $Q(s, a|\theta^{Q})$ and $\mu(s|\theta^{\mu})$ such that (Line 3):
$$\theta^{Q'} \leftarrow \theta^{Q}, \quad \theta^{\mu'} \leftarrow \theta^{\mu}.$$
The action is selected from a continuous action space by:
$$a[t] = \mu(s[t]|\theta^{\mu}) + G[t],$$
where $G[t]$ is a random Gaussian noise added to balance the exploitation of the optimized action and the exploration of the environment (Line 5).
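The noisy action selection can be sketched as follows. The helper name is hypothetical, and clipping each component to its bounds is an assumption made here to keep the perturbed action feasible; the paper handles feasibility through its action wrapper.

```python
import random

def noisy_action(policy_output, sigma, low, high, rng=random):
    """Add zero-mean Gaussian noise to each action component, then clip to its bounds."""
    noisy = []
    for value, lo, hi in zip(policy_output, low, high):
        perturbed = value + rng.gauss(0.0, sigma)
        noisy.append(min(max(perturbed, lo), hi))
    return noisy

# Example: a speed in [0, 1] and a rotation angle in [0, 6.28]
rng = random.Random(0)
action = noisy_action([0.5, 3.0], sigma=0.1, low=[0.0, 0.0], high=[1.0, 6.28], rng=rng)
```

A larger `sigma` favors exploration, while `sigma = 0` reduces to pure exploitation of the actor output.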
To reformulate problem (24) equivalently as an MDP and solve it, we propose a constrained DDPG RL algorithm. We develop the action wrapper to re-scale the elements of the actions by (28), (29) and (30). The traffic is offloaded to different types of networks and the bandwidth resource is assigned to different macro base stations, while the sum of the proportions equals one. In this manner, the constraints (C2-C4) of problem (24) are satisfied (Line 6). If the offloaded traffic exceeds the capacity of the network, we preferentially drop the eMBB traffic because of the stringent reliability requirement of URLLC traffic (Lines 6-8). Another borderline case occurs when the SNR of the link between the UAV and the micro BS is lower than the SNR threshold. In that case, the link should not be set up and no traffic should be offloaded through it, i.e., $R^{\mathrm{UAV}}_{v,n} = 0$. To satisfy the QoS, the traffic should then be served by other links or network segments (Lines 9-11).
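The two borderline rules can be illustrated with a minimal sketch. Both function names are hypothetical, and a single shared capacity is assumed for simplicity; the actual algorithm applies these rules per link and per base station.

```python
def admit_traffic(urllc_load, embb_load, capacity):
    """Serve URLLC first; eMBB receives the remaining capacity and absorbs any drop."""
    urllc_served = min(urllc_load, capacity)
    embb_served = min(embb_load, capacity - urllc_served)
    return urllc_served, embb_served

def uav_link_rate(snr, snr_threshold, nominal_rate):
    """Rate of a UAV link: zero when the SNR falls below the threshold."""
    return nominal_rate if snr >= snr_threshold else 0.0

# 10 kbps of URLLC and 10 Mbps of eMBB competing for a 5 Mbps link
served = admit_traffic(1e4, 1e7, 5e6)
```

In this example the URLLC load is served in full and only the eMBB traffic is truncated, which mirrors the priority given to the reliability-critical slice.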
By executing the action $a[t]$, the agent observes the new state $s[t+1]$ from the SAGIN environment and obtains the reward $R[t]$ (Line 12). The interaction information is stored as a tuple $(s[t], a[t], R[t], s[t+1])$ in the replay buffer for further sampling (Line 13). We train the model by using batch normalization. The samples of each mini-batch are randomly selected from the replay buffer (Line 14). The adaptive moment estimation (Adam) algorithm [34] is the optimizer deployed for both the critic network and the actor network.
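A replay buffer of the kind described above can be sketched with the standard library alone. The class name, the fixed capacity and the eviction of the oldest transitions are assumptions for illustration; the paper does not specify the buffer size.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._storage.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random, without replacement."""
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

Uniform sampling breaks the temporal correlation between consecutive transitions, which is the usual motivation for experience replay.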
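In each mini-batch, the critic value network is updated by minimizing the loss (Line 15):
$$L(\theta^{Q}) = \frac{1}{I} \sum_{i \in \mathcal{I}} \left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2,$$
where
$$y_i = R_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$$
is the target value given by the target value network, and $I$ is the batch size. Index $i \in \mathcal{I}$ denotes a sample selected from the replay buffer. The weights of the value network are updated as:
$$\theta^{Q} \leftarrow \theta^{Q} - \alpha_Q \nabla_{\theta^{Q}} L(\theta^{Q}),$$
where $\alpha_Q$ is the critic learning rate.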
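The critic update can be illustrated numerically under the standard DDPG assumption that the target is the reward plus the discounted target-network value of the next state, and that the loss is the mean squared TD error. The function names are hypothetical, and plain lists of precomputed values stand in for the neural networks.

```python
def td_targets(rewards, next_q_values, gamma=0.95):
    """Target y_i = R_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each sample."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]

def critic_loss(targets, q_values):
    """Mean squared TD error over the mini-batch."""
    errors = [y - q for y, q in zip(targets, q_values)]
    return sum(e * e for e in errors) / len(errors)

# Two samples, gamma = 0.5
targets = td_targets([1.0, 0.0], [2.0, 4.0], gamma=0.5)
loss = critic_loss(targets, [1.0, 3.0])
```

In an actual implementation, the gradient of this loss with respect to the critic weights would be passed to the optimizer.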
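The actor policy network is updated by maximizing the expected accumulative reward (Line 16), and the weights $\theta^{\mu}$ of the policy network are updated along the resulting deterministic policy gradient. Equation (40) is proved in [32]. The target value network and the target policy network are kept fixed for several steps and are then updated as (Line 17):
$$\theta^{Q'} \leftarrow \tau_p \theta^{Q} + (1 - \tau_p)\theta^{Q'}, \quad \theta^{\mu'} \leftarrow \tau_p \theta^{\mu} + (1 - \tau_p)\theta^{\mu'},$$
where $\tau_p \ll 1$ is the update coefficient, which describes how much the target networks are updated by the current networks.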
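The target-network update can be sketched as a convex combination of the current and target weights, which is the usual DDPG soft update and is assumed here. The function name and the default coefficient are hypothetical, and flat lists of floats stand in for the network parameters.

```python
def soft_update(target_weights, online_weights, tau=0.01):
    """Move each target weight a fraction tau toward the matching online weight."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_weights, online_weights)
    ]

# With tau = 0.5 the target moves halfway toward the online weights.
updated = soft_update([0.0, 2.0], [1.0, 2.0], tau=0.5)
```

A small coefficient makes the targets change slowly, which stabilizes the TD targets used by the critic.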
The training process runs until convergence, which is validated in Section V. Once trained, the model can be installed on UAVs equipped with an embedded computing system. Given the observation from the environment, the time needed to process the inputs and generate the output actions is negligible. Consequently, the execution of the model can be regarded as a real-time implementation.

V. RESULTS AND ANALYSIS
In this section, we evaluate the performance of our system in terms of latency and availability. For our simulations, we consider two macro base stations separated by a distance $d = 40$ km. We deploy 11 micro base stations according to a Poisson point process with density $\lambda_p = 0.004$. Initially, the UAVs are randomly positioned. We run our simulations with Python 3.7 and Torch 1.3 on a supercomputer server with the Linux 3.10 operating system, NVIDIA TITAN, 1 GPU and 6 CPUs for each task. A generic RL environment for SAGIN is developed based on the OpenAI Gym framework [35]. For each neural network, we adopt a multi-layer perceptron structure with two layers and 64 neurons in each layer. The commonly used Ornstein-Uhlenbeck noise serves as action noise, with a mean of zero and a variance of 0.5. We train the proposed DRL-based method for 1M episodes, each of which has ten epochs (i.e., time slots). After training, we test the algorithm for a period of $T = 10$ epochs. We set the reward function weights to $\omega_1 = 1$, $\omega_2 = 1$ and $\omega_3 = -10$. The critic learning rate and the actor learning rate are set to $10^{-3}$ and $10^{-4}$, respectively. The discount factor is set to 0.95. The simulation parameters are detailed in Table 1. We note that the carrier signal-to-noise power given in Table 1 corresponds to the Telesat LEO satellite constellation [36].
We start by examining the behaviour of our algorithm over different epochs. Fig. 3 shows the accumulated reward over time for different SNR thresholds of the UAV links. The accumulated reward increases monotonically over time until 200,000 epochs. The more we train the model, the better the agent learns the environment, until it converges after 200,000 epochs. We also remark that the reward increases when the SNR threshold of the UAV links decreases. When the SNR threshold is lower, more traffic is accepted into the network over the UAV links. Consequently, our offloading approach offloads more traffic to the backhaul and hence obtains a higher reward, as anticipated by equation (31).
Then, we evaluate the performance of our offloading approach through three simulations, in which we vary the time, the SNR threshold of the UAV links $\Gamma_{th}$ and the number of UAVs, respectively. The optimization problem presented in (24) is a non-convex and non-linear programming problem, which is hard to solve in general. Obtaining an optimal solution, even for a single time slot, would incur a prohibitive time complexity. Furthermore, a greedy or DQN algorithm would require unaffordable time and computational resources, since both the state and the action would need to be traversed in continuous spaces. Therefore, we compare our results to two benchmarks:
1) Random offloading approach: the first benchmark is a random offloading approach where the offloading proportions and the resource allocation parameters are randomly determined. The comparison to this first benchmark underlines the key role of DRL in achieving a better network performance.
2) Integrated satellite terrestrial network (ISTN) offloading approach [23]: the second benchmark is an ISTN offloading approach, where all the base stations are on the ground and covered by one satellite. For a fair comparison, the ground base stations in ISTN have a capacity equal to the sum of the capacities of the ground base stations and the data rates of the UAVs in SAGIN. In the ISTN offloading approach, the authors offload the URLLC traffic to the terrestrial backhaul only and offload the eMBB traffic to the satellite backhaul and to the terrestrial backhaul [23]. The comparison to this second benchmark highlights the importance of UAV mobility in improving the network's QoS. Indeed, ISTN has only ground base stations, whereas SAGIN has both ground base stations and air base stations, which are the UAVs.
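The Ornstein-Uhlenbeck action noise mentioned in the simulation setup can be sketched as a discretized process. Only the zero mean and the variance of 0.5 come from the text; the class name, the mean-reversion rate `theta`, the step `dt`, and the reading of the variance as the square of the diffusion parameter `sigma` are assumptions made for illustration.

```python
import math
import random

class OrnsteinUhlenbeckNoise:
    """Discretized Ornstein-Uhlenbeck process for temporally correlated action noise."""

    def __init__(self, size, mean=0.0, sigma=math.sqrt(0.5), theta=0.15, dt=1.0, seed=None):
        self.size = size
        self.mean = mean    # long-run mean (zero in the setup above)
        self.sigma = sigma  # assumption: sigma**2 matches the stated variance of 0.5
        self.theta = theta  # assumption: mean-reversion rate, not given in the text
        self.dt = dt        # assumption: one step per time slot
        self._rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.state = [self.mean] * self.size

    def sample(self):
        """Advance one step: pull toward the mean, then add scaled Gaussian noise."""
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mean - x) * self.dt + scale * self._rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Compared with independent Gaussian noise, consecutive samples are correlated, which tends to produce smoother exploration for physical controls such as UAV speed and heading.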

A. VARIATION OF TIME
In the first simulation, we examine the behaviour of our offloading approach over 10 time slots. We fix the number of UAVs at 3 and the SNR threshold for the UAV links at $\Gamma_{th} = 0.1$. First, we compare the capacity used by our offloading approach for each traffic type on the three links, namely the satellite, UAV and terrestrial links. Because eMBB is generated in huge volumes (in the order of $10^7$ bps) compared to URLLC (in the order of $10^4$ bps), we can see in Fig. 4 and Fig. 5 that most of the available capacity in the UAV links and in the terrestrial links is taken by the eMBB traffic. We also highlight the pivotal role of the satellite in delivering the eMBB traffic to the backhaul, as illustrated in Fig. 4. Indeed, approximately 68% of the total eMBB traffic is offloaded to the satellite against 30% to the terrestrial links and 1% to the UAV links. As far as URLLC is concerned, the traffic is offloaded almost equally to the terrestrial links and to the UAV links, as illustrated in Fig. 5.
Second, we study the variation of the URLLC delay experienced in the terrestrial link and in the UAV links. Based on the results depicted in Fig. 6, we observe that our offloading approach substantially reduces the total latency experienced by the URLLC packets over 10 time slots compared to both benchmarks.
Then, we study the total amount of traffic offloaded over the three links, namely satellite, UAV and terrestrial. As illustrated in Fig. 7, our offloading approach delivers more total offloaded traffic to the backhaul than both benchmarks, especially for higher traffic amounts (i.e., at the last time slots). This indicates its capability to operate well in dense networks.
Afterwards, we study the network availability for both traffic types. To this end, we evaluate the percentage of successfully offloaded traffic out of the total sent traffic.
In the results presented in Fig. 8 and Fig. 9, we notice that our offloading approach surpasses both benchmarks in terms of the availability offered to both eMBB and URLLC traffic. Interestingly, our offloading approach achieves an availability rate of 100% over all the time slots for the URLLC traffic. This result fully meets the requirements of the latter slice.
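The availability metric used in these results, i.e., the percentage of successfully offloaded traffic out of the total sent traffic, can be written as a short helper. The function name and the handling of an empty time slot are assumptions.

```python
def availability(offloaded, sent):
    """Percentage of the sent traffic that was successfully offloaded."""
    if sent <= 0:
        raise ValueError("sent traffic must be positive")
    return 100.0 * offloaded / sent

# Per-slot availability for three hypothetical time slots (values in bps)
per_slot = [availability(o, s) for o, s in [(75.0, 100.0), (50.0, 50.0), (0.0, 10.0)]]
```

An availability of 100% in every slot corresponds to no dropped traffic for the slice under consideration.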

B. VARIATION OF THE SNR THRESHOLD OF UAV LINKS
In the second simulation, we investigate the influence of the UAV link quality on the performance of our system. We fix the number of UAVs at 3 and we vary the SNR threshold for the UAV links $\Gamma_{th}$ between -15 dB and 10 dB. We first study the variation of the URLLC delay experienced in the terrestrial link and in the UAV links. The results depicted in Fig. 10 show that our offloading approach clearly surpasses both benchmarks. We also note that the variation of $\Gamma_{th}$ does not significantly alter the URLLC latency for any of the three models. The reason is that the decrease of the offloaded traffic for increasing SNR thresholds $\Gamma_{th}$ is not large enough to impact the latency.
Then, we study the total amount of traffic offloaded to the three links (i.e., satellite, UAV and terrestrial), as depicted in Fig. 11. As anticipated, the total offloaded traffic of our offloading approach decreases when $\Gamma_{th}$ increases, because less traffic is accepted on the UAV links. We also remark that our offloading approach has the highest offloaded traffic even for the lowest $\Gamma_{th}$ values (i.e., when the UAV links are mostly established). This observation supports the validity of our model in dense networks.
Afterwards, we study the network availability for both traffic types. On the one hand, the results presented in Fig. 12 show that our offloading approach improves the eMBB availability compared to ISTN (by 20% on average) and to the random approach (by around 15% on average). It is noteworthy that eMBB availability is not strongly affected by the SNR threshold in ISTN. On the other hand, the results presented in Fig. 13 underline that our offloading approach has an advantage over both benchmarks in terms of URLLC availability, especially for the lowest $\Gamma_{th}$, where availability is stable at 100%. The reason is that, with a low threshold, even a mediocre signal quality is enough to establish communication links with the UAVs. These results are also in line with the total offloaded traffic results depicted in Fig. 7, since higher availability levels are achieved when $\Gamma_{th}$ decreases.

C. VARIATION OF THE NUMBER OF UAVS
In the third simulation, we inspect how the number of UAVs impacts our system performance. We vary the number of UAVs between 2 and 7 and we treat the SNR threshold of the UAV links as a parameter set to 4 dB and 10 dB, respectively. We first study the variation of the URLLC delay experienced in the terrestrial link and in the UAV links. Based on the results depicted in Fig. 14, we observe that our offloading approach significantly decreases the URLLC latency compared to ISTN. This observation reveals the key role that UAV mobility plays in reducing the latency experienced in the network. We also notice that the URLLC latency in ISTN is seriously affected by the number of UAVs because more traffic is offloaded to the backhaul links, which are all static. Then, we study the total offloaded traffic in the three links (i.e., satellite, UAV and terrestrial). As illustrated in Fig. 15, our offloading approach outperforms ISTN, especially when a low number of UAVs is involved. We also note that the SNR threshold parameter influences the results more clearly for ISTN than for our offloading approach. This observation is due to the benefits of the UAV trajectory design adopted in our DRL algorithm, which places the UAVs in the most loaded areas and maintains a similar total offloaded traffic for different SNR thresholds. Afterwards, we study the network availability for both traffic types. As depicted in Fig. 16, our offloading approach considerably improves the eMBB availability compared to ISTN for different SNR thresholds of the UAV links. We also notice that the eMBB availability increases with the number of UAVs because more traffic can be offloaded to the backhaul, supported by the capacities of the added UAVs. As illustrated in Fig. 17, our offloading approach outperforms ISTN in terms of URLLC availability for different SNR thresholds of the UAV links.
In particular, we note that URLLC needs are not met by ISTN for either SNR threshold, as the availability is less than 80%. However, our DRL algorithm meets the URLLC requirements once 4 or more UAVs are present in the network, and achieves a 100% rate only for $\Gamma_{th} = 4$ dB.

VI. CONCLUSION
In this paper, we proposed a heterogeneous traffic offloading approach in SAGIN to meet the various requirements of 5G slices in terms of high data rates, delay and reliability. According to this approach, the eMBB traffic is offloaded to the satellite, UAV and terrestrial links to satisfy its need for high throughput, whereas the URLLC traffic is offloaded only to the UAV and terrestrial links to satisfy its need for ultra-low latency. Our results stressed the importance of the integration between satellite, UAVs and the terrestrial network to achieve a better QoS for different slices and traffic types. Specifically, our offloading approach substantially improved the availability and the latency experienced by eMBB and URLLC traffic, respectively, compared to ISTN, and thereby highlighted the power of UAV mobility. Our results also stressed the importance of DRL as a key tool to learn the network dynamics more easily than conventional optimization approaches and to allocate the available resources efficiently. For future work, we plan to investigate the energy consumption of a partially observable multi-UAV system in this dynamic network context with heterogeneous traffic types and requirements. Moreover, we will focus on the massive machine type communications (mMTC) traffic in addition to eMBB and URLLC to cover simultaneously and efficiently the conflicting needs of the 5G slices.