AI-Driven Energy-Efficient Content Task Offloading in Cloud-Edge-End Cooperation Networks

To tackle the challenging energy efficiency problem caused by growing mobile Internet traffic, this paper proposes a deep reinforcement learning (DRL)-based green content task offloading scheme for cloud-edge-end cooperation networks. Specifically, we formulate the problem as a power minimization model, in which requests arriving at a node for the same content can be aggregated in its queue and in-network caching is widely deployed in heterogeneous environments. A novel DRL algorithm is designed to minimize power consumption by making collaborative caching and task offloading decisions in each slot on the basis of content request information from previous slots and the current network state. Numerical results show that the proposed content task offloading model achieves better power efficiency than existing popular counterparts in cloud-edge-end collaboration networks and converges quickly to a stable state.


I. INTRODUCTION
As beyond 5G and 6G wireless communication technologies rapidly develop, emerging network services represented by virtual reality and 8K video transmission have brought a severe energy efficiency problem and stringent service requirements to the existing Internet [1]. Given that the centralized working paradigm of cloud computing generates huge cross-domain traffic and transmission delay [2], how to achieve green content transmission and meet differentiated service requirements is an urgent issue to solve in heterogeneous cloud-edge-end networks [3].
By enhancing caching and computing capacities in access networks, edge computing can satisfy users' content requests and improve network energy efficiency [4], [5]. Liu et al. [6] presented an efficient file placement and distribution scheme to balance energy efficiency, spectral and cache allocation by utilizing content popularity and users' preferences. Vu et al. [7], [8] first discussed the energy minimization problem caused by the backhaul and access links from the perspectives of non-encoded and encoded caching policies, and then optimized energy efficiency by pre-encoded caching while ensuring users' request rate. Xu et al. [9] investigated the impact of the cellular and D2D modes on content distribution, and adaptively selected the content delivery mode to promote energy efficiency by jointly considering transmitter deployment, channel state, transmission coverage and quality of service (QoS). Moreover, a random waypoint solution was proposed to handle the challenges caused by data explosion and reduce energy consumption [10]. Li et al. [11] formulated the energy minimization issue as a two-stage stochastic mixed integer programming model, and discussed the performance under uncertain content requests. Hassan et al. [12] improved energy and spectral efficiency in multi-access edge computing (MEC)-assisted wireless scenarios by optimizing heterogeneous network resource allocation. Xu et al. [13] proposed an optimal energy saving model in green city systems, which chose the candidate content placement positions to improve content distribution and energy efficiency.
Although edge computing can reduce energy consumption by quickly processing users' requests and providing the requested contents in access networks, the heterogeneity and limitation of edge resources constrain its service capability. The application of cloud-edge cooperative computing to the Internet to enhance network performance has therefore attracted wide attention from academia and industry. Wu et al. [14] designed a novel coded caching framework in cellular networks to minimize power consumption and satisfy users' quality of experience (QoE). Zhang et al. [15] proposed a cloud-edge coordinated content caching mechanism to improve energy efficiency, search accuracy and latency in cyber-physical systems. By introducing the cloud and edge computing paradigms, Yang et al. [16] reduced network delay and power consumption by jointly optimizing computation offloading and content caching. Chen et al. [17] presented a new energy-efficient service model in cloud-edge-enabled IoT networks by simultaneously considering the system runtime, switching, and computation energy of all network participants. Ning et al. [18] proposed an energy-efficient virtual network mapping architecture, which reduced energy consumption and improved sustainability in the cloud of things through collaborative edge computing.
To guarantee system QoS in complex and dynamic network environments, the application of machine learning to cloud-edge collaborative networks has attracted increasing attention. Given the advantages of machine learning in QoS and QoE improvement, Dai et al. [19] proposed a novel deep reinforcement learning (DRL) algorithm in multi-access vehicle networks to solve the challenging edge caching problem brought by efficient content delivery and the high mobility of vehicles. Javed et al. [20] presented a task-driven intelligent content caching architecture in vehicular edge computing to improve task processing and energy consumption. Kong et al. [21] developed a joint computing and caching system to minimize the energy consumption of mobile network operators by utilizing a deep deterministic policy gradient policy to make resource allocation decisions. He et al. [22] presented a DRL-based integrated framework to realize green wireless networks by optimizing cache and computation resource allocation. Ning et al. [23] proposed a DRL-based offloading policy in a three-layer vehicle system to improve energy efficiency while meeting the delay requirement. Chen et al. [24] designed a task offloading and caching decision-making strategy by using the deep Q network (DQN) algorithm to optimize transmission power allocation in heterogeneous cloud-edge cooperation environments. Although the existing work in cloud-edge collaboration networks can achieve energy-efficient content distribution, the service capacities of end-users have been largely overlooked. In this paper, we propose a new DRL-based content task offloading solution in cloud-edge-end environments to improve power efficiency.
The main contributions of the paper are as follows.
• We formulate the energy-efficient content task offloading problem as a minimal power model in cloud-edge-end cooperation networks, where requests arriving at a node for the same content can be aggregated in its queue and in-network caching is utilized in the system.
• We propose a new DRL algorithm to minimize power consumption by making collaborative caching and task offloading decisions in each slot according to content request information in previous slots and current network state.
• Evaluation results in different network scenarios show that the proposed content task offloading model achieves better power efficiency than existing popular counterparts in cloud-edge-end collaboration networks and converges quickly to a stable state.
The rest of the paper is organized as follows. We present the network and power models and formulate the optimization objective in Section II. In Section III, the power consumption minimization model is solved via the new DQN algorithm. In Section IV, we evaluate the proposed model in heterogeneous network environments and discuss the simulation results. Finally, this study is concluded in Section V.

II. SYSTEM MODEL
In this part, we illustrate the network and power consumption models, and formulate the optimization objective in cloud-edge-end cooperation networks.

A. NETWORK MODEL
The heterogeneous cloud-edge-end network shown in Fig. 1 consists of mobile users (MUs), small base stations (SBSs), macro base stations (MBSs), and the cloud. We assume that the MUs and BSs have limited caching and computing capacities. The BS set is denoted by $\mathcal{B} = \{1, 2, \ldots, N_b\}$, where $N_b$ is the number of BSs. In slot $n$, the $i$th BS is accessed by a group of MUs $\mathcal{M}_i(n)$, and the set of all MUs is denoted by $\mathcal{M}(n)$. $\mathcal{F} = \{1, 2, \ldots, F\}$ is the content set, where $F$ is the number of different files in the system. We assume that all contents can be obtained from the cloud. Content requests that cannot be satisfied at the MUs are forwarded, in order, to their accessed SBSs, MBSs, and the cloud to fetch the corresponding files. Each node has a request processing queue that aggregates arriving requests for the same content, and each group of aggregated requests is processed only once. To reflect the dynamic characteristics of cloud-edge-end environments, the system is modeled over multiple slots.
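The per-node aggregation queue described above can be illustrated with a minimal Python sketch. The class name `AggregationQueue`, its `capacity` parameter, and the drop-when-full behaviour are assumptions made for illustration; the text only states that requests for the same content are aggregated and processed once.

```python
from collections import OrderedDict


class AggregationQueue:
    """Per-node request queue that merges requests for the same content.

    Hypothetical helper: the names and the drop-when-full policy are assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # max number of distinct pending contents
        self.pending = OrderedDict()  # content id -> list of requesters

    def push(self, content_id, requester):
        """Return 'aggregated', 'queued', or 'dropped'."""
        if content_id in self.pending:
            # Same content already waiting: merge instead of queuing again.
            self.pending[content_id].append(requester)
            return "aggregated"
        if len(self.pending) >= self.capacity:
            return "dropped"
        self.pending[content_id] = [requester]
        return "queued"

    def pop(self):
        """Remove the oldest pending content; it is processed only once."""
        return self.pending.popitem(last=False)


queue = AggregationQueue(capacity=2)
arrivals = [(7, "mu1"), (7, "mu2"), (3, "mu1"), (9, "mu3")]
results = [queue.push(content, user) for content, user in arrivals]
```

In this sketch, the second request for content 7 is merged with the first, so a single fetch serves both users once the entry is popped.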

B. POWER MODEL
In our system, the total power consumption is mainly caused by network nodes and wired links. The notations of key parameters are summarized in Table 1.

1) POWER CONSUMPTION OF MOBILE USERS
The MUs' power is consumed by static operation, signal transmission, request processing, and content caching. The transmit power that the $m$th MU consumes to send content requests to its accessed base station $i$ in slot $n$ depends on three quantities. $H_{i,m}(n)$ is a boolean variable indicating whether BS $i$ is accessed by MU $m$ in slot $n$: it takes the value 1 if the $m$th MU accesses the $i$th BS, and 0 otherwise. $q^f_{i,m}(n)$ is the number of requests for file $f$ sent by MU $m$ of the $i$th base station in slot $n$. $p^{tr,f}_{m,i}$ is the power consumed by MU $m$ to transmit a request for content $f$ to base station $i$.
The computing power of MU $m$ is consumed in slot $n$ to process the arriving content requests [25]. It depends on two boolean variables, $Q^f_{i,m}(n)$ and $W^f_{i,m}(n)$. $Q^f_{i,m}(n)$ is set to 1 if content request $f$ is in the processing queue of the $m$th MU accessing the $i$th BS in slot $n$, and 0 otherwise. $W^f_{i,m}(n)$ takes the value 1 if the $m$th MU of the $i$th BS has enough computation capacity to handle content request $f$ in slot $n$, and 0 otherwise. $D_f$ is the number of CPU cycles consumed to process content task $f$, and $p^{com}_{i,m}$ is the computation power of MU $m$ per CPU cycle.
The cache power that MU $m$ accessing BS $i$ consumes to store files in slot $n$ is determined by $X^f_{i,m}(n)$, a boolean variable indicating whether MU $m$ caches content $f$ in slot $n$. Therefore, based on [26], the total power consumption of MU $m$ accessing BS $i$ in slot $n$ is the sum of its transmit, computing, and cache power and $P^s_{i,m}(n)$, the static power that MU $m$ consumes in slot $n$ to maintain its normal operations.
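Under the definitions above, the MU power terms can be sketched as follows. The left-hand-side symbols, the file size $s_f$, and the per-unit cache power $p^{ca}_{i,m}$ are assumed notation introduced for illustration, so this is one consistent reading rather than the paper's stated equations.

```latex
\begin{align}
P^{tr}_{m,i}(n) &= H_{i,m}(n) \sum_{f \in \mathcal{F}} q^{f}_{i,m}(n)\, p^{tr,f}_{m,i}, \\
P^{com}_{i,m}(n) &= \sum_{f \in \mathcal{F}} Q^{f}_{i,m}(n)\, W^{f}_{i,m}(n)\, D_f\, p^{com}_{i,m}, \\
P^{ca}_{i,m}(n) &= \sum_{f \in \mathcal{F}} X^{f}_{i,m}(n)\, s_f\, p^{ca}_{i,m}, \\
P_{i,m}(n) &= P^{s}_{i,m}(n) + P^{tr}_{m,i}(n) + P^{com}_{i,m}(n) + P^{ca}_{i,m}(n).
\end{align}
```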

2) POWER CONSUMPTION OF BASE STATIONS
Similarly, the transmit power that the $i$th BS consumes to deliver the requested contents to the $m$th MU in slot $n$ depends on $p^{tr,f}_{i,m}$, the power consumed by base station $i$ to transmit content $f$ to its accessed $m$th MU.
The computing power consumption of base station $i$ in slot $n$ depends on two boolean variables, $Q^f_i(n)$ and $W^f_i(n)$. $Q^f_i(n)$ is set to 1 if request $f$ is in the processing queue of the $i$th BS in slot $n$, and 0 otherwise. $W^f_i(n)$ is set to 1 if the $i$th BS has enough computation capacity to handle content request $f$ in slot $n$, and 0 otherwise. $p^{com}_i$ is the computation power of the $i$th BS per CPU cycle.
The cache power that BS $i$ consumes in slot $n$ to store contents is determined by $X^f_i(n)$, a boolean variable that is set to 1 if file $f$ is stored by the $i$th BS in slot $n$ and 0 otherwise, and by $p^{ca}_i$, the cache power efficiency of the $i$th BS.
Thus, the total power consumed by BS $i$ in slot $n$ is the sum of its transmit, computing, and cache power and $P^s_i(n)$, the static power consumed by BS $i$ in slot $n$.
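A compact way to write the BS computing, cache, and total power that agrees with these definitions is sketched here. The left-hand-side symbols and the file size $s_f$ are assumptions, and $P^{tr}_i(n)$ simply names the transmit power term of BS $i$ without assuming its exact form.

```latex
\begin{align}
P^{com}_{i}(n) &= \sum_{f \in \mathcal{F}} Q^{f}_{i}(n)\, W^{f}_{i}(n)\, D_f\, p^{com}_{i}, \\
P^{ca}_{i}(n) &= \sum_{f \in \mathcal{F}} X^{f}_{i}(n)\, s_f\, p^{ca}_{i}, \\
P_{i}(n) &= P^{s}_{i}(n) + P^{tr}_{i}(n) + P^{com}_{i}(n) + P^{ca}_{i}(n).
\end{align}
```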

3) POWER CONSUMPTION OF THE CLOUD
The power consumption of the cloud in slot $n$ is caused by static operation, content caching, and request processing, denoted by $P^s_c(n)$, $P^{ca}_c(n)$, and $P^{com}_c(n)$, respectively. $P^{ca}_c(n)$ is determined by $p^{ca}_c$, the cache power efficiency of the cloud, and $P^{com}_c(n)$ by $p^{com}_c$, the computation power of the cloud per CPU cycle. $Q^f_c(n)$ is a boolean variable that is set to 1 if content request $f$ is in the processing queue of the cloud in slot $n$, and 0 otherwise. Therefore, the power consumed by the cloud in slot $n$ is the sum of $P^s_c(n)$, $P^{ca}_c(n)$, and $P^{com}_c(n)$.
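One form of the cloud terms that matches this description is given below. It assumes a file size $s_f$ (a symbol not defined in the text) and assumes that the cloud stores every file in $\mathcal{F}$, in line with the statement that all contents can be obtained from the cloud.

```latex
\begin{align}
P^{ca}_{c}(n) &= p^{ca}_{c} \sum_{f \in \mathcal{F}} s_f, \\
P^{com}_{c}(n) &= \sum_{f \in \mathcal{F}} Q^{f}_{c}(n)\, D_f\, p^{com}_{c}, \\
P_{c}(n) &= P^{s}_{c}(n) + P^{ca}_{c}(n) + P^{com}_{c}(n).
\end{align}
```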

4) POWER CONSUMPTION OF WIRED LINKS
The power consumption of wired links consists of static and dynamic link power. The total power consumption of wired link $l_{i,j}$ in slot $n$ is the sum of its static power $P^s_{l_{i,j}}(n)$ and a dynamic part determined by the traffic $f_{l_{i,j}}(n)$ carried by the link in slot $n$ and by $p_{l_{i,j}}$, the power efficiency of link $l_{i,j}$.
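Reading the power efficiency $p_{l_{i,j}}$ as power per unit of traffic, which is an assumption about its units, the link power can be written as a static term plus a traffic-proportional term. The left-hand-side symbol $P_{l_{i,j}}(n)$ is assumed notation.

```latex
\begin{equation}
P_{l_{i,j}}(n) = P^{s}_{l_{i,j}}(n) + p_{l_{i,j}}\, f_{l_{i,j}}(n).
\end{equation}
```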

C. PROBLEM FORMULATION
Based on the analysis above, the content task offloading issue in cloud-edge-end cooperation networks can be formulated as a power minimization model, which improves power efficiency by jointly optimizing the computation, cache, and communication resources. In this model, $\mathcal{N}$ and $\mathcal{N}_t$ are the node set and slot set of the system, respectively. $\mathcal{A}_i$ is the set of nodes directly connected to node $i$, and $\mathcal{B}_i$ is the set of adjacent network devices of BS $i$ at the same level.
The model is subject to six constraints. C1 indicates that the cached content size of node $i$ cannot exceed its maximal storage capacity $Ca_i$. C2 states that the total transmit power consumed by BS $i$ cannot exceed its maximal transmit power $P^{tr,max}_i$. C3 means that the total traffic through link $l_{i,j}$ must be less than its bandwidth $B_{l_{i,j}}$. C4 indicates that the computation resources consumed at node $i$ cannot exceed its computing capacity $C_i$. C5 states that the same content is not cached at directly connected BSs in the same layer. C6 requires that all boolean variables take the value 0 or 1.
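The verbal description of the objective and of C1 to C6 can be collected into a single program. The version below is a sketch only: $P_{i,m}(n)$, $P_i(n)$, $P_c(n)$, and $P_{l_{i,j}}(n)$ stand for the MU, BS, cloud, and link power totals of Section II-B, $P^{tr}_i(n)$ for the total transmit power of BS $i$, and $s_f$ for the size of file $f$. These symbols, the way the objective sums over slots, and the quantifiers are assumptions.

```latex
\begin{align}
\min \quad & \sum_{n \in \mathcal{N}_t} \Big[ \sum_{i \in \mathcal{B}} \sum_{m \in \mathcal{M}_i(n)} P_{i,m}(n)
  + \sum_{i \in \mathcal{B}} P_{i}(n) + P_{c}(n) + \sum_{l_{i,j}} P_{l_{i,j}}(n) \Big] \\
\text{s.t.} \quad
& \text{C1: } \sum_{f \in \mathcal{F}} X^{f}_{i}(n)\, s_f \le Ca_i, \\
& \text{C2: } P^{tr}_{i}(n) \le P^{tr,max}_{i}, \\
& \text{C3: } f_{l_{i,j}}(n) < B_{l_{i,j}}, \\
& \text{C4: } \sum_{f \in \mathcal{F}} Q^{f}_{i}(n)\, W^{f}_{i}(n)\, D_f \le C_i, \\
& \text{C5: } X^{f}_{i}(n) + X^{f}_{j}(n) \le 1, \quad \forall j \in \mathcal{B}_i, \\
& \text{C6: } H_{i,m}(n),\, Q^{f}_{i}(n),\, W^{f}_{i}(n),\, X^{f}_{i}(n) \in \{0, 1\}.
\end{align}
```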

III. DEEP REINFORCEMENT LEARNING BASED COLLABORATIVE CONTENT TASK OFFLOADING
The optimization problem formulated in Section II-C can be modeled as a Markov decision process (MDP), which can be solved by reinforcement learning algorithms that make intelligent caching and offloading decisions. The MDP model is defined by a tuple $\{S, A, P(s_{t+1}|s_t, a_t), R(s_t, a_t)\}$. $S$ is the set of states, each of which describes the current environment. $A$ is the set of all possible actions of the MDP. $P(s_{t+1}|s_t, a_t)$ is the probability of transitioning from state $s_t$ to state $s_{t+1}$ after performing action $a_t$. $R(s_t, a_t)$ is the reward received when action $a_t$ is performed in state $s_t$. In an MDP, the main target of the agent is to find the optimal strategy that maximizes the cumulative reward. The Q-learning algorithm can optimize the reward by dynamically obtaining environment state information and storing action values [27]. However, it is challenging to use a single table to store the values of all actions in complex and dynamic cloud-edge-end cooperation environments [28]. By combining deep learning and reinforcement learning, DQN can overcome this curse of dimensionality by using neural networks to automatically extract low-dimensional features.
In this section, a new DQN-based content task offloading policy is designed to improve power efficiency by making collaborative caching and offloading decisions. Fig. 2 illustrates the working process of the proposed DQN-based content task offloading algorithm. The evaluation and target networks have the same neural network structure but different parameters. To improve system stability, the target network periodically copies the parameters of the evaluation network, once per specified training cycle. According to the known state at time $t$, the evaluation network follows a $\xi$-greedy strategy: it selects the greedy action with probability $\xi \in (0, 1)$, or a random action with probability $1 - \xi$.
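The action selection and the periodic target-network update can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the dictionary representation of Q-values and parameters, and the `sync_every` parameter are assumptions. Following the text's convention, `xi` is the probability of taking the greedy action.

```python
import random


def select_next_hop(q_values, xi, rng=random):
    """Pick the next hop for a request.

    q_values: dict mapping candidate neighbour id -> estimated Q-value.
    xi: probability of exploiting (greedy choice), as in the text.
    """
    if rng.random() < xi:
        return max(q_values, key=q_values.get)  # greedy action
    return rng.choice(list(q_values))           # exploratory random action


def sync_target(eval_params, target_params, step, sync_every):
    """Copy evaluation-network parameters into the target network
    every `sync_every` training steps; otherwise leave it unchanged."""
    if step % sync_every == 0:
        target_params.clear()
        target_params.update(eval_params)
    return target_params
```

For example, `select_next_hop({"sbs": 0.2, "mbs": 0.9, "cloud": -1.0}, xi=1.0)` always returns `"mbs"`, whereas `xi=0.0` returns a uniformly random neighbour.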
During the learning process, the evaluation and target networks randomly draw a batch of historical transitions from an experience replay for training, and the related parameters are updated by gradient descent. When our DQN model works, the network state generated by file request $f$ at time $t$ is defined as $s^f_t = \{n_t, \mathcal{A}_{n_t}, X_{n_t}, f_{l_{n_t,j}}, j \in \mathcal{A}_{n_t}\}$. $n_t$ is the node that processes the current content request $f$. $\mathcal{A}_{n_t}$ is the set of adjacent nodes of $n_t$. $X_{n_t} = (X^1_{n_t}, X^2_{n_t}, \ldots, X^F_{n_t})$ is the caching state of node $n_t$. $f_{l_{n_t,j}}$ is the traffic of link $l_{n_t,j}$. The action for the arriving content request $f$ at time $t$ is defined as $a^f_t = \{n_{t+1}, n_{t+1} \in \mathcal{A}_{n_t}\}$, which indicates its next hop. The reward obtained by processing file request $f$ at time $t$ is denoted by $r^f_t$ and depends on the following quantities. $Y^f_t$ is a boolean variable that takes the value 1 if content request $f$ is satisfied after action $a^f_t$, and 0 otherwise. $\beta$ is a coefficient that adjusts the ratio of power consumption to the reward. $P^f_t$ is the power consumed by a mobile user to send content request $f$ and obtain the corresponding file at time $t$. $\eta$ is the penalty parameter set by the system for the case in which content request $f$ is unsatisfied after action $a^f_t$. When the current node can satisfy the service requirements of the arriving request $f$, the routing process is terminated and content $f$ is returned to the corresponding end-user. Otherwise, our DQN model sends the state information $s^f_t$ to the evaluation network to get an action $a^f_t$, and then performs the action to get the reward $r^f_t$ and the state $s^f_{t+1}$ at time $t + 1$. Meanwhile, the transition $(s^f_t, a^f_t, r^f_t, s^f_{t+1})$ is stored in the experience replay for subsequent training.
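A minimal sketch of the experience replay and of one possible reward shape is given below. The names `Transition`, `ReplayBuffer`, and `reward` are hypothetical. The reward form (negative scaled power when the request is satisfied, a fixed penalty otherwise) is an assumption chosen to match the roles of $Y^f_t$, $\beta$, $P^f_t$, and $\eta$; it is not the paper's exact formula.

```python
import random
from collections import deque, namedtuple

# One stored experience (s, a, r, s') for a content request.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


def reward(satisfied, power, beta, eta):
    """Assumed reward shape: -beta * power if the request is served,
    and the fixed penalty -eta otherwise."""
    return -beta * power if satisfied else -eta


class ReplayBuffer:
    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward_value, next_state):
        self.memory.append(Transition(state, action, reward_value, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a random batch without replacement for training."""
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))
```

Sampling uniformly from stored transitions breaks the correlation between consecutive routing steps, which is the usual motivation for experience replay in DQN.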
The loss function for content request $f$, $L(\omega)_f$, is defined as the mean square error between the target Q-value and $Q(s^f_t, a^f_t; \omega)$, the predicted Q-value generated by the evaluation network with parameters $\omega$, where $\gamma$ is the discount rate applied in the target Q-value. For a content request $f$ from an end user at time $t$, the objective of our DQN policy is to choose the optimal offloading decision $a^f_t$ according to the state $s^f_t$ so as to achieve the maximal reward $r^f_t$. Specifically, when processing a file request in a slot, a node makes task offloading decisions to minimize power consumption on the basis of its cache state and the available bandwidth of its adjacent links. At the end of each slot, the cache status of each node is updated on the basis of the history of content requests that arrived at it, to improve network performance.
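In the standard DQN formulation, which is assumed here to be the form the description refers to, the target Q-value is computed with the target network and the loss is the mean square error against the evaluation network's prediction. The symbols $y^f_t$ for the target value and $\omega^{-}$ for the target-network parameters are assumed notation, and the expectation is taken over batches sampled from the experience replay.

```latex
\begin{align}
y^{f}_{t} &= r^{f}_{t} + \gamma \max_{a'} Q\big(s^{f}_{t+1}, a'; \omega^{-}\big), \\
L(\omega)_f &= \mathbb{E}\Big[ \big( y^{f}_{t} - Q(s^{f}_{t}, a^{f}_{t}; \omega) \big)^{2} \Big].
\end{align}
```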

IV. SIMULATION AND RESULTS
In this section, we present the simulation environments and analyze the numerical results in different cloud-edge-end scenarios.

A. SIMULATION SETTINGS
In this part, our proposed DRL-based green content task offloading scheme is evaluated in heterogeneous cloud-edge-end cooperation networks. We assume that the content popularity of the whole system obeys a Zipf distribution, which mainly depends on its skewness factor [29]. A larger skewness factor indicates that a larger share of the requests sent by mobile users is for popular contents. In the simulation, the skewness factor varies between 0.6 and 1.5 [30], [31]. Besides, cache size is abstracted as the ratio of the number of different files cached by a network node to F, and it ranges from 0.1% to 1% [32], [33]. We also assume that each network node has a request service queue, in which arriving requests for the same content in a slot are aggregated and processed once [34], [35].
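The request workload implied by these settings can be sketched as follows. The function names, the fixed seed, and the rounding rule in `cache_slots` are assumptions for illustration; the simulator used for the reported results is not described at this level of detail.

```python
import random


def zipf_popularity(num_files, skewness):
    """Request probability of each file, ranked 1..num_files, under a Zipf law."""
    weights = [1.0 / (rank ** skewness) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(popularity, num_requests, seed=0):
    """Draw file ids (1-based ranks) according to the popularity distribution."""
    rng = random.Random(seed)
    files = range(1, len(popularity) + 1)
    return rng.choices(files, weights=popularity, k=num_requests)


def cache_slots(num_files, cache_ratio):
    """Number of files a node can cache; at least one slot is an assumption."""
    return max(1, round(num_files * cache_ratio))


low_skew = zipf_popularity(1000, 0.6)
high_skew = zipf_popularity(1000, 1.5)
requests = sample_requests(high_skew, num_requests=50)
```

With a skewness factor of 1.5, the most popular file receives a much larger share of requests than with 0.6, which is why both caching and request aggregation become more effective as the skewness grows.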
To evaluate the advantages of our model, we compare the proposed DRL-based green content task offloading scheme, denoted by "DQN," with three existing popular counterparts in cloud-edge-end cooperation networks, denoted by "Without Cache," "Popularity," and "LRFU," respectively. Request aggregation is considered in all the solutions. In "Without Cache," caches are not deployed in network nodes. In "Popularity," network files are collaboratively cached among adjacent nodes on the basis of the known global content popularity distribution [36]. In "LRFU," the BSs and MUs dynamically update their cached content states according to the historical spatio-temporal request information in each slot [37].
Fig. 3 demonstrates the power consumption of all solutions when the cache size of each node varies. As the cache size grows, more files that users are interested in are cached in the network, which reduces the power consumption of the three policies with in-network caching by serving user requests nearby. Since all content requests are satisfied by the cloud in "Without Cache," its performance is unaffected by the cache size. As storage capacity increases, the proposed "DQN" scheme always performs best, and the performance gap between "DQN" and the other solutions widens. This is because "DQN" can achieve optimal caching and offloading in each slot on the basis of content request information from previous slots and the current network state, adapting to dynamic cloud-edge-end networks.
Fig. 4 shows the performance of all schemes when content popularity varies. As the content popularity grows, the power consumption of all the policies decreases. A larger content popularity means that more mobile users are interested in popular contents, which reduces the power consumption of "Without Cache" because of the improved request aggregation. For the schemes with deployed caching capacity, the improved cache hit rate further improves their power efficiency.
As the content popularity grows, the performance gap between "DQN" and the other schemes narrows. The reason is that more requests for popular contents reduce the relative advantages of our "DQN" scheme in terms of caching efficiency and request aggregation.
Fig. 5 shows the power consumption of all solutions when the queue capacity varies. A larger queue length allows each network node to aggregate more content requests and avoid their redundant transmission, hence improving the performance of the four schemes. As shown in Fig. 5, with a small queue capacity the performance of "DQN" is worse than that of "Popularity" and "LRFU," and close to that of "Without Cache." The reason is that, with deteriorated request aggregation, more content requests in the proposed "DQN" solution are first explored and then transmitted to the cloud to obtain the requested files, leading to a lower power efficiency. As the queue capacity increases, the advantages of intelligent caching and offloading decisions make "DQN" perform much better than the other solutions.
Fig. 6 demonstrates the performance of all schemes when the request arrival rate varies. The request arrival rate is represented by the number of content requests sent by each user in a slot. As shown in Fig. 6, the network power consumption of the four strategies first decreases and then increases as the number of content requests sent by an end-user in a slot grows. The reason is that request aggregation at each node improves when the request arrival rate begins to grow, which leads to a gradual reduction of network power consumption. However, the power efficiency of all the solutions deteriorates when the request arrival rate continues to increase. The reason is that a larger request arrival rate in a slot restricts the effect of request aggregation under the limited queue capacity, which causes more requests to be routed to the cloud to fetch contents and results in more lost packets.
When the request arrival rate varies, "DQN" always performs best by making intelligent caching and routing decisions suited to the dynamic cloud-edge-end environments.
Fig. 7 shows the power consumption of all schemes when the number of different network files changes. As content diversity increases, the network power consumption of each solution grows. A larger number of different files indicates that user requests are more diverse. Specifically, more unpopular contents and fewer popular files are accessed by mobile users, which worsens the cache hit rate and the effect of request aggregation, and consumes more network power. In this process, "DQN" has the best power efficiency by intelligently offloading content tasks according to the current network state and historical request information.
Figs. 8 and 9 show the average weighted reward for a content request per slot under different cache sizes and learning rates, respectively. As shown in Fig. 8, a larger storage capacity indicates that more popular contents are cached at access networks, hence reducing the power consumption and achieving a convergence state with less fluctuation. In the small cache size scenario, more content requests are satisfied in the cloud, which deteriorates the learning effect of our proposed model in the edge network. A larger learning rate indicates that the newly estimated target has a stronger impact on the updated Q value, relative to the old Q value, when the system makes cooperative caching and task offloading decisions. As shown in Fig. 9, our "DQN" solution always converges quickly under different learning rates, and performs best when the learning rate is 0.008.
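The role of the learning rate can be seen in the standard tabular Q-learning update, used here only as an illustration because the proposed scheme trains a neural network rather than a table. The function name and argument names are assumptions.

```python
def q_update(old_q, reward, next_max_q, learning_rate, discount):
    """Standard Q-learning update written as a weighted average.

    The old estimate is weighted by (1 - learning_rate) and the new
    target by learning_rate, so a larger learning rate moves the
    result further towards the target.
    """
    target = reward + discount * next_max_q
    return (1.0 - learning_rate) * old_q + learning_rate * target


kept = q_update(10.0, 1.0, 0.0, learning_rate=0.0, discount=0.9)
halfway = q_update(10.0, 1.0, 0.0, learning_rate=0.5, discount=0.9)
replaced = q_update(10.0, 1.0, 0.0, learning_rate=1.0, discount=0.9)
```

A learning rate of 0 keeps the old estimate, a rate of 1 replaces it with the target, and intermediate values blend the two, which is consistent with small rates such as 0.008 giving smooth but gradual convergence.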

V. CONCLUSION
In this paper, we proposed a DRL-based green content task offloading scheme in cloud-edge-end environments to realize cooperative caching and computing resource allocation. The energy-efficient content task offloading problem was first formulated as a power minimization model, in which requests arriving at a node for the same content can be aggregated in its queue and in-network caching is widely deployed in heterogeneous environments. Then, a new DRL algorithm was designed to minimize power consumption by making collaborative cache and computation resource allocation decisions on the basis of the predicted spatio-temporal content popularity distribution. Numerical results showed that the proposed content task offloading model achieved better power efficiency than existing popular counterparts in cloud-edge-end collaboration networks and converged quickly to a stable state.