HAMEC-RSMA: Enhanced Aerial Computing Systems with Rate Splitting Multiple Access

Aerial networks have been widely considered a crucial component for ubiquitous coverage in next-generation mobile networks. In this context, mobile edge computing (MEC) and rate splitting multiple access (RSMA) are promising technologies that can be enabled at aerial platforms for computation and communication enhancement, respectively. Motivated by this vision, we propose a high altitude platform-mounted MEC (HAMEC) system in an RSMA environment, where aerial users (e.g., unmanned aerial vehicles) can efficiently offload their tasks to the HAMEC to acquire external computing resources. To this end, a joint configuration of the key HAMEC and RSMA parameters (referred to as HAMEC-RSMA), namely the offloading decision, splitting ratio, transmit power, and decoding order, is optimally designed to minimize a processing cost defined in terms of response latency and energy consumption. The optimization problem is then transformed into a reinforcement learning model, which is solvable using the deep deterministic policy gradient (DDPG) method. To improve training exploration, we add parameter noise to the DDPG algorithm. Simulation results demonstrate the efficiency of the HAMEC-RSMA system, which outperforms benchmark schemes.


I. INTRODUCTION
Mobile network evolution has brought the Internet of Things (IoT) toward a new revolution in diverse application scenarios, where spatial limits, geographical location, the microworld, and the biological environment can be efficiently supported by sustainable access infrastructure [1]. To support ubiquitous coverage, aerial radio access networks (ARANs) appear to be a potential strategy for enhancing existing terrestrial communication infrastructures, as they are able to provide services in underserved areas [2], [3]. In particular, owing to their high mobility, coverage capacity, and ability to reach places inaccessible to humans, airborne platforms such as aircraft and unmanned aerial vehicles (UAVs) can be successfully exploited in a variety of professional applications, including agriculture, mission-critical services, search and rescue missions, and surveillance systems [4], [5].
To assist the aforementioned scenarios, mobile cloudification paradigms have changed considerably in tandem with the rise of IoT, particularly through mobile edge computing (MEC). MEC effectively enhances cloud computing by bringing cloud services to the network edge, where they are available nearby with low latency to users [6]. Consequently, an integration of MEC into aerial networks consisting of high altitude platforms (HAPs) and UAVs provides additional resources that significantly enhance system capabilities and performance [7]. Due to the unique characteristics of aerial networks, airborne platforms face several challenges, such as battery limitations and flying formation, to work stably in the air. In this study, we consider a high altitude platform-mounted MEC (HAMEC) system, where an edge computing server is mounted on a HAP to enhance the performance of an aerial network. In such a HAMEC system, aerial users (AUs) flying in a 3D coverage space harvest information from terrestrial devices to assist professional applications in agriculture, surveillance, etc. In these scenarios, AUs can partially offload their tasks to the HAMEC, which has abundant computation resources, to reduce user costs in the network.
On the other hand, to increase the robustness of task transmission, the rate splitting multiple access (RSMA) technique is assumed for communication between the AUs and the HAMEC. RSMA is a powerful emerging multiplexing strategy for spectral efficiency improvement that bridges space-division multiple access and non-orthogonal multiple access techniques [8]. In our investigated scenario, where uplink RSMA is exploited to transmit information from the AUs to the HAMEC, each transmit signal is split into multiple sub-signals, and the successive interference cancellation (SIC) technique is applied at the receiver to decode all the sub-signals. To the best of our knowledge, the combination of the HAMEC model and the RSMA technique is still an open research direction, where multiple system parameters such as the offloading decision, splitting ratio, transmit power, and decoding order should be jointly optimized to minimize a processing cost in terms of response latency and energy consumption. Motivated by this observation, this paper is dedicated to resolving this problem. Accordingly, the main contributions of this research are as follows:
• First, we investigated a HAMEC system in an uplink RSMA-enabled aerial network, where the AUs serve a 3D coverage space. In this system model, we formulated an optimization problem over the offloading decision, splitting ratio, transmit power, and decoding order to minimize the processing cost in terms of total latency and energy consumption.
• Second, we transformed the problem into a reinforcement learning model that can be solved by applying the deep deterministic policy gradient (DDPG) algorithm, referred to as HAMEC-RSMA. Since the action space noise used for training exploration in DDPG may violate the variable value range constraints of the optimization problem, we added parameter noise to the DDPG algorithm during training to ensure that the exploration satisfies all the problem constraints.
• Third, we simulated the environment scenario and trained the model. Numerical results demonstrated the efficiency of the HAMEC system in an uplink RSMA-enabled aerial network, where the proposed HAMEC-RSMA algorithm outperforms existing benchmark schemes.
The rest of this paper is organized as follows. We summarize related works in the remainder of this section. Then, we introduce the system model and formulate the optimization problem in Section II and Section III, respectively. Next, we propose the HAMEC-RSMA algorithm in Section IV. Simulation results are presented in Section V. Finally, we conclude the study in Section VI.

RELATED WORK
Mobile edge computing (MEC) has recently been studied in a vast number of works due to the rapid expansion of mobile networks. In particular, minimizing the cost of task processing is typically an attractive challenge [9]-[14]. For instance, the authors in [9] proposed a task offloading model in an industrial IoT (IIoT) scenario, where they optimized the user equipment's resource allocation and binary offloading decisions to reduce the system cost function. They then designed a reinforcement learning model that applies the Q-learning algorithm to address the problem. The study in [10] aimed to minimize the total task delay in a multi-MEC system by optimizing the binary offloading decision, resource allocation, and cooperative mode selection. The authors presented an iterative technique based on Lagrangian dual decomposition, the monotonic optimization method, and the ShengJin formula method to address the problem. In [11], a stabilized green crosshaul orchestration framework was proposed, utilizing a Lyapunov-theory-based drift-plus-penalty policy to optimize the offloaded data for an energy consumption minimization problem. In addition, the integration of UAVs and MEC has also been considered in [13]. The authors introduced a UAV-assisted MEC system intended to minimize the total cost of IoT devices. They devised the AA-CAP algorithm to optimize binary computation offloading, computation resource allocation, spectrum resource allocation, and UAV placement for this purpose. The authors in [14] investigated a scenario where a UAV established wireless communication between mobile users (MUs) and edge clouds. Based on the successive convex approximation method, they developed an algorithm to optimize the UAV placement, communication and computing resource allocation, and task partition variables to minimize the cost in terms of total service delay and MU energy consumption.
Along with MEC, multiple access has recently been an attractive research topic. The non-orthogonal multiple access (NOMA) technique was examined to improve system performance in many scenarios, such as ultra-reliable low-latency communication systems [15] and blind signal classification and detection systems [16]. In terms of RSMA, several recent studies have investigated various problems related to the adoption of RSMA in wireless networks, considering both downlink [17]-[19] and uplink [20], [21] transmissions. For instance, the work in [17] considered two multiple access schemes, NOMA and RSMA. The authors examined the energy efficiency of a millimeter-wave downlink system, and the results showed that RSMA outperforms NOMA in this scenario. In [18], RSMA was applied in a downlink network to maximize the user sum rate by optimizing rate allocation and power control at the base station (BS). The authors in [19] considered the combination of RSMA and UAV networks. They presented the integration of RSMA into a UAV-BS downlink transmission scenario, where a low-complexity iterative algorithm was proposed to optimize the UAV placement, RSMA precoding, and rate splitting decisions to maximize the weighted sum rate of users. In contrast to the downlink, only a few studies have focused on RSMA in uplink systems. The paper [20] investigated the performance of a two-source uplink RSMA network in terms of outage probability and throughput. In [21], the authors considered the uplink of a wireless network using RSMA. They aimed to maximize the user sum rate by optimizing the decoding order and power allocation at the users, which was solved by a difference-of-convex-functions method and an exhaustive search method.
Although the number of studies on MEC and RSMA has grown over the past years, to the best of our knowledge, an integration of RSMA and MEC remains an open and attractive issue. Hence, this observation motivates us to conduct this study.

II. SYSTEM MODEL

A. NETWORK SCENARIO
As illustrated in Fig. 1, we consider a HAMEC system in an RSMA-enabled aerial network consisting of a set of AUs denoted by U = {1, ..., u, ..., U} with cardinality U. In this model, the AUs fly in a 3D coverage space to serve a certain terrestrial area where no terrestrial base station is available. To improve the quality of service, a HAP equipped with a MEC server hovers in the air, covering all the AUs.
In this study, time is divided into discrete time slots. At each time slot t, each AU u is assumed to have a computation task to execute, denoted as τ_u[t], where τ_u[t] and c_u[t] are the size of the task in bits and its required computation resource in cycles per bit, respectively. Each AU can process its task locally or offload the task to the HAMEC, which processes it and sends back the result. To ensure generality, a partial offloading scheme is investigated for each AU u at time slot t, where the offloading rate is denoted as o_u[t] ∈ [0, 1]. This rate determines that a fraction o_u[t] of task τ_u[t] is offloaded to the HAP, while the remaining (1 − o_u[t]) fraction is processed locally.

B. COMMUNICATION MODEL
In this scenario, the line-of-sight (LoS) channel is considered for all communication links, since the aerial network has almost no obstacles. Hence, we adopt the free-space path loss model for the channel gain between the HAP and AU u, which is calculated as

g_u[t] = β_0 (d_u[t])^(−α), (1)

where α is the path loss exponent, β_0 is the channel power gain at the reference distance of 1 m, and d_u[t] is the distance between the HAP and AU u at time slot t. We denote l_u[t] = (x_u[t], y_u[t], z_u[t]) and l_h = (x_h, y_h, z_h) as the location of AU u at time slot t and the fixed location of the HAP, respectively. Then, the distance d_u[t] is calculated as

d_u[t] = √((x_u[t] − x_h)² + (y_u[t] − y_h)² + (z_u[t] − z_h)²). (2)

To enhance the spectrum efficiency, the RSMA technique is assumed for communications between the HAP and the AUs. In this scheme, the transmitted signal s_u[t] of each AU u at time slot t is split into K sub-signals. Without loss of generality, we choose K = 2 as in [21] and denote the sub-signals as s_u1[t] and s_u2[t]; the transmitted signal is then expressed as

s_u[t] = Σ_{k=1}^{K} √(p_uk[t]) s_uk[t], (3)

where p_uk[t] ≥ 0 is the transmit power of sub-signal s_uk[t] at time slot t. Then, the total offloaded signal received at the HAP is represented as

y[t] = Σ_{u∈U} Σ_{k=1}^{K} √(g_u[t] p_uk[t]) s_uk[t] + n_0, (4)

where n_0 is the additive white Gaussian noise with power spectral density σ².
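As a quick numerical sanity check (not from the paper), the free-space path loss model and the distance in (2) can be sketched in Python; the positions, β_0, and α values below are illustrative assumptions rather than the paper's settings.

```python
import math

def distance(au_pos, hap_pos):
    # Euclidean distance between AU and HAP in 3D, as in Eq. (2)
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(au_pos, hap_pos)))

def channel_gain(d, beta0=1e-4, alpha=2.0):
    # Free-space path loss: g = beta0 * d^(-alpha), beta0 = gain at 1 m
    return beta0 * d ** (-alpha)

# hypothetical geometry: AU at 200 m altitude, HAP at 20 km
d = distance((100.0, 50.0, 200.0), (0.0, 0.0, 20000.0))
g = channel_gain(d)
```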
We denote δ_u[t] = {δ_u1[t], δ_u2[t]} as the splitting ratio set, whose two variables partition the offloaded part of task τ_u[t]. When the offloading rate is zero, the splitting ratio variables should also be zero since there is no offloaded data. Hence, the splitting ratios should satisfy

δ_u1[t] + δ_u2[t] = ⌈o_u[t]⌉, with δ_uk[t] ∈ [0, 1],

where δ_uk[t] is the ratio of the k-th sub-offloaded task of AU u at time slot t, and ⌈o_u[t]⌉ is the ceiling function of o_u[t], which equals 1 when offloading and 0 otherwise.
Then, the size in bits of each sub-offloaded task k of AU u is denoted as τ^o_uk[t] = δ_uk[t] o_u[t] τ_u[t]. When δ_uk[t] > 0, the corresponding power p_uk[t] also needs to be greater than 0 to ensure that all split parts are offloaded to the HAP; otherwise, the transmit power is set to zero because there is no offloaded data. Thus, we define an RSMA offloading power constraint requiring p_uk[t] > 0 whenever δ_uk[t] > 0, and p_uk[t] = 0 otherwise. At the HAP, the successive interference cancellation (SIC) technique is applied for decoding all sub-signals, where ϕ_uk[t] ∈ {1, ..., 2U} is the decoding order of sub-signal k of AU u at time slot t. Then, according to the ascending decoding order, the uplink rate of sub-signal s_uk[t] under RSMA can be calculated as

R_uk[t] = B log₂(1 + p_uk[t] g_u[t] / (Σ_{(v,j): ϕ_vj[t] > ϕ_uk[t]} p_vj[t] g_v[t] + B σ²)),

where B is the communication bandwidth of the HAP, and ϕ_vj[t] is the decoding order of sub-signal j of AU v at time slot t.
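The SIC rate computation can be illustrated with a small sketch. This is our reading of the model: sub-signals with a later decoding order are treated as interference, and the bandwidth, noise, power, and gain values are placeholder assumptions.

```python
import math

def rsma_uplink_rates(p, g, order, B=1e6, sigma2=1e-17):
    """Uplink RSMA rates under SIC (sketch of the R_uk expression).

    p[u][k]     : transmit power of sub-signal k of AU u
    g[u]        : channel gain of AU u
    order[u][k] : decoding order (smaller value = decoded earlier)
    """
    subs = [(order[u][k], u, k)
            for u in range(len(p)) for k in range(len(p[u]))]
    noise = B * sigma2  # noise power over bandwidth B
    rates = {}
    for o, u, k in subs:
        # sub-signals decoded later remain as interference for this one
        interf = sum(p[v][j] * g[v] for oo, v, j in subs if oo > o)
        rates[(u, k)] = B * math.log2(1 + p[u][k] * g[u] / (interf + noise))
    return rates
```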

C. COMPUTATION MODEL
For the task τ_u[t] with offloading rate o_u[t], the local execution time at AU u is calculated as

T^l_u[t] = (1 − o_u[t]) τ_u[t] c_u[t] / f^l_u,

and the execution time at the HAP is calculated as

T^c_u[t] = o_u[t] τ_u[t] c_u[t] / f^c_hu,

where f^l_u and f^c_hu are the computation resource of AU u and the computation resource that the HAP allocates to AU u, respectively. The local execution energy consumption of AU u is then calculated as [22]

E^l_u[t] = κ_u (1 − o_u[t]) τ_u[t] c_u[t] (f^l_u)²,

where κ_u is the energy coefficient of AU u, which depends on the hardware architecture.
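The computation model above can be sketched as follows; the CPU frequencies and the energy coefficient κ are illustrative assumptions, not the paper's parameter values.

```python
def local_exec_time(o, tau_bits, c_cycles_per_bit, f_local):
    # time to process the (1 - o) fraction of the task locally
    return (1 - o) * tau_bits * c_cycles_per_bit / f_local

def hap_exec_time(o, tau_bits, c_cycles_per_bit, f_hap):
    # time for the HAP to process the offloaded fraction o
    return o * tau_bits * c_cycles_per_bit / f_hap

def local_energy(o, tau_bits, c_cycles_per_bit, f_local, kappa=1e-28):
    # dynamic-power energy model: E = kappa * cycles * f^2
    return kappa * (1 - o) * tau_bits * c_cycles_per_bit * f_local ** 2
```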

D. OFFLOADING MODEL
When a sub-task is offloaded, it must be uploaded to the HAP before execution. The time for uploading sub-offloaded task k of AU u and the corresponding energy consumption are calculated as

t^o_uk[t] = δ_uk[t] o_u[t] τ_u[t] / R_uk[t] and e^o_uk[t] = p_uk[t] t^o_uk[t].
Consequently, since the sub-signals are transmitted concurrently, the total time for uploading the offloaded part of task τ_u[t] is calculated as

T^up_u[t] = max_k t^o_uk[t].

The computation tasks in this study are considered to have results small enough that the result downloading time can be neglected. Accordingly, the total latency for processing the offloaded part of task τ_u[t] consists of the uploading time and the HAP execution time, which can be calculated as

T^o_u[t] = T^up_u[t] + T^c_u[t],

and the energy consumption of AU u for completing the offloaded part of task τ_u[t] is the energy consumption for uploading, which is calculated as

E^o_u[t] = Σ_{k=1}^{K} e^o_uk[t].
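A minimal sketch of the offloading model, under our reading that the two sub-streams are uploaded concurrently (so the upload finishes with the slower sub-stream); all numeric inputs are hypothetical.

```python
def upload_time(o, tau_bits, deltas, rates):
    # concurrent sub-streams: upload ends when the slower one finishes
    return max(d * o * tau_bits / r for d, r in zip(deltas, rates))

def upload_energy(o, tau_bits, deltas, rates, powers):
    # energy = sum over sub-signals of (transmit power x transmit time)
    return sum(p * d * o * tau_bits / r
               for d, r, p in zip(deltas, rates, powers))
```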

III. PROBLEM FORMULATION

A. COST FUNCTION
According to the partial offloading scheme, the total latency T_u[t] of task τ_u[t] is determined as the longer of the local execution time and the offloading time, given as

T_u[t] = max{T^l_u[t], T^o_u[t]},

and the energy consumption E_u[t] of AU u for processing task τ_u[t] is computed as the total energy consumption for local computing and offloading, given as

E_u[t] = E^l_u[t] + E^o_u[t].

In addition, we examine a task processing cost function of all AUs in terms of energy consumption and total latency, calculated as the weighted sum of the energy consumption and task latency of all AUs. The cost function at each time slot t is given as

Φ[t] = Σ_{u∈U} (η_e E_u[t] + η_t T_u[t]),

where η_e, η_t ∈ [0, 1] are the weight parameters of the energy consumption and latency, respectively, which satisfy η_t + η_e = 1.
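The per-slot cost can be sketched directly from these definitions; the weights and the sample latency/energy values below are illustrative assumptions.

```python
def task_latency(t_local, t_offload):
    # partial offloading runs both branches in parallel,
    # so the task finishes with the slower branch
    return max(t_local, t_offload)

def slot_cost(energies, latencies, eta_e=0.5, eta_t=0.5):
    # weighted sum over all AUs, with eta_e + eta_t = 1
    assert abs(eta_e + eta_t - 1.0) < 1e-9
    return sum(eta_e * e + eta_t * t for e, t in zip(energies, latencies))
```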

B. PROBLEM FORMULATION
The goal of this study is to minimize the cost function by optimizing the offloading rates, transmission powers, and splitting ratios of AUs, as well as the decoding order at HAP.
Denoting p_u[t] = {p_u1[t], p_u2[t]} as the set of transmit powers of each AU u, the optimization problem can be formulated as problem (19), where (19d) and (19e) are the value range constraints of the decoding order and the offloading rate, respectively; constraints (19f) and (19g) specify the value range of the splitting ratio; (19b) and (19h) indicate the value constraints of the transmit power of each sub-signal, where P^max_u is the maximum power of each AU u, and constraint (19b) ensures that all split parts are offloaded; finally, the tasks at time slot t need to be completely executed before the next tasks arrive to ensure system capacity, so we set a constraint in (19c), where ζ_t is the time slot duration.
The problem (19) is non-convex due to the mixture of continuous variables and the discrete decoding order. In addition, the dynamic environment yields a large number of possible model observations in real time. Therefore, we design a reinforcement learning framework, named HAMEC-RSMA, to solve the problem, which trains the agent using the DDPG algorithm.

IV. PROPOSED DEEP REINFORCEMENT LEARNING FRAMEWORK
To handle the discrete decoding order, we replace ϕ_uk[t] with a continuous decoding priority π_uk[t] ∈ [0, 1]: the HAP decodes the sub-signals according to the order of their priorities, where π_vj[t] is the decoding priority of sub-signal j of AU v at time slot t.
To simplify the optimization variables, we reformulate the splitting ratio set of offloaded task τ_u[t] through a single variable ε^o_u[t] ∈ [0, 1]. As a result, the splitting ratios of each task τ_u[t] are calculated as

δ_u1[t] = ε^o_u[t] ⌈o_u[t]⌉ and δ_u2[t] = (1 − ε^o_u[t]) ⌈o_u[t]⌉.

Besides, for the transmit power, given the value of p_u1[t], according to (19h) the value of p_u2[t] must satisfy

p_u2[t] ≤ P^max_u − p_u1[t]. (22)

We denote ε^p_u1[t], ε^p_u2[t] ∈ [0, 1] as the power variables of the two sub-signals, respectively. According to (22), and following the constraints (19b) and (19h), the sub-signal transmit powers p_u1[t] and p_u2[t] are calculated as

p_u1[t] = ε^p_u1[t] P^max_u ⌈o_u[t]⌉ and p_u2[t] = ε^p_u2[t] (P^max_u − p_u1[t]) ⌈o_u[t]⌉.

In summary, the decoding orders of each AU u at time slot t are changed to the priorities π_u1[t] and π_u2[t], and the transmit powers are changed to ε^p_u1[t] and ε^p_u2[t]. Thus, the optimization problem is rewritten as problem (25). We transform this problem into an RL model, in which the agent is the HAP, which has high energy and computational resources, and the environment is the whole system. At each time slot t, based on the system state s[t], the agent decides the action a[t] interacting with the environment, and then receives the reward r[t] and the next state s[t + 1]. The state space, action space, and reward function are defined as follows.
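This variable transformation can be sketched as a small mapping routine. The exact functional form is our reading of the transformation described above, and the sample values are hypothetical.

```python
import math

def recover_variables(o, eps_o, eps_p1, eps_p2, p_max):
    """Map normalized variables in [0, 1] back to the physical
    splitting ratios and sub-signal transmit powers."""
    on = math.ceil(o) if o > 0 else 0  # 1 when offloading, 0 otherwise
    delta1 = eps_o * on
    delta2 = (1 - eps_o) * on
    p1 = eps_p1 * p_max * on            # first sub-signal power
    p2 = eps_p2 * (p_max - p1) * on     # second, respecting the power cap
    return delta1, delta2, p1, p2
```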

1) State Space
The state space contains the dynamic parameters of the environment that affect the reward of the RL model. In this environment, the state space consists of the locations and the task information of all AUs. The task sizes and required computation resources contribute U entries each; together with the AU locations, the total number of entries in the state space is 4U.

2) Action Space
The action space includes all the optimization variables: the decoding priorities, offloading rates, splitting variables, and transmit power variables. At each time slot t, the action a[t] is represented as a vector of these variables, where all the constraints in (25) need to be satisfied. The numbers of decoding priorities, offloading rates, splitting variables, and transmit power variables are 2U, U, U, and 2U, respectively. Therefore, the total number of entries in the action space is 6U. All actions are designed to have a value range of [0, 1], which satisfies the value ranges in (25).
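The action layout can be sketched as a flattening routine; the function name and the example values are illustrative assumptions.

```python
def pack_action(priorities, offload_rates, split_vars, power_vars):
    """Flatten per-AU decisions into one action vector of length 6U.

    priorities    : 2U decoding-priority values
    offload_rates : U offloading rates o_u
    split_vars    : U splitting variables
    power_vars    : 2U power variables
    All entries are kept in [0, 1].
    """
    a = (list(priorities) + list(offload_rates)
         + list(split_vars) + list(power_vars))
    assert all(0.0 <= x <= 1.0 for x in a)
    return a
```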

3) Reward Function
The reward determines the effect of action a[t] on the environment at state s[t]. Since this study aims to minimize the task processing cost function, the negative cost value forms the basis of the reward function. In addition, if constraint (25b) is violated, task τ_u[t] will not be successfully executed. Thus, we add a penalty term to penalize actions violating constraint (25b). In summary, the reward function at time slot t can be represented as

r[t] = −Φ[t] − λ[t] ∇,

where ∇ is the negative penalty value and λ[t] is the corresponding binary variable, which equals 1 if constraint (25b) is violated and 0 otherwise.
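A minimal sketch of this reward shape; the penalty magnitude is an illustrative assumption.

```python
def reward(cost, deadline_violated, penalty=100.0):
    # negative cost, minus a penalty when the slot deadline is violated
    lam = 1.0 if deadline_violated else 0.0
    return -cost - lam * penalty
```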

B. HAMEC-RSMA ALGORITHM 1) Applied DDPG algorithm
The DDPG algorithm proposed in [23] is an actor-critic algorithm that handles continuous action domains in reinforcement learning. It includes an actor network μ(s|θ^μ) with parameter θ^μ, defining the policy the agent uses to decide the action a at each time step according to the observed state s from the environment. The algorithm also includes a critic network Q(s, a|θ^Q) with parameter θ^Q, which evaluates each action a at each state s to determine how effective it is in that state. When updating the networks, the DDPG algorithm employs a target actor network μ′(s|θ^μ′) with parameter θ^μ′ and a target critic network Q′(s, a|θ^Q′) with parameter θ^Q′ to improve training stability. At each training step, using batch learning, the parameter of the main critic network is updated by minimizing the loss L between the action-value function Q(s_i, a_i|θ^Q) and the target value y_i, which is calculated as

L = (1/B) Σ_i (y_i − Q(s_i, a_i|θ^Q))², (30)

where B is the mini-batch size, and s_i and a_i are the state and action of sample i in the mini-batch, respectively. The target value y_i of sample i is calculated as

y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^Q′), (31)

where r_i and s_{i+1} are the reward and next state of sample i, respectively, and γ is the discount factor. Besides, the parameter of the main actor network is updated using the policy gradient

∇_{θ^μ} J ≈ (1/B) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}. (32)

Then, the target network parameters are updated using a soft update with a small constant τ, calculated as

θ^Q′ ← τ θ^Q + (1 − τ) θ^Q′, θ^μ′ ← τ θ^μ + (1 − τ) θ^μ′. (33)

To ensure exploration over the training samples, DDPG adds noise to the actor policy to produce the action that interacts with the environment; this method is called action space noise. During training, the produced action at time slot t is given as

a[t] = μ(s[t]|θ^μ) + N[t], (34)

where N[t] can be drawn from an Ornstein-Uhlenbeck process.
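Two of these ingredients are easy to sketch in isolation: the soft target update in (33) on a flat parameter list, and an Ornstein-Uhlenbeck noise process for (34). The hyper-parameter values are common defaults, not the paper's settings.

```python
import random

def soft_update(target, main, tau=0.005):
    # theta' <- tau * theta + (1 - tau) * theta', element-wise
    return [tau * m + (1 - tau) * t for m, t in zip(main, target)]

class OUNoise:
    """Ornstein-Uhlenbeck process, a common choice for DDPG
    action-space exploration noise N[t]."""
    def __init__(self, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt, self.x = theta, sigma, dt, 0.0

    def sample(self):
        # mean-reverting step plus scaled Gaussian increment
        self.x += (self.theta * (0.0 - self.x) * self.dt
                   + self.sigma * (self.dt ** 0.5) * random.gauss(0.0, 1.0))
        return self.x
```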
Remark. The decided actions have to satisfy the value range [0, 1], as mentioned in subsection IV-A2. Therefore, the decided action from the policy has to lie in the range [0, 1], i.e., 0 ≤ μ(s[t]|θ^μ) ≤ 1. However, the additive noise used for exploration in (34) may violate the action value ranges in (25). Specifically, we generate random actions in the value range [0, 1] over 5 × 10^6 steps and apply Ornstein-Uhlenbeck noise to the actions. The probability density functions (PDFs) of the action and the action with noise are illustrated in Fig. 2. Although the action varies from 0 to 1, the action plus noise can go out of this range and reach the range (−1, 2), which violates the constraints in (25). Therefore, we employ another way to explore the training samples in this study. Instead of using additive action noise, we embed parameter space noise into the DDPG algorithm for exploration, as proposed in [24].

2) Parameter noise in the DDPG algorithm
This method explores the training samples by adding noise to the network parameters instead of the action space. The agent employs a perturbed actor network μ̃(s|θ̃^μ) with parameter θ̃^μ to decide the actions interacting with the environment, while training the non-perturbed actor network μ(s|θ^μ). The decided action is then given as

a[t] = μ̃(s[t]|θ̃^μ).

The parameter of the perturbed actor network is obtained by applying additive Gaussian noise to the non-perturbed actor network parameter, which can be given as

θ̃^μ = θ^μ + ρ N(0, 1), (40)

where N(0, 1) is additive Gaussian noise with mean 0 and variance 1, and ρ is the noise scale introduced below.
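The perturbation itself is a one-liner over the parameter vector; the sketch below treats parameters as a flat list of floats, which is a simplifying assumption.

```python
import random

def perturb_params(theta, rho, rng=random):
    # perturbed parameter = theta + rho * N(0, 1), element-wise, as in (40)
    return [w + rho * rng.gauss(0.0, 1.0) for w in theta]
```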

In addition, the parameter space noise requires a scale value ρ to adjust the variance induced in the action space. The noise scale value is adapted over time and can be calculated as

ρ_{k+1} = ς ρ_k if d(μ, μ̃) ≤ ϑ, and ρ_{k+1} = ρ_k / ς otherwise, (38)

where ς and ϑ are a scaling factor and a threshold value, respectively, and d(μ, μ̃) denotes the distance between the non-perturbed and perturbed policies, which can be calculated as

d(μ, μ̃) = √( E_s[ (1/N) Σ_{i=1}^{N} (μ(s|θ^μ)_i − μ̃(s|θ̃^μ)_i)² ] ), (39)

where N is the dimension of the action space, and E_s[·] is estimated over a batch of states from the replay buffer. Then, the perturbed actor network parameter at episode k is calculated as θ̃^μ = θ^μ + ρ_k N(0, 1). We illustrate the whole framework in Fig. 3, which has five neural networks: two critic networks and three actor networks. The perturbed actor network is used to decide the action a[t] that interacts with the observed state s[t] from the environment during exploration. The samples are then stored in the replay buffer and used to train the agent with the DDPG algorithm. The procedure is described in Algorithm 1.
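The distance in (39) and the scale adaptation in (38) can be sketched as follows; the threshold and scaling factor values are illustrative assumptions.

```python
import math

def policy_distance(actions, perturbed_actions):
    # d(mu, mu~): root of the batch mean of the per-state
    # average squared difference over the N action dimensions
    n = len(actions[0])
    per_state = [sum((a - b) ** 2 for a, b in zip(x, y)) / n
                 for x, y in zip(actions, perturbed_actions)]
    return math.sqrt(sum(per_state) / len(per_state))

def adapt_scale(rho, dist, threshold=0.1, factor=1.01):
    # grow the noise when the perturbed policy is too close to the
    # non-perturbed one, shrink it otherwise
    return rho * factor if dist <= threshold else rho / factor
```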
At the beginning of each episode, the actor network is perturbed with the noise scale value using (40). Based on the observation s[t] at each step, the perturbed actor network decides an action a[t] that interacts with the environment and receives the reward r[t] and the next state s[t + 1]. A tuple containing s[t], a[t], r[t], and s[t + 1] is then stored in the replay buffer for training the agent. In each training step, the agent randomly samples a mini-batch of experiences from the replay buffer and updates the parameters of the critic network and the non-perturbed actor network by minimizing the loss in (30) and applying the policy gradient in (32), respectively. Accordingly, the target networks are updated using the soft update in (33). At the end of each episode, the agent calculates the distance between the non-perturbed and perturbed policies by (39) to update the noise scale value using (38). After training, the trained non-perturbed actor network is used to test the model.

V. PERFORMANCE EVALUATION

A. SIMULATION SETTINGS
To evaluate the performance of the proposed HAMEC-RSMA framework, we simulate an environment including a HAP hovering at an altitude of 20 km that serves 4 AUs flying within a range of 500 m at an altitude of 200 m. Each AU is assumed to fly randomly with a velocity of 15 m/s in a certain area without colliding with the others. The networks in this model have two hidden layers, with 512 and 1024 nodes in the critic networks and 128 and 256 nodes in the actor networks. We compare the proposed framework with the following benchmark schemes:
• DDPG with action space noise (DDPG-AN): In this scheme, we apply the original DDPG algorithm [23] to the model, where exploration is ensured by additive action noise as in (35). To deal with the training sample violation issue analyzed in subsection IV-B1, we directly trim the action to the constraint range, as given in (41).
• Random action (RA): In this scheme, the actions are chosen randomly within the constraint value range to interact with the environment.

B. CONVERGENCE ANALYSIS
First, we evaluate the convergence of the proposed algorithm by varying some training hyper-parameters. The environment in this training is a scenario with 4 AUs moving randomly, where the task size and the required computation resource are chosen randomly in the ranges of [1-1.5] Mbits and [1000-1200] cycles/bit, respectively. The training rewards of the model when changing the learning rate and the mini-batch size are illustrated in Fig. 4. The learning rates of the actor network (lr_a) and the critic network (lr_c) are chosen in three cases: (lr_a, lr_c) = {(5e−4, 5e−4), (1e−4, 5e−4), (1e−4, 1e−4)}. As shown in Fig. 4a, the rewards converge to a definite range after several hundred episodes and then slightly increase in all three cases. However, the case (lr_a, lr_c) = (1e−4, 5e−4) gives better performance than the others, as it reaches the definite range after about 300 episodes and achieves the highest reward value of the three cases. Therefore, we choose this learning rate setting for the simulation.
To evaluate the effect of the mini-batch size on training, we train the model with three values, B = {16, 32, 64}, and obtain the results in Fig. 4b. In the case B = 64, the model starts to converge after about 950 episodes, which is very slow compared to the others. Because of the dynamic environment, a large mini-batch size introduces noise into the training data and leads to slow convergence. In the remaining cases, the model converges earlier, and the case B = 32 gives the best performance. Hence, we select B = 32 to train the model.
Then, we compare the efficiency of parameter space noise and action space noise (DDPG-AN) for exploration in training. The training rewards are illustrated in Fig. 5. According to the results, the reward when employing action space noise grows over time and stabilizes after around 1150 training episodes. The reward when utilizing parameter space noise, on the other hand, is less consistent during training, implying that it is still exploring new experiences. Furthermore, the reward of parameter space noise is higher than that of action space noise. This is because in DDPG-AN we explicitly trim the action to the constraint range, which limits the exploration and may lead the model to become stuck at a local optimum, reducing performance. Parameter space noise mitigates this issue, and the results reveal that it performs better than action space noise in this scenario.

C. PERFORMANCE EVALUATION
The performance of the proposed HAMEC-RSMA framework is compared to the other schemes in this subsection. We test the model over 100 episodes, and the results are illustrated in Fig. 6. The episode rewards of the schemes are given in Fig. 6a, with HAMEC-RSMA consistently achieving the highest reward. The results indicate that the RSMA technique improves upon the NOMA technique, as the reward of HAMEC-RSMA is about 5.1% greater than that of HAMEC-NOMA. The results also demonstrate the efficiency of parameter space noise compared to action space noise, with the proposed framework performing approximately 32.11% better than DDPG-AN. In addition, the HAMEC model significantly improves the network performance, achieving rewards about 65.57% and 83.56% higher than the FLE and RA schemes, respectively.
Besides, we evaluate the energy cost fairness between the AUs by measuring the energy costs over 100 testing episodes, illustrated in Fig. 6b. The average energy cost of each AU in HAMEC-RSMA and HAMEC-NOMA is low compared with the other schemes. Furthermore, HAMEC-RSMA gives better fairness than HAMEC-NOMA. This is because in NOMA, an AU with poor channel gain has to use more transmit power to increase its data rate, resulting in an imbalance in transmit power. In RSMA, each AU can have a high-rate path and a low-rate path, depending on the splitting ratio and decoding order, which makes data transmission fairer. As a result, the total power used to transmit data is fairer among the AUs.
Next, we examine the system performance when increasing the task size from 1.0 to 1.5 Mbits, with the cost values shown in Fig. 7a. First, the proposed framework performs best in all cases, yielding the lowest cost value. In addition, the cost value rises as the task size increases; the cost increases by approximately 9.81% when the task size increases by 100 Kbits in the HAMEC-RSMA scheme. This is because, with a given capacity, increasing the task size requires additional processing time or more power to process the task, which raises the cost of completing the task.
Also, Fig. 7b illustrates the impact of the task's required computation resource on system performance. Our proposed framework again yields the lowest cost compared with the other schemes in all evaluated cases. Similar to the task size, increasing the task's required resource raises the cost value due to increased processing time or power. In particular, the cost increases by approximately 4% when the required resource increases by 100 cycles/bit in the HAMEC-RSMA scheme.

VI. CONCLUSION
In this research, we have considered a HAMEC-enhanced aerial system with uplink RSMA communication. We formulated an optimization problem involving the offloading decision, splitting ratio, transmit power, and decoding order to minimize the task processing cost in terms of total latency and energy consumption. To solve the problem, we deployed a reinforcement learning model named HAMEC-RSMA, which trains the agent using the DDPG algorithm. Since the action space noise in the DDPG algorithm may violate the variable constraints of the problem, we applied parameter space noise for exploration to improve the training performance. Numerical results demonstrated the efficiency of the proposed HAMEC-RSMA framework, which outperforms the existing benchmark schemes. As future work, the integration of RSMA and MEC should be considered in an industrial IoT scenario with dense user equipment. In addition, a combination of HAMEC-RSMA with other emerging 6G technologies, such as intelligent reflecting surfaces and massive multiple-input multiple-output (MIMO), deserves to be investigated thoroughly.