Optimal Resource Allocation for Multi-User OFDMA-URLLC MEC Systems

In this paper, we study resource allocation algorithm design for multi-user orthogonal frequency division multiple access (OFDMA) ultra-reliable low latency communication (URLLC) in mobile edge computing (MEC) systems. To meet the stringent end-to-end delay and reliability requirements of URLLC MEC systems, we employ joint uplink-downlink resource allocation and finite blocklength transmission. Furthermore, we propose a partial time overlap between the uplink and downlink frames to minimize the end-to-end delay, which introduces a new time causality constraint. The proposed resource allocation algorithm is formulated as an optimization problem for minimization of the total weighted power consumption of the network under a constraint on the number of URLLC user bits computed within the maximum allowable computation time, i.e., the end-to-end delay of a computation task of each user. Despite the non-convexity and the complicated structure of the formulated optimization problem, we develop a globally optimal solution using a branch-and-bound approach based on discrete monotonic optimization theory. The branch-and-bound algorithm minimizes an upper bound on the total power consumption until convergence to the globally optimal value. Furthermore, to strike a balance between computational complexity and performance, we propose two efficient suboptimal algorithms. For the first suboptimal scheme, the optimization problem is reformulated in the canonical form of difference of convex programming. Then, successive convex approximation (SCA) is used to determine a locally optimal solution. For the second suboptimal scheme, we use a high signal-to-noise ratio approximation for the channel dispersion. Then, via novel transformations, we convert the non-convex quality-of-service constraints of the original problem into equivalent second-order-cone constraints. 
Our simulation results reveal that the proposed resource allocation algorithm design facilitates URLLC in MEC systems, and yields significant power savings compared to three baseline schemes. Moreover, our simulation results show that the proposed suboptimal algorithms offer different trade-offs between performance and complexity and attain an excellent performance at comparatively low complexity.


I. INTRODUCTION
Future wireless communication networks target several objectives including high data rates, reduced latency, and massive device connectivity. One important objective is to facilitate ultra-reliable low latency communication (URLLC). URLLC is crucial for mission-critical applications such as remote surgery, factory automation, autonomous driving, tactile Internet, and augmented reality to enable real-time machine-to-machine and human-to-machine interaction [2]. URLLC imposes strict quality-of-service (QoS) constraints including a very low latency (e.g., 1 ms) and a low packet error probability (e.g., $10^{-6}$).
Recently, significant attention has been devoted to studying and developing resource allocation algorithms for URLLC. In particular, optimal power allocation in a multi-user time division multiple access (TDMA) URLLC system was considered in [3], [4]. Moreover, resource allocation for orthogonal frequency division multiple access (OFDMA)-URLLC systems was studied in [5]-[9]. In [10], [11], resource allocation for secure URLLC was investigated. However, the resource allocation schemes in [3], [4], [6]-[11] focused only on communication while computation was not considered. Nevertheless, devices in mission-critical applications are expected to generate tasks that require computation within a given time. This motivates the investigation of resource allocation algorithm design for efficient computation in URLLC systems.
This paper will be presented in part at IEEE GLOBECOM 2020 [1].
A promising solution to enable efficient and fast computation for URLLC devices is mobile edge computing (MEC). MEC can enhance the battery lifetime and reduce the power consumption of users with delay-sensitive computation tasks [12]. By offloading these tasks to nearby MEC servers, the power consumption and computation time at the local users can be considerably reduced at the expense of the power required for data transmission for offloading [12]. Thus, careful resource allocation is paramount for MEC to ensure the efficient use of the available resources (e.g., power and bandwidth) while guaranteeing a maximum delay for the computation tasks. Existing resource allocation algorithms for MEC systems, such as [13]-[16], are based on Shannon's capacity formula. In particular, the authors of [13], [15] studied energy-efficient resource allocation for MEC, while computation rate maximization was targeted in [14]. However, if the resource allocation design for URLLC MEC systems is based on Shannon's capacity formula, the reliability of the offloading and downloading processes cannot be guaranteed because of the imposed delay constraints. To overcome this issue, recent works applied finite blocklength transmission (FBT) [17] for resource allocation algorithm design for URLLC MEC systems. In particular, the authors in [18] studied binary offloading in single-carrier TDMA systems. However, single-carrier systems suffer from poor spectrum utilization and require complex equalization at the receiver. In [19], the authors investigated the minimization of the normalized energy consumption of an OFDMA-URLLC MEC system. However, the algorithm proposed in [19] assumes that the channel gains of different sub-carriers are identical, which may not be a realistic assumption for broadband wireless channels. Moreover, the resource allocation algorithms proposed in [19] are based on a simplified version of the general expression for the achievable rate for FBT [17]. Furthermore, the existing MEC designs, such as [13], [20], do not take into account the size of the computation result of the tasks and do not consider the communication resources consumed for downloading of the processed data by the users. Nevertheless, the size of the processed data can be large for applications such as augmented reality.
We note that most resource allocation algorithms proposed for URLLC systems in the literature, such as [6], [9], [10], [21], are strictly suboptimal. In particular, the algorithms developed in [10], [21] were based on block coordinate descent techniques, while those in [6], [9] employed successive convex approximation (SCA). As a result, the performance of the resource allocation algorithms in [6], [9], [10], [21] cannot be guaranteed because the gap between the optimal and suboptimal solutions is not known. To cope with this problem, in our recent work [7], we proposed a globally optimal algorithm based on the polyblock outer approximation method using monotonic optimization. However, the polyblock algorithm may suffer from slow convergence for large problem sizes. To overcome this problem, in this paper, a branch-and-bound algorithm is proposed. Different from the general branch-and-bound algorithms proposed for non-convex problems, e.g., [22], the proposed branch-and-bound algorithm exploits the monotonicity of the problem to reduce the search space for faster convergence [23].
In this paper, we study optimal joint uplink-downlink resource allocation for OFDMA-URLLC MEC systems. The main contributions of this paper are as follows:
• We propose a novel joint uplink-downlink resource allocation algorithm design for multi-user OFDMA-URLLC MEC systems. To reduce the end-to-end delay of uplink and downlink transmission while efficiently exploiting the available spectrum, we propose a partial time overlap between the uplink and downlink frames and introduce corresponding causality constraints. Then, the resource allocation algorithm design is formulated as an optimization problem for the minimization of the total weighted power consumed by the base station (BS) and the users subject to QoS constraints for the URLLC users. The QoS constraints include the required number of bits computed within a maximum allowable time, i.e., the maximum end-to-end delay of the users.
• The formulated optimization problem is a non-convex mixed-integer problem which is difficult to solve. Thus, we transform the problem into the canonical form of a discrete monotonic optimization problem. This reformulation allows the application of the branch-and-bound algorithm to find the globally optimal solution. The proposed branch-and-bound algorithm searches for a globally optimal solution by successively partitioning the non-convex feasible region and using bounds on the objective function to discard inferior partition elements.
• To strike a balance between computational complexity and performance, we develop two efficient low-complexity suboptimal algorithms based on SCA and second-order cone programming (SOC).
• Our simulations show that the proposed suboptimal algorithms offer different trade-offs between complexity and performance and closely approach the performance of the optimal algorithm, despite their significantly lower complexity. Furthermore, the proposed algorithms achieve significant performance gains compared to three baseline schemes.
We note that this paper expands the corresponding conference version [1] in several directions. First, the formulated optimization problem targets joint local computing and edge offloading, while only edge offloading was considered in [1]. Second, we derive the optimal resource allocation policy for OFDMA-URLLC MEC systems, whereas only a suboptimal algorithm was provided in [1]. Third, we propose a second suboptimal algorithm to further reduce the complexity of the suboptimal scheme proposed in [1].
The remainder of this paper is organized as follows. In Section II, we present the considered system and channel models. In Section III, the proposed resource allocation problem is formulated. In Section IV, the optimal resource allocation algorithm is derived, whereas low-complexity suboptimal algorithms are provided in Section V. In Section VI, the performance of the proposed schemes is evaluated via computer simulations, and finally conclusions are drawn in Section VII.
January 19, 2022 DRAFT
Notation: Lower-case letters $x$ refer to scalar numbers, and bold lower-case letters $\mathbf{x}$ represent vectors. $(\cdot)^T$ denotes the transpose operator. $\mathbb{R}^{N \times 1}$ represents the set of all $N \times 1$ vectors with real-valued entries. The circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{CN}(\mu, \sigma^2)$, $\sim$ stands for "distributed as", and $\mathbb{E}\{\cdot\}$ denotes statistical expectation. $\nabla_{\mathbf{x}} f(\mathbf{x})$ denotes the gradient vector of function $f(\mathbf{x})$ and its elements are the partial derivatives of $f(\mathbf{x})$. For any two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^{N \times 1}_{+}$, $\mathbf{x} \leq \mathbf{y}$ means $x_i \leq y_i$, $\forall i$, where $x_i$ and $y_i$ are the $i$-th elements of $\mathbf{x}$ and $\mathbf{y}$, respectively. $x^*$ denotes the optimal value of an optimization variable $x$.

II. SYSTEM AND CHANNEL MODELS
In this section, we present the system and channel models for the considered OFDMA-URLLC MEC system.

A. System Model
We consider a single-cell multi-user MEC system which comprises a BS and $K$ URLLC users indexed by $k \in \{1, \dots, K\}$, cf. Fig. 1. Moreover, we assume a binary offloading scheme, where a task is executed as a whole either locally at the URLLC user or remotely at the MEC server. For task offloading, the user sends the task in the uplink and the edge server computes the task and sends the results back to the user in the downlink. There is an offset of $\tau$ time slots between downlink and uplink transmission. Thus, uplink and downlink transmission overlap in $\bar{O} = N^u - \tau$ time slots. The value of $\tau$ is a design parameter. On the one hand, if $\tau$ is chosen too small, the users' tasks may not yet have been computed when the downlink frame ends and hence the downlink resource is wasted. On the other hand, if $\tau$ is chosen too large, the computed bits at the BS have to wait before being transmitted to the users, which increases the end-to-end delay, see Fig. 1. The maximum transmit power of the BS is $P_{\max}$, while the maximum transmit power of each user in the uplink is $P_{k,\max}$.
In order to facilitate the presentation, in the following, we use superscript j ∈ {u, d} to denote uplink u and downlink d.
Remark 1. We note that the time and power consumed for channel estimation and resource allocation are constant and do not affect the proposed resource allocation algorithm. For simplicity of illustration, they are neglected in this paper. Furthermore, perfect channel state information (CSI) is assumed to be available at the BS for resource allocation design to obtain a performance upper bound for OFDMA-URLLC MEC systems.

B. Uplink and Downlink Channel Models
In the following, we introduce the uplink and downlink channel models for the considered OFDMA-URLLC MEC system. We assume that the channel gains of all sub-carriers are constant for all users during uplink and downlink transmission. In the uplink, the signal received at the BS from user $k$ on sub-carrier $m^u$ in time slot $n^u$ is given as follows:
$$y^u_k[m^u, n^u] = h^u_k[m^u]\, x^u_k[m^u, n^u] + z^u_{\mathrm{BS}}[m^u, n^u], \quad (1)$$
where $x^u_k[m^u, n^u]$ denotes the symbol transmitted by user $k$ on sub-carrier $m^u$ in time slot $n^u$ to the BS. Moreover, $z^u_{\mathrm{BS}}[m^u, n^u] \sim \mathcal{CN}(0, \sigma^2)$ denotes the noise on sub-carrier $m^u$ in time slot $n^u$ at the BS, and $h^u_k[m^u]$ represents the complex channel coefficient between user $k$ and the BS on sub-carrier $m^u$. For future reference, we define the signal-to-noise ratio (SNR) of user $k$'s signal at the input of the BS's receiver on sub-carrier $m^u$ in time slot $n^u$ as follows:
$$\gamma^u_k[m^u, n^u] = \frac{p^u_k[m^u, n^u]\, |h^u_k[m^u]|^2}{\sigma^2}, \quad (2)$$
where $p^u_k[m^u, n^u] = \mathbb{E}\{|x^u_k[m^u, n^u]|^2\}$ is the transmit power of user $k$ on sub-carrier $m^u$ in time slot $n^u$.
Shannon's capacity formula corresponds to the asymptotic case where the packet length approaches infinity and the decoding error probability goes to zero [24]. Thus, it cannot be used for resource allocation design for URLLC systems, as URLLC systems have to employ short packets to achieve low latency, which makes decoding errors unavoidable. For the performance evaluation of FBT, the so-called normal approximation for short packet transmission was developed in [25]. For parallel complex additive white Gaussian noise (AWGN) channels, the maximum number of bits $\Psi$ conveyed in a packet comprising $L_p$ symbols can be approximated as follows [25]:
$$\Psi = \sum_{l=1}^{L_p} \log_2(1 + \gamma[l]) - Q^{-1}(\epsilon) \sqrt{\sum_{l=1}^{L_p} V[l]}, \quad (3)$$
where $\epsilon$ is the decoding packet error probability, $Q^{-1}(\cdot)$ is the inverse of the Gaussian Q-function, $V[l] = a^2 \left(1 - (1 + \gamma[l])^{-2}\right)$ and $\gamma[l]$ are the channel dispersion [25] and the SNR of the $l$-th symbol, respectively, and $a = \log_2(e)$.
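The gap between the normal approximation and Shannon's capacity can be illustrated numerically. The following is a minimal sketch, assuming the standard form of the normal approximation, i.e., the sum of $\log_2(1+\gamma[l])$ minus $Q^{-1}(\epsilon)$ times the square root of the summed channel dispersion $a^2(1-(1+\gamma[l])^{-2})$. The function names and the example values (100 symbols at a linear SNR of 10) are illustrative assumptions and are not taken from the paper.

```python
import math
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse Gaussian Q-function, Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def fbt_bits(snrs, eps):
    """Normal approximation of the bits conveyed by len(snrs) symbols."""
    a = math.log2(math.e)
    shannon_term = sum(math.log2(1.0 + g) for g in snrs)
    dispersion = sum(a**2 * (1.0 - (1.0 + g) ** -2) for g in snrs)
    return shannon_term - q_inv(eps) * math.sqrt(dispersion)


# Hypothetical example: 100 symbols, linear SNR of 10, error probability 1e-6.
snrs = [10.0] * 100
bits_fbt = fbt_bits(snrs, 1e-6)
bits_shannon = sum(math.log2(1.0 + g) for g in snrs)
print(f"FBT: {bits_fbt:.1f} bits, Shannon: {bits_shannon:.1f} bits")
```

For short packets, the dispersion penalty is a sizeable fraction of the Shannon term, which is why a capacity-based design would overestimate the number of bits that can be delivered reliably.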
In this paper, we base the joint uplink-downlink resource allocation algorithm design for OFDMA-URLLC MEC systems on (3).By allocating several resource elements from the available resources to a given user, the number of offloaded and downloaded bits of the user can be adjusted.

III. PROBLEM FORMULATION
In this section, we explain the offloading and downloading process and introduce the QoS requirements of the OFDMA-URLLC MEC users.Moreover, we formulate the proposed resource allocation algorithm design as an optimization problem.

A. Computing Modes
In this section, we explain the different computing modes of the users.First, we explain the local computing at the users.Then, we explain the steps required for offloading to the edge server.

1) Local Computing Mode:
According to [27], [28, Eq. (1)], the power consumption of the central processing unit (CPU) comprises the dynamic power, short circuit power, and leakage power, where the dynamic power is much larger than the other two. As a result, similar to [28], we only consider the dynamic power for local execution. According to [27]-[29], the total energy required for computing a task of length $B_k$ bits at user $k$ is given by:
$$E_k = \kappa\, c_k B_k f_k^2, \quad (4)$$
where $f_k$ denotes the CPU frequency of the $k$-th user, $\kappa$ is the effective switched capacitance which depends on the chip architecture and is assumed to be identical for all users, and $c_k$ is the number of cycles required for processing of one bit, which depends on the type of application and the CPU architecture [29]. A user can reduce its total energy consumption by reducing the CPU frequency. However, the task computing latency also depends on the frequency and is given as follows:
$$t_k = \frac{c_k B_k}{f_k}. \quad (5)$$
Combining (4) and (5), the local power consumption at user $k$ is given as follows:
$$\frac{E_k}{t_k} = \kappa f_k^3. \quad (6)$$
A local user can adjust its CPU frequency to minimize its local power consumption subject to a required task computing latency. Alternatively, considering the limited capability of its CPU, a user may prefer to offload its task to the edge server instead. This process is explained in the following.
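The trade-off between CPU frequency, latency, and power in local computing can be sketched as follows. This is a minimal illustration, assuming the widely used dynamic-power CPU model in which the energy per cycle is $\kappa f_k^2$, so that the energy is $\kappa c_k B_k f_k^2$, the latency is $c_k B_k / f_k$, and their ratio is $\kappa f_k^3$. The function names and all numerical values are hypothetical.

```python
import math


def local_energy(kappa, c_k, b_k, f_k):
    """Energy in joules: (energy per cycle) x (number of cycles)."""
    return kappa * f_k**2 * c_k * b_k


def local_latency(c_k, b_k, f_k):
    """Time in seconds needed to process b_k bits at frequency f_k."""
    return c_k * b_k / f_k


def local_power(kappa, f_k):
    """Power in watts, i.e., energy divided by latency."""
    return kappa * f_k**3


def min_frequency(c_k, b_k, deadline_s):
    """Smallest CPU frequency that still meets the deadline."""
    return c_k * b_k / deadline_s


# Hypothetical values: 1000-bit task, 300 cycles/bit, 1 ms deadline.
kappa, c_k, b_k, deadline_s = 1e-28, 300.0, 1000.0, 1e-3
f_k = min_frequency(c_k, b_k, deadline_s)
energy = local_energy(kappa, c_k, b_k, f_k)
latency = local_latency(c_k, b_k, f_k)
power = local_power(kappa, f_k)
print(f_k, energy, latency, power)
```

Since the power grows with the cube of the frequency, the lowest frequency that meets the deadline is also the power-minimizing choice in this simplified model.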
2) Offloading and Downloading: The edge computing process is performed as follows. First, the user offloads its data to the edge server in the uplink. Subsequently, the edge server processes this data and sends the results back in the downlink transmission to the user. Thus, uplink and downlink transmission should satisfy the following constraints:
$$\text{C1: } \Psi^u_k(\mathbf{s}^u_k, \mathbf{p}^u_k) \geq (1 - \alpha_k) B_k, \ \forall k, \qquad \text{C2: } \Psi^d_k(\mathbf{s}^d_k, \mathbf{p}^d_k) \geq \Gamma_k (1 - \alpha_k) B_k, \ \forall k,$$
where $\Psi^j_k(\cdot)$, $j \in \{u, d\}$, denotes the number of bits conveyed for user $k$ in link direction $j$ according to (3). Here, $s^j_k[m^j, n^j] \in \{0, 1\}$, $\forall m^j, n^j, k$, $\forall j$, are the sub-carrier assignment indicators. If sub-carrier $m^j$ is assigned to user $k$ in time slot $n^j$, we have $s^j_k[m^j, n^j] = 1$, otherwise $s^j_k[m^j, n^j] = 0$. Furthermore, we assume that each sub-carrier is allocated to at most one user to avoid multiple access interference. $\mathbf{s}^j_k$ and $\mathbf{p}^j_k$ are the collections of optimization variables $s^j_k[m^j, n^j]$, $\forall m^j, n^j$, and $p^j_k[m^j, n^j]$, $\forall m^j, n^j$, $\forall j$, respectively. Constraints C1 and C2 guarantee the transmission of $(1 - \alpha_k) B_k$ bits in the uplink and $\Gamma_k (1 - \alpha_k) B_k$ bits in the downlink for user $k$, respectively, where parameter $\Gamma_k$, $\forall k$, specifies the ratio of the size of the computing result and the size of the offloaded task. The value of $\Gamma_k$ depends on the application type, e.g., $\Gamma_k > 1$ for augmented reality applications [30]. Moreover, $\alpha_k \in \{0, 1\}$ is the binary mode selection variable, where $\alpha_k = 1$ for local computing and $\alpha_k = 0$ for edge computing offloading.

B. Causality and Delay
In the following, we explain the causality and delay constraints in the considered OFDMA-URLLC MEC system.
1) Causality: Downlink transmission cannot start for a given user before all data of this user has been received at the BS via the uplink. Furthermore, this condition can be imposed by the following set of linear inequality constraints: As can be seen from (11), if user $k$ uses sub-carrier $m^u$ in time slot $n^u = \tau + o$, then the downlink resources at and before time slot $n^d = o$ will be forced to be zero, i.e., no data is sent to user $k$.
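The causality rule described above can be checked for a given assignment with a few lines of code. The following is a minimal sketch that only encodes the verbal rule (an uplink resource used in slot $\tau + o$ forbids downlink resources in slots up to and including $o$); it is not the linear constraint set of the paper. The function name and the nested-list layout of the assignment indicators are assumptions made for illustration.

```python
def satisfies_causality(s_u, s_d, tau):
    """Check the uplink-downlink causality rule for one user.

    s_u, s_d: lists over sub-carriers, each a list of 0/1 indicators per
    time slot (slot numbers start at 1, as in the text).
    """
    used_u = [n + 1 for row in s_u for n, s in enumerate(row) if s]
    used_d = [n + 1 for row in s_d for n, s in enumerate(row) if s]
    if not used_u or not used_d:
        return True  # nothing to violate
    last_uplink = max(used_u)      # corresponds to tau + o
    first_downlink = min(used_d)
    # Downlink slots 1, ..., o = last_uplink - tau must stay unused.
    return first_downlink > last_uplink - tau


tau = 2
s_u = [[1, 1, 1, 0]]                       # uplink slots 1 to 3 are used
ok = satisfies_causality(s_u, [[0, 1, 0, 0]], tau)
bad = satisfies_causality(s_u, [[1, 0, 0, 0]], tau)
print(ok, bad)
```

In this example the last uplink slot is $3 = \tau + 1$, so downlink slot 1 must remain empty while downlink slot 2 is allowed.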

2) Delay:
The delay of a computing task is limited by requiring the downlink transmission to be finished before $D_k - \tau$ time slots as follows: The total latency of a computing task is determined by $D_k$ and $\tau$. Note that the values of $D_k$ and $\tau$ are assumed to be known for resource allocation.

C. Total System Power Consumption
The total system power consumption includes the power consumption of the users and the BS. The power consumption of user $k$ is given as follows [14], [31], [32]: where the first term in (13) accounts for the local computation power consumption in case of local computing, the second term accounts for the power consumed for offloading transmission, and the third term accounts for the constant circuit power consumption during offloading. To model the inefficiency of the power amplifiers of the users, we introduce the multiplicative constant $\delta_k \geq 1$ for the power radiated by the transmitter in (13), which takes into account the joint effect of the drain efficiency and backoff of the power amplifier [33]. Note that, as can be seen from C1 and C2, when $\alpha_k = 1$, the required offloaded and downloaded data is zero, and hence, in this case, since we minimize the total power consumption, the power allocated for uplink transmission, $p^u_k[m^u, n^u]$, will be zero $\forall m^u$, $\forall n^u$. On the other hand, for offloading, i.e., $\alpha_k = 0$, the optimization problem formulated in the next subsection will ensure that the power consumption for local computing will be zero. Hence, there is no need to explicitly multiply the first and second term in (13) by $\alpha_k$ and $(1 - \alpha_k)$ to ensure that the terms are zero for offloading and local computing, respectively. Furthermore, due to the significant computational resources of the BS, we neglect the corresponding computation power consumption. Moreover, since in practice the BS does not only serve the MEC users considered for resource allocation but also non-MEC users, the BS circuit power consumption is also not considered for optimization.
Thus, the relevant weighted system power consumption is modelled as follows: where the second term in (14) represents the power consumption of the BS for downlink transmission and $\delta_{\mathrm{BS}} \geq 1$ accounts for the inefficiency of the BS power amplifier. Moreover, $w_k \geq 1$, $\forall k$, are weights that allow the prioritization of the users' power consumption compared to the BS's power consumption.
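The structure of the weighted power model can be sketched in code. This is only an illustration of the three user terms and the BS term described in the text, under the assumptions that the local computing power is $\kappa f_k^3$, that the radiated power is the sum of the allocated powers over the assigned resource elements, and that the circuit power is a constant incurred only when offloading. The name `p_circuit`, the function names, and all numbers are hypothetical.

```python
import math


def user_power(kappa, f_k, delta_k, s_u, p_u, alpha_k, p_circuit):
    """Local computing power + amplifier-scaled uplink power + circuit power."""
    radiated = sum(s * p for s_row, p_row in zip(s_u, p_u)
                   for s, p in zip(s_row, p_row))
    return kappa * f_k**3 + delta_k * radiated + (1 - alpha_k) * p_circuit


def weighted_system_power(weights, user_powers, delta_bs, s_d, p_d):
    """Weighted sum of user powers plus amplifier-scaled BS downlink power."""
    bs_radiated = sum(s * p
                      for s_user, p_user in zip(s_d, p_d)
                      for s_row, p_row in zip(s_user, p_user)
                      for s, p in zip(s_row, p_row))
    weighted_users = sum(w * pk for w, pk in zip(weights, user_powers))
    return weighted_users + delta_bs * bs_radiated


# User 0 offloads (alpha = 0, CPU idle); user 1 computes locally (alpha = 1).
p0 = user_power(1e-28, 0.0, 2.0, [[1, 0], [0, 1]], [[0.1, 0.0], [0.0, 0.2]], 0, 0.05)
p1 = user_power(1e-28, 3e8, 2.0, [[0, 0], [0, 0]], [[0.0, 0.0], [0.0, 0.0]], 1, 0.05)
s_d = [[[1, 1]], [[0, 0]]]          # downlink indicators per user
p_d = [[[0.5, 0.5]], [[0.0, 0.0]]]  # downlink powers per user
total = weighted_system_power([2.0, 2.0], [p0, p1], 2.5, s_d, p_d)
print(p0, p1, total)
```

Setting a weight larger than one makes a watt consumed at a battery-powered user more costly than a watt consumed at the BS.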

D. Optimization Problem Formulation
In the following, we formulate the resource allocation problem with the goal to minimize the total weighted network power consumption, while satisfying the latency requirements of the users' computing tasks. In particular, we optimize the uplink and downlink transmit powers, the uplink and downlink sub-carrier assignment, the CPU frequency of the local CPUs, and the mode selection of each user. To this end, the optimization problem is formulated as follows: s.t. C1-C4, C5: Here, $\mathbf{f}$, $\mathbf{s}^u$, $\mathbf{p}^u$, $\mathbf{s}^d$, $\mathbf{p}^d$, and $\boldsymbol{\alpha}$ are the collections of optimization variables $f_k$, $\forall k$, $\mathbf{s}^u_k$, $\forall k$, $\mathbf{p}^u_k$, $\forall k$, $\mathbf{s}^d_k$, $\forall k$, $\mathbf{p}^d_k$, $\forall k$, and $\alpha_k$, $\forall k$, respectively. In (15), constraints C1 and C2 guarantee the transmission of the required number of bits from user $k$ to the BS in the uplink and from the BS to user $k$ in the downlink, respectively, if the user offloads the task, i.e., $\alpha_k = 0$. Constraint C3 is the uplink-downlink causality constraint and constraint C4 ensures that user $k$ is served such that its task meets the associated delay requirements. Constraints C5 and C6 for the uplink and constraints C7 and C8 for the downlink are imposed to ensure that each sub-carrier in a given time slot is allocated to at most one user. Constraints C9 and C11 are the total transmit power constraints of user $k$ and the BS, respectively. Constraints C10 and C12 are the non-negative transmit power constraints.
Constraint C13 ensures that the maximum allowed delay for local computing is not exceeded when $\alpha_k = 1$. Optimization problem (15) is a mixed-integer non-convex optimization problem. Such problems are in general NP-hard and are known to be difficult to solve. However, in the next section, we propose an optimal scheme based on a branch-and-bound approach using monotonic optimization which finds the optimal solution of the considered problem. Moreover, in Section V, we propose two efficient suboptimal schemes that find close-to-optimal solutions and entail low computational complexity.

IV. PROPOSED GLOBAL OPTIMAL SOLUTION
In this section, we propose a branch-and-bound algorithm to solve problem (15) optimally. Different from the general branch-and-bound algorithms proposed for non-convex problems, e.g., [22], the proposed branch-and-bound algorithm exploits the monotonicity of the problem to reduce the search space for faster convergence [23]. The purpose of finding a globally optimal solution to (15) is twofold: (1) determining a performance upper bound for OFDMA-URLLC MEC systems, and (2) having a benchmark for the efficient suboptimal solutions presented in Section V. We first introduce some mathematical background on monotonic optimization theory. Then, we transform optimization problem (15) into the canonical form of discrete monotonic optimization. Finally, we present the optimal algorithm based on a new branch-and-bound algorithm which aims to minimize an upper bound on the objective function of (15) until convergence to the optimal solution.

A. Mathematical Preliminaries for Monotonic Optimization
In this subsection, we introduce some mathematical preliminaries for monotonic optimization [34]- [37].

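Definition 1 (Increasing function). A function $\psi: \mathbb{R}^{N \times 1}_{+} \rightarrow \mathbb{R}$ is increasing if $\psi(\mathbf{x}) \leq \psi(\mathbf{y})$ whenever $\mathbf{0} \leq \mathbf{x} \leq \mathbf{y}$.
For $\underline{\mathbf{x}} \leq \overline{\mathbf{x}}$, the hyperrectangle $[\underline{\mathbf{x}}, \overline{\mathbf{x}}] = \{\mathbf{x} \mid \underline{\mathbf{x}} \leq \mathbf{x} \leq \overline{\mathbf{x}}\}$ is referred to as a box with lower and upper corners $\underline{\mathbf{x}}$ and $\overline{\mathbf{x}}$, respectively.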
Definition 5. An optimization problem belongs to the class of discrete monotonic optimization problems if it can be represented in the following form [34], [35]: where $\Lambda(\mathbf{x})$ is an increasing function on $\mathbb{R}^{N \times 1}_{+}$ in $\mathbf{x}$ and $\mathcal{V}$ is a normal non-empty closed set, which is the intersection of normal set $\mathcal{G}$ and co-normal set $\mathcal{H}$.
The solution of monotonic optimization problem P1 lies on the boundary of the feasible set [35]. As shown in [34], [35], [37]-[41], the branch-and-bound algorithm can be used to iteratively approximate the boundary of the feasible set of P1 to find the globally optimal solution in a finite number of iterations. In the following, we transform optimization problem (15) into a monotonic optimization problem. Then, we propose an optimal algorithm based on the branch-and-bound technique.

B. Problem Transformation
In this subsection, we transform problem (15) into the canonical form of a monotonic optimization problem.
First, we introduce the following constraints in optimization problem (15): Based on (17) and (18), optimization problem (15) is transformed into the following equivalent form: Although optimization problem (19) is still non-convex, it is more tractable compared to equivalent problem (15), and as is shown in the following, it can be transformed into a monotonic optimization problem. To this end, we first study the monotonicity of problem (19) in the following two lemmas.
Lemma 1. Constraints C1 and C2 are differences of two monotonic and concave functions.
Proof. The proof closely follows a similar proof in [6], and is omitted here due to space limitations.
Therefore, based on Lemma 2, by defining positive auxiliary optimization variables $\zeta^j_k$, $\forall k, j$, we transform non-monotonic constraints C1 and C2 into equivalent monotonic constraints C1a, C1b, C2a, and C2b, where $V^u_k(P_{k,\max})$ is obtained by allocating all power available in the uplink, i.e., $P_{k,\max}$, to time slot $n^j$, sub-carrier $m^j$, and user $k$. $V^d_k(P_{\max})$ is defined in a similar way. Now, optimization problem (19) can be transformed into the following equivalent form: where $\boldsymbol{\zeta}$ is the collection of optimization variables $\zeta^j_k$, $\forall k, j$. In order to find an optimal solution for (23), we perform an exhaustive search over the binary variables in $\boldsymbol{\alpha}$. For a given $\alpha_k = \bar{\alpha}_k$, $\forall k$, optimization problem (23) reduces to the following optimization problem: The optimal solution of problem (23) can be obtained by solving problem (24) for all $2^K$ possible values of $\boldsymbol{\alpha}$. Then, we select the $\boldsymbol{\alpha} = \bar{\boldsymbol{\alpha}}$ which minimizes the objective function of (24). Problem (24) is in the canonical form of a discrete monotonic optimization problem. Moreover, to facilitate the design of an optimal algorithm for solving (24), we rewrite (24) in the following form: where $\Phi$ is the objective function in (24). Set $\mathcal{G}$ is defined by constraints C1b, C2b, and C3-C17, and conormal set $\mathcal{H}$ is defined by constraints C1a and C2a. The main difficulties in solving problem (25) are the reverse convex constraints C1b, C2b, and the non-convex binary constraints C6 and C8. Moreover, for given $\mathbf{s}^u$ and $\mathbf{s}^d$, problem (25) can be solved optimally in the remaining variables as we will explain in the following. Therefore, an efficient algorithm to find the optimal solution of (25)
Since the values of $\mathbf{s}^u$ and $\mathbf{s}^d$ are known, we can simply check the constraint in (26).
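The outer exhaustive search over the $2^K$ mode selection vectors can be sketched as follows. This is a minimal illustration of the enumeration step only: the inner solver `toy_solver`, which stands in for solving the problem with the modes fixed, is a hypothetical placeholder with made-up per-user costs and a made-up feasibility rule, not the method of the paper.

```python
from itertools import product


def best_mode_selection(num_users, solve_for_modes):
    """Try every binary mode vector and keep the cheapest feasible one.

    solve_for_modes(alpha) returns the optimal objective value for fixed
    modes alpha, or None if the problem is infeasible for these modes.
    """
    best_value, best_alpha = float("inf"), None
    for alpha in product((0, 1), repeat=num_users):  # 2**K candidates
        value = solve_for_modes(alpha)
        if value is not None and value < best_value:
            best_value, best_alpha = value, alpha
    return best_alpha, best_value


# Hypothetical stand-in for the inner problem: fixed per-user costs, and at
# most two users may offload (alpha_k = 0) at the same time.
LOCAL_COST = [5.0, 1.0, 4.0]
OFFLOAD_COST = [2.0, 3.0, 1.0]


def toy_solver(alpha):
    if sum(1 for a in alpha if a == 0) > 2:
        return None
    return sum(LOCAL_COST[k] if a == 1 else OFFLOAD_COST[k]
               for k, a in enumerate(alpha))


best_alpha, best_value = best_mode_selection(3, toy_solver)
print(best_alpha, best_value)
```

The number of inner problems doubles with every additional user, which is one reason why the optimal scheme serves as a benchmark rather than a real-time algorithm.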

C. Design of Optimal Algorithm
Optimization problem (25) is a discrete monotonic optimization problem which can be optimally solved via the branch-and-bound algorithm as explained in the following [23], [42]. To facilitate the presentation of the optimal solution, we collect the optimization variables in vector $\mathbf{u}$. The solution of (25) lies on the boundary of the feasible set, due to the monotonicity of the objective function and the constraints. However, the boundary of the feasible set is unknown. Thus, we approach the boundary by enclosing the feasible set $\mathcal{V} = \mathcal{G} \cap \mathcal{H}$ by an initial box $[\underline{\mathbf{u}}^{(0)}, \overline{\mathbf{u}}^{(0)}]$, where $\underline{\mathbf{u}}^{(0)}$ and $\overline{\mathbf{u}}^{(0)}$ are lower and upper bounds, respectively, for the collection of variables $\mathbf{u}$. We ensure $\underline{\mathbf{u}}^{(0)}$ and $\overline{\mathbf{u}}^{(0)}$ to be contained in $\mathcal{G} \setminus \mathcal{H}$ and $\mathcal{H}$, respectively. If this condition is not satisfied, either the problem is infeasible (when $\underline{\mathbf{u}}^{(0)}$ is not in set $\mathcal{G}$) or $\underline{\mathbf{u}}^{(0)}$ is an optimal solution of the problem (when $\underline{\mathbf{u}}^{(0)}$ is in $\mathcal{V}$). Iteratively, we split certain hyperrectangles, i.e., boxes, on the optimization variables $\mathbf{u}$ and try to improve a lower bound and an upper bound on the optimal value of the objective function. To aid this process, a local lower bound $L_{\mathcal{B}}$ is stored for each box $\mathcal{B} \in \mathcal{L}$, where $\mathcal{L}$ is the set of all available boxes.
Moreover, the current best value of the objective function obtained so far is denoted by C BV .An algorithmic description of the proposed branch-and-bound scheme is presented in Algorithm 1.In the following, we explain the algorithm in more detail.

1) Selection and Branching:
In each iteration $i$ of the optimal algorithm, i.e., in Line 3 of Algorithm 1, we start by selecting the box $\mathcal{B}^{(i)}$ that has the lowest lower bound from the set of available boxes $\mathcal{L}$ as follows: After selecting a box, we bisect the longest edge of $\mathcal{B}^{(i)}$. We first calculate the index of the longest edge; then, $\mathcal{B}^{(i)}$ is partitioned into two new boxes as follows [36]: where $\mathbf{e}_j \in \mathbb{R}^{L}$ is a vector whose $j$-th element is equal to one and the remaining elements are zero. The bisection rule in (29) guarantees that the branching process is exhaustive [23], [36], [43] and the algorithm converges to the optimal solution.
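The branching step can be illustrated with a short routine that bisects a box along its longest edge. This is a minimal sketch for continuous variables only; any special handling of the binary variables (e.g., rounding the new corners to integers) is not shown and would be an additional assumption. The function name and the list-based box representation are illustrative.

```python
def bisect_longest_edge(lower, upper):
    """Split the box [lower, upper] into two boxes along its longest edge."""
    # Index j of the longest edge (ties resolved by the smallest index).
    j = max(range(len(lower)), key=lambda i: upper[i] - lower[i])
    mid = 0.5 * (lower[j] + upper[j])
    upper_1 = list(upper)
    upper_1[j] = mid          # first box keeps the lower corner
    lower_2 = list(lower)
    lower_2[j] = mid          # second box keeps the upper corner
    return (list(lower), upper_1), (lower_2, list(upper))


box_1, box_2 = bisect_longest_edge([0.0, 0.0, 0.0], [1.0, 4.0, 2.0])
print(box_1, box_2)
```

Always cutting the longest edge makes the diameter of the boxes shrink towards zero along any nested sequence of boxes, which is the property referred to as exhaustiveness.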
2) Feasibility Check: After the two new boxes $\mathcal{B}^{(i)}_1$ and $\mathcal{B}^{(i)}_2$ are generated, we check the lower and upper corners of each box and verify whether these boxes are feasible or not, see Lines 4-20. To do so, we first calculate the local lower bounds. If a local lower bound does not exceed $C_{BV}$, we check the feasibility of the box and search for better feasible points. To do so, we first check the lower corners of each box by checking the feasibility of (26). If the lower corners are feasible, then these lower corners are added to the set of feasible solutions $\mathcal{S}$ and we update the current best value $C_{BV}$.
Otherwise, if this condition is not satisfied, we check if the box contains feasible solutions. The box is not feasible if $\underline{\mathbf{u}}^{(i)} \notin \mathcal{G}$ or $\overline{\mathbf{u}}^{(i)} \notin \mathcal{H}$. In this case, we remove the infeasible box in the next step of the algorithm, i.e., in the pruning step.
Remark 3. Although variables $\boldsymbol{\zeta}$ and $\mathbf{f}$ are convex variables, we branch over them. In fact, this facilitates the optimal algorithm design and reduces the total computation time needed for finding the optimal solution as it eliminates the use of convex software solvers which would contribute significantly to the overall computation time.
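3) Bounding and Pruning: The bounding and pruning steps are described in the following.
Bounding: The problem is to find upper and lower bounds for $\Phi(\mathbf{u})$ over the set $\mathcal{G} \cap \mathcal{H}$ for a given box $[\underline{\mathbf{u}}, \overline{\mathbf{u}}]$. Due to the monotonicity of $\Phi(\cdot)$, we can obtain the upper and lower bounds as $\Phi(\overline{\mathbf{u}})$ and $\Phi(\underline{\mathbf{u}})$, respectively.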
Pruning: In the pruning step, infeasible boxes are removed. In addition, we remove the boxes whose local lower bounds are greater than the current best global value, i.e., $L_{\mathcal{B},b} > C_{BV}$, $\forall b$, and the original branched box in iteration $i$, i.e., $\mathcal{B}^{(i)}$. This step is performed to reduce memory consumption and to achieve faster convergence.
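The interplay of selection, branching, feasibility check, bounding, and pruning can be illustrated on a small toy problem. The following is a minimal sketch, not Algorithm 1 itself: it minimizes an increasing function over a conormal set defined by a single increasing rate-like constraint, uses only continuous variables, and stops when the lowest remaining lower bound is within a tolerance of the best value found. The objective, the constraint, the tolerance, and all names are hypothetical.

```python
import heapq
import math


def phi(u):
    """Increasing toy objective (a weighted sum of two powers)."""
    return u[0] + 2.0 * u[1]


def in_H(u):
    """Conormal toy set: an increasing rate function must reach 2 bits."""
    return math.log2(1.0 + u[0]) + math.log2(1.0 + u[1]) >= 2.0


def branch_and_bound(lower0, upper0, tol=1e-2, max_iter=50000):
    if not in_H(upper0):
        return None, math.inf                 # no feasible point in the box
    best_u, best_val = list(upper0), phi(upper0)
    heap = [(phi(lower0), list(lower0), list(upper0))]
    for _ in range(max_iter):
        if not heap:
            break
        bound, lo, up = heapq.heappop(heap)   # box with lowest lower bound
        if bound >= best_val - tol:
            break                             # nothing left can improve by tol
        if in_H(lo):                          # feasible lower corner is the
            best_u, best_val = lo, bound      # best point of this box
            continue
        j = max(range(len(lo)), key=lambda i: up[i] - lo[i])
        mid = 0.5 * (lo[j] + up[j])
        children = ((lo, up[:j] + [mid] + up[j + 1:]),
                    (lo[:j] + [mid] + lo[j + 1:], up))
        for new_lo, new_up in children:
            if not in_H(new_up):
                continue                      # prune: box is infeasible
            if phi(new_up) < best_val:        # feasible upper corner gives a
                best_u, best_val = list(new_up), phi(new_up)  # new upper bound
            if phi(new_lo) < best_val - tol:  # keep only promising boxes
                heapq.heappush(heap, (phi(new_lo), list(new_lo), list(new_up)))
    return best_u, best_val


best_u, best_val = branch_and_bound([0.0, 0.0], [4.0, 4.0])
print(best_u, best_val)
```

For this toy problem the optimum can be computed in closed form as $2\sqrt{8} - 3$, so the returned value can be compared against it. Monotonicity is what allows a whole box to be discarded or resolved by inspecting only its two corners.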

D. Complexity Analysis
For a sufficiently large number of iterations $I_{\max}$, Algorithm 1 is guaranteed to find the optimal solution to optimization problem (15). Its convergence can be proved using the same arguments as those in [35], [36], [42]. However, the computational complexity of Algorithm 1 is exponential in the number of variables of the optimization problem. Thus, the complexity order of Algorithm 1 is $O(2^L)$. Due to its high complexity, the proposed optimal resource allocation algorithm cannot be used in real-time applications, especially for URLLC systems. However, it provides a valuable performance benchmark for low-complexity suboptimal algorithms. Thus, in the next section, we focus on developing low-complexity resource allocation algorithms based on SCA to strike a balance between computational complexity and performance.

V. SCA-BASED SUBOPTIMAL SOLUTIONS
In this section, we propose two low-complexity suboptimal algorithms based on SCA.

A. Proposed SCA-Based Suboptimal Scheme 1
In this sub-section, we propose a suboptimal algorithm that tackles the non-convexity of (15) in three main steps. First, we use the Big-M formulation to linearize the product terms $s_k^j[m_j, n_j]\, p_k^j[m_j, n_j]$, $\forall k, m_j, n_j$, $\forall j$. Then, we employ difference of convex (DC) programming and SCA methods to find a locally optimal solution of optimization problem (15).

1) Big-M Formulation: Let us first introduce the new optimization variables

Algorithm 1 Branch-and-bound algorithm
1: Initialize: …, $\mathcal{S}$ denotes a set of feasible solutions, and maximum iteration number $I_{\max}$.
2: for $i = 1 : I_{\max}$
3: Selection and branching: Select box … check the feasibility of lower corner $u$ … for each $\mathcal{B} \in \mathcal{L}$ do … end for
28: $i \leftarrow i + 1$
29: end for
30: Output: Optimal solution $u^*$.

Now, we decompose the product term in (30) using the Big-M formulation and impose the following additional constraints [44]:
In this manner, the non-convex product term $s_k^j[m_j, n_j]\, p_k^j[m_j, n_j]$, $\forall k, m_j, n_j$, $\forall j$, in (30) is transformed into a set of convex linear inequalities. Note that constraints C16-C23 do not change the feasible set. Now, optimization problem (15) is transformed into the following equivalent form:
where … Moreover, $p_k^j$, $\forall j$, are the collection of optimization variables $p_k[m_j, n_j]$, $\forall m_j, n_j$, and $p^j$ are the collection of optimization variables $p_k^j$, $\forall k$, where $j \in \{u, d\}$.
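To make the Big-M step concrete, the sketch below uses the textbook Big-M linearization of a product between a binary variable and a bounded continuous variable. It is an assumption that constraints C16-C23 take this standard form; the names `big_m_interval`, `p_tilde`, and `p_max` are hypothetical. For binary $s$ and $0 \leq p \leq P_{\max}$, the four linear inequalities $\tilde{p} \leq P_{\max} s$, $\tilde{p} \leq p$, $\tilde{p} \geq p - (1 - s) P_{\max}$, and $\tilde{p} \geq 0$ leave exactly one feasible value, $\tilde{p} = s\,p$.

```python
def big_m_interval(s, p, p_max):
    """Return the interval of p_tilde values allowed by the Big-M inequalities.

    Inequalities (standard form, assumed here):
        p_tilde <= p_max * s
        p_tilde <= p
        p_tilde >= p - (1 - s) * p_max
        p_tilde >= 0
    """
    lower = max(0.0, p - (1 - s) * p_max)
    upper = min(p_max * s, p)
    return lower, upper


if __name__ == "__main__":
    p_max = 1.0
    for s in (0, 1):
        for p in (0.0, 0.3, 1.0):
            lower, upper = big_m_interval(s, p, p_max)
            # The interval collapses to the single point s * p.
            print(s, p, lower, upper, s * p)
```

The check confirms why such constraints do not change the feasible set: for every binary assignment, the linear inequalities reproduce the product exactly.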
2) DC Programming: The two remaining difficulties for solving problem (36) are the binary variables in constraints C6, C8, and C14 and the structure of the achievable rate for FBT in C1 and C2. To tackle these issues, we employ a DC programming approach [6], [34], [45], [46]. To this end, the integer constraints in (36) are rewritten in the following DC function forms:
Now, constraints C6, C8, and C14 are equivalently formulated in continuous form, cf. C6a, C8a, and C14a.
However, constraints C6b, C8b, and C14b are still non-convex, i.e., reverse convex constraints. In order to handle them, we introduce the following lemma.
Lemma 3. For sufficiently large constant values $\eta_1$, $\eta_2$, and $\eta_3$, problem (36) is equivalent to the following problem:
Proof. Please refer to Appendix A.
Constants $\eta_1$, $\eta_2$, and $\eta_3$ act as penalty factors to penalize the objective function for any $s_k^j[m_j, n_j]$ that is not equal to 0 or 1. The remaining sources of non-convexity are the structure of the achievable rate for FBT and the non-convex objective function. In the following, we employ SCA to approximate problem (44) by a convex problem. Subsequently, we propose an iterative algorithm to find a low-complexity solution.
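The role of the penalty factors can be illustrated with the common DC representation of a binary constraint: $s \in \{0, 1\}$ is equivalent to $0 \leq s \leq 1$ together with $\sum s - \sum s^2 \leq 0$. The sketch below assumes this standard form; the function names `dc_gap` and `penalized_cost` are hypothetical and the exact penalty terms of problem (44) are not reproduced here.

```python
def dc_gap(s):
    """Difference of two convex functions: sum(s) - sum(s^2).

    On the relaxed domain 0 <= s_i <= 1 the gap is non-negative and equals
    zero only if every entry is exactly 0 or 1.
    """
    return sum(x - x * x for x in s)


def penalized_cost(power, s, eta):
    """Objective plus penalty: fractional assignments become more expensive."""
    return power + eta * dc_gap(s)


if __name__ == "__main__":
    print(dc_gap([0, 1, 1, 0]))              # binary assignment, no penalty
    print(dc_gap([0.5, 1]))                  # fractional entry is penalized
    print(penalized_cost(1.0, [0.5], 10.0))  # penalty scales with eta
```

For a sufficiently large penalty factor, any fractional assignment costs more than the best binary one, which is the intuition behind Lemma 3.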
3) SCA: In order to tackle the remaining non-convexity of (44), we employ the Taylor series approximation to approximate the non-convex parts of the objective function and constraints C1 and C2. Since $H^j(s^j)$, $\forall j$, $-V_k^j(p_k^j)$, $\forall j$, and $H^\alpha(\alpha)$ are differentiable convex functions, for any feasible points $s^{j(i)}$, $p_k^{j(i)}$, $\forall j$, and $\alpha^{(i)}$, where the superscript $i$ denotes the SCA iteration index, the following inequalities hold:
$$H^j(s^j) \geq \bar{H}^j(s^j, s^{j(i)}) = H^j(s^{j(i)}) + \nabla_{s^j} H^j(s^{j(i)})^T (s^j - s^{j(i)}), \quad \forall j,$$
and the corresponding first-order inequalities for $-V_k^j(p_k^j)$, $\forall j$, and $H^\alpha(\alpha)$, cf. (47) and (48).
The right-hand sides of (46), (47), and (48) are affine functions representing global underestimators of $H^j(s^j)$, $\forall j$, $-V_k^j(p_k^j)$, $\forall j$, and $H^\alpha(\alpha)$, respectively, where $\nabla_{s^j} H^j(s^{j(i)})$ and $\nabla_{p_k^j} V_k^j(p_k^{j(i)})$ are the gradients of $H^j(s^j)$ and $V_k^j(p_k^j)$, respectively. By substituting the right-hand sides of (46)-(48) into (44), we obtain the following optimization problem:
Optimization problem (49) is a convex optimization problem. To facilitate the application of CVX for solving problem (49), we reformulate the cubic function $f_k^3$ appearing in the cost function and transform it into two equivalent SOC constraints [47]. We first define new auxiliary variables $\zeta_k$, $\forall k$, to upper bound the cubic function as follows: $f_k^3 \leq \zeta_k$, $\forall k$. Then, as shown in [47], we can expand $f_k^3 \leq \zeta_k$, $\forall k$, to the following equivalent SOC constraints:
where $\theta_k$, $\forall k$, are new auxiliary variables. Optimization problem (49) is transformed into the following equivalent form:
where $\zeta$ and $\theta$ are the collections of optimization variables $\zeta_k$, $\forall k$, and $\theta_k$, $\forall k$, respectively.

Algorithm 2 Successive Convex Approximation
1: Initialize: Random initial points $s^{u(1)}$, $s^{d(1)}$, $p^{u(1)}$, $p^{d(1)}$, $\alpha^{(1)}$. Set iteration index $i = 1$, maximum number of iterations $I_{\max}$, and penalty factors $\eta_1 > 0$, $\eta_2 > 0$, and $\eta_3 > 0$.
2: Repeat
3: Solve convex problem (51) for given $s^{u(i)}$, $s^{d(i)}$, $p^{u(i)}$, $p^{d(i)}$, $\alpha^{(i)}$, and store the intermediate solutions …

Optimization problem (51) is convex because the objective function is convex and the constraints span a convex set, and it can be efficiently solved by standard convex optimization solvers such as CVX [47]. Algorithm 2 summarizes the main steps for solving (44) in an iterative manner, where the solution of (51) in iteration $(i)$ is used as the initial point for the next iteration $(i + 1)$. By iteratively solving (51), Algorithm 2 produces a sequence of improved feasible solutions, which, for sufficiently large $I_{\max}$, converges to a local optimum point of problem (44), or equivalently problem (15), in polynomial time [48], [49].
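The SCA principle used here, replacing the concave part of a DC objective by its first-order Taylor expansion and re-solving, can be seen on a one-dimensional toy problem. The sketch below is not problem (51): it assumes the hypothetical objective $x^4 - 2x^2$, written as the convex function $x^4$ minus the convex function $2x^2$. Linearizing $2x^2$ around the current point $x_i$ gives the convex surrogate $x^4 - 4 x_i x + 2 x_i^2$, whose minimizer is available in closed form as the real cube root of $x_i$, so no solver is needed.

```python
import math


def objective(x):
    """Toy DC objective: convex x^4 minus convex 2 x^2 (hypothetical example)."""
    return x ** 4 - 2.0 * x ** 2


def sca_minimize(x0, iters=50):
    """Successive convex approximation on the toy objective.

    Each iteration minimizes the convex surrogate x^4 - 4 * x_i * x + 2 * x_i^2,
    obtained by linearizing 2 x^2 at x_i. Setting the derivative to zero gives
    4 x^3 = 4 x_i, i.e., x = cbrt(x_i).
    """
    x = x0
    history = [objective(x)]
    for _ in range(iters):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)  # real cube root
        history.append(objective(x))
    return x, history


if __name__ == "__main__":
    x_final, history = sca_minimize(0.2)
    print(x_final, history[0], history[-1])
```

Since the surrogate upper-bounds the objective and is tight at the current point, the objective values are non-increasing across iterations, and the iterates approach the local minimizer $x = 1$ when started from a positive point.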

B. Proposed SCA-Based Suboptimal Scheme 2
For suboptimal scheme 1, we have adopted the Big-M method to linearize non-convex product terms.
However, this method introduced additional optimization variables and constraints, which negatively affect the complexity of Algorithm 2. In this subsection, we reduce the complexity of suboptimal scheme 1 (Algorithm 2). To do so, we first approximate the dispersion in the high SNR regime, cf. (53), which is accurate when the received SNR $\gamma[i]$ exceeds 5 dB, as is typically the case in cellular networks, especially when supporting URLLC [50]-[52]. On the other hand, in the low SNR regime, by substituting this approximation, we obtain a lower bound on the achievable rate. If the lower bound is used for optimization of the resource allocation in MEC systems, the feasibility of the obtained solution is guaranteed. Hence, exploiting this approximation, we rewrite the dispersion parts for the uplink and downlink in optimization problem (15), cf. (54).
Now, defining $p^j$ …, $\forall k, m_j, n_j$, $\forall j \in \{u, d\}$, as new optimization variables, and rewriting $\tilde{V}_k^j(s_k^j, p_k^j)$ in (54) as $\tilde{V}_k^j(s_k^j)$, optimization problem (15) can be transformed into problem (55), where $p_k^j$, $\forall j$, are the collection of optimization variables $p_k^j[m_j, n_j]$, $\forall k, m_j, n_j$, $\forall j$, and $p^j$, $\forall j$, denote the collection of optimization variables $p_k^j$, $\forall k$, $\forall j$.
Although $F_k^j(s_k^j, p_k^j)$ is a concave function, optimization problem (55) is not convex due to the non-convexity of constraints C1, C2, C6, C10, and C14. To deal with non-convex constraints C1 and C2, we define new optimization variables $z_k$, $\forall k$, and $q_k$, $\forall k$, and rewrite the constraints equivalently as C1a, C1b, C2a, and C2b. Constraints C1b and C2b are written in this form since, for the optimal solution, $s_k^j[m_j, n_j] = (s_k^j[m_j, n_j])^2$ holds. Constraints C1a, C1b, C2a, and C2b span a convex set since constraints C1b and C2b can be represented as SOCs. To deal with constraints C6, C8, and C14 and the cubic function present in optimization problem (55), we use similar techniques as in suboptimal scheme 1. As a consequence, optimization problem (55) is rewritten in the equivalent form (60), where $z$ and $q$ are the collections of optimization variables $z_k$, $\forall k$, and $q_k$, $\forall k$, respectively.
Optimization problem (60) is convex because the objective function is convex and the constraints span a convex set. Therefore, it can be efficiently solved by standard convex optimization solvers such as CVX [47]. Algorithm 3 summarizes the main steps for solving (55) in an iterative manner, where the solution of (60) in iteration $(i)$ is used as the initial point for the next iteration $(i + 1)$. The algorithm produces a sequence of improved feasible solutions until convergence to a local optimum point of problem (55). Unlike Algorithm 2, Algorithm 3 does not provide a local optimum solution to problem (15) because of the approximation of the dispersion term. Nevertheless, Algorithm 3 provides an upper bound on the total system power consumption, and the obtained solution is feasible for (15). Moreover, this upper bound becomes tight for sufficiently high SNR, where the approximation in (53) becomes tight, which is likely the case for URLLC applications.
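The accuracy of a high-SNR dispersion approximation can be checked numerically. The sketch below assumes the standard per-channel-use dispersion of the complex AWGN channel, $V(\gamma) = a^2 \left(1 - (1 + \gamma)^{-2}\right)$ with $a = \log_2 e$, and its high-SNR limit $V \approx a^2$; whether (53) has exactly this form is an assumption, and all function names are hypothetical.

```python
import math

A = math.log2(math.e)  # a = log2(e)


def db_to_linear(snr_db):
    """Convert an SNR in dB to linear scale."""
    return 10.0 ** (snr_db / 10.0)


def dispersion(snr_linear):
    """Standard AWGN channel dispersion per channel use (assumed form)."""
    return A ** 2 * (1.0 - (1.0 + snr_linear) ** -2)


def dispersion_high_snr():
    """High-SNR approximation: the SNR-dependent term is dropped."""
    return A ** 2


def relative_gap(snr_db):
    """Relative overestimate of the dispersion caused by the approximation."""
    exact = dispersion(db_to_linear(snr_db))
    return (dispersion_high_snr() - exact) / dispersion_high_snr()


if __name__ == "__main__":
    for snr_db in (0, 5, 10, 20):
        print(snr_db, relative_gap(snr_db))
```

Under this assumed form, the approximation always overestimates the dispersion, which is consistent with obtaining a lower bound on the achievable rate, and the relative gap at 5 dB is already below 6%.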

C. Complexity Analysis of Suboptimal Algorithms
In this sub-section, we study the complexity of the proposed low-complexity suboptimal schemes.
1) Suboptimal Algorithm 1: Optimization problem (51) is a non-linear convex problem which can be …
• Local computation (LC): In this scheme, only local computation is employed where each user aims to minimize its local computation power by optimizing its own CPU frequency subject to its delay constraint. The resulting optimization problem is convex and can be solved optimally using convex optimization tools such as CVX [47].
• Edge Only (EO): In this scheme, all URLLC users offload their data to the edge server.The resulting optimization problem is solved using the SCA based algorithm from the conference version [1].
• Fixed sub-carrier assignment (FSA): In this scheme, we fix the sub-carrier assignment for offloading and optimize the remaining degrees of freedom via SCA.We divide the total number of sub-carriers among the users such that their delay and causality constraints are met.This can be done by solving a mixed integer feasibility problem.

B. Simulation Results
In Figs. 2 and 3, we investigate the convergence of the proposed optimal algorithm (Algorithm 1) and the suboptimal algorithms (Algorithms 2 and 3) for different numbers of sub-carriers $M_u$, $M_d$, and different numbers of users $K$ for a given channel realization. We show the total sum power consumption as a function of the number of iterations. As can be observed from Fig. 2, the proposed optimal scheme converges to the global optimal solution after a finite number of iterations. In particular, the optimal scheme converges after 100000 and 170000 iterations for $M_T = 24$ and $M_T = 32$, respectively. For the proposed optimal scheme, the number of iterations required for convergence increases significantly with the number of sub-carriers since increasing the number of sub-carriers increases the dimensionality of the search space. On the other hand, the proposed suboptimal scheme 1 (Algorithm 2) attains a close-to-optimal performance for a much smaller number of iterations. We note that optimization problem (24) has to be solved $2^K$ times to find the global optimal solution, see Section IV.B. We show in Fig. 2 the solution for the best $\bar{\alpha}$.
In Fig. 2, we chose relatively small values for $M_j$, $\forall j$, $N_j$, $\forall j$, and $K$ since the complexity of the optimal algorithm increases rapidly with the dimensionality of the problem. In Fig. 3, we investigate the convergence behavior of the proposed suboptimal schemes for larger values of these parameters. As can be observed from Fig. 3, for all considered combinations of parameter values, the proposed suboptimal schemes require a small number of iterations to converge. In particular, the proposed suboptimal scheme 1 requires at most 4 iterations to converge while the proposed suboptimal scheme 2 requires only 2 iterations. The reason for the faster convergence of suboptimal scheme 2 is the convexity of the feasible set of the underlying optimization problem (60), while for suboptimal scheme 1, the feasible set of the corresponding optimization problem (51) is an approximated convex set, and thus, the algorithm requires more iterations to converge. On the other hand, suboptimal scheme 2 causes a higher power consumption compared to suboptimal scheme 1. The higher power consumption is caused by the approximation of the channel dispersion in (53) used for the derivation of suboptimal scheme 2, which yields an upper bound on the achievable power consumption. As expected, the convergence speeds of the proposed suboptimal schemes are less sensitive to the problem size and the number of users compared to that of the optimal scheme as they avoid the costly branching operation of branch-and-bound type algorithms.
In Figs. 4 and 5, we investigate the average system power consumption versus the task size of the URLLC users. As expected, increasing the required number of computed bits leads to higher power consumption. This is due to the fact that if more bits are to be transmitted or computed in a given frame, higher SNRs or higher CPU frequencies are needed, and thus, the BS and the users have to increase their powers.
In Fig. 4, we compare the performance of the proposed schemes with SC. SC provides a lower bound for the required power consumption of OFDMA-URLLC MEC systems. However, SC cannot guarantee the required latency and reliability. This is due to the fact that, in this scheme, the performance loss incurred by FBT is not taken into account for resource allocation design, and thus the obtained resource allocation policies may not meet the QoS constraints. As can be seen, the proposed suboptimal schemes attain a close-to-optimal performance. Thereby, suboptimal scheme 1 achieves a lower average system power consumption than suboptimal scheme 2 since the latter approximates the dispersion as in (53). On the other hand, as pointed out in Section V.C, suboptimal scheme 2 entails a low computational complexity. Hence, the proposed suboptimal schemes offer different trade-offs between performance and complexity.
In Fig. 4, we chose relatively small values for $K$, $M_u$, $M_d$, $N_u$, and $N_d$ since the complexity of optimal Algorithm 1 increases rapidly with the dimensionality of the problem, cf. Section IV.D. In Fig. 5, we investigate the performance of the proposed suboptimal schemes for larger values of these parameters. As can be seen, the proposed schemes lead to a substantially lower power consumption compared to the FSA, LC, and EO schemes. For the FSA scheme, the poor performance is due to the smaller number of degrees of freedom for resource allocation as this scheme uses a fixed sub-carrier allocation. For the LC scheme, the performance degradation is caused by the limited computation capability of the URLLC users' CPUs.
Moreover, for the LC scheme, the local computation is not feasible if the task size exceeds a given value. This is due to the restriction imposed by the maximum CPU frequency $f_{\max}$. The proposed schemes also attain large power savings compared to the EO scheme. This is due to the joint optimization of local and edge computing, while for the EO scheme only offloading is considered.
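The infeasibility of the LC scheme for large task sizes can be made explicit with a small sketch. It assumes a commonly used local computation model that is not spelled out in this section: a task of `bits` bits needing `cycles_per_bit` CPU cycles per bit must finish within `deadline_s` seconds, and the CPU power grows with the cube of the frequency with a hypothetical effective capacitance coefficient `kappa`. All names and numbers below are illustrative assumptions.

```python
def lc_min_power(bits, cycles_per_bit, deadline_s, f_max, kappa):
    """Minimum local computation power under a deadline (assumed model).

    Because the assumed power kappa * f^3 is increasing in the CPU frequency f,
    the power-minimal choice is the smallest frequency that still meets the
    deadline. Returns (frequency, power), or None if even f_max is too slow.
    """
    f_required = cycles_per_bit * bits / deadline_s  # cycles per second
    if f_required > f_max:
        return None  # task too large for local computation
    return f_required, kappa * f_required ** 3


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show both outcomes.
    print(lc_min_power(1000, 100, 1e-3, 2e8, 1e-27))  # feasible
    print(lc_min_power(3000, 100, 1e-3, 2e8, 1e-27))  # infeasible: needs 3e8 Hz
```

Under this assumed model, the largest task that can be computed locally is `f_max * deadline_s / cycles_per_bit` bits, which corresponds to the task size beyond which the LC scheme becomes infeasible.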
Moreover, as can also be seen from Fig. 5, for small task sizes, suboptimal scheme 1 causes a lower power consumption than suboptimal scheme 2. This is due to the fact that for small task sizes, the users and the BS transmit with low powers, leading to low SNRs. In this case, the approximation in (53), which is exploited for suboptimal scheme 2, is not accurate. On the other hand, large task sizes force the users and the BS to transmit with high power, resulting in high SNRs such that the approximation becomes accurate and both suboptimal schemes have a similar performance.
In Fig. 6, we study the impact of the outer cell radius on the average system power consumption for different resource allocation schemes. As can be observed, increasing the outer cell radius increases the average system power consumption. This is due to the fact that the path loss increases with the distance, and as a result, more power is needed to maintain the same SNR for larger distances. For small outer radii, the performance of the proposed scheme is close to that of the EO scheme, as in this case, the proposed scheme is likely to offload the tasks of the users to the edge server because of the low transmission power needed.
However, as the outer cell radius increases, the path loss increases, and thus the users are more likely to compute their tasks locally to reduce power consumption. In this case, the performance of the proposed scheme approaches that of the LC scheme. Fig. 6 also shows the impact of $\Gamma = \Gamma_k$, $\forall k$, on the system power consumption. As can be seen, the total system power consumption is higher for larger $\Gamma$. This is due to the fact that as $\Gamma$ increases, the size of the computation results to be transmitted in the downlink increases, and the BS has to allocate more power to satisfy the QoS constraint in the downlink.
In Fig. 7, we investigate the impact of the outer cell radius on the offloading probability for the proposed low-complexity scheme 1 and SC for different values of $c = c_k$, $\forall k$, and $\Gamma = \Gamma_k$, $\forall k$. As can be seen, increasing the outer cell radius reduces the probability of offloading. This is due to the fact that more power is needed to combat the path loss for larger distances, and thus, the users prefer to compute their tasks locally to reduce the total system power consumption. However, as the task complexity increases, i.e., for large numbers of required cycles $c$, the offloading probability increases. The reason for this behaviour is that as the number of cycles to process one bit increases, the CPU frequency must also increase to process the task within the required latency, and as a result, the local power consumption increases. Fig. 7 also reveals the impact of $\Gamma$ on the offloading probability. As can be seen, as $\Gamma$ increases, the offloading probability decreases. This is due to the fact that as $\Gamma$ increases, the size of the computed results in the downlink becomes larger, and the BS has to allocate more power to satisfy the QoS constraint in the downlink. In this case, the users are more likely to compute their tasks locally in order to limit the total system power consumption, which leads to a lower offloading probability.
In Fig. 8, we investigate the effect of different delay requirements and consider three delay scenarios. For delay scenario $S_0$, all users have the same delay requirements, i.e., $D_k = 6$, $\forall k$. For delay scenario $S_1$, we have $D_1 = D$, and $D_k = 6$, $\forall k = \{2, 3, 4\}$. For delay scenario $S_2$, we have $D_k = D$, $\forall k = \{1, 2, 3\}$, and $D_4 = 6$. In Fig. 8, we show the average system power consumption versus delay parameter $D$. As can be observed, the average system power consumption decreases with $D$, which is due to the fact that a larger $D$ increases the feasible set of problem (15) and increases the flexibility of resource allocation. Moreover, the proposed suboptimal scheme attains large power savings compared to the LC scheme, especially when the users have strict delay requirements. This is due to the limited computation capability of the users.
Next, we show that any $\eta_1 \geq \eta_{1,0}$, $\eta_2 \geq \eta_{2,0}$, and $\eta_3 \geq \eta_{3,0}$ are optimal solutions for dual problem (63), i.e., $\eta_1^*$, $\eta_2^*$, and $\eta_3^*$, where $\eta_{1,0}$, $\eta_{2,0}$, and $\eta_{3,0}$ are some sufficiently large numbers. To do so, we show that $\Theta(\eta_1, \eta_2, \eta_3)$ …
In summary, due to strong duality, we can use the dual problem (44) to find the solution of the primal problem (36), and any $\eta_1 \geq \eta_{1,0}$, $\eta_2 \geq \eta_{2,0}$, and $\eta_3 \geq \eta_{3,0}$ are optimal dual variables. These results are concisely given in Lemma 3, which concludes the proof.

All transceivers have single antennas. The system employs frequency division duplex (FDD)$^1$. Thereby, the total bandwidth $W$ is divided into two bands for uplink and downlink transmission having bandwidths $W_u$ and $W_d$, respectively. The bandwidths for uplink and downlink transmission are further divided into $M_u$ and $M_d$ orthogonal sub-carriers indexed by $m_u = \{1, \ldots, M_u\}$ and $m_d = \{1, \ldots, M_d\}$, respectively. The bandwidth of each sub-carrier is $B_s$, leading to a symbol duration of $T_s = 1/B_s$. The uplink and downlink frames are divided into $N_u$ time slots indexed by $n_u = \{1, \ldots, N_u\}$ and $N_d$ time slots indexed by $n_d = \{1, \ldots, N_d\}$, respectively. Moreover, each time slot contains one orthogonal frequency division multiplexing (OFDM) symbol. Each user has one computation task $(B_k, D_k)$ that needs to be processed, where $B_k$ is the task size in bits and $D_k$ is the time required for computation in time slots.

Figure 1. Multi-user MEC system comprising a single BS with an edge server and K URLLC users.

Fig. 1, uplink and downlink transmission overlap in time slot $n_u = \tau + o$ or equivalently $n_d = o$, $\forall o = \{1, \ldots, \bar{O}\}$. For the downlink, we need to ensure that for each user $k$, if overlapping time slot $n_u = \tau + o$ is allocated to the uplink, no overlapping time slot with $n_d \leq o$ is allocated to the downlink. Exploiting the binary nature of variables $s$ …
… can be constructed by dividing optimization variables $f$, $s^u$, $p^u$, $s^d$, $p^d$, and $\zeta$ into two sets. The first set contains the convex variables $f$ and $\zeta$ and the non-convex variables $p^u$ and $p^d$ as the so-called outer variables, while the second set contains the binary variables $s^u$ and $s^d$ as the so-called inner variables. Furthermore, once $p^u$ and $p^d$ have been determined, according to (17), (18), we can obtain the values of $s^u$ and $s^d$ by comparing the values of the entries of $p^u$ and $p^d$ with zero. If the value of $p_k[m_j, n_j]$ is greater than 0, this means that the corresponding $s_k[m_j, n_j] = 1$, otherwise $s_k[m_j, n_j] = 0$. Moreover, for given $f$, $p^u$, $p^d$, and $\zeta$, problem (25) turns into the following feasibility check problem:
$$\underset{s^u, s^d}{\text{minimize}} \quad 1 \qquad (26)$$
… $\forall b = \{1, 2\}$ for … Line 7. Subsequently, we compare the values of the local lower bounds $L_{\mathcal{B},b}^{(i)}$, $\forall b = \{1, 2\}$, with the best global value $C_{BV}$ obtained so far. If the local lower bound of one of the two new boxes is greater than $C_{BV}$, then this box can be removed. On the other hand, if the local lower bound is smaller than …

4: … $\arg\min_{\mathcal{B} \in \mathcal{L}} \Phi(u)$ and branch it into two new boxes …
Feasibility check of the two new boxes:
5: for $b = 1 : 2$
6: … box ($\mathcal{L} \leftarrow \mathcal{L} \setminus \mathcal{B}^{(i)}$) and remove infeasible boxes
27: …
… is the uplink transmit power of user $k$ on sub-carrier $m_u$ in time slot $n_u$, and $g_k^u[m_u] = $ …
… user $k$ on sub-carrier $m_d$ in time slot $n_d$ is denoted by $\gamma_k^d[m_d, n_d]$.
C14 is the mode selection constraint. Finally, constraint C15 limits the CPU frequency of the local CPUs to $f_{\max}$.
Remark 2. Resource allocation algorithm design for conventional MEC systems is typically based on Shannon's capacity formula, i.e., $V_k^u(s_k^u, p_k^u)$ and $V_k^d(s_k^d, p_k^d)$ in C1 and C2 are absent. The presence of $V_k^u(s_k^u, p_k^u)$ and $V_k^d(s_k^d, p_k^d)$ makes optimization problem (15) significantly more difficult to solve but is essential for capturing the characteristics of OFDMA-URLLC MEC systems. Problem ( …
… Bounding and Pruning: Update the set of boxes $\mathcal{L}$ for the next iteration of the algorithm
22: …
… sub-carriers in uplink and downlink, $M = M_u = M_d$: $M_T = 2M = 64$
Number of time slots in uplink and downlink, $N_u = N_d$: 4