Joint Computation and Communication Resource Allocation with NOMA and OMA Offloading for Multi-server Systems in F-RAN

Since mobile devices typically have limited computation resources, offloading computation tasks to fog access points (F-APs) is a promising approach to support delay-sensitive and computation-intensive applications. This paper considers joint computation and communication resource allocation for multiuser multi-server systems, which aims to maximize the number of users being served and minimize the total energy consumption subject to delay tolerance constraints. The joint computation and communication resource allocation problem is solved optimally for both non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) schemes. The joint user pairing and fog access point assignment problem for NOMA is proved to be NP-hard. For both NOMA and OMA, heuristic and optimal algorithms based on graph matching are designed. The optimal algorithms, though of high complexity, allow NOMA and OMA to be compared at their full potential and serve as benchmarks for evaluating the heuristic algorithms. Simulation results show that NOMA significantly outperforms OMA in terms of outage probability and energy consumption, especially for tight delay tolerance constraints and large computational tasks. Simulation results also demonstrate that our proposed NOMA and OMA schemes significantly outperform the swap-enabled matching algorithm widely used in the literature.


I. INTRODUCTION
Modern mobile applications, such as augmented reality, face recognition, assisted driving, and interactive gaming, are increasingly computation-intensive and latency-critical [1], [2]. As mobile devices have limited power and computation resources, their computation-intensive tasks need to be offloaded to powerful computing nodes for execution. However, offloading the computation tasks introduces high latency and energy consumption [3]-[5]. To meet the latency requirements of those applications with a limited power budget, the fog radio access network (F-RAN) is a promising network architecture. It consists of a cloud server equipped with high storage, computing, and signal processing capabilities [6]. The cloud is connected to densely deployed fog access points (F-APs) via wireless fronthaul links, allowing joint processing and cooperation among multiple F-APs. The F-APs are equipped with caching and computing capabilities to bring the network functions close to mobile users. Because the F-APs offer high computing capabilities at the network edge, mobile users can offload their computation tasks to nearby F-APs for fast execution and battery power saving [7], [8].
(The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan.)
Efficient utilization of the network resources, including both computation and communication, plays a vital role in offloading the computational tasks with low latency and energy consumption. The computation resources include the CPU computational resources at the users and the F-APs. The communication resources may include the transmission power, the channel bandwidth, and the assignment of F-APs to the users in multi-server systems. Non-orthogonal multiple access (NOMA) is one of the promising communication schemes for computation offloading as it increases the spectrum efficiency and system capacity. NOMA adopts superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver, which allows multiple users to transmit on the same time and frequency resource [9], [10]. In contrast, OMA shares the resources via time or frequency division multiple access (TDMA or FDMA).
Computation offloading is mainly divided into two categories: full offloading (a.k.a. binary offloading) and partial offloading. In this study, we focus on computationally intensive tasks that require a high number of CPU cycles for execution and whose data bits are dependent (i.e., no partitioning is allowed). Hence, the computational tasks need to be fully offloaded for execution. Computation offloading has been widely studied for single-server and multi-server systems. Problem formulations differ in their objectives: minimizing the energy consumption under a delay constraint, minimizing the delay with a given energy budget [11], [12], or minimizing both delay and energy [13]-[15]. The key issue is to strike a balance between energy consumption and computation delay.
For single-server systems, the work in [16], [17] studied orthogonal multiple access for computation offloading. The authors in [16] focused on minimizing both the delay and energy consumption for all mobile users. The study in [17] considered minimizing the energy consumption under a delay tolerance using TDMA and orthogonal frequency division multiple access (OFDMA). The authors in [18]-[22] considered joint computation and communication resource allocation using NOMA. The resource allocation of the subchannels, the transmission power, and the computational resource are solved separately in [18]. The studies in [19], [20] jointly optimized the resources, but the proposed NOMA user pairing and sub-channel assignment schemes have very high complexity. Besides, the OMA scheme discussed there assumed fixed bandwidth allocation. The main weakness of the studies in [16]-[22] is that only a single server is considered, which does not reflect the real deployment of mobile networks.
For multi-server systems supporting multiple users, the authors in [23], [24] investigated the use of OFDMA for computation offloading. The work in [25]-[28] investigated the problem of jointly optimizing the radio and computation resources using NOMA. The studies in [25], [26] investigated minimizing the energy consumption under a delay tolerance constraint. However, the authors in [25] assumed that user association to the F-APs has already been done before allocating the computation resources and transmission power. This assumption degrades the system performance, as jointly optimizing the association of the users to the F-APs with the computation resources and transmission power can further reduce the energy consumption. The work in [26] focused on forming collaborative clusters of F-APs to serve the users, which significantly increases the computational complexity of the problem. The authors in [27] aimed at maximizing the energy efficiency under a delay tolerance constraint. The work in [28] focused on minimizing the total delay under the power budget at the users and base stations. Both studies incorporated a swap-enabled matching algorithm to solve the NOMA resource allocation problem.
Unlike the studies on single-server systems, we consider multi-server systems and investigate the problem of user pairing and F-AP assignment. We tackle the problem of joint communication and computation resource allocation in a multi-user multi-server system. Our objective is to maximize the number of served users and, among such solutions, to minimize the total energy consumption of the users, subject to delay tolerance constraints. By exploiting the convexity of the sub-problems of the formulated problem, both optimal algorithms and low-complexity sub-optimal algorithms are proposed. Our numerical results demonstrate that our proposed heuristic outperforms the swap-enabled matching proposed in [22], [27], [28]. Our main contributions can be summarized as follows:
1) For NOMA offloading, the joint communication and computation resource allocation is investigated and solved optimally. For pairing the users and assigning them to the F-APs, a framework based on hypergraph theory is proposed. The problem is proved to be NP-hard, and a low-complexity heuristic algorithm is proposed. Simulation results show that its energy and outage performance is close to the optimal solution obtained by binary integer programming (BIP), which has high complexity. Besides, it outperforms the swap-enabled matching algorithm used in the literature.
2) For OMA offloading, a heuristic for F-AP assignment based on weighted bipartite graph matching is proposed. After associating users to F-APs, the computation resources are allocated optimally via one-dimensional search, and the bandwidth allocation problem at each F-AP is shown to be convex and can be efficiently solved. Simulation results show that our proposed method performs close to the optimal OMA solution obtained by BIP. Moreover, it outperforms the stable matching algorithm used in the literature.
3) The performance of NOMA and OMA is compared. Simulation results show that NOMA yields much lower outage probability and total energy consumption than OMA, especially when the task delay tolerance is short and the computation requirement is high. The same conclusion holds for both optimal solutions and both heuristics. For example, if the task delay tolerance is 470 ms, the NOMA heuristic reduces the total energy consumption of the OMA heuristic by 85.9%. If the required computation is $40 \times 10^7$ CPU cycles, the NOMA heuristic reduces the total energy consumption of the OMA heuristic by 78%. Simulation results also demonstrate that our proposed NOMA and OMA schemes significantly outperform the swap-enabled matching and stable matching algorithms widely used in the literature.
The rest of the paper is organized as follows. In Section II, we state our system model and present our uplink schemes. In Section III, we state our problem formulation and present our proposed solutions. In Section IV, we investigate the user pairing and F-AP assignment problem for NOMA and OMA uplink schemes. Section V provides our simulation model and results. Finally, we conclude the paper in Section VI.

II. SYSTEM MODEL
In this section, we first describe the network architecture and the channel model for an F-RAN. Then, we present two multiple access schemes and their achievable rates.

A. NETWORK ARCHITECTURE AND CHANNEL MODEL
Consider the uplink transmission in an F-RAN, which consists of a cloud server, $M$ F-APs each equipped with $L$ antennas and a computing server, and $N$ single-antenna users, as shown in Fig. 1. Denote the index sets of F-APs, antennas and users by $\mathcal{M} \triangleq \{1, 2, \ldots, M\}$, $\mathcal{L} \triangleq \{1, 2, \ldots, L\}$ and $\mathcal{N} \triangleq \{1, 2, \ldots, N\}$, respectively. We assume that adjacent F-APs operate on different frequency bands to avoid inter-cell interference. Each F-AP has channel bandwidth $W$ and a central processing unit (CPU) with computational capacity $F$ CPU cycles per second. Each user $n \in \mathcal{N}$ transmits with power $p_n$ and has a maximum transmit power $P_{\max}$. Each user has a computational task with parameters $(S_n, C_n, D_n)$, where $S_n$ is the computational task size in bits, $C_n$ is the number of CPU cycles required to complete the execution of the task, and $D_n$ is the task delay tolerance in seconds. Due to the limited computational capability and battery power of users, their computational tasks are offloaded to the F-APs for execution, where task splitting is not allowed. Since the F-APs have higher transmit power than the users and the size of a computational result is typically much smaller than the corresponding uploaded data, we ignore the download time. The execution time of a task on an F-AP is equal to $C_n/f_n$, where $f_n$ (cycles/s) is the CPU frequency allocated to user $n$. Thus, the sum of the offloading time and the execution time has to be no larger than $D_n$.
We consider a time-slotted system and adopt the block fading channel model. During a time slot, all channel gains stay constant and the users offload their computational tasks in parallel to their associated F-APs. The uplink channel is thus modeled as a time-invariant Gaussian channel with additive white Gaussian noise vector $\mathbf{z}_m \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_L)$ at each F-AP $m$, where $\sigma^2 = N_0 W$ and $N_0/2$ is the noise power spectral density. The channel gain vector between user $n$ and F-AP $m$ is given by $\mathbf{h}^m_n \in \mathbb{C}^{L \times 1}$ for $n \in \mathcal{N}$ and $m \in \mathcal{M}$. Let the $L \times N$ matrix $\mathbf{H}_m \triangleq [\mathbf{h}^m_n]$ be the channel gain matrix at each F-AP $m$. For convenience, important symbols are summarized in Table 1.

B. UPLINK MULTIPLE ACCESS SCHEMES
We consider two uplink transmission schemes, NOMA and OMA, for computational task offloading. Following the 3GPP standard [29], we allow at most two users to be superposed in NOMA, which can reduce error propagation and the decoding complexity of SIC. To offload the tasks from the $N$ users, we pair them up and assign each pair to an F-AP, where an F-AP serves at most one pair of users. For comparison purposes, we also assume for OMA offloading that two users are assigned to an F-AP.
1) NOMA
Assume users $n$ and $n'$ are paired and assigned to F-AP $m$. The received signal at F-AP $m$ is
$$\mathbf{y}_m = \mathbf{H}_m[n, n']\,\mathbf{x}_m + \mathbf{z}_m, \qquad (1)$$
where $\mathbf{H}_m[n, n']$ is the submatrix obtained from $\mathbf{H}_m$ by preserving the $n$-th and $n'$-th columns, with the $n$-th column placed first and the $n'$-th column second. In addition, $\mathbf{x}_m \triangleq (\sqrt{p_n}\,x_n, \sqrt{p_{n'}}\,x_{n'})$ represents the transmit signals of the two users. Note that we use the notation $(a_1, a_2, \ldots, a_l)$ to represent a column vector of length $l$.
To decode their signals successfully at F-AP $m$, the linear decorrelator receiver with successive interference cancellation (SIC) is used, where we assume $L \geq 2$ for successful decorrelator decoding. Consider the orthogonal complement of the one-dimensional space spanned by $\mathbf{h}^m_{n'}$. Let $\mathbf{Q}_m$ be an $(L - 1) \times L$ matrix whose rows form an orthonormal basis of this complement. In particular, when $L = 2$, $\mathbf{Q}_m$ degenerates to a unit-norm row vector orthogonal to $\mathbf{h}^m_{n'}$. The decorrelator projects $\mathbf{h}^m_n$ onto the complement by $\mathbf{Q}_m \mathbf{h}^m_n$ and then uses a matched filter for detection, yielding an SNR of $\|\mathbf{Q}_m \mathbf{h}^m_n\|^2 p_n / \sigma^2$. After detecting user $n$'s signal, the receiver applies SIC to subtract it from the received signal. Therefore, the signal of user $n'$ (i.e., the weak user) can be decoded without interference. The achievable data rates of users $n$ and $n'$ are, respectively,
$$R^{\text{NOMA}}_n = W \log_2\!\left(1 + \frac{\|\mathbf{Q}_m \mathbf{h}^m_n\|^2 p_n}{\sigma^2}\right), \qquad R^{\text{NOMA}}_{n'} = W \log_2\!\left(1 + \frac{\|\mathbf{h}^m_{n'}\|^2 p_{n'}}{\sigma^2}\right). \qquad (2)$$
2) OMA
Assume users $n$ and $n'$ are paired and assigned to F-AP $m$. The system bandwidth at F-AP $m$ is divided into two sub-channels with bandwidths $\alpha_n W$ and $\alpha_{n'} W$ for users $n$ and $n'$, respectively, such that $\alpha_n + \alpha_{n'} = 1$. Each user offloads its computational task on a distinct sub-channel. Thus, the achievable data rate of any user, say $n$, on F-AP $m$ is given by (3).
The objective of this work is to first maximize the number of served users and then minimize the total energy consumption, subject to the delay tolerance constraints. We divide the problem into two sub-problems: the resource allocation problem (i.e., power and computation) and the user pairing and F-AP assignment problem. In the next section, we consider the subproblem in which a pair of users has been assigned to an F-AP, and the objective is to minimize the transmission energy. Afterward, we will discuss how the $N$ users are paired and assigned to the $M$ F-APs.
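The rate expressions of this section can be checked numerically with a short sketch. It relies on the identity $\|\mathbf{Q}_m \mathbf{h}\|^2 = \|\mathbf{h}\|^2 - |\mathbf{h}'^{H}\mathbf{h}|^2 / \|\mathbf{h}'\|^2$ for the projection onto the orthogonal complement of $\mathbf{h}'$, so that $\mathbf{Q}_m$ never has to be built explicitly. The function names are hypothetical, and the OMA rate is written under the assumption (not confirmed by the text above) that the noise power on a sub-channel scales with its bandwidth share and that a matched filter is applied across the antennas.

```python
import math

def inner(a, b):
    """Hermitian inner product a^H b of two equal-length complex vectors."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))

def noma_rates(h_strong, h_weak, p_strong, p_weak, W, sigma2):
    """Rates (bit/s) of a NOMA pair under a decorrelator followed by SIC.

    The strong user is detected first after its channel is projected onto the
    orthogonal complement of the weak user's channel; the weak user is then
    decoded without interference.
    """
    g_strong = inner(h_strong, h_strong).real        # ||h_strong||^2
    g_weak = inner(h_weak, h_weak).real              # ||h_weak||^2
    # ||Q h_strong||^2 without forming Q explicitly
    g_proj = g_strong - abs(inner(h_weak, h_strong)) ** 2 / g_weak
    g_proj = max(g_proj, 0.0)                        # guard against rounding
    r_strong = W * math.log2(1.0 + g_proj * p_strong / sigma2)
    r_weak = W * math.log2(1.0 + g_weak * p_weak / sigma2)
    return r_strong, r_weak

def oma_rate(h, p, alpha, W, sigma2):
    """OMA rate on a sub-channel of bandwidth alpha * W.

    Assumption: noise power on the sub-channel is alpha * sigma2.
    """
    g = inner(h, h).real
    return alpha * W * math.log2(1.0 + g * p / (alpha * sigma2))
```

With orthogonal channels the projection costs nothing, whereas collinear channels drive the strong user's effective gain, and hence its rate, to zero.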

III. OPTIMAL PAIRWISE RESOURCE ALLOCATION
Given the user pairing and their assignment to the F-APs, the resource allocation problem for users $(n, n')$ on F-AP $m$ can be formulated as the energy minimization problem (4), where $\mathbf{p} \triangleq (p_n, p_{n'})$ and $\mathbf{f} \triangleq (f_n, f_{n'})$ are, respectively, the power and CPU frequency allocation vectors for paired users $n$ and $n'$, and ''Scheme'' refers to OMA or NOMA. In (4), C1 and C2 are the delay tolerance constraints of the two users and C3 limits the total CPU frequency allocated at the F-AP to $F$. When a result applies to both OMA and NOMA, we may skip the superscript for brevity. Note that $E_m(n, n')$ is the minimum transmit energy required when assigning users $n$ and $n'$ to F-AP $m$.
Lemma 1: There exists an optimal solution to (4), and at any optimal solution, C1, C2, and C3 must all hold with equality.
Proof: See Appendix VI-A.
The resource allocation problem in (4) can be reduced to a power allocation problem parameterized by a single variable $f_n$. To see this, suppose $f_n$ is a fixed value. Since C3 must hold with equality, we can set $f_{n'} = F - f_n$. In other words, C1 and C2 reduce to the rate constraints C1': $R_n \geq r_n$ and C2': $R_{n'} \geq r_{n'}$, where $r_n = S_n / (D_n - C_n/f_n)$ and $r_{n'} = S_{n'} / (D_{n'} - C_{n'}/f_{n'})$ are the minimum required data rates for task offloading. Given a value of $f_n$, we call the minimization problem in (4) with C1', C2', C4, and C5 the power allocation subproblem. It is clear that C1' and C2' must hold with equality at an optimal solution. If the power allocation subproblem can be solved, then the whole problem can be solved by determining the optimal value of $f_n$ within the interval $[C_n/D_n, F]$. For the NOMA case, the optimal value of $f_n$ can be obtained by convex optimization techniques, while for the OMA case, it can be obtained by one-dimensional grid search.

A. NOMA RESOURCE ALLOCATION
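For NOMA, the optimal solution to the power allocation subproblem can be found explicitly by letting the rate constraints hold with equality:
$$p^*_n = \frac{\sigma^2 \left(2^{r_n/W} - 1\right)}{\|\mathbf{Q}_m \mathbf{h}^m_n\|^2}, \qquad p^*_{n'} = \frac{\sigma^2 \left(2^{r_{n'}/W} - 1\right)}{\|\mathbf{h}^m_{n'}\|^2}. \qquad (5)$$
Based on (4) and (5), we can obtain the minimum sum energy as a function of $f_n$, denoted by $E^*_{\text{NOMA}}(f_n)$. Since the explicit form of $E^*_{\text{NOMA}}(f_n)$ is known and the function is convex, the optimal value of $f_n$ can be efficiently determined by standard convex optimization techniques.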
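As an illustration of this one-dimensional optimization, the following minimal sketch evaluates the NOMA sum energy for a given $f_n$ and minimizes it. Golden-section search is used here only as one simple option for a convex function of one variable; the text does not say which convex solver the authors used. The energy of each user is taken as its closed-form power multiplied by its offloading time $D - C/f$, the helper names and the numbers are illustrative assumptions, and the $P_{\max}$ check is left to the caller. The search interval is capped at $F - C_{n'}/D_{n'}$ so that the weak user also stays feasible.

```python
import math

def noma_energy(f_n, task_n, task_w, g_proj, g_weak, F, W, sigma2):
    """Sum transmit energy when the strong user gets f_n and the weak user F - f_n.

    task = (S, C, D): size in bits, CPU cycles, delay tolerance in seconds.
    g_proj = ||Q_m h_n||^2 (strong user), g_weak = ||h_n'||^2 (weak user).
    Returns math.inf if a deadline cannot be met.
    """
    energy = 0.0
    for (S, C, D), f, g in ((task_n, f_n, g_proj), (task_w, F - f_n, g_weak)):
        if f <= 0:
            return math.inf
        t_off = D - C / f                    # time left for offloading
        if t_off <= 0:
            return math.inf
        exponent = S / (t_off * W)           # minimum rate divided by W
        if exponent > 1000:                  # avoid float overflow
            return math.inf
        energy += sigma2 * (2.0 ** exponent - 1.0) / g * t_off
    return energy

def golden_section_min(func, lo, hi, rel_tol=1e-6):
    """Argmin of a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > rel_tol * (hi - lo):
        if fc < fd:                          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0

# Illustrative numbers only: two identical tasks and equal effective gains.
task = (1e6, 1e8, 0.5)
F, W, sigma2 = 1e9, 1e7, 1.0
f_lo, f_hi = task[1] / task[2], F - task[1] / task[2]
f_opt = golden_section_min(
    lambda f: noma_energy(f, task, task, 1.0, 1.0, F, W, sigma2), f_lo, f_hi)
```

For this symmetric instance the search settles near $F/2$, as expected when both users have the same task and the same effective gain.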

B. OMA RESOURCE ALLOCATION
First, assume $f_n$ is given and $f_{n'} = F - f_n$. For OMA, according to (3), the optimal power of any user, say $n$, is given by (6), where $\alpha_n$ can be optimized to minimize the total energy consumption. Assume users $n$ and $n'$ are paired and assigned to F-AP $m$. According to (6), for $i \in \{n, n'\}$, we define $p_i(\alpha)$ as the power of user $i$, expressed as a function of a bandwidth sharing factor $\alpha \in (0, 1]$. The channel bandwidth optimization subproblem (OMA-OPT) minimizes the total transmit energy of the two users over the bandwidth sharing factors; its formulation involves an arbitrarily small positive constant.
Theorem 3: The OMA-OPT problem is convex.
Proof: See Appendix VI-C.
To jointly optimize the computation and communication resources, we may use, for example, a one-dimensional grid search to find the optimal value of $f_n$ within the interval $[C_n/D_n, F]$. For each value of $f_n$, we obtain the offloading time budgets and the minimum required rates $r_n$ and $r_{n'}$ of the two users from the constraints C1 and C2 in problem (4). Afterward, we can solve the bandwidth allocation problem, OMA-OPT, to optimize the allocated bandwidth and power. By Lemma 1, the energy minimization problem for the OMA scheme is reduced to minimizing a single-variable function $E^*_{\text{OMA}}(f_n)$, in which $p^*_n$ and $p^*_{n'}$ are obtained for each value of $f_n$ by solving the convex subproblem OMA-OPT. Empirically, we found that $E^*_{\text{OMA}}(f_n)$ is unimodal in $f_n$, as shown in Fig. 2 for a particular instance. Thus, the golden section search method, which is an efficient search technique, can be used [30]. Our simulation results, to be presented in a later section, show that the golden section search has the same performance as the more time-consuming one-dimensional grid search.
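The nested procedure described here (an outer search over $f_n$ and an inner convex bandwidth allocation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the power function is taken as $p_i(\alpha) = \alpha \sigma^2 (2^{r_i/(\alpha W)} - 1)/\|\mathbf{h}_i\|^2$, which assumes that the noise power scales with the sub-channel bandwidth; the inner problem is solved by golden-section search in place of a convex solver; and `eps` stands in for the small positive constant of OMA-OPT. All names and numbers are hypothetical.

```python
import math

def golden_min(func, lo, hi, rel_tol=1e-6):
    """Golden-section search; returns (argmin, minimum) of a unimodal func."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > rel_tol * (hi - lo):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    x = (a + b) / 2.0
    return x, func(x)

def oma_power(alpha, r_min, g, W, sigma2):
    """Power needed to reach rate r_min on bandwidth alpha * W (assumed model)."""
    exponent = r_min / (alpha * W)
    if exponent > 1000:                      # avoid float overflow
        return math.inf
    return alpha * sigma2 * (2.0 ** exponent - 1.0) / g

def oma_energy(f_a, task_a, task_b, g_a, g_b, F, W, sigma2, eps=1e-6):
    """Minimum sum energy for a given CPU split, optimized over alpha."""
    budgets = []
    for (S, C, D), f in ((task_a, f_a), (task_b, F - f_a)):
        if f <= 0 or D - C / f <= 0:
            return math.inf                  # deadline cannot be met
        t_off = D - C / f
        budgets.append((t_off, S / t_off))   # (offloading time, minimum rate)
    (t_a, r_a), (t_b, r_b) = budgets

    def energy(alpha):
        return (oma_power(alpha, r_a, g_a, W, sigma2) * t_a
                + oma_power(1.0 - alpha, r_b, g_b, W, sigma2) * t_b)

    return golden_min(energy, eps, 1.0 - eps)[1]

# Illustrative numbers only: two identical tasks and equal channel gains.
task = (1e6, 1e8, 0.5)
F, W, sigma2 = 1e9, 1e7, 1.0
f_lo, f_hi = task[1] / task[2], F - task[1] / task[2]
f_opt, e_opt = golden_min(
    lambda f: oma_energy(f, task, task, 1.0, 1.0, F, W, sigma2), f_lo, f_hi)
```

The outer golden-section search is only justified when the energy is unimodal in $f_n$, which the text reports as an empirical observation rather than a proven property; a grid search is the safer fallback.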

IV. USER PAIRING AND ASSIGNMENT PROBLEM
In this section, we consider the user pairing and assignment optimization problem for both NOMA and OMA. For the NOMA scheme, it is related to the well-known problem called weighted 3-uniform hypergraph matching. While both NOMA and OMA can be solved optimally by binary integer programming (BIP), we propose heuristic algorithms to reduce the computation time. Throughout this paper, we assume $M$ is proportional to $N$.

A. NOMA MATCHING
First, we formulate the user pairing and assignment problem for the NOMA scheme as a hypergraph matching problem. The following concepts from graph theory are needed [31]. A weighted hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ is a set of vertices $\mathcal{V}$ and a collection $\mathcal{E}$ of subsets of the vertex set called hyperedges, with a weight $w_e$ assigned to each $e \in \mathcal{E}$. A hypergraph is called $k$-uniform if each hyperedge has exactly $k$ vertices. The degree of a vertex is the number of hyperedges to which it belongs. A matching on $\mathcal{H}$ is a subset of $\mathcal{E}$ in which every two hyperedges are disjoint.
We construct a hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ such that the vertex set $\mathcal{V} = \mathcal{N} \cup \mathcal{M}$ represents the union of the set of users and the set of F-APs. The subset $\{n, n', m\} \subseteq \mathcal{V}$, where $n, n' \in \mathcal{N}$ (with $n \neq n'$) and $m \in \mathcal{M}$, forms a hyperedge if users $n$ and $n'$ can be paired and assigned to F-AP $m$ and their computational tasks can be offloaded and executed within the delay tolerance. The associated energy is given by $E_m(n, n')$. Let $E_{\max}$ be the maximum energy among all hyperedges. A non-negative weight $w_{\{n,n',m\}}$ is assigned to each hyperedge $\{n, n', m\} \in \mathcal{E}$.

The decision version of the problem is formally stated as follows:
Problem: NOMA-Matching
Instance: A 3-uniform hypergraph $\mathcal{H}(\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \mathcal{N} \cup \mathcal{M}$ and $\mathcal{E}$ consists of $\{n, n', m\}$ if it is feasible to pair up users $n$ and $n'$ and assign them to F-AP $m$ under the NOMA scheme with $L$ antennas at each F-AP.
Question: Given an integer $k$, is there a matching with size greater than or equal to $k$?
We are going to show that it is NP-complete by reduction from the classical NP-complete problem called 3-dimensional matching, stated below:
Problem: 3D-Matching
Instance: Three finite disjoint sets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are given. In addition, $\mathcal{T}$, a subset of $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$, is also given.
Question: $\mathcal{I} \subseteq \mathcal{T}$ is called a 3-dimensional matching if for any two distinct triples $(x_1, y_1, z_1), (x_2, y_2, z_2) \in \mathcal{I}$, we have $x_1 \neq x_2$, $y_1 \neq y_2$, and $z_1 \neq z_2$. Given an integer $k$, is there a 3-dimensional matching with size greater than or equal to $k$?
Theorem 4: NOMA-Matching is NP-complete.
Proof: Given an instance of 3D-Matching, let $\mathcal{N} = \mathcal{X} \cup \mathcal{Y}$ and $\mathcal{M} = \mathcal{Z}$. Let $F$ be very large so that the computation delay can be neglected. Furthermore, let all users have the same value for $S_n/D_n$ so that they all have the same minimum rate requirement, i.e., $r_n = r$ for all $n \in \mathcal{N}$. The sets $\mathcal{X}$ and $\mathcal{Y}$ are interpreted as the set of weak users and the set of strong users, respectively. Let $L = D + 2$, where $D \triangleq |\mathcal{X} \times \mathcal{M}|$. Furthermore, let $\{\mathbf{h}^m_n : n \in \mathcal{X}, m \in \mathcal{M}\}$ be the natural basis for the subspace of the first $D$ dimensions, and let the $(D+1)$-th component of each of them be 1. Divide each of them by $\sqrt{2}$ so that they all become unit-norm vectors. Following the second equation in (5), we let $P_{\max} = \sigma^2 (2^{r/W} - 1)$ so that a weak user has just enough power to communicate with any F-AP and any two users in $\mathcal{X}$ cannot be paired up, since their channel gain vectors are non-orthogonal. For each $n \in \mathcal{Y}$ and $m \in \mathcal{M}$, let $\|\mathbf{h}^m_n\| = 1 + \delta$, where $\delta$ is an arbitrarily small positive constant. Note that only the norm of $\mathbf{h}^m_n$ is fixed while the exact vector is to be determined. Let the $(D+2)$-th component of each of them be non-zero so that any two users in $\mathcal{Y}$ cannot be paired up.
For each $n \in \mathcal{Y}$ and $m \in \mathcal{Z}$, we check whether $(n', n, m) \in \mathcal{T}$ for each $n' \in \mathcal{X}$. If so, we let $\mathbf{h}^m_n$ be any vector orthogonal to $\mathbf{h}^m_{n'}$, so that there is no mutual interference between them; the decorrelator is then not needed, and $P_{\max}$ is large enough to assign the pair $(n', n)$ to F-AP $m$. Otherwise, we let $\mathbf{h}^m_n$ be any vector non-orthogonal to $\mathbf{h}^m_{n'}$, so that the decorrelator reduces the effective SNR of user $n$, making it infeasible to pair up users $n$ and $n'$ since $\delta$ is arbitrarily small. With such an assignment, it is clear that users $n$ and $n'$ can be paired up and assigned to F-AP $m$ if and only if $(n', n, m) \in \mathcal{T}$. It is also clear that the answer to the given instance of 3D-Matching is Yes if and only if the answer to the constructed instance of NOMA-Matching is Yes. Hence, NOMA-Matching is NP-complete.
Remark 1: Since the decision version of our problem is NP-complete, the corresponding optimization version, which aims to maximize the number of served users, is NP-hard.
In general, there are many solutions that can maximize the number of served users. Among those solutions, we would like to find the one that minimizes the total transmit energy. We now show that this problem can be reduced to the maximum weighted hypergraph matching problem, which aims to find a matching in a given weighted hypergraph that maximizes the sum of weights. To perform the reduction, we add a large constant $Q$ to the weights of all hyperedges. The new weight becomes $w'_{\{n,n',m\}} = w_{\{n,n',m\}} + Q$ for each $\{n, n', m\} \in \mathcal{E}$. Since all hyperedges have large weights, the maximum weighted matching must be a matching of maximum cardinality.
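The text does not state how large the constant must be. One sufficient choice can be derived as follows, under the assumptions (introduced here for illustration) that every original weight lies in $[0, w_{\max}]$ and that no matching contains more than $K$ hyperedges, for instance $K = \min(M, \lfloor N/2 \rfloor)$.

```latex
% Any matching \mathcal{S} with |\mathcal{S}| = k satisfies
%   kQ \le w'(\mathcal{S}) = w(\mathcal{S}) + kQ \le k (Q + w_{\max}).
% Compare a matching \mathcal{S}_1 of size k+1 with a matching \mathcal{S}_2
% of size j \le k, where k \le K - 1:
\begin{align}
  w'(\mathcal{S}_1) &\ge (k+1)\,Q, \\
  w'(\mathcal{S}_2) &\le j\,(Q + w_{\max}) \le k\,Q + k\,w_{\max}, \\
  w'(\mathcal{S}_1) - w'(\mathcal{S}_2) &\ge Q - k\,w_{\max} > 0
  \quad \text{whenever } Q > K\,w_{\max}.
\end{align}
% Hence a larger matching always has a larger shifted weight. Among matchings
% of equal size k, w' and w differ by the constant kQ, so the ordering by the
% original weights is preserved.
```

Under these assumptions, any $Q > K\,w_{\max}$ makes the maximum weighted matching a maximum-cardinality matching that, among all such matchings, has the largest original weight.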
The maximum weighted hypergraph matching problem can be formulated as a BIP problem [32] as follows:
$$\max_{\{x_e : e \in \mathcal{E}\}} \sum_{e \in \mathcal{E}} w_e x_e \qquad (8)$$
subject to
$$\sum_{e \in \delta(v)} x_e \leq 1 \quad \forall v \in \mathcal{V}, \qquad x_e \in \{0, 1\} \quad \forall e \in \mathcal{E},$$
where the objective function in (8) represents the sum of the weights of the chosen hyperedges in the matching, $x_e$ is a binary decision variable indicating whether the hyperedge $e$ belongs to the matching, and $\delta(v) \subseteq \mathcal{E}$ is the set of hyperedges containing vertex $v$. The problem can be solved optimally via the branch and bound (BnB) method [32], but the computational complexity grows exponentially with the problem size.
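To make the BIP concrete, the following minimal branch-and-bound sketch solves small instances exactly. It is an illustrative stand-in, not the solver used in the paper: it branches on whether each hyperedge is taken and prunes with the sum of the remaining weights as an upper bound. The toy energies, the weight form $E_{\max} - E$, and the value of the constant `Q` are assumptions chosen to show why the weight shift matters.

```python
def max_weight_matching(hyperedges):
    """Exact maximum weighted hypergraph matching by branch and bound.

    hyperedges: list of (frozenset of vertices, non-negative weight).
    Returns (total weight, indices of the chosen hyperedges).
    """
    # suffix[i] = total weight of hyperedges i, i+1, ...; used as upper bound
    suffix = [0.0] * (len(hyperedges) + 1)
    for i in range(len(hyperedges) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + hyperedges[i][1]
    best = [0.0, []]

    def search(i, used, total, chosen):
        if total > best[0]:
            best[0], best[1] = total, list(chosen)
        if i == len(hyperedges) or total + suffix[i] <= best[0]:
            return                            # leaf reached or branch pruned
        verts, w = hyperedges[i]
        if not (verts & used):                # x_e = 1 only if e is disjoint
            chosen.append(i)
            search(i + 1, used | verts, total + w, chosen)
            chosen.pop()
        search(i + 1, used, total, chosen)    # x_e = 0

    search(0, frozenset(), 0.0, [])
    return best[0], best[1]

# Toy instance: four users, two F-APs, three feasible hyperedges (energies).
energies = {("u1", "u2", "a1"): 1.0,
            ("u3", "u4", "a2"): 5.0,
            ("u1", "u3", "a1"): 0.5}
e_max = max(energies.values())
Q = 100.0                                     # assumed large constant
plain = [(frozenset(e), e_max - en) for e, en in energies.items()]
shifted = [(verts, w + Q) for verts, w in plain]

plain_result = max_weight_matching(plain)
shifted_result = max_weight_matching(shifted)
```

On this instance the unshifted weights select only the third hyperedge and serve two users, whereas the shifted weights select the first two hyperedges and serve all four.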
To solve the problem with low computational complexity, we propose a heuristic algorithm to find a suboptimal solution. It works as follows. First, we assign to each F-AP the user with the best channel gain to it and remove the user from the set of users. The set $\mathcal{A}$ denotes the set of paired F-APs and users, while the set $\mathcal{N}'$ denotes the set of users that have not been assigned to any F-AP. Next, we form a weighted bipartite graph $\mathcal{G}'(\mathcal{M}, \mathcal{N}', \mathcal{E}')$. There exists an edge $(m, n') \in \mathcal{E}'$ between F-AP $m$ and user $n'$ if user $n'$ can be paired with the user $n$ previously assigned to F-AP $m$ and both can offload and execute their computational tasks within the delay tolerance. The weight of each edge is defined similarly to the weight of each hyperedge. Afterward, we find the maximum weighted matching $\mathcal{W}$ for the bipartite graph. Finally, the solution to the optimization version of NOMA-Matching is given by the set $\mathcal{U}$, obtained by combining the pairs in $\mathcal{A}$ with the matched edges in $\mathcal{W}$. The pseudo-code is described in Algorithm 1. The for-loop in Step 2 takes $O(MN)$ time. Constructing the weighted bipartite graph in Step 7 takes $O(MN)$ time. The maximum weighted matching in Step 8 can be optimally solved by the Hungarian algorithm [33] in $O(N^3)$ time. Step 9 takes $O(M)$ time. The overall time complexity of Algorithm 1 is $O(N^3)$.
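The steps of the heuristic can be sketched as below. Several parts are assumptions made for illustration: `pair_energy` is a hypothetical oracle returning the pairwise energy $E_m(n, n')$ or `None` when the pair is infeasible, the edge weight is taken as $E_{\max} - E + Q$, the F-APs pick their first user sequentially, and a small exhaustive search stands in for the Hungarian algorithm, so the sketch only suits small instances.

```python
def max_weight_bipartite_matching(left, right, weight):
    """Exhaustive maximum weighted matching (stand-in for the Hungarian method).

    weight: dict mapping (left vertex, right vertex) -> positive weight.
    Returns a dict left vertex -> matched right vertex.
    """
    best = [0.0, {}]

    def search(i, used, total, chosen):
        if total > best[0]:
            best[0], best[1] = total, dict(chosen)
        if i == len(left):
            return
        m = left[i]
        for n in right:
            if n not in used and (m, n) in weight:
                chosen[m] = n
                search(i + 1, used | {n}, total + weight[(m, n)], chosen)
                del chosen[m]
        search(i + 1, used, total, chosen)    # leave m without a second user

    search(0, frozenset(), 0.0, {})
    return best[1]

def noma_heuristic(faps, users, gain, pair_energy, big_q):
    """Returns a list of triples (F-AP, first user, second user)."""
    remaining = list(users)
    first = {}                                # set A: F-AP -> first user
    for m in faps:
        if not remaining:
            break
        n_best = max(remaining, key=lambda n: gain[m][n])
        first[m] = n_best
        remaining.remove(n_best)
    energy = {}                               # feasible edges of the graph
    for m, n in first.items():
        for n2 in remaining:
            e = pair_energy(m, n, n2)
            if e is not None:
                energy[(m, n2)] = e
    if not energy:
        return []
    e_max = max(energy.values())
    weight = {edge: e_max - e + big_q for edge, e in energy.items()}
    matching = max_weight_bipartite_matching(list(first), remaining, weight)
    return [(m, first[m], n2) for m, n2 in matching.items()]

# Toy instance with hypothetical gains and pairwise energies.
gain = {"a": {1: 5.0, 2: 1.0, 3: 1.0, 4: 1.0},
        "b": {1: 4.0, 2: 3.0, 3: 1.0, 4: 1.0}}
table = {("a", 1, 3): 2.0, ("a", 1, 4): 1.0, ("b", 2, 4): 1.5}
result = noma_heuristic(["a", "b"], [1, 2, 3, 4], gain,
                        lambda m, n, n2: table.get((m, n, n2)), big_q=100.0)
```

In this example F-AP `a` first takes user 1 and F-AP `b` takes user 2; the matching then adds users 3 and 4 so that all four users are served, even though pairing user 4 with F-AP `a` alone would cost less energy.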

B. OMA MATCHING
For the OMA scheme, the problem can also be formulated as maximum weighted hypergraph matching and solved by BIP. The weight of each hyperedge $(m, n, n')$, if feasible, can be obtained by allocating the computation resources via one-dimensional grid search or golden section search and optimizing the bandwidth allocation via OMA-OPT. Due to its high complexity, we propose a heuristic as follows. First, we set $f_n = f_{n'} = F/2$ and $\alpha_n = \alpha_{n'} = 0.5$ for all possible pairs of users $(n, n')$ associated with any F-AP, which is the same as dividing the computation resources and the channel bandwidth into two equal halves at each F-AP. Then, we create a set $\mathcal{F}$ of cardinality $M$, which represents a set of virtual F-APs replicating the original F-APs. A weighted bipartite graph $\mathcal{G}(\mathcal{N}, \mathcal{M}', \mathcal{E})$ is then constructed, where $\mathcal{N}$ represents the set of users and $\mathcal{M}' \triangleq \mathcal{M} \cup \mathcal{F}$ represents the set of the original F-APs and the replicated F-APs. The edge set $\mathcal{E}$ contains the pairs $(n, m)$ for which user $n$ can offload and execute its task on F-AP $m$ within the delay tolerance under this equal split. Each edge $(n, m)$ has an associated transmit energy, and $E_{\max}$ denotes the maximum energy among all edges. A non-negative weight is assigned to each edge $(n, m) \in \mathcal{E}$. To find the association of users to the F-APs such that the number of served users is maximized and the total energy consumption is minimized, the weight of each edge is re-defined by adding a large constant to it, as done for the hyperedge weights in the NOMA case. The optimal association under the bipartite graph can be obtained via maximum weighted matching in $O(N^3)$ time. After we obtain the association of the users to the F-APs, for each association, we allocate the computation resources via one-dimensional grid search or golden section search and solve the OMA-OPT problem by applying the CVX solver to optimize the allocated bandwidth between the associated users.
Remark 2: Our proposed hypergraph framework can be generalized to group any number of users by applying $k$-dimensional matching [34], where $k - 1$ is the number of users in a group. It can also be applied to assign more than one group of users to an F-AP by dividing the bandwidth of each F-AP into multiple sub-carriers and considering each sub-carrier a virtual F-AP. The computation resources, in this case, can be divided equally between the sub-carriers.
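The graph construction of the OMA heuristic can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the per-user power under the equal split is $0.5\,\sigma^2 (2^{r/(0.5 W)} - 1)/\|\mathbf{h}\|^2$ (noise power scaling with the sub-channel bandwidth), the edge weight is $E_{\max} - E + Q$, and each virtual F-AP is represented by the tuple `(m, 1)` next to the original `(m, 0)`. Any maximum weighted bipartite matching routine can then be run on the returned edge weights.

```python
import math

def equal_split_energy(task, g, F, W, sigma2, p_max):
    """Transmit energy of one user given half of F and half of W, else None."""
    S, C, D = task
    t_off = D - C / (F / 2.0)                 # time left for offloading
    if t_off <= 0:
        return None
    exponent = (S / t_off) / (0.5 * W)        # minimum rate over sub-channel
    if exponent > 1000:                       # avoid float overflow
        return None
    p = 0.5 * sigma2 * (2.0 ** exponent - 1.0) / g
    if p > p_max:
        return None
    return p * t_off

def build_oma_graph(faps, tasks, gain, F, W, sigma2, p_max, big_q):
    """Edge weights between users and original plus virtual F-APs.

    tasks: dict user -> (S, C, D); gain[m][n]: squared channel norm.
    Returns dict (user, (F-AP, copy)) -> weight, with copy in {0, 1}.
    """
    energy = {}
    for m in faps:
        for n, task in tasks.items():
            e = equal_split_energy(task, gain[m][n], F, W, sigma2, p_max)
            if e is not None:
                for copy in (0, 1):           # original and virtual F-AP
                    energy[(n, (m, copy))] = e
    if not energy:
        return {}
    e_max = max(energy.values())
    return {edge: e_max - e + big_q for edge, e in energy.items()}

# Toy instance: user 3 has a deadline too short for the halved CPU share.
tasks = {1: (1e6, 1e8, 0.5), 2: (1e6, 1e8, 0.5), 3: (1e6, 1e8, 0.1)}
gain = {"a": {1: 1.0, 2: 0.5, 3: 1.0}}
graph = build_oma_graph(["a"], tasks, gain, F=1e9, W=1e7, sigma2=1.0,
                        p_max=1.0, big_q=100.0)
```

Replicating each F-AP lets an ordinary one-to-one bipartite matching place up to two users on the same physical F-AP.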

V. SIMULATION MODEL AND RESULTS
In this section, simulation results are presented to evaluate the performance of our proposed schemes. We consider a single cell, in which the F-APs and users are randomly distributed inside the cell according to a homogeneous Poisson point process (PPP). The fading models and simulation parameters are listed in Table 2 [35]. We consider the same task size and task delay tolerance for all users. We assume the number of users is twice the number of F-APs, i.e., $N = 2M$. A user is said to be in outage if its computational task cannot be offloaded and executed within the delay tolerance. Outage probability is defined as the average proportion of users that are in outage. The total energy consumption in the system is taken over those instances in which no user is in outage under any scheme. The results are obtained by averaging over 500 random realizations. The schemes proposed in [27], [28] are used for comparison with our proposed NOMA and OMA schemes, respectively. The swap-enabled matching algorithm is used in both studies to solve the user pairing and F-AP assignment problem. The main idea behind swap-enabled matching is to find a stable matching, where the users are assigned to the F-APs such that no user prefers another F-AP to the one currently assigned to it, and the F-APs do not prefer other users. However, in NOMA, the energy consumption of a user depends on the other users assigned to the same F-AP. Thus, after finding the initial stable matching, a swap stage is executed to try every possible swap between the users and accept a swap that reduces the energy consumption of both users without increasing the energy consumption of other users.
For fair comparison with [27], we set the number of sub-channels on each F-AP to one and ignore the inter-cell interference term in equation (1). The optimal power and computation resource allocation method used under our scheme is adopted. The reason is that the resource allocation method proposed in [27] is designed for the non-convex power allocation that arises from the interference term, which is not considered in our work. For fair comparison with [28], we assume that each user has only one task to be offloaded, the downlink data is set to zero, each F-AP has two sub-channels, and the computation resources are divided equally between the paired users. Their proposed scheme is compared with our OMA scheme, where the energy consumption of a user is independent of the other users. Thus, the swap stage is not needed, and only the stable matching stage is executed.
Fig. 3 shows the performance of both one-dimensional grid search and golden section search.As we can see, both searching methods have the same performance, which shows empirically that the energy minimization problem in (4) under the OMA scheme is unimodal.Fig. 4 compares the outage performance of NOMA and OMA schemes.As we can observe, both proposed NOMA schemes significantly reduce the outage probability compared with the proposed  OMA schemes.The reason is that NOMA allows each user to transmit over the whole channel bandwidth while OMA shares the channel bandwidth between the users.Thus, the users under NOMA can achieve the required data rates with lower transmit power by sending their data over a larger bandwidth.As we can see, our proposed heuristic NOMA scheme, with low computational complexity, gives a reasonably good performance compared with the optimal solution obtained by the BIP.For OMA, both fixed and optimized bandwidth and computation allocation schemes have the same outage as both use bipartite matching for user assignment.
As we can also observe in Fig. 4, the schemes in [27], [28] give high outage probability compared to our proposed schemes as the authors did not consider maximizing the number of served users in their problem formulation and proposed solution as in our work.Besides, our proposed OMA schemes and heuristic NOMA implemented weighted bipartite matching, which solves the subproblem of maximizing the number of served users and minimizing the energy consumption optimally compared to the swap-enabled matching.Fig. 5 demonstrates the task delay tolerance effect on the total energy consumption.As we can see, as the task delay tolerance increases, the required data rates are low, so the total energy decreases.As we can observe, NOMA schemes outperform the OMA schemes in minimizing the total energy consumption.Moreover, NOMA schemes are more beneficial when the task delay tolerance is short.For example, if the task delay tolerance is 0.47 sec, the BIP NOMA, the heuristic NOMA, and the swap-enabled matching schemes outperform the BIP OMA scheme by 83.5%, 79.6%, and 50% respectively.For OMA, optimizing the bandwidth and computation resource allocation gives better performance than allocating fixed bandwidth and computation resources to the users as it adapts the shared channel bandwidth and computation resource based on the channel conditions of users.Besides, our proposed optimized OMA scheme gives a reasonably good performance when compared with the optimal BIP OMA scheme.At 0.47 sec, the optimized OMA scheme increases the energy consumption by 7% only.
As Fig. 5 also shows, although the power and computation resources are allocated optimally under both the heuristic NOMA scheme and the scheme proposed in [27], our heuristic outperforms the swap-enabled matching algorithm. The reason is that our heuristic uses weighted bipartite matching, which solves part of the assignment problem optimally. Similarly, under OMA, bipartite matching solves the assignment problem optimally, unlike stable matching.
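The two-level objective of the assignment step (first maximize the number of served users, then minimize the total energy) can be illustrated on a toy instance. The sketch below is not the paper's algorithm: it uses exhaustive enumeration as a stand-in for a weighted bipartite matching solver, assumes each F-AP offers a single slot, and uses a made-up energy table in which `None` marks a user and F-AP combination that cannot meet the delay tolerance.

```python
from itertools import product

# Hypothetical energy table: ENERGY[u][m] is the energy (in joules) if user u
# is served by F-AP m, or None if the delay tolerance cannot be met there.
ENERGY = [
    [1.0, None, 4.0],
    [2.0, 3.0, None],
    [None, 1.5, 2.0],
    [0.5, None, None],
]


def best_assignment(energy):
    # Exhaustive search over all assignments (toy sizes only). Each user is
    # mapped to one F-AP index or to None (unserved); each F-AP serves at
    # most one user in this simplified setting.
    n_users, n_aps = len(energy), len(energy[0])
    best_key, best_choice = None, None
    for choice in product([None, *range(n_aps)], repeat=n_users):
        used = [m for m in choice if m is not None]
        if len(used) != len(set(used)):
            continue  # some F-AP would serve two users
        pairs = [(u, m) for u, m in enumerate(choice) if m is not None]
        if any(energy[u][m] is None for u, m in pairs):
            continue  # an infeasible user/F-AP pair was selected
        total = sum(energy[u][m] for u, m in pairs)
        # Lexicographic objective: more served users first, then less energy.
        key = (-len(pairs), total)
        if best_key is None or key < best_key:
            best_key, best_choice = key, choice
    return best_choice, -best_key[0], best_key[1]


choice, served, total = best_assignment(ENERGY)
print("assignment:", choice, "served:", served, "energy:", total)
```

In a matching-based implementation, the same lexicographic objective is commonly encoded by adding a large constant penalty for each unserved user to the edge weights, so that a single minimum-weight matching serves as many users as possible before trading off energy.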
Fig. 6 shows the total energy consumption versus the F-AP computational capacity. As the computational capacity increases, the total energy consumption decreases. This is because more CPU cycles can be allocated, which reduces the computation time, allowing more time for offloading and thus reducing the transmission energy. Both NOMA schemes also outperform the OMA schemes in minimizing the total energy consumption, for the bandwidth reason explained above. Besides, compared with OMA offloading, NOMA offloading reduces the energy consumption efficiently when the computation capacity is limited. Fig. 7 demonstrates the total energy consumption versus the task computation requirement. As the number of CPU cycles required to execute a task increases, the total energy consumption increases. The reason is that more time is required to execute the task, which in turn shortens the time available for offloading and increases the energy required for offloading the tasks. Moreover, NOMA offloading is more beneficial when the number of CPU cycles required to execute a task is high.
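The trends in Fig. 6 and Fig. 7 can be reproduced qualitatively with a single-user sketch. It assumes a Shannon-rate link and a delay budget that is split between offloading and computation; the function name, parameter names, and numerical values are hypothetical and are not taken from the paper's simulation setup.

```python
import math


def offloading_energy(bits, cycles, deadline, cpu_hz, bandwidth_hz, gain_over_noise):
    # Transmission energy needed to offload `bits` within the time left after
    # computing `cycles` at `cpu_hz`. Assumes rate = B * log2(1 + p * g / N0),
    # with gain_over_noise = g / N0 (a simplifying assumption).
    compute_time = cycles / cpu_hz
    tx_time = deadline - compute_time
    if tx_time <= 0:
        return math.inf  # the delay tolerance cannot be met
    rate = bits / tx_time
    power = (2 ** (rate / bandwidth_hz) - 1) / gain_over_noise
    return power * tx_time


# Hypothetical task: 1 Mbit, 1e9 cycles, 0.5 s delay tolerance, 1 MHz channel.
task = dict(bits=1e6, cycles=1e9, deadline=0.5,
            bandwidth_hz=1e6, gain_over_noise=1e3)

# More CPU capacity leaves more time for offloading (trend of Fig. 6).
by_capacity = [offloading_energy(cpu_hz=f, **task) for f in (3e9, 5e9, 10e9)]

# A heavier task leaves less time for offloading (trend of Fig. 7).
by_cycles = [offloading_energy(cpu_hz=5e9, **{**task, "cycles": c})
             for c in (0.5e9, 1e9, 2e9)]

print("energy vs capacity:", by_capacity)
print("energy vs cycles:  ", by_cycles)
```

Because the required power grows exponentially with the required rate, small reductions in the offloading time translate into large increases in energy, which is consistent with the steep curves at tight delay budgets.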
Fig. 8 shows the total energy consumption versus the number of antennas. As the number of antennas increases, the total energy consumption decreases due to the power gain obtained from the spatial diversity of the antennas. Moreover, NOMA offloading is more beneficial when the number of antennas is small.

VI. CONCLUSION
For multiuser multi-server computation offloading, the problem of joint computation and communication resource allocation for maximizing the number of served users and then minimizing the total energy consumption using NOMA within delay tolerance constraints is proved to be NP-hard. An optimal scheme using binary integer programming is used as a benchmark, and a low-complexity heuristic algorithm is proposed to solve the problem. The simulation results show that our proposed NOMA heuristic performs close to the optimal solution and significantly outperforms the OMA schemes. Besides, the OMA bandwidth allocation problem is proved to be convex, and a low-complexity scheme with reasonably good performance is proposed. The simulation results also demonstrate that our proposed NOMA and OMA schemes significantly outperform the swap-enabled matching algorithm used in the literature.
Striking a balance between minimizing the energy consumption and minimizing the delay is of great interest; thus, further study on multi-objective optimization for computation offloading is needed. Besides, jointly optimizing the communication and computation resource allocation when mobile users also have computational capability is an interesting topic for future study.

A. PROOF OF LEMMA 1
Proof: Since the constraint set is compact and the objective function is continuous in $p$ and $f$, by the Weierstrass Theorem [36, Theorem 3.1], an optimal solution exists. We now prove by contradiction that C1, C2, and C3 must hold with equality. Let $f_n^*$, $f_{n'}^*$, $p_n^*$, and $p_{n'}^*$ be an optimal solution. Assume that C1 does not hold with equality. Since $p_n/R_n$ is a strictly increasing function of $p_n$, we can reduce $p_n^*$ to a smaller value $\tilde{p}_n$ such that the objective function value becomes smaller without violating C1, which contradicts optimality. Hence C1 must hold with equality. By the same argument, C2 must also hold with equality. Finally, suppose C3 does not hold with equality at the optimal solution. Then we can increase both $f_n^*$ and $f_{n'}^*$, which leaves slack in C1 and C2 and contradicts the fact that C1 and C2 must hold with equality.

B. PROOF OF THEOREM 2
Proof: Since the constraints C1, C2 and C3 hold with equality, the objective can be expressed as a function of $f_n$ alone. For the optimization problem to be feasible, we must have $C_n/D_n < f_n < F - C_{n'}/D_{n'}$. Differentiating the required rates $R_n(f_n)$ and $R_{n'}(f_n)$ twice shows that they are convex functions of $f_n$. If $ax \ge 1$, it is obvious that $h'(x) \ge 0$. Suppose $ax < 1$. We claim that $(1 - ax)e^{ax} \le 1$.
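The claim follows from the elementary bound $e^{t} \ge 1 + t$, which holds for every real $t$. The short derivation below also shows how the claim yields monotonicity, under the assumption (not visible in this excerpt) that $h$ has the form $h(x) = (e^{ax} - 1)/x$ with $a > 0$ and $x > 0$.

```latex
\begin{align}
  % Elementary bound with t = -ax:
  e^{-ax} &\ge 1 - ax \\
  % Multiply both sides by e^{ax} > 0:
  1 &\ge (1 - ax)\, e^{ax} \\
  % Assumed form h(x) = (e^{ax} - 1)/x, differentiated once:
  h'(x) &= \frac{a x\, e^{ax} - e^{ax} + 1}{x^{2}}
         = \frac{1 - (1 - ax)\, e^{ax}}{x^{2}} \;\ge\; 0
\end{align}
```

Under this assumed form, the bound holds for all $x > 0$, so the two cases $ax \ge 1$ and $ax < 1$ are covered by the same inequality.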
Rewrite (9) as a weighted sum, with weights $b$ and $b'$, of the compositions $h \circ R_n$ and $h \circ R_{n'}$, where $\circ$ denotes function composition. Since $h(x)$ is convex and non-decreasing while $R_n(f_n)$ and $R_{n'}(f_n)$ are both convex, each of the two terms on the right-hand side is a convex function of $f_n$ [38, p. 84]. Since $b, b' \ge 0$ and a non-negative weighted sum preserves convexity, the function $E^*_{\mathrm{NOMA}}(f_n)$ is convex in $f_n$.
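The composition argument can be sanity-checked numerically. The sketch below assumes specific forms that are not shown in this excerpt: $h(x) = (e^{ax} - 1)/x$, and a required rate equal to the task size divided by the time left after computation. All names and values (in normalized units) are hypothetical; the check only confirms that the second differences of the resulting pair energy are positive on a grid inside the feasible interval.

```python
import math

A = 0.3  # hypothetical constant a > 0 in the assumed h(x)


def h(x):
    # Assumed form h(x) = (e^{ax} - 1) / x for x > 0 (not confirmed by the source).
    return (math.exp(A * x) - 1.0) / x


def required_rate(bits, cycles, deadline, cpu):
    # Assumed rate: task size over the time left after computing at `cpu`.
    return bits / (deadline - cycles / cpu)


def pair_energy(f, total_cpu, user_n, user_p, b_n=1.0, b_p=1.0):
    # Weighted sum of h composed with each user's required rate; user n gets
    # CPU share f and its partner gets the remainder total_cpu - f.
    return (b_n * h(required_rate(*user_n, f))
            + b_p * h(required_rate(*user_p, total_cpu - f)))


def second_differences(values):
    # Discrete analogue of the second derivative on a uniform grid.
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


# Normalized toy users: (bits, cycles, deadline). Feasible f lies in (1, 9).
USER_N = (2.0, 1.0, 1.0)
USER_P = (2.0, 1.0, 1.0)
TOTAL_CPU = 10.0

grid = [1.5 + 0.01 * i for i in range(701)]  # f from 1.5 to 8.5
energies = [pair_energy(f, TOTAL_CPU, USER_N, USER_P) for f in grid]
curvature = second_differences(energies)
print("all second differences positive:", min(curvature) > 0)
```

A numerical check of this kind does not replace the proof, but it is a quick way to catch a sign error in a rate or energy expression before running a convex solver or a line search on it.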

C. PROOF OF THEOREM 3
Proof: First, we need to check that the constraints form a convex set. The constraints C1 and C2 are linear. Thus, we only need to check whether the region constrained by C3 is convex. Differentiating $p_i(\alpha)$ twice, we obtain a second derivative that is positive since $\alpha > 0$. Hence, the feasible region is convex. Since the offloading times of the two tasks are positive and a non-negative weighted sum preserves convexity, the objective function is strictly convex.
Assume users $n$ and $n'$ are paired and assigned to F-AP $m$, and user $n$ has a higher channel gain than user $n'$, i.e., $\|h_n^m\|^2 \ge \|h_{n'}^m\|^2$. The paired users $(n, n')$ transmit to F-AP $m$ on the whole channel bandwidth concurrently. The received signal at F-AP $m$ is thus

FIGURE 2. Empirical results show that $E^*_{\mathrm{OMA}}(f_n)$ is unimodal in $f_n$. A typical instance is shown in this plot.

TABLE 1. Summary of notation.