Joint Optimization of Uplink Power and Computational Resources in Mobile Edge Computing-Enabled Cell-Free Massive MIMO

The coupling of cell-free massive MIMO (CF-mMIMO) with Mobile Edge Computing (MEC) is investigated in this paper. A MEC-enabled CF-mMIMO architecture implementing a distributed user-centric approach both from the radio and the computational resource allocation perspective is proposed. A multi-objective optimization problem (MOOP) for the joint allocation of radio and remote computational resources is formulated, aimed at striking an optimal balance between total uplink power minimization and sum spectral efficiency maximization, under resource budget and latency constraints. In order to solve such a challenging non-convex problem, we convert the MOOP to an equivalent single-objective optimization problem (SOOP) through the weighted sum method and propose an iterative algorithm based on alternating optimization and sequential convex programming, along with an alternative heuristic resource allocation for distributed networks. Finally, we provide a detailed performance comparison of the proposed MEC-enabled CF-mMIMO architecture with its co-located counterpart and with its small-cell implementation. Numerical results reveal the effectiveness of the proposed resource allocation scheme, under different access point selection strategies, and the natural suitability of CF-mMIMO in supporting computation-offloading applications, with benefits in terms of users' transmit power and energy consumption, effective latency experienced, and computation offloading efficiency.


I. INTRODUCTION
THE RECENT evolution of wireless networks has been characterized by an impressive growth not only of the amount of conveyed data traffic, but also of computationally-intensive applications with strict latency requirements for mobile devices. Applications such as online gaming, augmented reality and video image processing not only request extreme broadband connections, but also a considerable amount of computational power at the mobile devices. A possible approach to indirectly increase the computing capabilities of the devices and prolong their battery lifetime is to (either fully or partially) delegate their computational tasks to the network, specifically to network entities known as network edge servers$^{1}$, in charge of collecting, processing and feeding data back to the users in a centralized fashion. This approach is known as mobile edge computing (MEC) or mobile-edge computation offloading [2]-[6].
This paper was supported by the Italian Ministry of Education University and Research (MIUR) Project "Dipartimenti di Eccellenza 2018-2022" and by the MIUR PRIN 2017 Project "LiquidEdge". An excerpt of this article has been published in the proceedings of the 2022 IEEE International Conference on Communications (ICC) [1]. The authors are with the Department of Electrical and Information Engineering (DIEI) of the University of Cassino and Southern Lazio, 03043 Cassino, Italy (e-mail: giovanni.interdonato@unicas.it, buzzi@unicas.it), and with the Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT), 43124, Parma, Italy. S. Buzzi is also affiliated with Politecnico di Milano, Milano, Italy, and his work was also supported by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on "Telecommunications of the Future" (PE00000001 - program "RESTART", Structural Project 6GWINET).
Cell-free massive multiple-input multiple-output (CF-mMIMO) is the ultimate embodiment of network MIMO [7]-[9]. CF-mMIMO is a technology based on the use of several distributed low-complexity access points (APs) that jointly serve the active users in their coverage area. It inherits all the outstanding features of co-located massive MIMO [10], [11], such as nearly-optimal linear signal processing, predictable accurate performance, and simplified resource allocation and channel estimation, while providing additional key ingredients to theoretically achieve unprecedented levels of uniform data rates and ubiquitous connectivity: macro-diversity gain, inter-cell interference mitigation, and user proximity (see e.g. [12, Chapter 3.2] and references therein). Moreover, it is also amenable to scalable user-centric implementations [13]-[15].
In this paper, we investigate the promising marriage between CF-mMIMO and MEC, which share the same principle of bringing the resources (radio and computing, respectively) closer to the user. CF-mMIMO, thanks to its dense distributed topology and user-centric architecture, may greatly facilitate the computation offloading by enabling mobile devices to delegate either all or part of their computational tasks to multiple APs, each of which may be equipped with an edge server. Moreover, the central processing unit (CPU) of a CF-mMIMO system, which is generally equipped with a more powerful server, may serve as a backup edge computing resource that, in turn, gives computation offloading support to the APs. User proximity and macro-diversity may significantly shorten the delay due to the computation offloading, thereby supporting stricter latency requirements, and reduce the users' power consumption. Moreover, the user-centric approach ensures a more uniform spectral efficiency (SE), so that access to the remote computational resources may be indiscriminately granted to every user. The ability of the network to accomplish users' computation offloading depends on how the radio and remote computational resources are allocated. This coupling calls for a joint optimization, which is the main subject of this study.
Related Works. Many works on MEC optimize the interplay between the amount of computational tasks to offload, the latency due to the offloading process, and the energy consumption of mobile devices. First studies on MEC assumed simplified system models by considering either single-user systems [16]-[18] or interference-free multi-user systems [19], [20], focusing either on minimizing the energy consumption under latency constraints [20] or the delay due to the computation offloading under energy consumption constraints [21]. Some of these works consider a binary computation offloading model, wherein each device executes its computational tasks either remotely or locally [16], [20]. Other studies assumed a more general partial computation offloading [17], [19], [22] with only a fraction of computational tasks executed remotely. An integrated framework for computation offloading and interference management in cellular networks was proposed in [23]. All these works assumed single-antenna base stations (BSs).
More recently, MEC has been studied in conjunction with MIMO technologies. As an example, [4] considers a multi-cell MIMO system served by an edge server in a centralized fashion, and formulates a total energy minimization problem under latency and minimum rate constraints. Similarly, the MEC solution proposed in [24] focuses on minimizing the maximum latency of all the devices in a cloud-radio access network (C-RAN) with MIMO technology, while [25] addresses the energy minimization problem accounting for imperfect channel state information (CSI) in a single-cell MIMO system. In [26] an optimal association of mobile users to MEC resources is devised for a multi-user MIMO system with C-RAN architecture. In [27] a successive inner convexification framework to minimize the total transmit power of the devices under latency constraints is proposed. Several authors have also examined the coupling between MEC and massive MIMO. The work [28] proposes a low-complexity algorithm to jointly optimize the radio and computing resources for a massive MIMO-enabled heterogeneous network with MEC. Similarly, [29], [30] presented the effectiveness of employing massive MIMO for MEC, under zero-forcing combining, aiming at minimizing the maximum delay for offloading and computing among the devices. The paper [31], instead, considers a massive MIMO system operating at the millimeter-wave (mmWave) frequency bands underlying traditional wireless local area networks with MEC. A dynamic computation offloading in MEC for ultra-reliable and low-latency communications at the mmWave frequency bands is proposed in [32]. The main common conclusion of these works is that multiple users can simultaneously offload their computational tasks by leveraging the additional degrees of freedom provided by massive MIMO, and the offloading efficiency, as well as the energy saving of the mobile devices, grows with the number of antennas of the massive MIMO BS.
Finally, the performance of CF-mMIMO with MEC has been recently explored in [33] and [34]. The former proposes a joint optimization of the partial offloading ratio per task and the resulting computational resources allocated at a single MEC server to minimize either the aggregated latency or the energy consumption at each user. The authors of [34] investigate the successful edge computing probability (SECP) for a target computation latency by using queuing theory and stochastic geometry, and by considering a random computation latency model. The system model in [34] consists of APs equipped with independent MEC servers, a CPU with a central MEC server (CS), and devices performing offloading either to the CS or to one of their serving APs, with some successful computing probability. The joint decoding of the offloaded user data at one of the serving APs/CS is, however, not recommended due to extra delays caused by the fronthaul communications. In fact, the considered architecture implements a specific instance of cell-free massive MIMO, namely a small-cell network. Moreover, in [34] both uplink transmit powers and allocated computational resources are fixed rather than optimized. Following on this track, in this paper we explore the potential benefits of jointly optimizing radio and computational resources in a MEC-enabled CF-mMIMO system.
Our problem formulation is characterized by a multi-objective function and aims at striking an optimal balance between the minimization of the total uplink transmit power and the maximization of the sum uplink SE. Multi-objective optimization (MOO) is a mathematical framework to deal with optimization problems with multiple conflicting objective functions [35]-[37]. A survey of MOO applied to signal processing in wireless networks, with emphasis on massive MIMO systems and conflicting metrics such as SE, energy efficiency, coverage and total transmit power, was given in [38]. A conventional optimization approach consists in converting some of the objectives into constraints, whereas the fundamental approach of MOO consists in considering multiple objectives at once. There are two main methods: (i) computing sample points on the Pareto frontier, upon which subjective decisions are made a posteriori, or (ii) converting a priori the MOO problem (MOOP) to a single-objective optimization problem (SOOP) by combining the objectives into a suitable goal function (the most common is the weighted sum) which reflects a subjective trade-off between the metrics of interest. In the latter, the objectives are conventionally combined by using coefficients whose values reflect the subjective weight given to each metric.
Contributions: Our technical contribution can be summarized as follows.
• We propose a MEC-enabled CF-mMIMO architecture implementing a user-centric approach both from the radio and the computational resource allocation perspective. Unlike prior studies investigating computation-offloading implementations in distributed networks [27], [28], [33], [34], our model considers that users' computational tasks can be divided into independent subtasks which can be remotely executed in a distributed fashion and in parallel at the MEC servers of properly selected APs and at the MEC server of the CPU.
• We formulate an optimization problem for jointly allocating users' transmit powers and the remote computational resources for offloading. Unlike prior works [27], [28], [30], [34] we formulate a MOOP that optimizes the tradeoff between total uplink transmit power minimization and sum SE maximization, under latency and resource budget constraints.
• For efficiently solving the non-convex MOOP, we formulate an equivalent SOOP by using the weighted sum method and devise a framework including alternating optimization and successive convex approximation (SCA) which, unlike prior works, accounts for: (i) the user-centric cooperation clustering framework; (ii) a distributed allocation of the computational resources; (iii) a general formulation for any combining scheme and arbitrary correlated fading channels. As there is no unique solution for this SOOP, we provide its sub-optimal Pareto frontier, namely the set of objectives corresponding to the Pareto sub-optimal solutions obtained by iteratively solving the SOOP for several values of the weights. Finally, we show how the weights in the SOOP indirectly determine the effective latency of the offloading process experienced by the users.
• We propose an alternative low-complexity approach to the proposed joint resource allocation, which consists in heuristically allocating the MEC server computational resources to the users, and then optimizing with respect to the uplink powers.
• Since the final solution achieved by SCA-based methods may depend on the feasible solution initialization, we present a method to properly initialize the proposed iterative optimization algorithm, and to provide a rigorous assessment on the problem feasibility.
• For benchmarking purposes, we extend our joint optimization strategy to a multi-cell co-located massive MIMO system, and to a small-cell implementation of CF-mMIMO. The latter constitutes a deterministic variant of the framework described in [34] wherein the task offloading model hinges on the knowledge of the users' computational demands and MEC servers' available computing resources.
• We provide a comprehensive simulation campaign to highlight the improvements introduced by the proposed MEC-enabled CF-mMIMO system in terms of: (i) users' transmit power and energy consumption, (ii) offloading latency, (iii) amount of allocated remote computational resources, and (iv) computation offloading efficiency. We also study its performance under different strategies of AP selection for providing the communication service.
• A further insight about the effectiveness of the proposed joint uplink power and computational resource allocation (JPCA) scheme is provided by evaluating the interplay between energy consumption, allocated remote computational resources and offloading latency.

II. SYSTEM MODEL
We consider a CF-mMIMO system operating in time-division duplexing (TDD) mode and at sub-6 GHz frequency bands. A set of $L$ APs, equipped with $M$ antennas each, are geographically distributed and connected through a fronthaul network to a CPU. The APs coherently serve $K$ single-antenna users in the same time-frequency resources, with $LM \gg K$. The conventional block-fading channel model is considered, with $\tau_c$ denoting the channel coherence block length. In TDD mode, each coherence block accommodates uplink training, uplink and downlink data transmission, such that $\tau_c = \tau_p + \tau_u + \tau_d$, where $\tau_p$, $\tau_u$ and $\tau_d$ are the training duration, the uplink and the downlink data transmission duration, respectively.
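Borrowing the notation of [15], the channel between the $k$-th user and the $l$-th AP is denoted by the $M$-dimensional vector $\mathbf{h}_{lk}$, with $\mathbf{h}_{lk} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{lk})$, and $\mathbf{R}_{lk} \in \mathbb{C}^{M \times M}$ being the spatial correlation matrix. The corresponding large-scale fading coefficient is defined as $\beta_{lk} = \mathrm{tr}(\mathbf{R}_{lk})/M$. The channel between the $k$-th user and all the APs in the system is obtained by stacking the channel vectors $\mathbf{h}_{lk}$, $\forall l$, as
$$\mathbf{h}_k = \left[\mathbf{h}_{1k}^T \cdots \mathbf{h}_{Lk}^T\right]^T \in \mathbb{C}^{LM}.$$
The channel vectors of different APs are reasonably assumed to be independently distributed. As a consequence, we have
$$\mathbf{h}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k), \qquad \mathbf{R}_k = \mathrm{diag}(\mathbf{R}_{1k}, \ldots, \mathbf{R}_{Lk}) \in \mathbb{C}^{LM \times LM},$$
that is, the collective correlation matrix is block-diagonal.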

A. Centralized Uplink Training
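During the uplink training all the $K$ users synchronously send a pre-determined pilot sequence of $\tau_p$ samples. The pilot sequences are drawn from a set of $\tau_p$ orthonormal vectors. Specifically, $\sqrt{\tau_p}\,\boldsymbol{\phi}_k \in \mathbb{C}^{\tau_p}$ denotes the pilot sent by the $k$-th user, with $\|\boldsymbol{\phi}_k\| = 1$. Whenever $K$ is larger than $\tau_p$, the same pilot must be assigned to more than one user, causing pilot contamination. The pilot signal observed by AP $l$ is
$$\mathbf{Y}_{p,l} = \sum_{j=1}^{K} \sqrt{\tau_p p_{p,j}}\, \mathbf{h}_{lj} \boldsymbol{\phi}_j^T + \boldsymbol{\Omega}_{p,l} \in \mathbb{C}^{M \times \tau_p},$$
where $p_{p,j}$ is the transmit power of the uplink pilot symbol, and $\boldsymbol{\Omega}_{p,l}$ is a matrix of additive noise whose elements are independently distributed as $\mathcal{CN}(0, \sigma^2)$. For any user $k$, the $l$-th AP projects $\mathbf{Y}_{p,l}$ along the $k$-th pilot sequence, which yields:
$$\mathbf{y}^p_{lk} = \mathbf{Y}_{p,l}\boldsymbol{\phi}_k^* = \sqrt{\tau_p p_{p,k}}\, \mathbf{h}_{lk} + \sum_{j \neq k} \sqrt{\tau_p p_{p,j}}\, \mathbf{h}_{lj} \boldsymbol{\phi}_j^T \boldsymbol{\phi}_k^* + \boldsymbol{\Omega}_{p,l}\boldsymbol{\phi}_k^*,$$
where the second term captures the interference due to pilot contamination. Assuming that the channel estimation is performed by the CPU, in each coherence interval, each AP needs to send the vector $\mathbf{y}^p_{lk}$ to the CPU. Given prior knowledge of the channel correlation matrices, the CPU performs linear minimum-mean square error (MMSE) estimation of the $k$-th user channel $\mathbf{h}_{lk}$ as $\hat{\mathbf{h}}_{lk} = \sqrt{\tau_p p_{p,k}}\, \mathbf{R}_{lk} \boldsymbol{\Psi}_{lk}^{-1} \mathbf{y}^p_{lk}$, where
$$\boldsymbol{\Psi}_{lk} = \mathbb{E}\left\{\mathbf{y}^p_{lk} (\mathbf{y}^p_{lk})^H\right\} = \sum_{j=1}^{K} \tau_p p_{p,j} \left|\boldsymbol{\phi}_j^T \boldsymbol{\phi}_k^*\right|^2 \mathbf{R}_{lj} + \sigma^2 \mathbf{I}_M.$$
The estimation error is independent of the estimate, and given by $\tilde{\mathbf{h}}_{lk} = \mathbf{h}_{lk} - \hat{\mathbf{h}}_{lk}$. It is distributed as $\tilde{\mathbf{h}}_{lk} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_{lk})$, with
$$\mathbf{C}_{lk} = \mathbf{R}_{lk} - \tau_p p_{p,k}\, \mathbf{R}_{lk} \boldsymbol{\Psi}_{lk}^{-1} \mathbf{R}_{lk}.$$
Collecting all the channel estimates of user $k$ in a vector, we have
$$\hat{\mathbf{h}}_k = \left[\hat{\mathbf{h}}_{1k}^T \cdots \hat{\mathbf{h}}_{Lk}^T\right]^T \in \mathbb{C}^{LM},$$
with collective estimation error $\tilde{\mathbf{h}}_k = \mathbf{h}_k - \hat{\mathbf{h}}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_k)$ and $\mathbf{C}_k = \mathrm{diag}(\mathbf{C}_{1k}, \ldots, \mathbf{C}_{Lk})$.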
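As an illustration of the estimation step, the following minimal NumPy sketch computes the linear MMSE estimate at a single AP. It assumes the standard LMMSE expressions for the pilot-signal correlation matrix (noise plus the contributions of the users sharing user k's pilot) and for the error correlation matrix; the function names, the argument layout, and the helper `random_correlation` are hypothetical and not taken from the paper.

```python
import numpy as np

def random_correlation(rng, M):
    """Hypothetical helper: random Hermitian PSD matrix standing in for R_lk."""
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    return A @ A.conj().T / M

def mmse_channel_estimate(R, k, pilot_index, p_pilot, tau_p, sigma2, y_p):
    """LMMSE estimate of h_lk at one AP from the despread pilot signal y_p.

    R           : (K, M, M) correlation matrices R_lj of all users at this AP
    pilot_index : (K,) pilot assigned to each user (equal entries = shared pilot)
    p_pilot     : (K,) pilot transmit powers
    Returns (h_hat, Psi, C), with C the estimation-error correlation matrix.
    """
    pilot_index = np.asarray(pilot_index)
    M = R.shape[1]
    # Psi = E{y_p y_p^H}: with orthonormal pilots, only users sharing
    # user k's pilot survive the projection (assumed standard form).
    Psi = sigma2 * np.eye(M, dtype=complex)
    for j in np.flatnonzero(pilot_index == pilot_index[k]):
        Psi = Psi + tau_p * p_pilot[j] * R[j]
    scale = np.sqrt(tau_p * p_pilot[k])
    gain = scale * R[k] @ np.linalg.inv(Psi)   # sqrt(tau_p p_k) R_lk Psi^{-1}
    h_hat = gain @ y_p
    C = R[k] - scale * gain @ R[k]             # R_lk - tau_p p_k R_lk Psi^{-1} R_lk
    return h_hat, Psi, C
```

Under these assumptions, assigning the same pilot to an additional user enlarges `Psi` and therefore the trace of `C`, which is the numerical counterpart of pilot contamination.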

B. User-Centric Uplink Data Transmission
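The uplink data signal received by AP $l$ is $\mathbf{y}_l = \sum_{i=1}^{K} \mathbf{h}_{li} s_i + \mathbf{n}_l$, with $s_i$ being the data symbol transmitted by user $i$, $\mathbb{E}\{|s_i|^2\} = p_i$, $\forall i$, and $\mathbf{n}_l \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$ being the additive noise vector. In a practical and scalable user-centric implementation each user is served by a subset of APs selected among those ensuring the best channel conditions. Let $\mathcal{M}_k$ be the set of the indices of the APs serving user $k$, and $\mathbf{D}_{lk} \in \mathbb{C}^{M \times M}$, $\forall l$, $\forall k$, be a diagonal matrix such that $\mathbf{D}_{lk} = \mathbf{I}_M$ if $l \in \mathcal{M}_k$, and $\mathbf{D}_{lk} = \mathbf{0}_M$ otherwise. Under centralized uplink operation, the CPU computes the estimate of the transmitted data symbol $s_k$ as
$$\hat{s}_k = \sum_{l=1}^{L} \mathbf{v}_{lk}^H \mathbf{D}_{lk} \mathbf{y}_l, \tag{2}$$
where $\mathbf{v}_{lk} \in \mathbb{C}^{M}$ is the receive combining vector for the pair AP $l$-user $k$. By letting $\mathbf{v}_k = [\mathbf{v}_{1k}^T \cdots \mathbf{v}_{Lk}^T]^T$ and $\mathbf{D}_k = \mathrm{diag}(\mathbf{D}_{1k}, \ldots, \mathbf{D}_{Lk})$, (2) can be rewritten as
$$\hat{s}_k = \mathbf{v}_k^H \mathbf{D}_k \hat{\mathbf{h}}_k s_k + \mathbf{v}_k^H \mathbf{D}_k \tilde{\mathbf{h}}_k s_k + \sum_{i \neq k} \mathbf{v}_k^H \mathbf{D}_k \mathbf{h}_i s_i + \mathbf{v}_k^H \mathbf{D}_k \mathbf{n}, \tag{3}$$
with $\mathbf{n} = [\mathbf{n}_1^T \cdots \mathbf{n}_L^T]^T \in \mathbb{C}^{ML}$ being the collective noise vector. The first term in (3) is the desired signal over the known partially estimated channel, the second term is the self-interference due to the (unknown) estimation error, the third term is the multi-user interference, and, lastly, the fourth term is the noise. An achievable uplink SE (bit/s/Hz) for user $k$, with centralized operation, is obtained by treating the last three terms as uncorrelated noise at the receiver:
$$\mathrm{SE}_k = \frac{\tau_u}{\tau_c}\, \mathbb{E}\left\{\log_2\left(1 + \mathrm{SINR}_k\right)\right\}, \tag{4}$$
with
$$\mathrm{SINR}_k = \frac{p_k \left|\mathbf{v}_k^H \mathbf{D}_k \hat{\mathbf{h}}_k\right|^2}{\sum_{i \neq k} p_i \left|\mathbf{v}_k^H \mathbf{D}_k \hat{\mathbf{h}}_i\right|^2 + \mathbf{v}_k^H \mathbf{D}_k \left(\sum_{i=1}^{K} p_i \mathbf{C}_i\right) \mathbf{D}_k \mathbf{v}_k + \sigma^2 \left\|\mathbf{D}_k \mathbf{v}_k\right\|^2}. \tag{5}$$
This achievable SE holds for any combining scheme and arbitrary correlated fading channels, and accounts for user-centric data detection, channel estimation error, pilot contamination and estimation overhead. Hereafter, we consider the so-called Partial MMSE (P-MMSE) combining [15], which guarantees scalability and an excellent trade-off between performance and computational complexity. For an arbitrary user $k$, P-MMSE suppresses only the strongest interference contributions, which are caused by the users whose indices are in the set $\mathcal{S}_k = \{i : \mathbf{D}_k \mathbf{D}_i \neq \mathbf{0}_{LM}\}$. The P-MMSE collective combining vector is given by
$$\mathbf{v}_k = p_k \left(\sum_{i \in \mathcal{S}_k} p_i \mathbf{D}_k \left(\hat{\mathbf{h}}_i \hat{\mathbf{h}}_i^H + \mathbf{C}_i\right) \mathbf{D}_k + \sigma^2 \mathbf{I}_{LM}\right)^{-1} \mathbf{D}_k \hat{\mathbf{h}}_k. \tag{6}$$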
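The user-centric detection chain can be sketched numerically as follows. This is a minimal illustration, assuming the SINR takes the standard form used in [15] (estimated-channel interference, estimation-error term, and noise in the denominator) and that the P-MMSE combiner follows the usual regularized-inverse expression restricted to the users sharing at least one serving AP with user k. All function names and the array layout are hypothetical.

```python
import numpy as np

def serving_mask(serving_aps, L, M):
    """Diagonal of D_k: 1 on the antennas of the APs in M_k, 0 elsewhere."""
    d = np.zeros(L * M)
    for l in serving_aps:
        d[l * M:(l + 1) * M] = 1.0
    return d

def pmmse_combiner(k, H_hat, C, d, p, sigma2):
    """P-MMSE-style collective combiner for user k (assumed standard form).

    H_hat : (K, LM) collective channel estimates
    C     : (K, LM, LM) collective error correlation matrices
    d     : (K, LM) diagonals of the matrices D_k
    p     : (K,) uplink transmit powers
    """
    LM = H_hat.shape[1]
    # S_k: users whose serving AP set overlaps with that of user k.
    S_k = [i for i in range(len(p)) if np.any(d[k] * d[i] != 0)]
    A = sigma2 * np.eye(LM, dtype=complex)
    for i in S_k:
        g = d[k] * H_hat[i]                          # D_k h_hat_i
        DCD = d[k][:, None] * C[i] * d[k][None, :]   # D_k C_i D_k
        A += p[i] * (np.outer(g, g.conj()) + DCD)
    return p[k] * np.linalg.solve(A, d[k] * H_hat[k])

def instantaneous_sinr(k, v, H_hat, C, d, p, sigma2):
    """Instantaneous SINR of user k for a given collective combiner v."""
    w = d[k] * v                                     # D_k v_k
    gains = np.abs(H_hat.conj() @ w) ** 2            # |v_k^H D_k h_hat_i|^2 for all i
    signal = p[k] * gains[k]
    interference = p @ gains - signal
    est_error = sum(p[i] * np.vdot(w, C[i] @ w).real for i in range(len(p)))
    noise = sigma2 * np.vdot(w, w).real
    return signal / (interference + est_error + noise)

def instantaneous_se(sinr, tau_u, tau_c):
    """Instantaneous SE, i.e., the SE expression evaluated without the expectation."""
    return tau_u / tau_c * np.log2(1.0 + sinr)
```

When every AP serves every user, the combiner above reduces to full MMSE combining, which maximizes this instantaneous SINR, so it should never do worse than maximum-ratio combining (`v = H_hat[k]`) on the same realization.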

III. COMPUTATION-OFFLOADING AND LATENCY MODEL
We assume that both the APs and the CPU$^{2}$ offer computational facility to the users. Each user has a set of computational tasks to offload to multiple distributed MEC servers on some APs and/or to the MEC server at the CPU. In particular, we denote by $G_k$ the set of MEC servers at the APs and CPU that can provide computational offloading service to user $k$.
Let $O_k \subseteq G_k$ be the set of APs where user $k$'s computational tasks can be offloaded$^{3}$. In a small system with one CPU, as is the case for this paper, it is also reasonable that $O_k$ coincides with the set of all the APs in the system. We assume that user $k$ needs to execute one or more computational tasks within a maximum tolerable latency $L_k$. All the relevant information on these computational tasks can be encoded into $b_k$ bits, and their execution requires a total of $w_k$ computation cycles, which can be decomposed in $T_k$ computational subtasks, consisting of $w_{k,1}, w_{k,2}, \ldots, w_{k,T_k}$ computation cycles, with $\sum_{i=1}^{T_k} w_{k,i} = w_k$. AP $l$ has a computational capability of $f^{\mathrm{AP}}_l$ computation cycles per second (computational rate), while the CPU can execute up to $f^{\mathrm{CPU}}$ computation cycles per second. The fractions of computational resources assigned to subtask $i$ of user $k$ by the generic AP $l$ and by the CPU are denoted by $f^{\mathrm{AP}}_{l,k}(i)$ and $f^{\mathrm{CPU}}_k(i)$, respectively$^{4}$. Accordingly, it holds
$$\sum_{k=1}^{K} \sum_{i=1}^{T_k} f^{\mathrm{AP}}_{l,k}(i) \leq f^{\mathrm{AP}}_l, \ \forall l, \qquad \sum_{k=1}^{K} \sum_{i=1}^{T_k} f^{\mathrm{CPU}}_k(i) \leq f^{\mathrm{CPU}}.$$
The total amount of remote computational resources assigned to subtask $i$ of user $k$ is given by
$$f_k(i) = f^{\mathrm{CPU}}_k(i) + \sum_{l \in O_k} f^{\mathrm{AP}}_{l,k}(i),$$
so that $w_{k,i}/f_k(i)$ represents the computational time needed to execute $w_{k,i}$ cycles (computational latency). Since the $T_k$ computational subtasks for user $k$ are executed in parallel, the resulting computational latency for the task of user $k$ is $\max_{i} w_{k,i}/f_k(i)$. The amount $b_k/R_k$ is instead the time needed to transmit $b_k$ bits to the APs (transmission latency) over the wireless channel supporting a rate $R_k = B \times \mathrm{SE}_k$, with $B$ being the transmission bandwidth and $\mathrm{SE}_k$ being the instantaneous SE, i.e., the value attained by (4) with no expectation. Lastly, an additional latency contribution (fronthaul latency) is due to the forwarding of the $b_k$ bits from all the APs in the set $\mathcal{M}_k$ to the CPU over the fronthaul network, which, assuming synchronous transmission across the APs, amounts to $2 b_k M \xi / C_{\mathrm{FH}}$, where $\xi$ denotes the number of bits used to quantize both real and imaginary parts of the uplink data signal $\mathbf{y}$, and $C_{\mathrm{FH}}$ is the fronthaul capacity of the link between any AP and the CPU, expressed in bit/s. Hence, the computational offloading must fulfill the following latency constraint [27]
$$\frac{b_k}{R_k} + \max_{i = 1, \ldots, T_k} \frac{w_{k,i}}{f_k(i)} + \frac{2 b_k M \xi}{C_{\mathrm{FH}}} \leq L_k, \tag{7}$$
where we assume that $L_k$ includes any delay related to the signalling between AP and CPU, and the time needed to send the computational output back to the user. The latency constraint in (7) clearly couples radio and computational resources.
Fig. 1 illustrates the signalling diagram of the computational offloading process for an arbitrary user $k$. As a first step, user $k$ sends on the air interface a computational offloading request followed by the symbols encoding the data to be processed, the program to be executed remotely, and, also, the details on the subtasks which the main computational task is composed of. It is assumed that the overall computational offloading message has a length of $b_k$ bits. These $b_k$ bits are treated as normal information data, so they are sent from user $k$ during the uplink data transmission phase; the corresponding received signals are locally processed at the APs serving user $k$ (i.e., the APs in the set $\mathcal{M}_k$), and sent to the CPU over the fronthaul network for the centralized receive combining and the data decoding. The CPU, based on the received computational requests from all the users in the network, and on the knowledge of the estimated uplink channels, and of the available computing power at the CPU itself and at the APs, allocates each batch job, represented by an entry of the set $\{w_{k,i} : k = 1, \ldots, K;\ i = 1, \ldots, T_k\}$, either to itself or to one AP. The allocation of the computational resources between CPU and APs is subject to optimization as detailed in the next subsection. Once it has received the computational output of each batch job from the APs, the CPU sends the combined computational output back to the APs in the set $\mathcal{M}_k$, which finally perform a joint downlink data transmission to user $k$.
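The three latency contributions described above (transmission, parallel computation, fronthaul) can be combined in a few lines. The sketch below is only illustrative: the function and argument names are hypothetical, and the numbers in the usage note are made up rather than taken from the paper.

```python
def offloading_latency(b_k, w_sub, f_sub, se_k, bandwidth, M, xi, c_fh):
    """Effective offloading latency of one user, in seconds.

    b_k       : bits describing the computational task
    w_sub     : computation cycles of each subtask, w_{k,i}
    f_sub     : cycles/s allocated to each subtask (same order as w_sub)
    se_k      : instantaneous uplink SE in bit/s/Hz
    bandwidth : transmission bandwidth B in Hz
    M, xi     : antennas per AP and quantization bits per real dimension
    c_fh      : fronthaul capacity in bit/s
    """
    t_tx = b_k / (bandwidth * se_k)                       # transmission latency b_k / R_k
    t_comp = max(w / f for w, f in zip(w_sub, f_sub))     # subtasks run in parallel
    t_fh = 2 * b_k * M * xi / c_fh                        # fronthaul latency
    return t_tx + t_comp + t_fh

def meets_deadline(latency, L_k):
    """Latency constraint: the effective latency must not exceed L_k."""
    return latency <= L_k
```

For instance, with the hypothetical values `b_k = 1e6`, `w_sub = [2e8, 1e8]`, `f_sub = [1e9, 2e9]`, `se_k = 2`, `bandwidth = 20e6`, `M = 4`, `xi = 8` and `c_fh = 1e10`, the slowest subtask dominates the computational term, and the constraint can be checked against any target `L_k` with `meets_deadline`.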

A. Joint Power and Computational Resource Allocation (JPCA)
We jointly allocate the users' transmit powers and the remote computational resources assigned to the users, aiming at simultaneously minimizing the total uplink transmit power and maximizing the sum SE. These are two conflicting objective functions of a MOOP which will be treated through the classical scalarization technique. In particular, the MOOP is converted into a SOOP by designing a single goal function that reflects a pre-determined trade-off between the objectives. To this end, let us first introduce the vector of the uplink powers $\mathbf{p} = [p_1 \cdots p_K]^T$ and the set $\mathcal{F}$ containing the allocated computational resources, that is, the values $f^{\mathrm{AP}}_{l,k}(i)$ and $f^{\mathrm{CPU}}_k(i)$ for all the users, subtasks and MEC servers. The SOOP for the proposed JPCA can be formulated as problem (9), with objective function (9a) and constraints (9b)-(9h). In problem (9), constraints (9d) and (9e) ensure that the allocated computational cycles do not exceed the computational capacity of the CPU and the APs, respectively, while constraint (9f), where $u(\cdot)$ denotes the unit-step function, ensures that each computational subtask is executed in one processing unit only, i.e., either at the CPU or in one of the APs. Constraint (9h) ensures that the allocated cycles are positive real numbers, and possible non-integer optimal solutions are then rounded (i.e., continuous relaxation). Moreover, the normalized weights $\varpi_p$ and $\varpi_{se}$ of the objective function are defined in (10) in terms of constants denoting reference instantaneous SEs, attained by a pre-determined uplink power allocation (e.g., setting $p_k = p_{\max}$, $\forall k$), and of $\omega_p, \omega_{se} \in [0, 1]$, so that each term of the objective function (9a) is dimensionless and takes on values in the interval $[0, 1]$. These weights determine a tradeoff between the minimization of the total transmit power and the maximization of the sum SE, hence they indirectly act on the effective offloading latency, which is given by the l.h.s. of the constraint (9b). Problem (9) is clearly non-convex with respect to $\mathbf{p}$ due to the non-convexity (non-concavity) of the latency constraint (9b) and the minimum SE constraint (9c). Moreover, the non-linear constraint (9f) makes the optimization problem a mixed integer one. To solve the above problem, we resort to the alternating optimization approach, i.e., we first set some initialization values for $\mathbf{p}$ and $\boldsymbol{\nu}$ and solve the problem with respect to $\mathcal{F}$, and then, for the obtained value of $\mathcal{F}$, we solve the problem with respect to $\mathbf{p}$ and $\boldsymbol{\nu}$. The process is iterated until the value attained by the objective function converges and/or a maximum number of iterations has been reached. Notice that, since at each step the value of the optimized variable is updated only if it leads to a smaller value of the objective function, and since the objective function is bounded from below, the procedure provably converges.
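The convergence argument above (updates are accepted only if they do not increase a lower-bounded objective) can be illustrated with a generic alternating-minimization skeleton. The sketch below is not the JPCA algorithm: it uses a hypothetical two-variable toy objective with closed-form block updates, purely to show the accept-if-better rule and the resulting monotone sequence of objective values.

```python
def alternating_minimization(objective, update_x, update_y, x, y,
                             tol=1e-9, max_iter=100):
    """Generic two-block alternating minimization with an accept-if-better rule."""
    history = [objective(x, y)]
    for _ in range(max_iter):
        x_new = update_x(y)
        if objective(x_new, y) <= objective(x, y):   # keep x only if not worse
            x = x_new
        y_new = update_y(x)
        if objective(x, y_new) <= objective(x, y):   # keep y only if not worse
            y = y_new
        history.append(objective(x, y))
        if history[-2] - history[-1] < tol:          # objective has converged
            break
    return x, y, history

# Toy objective (hypothetical): f(x, y) = (x*y - 1)^2 + 0.1*(x^2 + y^2).
def toy_objective(x, y):
    return (x * y - 1.0) ** 2 + 0.1 * (x ** 2 + y ** 2)

# Exact minimizer of the toy objective over one variable with the other fixed,
# obtained by setting the partial derivative to zero.
def toy_block_update(other):
    return other / (other ** 2 + 0.1)

x_opt, y_opt, values = alternating_minimization(
    toy_objective, toy_block_update, toy_block_update, x=2.0, y=0.5)
```

By construction `values` is non-increasing, which mirrors the monotonicity argument used for the alternating optimization of problem (9); as in the paper, only convergence of the objective value is guaranteed, not global optimality in general.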
1) Optimization with respect to $\mathcal{F}$: We focus on the problem of determining the frequencies in $\mathcal{F}$ for fixed values of $\mathbf{p}$ and $\boldsymbol{\nu}$. First of all, notice that the objective function in (9) does not depend on $\mathcal{F}$. The purpose of the optimization with respect to the computational cycles is thus to make the second term in (9b) as small as possible, so as to make the latency constraint as loose as possible and to let the subsequent optimization with respect to $\mathbf{p}$ and $\boldsymbol{\nu}$ be carried out on a wider search domain. Based on the above reasoning, we consider problem (11), whose constraint (11b) holds $\forall k$, $\forall j = 1, \ldots, T_k$, and whose constraints (11c) are (9d), (9e), (9f), (9h). In this formulation, constraint (11b) descends from, and is equivalent to, constraint (9b). An equivalent formulation of problem (11) is problem (12), stated in terms of the variable $t$. Problem (12) is a feasibility program; specifically, the goal is to find the optimal frequencies in $\mathcal{F}$ that minimize the value of $t$ and such that problem (12) admits a non-empty feasible set. This can be accomplished through the bisection method [39]. In particular, feasibility should be first checked assuming that $t$ is unboundedly large, i.e., $t \to +\infty$, which makes constraints (12b) inactive. If the problem turns out to be feasible with $t \to +\infty$, then the bisection method can be applied; specifically, elaborating on the constraints (12b) and (12c), the values in (13) can be used as the start and the end of the initial search interval on $t$. Notice that without constraint (9f), (12) would be a simple linear program which could be solved with any off-the-shelf optimization routine. The presence of (9f), instead, requires further efforts. For a preassigned value of $t$, namely for each iteration of the bisection algorithm, we consider the associated problem (14), with constraints (14b), holding $\forall k$, $\forall j = 1, \ldots, T_k$, and (14c), i.e., (9d), (9e), (9f), (9h), to ascertain the feasibility of the constraints in (12). Problem (14) aims at finding both the optimal user-to-MEC-server association for the computation offloading and the optimal amount of remote computational resources to be assigned. This problem can be cast as a multiple knapsack problem. Firstly, we introduce $\dot{f}_{\ell,k}(j)$, $\ell \in G_k$, to denote the computational resources that can be allocated to user $k$ for offloading its subtask $j$ on MEC server $\ell$. Hence, $\dot{f}_{\ell,k}(j)$ is equal to either $f^{\mathrm{CPU}}_k(j)$ or $f^{\mathrm{AP}}_{l,k}(j)$, with $l \in O_k$. Then, we introduce $\mathcal{X} = \{x_{\ell,k,j} \in \{0, 1\} : k = 1, \ldots, K;\ j = 1, \ldots, T_k;\ \ell \in G_k\}$ to denote the set of the binary variables $\{x_{\ell,k,j}\}$ mathematically handling the user-to-MEC-server association. Hence, the amount of computational resources effectively allocated to user $k$ for its subtask $j$ is given by $\sum_{\ell \in G_k} \dot{f}_{\ell,k}(j)\, x_{\ell,k,j}$, with $x_{\ell,k,j} \in \{0, 1\}$ and $\sum_{\ell \in G_k} x_{\ell,k,j} = 1$, since each subtask is executed either at the CPU or in one of the APs. Therefore, problem (14) can be rewritten as problem (15),
and F is redefined as Secondly, we avoid optimizing the amount of computational resources in (15) by setting ḟℓ,k (j) to a constant as By doing so, constraint (15b) becomes inactive, and the amount of computational resources to assign to each user will be eventually determined at the end of the bisection algorithm, being possibly dependent on the smallest value of t.While, for each iteration of the bisection algorithm, we only optimize the user-to-MEC-server offloading association.Hence, problem (15) can be reformulated as the following multiple knapsack problem: where ḟℓ denotes the computational capacity of the MEC server ℓ, and constraint (16c), originally with equality, has been relaxed.In the above problem, the knapsacks are the MEC servers at the APs and at the CPU, the knapsacks' capacity is the number of CPU cycles available at each MEC server, the objects to be put in the knapsacks are the computational loads to be offloaded, and, finally, the weight (and profit) of the generic computational task is given by ξ k,j (t, p).Problem ( 16) can be solved using some of the methods available in the literature, see for instance [40,Chapter 6].Once the problem has been solved, the values of the output variables in X provide the sought allocation of the CPU cycles to the tasks.For instance, if, for certain values t = t * , ℓ = ℓ * , k = k * and j = j * we have x ℓ * ,k * ,j * = 1, this means that subtask j * of user k * is to be executed at the MEC server ℓ * , using ξ k * ,j * (t * , p) cycles per second.Hence, the optimal F is obtained upon the optimal X and value of t at the last iteration of the bisection method as 2) Optimization with respect to p and ν: Let us assume now that the elements in F are fixed and let us solve problem (9) with respect to p and ν.Basically, we have to minimize the objective function in (9) with respect to p and ν and with constraints (9b), (9c) and (9g).The problem is not convex due to the spectral efficiency expression SE k (p) in 
constraints (9b) and (9c).We thus resort to sequential convex programming, that is an iterative optimization framework wherein in each iteration we optimize a related convex approximation of the original problem.Notice that the uplink instantaneous SE for user k can be expressed as where num k (p) and den k (p) describe the numerator and the denominator of (5), respectively, which are functions of the power coefficients.As log 2 (•) is increasing and the summation preserves concavity, the r.h.s. of ( 17) is the difference of two concave functions.Recalling that any concave function is upper-bounded by its Taylor expansion around any given point p (0) , a concave lower-bound of SE k (p) is obtained as Hence, constraints (9b) and (9c) can be approximated and convexified by taking for any feasible choice of p (0) .The arguments above hold if the receive combining vector is independent of the uplink powers.This is not true in general, as for the P-MMSE combining scheme in (6).In this case, SE k (p, p (0) ) is still a non-linear function of the uplink powers.This issue can be tackled by treating the combining vectors at the n-th iteration of the SCA optimization framework as constant with respect to the current uplink transmit powers p (n) , and being exclusively function of p (n−1) .Hence, the problem to be solved at the nth iteration of the proposed SCA method can be formulated as minimize where ̟ p and ̟ se are set as in (10) with SE ) emphasizes that the receive combining vectors, involved in the expression of the SE, are constant with respect to the current transmit powers p (n) (i.e., the optimization variables), and are computed upon the optimal values of the transmit powers at the previous iteration of the SCA algorithm, namely p (n−1) .For any iteration n of the SCA method, SE k (p (n) , p (n−1) ) ) is a suitable convex approximation of SE k (p (n) ), as the following properties are fulfilled [41]: According to the theory in [41], by virtue of the properties 
(20a), (20b), the sequence of the values attained by the objective function (9a) at the optimal points of each iteration of the SCA algorithm is monotonically decreasing in the iteration number and converges to a finite limit. Moreover, due to property (20c), the optimal solution of the SCA algorithm at convergence satisfies the Karush-Kuhn-Tucker (KKT) conditions of problem (9).
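The lower-bounding step can be illustrated numerically. The following is a minimal sketch, not the expression in (18): it assumes a fixed receive combining, so that the numerator and denominator of the SINR are affine in the powers, and it omits any pre-log factor. The names `gain`, `cross` and `noise` are hypothetical stand-ins for the channel-dependent coefficients.

```python
import math

def se(p, k, gain, cross, noise):
    """log2(1 + num_k(p) / den_k(p)) with num and den affine in p (assumed model)."""
    num = gain[k] * p[k]
    den = noise + sum(cross[k][i] * p[i] for i in range(len(p)) if i != k)
    return math.log2(1.0 + num / den)

def se_lower_bound(p, p0, k, gain, cross, noise):
    """Concave surrogate: keep log2(num + den), linearize log2(den) around p0."""
    idx = [i for i in range(len(p)) if i != k]
    num = gain[k] * p[k]
    den = noise + sum(cross[k][i] * p[i] for i in idx)
    den0 = noise + sum(cross[k][i] * p0[i] for i in idx)
    # First-order Taylor expansion of the concave term log2(den) at p0;
    # it upper-bounds log2(den), so subtracting it yields a lower bound.
    slope = sum(cross[k][i] * (p[i] - p0[i]) for i in idx) / (den0 * math.log(2))
    return math.log2(num + den) - (math.log2(den0) + slope)

if __name__ == "__main__":
    gain, cross, noise = [1.0, 0.8], [[0.0, 0.1], [0.2, 0.0]], 0.01
    p0, p = [0.05, 0.05], [0.10, 0.02]
    for k in range(2):
        print(k, se_lower_bound(p, p0, k, gain, cross, noise), se(p, k, gain, cross, noise))
```

The surrogate coincides with the exact value at $p = p^{(0)}$ and never exceeds it elsewhere, which is the behaviour the SCA convergence argument relies on.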
Algorithm 1, to be run at the CPU, summarizes the proposed alternating optimization strategy for sub-optimally solving problem (9).
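The bisection stage of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions rather than the method used in the paper: the multiple knapsack instance is checked with a simple worst-fit-decreasing heuristic (which may declare a feasible instance infeasible), every server is assumed eligible for every subtask, and `demand` is a hypothetical callable mapping $t$ to the per-subtask loads, assumed such that feasibility is monotone in $t$.

```python
def pack(loads, capacities):
    """Worst-fit-decreasing packing: returns {task: server index} or None."""
    residual = list(capacities)
    assignment = {}
    for task, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        # Place the next largest load on the server with most residual capacity.
        best = max(range(len(residual)), key=lambda s: residual[s])
        if residual[best] < load:
            return None  # heuristic failure, treated here as infeasibility
        residual[best] -= load
        assignment[task] = best
    return assignment

def bisect_t(demand, capacities, t_lo, t_hi, tol=1e-3):
    """Smallest t (within tol) for which the packing succeeds.

    t_lo plays the role of t0 (infeasible side) and t_hi of t1 (feasible side).
    """
    if pack(demand(t_hi), capacities) is None:
        return None  # declare the problem infeasible
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if pack(demand(t), capacities) is None:
            t_lo = t
        else:
            t_hi = t
    return t_hi, pack(demand(t_hi), capacities)

if __name__ == "__main__":
    # Toy loads that shrink as t grows (hypothetical demand model).
    print(bisect_t(lambda t: {0: 10.0 / t, 1: 6.0 / t}, [4.0, 4.0], 0.1, 10.0))
```

An exact or Lagrangian-relaxation-based knapsack solver can replace `pack` without changing the outer loop.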

B. Algorithm initialization
We now discuss how to initialize the alternating optimization procedure in Algorithm 1 with a suitable power vector $p^{(0)}$. We adopt the following procedure: first, we consider an allocation of the computational resources in $F$ that minimizes the maximum SE requirement, so that the burden of the latency constraint on the air interface of the system is minimized; next, we compute the transmit powers needed to achieve the resulting minimum SE requirement for all the users. The obtained transmit powers are then used to initialize the proposed JPCA algorithm.

Algorithm 1 Alternating optimization for problem (9)
Input: Any choice of feasible transmit powers $p^{(0)}$, $M_{\max}$, $\epsilon$;
Check feasibility of (16) with $t = +\infty$;
if (16) is unfeasible then
Exit procedure and declare the problem unfeasible;
Set $t_0$ and $t_1$ as in (13);
10: repeat %% Bisection algorithm
11: $t \leftarrow (t_0 + t_1)/2$;
12: Solve problem (16) with current value of $t$;
13: if (16) is unfeasible then $t_0 \leftarrow t$; else $t_1 \leftarrow t$;
Set $F$ as resulting from the solution of (16) with $t = t_1$;
%% Updating $p$ and $\nu$
16: Initialize $n \leftarrow 1$; $p^{(0)} \leftarrow p^{(n)}$;
17: Initialize $\nu^{(0)} \leftarrow [\mathrm{SE}_1(p^{(0)}), \ldots, \mathrm{SE}_K(p^{(0)})]^T$;
18: repeat %% SCA algorithm
19: Let $p^\star$, $\nu^\star$ be the optimal solutions of problem (19);
20: until convergence
22: end if

To begin with, we notice that constraint (9b) can be written as
The above problem has the same structure as (12) and can be solved following the same procedure outlined in the previous subsection. Let us denote by $F^\star$ the solution to problem (23). We are now ready to compute the transmit vectors that fulfill the minimum SE requirement.

Algorithm 2 Standard power control assuming P-MMSE
Input: $\{\Upsilon\}$, $\{C_k\}$, $\{D_k\}$, $\{\hat{h}_k\}$;
1: Initialize $n = 0$, $\chi = 1$; $p^{(0)}$; $v(p^{(0)})$;
2: while $\chi > 0.005$ do
if $G$ is non-negative and $\rho < 1$ then
6: $n \leftarrow n + 1$;
7: Compute $v_k(p^{(n)})$ according to (6), $\forall k$;
else
Exit procedure and declare the problem unfeasible;
11: end if
12: end while
Output: $p^\star \leftarrow p^{(n)}$;

Assuming that the computational rates in (7) are those corresponding to $F^\star$, the latency constraint requires that

which represents a QoS requirement. The above inequality translates to the instantaneous SINR requirement in (25).
Hence, the instantaneous SINR requirement in (25) can be rewritten as the vector inequality (26), where $\Upsilon = \mathrm{diag}(\gamma_1, \ldots, \gamma_K)$, and

Hence, a set of nonnegative uplink powers can be determined by capitalizing on the requirement in (26). The set of SINR targets $\{\gamma_k\}$ is feasible if and only if all the diagonal elements of $G$ are nonnegative, and the Perron-Frobenius eigenvalue of the matrix $\Upsilon G^{-1} Z$, denoted by $\rho$, is real and nonnegative, with $\rho < 1$ [42]. If these conditions are satisfied, then $I(p) = \Upsilon G^{-1}(Zp + \sigma^2 u)$ is a standard interference function [43], and an optimal solution for the uplink transmit powers is obtained iteratively through the standard power control algorithm [43] as $p^{(n)} = I(p^{(n-1)})$, for any given initial choice $p^{(0)}$. Algorithm 2 specifies the steps of the standard power control algorithm based on sequential optimization, assuming P-MMSE. If Algorithm 2 converges to an optimal solution, say $p^\star$, then this solution can be used as the initial feasible choice for Algorithm 1, that is, $p^{(0)} = p^\star$. Whenever a set of feasible uplink transmit powers is found (i.e., line 7 of Algorithm 2), the receive combining vectors $\{v_k\}$ must be updated accordingly, so that the matrix $\Upsilon G^{-1} Z$ at the next iteration is properly computed. If this two-stage procedure fails to find a feasible set of uplink powers and computational rates, then a non-empty feasible set for problem (19) can still be enforced by a proper admission control [44], [45].
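The fixed-point iteration $p^{(n)} = I(p^{(n-1)})$ can be sketched as below. This is a simplified illustration, not Algorithm 2 itself: the receive combining is kept fixed (so $G$ and $Z$ do not change across iterations), $G$ is taken as diagonal with entries `g`, $u$ is assumed to be the all-ones vector, the stopping metric $\chi$ is assumed to be the relative change of the power vector, and the power cap `p_max` used to detect divergence is our own addition.

```python
def standard_power_control(gamma, g, z, noise, p_init, p_max,
                           tol=0.005, max_iter=1000):
    """Iterate p <- I(p), with I_k(p) = gamma_k * (sum_i z[k][i] * p_i + noise) / g[k].

    Returns the converged powers, or None if a power exceeds p_max or the
    iteration does not settle within max_iter steps.
    """
    p = list(p_init)
    num_users = len(p)
    for _ in range(max_iter):
        p_new = [
            gamma[k] * (sum(z[k][i] * p[i] for i in range(num_users)) + noise) / g[k]
            for k in range(num_users)
        ]
        if max(p_new) > p_max:
            return None  # SINR targets not achievable within the power budget
        # Relative change between consecutive iterates (assumed definition of chi).
        chi = max(abs(a - b) for a, b in zip(p_new, p)) / max(max(p_new), 1e-30)
        p = p_new
        if chi <= tol:
            return p
    return None

if __name__ == "__main__":
    z = [[0.0, 0.1], [0.1, 0.0]]
    print(standard_power_control([1.0, 1.0], [1.0, 1.0], z, 0.01, [0.0, 0.0], 0.1))
```

In the two-user toy case above the iteration settles close to $0.01/0.9$ for both users, the point at which each SINR equals its target.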

A. Co-located Massive MIMO System Model and Resource Allocation
A co-located massive MIMO system can be seen as a special case of CF-mMIMO wherein each user is served by only one of the few deployed base stations (BSs), each of which is equipped with many antennas. An achievable uplink SE for user $k$ served by BS $l$ is given by (28)-(29) for an arbitrary receive combining vector $v_{lk} \in \mathbb{C}^M$. The effective uplink SINR, $\mathrm{SINR}_{lk}$, is maximized by the Local-MMSE (L-MMSE) receive combining in (30). L-MMSE is not scalable but can be used as a benchmark, since it constitutes the optimal combining scheme for cellular networks [9]. For co-located massive MIMO, we can reasonably assume that only the serving BS offers computational facilities to its user terminals. Hence, using the same notation as in (9), we can formulate the JPCA problem for cellular networks as problem (31), where $\mathrm{SE}_{lk}$ represents the instantaneous SE, that is, the value attained by (28) without expectation, and $L^{\mathrm{cell}}_k$ denotes the maximum tolerable latency for user $k$ in the cellular setup. This latency value is presumably larger than its cell-free counterpart as it does not include the delay produced by the fronthaul signalling (i.e., steps 2-5 in Fig.
1). Moreover, in (31), $f^{\mathrm{BS}}_l$ indicates the computational capability of BS $l$, and $\zeta$ denotes a $K \times 1$ vector of optimization variables whose $k$-th element, $\zeta(k)$, represents the amount of computational resources allocated to user $k$ by its serving BS $l$, that is

is an auxiliary binary vector, where the $k$-th element is 1 if BS $l$ serves user $k$, and 0 otherwise. Lastly, $\nu_k$ represents the minimum instantaneous SE for user $k$. Problem (31) can be convexified via sequential optimization, using a methodology similar to that in (18). Hence, the optimization problem at the $n$-th iteration of the SCA method can be formulated as problem (32), where $\varpi_p$ is set as in (10), and $\mathrm{SE}_{lk}(p^{(n)}, p^{(n-1)})$ is a concave lower-bound of $\mathrm{SE}_{lk}(p^{(n)})$ around the point $p^{(n-1)}$, obtained by using the same methodology as in (18). The SCA algorithm is run in a centralized fashion by a network entity, e.g., one of the BSs, and its convergence is guaranteed, as $\mathrm{SE}_{lk}(p^{(n)}, p^{(n-1)})$ is a suitable convex approximation of $\mathrm{SE}_{lk}(p^{(n)})$. As for feasibility, problem (32) admits a non-empty feasible set if the conditions in (33) hold, where $R_{lk} = B \times \mathrm{SE}_{lk}$ is the uplink instantaneous rate of user $k$ served by BS $l$, and $\mathcal{K}_l$ is the set of the users served by BS $l$. The conditions in (33) are necessary but not sufficient, because the interference-limited scenario makes simultaneously maximizing the per-user SEs intractable.

B. Heuristic Resource Allocation for Distributed Network Topology
In this section, we propose an alternative approach to the JPCA for cell-free massive MIMO, which consists in heuristically allocating the MEC server computational resources to the users according to a pre-determined metric, and then optimizing with respect to $p$ and $\nu$ as described in Section III-A. Hence, unlike the JPCA, such a heuristic resource allocation does not jointly optimize the uplink power consumption and the allocated computational resources. However, it is a low-complexity alternative to the JPCA, which requires solving a multiple knapsack problem, an NP-hard problem in the strong sense [40]. The pseudo-code of the proposed heuristic allocation of uplink powers and computational resources is reported in Algorithm 3. For this heuristic scheme, we assume that a subtask offloaded by any user can be processed either at the CPU MEC server or at one of the AP MEC servers. Firstly, the subtasks are sorted in descending order by the metric $\mu_{k,j}$, $j = 1, \ldots, T_k$, which represents the computational demand of user $k$ for subtask $j$, related to its latency requirements and uplink rate, conditioned on a pre-determined power allocation. Then, each task is offloaded to the MEC server with the most available computing resources, either at one of the APs or at the CPU. The fractions of computational resources assigned to each user's subtask are further scaled so as to saturate the computational capabilities of those APs and (possibly) the CPU involved in the offloading process. Once the set $F$ is determined, Algorithm 3 concludes by solving problem (19) as described in Section III-A. Hence, this scheme heuristically allocates the remote computational resources and only optimizes with respect to $p$ and $\nu$, unless the computational capabilities at the MEC servers are insufficient, i.e., there exists at least one subtask $j$ of user $k$ such that $\mu_{k,j} > \max_{\ell \in \mathcal{G}_k} f_\ell$, with $f_\ell$ being the "online" available computing resources at the CPU and at each of the AP MEC servers. Notice that if the computational resource allocation problem in Algorithm 3 is feasible, then the necessary but not sufficient condition for the feasibility of problem (19)

Algorithm 3 Heuristic allocation of uplink powers and computational resources
Input: $\{\mu_{k,j}\}$, $\{\dot{f}_\ell\}$;
1: Initialize $\dot{f}_{\ell,k}(j) = 0$, $\forall \ell$, $\forall k$, $\forall j$;
2: for each task in descending order by $\mu_{k,j}$ do
3:
else
Exit procedure and declare the problem unfeasible;
end if
8: end for
9: $\vartheta_\ell = \dot{f}_\ell / \sum_{k=1}^{K} \sum_{j=1}^{T_k} \dot{f}_{\ell,k}(j)$, $\forall \ell : \exists\, k, j.\ \dot{f}_{\ell,k}(j) \neq 0$;
10: $\dot{f}_{\ell,k}(j) = \vartheta_\ell \dot{f}_{\ell,k}(j)$, $\forall \ell \in \mathcal{G}_k$;
11: Initialize $n \leftarrow 1$; $p^{(0)} \leftarrow p^{(n)}$;
12: Initialize $\nu^{(0)} \leftarrow [\mathrm{SE}_1(p^{(0)}), \ldots, \mathrm{SE}_K(p^{(0)})]^T$;
13: repeat %% SCA algorithm
14: Let $p^\star$, $\nu^\star$ be the optimal solutions of problem (19);
15:
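The computational part of the heuristic (sorting by demand, greedy placement, and final rescaling) can be sketched as follows. This is an illustrative reading of the description above, not a transcription of Algorithm 3: the data layout (`mu`, `capacity`, `candidates`) is hypothetical, and the demands are taken as given inputs since the definition of the metric is not reproduced here.

```python
def heuristic_allocation(mu, capacity, candidates):
    """Greedy placement followed by per-server rescaling.

    mu:         {(k, j): demand in cycles/s} for subtask j of user k
    capacity:   {server: available cycles/s}
    candidates: {k: [servers eligible for user k]}
    Returns {(server, k, j): cycles/s}, or None if some demand cannot be met.
    """
    residual = dict(capacity)
    alloc = {}
    # Largest demands first, each placed on the eligible server with the
    # most residual computing resources.
    for (k, j), demand in sorted(mu.items(), key=lambda kv: -kv[1]):
        server = max(candidates[k], key=lambda s: residual[s])
        if residual[server] < demand:
            return None
        residual[server] -= demand
        alloc[(server, k, j)] = demand
    # Scale the allocations so that every server in use is saturated.
    used = {}
    for (server, _, _), f in alloc.items():
        used[server] = used.get(server, 0.0) + f
    return {key: f * capacity[key[0]] / used[key[0]] for key, f in alloc.items()}

if __name__ == "__main__":
    mu = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 2.0}
    capacity = {"cpu": 10.0, "ap0": 4.0}
    candidates = {0: ["cpu", "ap0"], 1: ["cpu", "ap0"]}
    print(heuristic_allocation(mu, capacity, candidates))
```

The rescaling preserves the proportions of the initial allocation on each server, so every subtask receives at least the rate it was first assigned.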

C. Small-Cell Implementation and Resource Allocation
With the term small-cell we indicate an instance of a cell-free network where each user receives communication service from only one of the APs and computational offloading service from either one AP or the CPU. For an arbitrary user $k$, it holds that $|\mathcal{M}_k| = 1$ and $|\mathcal{G}_k| = 1$, and $\mathcal{M}_k$ does not necessarily coincide with $\mathcal{G}_k$. A similar MEC-enabled architecture was advocated in [34], whose task offloading model is random and arbitrarily hinges on an offloading probability. Moreover, in [34] both uplink transmit powers and allocated computational resources are fixed rather than optimized. Conversely, we herein consider a deterministic, heuristic task offloading model which accounts for the user computational demands and the available remote computing resources. Specifically, the set $\mathcal{M}_k$ comprises the AP with the best average channel gain towards user $k$, while the set $\mathcal{G}_k$ comprises the MEC server with the most available computing resources according to Algorithm 3 (up to line 15). Importantly, since each user receives computational offloading service from only one MEC server, the offloading process is carried out on a task basis rather than on a subtask basis, namely for an arbitrary user $k$ it holds that $T_k = 1$. For the small-cell implementation, an achievable uplink SE for user $k$ served by AP $l$ is given by equations (28)-(29), which is maximized by the L-MMSE combining scheme in (30). Since data decoding is performed locally at the AP, there is no need for the AP to forward the uplink data signal $y$ to the CPU through the fronthaul network; instead, it can transmit the $b_k$ bits directly to the MEC server in charge of the offloading process. As $b_k/C_{\mathrm{FH}} \ll 2 b_k M \xi/C_{\mathrm{FH}}$, and the former is very small, we assume that L k = L k for the small-cell implementation.

V. SIMULATION RESULTS
We consider a coverage area of 1 km$^2$ served by a total number of antennas $N = LM = 400$. For the co-located massive MIMO setup, we choose $L = 4$ BSs, equipped with $M = 100$ antennas each, and deployed as a regular grid with intersite distance equal to 500 m. For the CF-mMIMO setup, we select $L = 100$ APs, equipped with $M = 4$ antennas each, and deployed as a regular grid with intersite distance equal to 100 m. For all the setups, a wrap-around simulation technique is used to remove the edge effects of the (nominal) coverage area. All the systems operate at 2 GHz carrier frequency, over a communication bandwidth $B = 20$ MHz. The receiver noise power is conventionally set to -94 dBm, while the maximum transmit power per user is $p_{\max} = 100$ mW. The TDD coherence block is $\tau_C = 200$ samples long, $\tau_d = 0$, and $\tau_p = 5$ samples is the uplink training duration. All the setups serve the same set of $K = 20$ users, which are uniformly distributed at random over the coverage area. A random realization of the users' locations defines a network snapshot, and determines a set of large-scale fading coefficients. These are computed according to the 3GPP Urban Microcell model defined in [46,. The channel correlation matrices $\{R_{lk}\}$ are generated by using the popular local scattering model [9, Sec. 2.5.3], assuming half-wavelength spaced ULAs and jointly Gaussian angular distributions of the multipath components around the nominal azimuth and elevation angles. The random variations in the azimuth and elevation angles are assumed to be independent, and the corresponding angular standard deviations (ASDs) are equal to 15°, which represents strong spatial channel correlation.
As $\tau_p < K$, pilots must be reused across users. To this end, we resort to the joint pilot assignment and AP (BS)-user association described in [9, Sec. 5.4], so as to ensure that users served by the same set of APs (same BS, for the co-located setup) are given orthogonal pilots. Concerning the power control, we assume that the initial choice for the feasible transmit powers of the SCA algorithm follows, for all the setups, the fractional power control strategy given by (34). Concerning the computation-offloading and latency model, for the cell-free setup we assume $f^{\mathrm{CPU}} = 10^{10}$ cycles/s, while $f^{\mathrm{AP}}_l$ are uniformly distributed random integers from the interval $[2, 4] \times 10^9$ cycles/s. The latency requirements are $L_k = 0.2$ s $\forall k$. The fronthaul capacity is $C_{\mathrm{FH}} = 10$ Gbps, and the number of bits for quantization is set as $\xi = 16$. For the co-located setup, we select $f^{\mathrm{BS}}_l = \left(\sum_{l'=1}^{L_{\mathrm{AP}}} f^{\mathrm{AP}}_{l'} + f^{\mathrm{CPU}}\right)/L_{\mathrm{BS}}$, where $L_{\mathrm{AP}}$ is the number of APs in the cell-free setup, while $L_{\mathrm{BS}}$ is the number of BSs in the co-located setup, that is, 100 and 4, respectively. This choice ensures the same amount of available computational resources over the simulation area for both setups. Lastly, the latency requirement for the users in the co-located setup is $L^{\mathrm{cell}}_k = 0.3$ s $\forall k$. Common to all the setups, the computational bits, $\{b_k\}$, are uniformly distributed random integers from the interval $[1, 4]$ Mbits, and the number of computation cycles needed to run the task itself is set as a linear function of $b_k$, that is, $w_k = \alpha b_k$, with $\alpha = 50$ cycles/bit [27]. Finally, the number of subtasks any user's task is divided into is an integer drawn uniformly at random from the interval $[1, 4]$. An instance of the multiple knapsack problem, involved in Algorithm 1, is sub-optimally solved in polynomial time (with respect to the total number of subtasks) by using the Lagrangian relaxation technique [40, Section 6.2.2] combined with the cross-entropy optimization method [47].
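For reference, the computation-offloading parameters listed above can be drawn as in the following sketch. The function name, the dictionary keys and the convention 1 Mbit $= 10^6$ bits are our own assumptions; the numerical values are those stated in the text.

```python
import random

def draw_setup(num_aps=100, num_bs=4, num_users=20, alpha=50, seed=0):
    """Draw one realization of the computational parameters of Section V."""
    rng = random.Random(seed)
    f_cpu = 10e9                                               # cycles/s at the CPU
    f_ap = [rng.randint(2, 4) * 1e9 for _ in range(num_aps)]   # cycles/s per AP
    # Per-BS capability chosen so that both setups have the same total resources.
    f_bs = (sum(f_ap) + f_cpu) / num_bs
    b = [rng.randint(1, 4) * 1e6 for _ in range(num_users)]    # task size in bits
    w = [alpha * bk for bk in b]                               # cycles per task
    subtasks = [rng.randint(1, 4) for _ in range(num_users)]   # T_k
    return {"f_cpu": f_cpu, "f_ap": f_ap, "f_bs": f_bs,
            "b": b, "w": w, "subtasks": subtasks}

if __name__ == "__main__":
    setup = draw_setup()
    print(setup["f_bs"], setup["b"][:3], setup["w"][:3], setup["subtasks"][:3])
```

By construction, `num_bs * f_bs` equals the sum of the AP and CPU capabilities, which is the equal-resources condition used to compare the two architectures.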

A. Performance Comparison between Network Architectures
Firstly, we focus on the radiated power consumption. In Fig. 2(a), we show the cumulative distribution function (CDF), obtained over 200 network snapshots, of the uplink transmit power per user, expressed in mW, resulting from Algorithm 1 and the SCA problem (32) for cell-free and co-located massive MIMO, respectively. With the label "Cell-free, Alg.3" we refer to the framework wherein uplink powers and computational rates are heuristically allocated according to Algorithm 3. For the small-cell implementation, we consider two cases: (i) the label "Small-cell" indicates the framework wherein uplink powers and computational rates are not optimized, as in [34]. The uplink powers result from (34), while the computational rates are assigned according to the approach described in lines 1-15 of Algorithm 3, with $T_k = 1$; (ii) the label "Small-cell + Alg.3" indicates the framework wherein uplink powers and computational rates are heuristically allocated according to Algorithm 3, with $T_k = 1$. In Fig. 2(a), we consider the configuration $\omega_p = 1$, $\omega_{se} = 0.5$, which applies to all the schemes except "Small-cell".
Numerical results reveal a dramatic transmit power saving for the CF-mMIMO users as compared to the co-located massive MIMO and the small-cell users. With $\omega_p = 1$, $\omega_{se} = 0.5$, which is the configuration that prioritizes the power saving over the transmission latency, we observe that at high percentiles, where the transmit power consumption is more significant, the CF-mMIMO users can considerably reduce their transmit power as compared to the co-located massive MIMO and small-cell users. In co-located massive MIMO, the users with worse channel conditions, presumably at the cell edge, need to employ more power to receive the required computational offloading service. In small-cell implementations, instead, the users would benefit from a power optimization rather than from a fixed power control strategy, such as fractional power control. To achieve excellent performance in small-cell implementations, the proposed Algorithm 3 should be employed for a proper resource allocation. Fig. 2(a) also highlights that the proposed heuristic allocation in Algorithm 3 performs as well as the proposed JPCA in Algorithm 1 in terms of power consumption. At low percentiles, presumably corresponding to the users with better channel conditions and small computational demands, the performance gap between co-located massive MIMO and CF-mMIMO (including its instance "Small-cell + Alg.3") reduces. Importantly, our JPCA scheme in CF-mMIMO is able to guarantee fairness among the users in terms of transmit power consumption. In Fig. 2(b) we consider configurations giving equal weight and more weight to the SE with respect to the power consumption, namely (ii) $\omega_p = \omega_{se} = 1$ and (iii) $\omega_p = 0.5$, $\omega_{se} = 1$, respectively. The levels of transmit power are larger than those attained by the previous configuration, so as to guarantee larger SEs and thereby reduce the latency of the offloading process. Interestingly, the performance gap between CF-mMIMO and co-located massive MIMO increases as a higher SE is required. The macro-diversity gain enables CF-mMIMO to provide SE levels comparable to those of co-located massive MIMO, yet with lower uplink power consumption.
To better explain the previous results, we now focus on the amount of computational resources allocated to the users, that is, $\{f_k(i)\}$. In Fig. 3(a), we show the CDF of the computational resources allocated to the single user, expressed in GHz ($10^9 \times$ cycles/s), while Fig. 3(b) shows the CDF of the total allocated computational resources, computed as

As we experienced negligible performance differences between the three considered weight configurations, we only report the results achieved with $\omega_p = \omega_{se} = 1$. The amount of computing resources allocated by Algorithm 3 is significantly larger than that allocated by Algorithm 1 for CF-mMIMO and small-cell implementations. Indeed, the fine-tuning of the allocated computational rates described by lines 16-19 of Algorithm 3 is carried out to utilize all the residual available resources after a first, conservative, feasible allocation based on the metric $\mu_{k,j}$ (the latter constitutes the "Small-cell" resource allocation approach). Allocating more computing resources reduces the computational latency, which in turn allows the transmission latency to increase, by lowering the uplink powers, while still meeting the latency constraint. This explains why "Small-cell + Alg.3" allows higher levels of power saving than "Cell-free, Alg.1", and how the performance gap, in terms of power consumption, between the nearly-optimal Algorithm 1 and the heuristic Algorithm 3 is filled. Moreover, we recall that the effective latency constraint for the CF-mMIMO setup is stricter than that of its small-cell counterpart due to the fronthaul latency contribution, and this entails a higher power consumption in CF-mMIMO to further reduce the transmission latency. In co-located massive MIMO, the MEC servers at the BSs offer huge computational power, which is fully exploited by the users, especially those with higher computational demands and poor channel conditions, for which drastically reducing the computational latency is the only way to fulfill the
latency requirement. Changing perspective, Fig. 4(a) shows the amount of computing resources allocated per MEC server, including APs and CPU. As already mentioned, the "Small-cell" resource allocation is conservative and leads to a misuse of the computational resources, which in turn results in wasted uplink power. Notice that the "Small-cell" approach attains the minimum energy consumption at the MEC servers (which is proportional to the cube of the computational rates), in line with the objective in [34]. The resource allocation for CF-mMIMO via Algorithm 1 leads to excellent uplink power savings with a relatively small amount of allocated computational resources per MEC server. On the other hand, the uplink power savings achieved by Algorithm 3, both for CF-mMIMO and small-cells, can only be obtained by increasing the computational rates, and hence the energy consumption, at the MEC servers. In the co-located setup, the MEC servers basically work at full processing capacity to guarantee the latency requirements. The effective latency experienced by the users due to the offloading process is another relevant measure of the effectiveness of the JPCA scheme, and it is shown in Fig. 4(b). First, we recall that the latency requirements of the cell-free users account for the delay of the data forwarding over the fronthaul network, i.e., step 2 of Fig.
1, and are thus effectively stricter than those of the co-located and small-cell users. CF-mMIMO with Algorithm 1 is able to fulfill the latency requirements by a large margin compared to "Small-cell" and co-located massive MIMO, showing the potential to support even stricter requirements. On the other hand, as Algorithm 3 allocates far more computing resources to the users, the computational latency can be remarkably reduced so as to minimize the overall latency experienced by the users. In this regard, the performance of CF-mMIMO and small-cells is almost equivalent when Algorithm 3 is employed. Lastly, notice that the "Small-cell" strategy is designed to satisfy the latency constraint with equality. The choice of the parameters $\{\omega_p, \omega_{se}\}$ clearly affects the effective offloading latency. The latency increases when the uplink power minimization is prioritized over the SE maximization. This suggests that the transmission latency is dominant over the computational latency in this scenario. Importantly, CF-mMIMO combined with Algorithm 1 can simultaneously guarantee significant transmit power saving and low offloading latency, despite the additional delay due to the transmissions over the fronthaul. Fig.
5(a-b) show the transmit energy consumption in J/Mbit/user, given by $E_k = p_k/(B \times \mathrm{SE}_k)$, and the computation offloading efficiency (OE), respectively. We define the OE as in (35), where $L^{\mathrm{eff}}_k$ denotes the effective latency experienced by user $k$ due to the offloading process, i.e., the LHS of the latency constraint in (7) and (31b) for cell-free and co-located massive MIMO, respectively, while $L^{\mathrm{req}}_k$ denotes the latency requirement for user $k$, which is equal to $L_k$ for CF-mMIMO and small-cells, and equal to $L^{\mathrm{cell}}_k$ for co-located massive MIMO. This metric relates the optimization variables of interest to each other, and measures the amount of uplink transmit power needed for 1 GHz of computational resources allocated at the MEC servers, also accounting for how much shorter the effective latency is as compared to the requirement. Hence, the larger this metric is, the more efficient the offloading process is. We observe that cell-free users can save a significant amount of transmit energy with respect to both the co-located and the small-cell users. This confirms the outstanding ability of CF-mMIMO (regardless of the resource allocation algorithm) to simultaneously guarantee low transmission latency and significant transmit energy savings. In addition, fairness among the users is ensured, unlike in small-cell and co-located massive MIMO. The energy consumption gap between CF-mMIMO and small-cell is due to the macro-diversity gain provided by the former, and increases as we prioritize the power minimization over the SE maximization. Importantly, the OE attained by CF-mMIMO combined with Algorithm 1 is far superior to that of any other considered approach. This confirms the advantage of the nearly-optimal JPCA strategy over a disjoint radio and computational resource allocation (i.e., Algorithm 3) as well as over different network setups, namely small-cells and co-located massive MIMO, despite the stricter latency requirements. Clearly, the OE increases when
prioritizing the SE maximization over the uplink power minimization.
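As a quick sanity check of the units of the transmit energy metric $E_k = p_k/(B \times \mathrm{SE}_k)$, the snippet below evaluates it for one illustrative operating point. The function name and the numerical values are hypothetical, and 1 Mbit is taken as $10^6$ bits.

```python
def transmit_energy_j_per_mbit(p_watt, bandwidth_hz, se_bit_s_hz):
    """Energy spent on the uplink per transmitted Mbit: p / (B * SE), scaled to Mbit."""
    rate_bit_s = bandwidth_hz * se_bit_s_hz   # uplink rate in bit/s
    return p_watt / rate_bit_s * 1e6          # J/bit -> J/Mbit

if __name__ == "__main__":
    # 10 mW over 20 MHz at 2 bit/s/Hz gives a 40 Mbit/s uplink.
    print(transmit_energy_j_per_mbit(0.010, 20e6, 2.0))
```

For these values the rate is 40 Mbit/s, so transmitting 1 Mbit takes 25 ms at 10 mW, i.e., $2.5 \times 10^{-4}$ J.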
Finally, we investigate the user transmit power and the effective offloading latency as the ratio $\omega_p/\omega_{se}$ varies. By increasing this ratio, the SCA approach solving the optimization with respect to $p$ and $\nu$ prioritizes the minimization of the per-user uplink power, which is clearly shown by the monotonically decreasing behaviour of the curves in Fig. 6(a). The uplink power per user achieved in co-located massive MIMO is quite sensitive to the ratio $\omega_p/\omega_{se}$ and attains values below 10 mW only for $\omega_p/\omega_{se} \geq 1$, while the power saving in CF-mMIMO and small-cell via Algorithm 3 in the region $\omega_p/\omega_{se} < 1$ is remarkable. As for the effective offloading latency experienced by the users, Fig. 6(b) shows that CF-mMIMO and small-cell implementations perform equally well, while the gap with respect to the co-located setup becomes very large. The offloading latency of co-located massive MIMO is quite sensitive to increases in $\omega_p/\omega_{se}$, while for CF-mMIMO and small-cell the effective latency increases only mildly.

B. Pareto Frontier of the proposed MOOP
As already mentioned in Section III-A, there is no unique solution for the SOOP in (9), but there exists a set of bounded trade-off Pareto optimal solutions, that is, Pareto optimal solutions that enable improvements in some objectives with bounded trade-offs in others. We first reformulate the objective of problem (9) , where $\omega = \omega_{se}/\omega_p$ and $\mathrm{const} = p_{\max}/\max_k \mathrm{SE}^{(0)}_k$ are obtained from (10). Notice that the constant factor $\varpi_p$ has no effect on the minimization, thus it can be removed from the objective. Finally, a sub-optimal Pareto frontier is obtained by iteratively solving the SOOP according to Algorithm 1 for several values of $\omega$ and plotting the corresponding objective values separately, as shown in Fig. 7 (black curve). The results in Fig. 7 are obtained by considering the setup in Section V, for one random realization of the APs' and users' locations, and averaging over 200 random realizations of the small-scale fading. The Pareto frontier reveals the trade-off between the total transmit power minimization and the sum SE maximization according to the selection of $\omega$ which, in turn, affects the effective latency experienced by the users (red curve). The value of the design parameter $\omega$ that provides the best trade-off between power saving and latency can be easily identified by inspection of Fig. 7.
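The weighted-sum sweep used to trace the frontier can be illustrated on a toy single-user problem. The sketch below is not the JPCA problem: it assumes a scalar objective of the form $p - \omega \cdot \mathrm{const} \cdot \log_2(1 + g p)$, solved by grid search, with hypothetical values for `gain`, `const` and `p_max`. It only shows how sweeping $\omega$ produces a set of (power, SE) pairs and how dominated points can be filtered out.

```python
import math

def sweep(omegas, gain=50.0, p_max=0.1, const=0.02, grid=1000):
    """Solve the toy weighted-sum problem for each omega; return (power, SE) pairs."""
    points = []
    for omega in omegas:
        candidates = (i * p_max / grid for i in range(grid + 1))
        best = min(candidates,
                   key=lambda p: p - omega * const * math.log2(1.0 + gain * p))
        points.append((best, math.log2(1.0 + gain * best)))
    return points

def pareto_front(points):
    """Keep the points not dominated in the sense (lower power, higher SE)."""
    front = []
    for p, se in points:
        dominated = any(q <= p and s >= se and (q < p or s > se) for q, s in points)
        if not dominated:
            front.append((p, se))
    return sorted(set(front))

if __name__ == "__main__":
    for power, se in pareto_front(sweep([0.1, 0.5, 1.0, 2.0, 5.0])):
        print(power, se)
```

Larger values of $\omega$ move the solution towards higher power and higher SE, which mirrors the way the frontier in Fig. 7 is traversed.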

C. Impact of the AP Selection Strategy
The simulation results shown in this section aim at highlighting the impact of the AP selection strategy on the JPCA scheme in CF-mMIMO. In Section V-A, we assumed the AP-to-user association described in [9, Sec. 5.4], so as to ensure that users served by the same set of the best APs are given orthogonal pilots. Hence, if $\tau_p = 5$, each AP participates in the service of up to 5 users. We refer to this scheme as dynamic cooperation clustering (DCC). From an energy efficiency viewpoint, such an AP selection strategy is costly, as many APs, even those bringing a negligible contribution to the performance, are involved and active both in the radio communication and in the computational offloading service of a user. We next give a qualitative study of the energy consumption at the server side, by considering alternative AP selection strategies. Each AP selection strategy results in a different fraction of APs involved in the communication service. We consider a fixed cooperation clustering (FCC) scheme, wherein each user, starting from the associations established by the DCC scheme, is only served by the best (channel-wise) $G$ APs. In addition, we consider the large-scale-fading-based AP selection (LSFBS) [48], wherein each user, starting from the associations established by the DCC scheme, is only served by the APs that contribute to 95% of its channel gain. Fig. 8 shows the probability density function (PDF) of the number of APs per user and the number of users per AP, for different AP selection strategies, assuming $K = 10$, $\tau_p = G = 5$. The LSFBS involves a handful of APs per user with high probability, while the DCC scheme selects many APs per user, with a non-negligible probability of selecting all the APs. The FCC scheme always selects $G = 5$ APs per user. Changing perspective, 60% and about 38% of the APs are excluded from the communication service with FCC and LSFBS, respectively, while the DCC always selects $\tau_p = 5$ users per AP. As we can observe in Fig.
9, selecting a fixed number of APs per user is not convenient, as each user needs a tailored number of cooperating APs to achieve the cell-free experience. Hence, the FCC AP selection strategy achieves an uplink SE slightly lower than DCC and LSFBS which, in turn, results in a longer effective latency. In contrast, the DCC and LSFBS strategies provide equivalent per-user SE and effective latency, although with different levels of uplink transmit power per user. To counterbalance the lack of macro-diversity gain when selecting very few APs, LSFBS requires the users to use more transmit power than DCC, for the same SE. Conversely, when the DCC selects too many APs, a user needs to use higher uplink powers to guarantee a good receive combining at the furthest APs. Lastly, the AP selection strategy only concerns the communication service, thus it has no impact on the computing resource allocation, which is not shown herein for brevity.
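The two alternative selection rules can be sketched per user as follows. This is a simplified illustration: `beta` is a hypothetical list of large-scale fading coefficients of the APs already associated with the user by the DCC scheme, and "contributing to 95% of the channel gain" is interpreted as the smallest set of strongest APs whose cumulative $\beta$ reaches that fraction of the total.

```python
def select_fcc(beta, num_selected):
    """Fixed cooperation clustering: indices of the num_selected strongest APs."""
    order = sorted(range(len(beta)), key=lambda l: -beta[l])
    return sorted(order[:num_selected])

def select_lsfbs(beta, fraction=0.95):
    """Large-scale-fading-based selection: strongest APs covering `fraction` of the gain."""
    order = sorted(range(len(beta)), key=lambda l: -beta[l])
    total = sum(beta)
    chosen, accumulated = [], 0.0
    for l in order:
        chosen.append(l)
        accumulated += beta[l]
        if accumulated >= fraction * total:
            break
    return sorted(chosen)

if __name__ == "__main__":
    beta = [10.0, 5.0, 3.0, 1.5, 0.5]
    print(select_fcc(beta, 2), select_lsfbs(beta))
```

With FCC the cluster size is the same for every user, whereas with LSFBS it adapts to how concentrated the channel gain of each user is.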

D. JPCA: A Closer Outlook
In this section, we explore in more detail the effectiveness of the proposed JPCA, evaluating the interplay between uplink power, SE, allocated computational resources, and effective offloading latency. In Fig. 10, we present the simulation results of one network snapshot, where the final values are averaged over 200 channel realizations. For all the users, whose indices appear on the x-axes, we report: computational task size $b_k$ in Mbits; effective average SNR, computed as $\sum_{l \in \mathcal{M}_k} \beta_{lk}/\sigma^2$ and converted to dB; uplink SE in bit/s/Hz; uplink transmit power in mW; the allocated computational resources per user, namely $\{\sum_{i=1}^{T_k} f_k(i)\}$, in GHz; and the effective offloading latency, consisting of transmission, computational and fronthaul latency. We consider two simulation setups: blueish bars refer to the cell-free setup described in Section V but with $\{b_k\} \sim U(1, 3)$ Mbits, which we call the "loose" setup for brevity; reddish bars refer to a "strict" cell-free setup with higher user computational demands, that is, $\{b_k\} \sim U(3, 5)$ Mbits. The latter is of particular interest because it highlights how the JPCA operates under stricter constraints. The propagation scenario is common to both setups, as we fixed the simulation seed in order to obtain the same channel conditions (black bars). By inspecting Fig.
10, we observe that user 2 is in the adverse conditions of poor SNR and high computational demand. The JPCA naturally needs to allocate more power and computational resources to this user than to the others, in order to reduce its transmission latency (by increasing its SE) and its computational latency. As a comparison, user 9 has the same computational demand but better SNR, hence its latency requirements can be more easily fulfilled by solely reducing its transmission latency through allocating slightly more uplink power. In the "strict" setup (reddish bars), the fronthaul latency is longer as it is proportional to the user's task size, and the overall effective latency almost equals the user's requirement of 200 ms. For the "loose" setup (blueish bars), we obtained different but equally interesting results. The fronthaul latency is less pronounced due to the lower computational demands of the users. Importantly, we observe a more uniform allocation of the uplink power over the users as compared to the "strict" setup. The computational latency is dominant over the transmission latency for those users experiencing good channel conditions, as high SEs can be achieved with small transmit powers. Conversely, the transmission latency is dominant for those users experiencing bad channel quality. Interestingly, we observe that the effective offloading latency is far below the latency requirement of 200 ms, which results from the choice of achieving a fair balance between transmit power saving and offloading latency by setting $\omega_p = \omega_{se} = 1$.
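The three latency contributions discussed above can be put side by side with a back-of-the-envelope calculation. The sketch below is a simplification and not the latency model in (7): it considers a single task executed at one server, takes the transmission latency as $b_k/(B \cdot \mathrm{SE}_k)$ and the computational latency as $\alpha b_k / f_k$, and assumes that the fronthaul latency follows the term $2 b_k M \xi / C_{\mathrm{FH}}$ quoted in Section IV-C. Function names and the example operating point are hypothetical; default parameter values are those of Section V.

```python
import math

def offloading_latency(b_bits, se, f_cycles, bandwidth=20e6, alpha=50,
                       num_antennas=4, xi=16, c_fh=10e9):
    """Return (transmission, computation, fronthaul) latency in seconds."""
    transmission = b_bits / (bandwidth * se)             # uplink transfer of the task
    computation = alpha * b_bits / f_cycles              # task execution at the server
    fronthaul = 2 * b_bits * num_antennas * xi / c_fh    # assumed forwarding cost
    return transmission, computation, fronthaul

def effective_snr_db(betas, noise_power):
    """Effective average SNR: sum of the large-scale gains over the noise power, in dB."""
    return 10.0 * math.log10(sum(betas) / noise_power)

if __name__ == "__main__":
    parts = offloading_latency(b_bits=2e6, se=2.0, f_cycles=2e9)
    print(parts, sum(parts))
```

For a 2 Mbit task at 2 bit/s/Hz and 2 GHz of allocated computing rate, the three terms are 50 ms, 50 ms and 25.6 ms, for a total below the 200 ms requirement; both the transmission and the fronthaul terms scale linearly with the task size.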

VI. CONCLUSION
The problem of jointly allocating the uplink powers and the network computational resources subject to latency constraints in a MEC-enabled CF-mMIMO system was considered in this paper, with the aim of minimizing the total transmit power while simultaneously maximizing the uplink sum SE, thereby providing an excellent trade-off between user power consumption and effective offloading latency. To solve such a non-convex problem efficiently, a framework based on alternating optimization and successive convex approximation was proposed, along with an alternative low-complexity heuristic approach. A detailed performance comparison between the proposed MEC-enabled CF-mMIMO architecture and its co-located and small-cell counterparts was also provided. Simulation results revealed that CF-mMIMO provides a far higher computation offloading efficiency than the other network architectures, and constitutes a promising candidate to suitably and flexibly support MEC applications. The proposed joint resource allocation strategy is effective in simultaneously guaranteeing low offloading latency, fairness and significant transmit power saving to the users, by distributing the computational workload over multiple MEC servers. Devising a low-complexity JPCA algorithm based on learning [49], [50] and/or non-convex optimization (e.g., differential evolution [51]) is an appealing research direction for future work, as is extending this study to a partial computational offloading model.

Fig. 2. CDF of the per-user uplink transmit power for cell-free and co-located massive MIMO, assuming three configurations of $\{\omega_p, \omega_{se}\}$. $K = 20$, and $p_{\max} = 100$ mW for all the users. The x-axis in Fig. 2(a) is in logarithmic scale.

Fig. 3. CDF of the per-user (a) and total (b) computational resources allocated remotely, for cell-free, co-located massive MIMO and small-cell implementations.f BS l =

Fig. 7. Sub-optimal Pareto frontier of the JPCA problem in (9) and resulting latency per user. Results obtained by considering the setup in Section V, for one network snapshot and averaging over 200 channel realizations.

Fig. 8. PDF of the number of (a) APs per user, and (b) users per AP, for different AP selection strategies. $K = 10$, $\tau_p = G = 5$.

Fig. 10. Simulation results for one network snapshot, averaged over 200 channel realizations. The x-axes report the user index. Blueish bars refer to the cell-free setup described in Section V, but with $\{b_k\} \sim \mathcal{U}(1, 3)$. Reddish bars refer to a cell-free setup with $\{b_k\} \sim \mathcal{U}(3, 5)$ Mbits. Black bars refer to both setups.