Massive MIMO for Serving Federated Learning and Non-Federated Learning Users

With its privacy preservation and communication efficiency, federated learning (FL) has emerged as a promising learning framework for beyond-5G wireless networks. Future wireless networks are anticipated to jointly serve both FL users and downlink non-FL users in the same time-frequency resource. In the downlink of each FL iteration, both groups jointly receive data from the base station in the same time-frequency resource. The uplink of each FL iteration, however, requires bidirectional communication: uplink transmission for FL users and downlink transmission for non-FL users. To overcome this challenge, we present half-duplex (HD) and full-duplex (FD) communication schemes to serve both groups. More specifically, we adopt massive multiple-input multiple-output (MIMO) technology and maximize the minimum effective rate of non-FL users under a quality-of-service (QoS) latency constraint for FL users. Since the formulated problem is highly nonconvex, we propose a power control algorithm based on successive convex approximation to find a stationary solution. Numerical results show that the proposed solutions perform significantly better than the considered baseline schemes. Moreover, the FD-based scheme outperforms the HD-based scheme when the self-interference is small or moderate and/or the size of the FL model updates is large.


I. INTRODUCTION
The use of mobile phones and wearable devices enables continuous collection and transfer of data [1], [2]. This has been the main driving force behind the explosive increase in mobile data traffic in recent years. Interest in new features and tools keeps growing, so the computational power of these devices increases day by day. Thus, in many applications, part of the data processing is carried out on users' wireless devices. In this context, concerns about the transmission of private information over wireless networks naturally arise. To preserve data privacy, a potential solution is to store the data on local servers and move network computation to the edge [3], [4]. Data privacy has also driven significant interest in new machine learning techniques that ensure privacy while exploiting the computational resources of users. One such promising technique is federated learning (FL), first introduced in [5]. FL is a decentralized form of machine learning that allows edge devices to collaboratively learn a shared prediction model while keeping the data samples on the device, without exchanging them. Owing to this privacy attribute, FL has been used in a wide range of real-world digital applications, e.g., Gboard, FedVision, functional MRI, and FedHealth [6]-[8].

Muhammad Farooq and Le-Nam Tran are with the School of Electrical and Electronic Engineering, University College Dublin, Ireland (e-mail: muhammad.farooq@ucdconnect.ie; nam.tran@ucd.ie).
FL has also recently gained growing attention from the wireless communications research community because of its privacy protection and resource utilization features [3], [9]-[14]. This work mainly considers how to implement FL over wireless networks. These pioneering studies can be classified as "learning-oriented" or "communication-oriented".

The learning-oriented category aims to improve learning performance (e.g., training loss, test accuracy) subject to inherent factors in wireless networks such as thermal noise, fading, and estimation errors [10], [11]. Specifically, in [10], Chen et al. considered user selection to minimize the FL training loss function in the presence of network constraints. Amiri et al. in [11] optimized the test accuracy by scheduling devices and allocating power across time slots.

The communication-oriented category, on the other hand, focuses on enhancing communication performance (e.g., training latency, energy efficiency) in the FL framework [12]-[14]. For example, in [12], Yang et al. minimized the total energy consumption for training the FL model under a latency constraint. Vu et al. in [13] focused on minimizing the training latency under transmit power and data rate constraints. In [14], Tran et al. optimized the computation and communication latencies of mobile devices under various trade-offs among energy consumption, learning time, and learning accuracy.

All the above-mentioned works consider serving only FL users (UEs). However, if FL is to be realized, future wireless networks will need to serve both FL and non-FL UEs, which calls for novel communication designs. We address this fundamental problem in this paper.
The main challenge in jointly serving FL and non-FL UEs is that uplink and downlink transmissions between the UEs and the central server must occur simultaneously. This has not yet been studied in the existing literature. To see why, consider a communication round of an FL iteration with only FL UEs. It consists of four steps:

(i) the central server transmits the global update of an ML model to the FL UEs;
(ii) the FL UEs calculate their local model updates based on their local data sets;
(iii) the local model updates are sent back to the central server; and
(iv) the central server calculates the global update by aggregating the received local model updates [15].

Problems clearly arise when non-FL UEs also need to be served in the downlink. First and most importantly, in Step (iii), the base station (BS) must set up two-way communication to implement the uplink of FL UEs and the downlink of non-FL UEs. Second, efficient resource allocation is required at all steps to control the inter-user interference among FL and non-FL UEs so that their different service requirements are met.
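The four-step round above can be sketched as follows. This is a minimal, hypothetical illustration that assumes a FedAvg-style, data-size-weighted average and a toy least-squares model. The function names `local_update` and `fl_round` are ours, not the paper's, and the wireless links of steps (i) and (iii) are idealized as error-free.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.1, n_local=5):
    """Step (ii): a few gradient steps on the local least-squares loss."""
    w = w_global.copy()
    for _ in range(n_local):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fl_round(w_global, datasets):
    # (i) the server broadcasts w_global (each UE starts from a copy of it)
    # (ii) each UE computes its local update on its own data only
    local_models = [local_update(w_global, X, y) for X, y in datasets]
    # (iii) the local updates are "uploaded": here they are simply collected
    sizes = np.array([len(y) for _, y in datasets], dtype=float)
    # (iv) the server aggregates with a data-size-weighted average (FedAvg-style)
    return np.average(np.stack(local_models), axis=0, weights=sizes)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
datasets = []
for n in (50, 80, 120):                      # three hypothetical FL UEs
    X = rng.normal(size=(n, 2))
    datasets.append((X, X @ w_true + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(30):                          # 30 FL iterations
    w = fl_round(w, datasets)
print(np.round(w, 2))
```

Only model parameters leave the UEs in this sketch, which is the privacy property that motivates FL. The data sets themselves are never aggregated.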
Two types of communication schemes can support two-way communication between the central server and the UEs: half-duplex (HD) and full-duplex (FD). Each has its own advantages and disadvantages [16].
The main drawback of the FD scheme is that self-interference (SI) between the transmit and receive antennas of the BS can cause significant performance degradation, which does not occur in HD communication. However, for small or moderate SI, FD communication can approximately double the spectral efficiency compared with the HD scheme [17].
Both HD and FD schemes are popular in the literature of massive multiple-input multiple-output (MIMO) networks [18]- [20].However, they cannot be straightforwardly applied to the massive MIMO systems that serve both FL and non-FL UEs.
In this paper, we follow the communication-oriented approach and propose a novel network design for jointly serving FL and downlink non-FL UEs¹ at the same time. First, we propose a communication scheme using massive MIMO in which each FL communication round is executed within one large-scale coherence time.² Because of its high array gain, multiplexing gain, and macrodiversity gain, massive MIMO provides reliable operation of each FL communication round as well as of the whole FL process [13]. In the first step of each FL communication round, the central server jointly serves both groups in the downlink. In the third step, either the HD or the FD scheme serves the uplink transmission of FL UEs and the downlink transmission of non-FL UEs. Next, we formulate an optimization problem that allocates power and computation resources to maximize the fairness of the effective data rates of non-FL UEs. The problem also ensures a quality-of-service (QoS) constraint on the execution time of each FL iteration for FL UEs.
A successive convex approximation algorithm is then proposed to solve the formulated problem.
In particular, our contributions are as follows:
• We propose HD and FD communication schemes to jointly serve both FL and non-FL UEs in a massive MIMO network, which has not been studied previously. In the proposed HD scheme, the total system bandwidth is divided equally between the FL and non-FL groups in the uplink of each FL iteration, so both groups are served at the same time in different bands. In the proposed FD scheme, FL UEs transmit and non-FL UEs receive data in the same time-frequency resource, in the presence of SI.
• We propose a new performance measure, the "effective data rate". It is defined as the amount of data received by a non-FL UE per unit of latency time taken by the FL UEs. We then formulate an optimization problem to maximize the minimum effective data rate subject to a QoS constraint on the execution time of FL UEs. Because the formulated problem is nonconvex, we propose a successive convex approximation (SCA) algorithm to find a stationary solution.
• We provide an extensive set of simulation results comparing the proposed HD-based and FD-based schemes with two baseline schemes. The first baseline uses frequency division multiple access (FDMA) to serve each user independently in an allocated bandwidth. The second baseline uses equal power allocation (EPA). The proposed HD and FD schemes provide significantly better solutions than both baselines. Numerical results also show that the FD scheme is a better choice than the HD scheme when the size of the model updates is large and/or when the SI is small or moderate.
Notations: Bold lower and upper case letters represent vectors and matrices, respectively.
The notations $\mathbb{R}$ and $\mathbb{C}$ represent the spaces of real and complex numbers, respectively. $\|\cdot\|$ represents the Euclidean norm, and $|\cdot|$ is the absolute value of the argument. $\mathcal{CN}(0,a)$ denotes a complex Gaussian random variable with zero mean and variance $a$. $\mathbf{X}^T$ and $\mathbf{X}^H$ stand for the transpose and Hermitian transpose of $\mathbf{X}$, respectively. The operators $\mathbb{E}\{\cdot\}$ and $\mathrm{Var}\{\cdot\}$ represent the expectation and variance of the argument, respectively.

A. System Model
We consider a massive MIMO system in which a BS simultaneously serves non-FL UEs and FL UEs. We assume that non-FL UEs only receive data in the downlink.
All FL and non-FL UEs are equipped with a single antenna, while the BS has M transmit antennas and M receive antennas.
To serve FL UEs, the BS acts as a central server. Each iteration of a standard FL framework has four main steps: global update downlink transmission, local update computation at the UEs, local update uplink transmission, and global update computation at the BS [5], [14], [21]. To serve non-FL UEs, as mentioned above, the BS continuously transmits downlink data to them while all four steps of each FL iteration are executed. Thus, the transmission protocol of the considered system consists of four steps, (S1)-(S4), in each FL iteration, as illustrated in Fig. 1.

In Step (S3), both FL and non-FL UEs need to be served. Two communication schemes are possible: HD and FD. In the HD scheme, the FL and non-FL groups are served in different frequency bands. In the FD scheme, both groups are served in the same time-frequency resource. During Step (S4), the BS computes the global update after receiving all the local updates. The delay of this computation is negligible, since the computational capability of the central server is much higher than that of the UEs.
Therefore, the amount of downlink data received by the non-FL UEs during the fourth step is not considered in the rest of the paper.

B. Proposed Transmission Schemes
We adopt the scheme in [13] to support FL iterations, as shown in Fig. 1(a). We assume that each FL iteration is executed within a large-scale coherence time. All FL UEs start each step of their FL iterations at the same time and wait for the others to finish the step before starting a new one. The global and local updates in Steps (S1) and (S3) are transmitted over multiple (small-scale) coherence intervals, as shown in Fig. 1(b). Each small-scale coherence interval in Step (S1) or (S3) includes two phases: channel estimation, and downlink or uplink transmission.
In the following, we detail the proposed transmission protocol for both the HD and FD modes at the BS in Step (S3).

1) Step (S1):
In this step, the BS wants to send the global updates to all FL UEs via a downlink transmission while simultaneously sending the payload data to all the non-FL UEs.

Channel estimation:
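The BS estimates the channels from uplink pilots received from all the UEs, using a time-division duplexing (TDD) protocol and exploiting channel reciprocity.

Let $\sqrt{\rho_p}\,\boldsymbol{\phi}_\ell\in\mathbb{C}^{\tau_{d,p}\times 1}$, where $\|\boldsymbol{\phi}_\ell\|^2=1$, be the dedicated pilot sequence assigned to the $\ell$-th FL UE. Let $\sqrt{\rho_p}\,\tilde{\boldsymbol{\phi}}_k\in\mathbb{C}^{\tau_{1,p}\times 1}$, where $\|\tilde{\boldsymbol{\phi}}_k\|^2=1$, be the pilot sequence assigned to the $k$-th non-FL UE. Here, $\rho_p$ is the normalized signal-to-noise ratio (SNR) of each pilot vector, and $\tau_{d,p}$ and $\tau_{1,p}$ are the corresponding pilot lengths. We assume $\tau_{d,p},\tau_{1,p}\ge L+K$ and that the pilots of non-FL UEs and FL UEs are pairwise orthogonal, i.e.,
- $\boldsymbol{\phi}_\ell^H\tilde{\boldsymbol{\phi}}_k=0$ for all $\ell$ and $k$;
- $\boldsymbol{\phi}_\ell^H\boldsymbol{\phi}_{\ell'}=0$ for all $\ell'\neq\ell$;
- $\tilde{\boldsymbol{\phi}}_k^H\tilde{\boldsymbol{\phi}}_{k'}=0$ for all $k'\neq k$.

The vectors $\mathbf{g}_{d,\ell}$ and $\mathbf{h}_{1,k}$ form the channel matrices from the BS to the FL and non-FL groups in Step (S1), respectively. Here, $\mathbf{g}_{d,\ell}$ is the channel vector from the BS to the $\ell$-th FL UE, and $\mathbf{h}_{1,k}$ is the channel vector between the BS and non-FL UE $k$ in Step (S1). We assume Rayleigh fading, i.e., $\mathbf{g}_{d,\ell}\sim\mathcal{CN}(\mathbf{0},\beta_\ell\mathbf{I}_M)$ and $\mathbf{h}_{1,k}\sim\mathcal{CN}(\mathbf{0},\tilde{\beta}_k\mathbf{I}_M)$, where $\beta_\ell$ and $\tilde{\beta}_k$ are the large-scale fading coefficients.

The minimum mean square error (MMSE) estimate of $\mathbf{g}_{d,\ell}$ can be written as $\check{\mathbf{g}}_{d,\ell}=\sigma_{d,\ell}\mathbf{z}_{d,\ell}$. Here, $\mathbf{z}_{d,\ell}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_M)$ and $\sigma_{d,\ell}^2$ is the variance of each element of the estimate. The MMSE estimate $\check{\mathbf{h}}_{1,k}$ of $\mathbf{h}_{1,k}$ is defined analogously. Denote the estimation errors by $\boldsymbol{\epsilon}_{d,\ell}=\mathbf{g}_{d,\ell}-\check{\mathbf{g}}_{d,\ell}$ and $\boldsymbol{\epsilon}_{1,k}=\mathbf{h}_{1,k}-\check{\mathbf{h}}_{1,k}$. From the properties of MMSE estimation, $\boldsymbol{\epsilon}_{d,\ell}$, $\check{\mathbf{g}}_{d,\ell}$, $\boldsymbol{\epsilon}_{1,k}$, and $\check{\mathbf{h}}_{1,k}$ are independent.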
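The independence of the MMSE estimate and its error, used above, can be checked numerically. The sketch below is a Monte Carlo check under an assumed observation model: after correlating the received pilot signal with the UE's own pilot, the BS observes $\sqrt{\rho_p}\,\mathbf{g}+\mathbf{n}$ with $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_M)$. Under this normalization, which is our assumption, the estimate variance per element is $\sigma^2=\rho_p\beta^2/(\rho_p\beta+1)$. All parameter values are hypothetical.

```python
import numpy as np

def crandn(rng, shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(1)
M, n_trials = 64, 5000
beta, rho_p = 0.5, 10.0                               # hypothetical fading and pilot SNR

g = np.sqrt(beta) * crandn(rng, (n_trials, M))        # true channel, CN(0, beta I_M)
y = np.sqrt(rho_p) * g + crandn(rng, (n_trials, M))   # assumed despread pilot observation
c = np.sqrt(rho_p) * beta / (rho_p * beta + 1)        # scalar MMSE gain
g_hat = c * y                                         # MMSE estimate
err = g - g_hat                                       # estimation error

sigma2 = rho_p * beta**2 / (rho_p * beta + 1)         # predicted variance of each estimate element
print(np.mean(np.abs(g_hat) ** 2), sigma2)            # empirical vs predicted variance
print(abs(np.mean(np.conj(g_hat) * err)))             # close to 0: uncorrelated
```

For jointly Gaussian variables, uncorrelated implies independent. This is why the estimate and the error can be treated as independent in the SINR derivations.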

Downlink transmission for both FL and non-FL UEs:
The BS encodes the downlink data intended for non-FL UE $k$ into the symbol $s_{1,k}\sim\mathcal{CN}(0,1)$, $\forall k\in\mathcal{K}$. It encodes the global training update intended for FL UE $\ell$ into the symbol $s_{d,\ell}\sim\mathcal{CN}(0,1)$, $\forall\ell\in\mathcal{L}$. The global update is the same for all FL UEs, but different coding schemes are used for different UEs. Zero-forcing (ZF) precoding is then applied to the symbols of both FL and non-FL UEs to form the signal $\mathbf{x}_1$ transmitted by the BS in Step (S1). The transmitted signal must meet the average normalized power constraint $\mathbb{E}\{\|\mathbf{x}_1\|^2\}\le\rho_d$, which can be expressed in terms of the power control coefficients as (1).

Following [22, Sec. 3.3.2], the signal vector received at all FL UEs gives the effective SINR at the $\ell$-th FL UE. Similarly, the signal vector received at all non-FL UEs in Step (S1) is corrupted by the additive noise $\mathbf{n}_1\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_K)$. It gives the effective SINR of non-FL UE $k$ in (7). Since all FL UEs start and end each step together, the achievable rate (bps) of each FL UE is set to the minimum achievable rate in the FL group. In the rate expressions, $B$ is the bandwidth and $\tau_c$ is the length of the coherence interval. The achievable rate of non-FL UE $k$ in Step (S1) follows from its effective SINR in (7).

Downlink delay of the FL group: Let $S_d$ (bits) be the data size of the global training update of the FL group. The transmission time $t_d$ from the BS to FL UE $\ell\in\mathcal{L}$ is $S_d$ divided by the common downlink rate of the FL group.

Amount of downlink data received at the non-FL UEs: The amount of downlink data $D_{1,k}$ received at non-FL UE $k\in\mathcal{K}$ in Step (S1) is its Step-(S1) achievable rate multiplied by $t_d$.

2) Step (S2): After receiving the global update, each FL UE $\ell$ computes its local training update on its local data set, while each non-FL UE $k$ keeps receiving data from the BS.
Local computation: Each FL UE executes $N_c$ local computing rounds over its data set to compute its local update. Let $c_\ell$ (cycles/sample) be the number of processing cycles UE $\ell$ needs to process one data sample [14]. Denote by $D_\ell$ (samples) and $f_\ell$ (cycles/s) the size of the local data set and the processing frequency of UE $\ell$, respectively. To synchronize the FL UEs in this step, we choose $f_\ell=\frac{D_\ell c_\ell}{D_{\max}c_{\max}}f$, where $D_{\max}=\max_{\ell\in\mathcal{L}}D_\ell$, $c_{\max}=\max_{\ell\in\mathcal{L}}c_\ell$, and $f$ is a frequency control coefficient. The computation time is then the same for all FL UEs and is given by [13], [14]
$$t_C(f)=\frac{N_c D_\ell c_\ell}{f_\ell}=\frac{N_c D_{\max}c_{\max}}{f}.$$

Channel estimation and downlink transmission for non-FL UEs: Let $\boldsymbol{\zeta}_2=[\zeta_{2,1},\dots,\zeta_{2,K}]^T$ be the power control coefficients for non-FL UEs in Step (S2). The transmitted power at the BS must meet the average normalized power constraint, which can be expressed in terms of $\boldsymbol{\zeta}_2$. The achievable downlink rate (bps) of non-FL UE $k$, $\forall k\in\mathcal{K}$, is given by [22, (3.49)] in terms of the effective SINR $\mathrm{SINR}_{2,k}(\boldsymbol{\zeta}_2)$. This SINR has the same form as (7), except that there is no interference induced by FL UEs in Step (S2). Thus, the total amount of downlink data $D_{2,k}$ received at non-FL UE $k$ in Step (S2) is this rate multiplied by $t_C$.

3) Step (S3) using HD: In Step (S3), the FL UEs' local updates are transmitted to the BS while the BS keeps sending data to the non-FL UEs. Two duplexing schemes can serve both FL and non-FL UEs: HD and FD. With HD in Step (S3), the system bandwidth is divided equally between the FL and non-FL groups.
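The sketch below shows why the frequency choice $f_\ell=\frac{D_\ell c_\ell}{D_{\max}c_{\max}}f$ of Step (S2) synchronizes the FL UEs. It evaluates the per-UE computation time, assuming that UE $\ell$ needs $N_c D_\ell c_\ell/f_\ell$ seconds for $N_c$ local rounds. This is our reading of the computation model cited from [13], [14]. All numerical values are hypothetical.

```python
import numpy as np

N_c = 10                                  # hypothetical number of local computing rounds
D = np.array([200, 500, 350])             # hypothetical data set sizes (samples)
c = np.array([2e4, 1e4, 3e4])             # hypothetical cycles per sample
f = 1e9                                   # frequency control coefficient (cycles/s)

D_max, c_max = D.max(), c.max()
f_l = D * c / (D_max * c_max) * f         # per-UE processing frequency chosen in Step (S2)
t_each = N_c * D * c / f_l                # computation time of each UE
t_C = N_c * D_max * c_max / f             # common computation time t_C(f)
print(t_each, t_C)
```

Every entry of `t_each` equals `t_C`, so no FL UE waits for the others at the end of Step (S2). Moreover, $D_\ell c_\ell\le D_{\max}c_{\max}$ guarantees $f_\ell\le f$.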
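As in Step (S1), the channel estimates obtained in Step (S3), e.g., $\check{\mathbf{h}}_{3,k}$, and the corresponding estimation errors are independent.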
Uplink transmission of FL UEs: After computing their local updates, all FL UEs transmit them to the BS. The signal $x_{u,\ell}$ transmitted by FL UE $\ell$ carries the data symbol $s_{u,\ell}\sim\mathcal{CN}(0,1)$, scaled by the power control coefficient $\eta_{u,\ell}$. This coefficient is chosen to satisfy the average transmit power constraint $\mathbb{E}\{|x_{u,\ell}|^2\}\le\rho_u$. The received signal vector at the BS then depends on $\boldsymbol{\eta}_u=[\eta_{u,1},\dots,\eta_{u,L}]^T$ and is corrupted by the additive noise vector $\mathbf{n}_u\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_M)$.
After receiving the signals from all the FL UEs, the BS applies ZF decoding to detect their symbols. That is, $s_{u,\ell}$ is detected by projecting the received signal onto the ZF decoding vector of FL UE $\ell$. For synchronization, the rates of all FL UEs are set equal to the minimum achievable rate in the FL group (20). The factor 1/2 in the pre-log of this rate comes from dividing the system bandwidth equally between the FL and non-FL groups.

Downlink transmission for non-FL UEs: Denote by $\mathbf{s}_3=[s_{3,1},\dots,s_{3,K}]^T$ the vector of $K$ symbols intended for the $K$ non-FL UEs; these symbols are ZF-precoded. Let $\mathbf{x}_3$ be the signal transmitted from the BS to the non-FL UEs, and let $\zeta_{3,k}$ be the power control coefficient allocated to non-FL UE $k$. The coefficients are chosen to meet the average normalized power constraint at the BS, $\mathbb{E}\{\|\mathbf{x}_3\|^2\}\le\rho_d$, which can be expressed as (23). In the signal received at the $k$-th non-FL UE, the channel estimation error term $\boldsymbol{\epsilon}_{3,k}$ is independent of the transmitted symbols $\mathbf{s}_3$. This gives the effective SINR for the downlink payload at non-FL UE $k$ under HD in Step (S3). It also gives the corresponding achievable downlink rate $R^{\mathrm{HD}}_{3,k}(\boldsymbol{\zeta}_3)$, $\forall k\in\mathcal{K}$.

Uplink delay: Denote by $S_u$ (bits) the data size of the local training update of the FL group.
The transmission time $t_u^{\mathrm{HD}}$ is the same for all FL UEs. It equals $S_u$ divided by the common FL uplink rate in (20).

Amount of downlink data received at the non-FL group: In Step (S3) using HD, non-FL UE $k$, $\forall k\in\mathcal{K}$, receives an amount of downlink data $D^{\mathrm{HD}}_{3,k}$. This amount equals its downlink rate $R^{\mathrm{HD}}_{3,k}(\boldsymbol{\zeta}_3)$ multiplied by $t_u^{\mathrm{HD}}$.

In the FD scheme, the elements of the SI matrix $\mathbf{G}_{SI}$ have variance $\sigma^2_{SI}=\beta_{SI}\sigma^2_{SI,0}$. Here, $\beta_{SI}$ is the path loss from a transmit antenna to a receive antenna of the BS due to their physical separation, and $\sigma^2_{SI,0}$ is the power of the residual interference at each BS antenna after SI suppression. As in the HD scheme, the baseband signal is subject to the average transmit power constraint (17). Under FD, the received signal vector at the BS contains the FL uplink signals and the additive noise $\mathbf{n}_u\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_M)$. It also contains an SI term, because the simultaneous downlink transmission to non-FL UEs leaks from the transmit to the receive antennas of the BS. After receiving the signals from all the UEs, the BS applies ZF decoding to detect the FL UEs' symbols. The resulting SINR of the $\ell$-th FL UE under FD is given in (31).

Proposition 1. The SINR of the $\ell$-th FL UE under FD communication given in (31) can be approximated by (32).

Proof: See Appendix A.
For synchronization, we again choose the rates of FL UEs to be the same as the minimum achievable rates in the FL group.
The resulting uplink rate has the same form as (20), except that under FD the FL UEs use the full bandwidth.
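The HD/FD trade-off in Step (S3) can be illustrated with a deliberately simplified rate model. This model is our assumption, not the paper's closed-form expressions. The FL uplink rate is taken as prelog × B × log2(1 + min SINR), with prelog 1/2 for HD and 1 for FD, and pilot overhead is ignored. Under this model, FD gives a shorter upload time only if its SI-degraded minimum SINR exceeds $\sqrt{1+\mathrm{SINR}_{\mathrm{HD}}}-1$. The numbers are hypothetical.

```python
import math

def uplink_time(S_u, B, sinr_min, prelog):
    """Upload time (s) of S_u bits at rate prelog * B * log2(1 + sinr_min)."""
    return S_u / (prelog * B * math.log2(1 + sinr_min))

S_u, B = 1e6, 20e6          # hypothetical update size (bits) and bandwidth (Hz)
sinr_hd = 100.0             # hypothetical minimum FL SINR under HD (no SI)
t_hd = uplink_time(S_u, B, sinr_hd, prelog=0.5)

# FD uses the full band, but residual SI lowers the FL uplink SINR.
break_even = math.sqrt(1 + sinr_hd) - 1
fd_faster = {}
for sinr_fd in (100.0, 20.0, 5.0):
    t_fd = uplink_time(S_u, B, sinr_fd, prelog=1.0)
    fd_faster[sinr_fd] = t_fd < t_hd

print(round(break_even, 2), fd_faster)
```

This mirrors, in toy form, why FD is preferable for small or moderate SI. A larger $S_u$ scales both upload times equally but makes $t_u$ a larger share of each FL iteration.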
Downlink transmission for non-FL UEs: In FD, non-FL UEs keep receiving downlink data from the BS while the FL UEs send their local updates to the BS in the uplink. Therefore, the signal received at each non-FL UE contains inter-group interference (IGI) from the FL UEs. As in the HD scheme, the transmit power at the BS must satisfy the average normalized power constraint (23).
The signal received at the $k$-th non-FL UE includes an IGI term through the inter-group channel matrix $\mathbf{H}_{IGI}\in\mathbb{C}^{L\times K}$. Its elements are modeled as $h_{IGI,k\ell}=\beta_{IGI,k\ell}^{1/2}\tilde{h}_{IGI,k\ell}$, where $\beta_{IGI,k\ell}$ is the large-scale fading and $\tilde{h}_{IGI,k\ell}\sim\mathcal{CN}(0,1)$ is the small-scale fading of the inter-group channel. After channel estimation, the desired-signal term splits into an estimation term and an error term, which gives the effective SINR for the downlink payload at non-FL UE $k$. In this SINR, the expected power of the IGI term simplifies to $\rho_u\sum_{i\in\mathcal{L}}\eta_{u,i}\beta_{IGI,ki}$. This yields the achievable downlink rate of non-FL UE $k$, $\forall k\in\mathcal{K}$, under FD.

Uplink delay: Denote by $S_u$ (bits) the data size of the local training update of the FL group.
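The closed-form IGI power $\rho_u\sum_{i\in\mathcal{L}}\eta_{u,i}\beta_{IGI,ki}$ can be sanity-checked by Monte Carlo simulation. The sketch assumes that FL UE $i$ transmits $\sqrt{\rho_u\eta_{u,i}}\,s_{u,i}$ with $s_{u,i}\sim\mathcal{CN}(0,1)$. It also assumes that the small-scale fading and the symbols are mutually independent. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
L, n_trials = 4, 200000
rho_u = 100.0                                   # hypothetical normalized uplink SNR
eta_u = np.array([1.0, 0.5, 0.8, 0.2])          # hypothetical FL power control coefficients
beta_igi = np.array([1e-2, 3e-3, 5e-3, 1e-3])   # hypothetical beta_IGI,k1..kL for one non-FL UE k

def crandn(shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = crandn((n_trials, L))                       # small-scale inter-group fading
s = crandn((n_trials, L))                       # FL uplink symbols
# IGI at non-FL UE k: sum_i sqrt(rho_u * eta_i * beta_ki) * h_ki * s_i
igi = (np.sqrt(rho_u * eta_u * beta_igi) * h * s).sum(axis=1)

empirical = np.mean(np.abs(igi) ** 2)
closed_form = rho_u * np.sum(eta_u * beta_igi)
print(empirical, closed_form)
```

The cross terms between different FL UEs average out because the symbols and fading are independent and zero-mean. Only the per-UE powers survive, which gives the simple sum.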
The transmission time $t_u^{\mathrm{FD}}$ is again the same for all FL UEs. It has the same form as (27), except that it now depends on the power control coefficients of both FL and non-FL UEs. This is because the SI caused by the downlink transmission affects the FL uplink SINR.
Amount of downlink data received at the non-FL group: The amount of downlink data $D^{\mathrm{FD}}_{3,k}$ received at non-FL UE $k$, $\forall k\in\mathcal{K}$, in Step (S3) has the same form as (28). The only difference is that both the downlink rate of the $k$-th non-FL UE and the transmission time of the FL UEs now depend on $\boldsymbol{\eta}_u$ and $\boldsymbol{\zeta}_3$.

5) Step (S4):
After receiving all the local updates, the BS (i.e., the central server) computes the global update. Since the central server has much higher computational capability than the UEs, the delay of computing the global update is negligible.

III. PROBLEM FORMULATION AND PROPOSED SOLUTION
Fairness among non-FL UEs in terms of the effective data they receive is a key design concern. In this section, we first define a new performance metric, the effective data rate of non-FL UEs. We then formulate optimization problems that achieve max-min fairness among non-FL UEs subject to a QoS constraint on the execution time of FL UEs.

A. Effective data rate of non-FL UEs
As the preceding section shows, the data rate of each non-FL UE varies across steps. It is therefore reasonable to use the average data rate over all steps as the representative data rate for system design. More specifically, the total amount of data received by the $k$-th non-FL UE in Steps (S1)-(S3) is $D_{1,k}+D_{2,k}+D^{\mathrm{mode}}_{3,k}$, where mode $\in\{\mathrm{HD},\mathrm{FD}\}$. The duration of each step is determined by the FL UEs, so the total time of the three steps is $t_d+t_C+t^{\mathrm{mode}}_u$. Thus, we define the effective data rate of the $k$-th non-FL UE as
$$\bar{R}^{\mathrm{mode}}_k=\frac{D_{1,k}+D_{2,k}+D^{\mathrm{mode}}_{3,k}}{t_d+t_C+t^{\mathrm{mode}}_u}.$$
In the following, we use this definition to formulate max-min fairness problems for the HD and FD approaches.
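The effective data rate $\bar{R}_k=(D_{1,k}+D_{2,k}+D_{3,k})/(t_d+t_C+t_u)$ can be evaluated directly once the per-step rates and durations are known. The sketch below uses hypothetical numbers and the relation $D_{j,k}=R_{j,k}\,t_j$ for each step, i.e., amount of data equals rate times duration. The function name `effective_rate` is ours. The max-min objective is the smallest effective rate across the non-FL UEs.

```python
def effective_rate(rates, times):
    """Effective data rate: total data over Steps (S1)-(S3) divided by total FL time.

    rates: (R_1k, R_2k, R_3k) in bit/s for one non-FL UE k
    times: (t_d, t_C, t_u) in seconds, common to all non-FL UEs
    """
    data = sum(r * t for r, t in zip(rates, times))   # D_1k + D_2k + D_3k
    return data / sum(times)

times = (0.05, 0.15, 0.30)                     # hypothetical t_d, t_C, t_u
ue_rates = {                                   # hypothetical per-step rates (bit/s)
    "k=1": (4e6, 8e6, 3e6),
    "k=2": (2e6, 6e6, 5e6),
}
eff = {k: effective_rate(r, times) for k, r in ue_rates.items()}
objective = min(eff.values())                  # value of the max-min objective
print(eff, objective)
```

Because the durations are set by the FL UEs, a long $t_u$ pulls every $\bar{R}_k$ toward the Step-(S3) downlink rate. This coupling is what the joint design exploits.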

B. HD Scheme
1) Problem Formulation for HD Scheme:
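The problem for the HD communication scheme, (42), maximizes the minimum effective data rate $\min_{k\in\mathcal{K}}\bar{R}^{\mathrm{HD}}_k$ of the non-FL UEs. The optimization variables are the power control coefficients and the computing frequency. The problem is subject to the power constraints (1), (13), (17), and (23), together with the remaining constraints of (42). Constraint (42d) ensures that the time taken by the FL UEs is bounded by $t^{\mathrm{HD}}_{QoS}$.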
In the sequel, we denote by $\mathbf{x}_{\mathrm{HD}}^{(n)}$ the value of $\mathbf{x}_{\mathrm{HD}}$ after $n$ iterations.

Constraints (45b), (45c), and (46d)-(46f) are of the same type. In each case, concave lower bounds of the rate expressions involved are required to obtain convex approximations. To this end, we use the inequality in [23, (76)], which holds for $x>0$, $y>0$. Applying it yields concave lower bounds of the corresponding rate functions, including $R^{\mathrm{HD}}_{3,k}(\boldsymbol{\zeta}_3)$; their expressions are given in (76) in Appendix C. Consequently, in the spirit of SCA, (45b), (45c), and (46d)-(46f) are approximated by the convex constraints (51).

Constraints (47b)-(47d) and (48b) are also of a common type. To handle them, we recall the identity
$$xy=\tfrac{1}{4}\left[(x+y)^2-(x-y)^2\right].$$
Since we need a convex upper bound of $xy$, a simple approach is to linearize the concave term $-(x-y)^2$ around the current point. This gives the inequality
$$xy\le\tfrac{1}{4}\left[(x+y)^2-2\left(x^{(n)}-y^{(n)}\right)(x-y)+\left(x^{(n)}-y^{(n)}\right)^2\right],\qquad(53)$$
where $x\ge 0$, $y\ge 0$, and $x^{(n)}$ and $y^{(n)}$ are the values of $x$ and $y$ at the $n$-th iteration, respectively [13]. Using (53), we can approximate (47b)-(47d) and (48b) by the convex constraints (54).

We now turn to (46b) and (46c), which require convex upper bounds of the rate functions they contain. To this end, we use the inequality in [24, (75)], which holds for $x>0$, $y>0$, with $x^{(n)}$ and $y^{(n)}$ the values of $x$ and $y$ at the $n$-th iteration. Using this inequality, we approximate (46b) and (46c) by the convex constraints (56).

In summary, at iteration $n+1$, problem (44) is approximated by the convex problem (57), whose feasible set is
$$\mathcal{F}_{\mathrm{HD}}\triangleq\{(1),(13),(17),(23),(42b),(42c),(43d),(45a),(45d),(46g),(47a),(51),(54),(56)\}.$$
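The convex upper bound $xy\le\frac14[(x+y)^2-2(x^{(n)}-y^{(n)})(x-y)+(x^{(n)}-y^{(n)})^2]$ can be verified numerically. The right-hand side minus $xy$ equals $\frac14[(x-y)-(x^{(n)}-y^{(n)})]^2\ge 0$. The bound is therefore convex in $(x,y)$, never below $xy$, and tight at the current SCA point, which is exactly what SCA requires. The check below uses hypothetical values, and `xy_upper_bound` is our name.

```python
import numpy as np

def xy_upper_bound(x, y, x_n, y_n):
    """Convex upper bound of x*y built at (x_n, y_n) by linearizing -(x - y)**2."""
    d_n = x_n - y_n
    return 0.25 * ((x + y) ** 2 - 2 * d_n * (x - y) + d_n ** 2)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 5.0, 1000)
y = rng.uniform(0.0, 5.0, 1000)
x_n, y_n = 1.3, 2.7                               # hypothetical current SCA point

gap = xy_upper_bound(x, y, x_n, y_n) - x * y
# Analytically, gap = ((x - y) - (x_n - y_n))**2 / 4 >= 0, with equality at the SCA point.
print(gap.min() >= -1e-12, np.isclose(xy_upper_bound(x_n, y_n, x_n, y_n), x_n * y_n))
```

Because the surrogate touches $xy$ at the current iterate, each convex subproblem is feasible at that iterate. The objective sequence is then monotone, which is the usual argument for SCA convergence to a stationary point.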
We outline the main steps to solve problem (48) in Algorithm 1.
Remark 1. Algorithm 1 requires a feasible point to start the iterative procedure, and finding a feasible solution to (48) is difficult in general. A practical way to overcome this issue is as follows. By randomly generating and properly scaling the variables in $\mathbf{x}_{\mathrm{HD}}$, we can satisfy (1), (13), (17), (23), (42b), and (42c). The remaining variables in $\mathbf{x}_{\mathrm{HD}}$ can be found by making the corresponding inequality constraints in (48) binding (i.e., hold with equality). If (43d) is satisfied, this initial point can be used to start Algorithm 1. When the requirements are stringent (e.g., when $t^{\mathrm{HD}}_{QoS}$ is small), (43d) is likely not met. In such cases, we introduce a slack variable $s$ and replace (57) by the problem
$$\max_{s\le 0,\,\mathbf{x}_{\mathrm{HD}}}\; z_{\mathrm{HD}}+\alpha s\qquad(58a)$$
subject to constraints (58b) and (58c). Intuitively, $s$ represents the violation of (43d), and $\alpha>0$ is the penalty parameter. Constraint (58c) is met if $s$ is sufficiently small (i.e., sufficiently negative), so (58) is always feasible. On the other hand, maximizing the regularized objective in (58a) forces $s$ toward 0 as the iterations progress. When Algorithm 1 converges, we take $\mathbf{x}_{\mathrm{HD}}^{*}$ as the final solution if $|s|$ is below a predetermined error tolerance. Otherwise, we declare (42) infeasible.
Remark 2. Like Algorithm 1, Algorithm 2 requires a feasible point of (61), which is not trivial to find, especially when the SI is high. We overcome this issue with the procedure described in Remark 1. Specifically, if scaling randomly generated variables does not produce a feasible solution, we introduce a slack variable $s$ and consider problem (65), which is solved iteratively until convergence. If $|s|$ is below a predetermined error tolerance, we take $\mathbf{x}_{\mathrm{FD}}$ as the final solution. Otherwise, (61) (and thus (59)) is declared infeasible.
where $d_\ell\ge 35$ m is the distance between UE $\ell$ and the BS. The shadow fading coefficient $z_\ell$ follows a log-normal distribution with zero mean and 7 dB standard deviation.
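The UE drop and shadowing of the simulation setup can be sketched as follows. The path-loss law of [25, Eq. (46)] is not reproduced here, so the sketch uses a hypothetical log-distance model; its reference loss and exponent are placeholders, not the paper's values. Only the 35 m minimum distance, the square area centred on the BS, and the 7 dB zero-mean log-normal shadowing come from the text. The area side is called `area_side` to avoid a clash with the data set sizes $D_\ell$.

```python
import numpy as np

def drop_ues(n_ue, area_side, d_min=35.0, rng=None):
    """Uniformly drop n_ue UEs in an area_side x area_side square centred on the BS,
    rejecting positions closer than d_min metres to the BS."""
    if rng is None:
        rng = np.random.default_rng()
    positions = []
    while len(positions) < n_ue:
        p = rng.uniform(-area_side / 2, area_side / 2, size=2)
        if np.hypot(p[0], p[1]) >= d_min:
            positions.append(p)
    return np.array(positions)

def large_scale_fading(dist, rng, ref_loss_db=-30.0, exponent=3.76, shadow_std_db=7.0):
    """HYPOTHETICAL log-distance path loss (placeholder for [25, Eq. (46)])
    plus zero-mean log-normal shadowing with shadow_std_db (dB)."""
    shadow_db = rng.normal(0.0, shadow_std_db, size=dist.shape)
    beta_db = ref_loss_db - 10.0 * exponent * np.log10(dist) + shadow_db
    return 10.0 ** (beta_db / 10.0)

rng = np.random.default_rng(4)
L, K, area_side = 5, 5, 500.0                 # hypothetical setup
pos = drop_ues(L + K, area_side, rng=rng)
dist = np.hypot(pos[:, 0], pos[:, 1])
beta = large_scale_fading(dist, rng)          # one coefficient per UE (linear scale)
```

Rejection sampling keeps the drop uniform over the allowed region (the square minus the 35 m disk) without distorting the distance distribution.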

B. Results and Discussions
Since no existing work studies massive MIMO networks that support both FL and non-FL groups, we compare our proposed schemes with the following two baseline schemes.
• BL1: Steps (S1) and (S2) of this scheme have the same designs as in the proposed schemes. In Step (S3), the uplink transmission of the FL group and the downlink transmission of the non-FL group use frequency-division multiple access (FDMA). In particular, the frequency band is divided among all UEs so that each FL or non-FL UE has a single bandwidth slot for its transmission. This FDMA scheme is widely used in the FL literature (e.g., [12], [27]). The resulting problem has the same mathematical structure as that of the proposed scheme. Therefore, it can be solved by slightly modifying Algorithm 1 with the same approximations.

Fig. 2 shows the convergence of the proposed algorithms for the HD and FD schemes. Both algorithms converge in fewer than 30 iterations for both channel realizations. Further, for both channel realizations, the FD-based solution achieves a better objective than the HD-based solution.
Next, Figs. 3 and 4 compare the minimum effective rate of the non-FL UEs achieved by the two proposed schemes and the two baseline schemes. Both proposed schemes clearly outperform the baselines. The figures demonstrate the significant advantage of jointly allocating power and computing frequency over the heuristic scheme BL2. They also show the benefit of using massive MIMO: the data rate of each non-FL UE increases with the number of antennas, which significantly increases the minimum effective data rate. In BL1, $R_{u,\ell}$ is very small because of its pre-log factor $\frac{1}{L+K}$, which leads to a large $t_u$. When $t_u$ dominates $t_d$ and $t_C$, $\bar{R}_k$ of BL1 approaches $R_{u,\ell}$, which is much lower than $\bar{R}_k$ of the proposed schemes. As $L$ increases, $R_{u,\ell}$ decreases further and, hence, so does $\bar{R}_k$.
We now investigate the effect of high SI on the FD-based solution, to understand when the FD-based algorithm is superior to its HD counterpart. Fig. 5 plots the minimum effective rate of non-FL UEs for different values of $\sigma^2_{SI,0}/N_0$. We also introduce a hybrid scheme that selects whichever of the two approaches achieves the better objective. For low SI (i.e., up to 65 dB), the FD-based approach performs better, as expected, so the hybrid scheme coincides with the FD-based scheme. For large SI (i.e., beyond 65 dB), the performance of the FD-based approach starts to degrade because of the increased SI at the BS. In particular, the HD-based scheme outperforms the FD-based approach when the SI is around 80 dB or higher. Thus, as Fig. 5 clearly shows, the hybrid scheme coincides with the HD-based scheme for very large SI.
In the final numerical experiment, we plot the minimum effective rate of non-FL UEs against

(S1) The BS sends a global update to the FL UEs through the downlink channel. At the same time, non-FL UEs also receive downlink data from the BS.
(S2) The FL UEs update their local training model based on the global update and solve their local learning problems to obtain their local updates. During this time, non-FL UEs continue receiving downlink data from the BS.
(S3) The FL UEs send the locally computed updates to the BS in the uplink channel while the BS keeps sending downlink data to the non-FL UEs.
(S4) The BS computes the global update by aggregating the received local updates.

Fig. 1: (a): Illustration of FL iterations over the considered massive MIMO network with two groups of FL and non-FL UEs, and two UEs in each group.(b): Detailed operation of one FL iteration of the FL group.
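In Step (S1), $\mathbf{D}_{\eta_d}$ and $\mathbf{D}_{\zeta_1}$ are diagonal matrices with the elements of $\boldsymbol{\eta}_d$ and $\boldsymbol{\zeta}_1$ on their diagonals, respectively. The $\ell$-th element of $\boldsymbol{\eta}_d$, denoted by $\eta_{d,\ell}$, is the power control coefficient associated with the $\ell$-th FL UE. The $k$-th element of $\boldsymbol{\zeta}_1$, denoted by $\zeta_{1,k}$, is the power control coefficient associated with the $k$-th non-FL UE.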

4) Step (S3) using FD: Step (S3) involves transmission in both directions. This motivates us to consider FD communication to serve both groups of UEs simultaneously. Specifically, FL UEs send their local updates to the BS in the uplink while non-FL UEs receive downlink data from the BS. The proposed FD scheme is detailed in what follows.

Uplink transmission of FL UEs: In FD communication, the channel coefficients are estimated in the same way as in HD communication. FL UEs transmit their locally computed updates to the BS while non-FL UEs are receiving downlink data. Therefore, SI arises between the transmit and receive antennas of the BS. It is characterized by the SI channel matrix $\mathbf{G}_{SI}\in\mathbb{C}^{M\times M}$, whose elements are modeled as i.i.d. random variables.

Algorithm 2: Algorithm for solving (59)
1: Input: Set $n=0$ and choose an initial point $\mathbf{x}_{\mathrm{FD}}^{(0)}\in\mathcal{F}_{\mathrm{FD}}$

We consider a $D\times D$ m² area with the BS at the centre, in which $L$ FL UEs and $K$ non-FL UEs are randomly distributed. The large-scale fading coefficients are modeled in the same manner as [25, Eq. (46)]:

Fig. 3: Minimum effective rate of non-FL UEs for different numbers of BS antennas. Here, L = K = 5.

Fig. 4: Minimum effective rate of non-FL UEs for different numbers of FL UEs. Here, K = 5 and M = 50.

Moreover, Figs. 3 and 4 confirm that, in each frequency band used for each group, serving all the UEs simultaneously is better than serving them with the EPA approach. Specifically, the proposed approaches outperform BL2 in almost every case. The gap between the proposed