Combined DL-UL Distributed Beamforming Design for Cell-Free Massive MIMO

We consider a cell-free massive multiple-input multiple-output system with multi-antenna access points (APs) and user equipments (UEs), where the UEs can be served in both the downlink (DL) and uplink (UL) within a resource block. We tackle the combined optimization of the DL precoders and combiners at the APs and DL UEs, respectively, together with the UL combiners and precoders at the APs and UL UEs, respectively. To this end, we propose distributed beamforming designs enabled by iterative bi-directional training (IBT) and based on the minimum mean squared error criterion. To reduce the IBT overhead and thus enhance the effective DL and UL rates, we carry out the distributed beamforming design by assuming that all the UEs are served solely in the DL and then utilize the obtained beamformers for the DL and UL data transmissions after proper scaling. Numerical results show the superiority of the proposed combined DL-UL distributed beamforming design over separate DL and UL designs, especially with short resource blocks.


I. INTRODUCTION
In cell-free massive multiple-input multiple-output (MIMO) systems, multiple access points (APs) are connected to a central unit (CU) and jointly serve all the user equipments (UEs), eliminating the inter-cell interference and providing a uniform service across the network [1], [2]. In this regard, network-wide beamforming designs based on mean squared error (MSE) minimization or zero forcing, which require global channel state information (CSI) of all the APs at the CU, outperform local beamforming designs based on maximum ratio transmission/combining, which require only local CSI at each AP [2], [3]. However, optimizing the network-wide beamformers at the CU calls for extensive CSI exchange via backhaul links, which poses severe challenges in terms of delays, backhaul bandwidth, and scalability of the network.
In coordinated cellular systems with time-division duplexing (TDD), the beamformers for the multi-antenna APs and UEs can be locally designed using iterative bi-directional training (IBT) with pilot-aided uplink (UL) and downlink (DL) channel estimation [4]. Remarkably, IBT applied to cell-free massive MIMO systems enables fully distributed beamforming design at each AP by incorporating effective CSI from the other APs via over-the-air (OTA) signaling [5], [6]. Typically, the beamforming design for UEs served in either the DL or UL requires separate IBT procedures. However, it was shown in [7] that using a common IBT procedure for the DL and UL beamforming designs may reduce the IBT overhead and thus enhance the effective DL and UL rates. Nonetheless, [7] was limited to a centralized design (i.e., at the CU) and assumed all the UEs to be served in both the DL and UL.
In this paper, we consider a cell-free massive MIMO system with multi-antenna APs and UEs, where the UEs can be served in both the DL and UL within a resource block. We optimize the DL precoders and combiners at the APs and DL UEs, respectively, together with the UL combiners and precoders at the APs and UL UEs, respectively. In contrast to prior works, we consider a general system model with partially overlapping DL and UL UEs, and propose fully distributed beamforming designs enabled by IBT. To reduce the IBT overhead, we carry out the distributed beamforming design by assuming that all the UEs are served solely in the DL. Then, for the DL data transmission, the DL precoders for the UEs served only in the UL are discarded and each AP's transmit power is redistributed among the precoders for the DL UEs; for the UL data transmission, the DL combiners at the UL UEs are utilized as UL precoders after proper scaling to satisfy the per-UE transmit power constraints. To further reduce the number of precoders at each AP (and thus the IBT overhead), UEs served only in the DL can be paired with UEs served only in the UL, where each pair is assigned a common DL multicast precoder during the IBT. Numerical results demonstrate the superiority of the proposed combined DL-UL distributed beamforming designs over separate DL and UL designs, especially with short resource blocks.

II. SYSTEM MODEL
We consider a cell-free massive MIMO system operating in TDD mode with channel reciprocity, where a set of APs $\mathcal{B}$, each equipped with $M$ antennas, serves a set of UEs $\mathcal{K}$, each equipped with $N$ antennas. Let $\mathbf{H}_{b,k} \in \mathbb{C}^{M \times N}$ denote the UL channel matrix between UE $k \in \mathcal{K}$ and AP $b \in \mathcal{B}$. We assume that both DL and UL data transmissions take place within a resource block. In this context, let $\mathcal{K}_{\mathrm{DL}}$ and $\mathcal{K}_{\mathrm{UL}}$ be the sets of DL and UL UEs, respectively, with $\mathcal{K}_{\mathrm{DL}} \cup \mathcal{K}_{\mathrm{UL}} = \mathcal{K}$. The UEs served in both the DL and UL are referred to as DL-UL UEs and are included in the set $\mathcal{K}_{\mathrm{DL\text{-}UL}} \triangleq \mathcal{K}_{\mathrm{DL}} \cap \mathcal{K}_{\mathrm{UL}}$. Likewise, the UEs served only in the DL (resp. UL) are referred to as DL-only (resp. UL-only) UEs and are included in the set $\mathcal{K}_{\mathrm{DL\text{-}only}} \triangleq \mathcal{K} \setminus \mathcal{K}_{\mathrm{UL}}$ (resp. $\mathcal{K}_{\mathrm{UL\text{-}only}} \triangleq \mathcal{K} \setminus \mathcal{K}_{\mathrm{DL}}$). Without loss of generality, we assume a single DL and/or UL data stream per UE. An example of the system model is illustrated in Fig. 1. [...] at UE $k$ is [...] where [...] where [...] with $\mathbf{g}_{\bar{k},k} \triangleq \sum_{b \in \mathcal{B}} \mathbf{H}_{b,\bar{k}}^{\mathrm{H}} \mathbf{w}_{b,k}^{\mathrm{UL}}$. Finally, the DL and UL sum rates (measured in bps/Hz) are given by $R^{x} \triangleq \sum_{k \in \mathcal{K}_{x}} \log_2 (1 + \gamma_k^{x})$, with $x \in \{\mathrm{DL}, \mathrm{UL}\}$. These rates represent upper bounds on the system performance (see [5], [8]) and are used as main performance metrics in Section V. Next, we present the proposed combined DL-UL distributed beamforming designs with perfect CSI and IBT in Sections III and IV, respectively.

III. BEAMFORMING DESIGN WITH PERFECT CSI
The proposed combined DL-UL distributed beamforming designs are based on the following principles:
1) For each DL-UL UE, each AP adopts the same beamformer (up to a scaling) as precoder for the DL data transmission and as combiner for the UL data transmission. Similarly, each DL-UL UE adopts the same beamformer (up to a scaling) as combiner for the DL data transmission and as precoder for the UL data transmission. This approach exploits the structural similarity between the DL and UL beamformers resulting from separate DL and UL designs [7].
2) Adopting the same beamformers (up to a scaling) for the DL and UL data transmissions enables using a single IBT procedure for all the UEs. This leads to a reduced IBT overhead compared with separate DL and UL designs at the cost of extra interference due to the DL-only or UL-only UEs.
To obtain a fully distributed beamforming design at the APs, we consider the minimization of the sum MSE, which provides some built-in fairness among the UEs (and can be further tuned by means of UE-specific weights) [8]. In the rest of this section, we present the proposed combined DL-UL distributed beamforming design assuming perfect global CSI. The case with imperfect CSI obtained via IBT is described in Section IV.

A. UE-Specific Beamforming Design
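Assuming that all the UEs are served solely in the DL, the precoders $\{\mathbf{w}_{b,k}^{\mathrm{DL}}\}$ at the APs and the combiners $\{\mathbf{v}_k^{\mathrm{DL}}\}$ at the UEs are obtained by solving [...] where $\rho_{\mathrm{AP}}$ is the maximum transmit power of each AP and [...]. Since the objective in (5) is convex in $\{\mathbf{w}_{b,k}^{\mathrm{DL}}\}$ for fixed $\{\mathbf{v}_k^{\mathrm{DL}}\}$ and vice versa, we use alternating optimization to solve (5) (as in [5]).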
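Optimization of $\mathbf{w}_{b,k}^{\mathrm{DL}}$. For fixed $\{\mathbf{v}_k^{\mathrm{DL}}\}$, the optimal $\mathbf{w}_{b,k}^{\mathrm{DL}}$ is computed by setting $\nabla_{\mathbf{w}_{b,k}^{\mathrm{DL}}} \sum_{\bar{k} \in \mathcal{K}} \mathrm{MSE}_{\bar{k}}^{\mathrm{DL}} = 0$, which yields [...] where [...], and $\boldsymbol{\xi}_{b,k} \triangleq \sum_{\bar{b} \in \mathcal{B} \setminus \{b\}} \boldsymbol{\Phi}_{b\bar{b}} \mathbf{w}_{\bar{b},k}^{\mathrm{DL}}$. Furthermore, to ensure the global convergence of (5) with respect to $\mathbf{w}_{b,k}^{\mathrm{DL}}$, each AP updates $\mathbf{w}_{b,k}^{\mathrm{DL}}$ in (7) via a best-response (BR) method, while the term $\boldsymbol{\xi}_{b,k}$ needs to be acquired from the other APs as detailed in [5].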
Incorporating all the UEs into the DL beamforming design facilitates the reuse of the DL beamformers for the UL data transmission. Specifically, for a given $k \in \mathcal{K}_{\mathrm{UL}}$, (6) can be reused as UL precoder at the UE and (7) can be reused as UL combiner at AP $b$. However, by this approach, the APs unnecessarily aim to cancel interference due to the UL-only UEs in the DL and the DL-only UEs in the UL (reflected by the summations over $\mathcal{K}_{\mathrm{UL\text{-}only}}$ and $\mathcal{K}_{\mathrm{DL\text{-}only}}$, respectively, in $\boldsymbol{\Phi}_{b\bar{b}}$). Likewise, the UEs unnecessarily aim to cancel the effective interference due to the UL-only UE beamformers in the DL and the DL-only UE beamformers in the UL (reflected by the summations over $\mathcal{K}_{\mathrm{UL\text{-}only}}$ and $\mathcal{K}_{\mathrm{DL\text{-}only}}$, respectively, in (6)).
The DL beamformers in (7) and (6) can be, respectively, rescaled for the DL data transmission and reused for the UL data transmission so as to satisfy the transmit power constraints at the APs and UEs. For the DL data transmission, (7) corresponding to UE $k \in \mathcal{K}_{\mathrm{DL}}$ is scaled to meet the AP transmit power constraint after discarding the UL-only UEs, i.e., $a_b \mathbf{w}_{b,k}^{\mathrm{DL}}$, with $a_b \triangleq \sqrt{\rho_{\mathrm{AP}} / \sum_{\bar{k} \in \mathcal{K}_{\mathrm{DL}}} \|\mathbf{w}_{b,\bar{k}}^{\mathrm{DL}}\|^2}$. For the UL data transmission, (6) corresponding to UE $k \in \mathcal{K}_{\mathrm{UL}}$ is reused as UL precoder and scaled to meet the UE transmit power constraint, i.e., $\sqrt{\rho_{\mathrm{UE}}} \, \mathbf{v}_k^{\mathrm{DL}} / \|\mathbf{v}_k^{\mathrm{DL}}\|$, where $\rho_{\mathrm{UE}}$ is the maximum transmit power of each UE.
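As an illustration of the scaling step described above, the following minimal Python sketch rescales the precoders of a single AP after discarding the UL-only UEs and normalizes a UE combiner for reuse as UL precoder. The function names, the dictionary-based data layout, and the use of linear power units (e.g., mW) are assumptions made for this example; the sketch also assumes that the retained beamformers are not all zero.

```python
import math


def squared_norm(vec):
    """Squared Euclidean norm of a complex vector given as a list."""
    return sum(abs(x) ** 2 for x in vec)


def scale_dl_precoders(precoders, dl_ues, rho_ap):
    """Keep only the precoders of the DL UEs at one AP and rescale them
    so that their total power equals rho_ap (linear units, assumed).

    precoders: dict mapping UE index -> precoder (list of complex numbers).
    """
    kept = {k: w for k, w in precoders.items() if k in dl_ues}
    total_power = sum(squared_norm(w) for w in kept.values())
    a_b = math.sqrt(rho_ap / total_power)  # common scaling factor of the AP
    return {k: [a_b * x for x in w] for k, w in kept.items()}


def scale_ul_precoder(dl_combiner, rho_ue):
    """Reuse a DL combiner as UL precoder with transmit power rho_ue."""
    norm = math.sqrt(squared_norm(dl_combiner))
    return [math.sqrt(rho_ue) * x / norm for x in dl_combiner]


# Toy example: UE 4 is UL-only, so its precoder is discarded in the DL.
precoders_at_ap = {1: [1 + 1j, 0j], 2: [0j, 2 + 0j], 4: [3 + 0j, 0j]}
scaled = scale_dl_precoders(precoders_at_ap, dl_ues={1, 2}, rho_ap=1000.0)
ul_precoder = scale_ul_precoder([3 + 0j, 4j], rho_ue=100.0)
```

In this toy example, the power formerly assigned to the UL-only UE is redistributed among the DL UEs, while the relative weights among the retained precoders are preserved.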

B. Beamforming Design with UE Pairing
To further reduce the number of precoders at each AP (and thus the IBT overhead), DL-only UEs can be paired with UL-only UEs and each pair is assigned a common DL multicast precoder during the IBT. Let $\mathcal{G} \triangleq \{1, \ldots, \max(|\mathcal{K}_{\mathrm{DL}}|, |\mathcal{K}_{\mathrm{UL}}|)\}$ and let $\mathcal{P}_g \triangleq (a_g, b_g)$ denote an ordered pair of DL and UL UEs, with $g \in \mathcal{G}$, $a_g \in \mathcal{K}_{\mathrm{DL}}$, and $b_g \in \mathcal{K}_{\mathrm{UL}}$. In this setting, we have [...], and the remaining DL-only or UL-only UEs are paired with a phantom UE indicated by the index 0. In the example of system model in Fig. 1, we have $\mathcal{P}_1 = (1, 1)$, $\mathcal{P}_2 = (2, 4)$, $\mathcal{P}_3 = (3, 3)$, and $\mathcal{P}_4 = (5, 0)$. By this approach, the precoders $\{\mathbf{w}_{b,g}^{\mathrm{DL}}\}$ at the APs and the combiners $\{\mathbf{v}_k^{\mathrm{DL}}\}$ at the UEs are obtained by solving the following virtual multicast problem: [...] with [...]. Similar to (5), the objective in (8) is convex in $\{\mathbf{w}_{b,g}^{\mathrm{DL}}\}$ for fixed $\{\mathbf{v}_k^{\mathrm{DL}}\}$ and vice versa, so we use alternating optimization to solve (8) (as in [8]).
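The pairing in the example of Fig. 1 can be reproduced with the short Python sketch below. The rule adopted here (each DL-UL UE is paired with itself, the DL-only and UL-only UEs are matched in increasing index order, and the pairs are sorted by DL index) is an assumption inferred from the listed pairs and is not stated as the pairing criterion in the text; the function name `pair_ues` is likewise hypothetical.

```python
from itertools import zip_longest

PHANTOM_UE = 0  # index of the phantom UE, as in the text


def pair_ues(dl_ues, ul_ues):
    """Build ordered (DL UE, UL UE) pairs under an assumed index-order rule."""
    dl_ues, ul_ues = set(dl_ues), set(ul_ues)
    dl_ul = sorted(dl_ues & ul_ues)    # UEs served in both directions
    dl_only = sorted(dl_ues - ul_ues)
    ul_only = sorted(ul_ues - dl_ues)

    # A DL-UL UE forms a pair with itself.
    pairs = [(k, k) for k in dl_ul]
    # DL-only and UL-only UEs are matched; leftovers get the phantom UE.
    pairs += list(zip_longest(dl_only, ul_only, fillvalue=PHANTOM_UE))
    # Sort by DL index, placing pairs with a phantom DL UE last.
    pairs.sort(key=lambda p: (p[0] == PHANTOM_UE, p[0], p[1]))
    return pairs


# DL and UL UE sets consistent with the pairs listed for Fig. 1.
pairs = pair_ues(dl_ues={1, 2, 3, 5}, ul_ues={1, 3, 4})
print(pairs)
```

With this rule, the number of pairs equals $\max(|\mathcal{K}_{\mathrm{DL}}|, |\mathcal{K}_{\mathrm{UL}}|)$, which matches the definition of $\mathcal{G}$.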
Optimization of $\mathbf{v}_k^{\mathrm{DL}}$. For fixed $\{\mathbf{w}_{b,g}^{\mathrm{DL}}\}$, the optimal $\mathbf{v}_k^{\mathrm{DL}}$ is computed as [...] with [...]. This implies that extra interference exists in the DL (resp. UL) when a UL-only (resp. DL-only) UE is paired with a phantom UE.
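Optimization of $\mathbf{w}_{b,g}^{\mathrm{DL}}$. To reduce the IBT overhead, we use the gradient method to obtain the optimal $\mathbf{w}_{b,g}^{\mathrm{DL}}$ for fixed $\{\mathbf{v}_k^{\mathrm{DL}}\}$ (as in [8]). Accordingly, let us define [...] where $\boldsymbol{\varrho}_{b,g} \triangleq \sum_{\bar{b} \in \mathcal{B} \setminus \{b\}} \sum_{k \in \mathcal{K}} \mathbf{h}_{b,k} \mathbf{h}_{\bar{b},k}^{\mathrm{H}} \mathbf{w}_{\bar{b},g}^{\mathrm{DL}}$ needs to be acquired from the other APs. Then, the corresponding gradient update of $\mathbf{w}_{b,g}^{\mathrm{DL}}$ at iteration $i$ is [...] where $\alpha_{\mathrm{GB}}$ is the step size. The precoders at AP $b$ are then scaled to meet the per-AP transmit power constraints, i.e., $\mathbf{w}_{b,g}^{\mathrm{DL}} = \bar{a}_b \tilde{\mathbf{w}}_{b,g}^{\mathrm{DL}(i)}$, such that $\bar{a}_b = \sqrt{\rho_{\mathrm{AP}} / \sum_{g \in \mathcal{G}} \|\tilde{\mathbf{w}}_{b,g}^{\mathrm{DL}(i)}\|^2}$. While this distributed beamforming design offers the advantage of reducing the IBT overhead and considering interference only from other UE pairs, it also incurs a beamforming loss due to unnecessary multicasting in the presence of UE pairs with DL-only and UL-only UEs. Finally, $\mathbf{v}_k^{\mathrm{DL}}$ in (9) and $\mathbf{w}_{b,g}^{\mathrm{DL}}$ in (12) are scaled for the DL and UL data transmissions as discussed in Section III-A.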
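The structure of one gradient iteration at a single AP, i.e., a gradient update followed by the per-AP power scaling, can be sketched in Python as follows. The gradient is taken as an input because its expression is not reproduced here; the descent form of the update (previous precoder minus step size times gradient), the function name, and the data layout are assumptions of this example.

```python
import math


def gradient_iteration(precoders, gradients, step_size, rho_ap):
    """One assumed gradient-descent update of the pair-specific precoders
    at a single AP, followed by scaling to the per-AP power rho_ap.

    precoders, gradients: dict mapping pair index g -> list of complex.
    """
    # Unscaled update for each pair g (assumed descent direction).
    updated = {
        g: [w - step_size * d for w, d in zip(precoders[g], gradients[g])]
        for g in precoders
    }
    # Common scaling factor so that the total power at the AP is rho_ap.
    total_power = sum(abs(x) ** 2 for w in updated.values() for x in w)
    scale = math.sqrt(rho_ap / total_power)
    return {g: [scale * x for x in w] for g, w in updated.items()}


# Toy example with two pairs and two antennas.
w_prev = {1: [1 + 0j, 0j], 2: [0j, 1 + 0j]}
grad = {1: [0.5 + 0j, 0j], 2: [0j, -1 + 0j]}
w_next = gradient_iteration(w_prev, grad, step_size=1.0, rho_ap=17.0)
```

Since the scaling is applied with a common factor across the pairs, it changes the total transmit power of the AP but not the relative weighting of the pair-specific precoders.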

IV. BEAMFORMING DESIGN WITH IBT
The distributed beamforming design described in Section III-A can be carried out at each AP and UE by means of IBT. At each IBT iteration, the DL precoders and combiners for all the UEs are updated at each AP and UE, respectively, via precoded pilots.

A. UE-Specific Beamforming Design
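Let $\mathbf{p}_k \in \mathbb{C}^{\tau \times 1}$ be the pilot assigned to UE $k$, such that $\mathbf{p}_k^{\mathrm{H}} \mathbf{p}_{\bar{k}} = \tau$ if $k = \bar{k}$ and $\mathbf{p}_k^{\mathrm{H}} \mathbf{p}_{\bar{k}} = 0$ otherwise.
Computation of $\mathbf{v}_k^{\mathrm{DL}}$. To enable the computation of $\mathbf{v}_k^{\mathrm{DL}}$ in (6) at each UE, each AP $b$ transmits the precoded DL pilots $\mathbf{X}_b^{\mathrm{DL}} \triangleq \sum_{k \in \mathcal{K}} \mathbf{w}_{b,k}^{\mathrm{DL}} \mathbf{p}_k^{\mathrm{H}}$. The received signal at UE $k$ from all the APs is [...] where [...]. [...] where $(\cdot)^{\dagger}$ is the pseudoinverse operator. Note that the above $\mathbf{v}_k^{\mathrm{DL}}$ converges to (6) as $\tau \to \infty$.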
Computation of $\mathbf{w}_{b,k}^{\mathrm{DL}}$. To enable the computation of $\mathbf{w}_{b,k}^{\mathrm{DL}}$ in (7) at each AP, each UE $k$ transmits the precoded UL pilot [...] where $\beta$ is a scaling factor that ensures the per-UE transmit power constraint (equal for all the UEs). The received signal at AP $b$ from all the UEs is [...] where $\mathbf{Z}_b^{\mathrm{UL\text{-}1}}$ is the AWGN at AP $b$. Then, to reconstruct $\boldsymbol{\xi}_{b,k}$ in (7), each UE $k$ transmits an extra OTA signal obtained by precoding [...] where [...] is the AWGN at AP $b$. From (15) and (16), $\mathbf{w}_{b,k}^{\mathrm{DL}}$ in (7) is approximated as [...], which is updated via the BR method [5]. As $\tau \to \infty$, $\mathbf{w}_{b,k}^{\mathrm{DL}}$ in (17) converges to (7).

B. Beamforming Design with UE Pairing
The distributed beamforming design with UE pairing detailed in Section III-B can be carried out in a similar way as in Section IV-A. Let $\mathbf{p}_g \in \mathbb{C}^{\tau_G \times 1}$ be the pilot assigned to pair $g \in \mathcal{G}$, such that $\mathbf{p}_g^{\mathrm{H}} \mathbf{p}_{\bar{g}} = \tau_G$ if $g = \bar{g}$ and $\mathbf{p}_g^{\mathrm{H}} \mathbf{p}_{\bar{g}} = 0$ otherwise.
Computation of $\mathbf{v}_k^{\mathrm{DL}}$. To enable the computation of $\mathbf{v}_k^{\mathrm{DL}}$ in (9) at each UE, each AP $b$ transmits the precoded DL pilots $\mathbf{X}_b^{\mathrm{DL}} \triangleq \sum_{g \in \mathcal{G}} \mathbf{w}_{b,g}^{\mathrm{DL}} \mathbf{p}_g^{\mathrm{H}}$. The received signal at UE $k$ from all the APs is [...]. Based on (18), UE $k$ in [...]. Note that the above $\mathbf{v}_k^{\mathrm{DL}}$ converges to (9) as $\tau_G \to \infty$.
Computation of $\mathbf{w}_{b,g}^{\mathrm{DL}}$. The computation of $\mathbf{w}_{b,g}^{\mathrm{DL}}$ in (12) with UE pairing requires the gradient in (10) at each AP. To allow AP $b$ to reconstruct $\boldsymbol{\varrho}_{b,g}$ in (11), each UE $k$ in $\mathcal{P}_g$ transmits the precoded UL pilots [...]. Then, each UE $k$ transmits an extra OTA signal as in Section IV-A and the corresponding received signal at AP $b$ is [...]. Finally, from (20) and (21), AP $b$ obtains [...], after which it computes $\tilde{\mathbf{w}}_{b,g}^{\mathrm{DL}}$ as in (12) and obtains $\mathbf{w}_{b,g}^{\mathrm{DL}}$ as in Section III-B. Note that the above $\boldsymbol{\varrho}_{b,g}$ converges to (11) as $\tau_G \to \infty$.
The distributed beamforming design with UE pairing can be implemented using pair-specific pilots, which reduces the IBT overhead compared with the UE-specific design, especially when the number of DL-only and UL-only UEs is large. However, the UE pairing may introduce extra interference, as both paired UEs share a compromised beamforming strategy.

C. Implementation Details
We adopt a strategy involving a single update of the combiners at all the UEs using a DL signal, and a single update of the precoders at each AP using the UL-1 and UL-2 signals per resource block (the DL signals and the UL-1 and UL-2 signals can be co-located with the DL and UL data, respectively). This approach allows for the proposed distributed beamforming design to be an integral part of the IBT process, as depicted in Fig. 2. However, in the first resource block, there is an extra UL-1 signaling, where the UEs initialize the combiners for the precoded pilot transmission. Afterward, each resource block contains the DL, UL-1, and UL-2 signals for IBT. The implementation of the proposed combined DL-UL distributed beamforming design is summarized in Algorithm 1.
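The signaling pattern described above can be summarized by the following Python sketch, which lists the IBT signals contained in each resource block. The ordering of the signals within a resource block (in particular, placing the extra UL-1 signal at the start of the first resource block) and the function name are assumptions of this example, and the sketch is not a substitute for Algorithm 1.

```python
def ibt_schedule(num_blocks):
    """Return, for each resource block, the list of IBT signals it contains."""
    per_block = ["DL", "UL-1", "UL-2"]  # one combiner and one precoder update
    schedule = []
    for t in range(num_blocks):
        signals = list(per_block)
        if t == 0:
            # Extra UL-1 signal in the first resource block, used by the
            # UEs to initialize the combiners (position assumed).
            signals.insert(0, "UL-1")
        schedule.append(signals)
    return schedule


schedule = ibt_schedule(3)
for t, signals in enumerate(schedule):
    print(t, signals)
```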

V. NUMERICAL RESULTS AND DISCUSSION
We consider $B = 25$ APs, each equipped with $M = 8$ antennas, serving $|\mathcal{K}| = 32$ UEs (unless otherwise stated), each equipped with $N = 4$ antennas. The APs are placed on a square grid with an area of $100 \times 100$ m$^2$ and distance between adjacent APs of 20 m. We assume a time-varying channel model where the channel at resource block $t + 1$ is related to the channel at resource block $t$ as (see Fig. 2) [9] [...]. We consider a carrier frequency of 2.5 GHz, a resource block duration of 5 ms, and $\kappa = 0.967$, where the latter corresponds to a UE mobility of 5 km/h (walking speed). The maximum transmit power for both pilots and data is $\rho_{\mathrm{AP}} = 30$ dBm at the APs and $\rho_{\mathrm{UE}} = 20$ dBm at the UEs, whereas the AWGN power at both the APs and UEs is fixed to $\sigma_{\mathrm{AP}}^2 = \sigma_{\mathrm{UE}}^2 = -95$ dBm. Finally, the average effective DL-UL sum rate (in bps/Hz) at resource block $t$ is computed as $R(t) = (1 - r_{\mathrm{IBT}} / r_{\mathrm{tot}}) \big(R^{\mathrm{DL}}(t) + R^{\mathrm{UL}}(t)\big) / 2$. Here, $r_{\mathrm{IBT}}$ and $r_{\mathrm{tot}}$ are the number of IBT resources and the total number of resources for data and IBT in a resource block, respectively. The minimum number of orthogonal resources for a single IBT iteration for the proposed and reference methods are given in Table I.
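To make the effective-rate metric concrete, the following Python sketch computes the sum rate from a list of SINR values and then applies the IBT overhead penalty. The function names and the toy SINR and resource values are assumptions chosen only for illustration and are not taken from the simulation setup.

```python
import math


def sum_rate(sinrs):
    """Sum rate in bps/Hz from linear-scale SINRs: sum of log2(1 + SINR)."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def effective_sum_rate(rate_dl, rate_ul, r_ibt, r_tot):
    """Average effective DL-UL sum rate with an IBT overhead of r_ibt
    out of r_tot resources in a resource block."""
    if r_tot <= 0 or not 0 <= r_ibt <= r_tot:
        raise ValueError("require 0 <= r_ibt <= r_tot and r_tot > 0")
    return (1.0 - r_ibt / r_tot) * (rate_dl + rate_ul) / 2.0


# Toy values: two DL UEs and one UL UE, 50 IBT resources out of 200.
rate_dl = sum_rate([1.0, 3.0])   # log2(2) + log2(4)
rate_ul = sum_rate([7.0])        # log2(8)
rate_eff = effective_sum_rate(rate_dl, rate_ul, r_ibt=50, r_tot=200)
```

For a fixed number of IBT resources, the penalty factor $1 - r_{\mathrm{IBT}} / r_{\mathrm{tot}}$ approaches one as the resource block grows, which is consistent with the overhead reduction mattering most for short resource blocks.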
Fig. 3 [...] signaling is referred to as sep. OTA and sep. local, respectively, and they exhibit inferior performance due to increased use of IBT resources. However, the IBT resources for these methods are independent of $|\mathcal{K}_{\mathrm{DL}} \cap \mathcal{K}_{\mathrm{UL}}|$. The beamforming design with global CSI at the CU, referred to as centralized, assumes a backhaul delay of 5 ms for both the DL and UL beamformers, and it also requires UE antenna-specific pilot resources in the UL and UE-specific pilot resources in the DL, resulting in worse performance than the IBT methods. However, the performance of the centralized design can be improved by optimizing the use of backhaul resources and CSI prediction techniques in temporally correlated channel scenarios.
[...] OTA and comb. local, resulting in higher performance gains. If the fraction is one, the overhead is the same for all the combined DL-UL methods. However, comb. OTA converges faster than the gradient-based comb. paired OTA, leading to somewhat better performance. The separate DL and UL IBT methods incur high IBT overhead, thus their performance is inferior to the proposed combined DL-UL distributed beamforming designs for any fraction of DL and UL UEs.
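In (8), $g_k$ represents the index of the pair containing UE $k$ and $\mathbf{y}_k^{\mathrm{DL\text{-}grp}} \triangleq \sum_{b \in \mathcal{B}} \sum_{\bar{g} \in \mathcal{G}} \mathbf{H}_{b,k}^{\mathrm{H}} \mathbf{w}_{b,\bar{g}}^{\mathrm{DL}} d_{\bar{g}}^{\mathrm{DL}} + \mathbf{z}_k^{\mathrm{DL}}$. Similar to (5), the objective in (8) is convex in $\{\mathbf{w}_{b,g}^{\mathrm{DL}}\}$ for fixed $\{\mathbf{v}_k^{\mathrm{DL}}\}$ and vice versa.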

Fig. 4 plots the effective DL-UL sum rate against the resource block size for $t = 10$, $|\mathcal{K}| = 44$, $|\mathcal{K}_{\mathrm{DL}}| = |\mathcal{K}_{\mathrm{UL}}| = 32$, and $|\mathcal{K}_{\mathrm{DL\text{-}UL}}| = 20$. The proposed combined DL-UL distributed beamforming design with UE pairing is referred to as comb. paired OTA. The case where $\mathbf{Y}_b^{\mathrm{UL\text{-}2}}$ is ignored in (22) is referred to as comb. paired local. These designs require fewer IBT resources than comb. OTA and comb. local, resulting in superior performance for $r_{\mathrm{tot}} \leq 350$. However, for $r_{\mathrm{tot}} \leq 150$, comb. [...]

Fig. 5: Effective DL-UL sum rate vs. fraction of UEs in the DL and UL.

Table I: Minimum orthogonal resources required for a single IBT iteration in each resource block for the proposed and reference methods.