Uplink Spectral Efficiency of Large, Distributed Antenna Systems With MMSE Processing

The spectral efficiency of a representative uplink in a wireless network with spatially distributed antennas and linear minimum-mean-square-error (MMSE) processing is found to approach an asymptote with a simple form as the number of antennas increases. These results generalize prior work, which applied only to systems with a small number of base stations, each with a large number of antennas. The generalization covers systems with a large number of spatially distributed access points, each with a small number of antennas, and systems between these extremes. These results are also applied to systems where mobiles have multiple transmit antennas, and are used to characterize systems with both disjoint and user-centric clustering of cooperating base stations. Among other conclusions, the results indicate that the uplink spectral efficiency with all antennas randomly distributed is several-fold higher than when the antennas are concentrated at one base station, for the same density and number of antennas cooperating to serve a mobile user. These findings improve our understanding of the tradeoffs in distributed antenna systems, which can significantly increase data rates, albeit at a higher cost.


I. INTRODUCTION
Mobile communications systems in which the antennas serving a given mobile are not co-located at a particular base station (BS) have recently received significant attention in the literature. Examples include Cloud-Radio Access Networks (CRAN), Networked Multiple-Input Multiple-Output (MIMO), and cell-free massive MIMO networks [1]-[13].
With some exceptions such as [5] and [6], most works in the literature do not analytically model the spatial distribution of mobiles and antennas. Such a model can provide additional insight into system performance, such as the impact of antenna density and the benefits of distributing antennas in space rather than clustering them at base stations. Of the works directly related to this one, [4] considers maximal-ratio-combining and zero-forcing receivers. Reference [14] considers a networked-MIMO system with zero-forcing receivers, and it employs a number of approximations on the spatial user model and on the interference distribution to handle the complexity of analyzing networked MIMO systems. Linear minimum-mean-square-error (MMSE) processing, on the other hand, can provide significant performance gains over zero-forcing and maximal-ratio combining, since it is the linear processor that maximizes the Signal-to-Interference Ratio (SIR). This was observed in [15] in the context of massive MIMO, where the spatial distribution of antennas and users is considered via simulations. Additionally, [12] considers cell-free massive MIMO systems using stochastic geometry and analyzes the channel hardening effect when matched filters are used. References [16] and [17] considered the uplink of cooperative base station systems with MMSE processing. They provided a simple analytical expression for the uplink spectral efficiency when a small number of cooperative base stations with large numbers of antennas each are used, for Poisson distributed mobiles in [16] and for general mobile distributions in [17].

The associate editor coordinating the review of this manuscript and approving it for publication was Wei Feng.
The general framework developed in [16] was applied to two models for base station architectures: a user-centric clustering approach, and a hexagonal-cell architecture in which the spectral efficiency of a cell-edge mobile was considered. Asymptotic expressions for the spectral efficiency were provided for up to six cooperating base stations at hexagonal lattice sites.
In this work, we consider a system similar to [16] and [17]. However, the receive antennas in our system are arbitrarily distributed within a bounded region and do not have to be clustered at a small number of base stations. In other words, [16] and [17] considered systems such as the one depicted on the left of Figure 1. In contrast, this work considers systems with large numbers of antennas, regardless of their arrangement. The antennas may be located at a small number of base stations (left of Figure 1), at a large number of access points (APs) with a small number of antennas each (right of Figure 1), or anywhere between these two extremes. Compared to [16], this difference in assumptions changes the structure of the random channel vectors and matrices that determine the link SIRs. Deriving the main results therefore requires different random matrix techniques. Further details regarding this difference are provided in Section II.
This work also generalizes [16] and [17] in two further ways. We allow multiple transmit antennas at each mobile, and we use a slightly more general fading model than the Rayleigh fading used in [16] and [17]. We also apply the framework developed here to analyze two systems. The first is user-centric clustering, where each mobile is served by its nearest base stations (sometimes referred to as cell-free systems). The second is a disjoint-clustering system, where space is divided into hexagonal cells and each cell contains K antennas at random positions. In contrast, [16] and [17] consider user-centric clustering and base stations at hexagonal lattice sites, both for small numbers of BSs.
Note that, like [14], we assume that the channel state information (CSI) required for the beamformer design is known at the processors. Channel estimation errors do play a significant role in large MIMO systems. Our focus here, however, is to characterize the spectral efficiency that can be achieved with known CSI. Hence, these results can serve as an upper bound for systems where CSI errors are significant. Alternatively, they can serve as an approximation once the computed spectral efficiencies are appropriately reduced to account for training time and estimation inaccuracies.
In addition, as in [14], [15], [18] and others, we make the simplifying assumption that the distributed antennas in this system are sufficiently well synchronized in time. Practical systems will require synchronization algorithms such as those described in Section 3.3 of [8]. Nevertheless, we believe that analyzing systems with perfect timing synchronization among the distributed antennas helps to understand the performance that can be expected from a well-synchronized system.
Further, modern communication systems for human-to-human communication use efficient error-correcting codes, including capacity-approaching codes such as turbo codes. As noted in [19] and elsewhere, the spectral efficiency (computed using the Shannon formula in this paper) is a good approximation of the rates achievable in practical modern communication systems. Therefore, we believe that analyzing the spectral efficiency is a useful way to approximate the rates achievable in practical systems.
The results in this work therefore apply to a range of distributed antenna systems. At one extreme are systems with a large number of access points, each with a small number of antennas (or just one antenna). At the other are systems with a small number of base stations, each with a large number of antennas. Systems between these extremes are also covered. Thus, the results presented in this work enable us to quantify the performance gains of completely distributing antennas versus clustering them at a small number of base stations. Such a characterization is very helpful for system designers, because such systems carry significant costs and complexities. In addition, distributing antennas over a large number of access points may cost differently from concentrating them at a small number of base stations.
VOLUME 10, 2022

In summary, the main contributions of this work are as follows:
• The introduction of an approach for analyzing the spectral efficiency of a system with distributed antennas and optimal linear processing when the number of antennas is large. This approach generalizes prior work [17], which is only applicable to systems with a small number of cooperating base stations, each with a large number of antennas. In particular, this work enables us to characterize the spectral efficiencies achievable when antennas are completely distributed in space versus concentrated at a small number of base stations.
• A method to characterize the spectral efficiency of a system with distributed antennas and optimal linear processing when mobiles have multiple transmit antennas.
• An application of these approaches to both disjoint and user-centric clustering of BS/APs. Under these assumptions and for the same total number of antennas, it indicates that several-fold increases in spectral efficiency are possible when the antennas are distributed completely randomly rather than concentrated at one base station.

II. SYSTEM MODEL
Assume that there are K antennas at arbitrary positions in a planar network, connected to a central processor that jointly processes the signals received at the antennas. Assume that the distances of these antennas to the origin are bounded from above by D. A mobile transmitter with M antennas, called the test transmitter, is located at the origin of the network. The K antennas jointly detect the signal transmitted by this test transmitter. Other interfering mobiles are also located on the plane such that no mobile is co-incident with an antenna and, with probability 1 (w.p.1),

$$\lim_{d \to \infty} \frac{\#B(x,d)}{\pi d^2} = \rho, \quad \forall x \in \mathbb{R}^2, \qquad (1)$$

where $\#B(x,d)$ is the number of mobiles in a disk of radius $d$ centered at $x$. Thus, $\rho$ is the area density of the mobiles. Note that the results of this work do not depend on the specific model of how the mobiles are distributed in space, as long as (1) holds. The reason is that, when the number of antennas used to detect the signal from the test mobile is large, the MMSE receiver places deep nulls on the signals from the mobiles that are close to the antennas. Hence, the residual interference at the output of the MMSE receiver is dominated by faraway mobiles. Under the inverse power-law model, signal power decays rapidly with distance. The specific locations of these faraway mobiles therefore do not influence the spectral efficiency significantly; only their average density $\rho$ matters. This result was already shown in [17] for systems with a small number of base stations, each with a large number of antennas. Here, we prove it for the more general model used in this work and verify it by the simulations detailed in Section VI.
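Condition (1) can be made concrete with a short Python/NumPy sketch. It draws one realization of a homogeneous Poisson point process (HPPP), one of the mobile models used in Section VI, and evaluates $\#B(x,d)/(\pi d^2)$ for growing $d$. The window radius, the seed and the helper names are illustrative assumptions, not part of the analysis in this work. The point is only that the ratio to $\rho$ approaches 1 as $d$ grows.

```python
import numpy as np


def hppp_in_disk(rho, radius, rng):
    """One realization of an HPPP with density rho inside a disk centred at the origin."""
    count = rng.poisson(rho * np.pi * radius**2)
    r = radius * np.sqrt(rng.uniform(size=count))  # sqrt makes points uniform over the area
    theta = rng.uniform(0.0, 2.0 * np.pi, size=count)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def empirical_density(points, center, d):
    """#B(x, d) / (pi d^2): number of points in the radius-d disk around x, per unit area."""
    dist = np.hypot(points[:, 0] - center[0], points[:, 1] - center[1])
    return np.count_nonzero(dist < d) / (np.pi * d**2)


rng = np.random.default_rng(1)
rho = 1e-3                                   # mobiles per unit area, as in Section VI
mobiles = hppp_in_disk(rho, radius=5000.0, rng=rng)
for d in (100.0, 1000.0, 4000.0):
    # The ratio to rho should approach 1 as d grows (all disks stay inside the window).
    print(d, empirical_density(mobiles, (0.0, 0.0), d) / rho)
```

For small $d$ the ratio fluctuates strongly because only a few mobiles fall in the disk. This matches the intuition above that only the large-scale density of the mobiles matters.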
For ease of exposition, in the subsequent analysis, we shall consider the case where M = 1 in all sections except Section IV where we develop the extension to the M > 1 case, and Section VI-D where simulation results for M > 1 are provided.
The channel vector from the $i$th mobile to the K receive antennas is $\mathbf{h}_i \in \mathbb{C}^{K \times 1}$:

$$\mathbf{h}_i = \left[ r_{i,1}^{-\alpha/2} g_{i,1},\; r_{i,2}^{-\alpha/2} g_{i,2},\; \ldots,\; r_{i,K}^{-\alpha/2} g_{i,K} \right]^T. \qquad (2)$$

Here, $\alpha > 2$ is the path-loss exponent, and $r_{i,k}$ is the distance from the $i$th mobile to the $k$th receive antenna. The $g_{i,k}$ are independent and identically distributed (i.i.d.) zero-mean, unit-variance random variables. Hence, $r_{i,k}^{-\alpha/2} g_{i,k}$ is the flat-fading coefficient between the $i$th mobile and the $k$th antenna. Note that, as in [12] and [14], we do not consider shadow fading in this work. Suppose that the distances of the test mobile to the K antennas are bounded from below, i.e. $r_{0,k} > D_{\min} > 0$ for all $k$. Figure 2 shows a system diagram with the test mobile, a single interferer and two antennas; additional interferers and antennas are omitted for clarity.
As in [16] and [17], we assume an interference-limited system and neglect noise. Suppose a constant, non-zero noise level were incorporated in the analysis. Then, as K → ∞, the system would become noise limited at some finite K. Beyond that point, the effect of interference would be negligible, and the system would be asymptotically identical to an interference-free network. Since this transition occurs only at an extremely large K, such asymptotic results would not be useful for practical systems, in which interference is significant. Because our focus is on systems with significant interference, we do not incorporate noise into the analysis. We do, however, incorporate noise in the numerical simulations provided in Section VI.
We write the signal received at the K antennas, $\mathbf{y} \in \mathbb{C}^{K}$, as

$$\mathbf{y} = \sum_{i=0}^{\infty} \mathbf{h}_i x_i,$$

where the test mobile has index 0, and $x_i$ is the zero-mean, unit-variance symbol transmitted by the $i$th mobile. Define the channel matrix $\mathbf{H} \in \mathbb{C}^{K \times n}$ between the $n$ mobiles closest to the origin (other than the test mobile) and the antennas serving the test mobile as

$$\mathbf{H} = \left[ \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n \right].$$

For the network restricted to the test mobile and these $n$ interferers, the signal from the test mobile is estimated using the linear MMSE estimator

$$\hat{x}_0 = \mathbf{h}_0^{\dagger} \left( \mathbf{h}_0 \mathbf{h}_0^{\dagger} + \mathbf{H}\mathbf{H}^{\dagger} \right)^{-1} \mathbf{y},$$

and the SIR at the output of the MMSE estimator is

$$\mathrm{SIR}[n] = \mathbf{h}_0^{\dagger} \left( \mathbf{H}\mathbf{H}^{\dagger} \right)^{-1} \mathbf{h}_0. \qquad (5)$$

Note that the channel vector between mobile $i$ and the K antennas serving the test mobile is given in (2). The systems analyzed in [16] have a fixed number of base stations with large numbers of antennas, so there is a fixed and finite number of distinct $r_{i,k}$ terms. With this restriction, [16] analyzed the asymptotic SIR using random matrix techniques developed in [20] for Code-Division-Multiple-Access (CDMA) systems with a finite and constant number of antennas. In the case analyzed in this paper, however, the $r_{i,k}$ terms can all be different. The approach of [20] relies fundamentally on there being a constant, finite number of distinct $r_{i,k}$ terms, so it can no longer be used. We therefore use different random matrix approaches, including results from [21] and [22], to characterize the SIR in the asymptotic regime assumed here.
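The SIR in (5) can be evaluated numerically for a given geometry. The Python/NumPy sketch below builds channel vectors of the form (2), assuming Rayleigh fading (circularly-symmetric complex Gaussian $g_{i,k}$). This is one admissible choice under the i.i.d. zero-mean, unit-variance assumption. The sketch compares the MMSE SIR with that of maximal-ratio combining. Since MMSE maximizes the SIR over all linear combiners, its value should never be smaller. The optional `noise_var` argument gives the SINR used in noisy simulations. The layout, seed and function names are hypothetical.

```python
import numpy as np


def channel_matrix(antennas, mobiles, alpha, rng):
    """Columns are channel vectors as in (2): entries r_{i,k}^(-alpha/2) g_{i,k}.

    g is circularly-symmetric complex Gaussian with unit variance (Rayleigh fading); this
    is an assumption, the model only needs i.i.d. zero-mean, unit-variance coefficients.
    """
    r = np.linalg.norm(antennas[:, None, :] - mobiles[None, :, :], axis=2)  # r[k, i]
    g = (rng.standard_normal(r.shape) + 1j * rng.standard_normal(r.shape)) / np.sqrt(2.0)
    return r ** (-alpha / 2.0) * g


def mmse_sir(h0, H, noise_var=0.0):
    """h0^H (H H^H + noise_var I)^{-1} h0; noise_var = 0 gives the SIR of (5)."""
    R = H @ H.conj().T + noise_var * np.eye(H.shape[0])
    return float(np.real(np.vdot(h0, np.linalg.solve(R, h0))))


def mrc_sir(h0, H):
    """SIR of maximal-ratio combining (weights w = h0), for comparison."""
    signal = np.linalg.norm(h0) ** 4                  # |h0^H h0|^2
    interference = np.sum(np.abs(h0.conj() @ H) ** 2)  # sum_i |h0^H h_i|^2
    return float(signal / interference)


rng = np.random.default_rng(0)
K, n, alpha = 16, 64, 4.0
antennas = rng.uniform(-50.0, 50.0, size=(K, 2))       # K receive antennas, bounded region
interferers = rng.uniform(-500.0, 500.0, size=(n, 2))
h0 = channel_matrix(antennas, np.zeros((1, 2)), alpha, rng)[:, 0]  # test mobile at origin
H = channel_matrix(antennas, interferers, alpha, rng)
print("MMSE SIR (dB):", 10 * np.log10(mmse_sir(h0, H)))
print("MRC SIR (dB): ", 10 * np.log10(mrc_sir(h0, H)))
```

`np.linalg.solve` is used instead of an explicit inverse. It is cheaper and numerically safer when the interferer powers span many orders of magnitude.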
For convenience, the key symbols and the notation used in this paper are listed in Tables 1 and 2, respectively.

III. MAIN RESULTS
The SIR of the test link grows with the number of antennas K . We find that a normalized version of the SIR converges in the limit as K → ∞ as described below.
This result is proved using results that characterize the eigenvalue distributions of random matrices whose dimensions grow without bound. Most such techniques involve matrices whose two dimensions both go to infinity. Our system, however, involves an infinite number of mobiles, so its channel matrices would have an infinite number of columns from the start. This makes it challenging to directly apply techniques from infinite random matrix theory. To use these techniques, we first consider a network with only the n mobiles closest to the origin, which results in a channel matrix H[n] with K rows and n columns. We then take n and K to infinity together such that n/K = c > 2, which enables us to use approaches from infinite random matrix theory. Finally, we show that the limit of the normalized SIR when n and K are taken to infinity together equals the limit, as K → ∞, of the normalized SIR for an infinite network of mobiles (i.e. n = ∞ from the start).
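The truncation idea can be checked numerically for a fixed K. Adding interferers (columns of H[n]) can only increase H[n]H[n]† in the positive-semidefinite order, so SIR[n] from (5) is non-increasing in n. It also changes little once the added interferers are far away. The Python/NumPy sketch below uses Rayleigh fading and a hypothetical layout. It orders the interferers by distance from the origin and evaluates SIR[n] for n ≥ 2K. It illustrates the n → ∞ step for one value of K only and is not a proof of Lemma 2.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_max, alpha, rho = 20, 400, 4.0, 1e-3

# Interferers uniform in a disk that holds n_max of them at density rho, sorted by distance
# from the origin so that the first n columns of H_all form H[n].
radius = np.sqrt(n_max / (np.pi * rho))
r = radius * np.sqrt(rng.uniform(size=n_max))
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_max)
mobiles = np.column_stack((r * np.cos(theta), r * np.sin(theta)))
mobiles = mobiles[np.argsort(np.hypot(mobiles[:, 0], mobiles[:, 1]))]
antennas = rng.uniform(-30.0, 30.0, size=(K, 2))   # hypothetical bounded antenna region


def rayleigh_channels(points):
    """Channel vectors of the form (2) with Rayleigh fading, one column per point."""
    dist = np.linalg.norm(antennas[:, None, :] - points[None, :, :], axis=2)
    g = (rng.standard_normal(dist.shape) + 1j * rng.standard_normal(dist.shape)) / np.sqrt(2.0)
    return dist ** (-alpha / 2.0) * g


h0 = rayleigh_channels(np.zeros((1, 2)))[:, 0]      # test mobile at the origin
H_all = rayleigh_channels(mobiles)


def sir_truncated(n):
    """SIR[n] from (5), keeping only the n interferers closest to the origin."""
    Hn = H_all[:, :n]
    return float(np.real(np.vdot(h0, np.linalg.solve(Hn @ Hn.conj().T, h0))))


ns = list(range(2 * K, n_max + 1, 20))   # start at n = 2K so that Hn Hn^H is invertible
sirs = [sir_truncated(n) for n in ns]
for n, s in zip(ns, sirs):
    print(f"n/K = {n / K:4.1f}   SIR[n] = {10 * np.log10(s):6.2f} dB")
```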
To this end, note that the SIR of the test link for the system model with only the first n mobiles is given in (5). A normalized version of SIR[n] converges according to the following lemma which is at the heart of the proof of the main result.
Proof: Please see Appendix A. The next lemma shows that (8) continues to hold in probability (i.p.) even if we start with an infinite network (i.e. n = ∞) and then take K → ∞. This lemma is very close to Lemma 1 of [16], but has been modified to account for the differences in assumptions in this work.
Lemma 2: The following holds i.p.:
Proof: Please see Appendix B. Combining Lemmas 1 and 2 completes the proof. Note that $\bar{P}$, defined in Theorem 1, is the limit of the average path loss from the test mobile to the K antennas used to detect its signal. Comparing with the prior results in [16] and [17], the limit of the normalized SIR for the distributed antenna system here equals that of a system with a single base station with K antennas and path loss equal to $\bar{P}$.
Applying the steps used to prove Theorem 2 of [16] to (7), we have the following in probability as K → ∞:
Note that, in addition to providing the scaling behaviour of the spectral efficiency with K, (10) can be used to approximate the spectral efficiency when the number of antennas is fixed but large. Further, (10) holds for a fixed $\bar{P}$. If we instead treat $\bar{P}$ as a random variable, then, when K is large, we can approximate the cumulative distribution function (CDF) of the spectral efficiency using the CDF of

$$P = \frac{1}{K} \sum_{i=1}^{K} r_{0,i}^{-\alpha},$$

which we denote as $F_P$. Using this approach, by direct substitution, we get the following approximation for the CDF of the spectral efficiency with randomly distributed antennas:
Hence, (11) can be used to characterize the spectral efficiency of the test link under different system parameters, including the statistical properties of P. These properties depend strongly on how the antennas serving the test mobile are distributed in space. We apply this expression with different choices of $F_P$ in Section V, and the results are supported by simulations in Section VI.
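Because (11) is obtained from (10) by substitution, the only geometry-dependent ingredient is $F_P$. When no closed form is available, $F_P$ can be estimated by Monte Carlo, as in the Python/NumPy sketch below. Purely for illustration, it assumes that the K serving antennas are placed uniformly at random in an annulus $D_{\min} \le r \le D$ around the test mobile, which respects the bounds assumed in Section II. The helper names and parameter values are hypothetical.

```python
import numpy as np


def sample_mean_path_loss(K, D_min, D, alpha, rng):
    """One draw of P = (1/K) sum_i r_{0,i}^(-alpha), with the K serving antennas placed
    uniformly at random in the annulus D_min <= r <= D around the test mobile."""
    r = np.sqrt(rng.uniform(D_min**2, D**2, size=K))   # uniform over the annulus area
    return float(np.mean(r ** (-alpha)))


def empirical_cdf(samples):
    """Empirical estimate of F_P: returns a function x -> fraction of samples <= x."""
    s = np.sort(np.asarray(samples, dtype=float))
    return lambda x: np.searchsorted(s, x, side="right") / s.size


rng = np.random.default_rng(3)
P_samples = [sample_mean_path_loss(K=64, D_min=1.0, D=60.0, alpha=4.0, rng=rng)
             for _ in range(5000)]
F_P = empirical_cdf(P_samples)
p10 = np.quantile(P_samples, 0.1)   # P is below this value in about 10% of the draws
print(f"10th percentile of P: {p10:.3e}, F_P there: {F_P(p10):.3f}")
```

Low percentiles of P are the ones that matter for outage. These are the draws in which the test mobile happens to be far from most of its serving antennas.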

IV. MULTIPLE TRANSMIT ANTENNAS
Now, assume that each mobile has M ≥ 1 transmit antennas and transmits an independent data stream from each of them. Suppose that $x_0, x_1, \ldots, x_{M-1}$ are the M symbols transmitted by the test mobile at a given sampling time. Similarly, $x_M, \ldots, x_{2M-1}$ are the symbols transmitted by mobile 1 (the second transmitter), and so on. We treat the signals from the individual antennas of the interfering mobiles as if they came from separate mobiles in the single-transmit-antenna model above. Condition (1) is then still satisfied, but with an effective mobile density of $M\rho$. We can write the following expression for the received signal y, where $[x_0, x_1, \ldots, x_{M-1}]^T$ is the vector of symbols transmitted by the test mobile. The channel vector between antenna $i - \lfloor i/M \rfloor M$ of mobile $\lfloor i/M \rfloor$ and the K antennas cooperating to detect the signals from the test mobile is given by

$$\mathbf{h}_i = \left[ r_{\lfloor i/M \rfloor,1}^{-\alpha/2} g_{i,1},\; r_{\lfloor i/M \rfloor,2}^{-\alpha/2} g_{i,2},\; \ldots,\; r_{\lfloor i/M \rfloor,K}^{-\alpha/2} g_{i,K} \right]^T.$$

The channel matrix between the antennas of the test mobile and the K antennas cooperating to detect its signal is

$$\mathbf{H}_0 = \left[ \mathbf{h}_0, \mathbf{h}_1, \ldots, \mathbf{h}_{M-1} \right].$$

Additionally, we define a channel matrix corresponding to all the interfering signals from the first n mobiles, and the covariance matrix of all the interfering signals. We define the spectral efficiency $\eta_M$ to be the mutual information between the transmitted and received signals of the test mobile. Note that with Gaussian codebooks used at all mobiles, this mutual information corresponds to a rate achievable with an arbitrarily low probability of error. With these definitions, we can state the following theorem on the spectral efficiency with multiple transmit antennas.
Theorem 2: With M ≥ 1 transmit antennas per mobile, the following holds in probability:
Suppose the locations of the antennas serving the test mobile are random, and the number of antennas is large enough that the spectral efficiency is close to the asymptote given in Theorem 2. The CDF of the spectral efficiency can then be approximated by
In the results described above, we have made few assumptions about how the antennas serving the test mobile are distributed in space, other than that there is a large number of them.
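The closed-form definitions of the interference covariance and of $\eta_M$ are not reproduced in this text. As a sketch under an assumption, the Python/NumPy code below uses the standard Gaussian-input expression $\eta_M = \log_2\det(\mathbf{I}_M + \mathbf{H}_0^\dagger \mathbf{R}^{-1}\mathbf{H}_0)$. Here $\mathbf{R}$ is the covariance of the interfering signals, with noise neglected and unit-variance independent symbols. Columns are indexed as in this section, so column i belongs to mobile ⌊i/M⌋. For M = 1 the expression reduces to $\log_2(1 + \mathrm{SIR})$, which the accompanying check exploits. The layout and names are hypothetical.

```python
import numpy as np


def eta_multi_antenna(H0, HI):
    """Assumed form: log2 det(I_M + H0^H R^{-1} H0), with R = HI HI^H the covariance of the
    interfering signals (unit-variance independent symbols, noise neglected)."""
    R = HI @ HI.conj().T
    M = H0.shape[1]
    A = np.eye(M) + H0.conj().T @ np.linalg.solve(R, H0)
    _, logdet = np.linalg.slogdet(A)   # A is Hermitian positive definite
    return float(logdet / np.log(2.0))


rng = np.random.default_rng(4)
K, M, n_mobiles, alpha = 32, 2, 40, 4.0
antennas = rng.uniform(-40.0, 40.0, size=(K, 2))
interferers = rng.uniform(-300.0, 300.0, size=(n_mobiles, 2))


def channels(points):
    """K x (M * len(points)) matrix; column i belongs to mobile floor(i / M), so the M
    columns of one mobile share its distances but have independent Rayleigh fading."""
    r = np.linalg.norm(antennas[:, None, :] - points[None, :, :], axis=2)
    r = np.repeat(r, M, axis=1)
    g = (rng.standard_normal(r.shape) + 1j * rng.standard_normal(r.shape)) / np.sqrt(2.0)
    return r ** (-alpha / 2.0) * g


H0 = channels(np.zeros((1, 2)))      # test mobile at the origin, K x M
HI = channels(interferers)           # K x (M * n_mobiles) interfering columns
print("eta_M =", eta_multi_antenna(H0, HI), "b/s/Hz")
```

By Sylvester's determinant identity, the same value equals $\log_2\det(\mathbf{R} + \mathbf{H}_0\mathbf{H}_0^\dagger) - \log_2\det\mathbf{R}$. This is the usual "signal plus interference over interference" form of the mutual information.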
In the subsequent discussion, we assume that there are N AP/BSs serving the test mobile, each with L antennas, such that K = NL. We consider three regimes. In the first, L is large and N is small (cooperative BSs with large numbers of antennas per BS). In the second, N is large and L is small (a large number of APs). The third covers the cases in between. Here, we assume that the path losses from all antennas of an AP/BS to the test mobile are the same. In other words,

$$r_{0,1} = r_{0,2} = \cdots = r_{0,L},$$
$$r_{0,L+1} = r_{0,L+2} = \cdots = r_{0,2L},$$
$$\vdots$$
$$r_{0,K-L+1} = r_{0,K-L+2} = \cdots = r_{0,K}.$$

Then, we have

$$P = \frac{1}{K}\sum_{i=1}^{K} r_{0,i}^{-\alpha} = \frac{1}{N}\sum_{j=1}^{N} r_{0,jL}^{-\alpha}.$$

Note from (10) that with a fixed $\bar{P}$, the spectral efficiency of the test mobile depends primarily on K and not on the specific values of N and L. On the other hand, if $\bar{P}$ is not fixed, how the AP/BSs are distributed in space can affect the spectral efficiency. Suppose that the AP/BS locations are deterministic and the test mobile is served by the N AP/BSs closest to it. Then P, which we use to approximate $\bar{P}$ when K is large, is non-increasing in N. P is the average of the path losses from the AP/BSs to the test mobile, and this average can only decrease when farther-away AP/BSs are added. Therefore, increasing the number of APs while keeping L constant increases K but reduces P, resulting in diminishing returns.
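Two properties of P above can be checked with a few lines of Python/NumPy. The first is the identity reducing the per-antenna average to a per-AP/BS average. The second is the fact that P cannot increase when farther AP/BSs are added. The AP/BS distances below are hypothetical values, sorted from nearest to farthest to mimic serving the test mobile with its N closest AP/BSs.

```python
import numpy as np


def mean_path_loss(ap_distances, L, alpha):
    """P = (1/K) sum_i r_{0,i}^(-alpha) with K = N L, where the L antennas of each AP/BS
    share that AP/BS's distance to the test mobile."""
    r = np.repeat(np.asarray(ap_distances, dtype=float), L)   # the K antenna distances
    return float(np.mean(r ** (-alpha)))


alpha, L = 4.0, 4
rng = np.random.default_rng(5)
# Hypothetical deterministic layout: 30 AP/BSs, ordered from nearest to farthest.
distances = np.sort(rng.uniform(5.0, 200.0, size=30))
P_by_N = [mean_path_loss(distances[:N], L, alpha) for N in range(1, 31)]
print(P_by_N[:3], P_by_N[-1])
```

The value of L drops out of P entirely. Under fixed AP/BS locations, L therefore enters the spectral efficiency only through K = NL.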
However, if P is a random variable, the CDF of P depends significantly on N, L, and how the AP/BSs serving the test mobile are distributed in space. For instance, consider a fixed total number of antennas K and a fixed antenna density, with the APs distributed independently in space. Using a large number of access points is then more favorable than using a small number of BSs. The reason is that the probability that the test mobile is far away from all the antennas serving it is lower when a large number of APs is used. On the other hand, the cost of connecting a large number of spatially distributed AP/BSs can be prohibitive. It is therefore important to understand how much performance improvement a large number of access points can provide. To gain further insight into these effects, we need to apply specific models of how the AP/BSs are distributed in space.
We adopt two models for how the BS/APs are distributed in space to characterize the spectral efficiency of the test mobile. We consider a disjoint clustering approach, where the AP/BSs are divided into spatially disjoint co-operating clusters as done in [14], and a user-centric cooperation strategy, where the test mobile is served by the N closest APs/BSs to it. The latter system is also known as a cell-free system. By varying the number of AP/BSs N , with a constant total number of antennas K , we can then characterize the performance implications of concentrating a large number of antennas in a small number of BSs and vice-versa under these two models. In the subsequent discussion, we shall assume that the number of antennas K is large enough that the asymptotic expressions hold.

B. DISJOINT CLUSTERING
In disjoint clustering, cooperating AP/BSs are grouped into fixed clusters. For such systems, we assume that the plane is divided into congruent hexagonal cells. Within each cell, N AP/BSs are distributed uniformly at random and independently of each other. Each of the N AP/BSs has L antennas, such that the total number of antennas used to detect the test mobile's signal is K = NL. The N AP/BSs cooperate to detect the signal of any mobile that falls within their cell. Each cell has a processor that is connected to all the AP/BSs within that cell. This approach is illustrated in Figure 3a. We use two models for the location of the test mobile: a randomly located test mobile and a test mobile at the cell edge. In the randomly located case, the test mobile is distributed uniformly at random in its cell, independently of the locations of the APs/BSs in its cell. Technically, we require a minimum distance between the test mobile and the antennas serving it. We do not impose this restriction in the derivation of the CDF of P, however, in order to simplify the resulting expressions. When the test mobile is very close to an antenna, the spectral efficiency is large. Such situations therefore lie outside the low-outage-probability regime, which is the regime of interest. Hence, not enforcing a minimum distance in our numerical evaluations does not influence the spectral efficiencies in the regimes of interest.
To make the analysis tractable, following [14], we approximate each hexagonal cell by a circle of radius R whose area equals that of the hexagonal cell. The resulting CDF of the random variable P is given by the following lemma.
Lemma 3:
Proof: Please see Appendix C. This expression is used with (11) to approximate the CDF of the spectral efficiency for the disjoint clustering system.
When the test mobile is at the cell edge, the CDF of P is given by the following, which follows directly from the proof of Lemma 3.
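A Monte Carlo counterpart to Lemma 3 and to its cell-edge variant is sketched below in Python/NumPy. It does not implement the closed form of the lemma. It only samples P under the same circular-cell assumptions (uniform AP/BSs, no minimum distance). The sketch also fixes K and the cell, and hence the antenna density. Under these conditions it compares the 10th percentile of P when the antennas are spread over many APs versus concentrated at one BS. The unit cell radius, the number of trials and the helper names are illustrative assumptions.

```python
import numpy as np


def circle_radius_for_hexagon(side):
    """Radius R of the circle whose area equals that of a regular hexagon with this side."""
    return np.sqrt(1.5 * np.sqrt(3.0) * side**2 / np.pi)


def uniform_in_disk(R, count, rng):
    r = R * np.sqrt(rng.uniform(size=count))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=count)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def sample_P_disjoint(N, L, R, alpha, rng, cell_edge=False):
    """One draw of P for a circular cell of radius R with N AP/BSs (L antennas each)
    placed uniformly at random; no minimum distance is enforced, as in the text."""
    aps = uniform_in_disk(R, N, rng)
    if cell_edge:
        t = rng.uniform(0.0, 2.0 * np.pi)
        mobile = R * np.array([np.cos(t), np.sin(t)])
    else:
        mobile = uniform_in_disk(R, 1, rng)[0]
    d = np.linalg.norm(aps - mobile, axis=1)
    return float(np.mean(np.repeat(d, L) ** (-alpha)))


rng = np.random.default_rng(6)
K, alpha, R = 200, 4.0, 1.0      # same K and cell (hence antenna density) for every split
p10 = {}
for L in (1, 10, 200):
    N = K // L
    draws = [sample_P_disjoint(N, L, R, alpha, rng) for _ in range(2000)]
    p10[L] = np.quantile(draws, 0.1)   # P at roughly 10% outage
    print(f"N = {N:3d}, L = {L:3d}: 10th percentile of P = {p10[L]:.3g}")
```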
Numerical results, including simulations for this case are provided in Section VI.

C. USER-CENTRIC CLUSTERING
With user-centric clustering, a mobile is served by a set of BS/APs that have favourable channels to it (e.g. the closest ones by Euclidean distance). Such a system is depicted in Figure 3b, where the APs shown in bold serve the mobile. Note that in this setup, APs need to be connected to several processors, or all APs need to be connected to a single central processor. In the example in Figure 3b, the bold dashed lines indicate links that would need to be added to serve the mobile shown, as compared to a disjoint-clustering system. As such, user-centric clustering typically results in larger infrastructure and overhead requirements. On the other hand, with user-centric clustering there is no notion of a cell edge, which is the worst-case location for a mobile. Suppose we model the APs in the user-centric scenario as randomly distributed. The performance of a given link can then differ considerably between two cases: a large number of APs with a small number of antennas each, or a small number of BSs with a large number of antennas each. The reason is that the probability of being far away from all of the serving antennas is smaller in the former case than in the latter.
A commonly used model (e.g. [14]) is to assume that the access points/base stations form a homogeneous Poisson Point Process (HPPP) on the plane. Here we adopt the HPPP model with density ρ b BS/unit area. The test mobile then connects to the N closest BS/APs to it, where K = LN . With this model, the CDF of P is given by the following lemma proved in [16].
Lemma 4: For user-centric clustering with BS/APs distributed as an HPPP with density $\rho_b$, the CDF of P is
The coefficients $A_{i,j}$ are defined recursively as follows:
where $\Gamma(\cdot,\cdot)$ is the upper incomplete gamma function and N = K/L is the number of access points or base stations. Assuming that the asymptotic approximation holds, we can approximate the CDF of the spectral efficiency by combining (24) and (11). Although (24) is complicated, it can be evaluated efficiently numerically, as described in the following section.
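As a cross-check of Lemma 4, P can also be sampled directly. Let $r_n$ be the distance to the nth closest point of an HPPP of density $\rho_b$. The quantities $\pi\rho_b r_n^2$, n = 1, ..., N, are then the first N arrival times of a unit-rate Poisson process. The N serving distances can therefore be drawn from cumulative sums of unit-mean exponential variables. By stationarity of the HPPP, the location of the test mobile does not matter. The Python/NumPy sketch below uses this fact. The antenna density, K and the trial counts are hypothetical choices.

```python
import numpy as np


def sample_P_user_centric(N, L, rho_b, alpha, rng):
    """One draw of P when the test mobile is served by its N nearest BS/APs of an HPPP
    with density rho_b, each with L antennas. pi * rho_b * r_n^2 (n = 1..N) are the first
    N arrival times of a unit-rate Poisson process."""
    arrivals = np.cumsum(rng.exponential(1.0, size=N))
    r = np.sqrt(arrivals / (np.pi * rho_b))          # sorted distances r_1 < ... < r_N
    return float(np.mean(np.repeat(r, L) ** (-alpha)))


rng = np.random.default_rng(9)
K, alpha, antenna_density = 64, 4.0, 0.02   # antennas per unit area (hypothetical)
p10 = {}
for L in (1, 8, 64):
    N = K // L
    rho_b = antenna_density / L             # keep the antenna density fixed across splits
    draws = [sample_P_user_centric(N, L, rho_b, alpha, rng) for _ in range(4000)]
    p10[L] = np.quantile(draws, 0.1)
    print(f"N = {N:2d}, L = {L:2d}: 10th percentile of P = {p10[L]:.3g}")
```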

A. FIXED ANTENNAS
To verify the accuracy of the analytical results, we simulated a system with AP/BSs at fixed locations serving a test mobile at the origin. We considered a range of values of K and N, with one antenna per AP/BS. Small numbers of antennas per base station are the regime of interest here, because systems with a small number of base stations, each with a large number of antennas, have already been analyzed in [16]. We used an antenna-to-user density ratio of 20.
FIGURE 3. In the disjoint clustering case, the access points within each cluster are all connected to a central processor. In the user-centric case, the mobile is served by the base stations/access points close to it (shown in bold). Hence, some access points have to be connected to multiple processors, or all access points have to be connected to a single central processor. The additional fronthaul links required to serve the mobile in the user-centric clustering case, as compared to the disjoint clustering approach, are illustrated using the bold dashed lines in (b).

Further, we verify that the specific distribution model of the mobiles does not affect the spectral efficiencies when the number of antennas is large, as long as (1) holds. To this end, we consider three different models for the distribution of the interfering mobiles. As a baseline, we consider the HPPP model, where the locations of the mobiles are completely independent of one another. To simulate mobile distributions in which mobiles tend to form clusters, we use the Matern Cluster Process (MCP) with 10 mobiles per cluster. Note that the MCP is a special case of the Neyman-Scott Process [23]. We also consider a repulsive model for the mobile distribution, namely the Matern type-I hard-core (MHC) process with a repulsion distance of 10 units. In all cases, we set the effective density of mobiles to $10^{-3}$ mobiles per unit area. The HPPP, MCP and MHC models are commonly used in the literature on stochastic geometry for wireless communications (see e.g. [24]). Their definitions can be found in references such as [23]. Figure 4 shows a normalized histogram of the spectral efficiency for different numbers of antennas with one antenna per access point. The lines without markers represent simulations using HPPP mobiles, and the lines with markers represent the MCP model for the mobiles.
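The three mobile models can be reproduced with the Python/NumPy sketch below. It generates an HPPP, a Matern cluster process and a Matern type-I hard-core process on a square window, each with overall density ρ. Some assumptions are not stated in the text: the cluster radius (20 units), Poisson-distributed cluster sizes with mean 10, and the window size. For the hard-core process, the parent density λ is chosen so that the retained density $\lambda e^{-\lambda \pi h^2}$ equals ρ, where h is the repulsion distance.

```python
import numpy as np


def hppp_in_square(density, side, rng):
    """HPPP on the square [0, side)^2."""
    count = rng.poisson(density * side**2)
    return rng.uniform(0.0, side, size=(count, 2))


def matern_cluster(rho, mean_per_cluster, cluster_radius, side, rng):
    """Matern cluster process with overall density rho: parents form an HPPP of density
    rho / mean_per_cluster; each has Poisson(mean_per_cluster) daughters uniform in a disk
    of radius cluster_radius. Parents are drawn on an enlarged window to avoid edge effects."""
    ext = side + 2.0 * cluster_radius
    parents = hppp_in_square(rho / mean_per_cluster, ext, rng) - cluster_radius
    clusters = []
    for p in parents:
        m = rng.poisson(mean_per_cluster)
        r = cluster_radius * np.sqrt(rng.uniform(size=m))
        t = rng.uniform(0.0, 2.0 * np.pi, size=m)
        clusters.append(p + np.column_stack((r * np.cos(t), r * np.sin(t))))
    pts = np.concatenate(clusters) if clusters else np.empty((0, 2))
    return pts[np.all((pts >= 0.0) & (pts < side), axis=1)]


def matern_hard_core_type1(rho, h, side, rng):
    """Matern type-I hard-core process with retained density rho: thin an HPPP of density
    lam by deleting every point that has a neighbour closer than h."""
    a = np.pi * h**2
    if rho > np.exp(-1.0) / a:
        raise ValueError("density too high for this hard-core distance")
    lo, hi = 0.0, 1.0 / a
    for _ in range(100):          # bisection on the increasing branch of lam * exp(-a lam)
        mid = 0.5 * (lo + hi)
        if mid * np.exp(-a * mid) < rho:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    pts = hppp_in_square(lam, side + 2.0 * h, rng) - h   # enlarged window, cropped below
    if len(pts) > 1:
        d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
        np.fill_diagonal(d2, np.inf)
        pts = pts[d2.min(axis=1) >= h**2]
    return pts[np.all((pts >= 0.0) & (pts < side), axis=1)]


rng = np.random.default_rng(7)
rho, side = 1e-3, 1000.0
hppp = hppp_in_square(rho, side, rng)
mcp = matern_cluster(rho, mean_per_cluster=10, cluster_radius=20.0, side=side, rng=rng)
mhc = matern_hard_core_type1(rho, h=10.0, side=side, rng=rng)
for name, pts in (("HPPP", hppp), ("MCP", mcp), ("MHC", mhc)):
    print(name, len(pts) / side**2)   # empirical densities, all close to rho
```

The hard-core thinning uses an O(n²) distance matrix. This is adequate for windows with a few thousand points, but a spatial index would be preferable for larger simulations.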
FIGURE 6. CDF of the spectral efficiency for K = 200 and different Ls, for disjoint clustering with cells approximated as circles. The access points/base stations were distributed uniformly randomly within each circular cell. The test mobile is assumed to be distributed uniformly randomly in a circular cell.

Note that the standard deviation of the spectral efficiency decreases as K increases. This is consistent with convergence in mean-square, and hence in probability, which is the assertion of the main results of this work. For the HPPP case, the convergence is not very fast. Nevertheless, even with 50 antennas the standard deviation of the spectral efficiency is below 10% of the asymptotic spectral efficiency, and with 200 antennas it is below 5%. This slow convergence applies to the present case, where P is deterministic. When P is random due to the random positions of the antennas relative to the test mobile, the approximation (11) to the CDF of the spectral efficiency is quite accurate. It remains accurate even for relatively small numbers of antennas, as shown in the remainder of this section. The reason is that, for even moderately large numbers of antennas, the positions of the antennas relative to the test mobile are the primary source of variation in the spectral efficiency. Figure 5 shows a normalized histogram of the spectral efficiency. The lines without markers represent the HPPP model, and the lines with markers represent the MHC model for the mobiles. In both Figures 4 and 5, the distributions of the spectral efficiencies of the MCP and MHC models approach that of the HPPP model. This indicates that the specific model of the mobile distribution is not significant when the number of antennas is large. As expected, the MCP model (i.e. with clustering of mobiles) has a larger standard deviation than the HPPP model. Conversely, for the MHC model, where mobiles are more spaced out and hence more spatially regular, the standard deviation of the spectral efficiency is lower than for the HPPP model. Figures 4 and 5 also provide some insight into how the accuracy of the asymptotic approximation changes with increasing K, when the mobile distribution is the only source of randomness (aside from fading). The standard deviation for the HPPP case decreases from 1.38 to 0.87 as the number of antennas K increases from 20 to 400. Since the asymptotic spectral efficiencies increase with K, the ratio of the standard deviation to the asymptote matters as well. This ratio decreases from 0.17 to 0.05 as the number of antennas increases from 20 to 400. For the MCP, the ratio ranges from 0.22 to 0.11, and for the MHC process, from 0.14 to 0.04. As such, the asymptotic approximation is only moderately accurate in predicting the spectral efficiency when the locations of the antennas serving the test mobile are fixed. When the antennas serving the test mobile are randomly located, however, it is still very useful for predicting the spectral efficiency, as noted above and discussed subsequently. In that case, the ratios of the standard deviation to the asymptotic spectral efficiencies are small.
The mean spectral efficiencies, on the other hand, are predicted quite well by the asymptotic spectral efficiency. The deviation between the mean and the asymptote is less than 1 b/s/Hz with 50 or more antennas and falls below 0.1 b/s/Hz at 400 antennas, for all three models of the mobile distribution.
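The three accuracy measures used in this subsection are easy to compute from Monte Carlo samples: the standard deviation, its ratio to the asymptote, and the gap between the mean and the asymptote. The helper below is a hypothetical sketch, and the name `convergence_metrics` is ours. Its last lines only back out the asymptotic spectral efficiency implied by the rounded HPPP values quoted above. The resulting numbers (about 8.1 b/s/Hz at K = 20 and 17.4 b/s/Hz at K = 400) are therefore approximate and were not reported by the simulations.

```python
import numpy as np

def convergence_metrics(samples, asymptote):
    """Return (std, std / asymptote, |mean - asymptote|) for spectral-efficiency samples in b/s/Hz."""
    x = np.asarray(samples, dtype=float)
    std = float(x.std(ddof=1))  # sample standard deviation across Monte Carlo drops
    return std, std / asymptote, abs(float(x.mean()) - asymptote)

# Asymptote implied by the rounded HPPP figures quoted in the text:
# asymptotic SE = (standard deviation) / (standard deviation-to-asymptote ratio).
reported = {20: (1.38, 0.17), 400: (0.87, 0.05)}  # K: (std in b/s/Hz, ratio)
implied = {K: s / r for K, (s, r) in reported.items()}
print({K: round(v, 1) for K, v in implied.items()})  # {20: 8.1, 400: 17.4}
```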

B. DISJOINT CLUSTERING
We simulated the disjoint clustering scenario described in Section V-B, in which N APs/BSs serving a test mobile are placed independently and uniformly at random in a circular cell. We varied the number of APs/BSs, N, and the number of antennas per AP/BS, L, such that the total number of antennas serving the test mobile was fixed at K = NL = 200. The test mobile was either distributed uniformly at random in the cell or placed at the cell edge. The cell radius was set so that the antenna density is 20 times the mobile density. To emulate Poisson-distributed interferers, 4000 interferers were then placed randomly in the plane with a density of 0.001 interferers per unit area. A path-loss exponent of α = 4 and a single transmit antenna per mobile were used in these simulations. Figure 6 shows the simulated spectral efficiencies (markers) and the theoretical spectral efficiencies (solid lines) obtained by evaluating (11) together with Lemma 3, with a noise power of -96 dBm and a transmit power of 100 mW. The theoretical predictions match the simulations well when the number of antennas per AP/BS is 4 or larger. Even with L = 2 or L = 1, they still predict the spectral efficiency moderately well. At an outage probability of approximately 0.1, the theoretical predictions match the simulations to within 0.4 b/s/Hz for L = 2 and within 0.6 b/s/Hz for L = 1.
For a fixed total number of antennas, the spectral efficiency CDF converges slowly to its asymptote when the number of antennas per AP/BS is low. This is because the rate of convergence of the SIR (and hence of the spectral efficiency) depends on the convergence of the empirical distribution function of the path losses between the interfering mobiles and the antennas serving the test mobile. If the antennas are all co-located, this empirical distribution function converges faster with K than when the antennas serving the test mobile are distributed randomly in space. Hence, the spectral efficiency converges more slowly with K when the number of antennas per AP/BS is low. The parts of the derivation of Theorem 1 that directly affect this rate appear in Appendix E, with an explanatory note at the end of that appendix.
These results indicate that the asymptotic theoretical results are useful approximations to the spectral efficiency for reasonable system parameters (in the massive MIMO regime). Further, at an outage probability of 0.1, distributing all antennas randomly within the cell yields approximately 2.5 times the spectral efficiency of concentrating all 200 antennas at a single base station.
Figure 7 depicts the spectral efficiencies of users at the edge of a cell whose radius is set so that the ratio of antenna density to mobile density is 20. We used K = 200 antennas, varying N and L such that K = NL, as indicated in the legend. The theoretical predictions match the simulations to within 0.1 b/s/Hz at L = 50 and within 0.6 b/s/Hz at L = 2. Convergence is slower for a test mobile at the cell edge than for a randomly located mobile in the cell. At the cell edge, the distances (and hence path losses) between the test mobile and the antennas serving it vary more than they do for a mobile randomly located in the cell.
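The simulation procedure described in this subsection can be sketched in a few lines of NumPy. This is a minimal, hypothetical sketch with our own function names, not the code used to produce Figures 6 and 7. It makes several assumptions:
- Rayleigh fading with unit mean power and the power-law path loss $r^{-\alpha}$.
- An interference-limited receiver (no noise).
- A much smaller K than the 200 used above.
- Interferers drawn as a Poisson number of points uniformly in a large disk, including inside the test cell.

The `edge` flag places the test mobile on the cell boundary, as in Figure 7.

```python
import numpy as np

def uniform_disk(m, radius, rng):
    """m points uniformly distributed in a disk of the given radius centred at the origin."""
    rad = radius * np.sqrt(rng.random(m))
    ang = 2 * np.pi * rng.random(m)
    return np.column_stack((rad * np.cos(ang), rad * np.sin(ang)))

def rayleigh(dist, alpha, rng):
    """Rayleigh-faded channel coefficients with power path loss dist**(-alpha)."""
    fade = (rng.standard_normal(dist.shape) + 1j * rng.standard_normal(dist.shape)) / np.sqrt(2)
    return dist ** (-alpha / 2) * fade

def mmse_sinr(g0, H, noise_var=0.0):
    """Post-MMSE SINR g0^H (H H^H + noise_var I)^{-1} g0; g0 is (K,), H is (K, n)."""
    R = H @ H.conj().T + noise_var * np.eye(g0.shape[0])
    return float(np.real(np.vdot(g0, np.linalg.solve(R, g0))))

def disjoint_trial(N, L, mob_density=1e-3, ratio=20.0, region_radius=500.0,
                   alpha=4.0, edge=False, rng=None):
    """One spectral-efficiency sample (b/s/Hz) for disjoint clustering with N APs x L antennas."""
    rng = np.random.default_rng() if rng is None else rng
    K = N * L
    # Cell radius chosen so that antenna density = ratio * mobile density.
    cell_radius = np.sqrt(K / (np.pi * ratio * mob_density))
    ants = np.repeat(uniform_disk(N, cell_radius, rng), L, axis=0)  # (K, 2): L co-located antennas per AP
    test = np.array([[cell_radius, 0.0]]) if edge else uniform_disk(1, cell_radius, rng)
    # Poisson number of interferers, uniform in a large disk (emulates a PPP).
    intf = uniform_disk(rng.poisson(mob_density * np.pi * region_radius ** 2), region_radius, rng)
    d0 = np.linalg.norm(ants - test, axis=1)                          # (K,)
    dI = np.linalg.norm(ants[:, None, :] - intf[None, :, :], axis=2)  # (K, n)
    sir = mmse_sinr(rayleigh(d0, alpha, rng), rayleigh(dI, alpha, rng))  # interference-limited
    return np.log2(1 + sir)

rng = np.random.default_rng(1)
se = [disjoint_trial(4, 4, rng=rng) for _ in range(200)]
print(np.quantile(se, 0.1))  # spectral efficiency at outage probability 0.1
```

To compare AP/BS configurations, vary `N` and `L` with `N * L` held fixed. Adding a nonzero `noise_var` in `mmse_sinr` would account for receiver noise.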

C. USER-CENTRIC CLUSTERS
As described in Section V, in the user-centric scenario the test mobile is served by the APs/BSs closest to it. For this scenario, we conducted simulations in which a test mobile was placed at the origin of the network and the K/L access points closest to it were used to detect its signal. The APs/BSs were distributed according to a Poisson point process with an antenna density equal to 20 times the mobile density. Note that this value corresponds to an access point density of 0.1 times the mobile density if the 200 antennas were located at a single access point. The path-loss exponent was α = 4, and a single transmit antenna per mobile was used in these simulations. Figure 8 shows the simulated and theoretical CDFs of the spectral efficiency for K = 100 antennas in total and L = 1, 4, 20, 100 antennas per access point. The noise power was -96 dBm and the transmit signal power was 100 mW. The simulated CDFs (markers) were generated from 10000 trials of a circular network with 10000 mobiles. The theoretical prediction using (24) and (11) is plotted with solid lines. The asymptotic prediction matches the simulations well (within 0.3 b/s/Hz at an outage probability of 0.1), confirming the accuracy of the asymptotic analysis.
The results indicate that, at an outage probability of 0.1, placing all antennas at a single base station yields approximately 3.4 times lower spectral efficiency than distributing the antennas completely at random. This loss arises because the test mobile is more likely to be far from all of its serving antennas when they are concentrated at a single base station (L = 100) than when they are distributed in space (L = 1).
Additionally, to verify that the approach introduced here works for other mobile distributions, Figure 10 plots the CDFs of the simulated spectral efficiencies and the theoretical predictions for mobiles distributed as MCPs and MHC processes. The spatial distribution of the antennas is the same as described in the first paragraph of this section. The parameters of the MCP and MHC processes used for the mobiles are as defined in Section VI-A. In both cases, we picked the mobile closest to the center of the network as the test mobile and translated the origin to its location. At an outage probability of 0.1, the simulations for the MCP (depicted with the 'x' marker) and for the MHC process (depicted with the 'o' marker) differ from the theoretical predictions by at most 0.2 b/s/Hz in the simulations we conducted. Additionally, as expected, at low outage probabilities the spectral efficiencies for MCP mobiles are slightly lower than those for the MHC process. The MCP clusters mobiles, whereas in the MHC process mobiles are more spaced apart. This spacing reduces the likelihood of other mobiles appearing close to the antennas serving the test mobile, which are themselves close to the test mobile.
These results verify the utility of the proposed framework under a variety of modeling assumptions on the mobile distribution.
To illustrate how the accuracy of the asymptotic predictions changes as K increases, Figure 11 plots the simulated spectral efficiency of systems with user-centric clustering and different numbers of antennas K. All cases use one antenna per AP/BS, a noise level of -96 dBm, and a transmit power of 100 mW. At one extreme, with K = 20, the simulated spectral efficiency and its asymptotic prediction differ by 2 b/s/Hz at an outage probability of 0.1. This discrepancy diminishes to 0.1 b/s/Hz for K = 200.
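The user-centric setup can be sketched in a similar way. The sketch drops APs as a PPP, keeps the K/L APs nearest to the test mobile at the origin, and evaluates the MMSE SIR against a PPP of interfering mobiles. It is a hypothetical sketch with our own names and simplifying assumptions:
- Rayleigh fading and no noise.
- A finite disk of radius 300.
- K = 16 rather than 100 or 200.

Its numbers are therefore not expected to reproduce Figures 8 to 11.

```python
import numpy as np

def ppp_disk(density, radius, rng):
    """Homogeneous Poisson point process of the given density on a disk centred at the origin."""
    m = rng.poisson(density * np.pi * radius ** 2)
    rad = radius * np.sqrt(rng.random(m))
    ang = 2 * np.pi * rng.random(m)
    return np.column_stack((rad * np.cos(ang), rad * np.sin(ang)))

def cn(shape, rng):
    """i.i.d. circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def user_centric_trial(K, L, mob_density=1e-3, ratio=20.0, radius=300.0, alpha=4.0, rng=None):
    """One spectral-efficiency sample (b/s/Hz): test mobile at the origin served by its K/L nearest APs."""
    assert K % L == 0, "K must be a multiple of L"
    rng = np.random.default_rng() if rng is None else rng
    aps = ppp_disk(ratio * mob_density / L, radius, rng)   # AP density = antenna density / L
    nearest = np.argsort(np.linalg.norm(aps, axis=1))[: K // L]
    ants = np.repeat(aps[nearest], L, axis=0)               # (K, 2) serving antennas
    intf = ppp_disk(mob_density, radius, rng)               # other mobiles act as interferers
    d0 = np.linalg.norm(ants, axis=1)
    dI = np.linalg.norm(ants[:, None, :] - intf[None, :, :], axis=2)
    g0 = d0 ** (-alpha / 2) * cn(d0.shape, rng)
    H = dI ** (-alpha / 2) * cn(dI.shape, rng)
    sir = np.real(np.vdot(g0, np.linalg.solve(H @ H.conj().T, g0)))  # MMSE SIR, no noise
    return np.log2(1 + sir)

rng = np.random.default_rng(7)
for L in (1, 4, 16):
    se = [user_centric_trial(16, L, rng=rng) for _ in range(200)]
    print(L, np.quantile(se, 0.1))  # spectral efficiency at outage probability 0.1
```

Holding the antenna density fixed while changing L is what makes the comparison between fully distributed and concentrated antennas fair. This is why the AP density is divided by L.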

D. MULTIPLE TRANSMIT ANTENNAS
We additionally simulated systems in which each mobile has multiple transmit antennas. We considered user-centric clustering with Poisson-distributed APs/BSs, as for the single-transmit-antenna system described in Section VI-C, with M = 1, 2, 3, 4 transmit antennas per mobile. We restrict attention to small M since mobile devices are unlikely to have large numbers of antennas. Figure 12 illustrates two cases, APs with 4 antennas each and BSs with 100 antennas each, with a total of 200 antennas in all cases. The simulations agree well with the theoretical predictions from (19) for M = 1, 2, 3 transmit antennas per mobile, but less so for M = 4. Even for L = 4 and M = 4, the theoretical predictions match the simulations to within 2 b/s/Hz at an outage probability of approximately 0.1. This indicates that the asymptotic predictions are moderately useful when mobiles have a small number of transmit antennas.
We further note that the benefit of increasing the number of transmit antennas per mobile diminishes. The multiplexing gain from multiple transmit antennas at the test mobile is offset by the increased burden of suppressing interference. Since we assume independent transmissions from each transmit antenna, each multi-antenna interferer appears as multiple single-antenna interferers. For example, with 4 antennas per access point, going from one to 4 transmit antennas per mobile increases the spectral efficiency by approximately a factor of 3.5 at an outage probability of 0.1. However, when the antennas are concentrated at a few base stations, e.g., at 2 base stations with 100 antennas each, the spectral efficiency with 3 or 4 transmit antennas is in both cases only approximately 1.5 times that with a single transmit antenna. The increase is smaller here because, even with the same total number of antennas, the test mobile is more likely to be far from all serving antennas when they are concentrated at a few base stations. It is less likely to be far from all of them when they are spread over many randomly located access points with a few antennas each. Consequently, the test mobile is more likely to be in the low-SIR regime when antennas are concentrated at a few BSs. At low SIR, the multiplexing gain from multiple transmit antennas is known to be reduced.
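A toy calculation illustrates the last statement. It is not the interference model analyzed in this paper. Suppose a total SIR s is split equally over M parallel, interference-free streams, so the sum rate is M log2(1 + s/M). Its ratio to log2(1 + s) tends to 1 as s → 0 and to M as s → ∞, so extra streams help little at low SIR. The equal-split assumption and the function name below are ours.

```python
import numpy as np

def split_rate_gain(sir_db, M):
    """Toy ratio M*log2(1 + s/M) / log2(1 + s): total SIR s split equally over M interference-free streams."""
    s = 10 ** (np.asarray(sir_db, dtype=float) / 10)  # dB -> linear
    return M * np.log2(1 + s / M) / np.log2(1 + s)

# Gain from M = 1, 2, 4 streams at several SIR levels (in dB).
for sir_db in (-10, 0, 10, 30):
    print(sir_db, [round(float(split_rate_gain(sir_db, M)), 2) for M in (1, 2, 4)])
```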

VII. SUMMARY AND CONCLUSION
We have presented an asymptotic technique to analyze the uplink spectral efficiency of a distributed antenna system with linear MMSE processing and a large number of antennas. The system model is general enough to apply to systems with a large number of single-antenna access points, to systems with a small number of base stations each with a large number of antennas, and to systems between those extremes. Hence, the results of this work generalize the findings in [17], which apply only to the latter case. We applied the approach developed here to a user-centric distributed antenna system with Poisson-distributed access points and mobiles, in which a mobile is served by the 200 antennas closest to it. We found that, with a fixed antenna density, fully distributing the antennas (1 antenna per access point) improves the spectral efficiency by a factor of about 2.8 compared with placing all 200 antennas at a single base station. Similar increases were also found for disjoint clustering of antennas, where space is divided into cells with a fixed number of antennas distributed independently and uniformly at random in each cell.
Hence, system designers can use the analysis presented here to trade off placing fewer antennas on each of a larger number of access points against placing more antennas on each of fewer access points. Different allocations of antennas and access points may have different costs, and distributed antenna systems are costly in general. We therefore expect the approach presented here to help system designers trade off cost and complexity against spectral efficiency in distributed antenna systems.
Proof: Please see Appendix D.
Next, consider the following, where the limit is moved inside the expectation by the Dominated Convergence Theorem. The limit in (28) goes to zero by Theorem 3.4 of [21] and the fact that convergence w.p.1 implies convergence i.p. Hence, from (26) and (28), the following holds i.p.
where $F_K(\tau)$ is the empirical distribution function (e.d.f.) of the eigenvalues of $\tilde{\mathbf{R}}$, so $dF_K(\tau)$ is a measure with a mass of $1/K$ at each eigenvalue of $\tilde{\mathbf{R}}$.
The following lemma shows that (29) converges to the desired form as K → ∞, completing the proof.
Lemma 6:
Proof: Please see Appendix E. Note that the specific form of the limit in (65) originates from the convergence of the e.d.f. of appropriately scaled path losses from the interferers. This convergence was originally derived for systems with a small number of base stations in [17], as described in Appendix G.
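Because $dF_K$ places mass $1/K$ on each eigenvalue, any integral against it is simply an average over the eigenvalues. For example, for z < 0, $\int (\tau - z)^{-1}\, dF_K(\tau) = \frac{1}{K}\mathrm{tr}\big((\tilde{\mathbf{R}} - z\mathbf{I})^{-1}\big)$. The sketch below checks this identity numerically. A generic random Hermitian matrix stands in for $\tilde{\mathbf{R}}$; it is not the paper's path-loss-weighted matrix. The helper `edf` is our own.

```python
import numpy as np

def edf(eigs, tau):
    """F_K(tau): fraction of the K eigenvalues that are <= tau (mass 1/K at each eigenvalue)."""
    return float(np.mean(np.asarray(eigs) <= tau))

rng = np.random.default_rng(0)
K, n = 50, 150
X = (rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))) / np.sqrt(2)
R = X @ X.conj().T / n            # generic Hermitian non-negative definite stand-in for R-tilde
lam = np.linalg.eigvalsh(R)

# Integrating 1/(tau - z) against dF_K is an average over eigenvalues,
# which equals the normalized trace of the resolvent (R - z I)^{-1}.
z = -1.0
m_edf = np.mean(1.0 / (lam - z))
m_tr = np.trace(np.linalg.inv(R - z * np.eye(K))).real / K
print(np.isclose(m_edf, m_tr))  # True
```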

APPENDIX B PROOF OF LEMMA 2
We use a sandwiching argument here. The general approach is similar to the proof of Lemma 1 in [16], but the details differ due to the more general system assumptions of this work. Let $\widehat{\mathrm{SIR}}[n]$ be the SIR obtained when the MMSE estimator constructed for a network with just the first n interferers is applied to the network with all interferers. Recall that SIR[n] is the SIR of a system with just the first n interferers in it. The MMSE filter maximizes the SIR over all linear filters, and removing interferers cannot decrease the MMSE SIR. Hence, $\widehat{\mathrm{SIR}}[n] \le \mathrm{SIR} \le \mathrm{SIR}[n]$. We have
where β is a scale factor on the weight vector. It does not affect the SIR, since it scales the signal and the interference by the same factor. From (4) and (5),
Consider the limits as K → ∞ and c → ∞ of the summation term in the denominator of the previous expression.
We will subsequently show that this term goes to zero in probability.
The following lemma helps simplify the summation term in (34), in the limit.
Lemma 7: Consider an L × L, non-negative definite, Hermitian matrix $\mathbf{B}_L$ whose minimum eigenvalue is bounded from below by $\bar{\lambda} > 0$ with probability 1. Suppose that $\mathbf{a} \in \mathbb{C}^{L\times 1}$ and $\mathbf{b} \in \mathbb{C}^{L\times 1}$ comprise zero-mean, uncorrelated random variables with variances $\sigma^2_{a_1}, \dots, \sigma^2_{a_L}$ and $\sigma^2_{b_1}, \dots, \sigma^2_{b_L}$, respectively, which are bounded for all L. Then, for some $L_0 > 0$ and all $L > L_0$, with probability 1,
where the expectation is with respect to a and b. Hence, the following holds in probability:
Proof: Please see Appendix F.
Consider the expectation of the summation term in (34) scaled by $K^{-\alpha/2}$.
where the expectation moves inside the sum due to the positivity of the summands, and $\mathbf{q}_m$ comprises independent random variables. Since the $r_{i,j}$ are bounded from below, the variances of these variables are bounded from above. Hence, for m sufficiently large that $r_m > D$,
$$\cdots \le \max\{r_{0,1}^{\alpha}, \dots, r_{0,K}^{\alpha}\}\,(r_m - D)^{-\alpha}, \qquad (39)$$
with probability 1. Thus, for n sufficiently large that $r_{n+1} > D$, with probability 1, we can bound (37) as
where the inequality in (40) follows from Lemma 7. Note that the expectation above is over $\mathbf{g}_0$ and $\mathbf{h}_m$. For n large enough that $r_{n+1} > D$, the sum of $(r_m - D)^{-\alpha}$ is known to be bounded with probability 1. Moreover, $r_{0,1}, \dots, r_{0,K}$ are bounded, and, with probability 1, $\lim_{K\to\infty} \lambda_{\min}(\mathbf{R}^{-1})$ is bounded by a strictly positive number by Lemma 5. Therefore, (34) converges in probability to zero, and the following holds in probability:
Thus, the upper and lower bounds of the SIR are equal in the limit, completing the proof.
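The sandwich $\widehat{\mathrm{SIR}}[n] \le \mathrm{SIR} \le \mathrm{SIR}[n]$ that drives this proof can be checked numerically on a small example. The MMSE filter built from only the first n interferers is suboptimal for the full network, and dropping interferers can only raise the MMSE SIR. The sketch below uses i.i.d. Rayleigh fading, with arbitrary per-interferer amplitudes as hypothetical stand-ins for the path losses. The notation $\widehat{\mathrm{SIR}}[n]$ (the SIR of the n-interferer MMSE filter applied to the full network) is ours.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_all, n = 8, 60, 30   # keep the first n = cK interferers (c = 3.75), so R_n below is invertible

def cn(*shape):
    """i.i.d. unit-variance circularly symmetric complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sir_of(w, g0, H):
    """SIR of combining vector w: |w^H g0|^2 / sum_i |w^H h_i|^2 over the columns h_i of H."""
    return abs(np.vdot(w, g0)) ** 2 / np.sum(np.abs(w.conj() @ H) ** 2)

g0 = cn(K)
H = cn(K, n_all) * rng.uniform(0.2, 1.0, n_all)   # hypothetical per-interferer path-loss amplitudes
Rn = H[:, :n] @ H[:, :n].conj().T                 # covariance of the first n interferers only
Rall = H @ H.conj().T                             # covariance of all interferers

sir_hat_n = sir_of(np.linalg.solve(Rn, g0), g0, H)     # n-interferer MMSE filter, all interferers present
sir = np.real(np.vdot(g0, np.linalg.solve(Rall, g0)))  # MMSE SIR with all interferers
sir_n = np.real(np.vdot(g0, np.linalg.solve(Rn, g0)))  # MMSE SIR with only the first n interferers
print(sir_hat_n <= sir <= sir_n)  # True
```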

APPENDIX C PROOF OF LEMMA 3
Note that
To find the CDF of $r_{0,1}^{-\alpha} + r_{0,2}^{-\alpha} + \dots + r_{0,N}^{-\alpha}$, we first condition on the test mobile being at distance d from the center of the circular cell. The APs/BSs are distributed uniformly at random within the circular cell. Hence, the PDF of the distance from the mobile to one AP/BS equals the PDF of the distance between a uniformly random point in a disk of radius R and another point in the disk at distance d from its center. Geometric arguments show that this PDF equals (22). The PDF can also be found, in different forms, in references such as [25].
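For a numerical check of this step, the sketch below codes one standard form of the distance PDF between a fixed point and a uniform point in a disk. It is obtained from the arc length of the circle of radius r that lies inside the disk. We assume this form is equivalent to (22) but have not matched the two term by term. The sketch also includes a Monte Carlo sampler for the conditional sum of path losses. Function names are hypothetical.

```python
import numpy as np

def dist_pdf(r, d, R):
    """PDF of the distance from a point at distance d (0 <= d <= R) from the centre of a disk
    of radius R to a point uniformly distributed in that disk."""
    r = np.asarray(r, dtype=float)
    pdf = np.zeros_like(r)
    inner = (r > 0) & (r <= R - d)   # the whole circle of radius r lies inside the disk
    pdf[inner] = 2 * r[inner] / R ** 2
    ring = (r > R - d) & (r < R + d) # only an arc of that circle lies inside the disk
    arg = np.clip((r[ring] ** 2 + d ** 2 - R ** 2) / (2 * d * r[ring]), -1.0, 1.0)
    pdf[ring] = 2 * r[ring] * np.arccos(arg) / (np.pi * R ** 2)
    return pdf

def sample_sum_pathloss(N, d, R, alpha, trials, rng):
    """Monte Carlo samples of r_{0,1}^-alpha + ... + r_{0,N}^-alpha with the test mobile at (d, 0)
    and N APs uniform in the disk; their empirical CDF approximates the conditional CDF."""
    rad = R * np.sqrt(rng.random((trials, N)))
    ang = 2 * np.pi * rng.random((trials, N))
    dist = np.hypot(rad * np.cos(ang) - d, rad * np.sin(ang))
    return np.sum(dist ** (-alpha), axis=1)
```

Averaging the conditional CDF over the distribution of d then gives the unconditional CDF, in line with the conditioning described above.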

APPENDIX D PROOF OF LEMMA 5
First, define $\tilde{K} = c_1 K$, where $c_1$ is a constant with $1 < c_1 < c$. Since $n = cK > \tilde{K}$, by the Weyl inequality (see, e.g., [27]),
Additionally,
With probability 1, $r_{i,j} < r_{\tilde{K}+1} + D$ for $i = 1, 2, \dots, \tilde{K}$ and $j = 1, 2, \dots, K$. Hence, the matrix resulting from the rightmost sum in (50) is non-negative definite with probability 1, and by the Weyl inequality (see, e.g., [27]),
Note that the following is known to hold with probability 1:
By assumption, the following holds with probability 1:
Combining (52) and (54), we have
with probability 1. Combining this expression with (51) and (49) and rearranging yields the following in probability:
Since $\lambda_{\min}(\mathbf{D}^{-1}) \ge D_{\min}^{\alpha}$,
Additionally, it is known from [29] that for sufficiently large K, with probability 1, the matrix $\tilde{\mathbf{R}} = \mathbf{D}^{-1/2} K^{\alpha/2-1} \mathbf{H}[n]\mathbf{H}^{\dagger}[n]\mathbf{D}^{-1/2}$ has no eigenvalues outside the support of its limiting eigenvalue distribution. Therefore, for large enough K, with probability 1, no eigenvalue of $\tilde{\mathbf{R}}$ is less than or equal to the RHS of (57), which is bounded from below by the LHS.
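Both uses of the Weyl inequality in this proof rest on the same fact, checked numerically below for random matrices. For Hermitian A and P, every eigenvalue of A + P lies between λ_k(A) + λ_min(P) and λ_k(A) + λ_max(P). A non-negative definite P can therefore only push eigenvalues up. The matrices below are arbitrary examples, not the matrices in (49) to (51).

```python
import numpy as np

rng = np.random.default_rng(2)
K = 6
X = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
A = (X + X.conj().T) / 2                          # arbitrary Hermitian matrix
Y = rng.standard_normal((K, 3)) + 1j * rng.standard_normal((K, 3))
P = Y @ Y.conj().T                                # non-negative definite perturbation of rank 3
la, lp, lap = (np.linalg.eigvalsh(M) for M in (A, P, A + P))  # eigenvalues in ascending order

# Weyl: lambda_k(A) + lambda_min(P) <= lambda_k(A + P) <= lambda_k(A) + lambda_max(P) for every k.
print(np.all(la + lp[0] <= lap + 1e-10), np.all(lap <= la + lp[-1] + 1e-10))  # True True
# Hence adding a non-negative definite matrix cannot decrease any eigenvalue.
print(np.all(lap >= la - 1e-10))  # True
```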

APPENDIX E PROOF OF LEMMA 6
Define the following vectors for $i = 1, 2, \dots$:
where $r_i$ is the distance of the i-th mobile from the origin. Hence, the vector $\bar{\mathbf{h}}_i$ is equivalent to the channel vector between the i-th mobile and the K receive antennas if the antennas were all co-located at the origin instead of at their actual locations.
The limit of $F_K^{\mathrm{bs}}(\tau)$ is given by the following lemma.
Lemma 8: The following hold w.p.1:
and
Proof: Please see Appendix G.
The following lemma shows that, as K → ∞, $F_K(\tau)$ does not depend on the three RHS terms in the parentheses of (63), and completes the proof when combined with (66).
Proof: Please see Appendix H. Note that if all antennas were co-located, all the $r_{i,j}$ terms in the vectors $\hat{\mathbf{h}}_i$ in (59) would be equal. This would result in faster convergence to zero of the terms involving $\mathbf{H}[n]$ in (63).
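The proof in Appendix G relies on $H(\tau)$, the limit of the e.d.f. of $K^{\alpha/2} r_1^{-\alpha}, \dots, K^{\alpha/2} r_K^{-\alpha}$. For mobiles forming an HPPP of density ρ, a short heuristic (ours) suggests $H(\tau) = \max(0,\, 1 - \pi\rho\,\tau^{-2/\alpha})$. The heuristic uses the fact that the values $\pi\rho r_i^2$ are the arrival times of a unit-rate Poisson process, which are close to i for large i. The sketch below compares this expression with a simulated e.d.f. It covers only the HPPP case, and the function names are hypothetical.

```python
import numpy as np

def H_limit(tau, rho, alpha):
    """Heuristic HPPP limit H(tau) = max(0, 1 - pi*rho*tau^(-2/alpha))."""
    tau = np.asarray(tau, dtype=float)
    return np.maximum(0.0, 1.0 - np.pi * rho * tau ** (-2.0 / alpha))

def empirical_H(tau, K, rho, alpha, rng):
    """e.d.f. at tau of K^(alpha/2) r_i^(-alpha), i = 1..K, for the K HPPP points nearest the origin."""
    gamma = np.cumsum(rng.exponential(1.0, K))  # pi*rho*r_i^2: unit-rate Poisson arrival times
    r = np.sqrt(gamma / (np.pi * rho))          # ordered distances r_1 < r_2 < ... < r_K
    x = K ** (alpha / 2) * r ** (-alpha)
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    return np.mean(x[:, None] <= tau[None, :], axis=0)

rng = np.random.default_rng(3)
rho, alpha = 1e-3, 4.0
taus = np.array([2e-5, 5e-5, 1e-4, 1e-3])
print(np.round(empirical_H(taus, 20000, rho, alpha, rng), 3))
print(np.round(H_limit(taus, rho, alpha), 3))
```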

APPENDIX F PROOF OF LEMMA 7
Denote the i-th entries of the vectors a and b by $a_i$ and $b_i$, respectively, and the ij-th entry of
Terms in the summation with an odd number of $a_{i_1}$ or $b_{j_1}$ factors have zero expectation, since $a_i$ and $b_i$ are zero-mean, uncorrelated random variables. Therefore, for all the remaining terms, with $i_1 = i_2$ and $j_1 = j_2$, we have
Since $\lambda_{\max}(\mathbf{B}_L^{-2}) = 1/(\lambda_{\min}(\mathbf{B}_L))^2$, (35) holds for $L > L_0$ with probability 1. The second part follows because (35) converges to zero as L → ∞, and because mean-square convergence and convergence with probability one both imply convergence in probability.
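For orientation, the type of bound used here can be written out explicitly. The following sketch adds assumptions of our own: a and b are independent of each other and of $\mathbf{B}_L$ (so the expectation is conditional on $\mathbf{B}_L$), and all variances are at most $\sigma^2$. It is not claimed to reproduce (35) verbatim.

$$
\mathbb{E}\left|\mathbf{a}^{\dagger}\mathbf{B}_L^{-1}\mathbf{b}\right|^2
= \sum_{i=1}^{L}\sum_{j=1}^{L}\sigma_{a_i}^2\,\sigma_{b_j}^2\left|\left[\mathbf{B}_L^{-1}\right]_{ij}\right|^2
\le \sigma^4\,\mathrm{tr}\!\left(\mathbf{B}_L^{-2}\right)
\le \sigma^4 L\,\lambda_{\max}\!\left(\mathbf{B}_L^{-2}\right)
= \frac{\sigma^4 L}{\left(\lambda_{\min}(\mathbf{B}_L)\right)^2}
\le \frac{\sigma^4 L}{\bar{\lambda}^{2}},
$$

so that $\mathbb{E}\left|\frac{1}{L}\mathbf{a}^{\dagger}\mathbf{B}_L^{-1}\mathbf{b}\right|^2 \le \frac{\sigma^4}{L\bar{\lambda}^2} \to 0$ as $L \to \infty$. The first equality keeps only the terms with $i_1 = i_2$ and $j_1 = j_2$. The first inequality uses $\sum_{i,j}|[\mathbf{B}_L^{-1}]_{ij}|^2 = \mathrm{tr}(\mathbf{B}_L^{-1}\mathbf{B}_L^{-\dagger}) = \mathrm{tr}(\mathbf{B}_L^{-2})$ for Hermitian $\mathbf{B}_L$.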
For the next part of the lemma, using ↑ to denote approach from below, we have
Additionally, from Theorem 6.1 of [21], as n, K → ∞ with n = cK and c > 2,
where $e(z)$ is the unique solution to $1 + \frac{1}{c}\tau e(z)$
where $H(\tau)$ is the limit of the e.d.f. of $K^{\alpha/2} r_1^{-\alpha}, K^{\alpha/2} r_2^{-\alpha}, \dots, K^{\alpha/2} r_K^{-\alpha}$. Additionally, this convergence is uniform in z for any z < 0. Therefore, from Theorem 7.11 of [30],
Suppose that $r_{0,1} = r_{0,2} = \dots = r_{0,K} = 1$, which corresponds to the case of a single base station at unit distance from the test mobile. Then it is known from [31] for Poisson-distributed mobiles, and from [17] for the general mobile distribution assumed here, that
Comparing with (77), we conclude that when it is not necessarily the case that $r_{0,1} = r_{0,2} = \dots = r_{0,K} = 1$, we have
The lemma is proved from the definition of $\bar{P}$.
For some 0 < δ < 1, define the following matrices
Note that $\mathbf{B} = \mathbf{B}_{a1} + \mathbf{B}_{b1} + \mathbf{B}_{a2} + \mathbf{B}_{b2} + \mathbf{B}_{a3} + \mathbf{B}_{b3}$. The following lemmas will be used to show that these matrices have a diminishing effect on $F_K(\tau)$ as K → ∞.
Lemma 10: The following holds i.p. for i = 1, 2, 3:
Proof: Please see Appendix I.
Lemma 11: The following holds i.p. for i = 1, 2, 3:
Proof: Since $\mathbf{B}_{b1}$ is a sum of $n^{1-\delta}$ rank-1 matrices, its rank is at most $n^{1-\delta}$. Hence,
Since δ > 0 and n = cK, the upper bound above converges to zero as K → ∞, completing the proof.
Let $\tilde{\mathbf{R}} = \tilde{\mathbf{R}}_{\mathrm{bs}} + \mathbf{R}_a + \mathbf{R}_b$, where
From Lemmas 10 and 11, $\lim_{K\to\infty} \frac{1}{K}\|\mathbf{R}_a\|_F^2 = 0$ and $\lim_{K\to\infty} \frac{1}{K}\mathrm{Rank}(\mathbf{R}_b) = 0$ i.p., respectively. Hence, from Exercises 2.4.3 and 2.4.4 of [22], the e.d.f. of the eigenvalues of $\tilde{\mathbf{R}}$ converges i.p. to the e.d.f. of the eigenvalues of $\tilde{\mathbf{R}}_{\mathrm{bs}}$, completing the proof of the first part. The second part follows from Theorem 25.8 of [32] and Lemma 5.
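Two standard eigenvalue-perturbation facts underlie this step. The sketch below checks them on random matrices, which are generic stand-ins rather than $\tilde{\mathbf{R}}_{\mathrm{bs}}$, $\mathbf{R}_a$ and $\mathbf{R}_b$ themselves.
- A perturbation of rank r moves the e.d.f. of a K × K Hermitian matrix by at most r/K in sup norm.
- By the Hoffman-Wielandt inequality, the sum of squared shifts of the sorted eigenvalues is at most the squared Frobenius norm of the perturbation. The normalized shift therefore vanishes when $\frac{1}{K}\|\mathbf{R}_a\|_F^2 \to 0$.

With the rank bound of Lemma 11, $\frac{1}{K}\mathrm{Rank}(\mathbf{B}_{b1}) \le n^{1-\delta}/K = c^{1-\delta}K^{-\delta} \to 0$. The exact formulation of these facts in [22] may differ.

```python
import numpy as np

def edf_sup_distance(A, B):
    """sup over tau of |F_A(tau) - F_B(tau)| for the eigenvalue e.d.f.s of two K x K Hermitian matrices."""
    la, lb = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)  # ascending order
    grid = np.concatenate((la, lb))                        # both e.d.f.s only jump at these points
    Fa = np.searchsorted(la, grid, side="right") / la.size
    Fb = np.searchsorted(lb, grid, side="right") / lb.size
    return float(np.max(np.abs(Fa - Fb)))

rng = np.random.default_rng(4)
K, rank = 200, 5
X = rng.standard_normal((K, 3 * K))
A = X @ X.T / (3 * K)                 # generic stand-in for R_bs (real symmetric for simplicity)
Y = rng.standard_normal((K, rank))
Rb = Y @ Y.T                          # low-rank perturbation
E = 1e-2 * rng.standard_normal((K, K))
Ra = (E + E.T) / 2                    # perturbation with small Frobenius norm

# Rank inequality: sup_tau |F_{A+Rb} - F_A| <= rank(Rb) / K (tolerance for floating-point rounding).
print(edf_sup_distance(A, A + Rb) <= rank / K + 1e-12)  # True
# Hoffman-Wielandt: sum_k (lambda_k(A + Ra) - lambda_k(A))^2 <= ||Ra||_F^2 (sorted eigenvalues).
shift = np.linalg.eigvalsh(A + Ra) - np.linalg.eigvalsh(A)
print(np.sum(shift ** 2) <= np.sum(Ra ** 2) + 1e-12)    # True
```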
With this notation, we have
Lemma 12: With expectation w.r.t. the fading variables:
Proof: From (1), we have
Additionally, since the antennas serving the test user are contained within a bounded area, for $\ell = 1, 2, \dots, K$:
Taking the variance of (92) w.r.t. $g_{i,\ell}$ and $g_{i,m}$, and substituting (93) and (94)