Comments on “On Favorable Propagation in Massive MIMO Systems and Different Antenna Configurations”

It is shown that the condition of <xref ref-type="theorem" rid="theorem1">Theorem 1</xref> in (X. Wu, N. C. Beaulieu, D. Liu, “On favorable propagation in massive MIMO systems and different antenna configurations,” <italic>IEEE Access</italic>, vol. 5, pp. 5578–5593, May 2017) never holds in practice and that <xref ref-type="theorem" rid="theorem2">Theorem 2</xref> is incorrect under the stated condition. Extra assumptions and/or modifications, provided below, are needed to make the conclusions of <xref ref-type="theorem" rid="theorem1">Theorem 1</xref> and <xref ref-type="theorem" rid="theorem2">2</xref> valid.


I. INTRODUCTION
In this note, we show that the key condition of Theorem 1 in [1] never holds in practice and that Theorem 2 in [1] is incorrect under the stated condition. This is done by identifying key gaps in their proofs and via a counter-example. We then show that the conclusions of Theorems 1 and 2 do hold if additional assumptions and modifications are introduced. While these additional assumptions are restrictive, they are justified by the physics of radio wave propagation and hold in many practical scenarios.
Unless stated otherwise, we use the same assumptions and notation as in [1], and assume that all expectation terms exist and are bounded; all expectations are taken with respect to the joint distribution of all random variables inside the expectation operator.

II. THEOREM 1 IN [1]
For completeness, we state below Theorem 1 in [1], where $M$ and $L$ are the numbers of BS antennas and multipath components, $\mathbf{g}_i$ is the channel vector of (single-antenna) user $i$, and $\alpha_{ri}$ is the propagation gain of multipath component $r$ for user $i$, which is also the $(r,i)$-th entry of matrix $\mathbf{V}$ (see [1] for the channel model and the definitions of other related symbols).
Theorem 1 ([1]): Let $\mathbf{W} \in \mathbb{C}^{M \times L}$ be the steering matrix that consists of $L$ column vectors …

The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.

… normalization as in, e.g., (23), (28), and (35) in [1]. Hence, condition (2) (which is [1, (15)]) is never satisfied for $r = s$ under this normalization, since $E\{w_{mr}^* w_{mr}\} = E\{|w_{mr}|^2\} = 1$ for any $M$, no matter how large (note that the $r = s$ terms are present in $E\{z\} = E\{\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k\}$, see [1, (12), (13)], and hence are essential in the proof of Theorem 1 in [1]). Thus, the conclusion of Theorem 1 never holds under this normalization. Furthermore, [1, (26), (31), (38)] also fail for $r = s$, invalidating the respective conclusions.
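The failure of the condition for $r = s$ can be checked numerically. The sketch below assumes, purely for illustration, a half-wavelength ULA with unit-modulus entries $w_{mr} = e^{-j\pi m \sin\phi_r}$, $m = 0, \dots, M-1$, and independent angles of arrival $\phi_r$ uniform on $(-\pi/2, \pi/2)$; the function name `steering_correlation` and this steering model are hypothetical and are not taken from [1]. It estimates $\frac{1}{M}\sum_m E\{w_{mr}^* w_{ms}\}$: the $r = s$ value stays at 1 for every $M$, while the $r \neq s$ value shrinks as $M$ grows.

```python
import numpy as np

def steering_correlation(M, same, trials=5_000, seed=0):
    """Sample estimate of (1/M) * sum_m E{conj(w_mr) * w_ms}.

    Hypothetical model (an assumption, not taken from [1]): ULA entries
    w_mr = exp(-1j*pi*m*sin(phi_r)), m = 0..M-1, half-wavelength spacing,
    with independent angles of arrival phi_r ~ U(-pi/2, pi/2).
    same=True estimates the r = s term, same=False an r != s term.
    """
    rng = np.random.default_rng(seed)
    m = np.arange(M)
    phi_r = rng.uniform(-np.pi / 2, np.pi / 2, size=trials)
    phi_s = phi_r if same else rng.uniform(-np.pi / 2, np.pi / 2, size=trials)
    w_r = np.exp(-1j * np.pi * np.outer(np.sin(phi_r), m))  # shape (trials, M)
    w_s = np.exp(-1j * np.pi * np.outer(np.sin(phi_s), m))
    # Averaging over both trials and antennas m gives (1/M) * sum_m E{...}.
    return complex(np.mean(np.conj(w_r) * w_s))

for M in (8, 128):
    print(M, abs(steering_correlation(M, same=True)), abs(steering_correlation(M, same=False)))
```

Because $|w_{mr}| = 1$ entry by entry, the $r = s$ estimate equals 1 up to rounding regardless of $M$, which is exactly why a condition of the form $E\{w_{mr}^* w_{ms}\} \to 0$ cannot hold for $r = s$.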
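It should be noted that the normalization in (5) above, as well as in [1, (23), (28), and (35)], is not arbitrary but essential to make the definition of orthogonality, as in, e.g., (3) or [1, (16), (19)], consistent. Indeed, let $\mathbf{a}, \mathbf{b} \in \mathbb{R}^M$ be two real-valued vectors. The angle $\theta$ between these vectors can be found from
$$\cos\theta = \frac{\mathbf{a}^T\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{1}{M}\mathbf{a}^T\mathbf{b},$$
where the last equality holds under proper normalization, $\|\mathbf{a}\|^2 = \|\mathbf{b}\|^2 = M$. Since orthogonality means $\cos\theta = 0$, this implies the following definition of orthogonality,
$$\frac{1}{M}\mathbf{a}^T\mathbf{b} = 0 \quad \left(\text{or } \frac{1}{M}\mathbf{a}^T\mathbf{b} \to 0 \text{ as } M \to \infty\right),$$
and explains the appearance of $1/M$ in this definition, which is essential when $M \to \infty$: without it, $\mathbf{a}^T\mathbf{b}$ need not vanish even when the angle between the vectors approaches $90^\circ$.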
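The role of the $1/M$ factor can be seen with random vectors whose entries are $\pm 1$, an illustrative choice under which $\|\mathbf{a}\|^2 = \|\mathbf{b}\|^2 = M$ holds exactly, so $\cos\theta$ and $\frac{1}{M}\mathbf{a}^T\mathbf{b}$ coincide. The helper names `cos_angle` and `mean_inner_products` are hypothetical. As $M$ grows, the average of $|\mathbf{a}^T\mathbf{b}|$ increases (roughly like $\sqrt{M}$), while the average of $|\frac{1}{M}\mathbf{a}^T\mathbf{b}|$ decreases, i.e., the vectors become asymptotically orthogonal only in the normalized sense.

```python
import numpy as np

def cos_angle(a, b):
    """Cosine of the angle between two real-valued vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_inner_products(M, trials=1_000, seed=0):
    """Average |a^T b| and |a^T b| / M over random +/-1 vectors of length M.

    With +/-1 entries (an illustrative choice), ||a||^2 = ||b||^2 = M exactly,
    so (1/M) a^T b coincides with cos(theta).
    """
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], size=(trials, M))
    b = rng.choice([-1.0, 1.0], size=(trials, M))
    ip = np.sum(a * b, axis=1)  # a^T b for each trial
    return float(np.mean(np.abs(ip))), float(np.mean(np.abs(ip) / M))

rng = np.random.default_rng(1)
a = rng.choice([-1.0, 1.0], size=64)
b = rng.choice([-1.0, 1.0], size=64)
print(cos_angle(a, b), a @ b / 64)  # the two values agree

for M in (16, 256, 1024):
    print(M, mean_inner_products(M))
```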
To fix this gap in Theorem 1, an extra assumption and a modification are needed, as indicated below.
Proposition 1: Consider the setting of Theorem 1 in [1] and, in addition, assume that the propagation channel vectors of different users are orthogonal to each other “on average”, i.e.,
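Then, the condition $E\{w_{mr}^* w_{ms}\} \to 0$ for $r \neq s$ (10) is sufficient to guarantee FP “on average”, i.e., $E\{\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k\} \to 0$, $i \neq k$ (11). We stress again that (11) does not imply $\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k \to 0$, i.e., FP according to the standard definition, without some additional assumptions.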
Compared to [1, Theorem 1], the orthogonality condition in (10) is imposed for $r \neq s$ only, and the extra condition in (9) is added. The latter can be motivated by an independent scattering environment around each user (see Fig. 1), which induces independent small-scale fading around each user, so that $\alpha_{ri}$ and $\alpha_{rk}$ are independent for $i \neq k$. This is a standard assumption in the popular Kronecker correlation model, where geographically separated antennas experience independent scattering and fading [3], [5], [10], as well as in the keyhole channel model [4], [5], [10], [11], [12]. Finally, we point out that (11) holds without expectation (i.e., FP in the standard sense, where the convergence is in probability) if (9) and (10) hold without expectation as well.
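Why the independence behind (9) matters can be illustrated with a small Monte Carlo sketch. In both cases below, $\mathbf{W}$ is drawn independently of $\mathbf{V}$ using the same hypothetical ULA model as before ($w_{mr} = e^{-j\pi m \sin\phi_r}$, $\phi_r$ uniform), and the coefficients are $\alpha_{ri} = \pm 1$ with equal probability; all of this is an illustrative assumption, and `mean_z` is a hypothetical helper. When the two users' coefficients are independent, $E\{\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k\}$ is close to 0. When both users see identical coefficients ($\alpha_{ri} = \alpha_{rk}$, a degenerate case in which that independence fails), the $r = s$ terms survive and the mean stays at $L$, whatever the $r \neq s$ steering correlations do.

```python
import numpy as np

def mean_z(M=32, L=4, trials=5_000, shared=False, seed=0):
    """Sample mean of z = (1/M) g_i^H g_k for two users, with G = W V.

    Hypothetical setup (assumed for illustration only):
      * ULA steering entries w_mr = exp(-1j*pi*m*sin(phi_r)), m = 0..M-1,
        with phi_r ~ U(-pi/2, pi/2), drawn independently of V;
      * propagation coefficients alpha_ri = +/-1 with equal probability.
    shared=True gives both users the same coefficients (alpha_ri = alpha_rk),
    shared=False draws them independently for the two users.
    """
    rng = np.random.default_rng(seed)
    m = np.arange(M)
    phi = rng.uniform(-np.pi / 2, np.pi / 2, size=(trials, L))
    W = np.exp(-1j * np.pi * m[None, :, None] * np.sin(phi)[:, None, :])  # (trials, M, L)
    alpha_i = rng.choice([-1.0, 1.0], size=(trials, L))
    alpha_k = alpha_i if shared else rng.choice([-1.0, 1.0], size=(trials, L))
    g_i = np.einsum("tml,tl->tm", W, alpha_i)  # g_i = W v_i for each trial
    g_k = np.einsum("tml,tl->tm", W, alpha_k)
    z = np.sum(np.conj(g_i) * g_k, axis=1) / M
    return complex(np.mean(z))

print(mean_z(shared=False))  # close to 0
print(mean_z(shared=True))   # close to L = 4
```

In the shared case, $E\{z\} = \frac{1}{M}\sum_m\sum_r E\{|w_{mr}|^2\} = L$ exactly, because $E\{\alpha_{ri}\alpha_{si}\} = 0$ for $r \neq s$ and $|w_{mr}| = 1$.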

III. THEOREM 2 IN [1]
For completeness, we state below Theorem 2 in [1]. … $E\{w_{mr}^* w_{ms}\} \to 0$ (17) is sufficient to guarantee that the FP condition is satisfied, that is, …
To demonstrate that this theorem is incorrect, we show that (19) above (i.e., [1, (21)]) and its use in the proof of Theorem 2 are incorrect in several different ways.
1. Observe that both sides of (19) are, in general, complex numbers. However, complex numbers cannot be ordered [2] (i.e., one cannot say that one complex number is larger or smaller than another in the same sense as for real numbers), unless their difference is a real number, which is not the case here in general. Hence, this inequality cannot be correct in the general case.
2. Even if all terms of (19) are real (or, more generally, the difference of the two sides is real), (17) does not imply (18) via (19), since $E\{z\}$ can be negative (being a scalar product of two vectors), so that $E\{z\} = -1$ is possible in (19) even if its right-hand side is zero, e.g., if (17) …
3. … with any $L \geq 2$, $M \geq 1$ (these assumptions may hold, for example, in rank-deficient or keyhole channels [10], [11], [12]). Further assume that the random variables $\{a_r\}$ are independent of each other (pairwise independence is sufficient), zero-mean, and have bounded support, e.g., $a_r = \pm 1$ with equal probability, so that $C_\alpha = 1$. Straightforward computation gives … yet the right-hand side of (19) is …, clearly violating the inequality (19), even for all-real variables.
4. Finally, note from Section II that the right-hand side of (19) does not necessarily converge to zero: if $r = s$, the respective summation terms add up to $LC_\alpha^2 > 0$ for any $M$. Also note that (17) is the same as (2), which never holds for $r = s$, as explained above.
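Point 1 above rests on the textbook fact that $\mathbb{C}$ admits no order compatible with its arithmetic. As a reminder, the standard argument is sketched below, with $j$ denoting the imaginary unit (to avoid a clash with the user index $i$):

```latex
\begin{aligned}
&\text{Suppose } \succ \text{ orders } \mathbb{C} \text{ so that, for every } x \neq 0,
  \text{ exactly one of } x \succ 0,\ -x \succ 0 \text{ holds,} \\
&\text{and } x \succ 0,\ y \succ 0 \;\Rightarrow\; xy \succ 0. \\
&\text{Since } j \neq 0:\quad j \succ 0 \;\Rightarrow\; j \cdot j = -1 \succ 0,
  \qquad -j \succ 0 \;\Rightarrow\; (-j)(-j) = -1 \succ 0. \\
&\text{Either way } -1 \succ 0, \text{ hence } 1 = (-1)(-1) \succ 0, \\
&\text{so both } 1 \succ 0 \text{ and } -1 \succ 0, \text{ contradicting the first property.}
\end{aligned}
```

Consequently, an inequality between two complex quantities is meaningful only when their difference is real.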
In view of the above discussion, a question arises whether asymptotic orthogonality “on average” can be ensured when $\mathbf{W}$ and $\mathbf{V}$ are not independent of (i.e., are correlated with) each other. The following proposition answers this question in the affirmative.
Proposition 2: Consider the channel as in [1, (17)], i.e., $\mathbf{G} = \mathbf{W}\mathbf{V}$, and assume that $\alpha_{ri} = a_r b_i$ for all $r, i$, where the random variables $b_i$ are zero-mean, independent of each other, and independent of $a_r$ and $w_{mr}$, for all $m, r, i$. Then, orthogonality “on average” holds:
$$E\left\{\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k\right\} = 0, \quad i \neq k. \tag{25}$$
Proof: Under the stated assumptions, the following holds:
$$E\left\{\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k\right\} = \frac{1}{M}\sum_{m=1}^{M}\sum_{r=1}^{L}\sum_{s=1}^{L} E\{w_{mr}^* w_{ms}\, a_r^* a_s\, b_i^* b_k\} = \frac{1}{M}\sum_{m=1}^{M}\sum_{r=1}^{L}\sum_{s=1}^{L} E\{a_r^* a_s w_{mr}^* w_{ms}\}\, E\{b_i^* b_k\} = 0,$$
where the last two equalities are due to the independence of $b_i$ from the other variables and from each other: $E\{b_i^* b_k\} = E\{b_i^*\}E\{b_k\} = 0$ for $i \neq k$.

The setting of Proposition 2 can be slightly broadened by assuming that the $b_i$ are uncorrelated with each other and with $a_r^* a_s w_{mr}^* w_{ms}$, instead of being independent. Note that the orthogonality in (25) holds for any $M$, not just asymptotically, and that $\alpha_{ri}$ and $w_{mr}$ are not necessarily independent of (or uncorrelated with) each other, since $a_r$ and $w_{mr}$ are not. This is due to the joint dependence of $a_r$ and $w_{mr}$ on the angle of arrival of the incoming multipath component $r$. Note also that no assumption of bounded support is needed here, only the existence and boundedness of the respective expectation terms.

The assumed factorization of the propagation coefficients, $\alpha_{ri} = a_r b_i$, is justified by the physics of radio wave propagation and is reminiscent of the Kronecker MIMO channel correlation model, where the impacts of the scattering environments around the transmitter (user) and the receiver (base station) are separated and independent of each other; this was confirmed experimentally in certain environments [3] and is a popular model for MIMO channels [5], [10]. This factorization is also observed for rank-deficient or keyhole channels, see, e.g., [4, Ch. 7], [5, Ch. 3], [12]; a detailed analysis of keyhole channels, including the factorization property, can be found in [11]. In our case, $b_i$ models the scattering around each user, which is independent across users due to their separate locations, and $a_r$ models the scattering around the base station, which is independent of that of the users for the same reason.
This is illustrated in Fig. 1.
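Proposition 2 can be sanity-checked by simulation even for a small array. The sketch below assumes, for illustration only, the same hypothetical ULA model ($w_{mr} = e^{-j\pi m \sin\phi_r}$, $\phi_r$ uniform), sets $a_r = \sin\phi_r$ so that $a_r$ and $w_{mr}$ depend on the same angle of arrival, and draws $b_i, b_k$ as independent zero-mean circular complex Gaussians. These modeling choices and the helper `prop2_check` are hypothetical, not taken from [1]. The sample mean of $\frac{1}{M}\mathbf{g}_i^H\mathbf{g}_k$ comes out close to 0 with only $M = 4$ antennas, while the sample mean of $a_r w_{mr}$ is clearly nonzero, confirming that $\mathbf{W}$ and $\mathbf{V}$ are correlated here.

```python
import numpy as np

def prop2_check(M=4, L=3, trials=100_000, seed=0):
    """Monte Carlo check of (25) under a hypothetical factorized channel.

    Assumed for illustration only: ULA entries w_mr = exp(-1j*pi*m*sin(phi_r)),
    m = 0..M-1, phi_r ~ U(-pi/2, pi/2); a_r = sin(phi_r), so a_r and w_mr depend
    on the same angle; b_i, b_k i.i.d. zero-mean circular complex Gaussian.
    Returns the sample means of (1/M) g_i^H g_k and of a_r * w_mr (r = 0, m = 1).
    """
    rng = np.random.default_rng(seed)
    m = np.arange(M)
    phi = rng.uniform(-np.pi / 2, np.pi / 2, size=(trials, L))
    W = np.exp(-1j * np.pi * m[None, :, None] * np.sin(phi)[:, None, :])  # (trials, M, L)
    a = np.sin(phi)  # (trials, L), correlated with W through phi
    b = (rng.standard_normal((trials, 2)) + 1j * rng.standard_normal((trials, 2))) / np.sqrt(2)
    Wa = np.einsum("tml,tl->tm", W, a)  # W a, common to both users
    g_i = Wa * b[:, [0]]  # alpha_ri = a_r b_i  =>  g_i = (W a) b_i
    g_k = Wa * b[:, [1]]
    z = np.sum(np.conj(g_i) * g_k, axis=1) / M
    return complex(np.mean(z)), complex(np.mean(a[:, 0] * W[:, 1, 0]))

ez, corr = prop2_check()
print(ez)    # close to 0, even though M = 4 is small
print(corr)  # clearly nonzero: a_r and w_mr are correlated
```

Under this factorization, $\mathbf{g}_i^H\mathbf{g}_k = b_i^* b_k \|\mathbf{W}\mathbf{a}\|^2$, so the zero mean comes entirely from $E\{b_i^* b_k\} = 0$, independently of how $\mathbf{a}$ and $\mathbf{W}$ are related.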