Toward Extra Large-Scale MIMO: New Channel Properties and Low-Cost Designs

Extra large-scale multiple-input–multiple-output (MIMO) has been recognized as one of the potential development directions of massive MIMO. By employing even more antennas than massive MIMO in the fifth-generation era, extra large-scale MIMO can further exploit the spatial domain resources and enable ultra-high data rates, low-latency communications, and emerging applications such as sensing and localization in sixth-generation mobile communication systems. However, as the size of the antenna array increases and the distance between a user and the array decreases, new channel properties that did not manifest in conventional massive MIMO emerge. Most importantly, existing research strategies pertaining to massive MIMO cannot be directly applied or simply extended to fit the extra large-scale MIMO case. Moreover, increasing the number of antennas will inevitably increase the total cost, which comprises not only the high hardware cost but also the burden of vast processing and computations as well as the substantial training overhead. In this article, we survey the state of the art on the new channel properties of, and low-cost designs for, extra large-scale MIMO systems. In particular, we pursue a mathematical analysis to explain why the new features appear and illustrate how they affect the system model. Furthermore, we summarize and compare the low-cost designs from various perspectives and give our suggestions from a practical deployment point of view.


I. INTRODUCTION
MASSIVE multiple-input–multiple-output (MIMO), also known as large-scale MIMO, has been a successful enabler to boost the data transmission rate in mobile communication systems in the fifth-generation (5G) era [1], and will keep serving as an important physical layer technology in future mobile communication systems. By employing tens or hundreds of antennas at the base station (BS), massive MIMO produces high spatial resolution and supports multiuser transmission on the same time-frequency resources. As the number of BS antennas grows very large, the multiuser interference and the uncorrelated noise diminish [2], providing favorable conditions for multiuser transmission. In order to further harness the gain obtained by using more antennas, the concept of extra large-scale MIMO has been proposed for sixth-generation (6G) mobile communications [3], [4], [5]. Extra large-scale MIMO employs hundreds or even thousands of antennas at the BS to simultaneously provide service to a certain set of users, and is an augmented version of massive MIMO.
In practical implementations, there are two ways to deploy an extra large number of antennas, namely, the centralized type and the distributed type. The centralized type is a direct extension of 5G large-aperture arrays, where all the antennas are uniformly deployed in a co-located fashion while maintaining the half-wavelength distance between two adjacent antennas, forming an extra large-aperture array [6]. Alternatively, the antennas can be confined within a predetermined area, resulting in the concept of holographic MIMO [7]. If the practical environment does not allow the deployment of such a large array, then we can distribute the antennas across multiple sites, corresponding to the distributed type [8]. Each site is equipped with a small number of antennas. These sites jointly serve the same set of users. A typical example is cell-free massive MIMO [9]. In this article, we focus on the centralized type with an extra large-aperture array.
Compared with distributed systems, synchronization among antennas is much easier in centralized extra large-scale MIMO systems. Moreover, extra large-aperture arrays can cover the external walls of buildings in populated city centres or be employed at stadiums/airports to provide wireless communication services to a plethora of users. Therefore, in a centralized extra large-scale MIMO system, high beamforming gains can be harvested. Narrow beams with very low sidelobes can be generated by the extra large-aperture array and flexibly steered toward the desired direction. Several orthogonal beams can be generated simultaneously, yielding an increase of the spatial-division multiplexing gain. Apart from satisfying the traditional requirements of high data rates, employing an extra large-aperture array enables new emerging applications. For instance, in indoor environments, such as in a factory, the autonomous driving of an electric car can be achieved by leveraging the high spatial resolution provided by such an array. There have been studies on extra large-aperture array-enabled new applications, including sensing and localization [10], [11], physical-layer security [12], [13], wireless energy transfer [14], [15], etc., as illustrated in Fig. 1.

Fig. 1. Extra large-aperture arrays provide opportunities for high-rate data transmission, wireless energy transfer, physical-layer security, sensing, and localization.
Historically, the study of an emerging wireless architecture begins with the investigation of the propagation channel. Channel modeling of an extra large-aperture array system does not simply mean expanding the array size in a traditional MIMO channel model. With the increase of the array aperture, new channel properties emerge. First, the lower bound of the far field, known as the Rayleigh distance, grows with the array aperture. Considering that extra large-aperture arrays will generally be deployed in crowded urban or indoor factory environments, users will be close to the array. Different from traditional MIMO systems, where users are in the far field and signals experience plane wave propagation, in an extra large-aperture array system, there is a high probability that signals will experience spherical wave propagation. Second, for users who are very close to the array, the path loss between the user and the antennas fluctuates significantly across the array. If obstacles exist in the channel, then the channel power will be concentrated on a portion of the array elements, known as the visibility region (VR). The spherical wave propagation and the existence of the VR reflect the spatial nonstationarity of the channel. On the one hand, the new channel properties require new channel models for extra large-aperture array systems. On the other hand, these properties facilitate the above-mentioned new applications. Therefore, a deep and comprehensive study of the new channel properties is indispensable.
When translating a theoretical architecture into a commercial technology, the implementation and deployment costs are of pivotal importance. The employment of an extra large-aperture array entails the challenges of high hardware cost, high processing and computational complexity, and high training overhead. Regarding the hardware cost, a fully digital structure, where each active antenna is connected with a unique radio frequency (RF) chain, is unacceptably expensive when the number of active antennas grows large. Inspired by the low-cost designs in 5G millimeter wave systems, active antenna arrays with fewer RF chains can be adopted. Moreover, with the development of materials, extra large-aperture arrays can take the form of reconfigurable intelligent surfaces (RISs), which have the advantages of low cost and low power consumption. Therefore, the problem of high hardware cost can be tackled via different approaches.
In traditional MIMO systems with a limited number of antennas, signal processing and computations are centralized at a common module, and the complexity is moderate. However, in an extra large-aperture array system, completely centralized processing and computations result in high complexity and are time-consuming. In order to reduce the complexity and the processing latency, two approaches can be followed. One is to directly reduce the complexity of an algorithm in the centralized module. The other is to distribute the processing and computations to multiple local modules, thereby easing the burden on the centralized module. The distributed approach is more attractive, but the information exchange between the centralized module and the local modules affects the overall complexity and needs to be carefully assessed.
In a mobile communication system, an efficient transceiver design heavily depends on the precise knowledge of the wireless channel. The training overhead required to acquire the channel state information (CSI) usually increases with the number of antennas. Then, when an extra large-aperture array is deployed, the training overhead becomes substantial, which is evidently prohibitive for practical systems. Fortunately, the extra-large-dimensional channel shows directionality and sparsity in multiple domains. Traditional sparse channel estimation methods, such as compressed sensing, can be applied to reduce the training overhead. The directionality of a spherical wave channel further supports localization and sensing. Further, the existence of the VR enables overhead reduction among multiple users. The feasibility of low-overhead communication and sensing, together with low-cost architectures and low-complexity processing and computations, guarantees the practical implementation of an extra large-aperture array.
This article presents a comprehensive survey of the new channel properties and the low-cost designs of extra large-scale MIMO systems. Section II investigates the spherical wave propagation by analyzing the channel responses on a point, an antenna, and an array step by step, and provides guidance on the selection of channel models in different fields/regions. With the analytical results on spherical waves, Section III explains why the VR appears and investigates the existing categories of VR and their definitions and models. The spatial nonstationarity is verified theoretically and further taken into account in the subsequent low-cost designs in Sections IV-VI. The low-cost architectures with active antenna arrays and RISs are illustrated in Section IV. A comparison of the hardware cost, implementation and synchronization difficulties, and scalability of different architectures is provided. Then, the low-complexity processing and computation designs are introduced in Section V. Existing methods to reduce the complexity in centralized and distributed processing structures are summarized. Finally, the low-overhead communication and sensing based on the directionality and channel sparsity in the transformation domains are studied in Section VI.
Notations: We use normal-font letters, boldface lowercase letters, and boldface uppercase letters for scalars, vectors, and matrices, respectively. The transpose, conjugate-transpose, and pseudoinverse are indicated by the superscripts $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^\dagger$, respectively; $|\cdot|$ represents the absolute value of a scalar or the size of a set; $\|\cdot\|$ represents the norm of a vector or a matrix; and $\mathbb{E}\{\cdot\}$ denotes expectation. For a matrix, $[\cdot]_{i,:}$, $[\cdot]_{:,j}$, and $[\cdot]_{i,j}$ return its $i$th row, its $j$th column, and its $(i, j)$th entry, respectively. The Hadamard and Kronecker products are denoted by $\odot$ and $\otimes$, respectively.

II. SPHERICAL WAVE
In a traditional MIMO system, the aperture of the BS antenna array is usually negligible when compared with the distance between it and a user served by the BS. The entire array can be regarded as one point. Thus, a signal sent from the user experiences an equal path loss and has a common angle-of-arrival (AoA) when arriving at different antennas of the BS array. Experiencing equal path loss and having a common AoA are two key features of a plane wave, which is typically modeled in the far-field region. However, when the aperture of the BS array grows large, the array cannot be regarded as one point any more. Then, spherical waves must be considered and the plane wave model no longer applies. In this section, we present a comprehensive study of spherical waves.

A. Channel Response on Point
We start from the modeling of the channel response. In a three-dimensional (3-D) free space, an isotropic point source $\mathbf{s}$ is deployed at the origin of the coordinate system, i.e., $\mathbf{s} = [0, 0, 0]^T$, and radiates electromagnetic (EM) waves in all directions, as illustrated by the blue sphere in Fig. 2. For simplicity, the transmit power of the point source is normalized to 1. An antenna that covers a surface $\mathcal{A}$ with area $A$ is located in the radiative field of $\mathbf{s}$, i.e., $\|\mathbf{p}-\mathbf{s}\| \gg \lambda$ holds for any point $\mathbf{p} = [x_p, y_p, z_p]^T \in \mathcal{A}$, where $\lambda = c/f$ is the wavelength of the EM wave with frequency $f$, while $c$ is the speed of light. In order of increasing complexity, we present three channel response models that have been reported in the literature as follows.
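1) Channel Response Model 1: The distance between the receiving point $\mathbf{p}$ and the source point $\mathbf{s}$ is $\|\mathbf{p}-\mathbf{s}\|$. At this distance, the power of the EM wave spreads uniformly on the sphere with radius $\|\mathbf{p}-\mathbf{s}\|$. Since the area of this sphere is $4\pi\|\mathbf{p}-\mathbf{s}\|^2$, the power on each point of this sphere equals [16]

$$\gamma(\mathbf{p}, \mathbf{s}) = \frac{1}{4\pi\|\mathbf{p}-\mathbf{s}\|^2}. \tag{1}$$

Then, the channel response on point $\mathbf{p}$ can be expressed as follows:

$$h_{\mathrm{CR1}}(\mathbf{p}, \mathbf{s}) = \frac{1}{\sqrt{4\pi}\,\|\mathbf{p}-\mathbf{s}\|}\, e^{-j\frac{2\pi}{\lambda}\|\mathbf{p}-\mathbf{s}\|} \tag{2}$$

which is referred to as channel response model 1. Model 1 describes an ideal case where the power on point $\mathbf{p}$ is perfectly and completely harvested. It requires that the normal direction of $\mathbf{p}$ with respect to surface $\mathcal{A}$, denoted as $\mathbf{v}_{\mathcal{A}}(\mathbf{p}) \in \mathbb{R}^{3\times 1}$ and satisfying $\|\mathbf{v}_{\mathcal{A}}(\mathbf{p})\| = 1$, exactly matches the radiation direction of the EM wave from source $\mathbf{s}$ [17]. In the example shown in Fig. 2(b), the surface $\mathcal{A}$ perfectly covers the sphere with radius $\|\mathbf{p}-\mathbf{s}\|$. Then, for any point $\mathbf{p} \in \mathcal{A}$, the normal line of $\mathbf{p}$ passes through the source $\mathbf{s}$, and channel response model 1 is applicable.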
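As a minimal numerical sketch of channel response model 1, the Python snippet below assumes the free-space form implied by the text: unit transmit power spread uniformly over a sphere of area 4π‖p − s‖², so that the amplitude is 1/(√(4π)‖p − s‖), together with a propagation phase of −2π‖p − s‖/λ (the sign convention is an assumption). The function name `channel_response_model1` and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def channel_response_model1(p, s, wavelength):
    """Free-space response at point p from an isotropic unit-power source at s.

    Assumed form: amplitude 1 / (sqrt(4*pi) * r) and phase -2*pi*r / wavelength,
    where r is the Euclidean distance between p and s.
    """
    r = math.dist(p, s)
    if r <= 0.0:
        raise ValueError("p must differ from s")
    amplitude = 1.0 / (math.sqrt(4.0 * math.pi) * r)
    phase = -2.0 * math.pi * r / wavelength
    return amplitude * cmath.exp(1j * phase)


source = (0.0, 0.0, 0.0)
wavelength = 0.01  # hypothetical 1 cm wavelength
h_near = channel_response_model1((0.0, 0.0, 5.0), source, wavelength)
h_far = channel_response_model1((0.0, 0.0, 10.0), source, wavelength)

# Doubling the distance should quarter the received power (inverse-square law).
power_ratio = abs(h_near) ** 2 / abs(h_far) ** 2
```

The squared magnitude of the returned value is the per-point power on the sphere, so the sketch can be used to check the inverse-square behavior that the model is built on.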
2) Channel Response Model 2: In practice, patch antennas are widely utilized in mobile communication systems. Under this condition, the surface A of a patch antenna is a square. For a certain point p ∈ A, the normal direction does not always match the EM wave radiation direction. Then, the effective received power is a proportion of γ (p, s), and the proportionality factor is [18], [19], [20] satisfying 0 ≤ F(p, s) ≤ 1. The expression of F(p, s) in (3) is a typical form of the antenna pattern.
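The channel response on point $\mathbf{p}$ can be derived as follows:

$$h_{\mathrm{CR2}}(\mathbf{p}, \mathbf{s}) = \sqrt{F(\mathbf{p}, \mathbf{s})\,\gamma(\mathbf{p}, \mathbf{s})}\; e^{-j\frac{2\pi}{\lambda}\|\mathbf{p}-\mathbf{s}\|} \tag{4}$$

which is referred to as channel response model 2. We see that when $F(\mathbf{p}, \mathbf{s}) = 1$ holds, model 2 is equivalent to model 1.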

3) Channel Response Model 3:
Papers [17], [21] considered the current density of the radiative EM waves from the source $\mathbf{s}$, which is written as follows: where $\mathbf{u}_x = [1, 0, 0]^T$, $\mathbf{u}_y = [0, 1, 0]^T$, and $\mathbf{u}_z = [0, 0, 1]^T$ are the unit vectors along the x, y, and z directions, respectively, while $J_x(\mathbf{s})$, $J_y(\mathbf{s})$, and $J_z(\mathbf{s})$ represent the current density in the x, y, and z polarizations, respectively, satisfying the following normalization: Then, the effective received power at point $\mathbf{p}$ suffers further from the following proportionality factor: As an example, [17] assumed that only the y direction is excited in $\mathbf{J}(\mathbf{s})$, which means $J_y(\mathbf{s}) = 1$ and $J_x(\mathbf{s}) = J_z(\mathbf{s}) = 0$. Under this condition We see that $\eta(\mathbf{p}, \mathbf{s}) = 1$ happens when $[\mathbf{p}-\mathbf{s}]_2^2 = 0$, i.e., $y_p = 0$.
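With $\eta(\mathbf{p}, \mathbf{s})$, the channel response on point $\mathbf{p}$ is written as follows:

$$h_{\mathrm{CR3}}(\mathbf{p}, \mathbf{s}) = \sqrt{\eta(\mathbf{p}, \mathbf{s})\,F(\mathbf{p}, \mathbf{s})\,\gamma(\mathbf{p}, \mathbf{s})}\; e^{-j\frac{2\pi}{\lambda}\|\mathbf{p}-\mathbf{s}\|} \tag{9}$$

which is referred to as channel response model 3.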

B. Channel of Antenna
By integrating the response across the entire surface $\mathcal{A}$, the channel between the source and the receiver antenna that covers the surface $\mathcal{A}$ is calculated by [17] Here, we provide the following three examples from the literature to illustrate the channel response in different cases.

1) Case 1: In this case, the receiver antenna is isotropic and located at $\mathbf{p} = [0, 0, z_p]^T$. The effective area of an isotropic antenna is [16] Under channel response model 1, the channel between the source and the isotropic receiver antenna is derived as follows: Then, the free space path loss seen by an isotropic receiver antenna at distance $z_p$ can be expressed as follows: which is in accordance with the model in [22].
2) Case 2: Case 2 illustrates a patch antenna whose surface $\mathcal{A}$ is a square plane. For any point $\mathbf{p} \in \mathcal{A}$, the normal direction $\mathbf{v}_{\mathcal{A}}(\mathbf{p})$ is orthogonal to the surface $\mathcal{A}$. The area $A_{\mathrm{pat}}$ satisfies $A_{\mathrm{pat}} \le \lambda^2/4$ because the length and the width of the patch antenna are less than or equal to the antenna spacing $\lambda/2$. Let $\mathcal{A}$ be parallel with the xy plane. Then, we have $\mathbf{v}_{\mathcal{A}}(\mathbf{p}) = \mathbf{u}_z = [0, 0, 1]^T$. In [20], the channel on the patch antenna under channel response model 2 was studied. By applying (10), the channel can be approximated by where $eA_{\mathrm{pat}}$ is the effective area of the antenna [20], $0 < e \le 1$ is the proportionality factor, while $\mathbf{p}_c = [x_c, y_c, z_p]^T$ is the center point of $\mathcal{A}$. Notably, for an isotropic antenna in case 1, its area $A_{\mathrm{iso}}$ in (11) is its effective area. In [19], the proportionality factor $e$ was not considered; that is to say, $A_{\mathrm{pat}}$ was regarded as the effective area of the antenna. By applying (4), we obtain Then, the channel between the source and a patch antenna is which is equivalent to $h_{\mathrm{CR1}}(\mathbf{p}, \mathbf{s})$ in (2). Notably, $F_{\mathrm{pat}}(\mathbf{p}, \mathbf{s}) \le 1$, and the equality holds only for $\mathbf{p} = \mathbf{p}_c$. Furthermore, when which is exactly $|h_{\mathcal{A},\mathrm{case\,1}}|^2$.
3) Case 3: Case 3 studies a more complicated modeling of the channel on a patch antenna under channel response model 3, which was considered in [17] and [21]. The patch antenna with area $A_{\mathrm{pat}}$ in case 2 is also considered; however, the proportionality factor $e$ related to the effective area is not introduced in case 3. The center point is $\mathbf{p}_c = [0, 0, z_p]^T$. Then, for any point $\mathbf{p} \in \mathcal{A}$, its x and y coordinates satisfy The current density is $\mathbf{J}(\mathbf{s}) = [0, 1, 0]^T$. Thus, (8) holds, i.e., By applying (16) and (21), we have Given (20) and (22), according to (10), the channel between source $\mathbf{s}$ and the patch antenna is more difficult to derive in case 3 than in case 2. Therefore, [17] provided an upper bound of the channel gain as follows: and If $A_{\mathrm{pat}} = \lambda^2/4$, then Under this condition (27) and this upper bound is $\pi$ times $|h_{\mathcal{A},\mathrm{case\,2}}|^2$ in (19) because the proportionality factor $e$ is not considered in case 3. Notably, recalling (21), we have $\eta(\mathbf{p}, \mathbf{s}) < 1$, and $|h_{\mathrm{CR3}}(\mathbf{p}, \mathbf{s})| \le |h_{\mathrm{CR2}}(\mathbf{p}, \mathbf{s})|$ holds for arbitrary $\mathbf{p} \in \mathcal{A}$. Therefore, the upper bound in (27) is not tight.

C. Field Partition of Antenna
According to (2), (4), and (9), the channel response varies at different points on the surface spanned by an antenna. The variation of the channel response across the surface differs when the antenna is at different locations with respect to the source point $\mathbf{s}$. If the antenna is close to $\mathbf{s}$, then the channel response variation is significant across the surface. If the antenna is far from $\mathbf{s}$, then $\mathcal{A}$ can be viewed as a point from the perspective of $\mathbf{s}$ and the channel response variation is negligible. Based on the magnitude of the variation of both the amplitude and the phase, the entire field of the source $\mathbf{s}$ can be divided into three fields/regions [21], [23].
1) Near field, in which both the amplitude and the phase variations of the channel response are nonnegligible across the surface.
2) Fresnel region, in which the amplitude variation of the channel response is negligible but the phase variation of the channel response is nonnegligible across the surface.
3) Fraunhofer region, also known as far field, in which both the amplitude and the phase variations of the channel response are negligible across the surface.

Some research works have considered a two-region partition by focusing only on the phase variation. In [24] and [25], the two regions are the Fresnel and the Fraunhofer regions, where the phase of the channel response is dependent on and independent of the distance between transmitter and receiver, respectively. Another two-region partition can be found in [26], [27], [28], and [29], where the two regions were named the near and far fields, respectively. In the near field, a spherical wavefront is experienced, whilst in the far field, the wavefront can be approximated as a plane.
1) Rayleigh/Fraunhofer Distance: The Rayleigh or Fraunhofer distance is the boundary between the Fresnel and the Fraunhofer regions or that between the near and the far field [21], [26], [27]. It is defined by the maximum phase variation of the channel response. The maximum phase variation cannot exceed $\pi/8$ [23] in the Fraunhofer region or far field; otherwise, the receiver is in the Fresnel region or near field of the source. From (2), (4), and (9), we see that at point $\mathbf{p}$, and regardless of the channel response model, the phase of the channel response equals Consider the widely used patch antenna in cases 2 and 3 as the receiver, whose surface is parallel with the xy plane and whose center $\mathbf{p}_c$ is on the z-axis. Then, the maximum phase variation can be computed by comparing the channel responses at the center and one vertex of the surface, respectively, which can be further rewritten as follows: Given that we have By applying (30) and (32), we obtain The patch antenna can also be described by its aperture $D_{\mathrm{pat}}$, which satisfies $D_{\mathrm{pat}}^2 = 2A_{\mathrm{pat}}$. Under this condition Therefore, the Rayleigh distance is calculated by

$$d_{\mathrm{Rayleigh}} = \frac{2D_{\mathrm{pat}}^2}{\lambda}.$$

2) Lower Bound of Fresnel Region: Papers [20], [21], and [23] introduced a lower bound of the Fresnel region, which is defined by the maximum amplitude variation of the channel response across the surface. Unlike the variation of the phase, which is captured by the difference, the variation of the amplitude is described by the ratio Denote the lower bound of the Fresnel region as $d_{\mathrm{Fresnel}}$. At distance $d_{\mathrm{Fresnel}}$, the amplitude ratio is equal to a threshold th ∈ (0, 1). The value of th can be $\cos(\pi/8)$ [21], [23], or $0.9^2$ [20]. Below this threshold, the variation of amplitude is nonnegligible across the surface. In [23], $d_{\mathrm{Fresnel}}$ was regarded as the boundary between the Fresnel region and the near-field region. In [21], when $d_{\mathrm{Fresnel}} < d_{\mathrm{Rayleigh}}$ holds, the region between these two boundaries was named the Fresnel region.
We still consider the patch antenna above. The amplitude of the channel response has different expressions when different models are applied. According to (2), (4), and (9) Under channel response model 1 and we have By further applying (36), we obtain If th = $\cos(\pi/8)$, then $d_{\mathrm{Fresnel,CR1}} \approx 1.2D_{\mathrm{pat}}$ as given in [21], [23]. Under channel response model 2 Recalling $\mathbf{v}_{\mathcal{A}}(\mathbf{p}) = \mathbf{u}_z$, we derive that  (43) as derived in [20]. Similarly, under channel response model 3, by directly applying (22), it can be obtained that d 3 2 Fresnel, CR3 Generally, we have The lower bound of the Fresnel region can be alternatively calculated since the concept of near field is not unique. A Fresnel distance which equals $0.62\sqrt{D_{\mathrm{pat}}^3/\lambda}$ is defined as the lower bound of the Fresnel region [24], [25], and this distance was also regarded as the upper bound of the reactive near field in [21].
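The value $d_{\mathrm{Fresnel,CR1}} \approx 1.2D_{\mathrm{pat}}$ quoted above can be reproduced with a short calculation. The sketch below assumes that, under channel response model 1, the amplitude is inversely proportional to the distance and that the amplitude ratio is taken between one vertex (farthest point, at lateral offset D/2) and the center (nearest point) of a square surface whose center lies on the z-axis at distance d. Under these assumptions the ratio is d/√(d² + D²/4), and setting it equal to the threshold gives d = (D/2)·th/√(1 − th²). The function names are hypothetical.

```python
import math


def amplitude_ratio_model1(d, aperture):
    """Vertex-to-center amplitude ratio of a square surface at distance d.

    Assumes amplitude proportional to 1/distance and a vertex located at a
    lateral offset of aperture/2 from the center.
    """
    return d / math.hypot(d, aperture / 2.0)


def fresnel_lower_bound_model1(aperture, threshold):
    """Distance at which the amplitude ratio equals the given threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie in (0, 1)")
    return 0.5 * aperture * threshold / math.sqrt(1.0 - threshold ** 2)


aperture = 1.0  # normalized aperture, so the result is in units of D
threshold = math.cos(math.pi / 8.0)
d_fresnel = fresnel_lower_bound_model1(aperture, threshold)
ratio_at_bound = amplitude_ratio_model1(d_fresnel, aperture)
```

With the threshold cos(π/8), the closed form reduces to (D/2)·cot(π/8), which is about 1.207 D and agrees with the rounded value 1.2 D.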

D. Field Partition of Array
The field partition of a single antenna can be extended to that of a multiantenna array [20], [21]. Consider a widely applied uniform planar array (UPA) at the receiver. The UPA is composed of $N_h \times N_v$ antennas, where $N_h$ and $N_v$ are the numbers of columns and rows, respectively, which are assumed to be even numbers. The distance between two horizontally or vertically adjacent antennas is $\lambda/2$. The UPA is parallel with the xy plane. The center of the UPA is $\mathbf{p}_c = [0, 0, d]^T$, where $d > 0$ is the distance between the source and the UPA. In an extreme case that the antennas are seamlessly deployed as shown in Then, one vertex of the UPA is at The aperture of the UPA is

$$D_{\mathrm{UPA}} = \frac{\lambda}{2}\sqrt{N_h^2 + N_v^2}.$$

1) Rayleigh/Fraunhofer Distance: The Rayleigh or Fraunhofer distance of the UPA is still defined by the maximum phase variation across the array, which equals $\pi/8$. Recalling (28) and (29), we can write that Given (31) and after some derivations, we obtain the Rayleigh distance of the UPA as follows:

$$d_{\mathrm{Rayleigh}} = \frac{2D_{\mathrm{UPA}}^2}{\lambda}$$

which is still determined by the aperture.
2) Lower Bound of Fresnel Region: Following a similar approach as in the single-antenna case, we further study the lower bound of the Fresnel region of the UPA. Under channel response model 1, by applying (39), we get that Then, the lower bound of the Fresnel region is Comparing (

For Example 1, the aperture of a small-scale array is limited. Then, the values of $d_{\mathrm{Rayleigh}}$ and $d_{\mathrm{Fresnel}}$ are small. When the array is employed at a 5G new radio (NR) BS, whose serving cell has a width of 200 m, it is very likely that users in the cell are beyond the Rayleigh distance of the array, i.e., in the far-field region. However, for Example 2, although the wavelength of a millimeter wave is small, the aperture of a 512 × 64 UPA is much larger than that of the small-scale UPA in Example 1. The Rayleigh distance of this extra-large array becomes $1.426 \times 10^3$ m, which is much larger than the size of the serving cell. Then, users are no longer in the far-field region of the array. For some users, their distances from the UPA may even be smaller than $d_{\mathrm{Fresnel}}$.
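The Rayleigh distance quoted for the 512 × 64 UPA can be checked numerically. The sketch below assumes the standard expression d = 2D²/λ, an aperture equal to the diagonal of a seamlessly tiled array with half-wavelength elements, D = (λ/2)√(N_h² + N_v²), and a carrier frequency of 28 GHz with c rounded to 3 × 10⁸ m/s. The carrier frequency is an assumption, picked because it reproduces the quoted 1.426 × 10³ m; the parameters of Table I are not available in the text. All function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s, rounded value (assumption)


def upa_aperture(n_h, n_v, wavelength):
    """Diagonal of an n_h x n_v UPA tiled with (wavelength/2)-sized elements."""
    return 0.5 * wavelength * math.hypot(n_h, n_v)


def rayleigh_distance(aperture, wavelength):
    """Rayleigh (Fraunhofer) distance 2 * D^2 / wavelength."""
    return 2.0 * aperture ** 2 / wavelength


def max_phase_variation(d, aperture, wavelength):
    """Phase difference between a vertex and the center of a broadside array."""
    path_difference = math.hypot(d, aperture / 2.0) - d
    return 2.0 * math.pi * path_difference / wavelength


carrier_hz = 28.0e9  # assumed millimeter-wave carrier frequency
wavelength = SPEED_OF_LIGHT / carrier_hz
aperture = upa_aperture(512, 64, wavelength)
d_rayleigh = rayleigh_distance(aperture, wavelength)
phase_at_rayleigh = max_phase_variation(d_rayleigh, aperture, wavelength)
```

Evaluating the exact vertex-to-center path difference at `d_rayleigh` gives a phase variation very close to π/8, which is the criterion used to define the boundary.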

E. Modeling of Channel Between Source and Array
Now, we study the channel model between the point source and the UPA. Denote by $\mathbf{H} \in \mathbb{C}^{N_h \times N_v}$ the channel matrix and by $h(n_h, n_v)$ the channel on the $(n_h, n_v)$th antenna, i.e., Based on the distance between the source and the UPA, which is denoted by $d$, three models of $h(n_h, n_v)$ can be derived [20], [30].

1) Channel Model 1:
This model is for the region $d < d_{\mathrm{Fresnel}}$. Considering that in this region, the channel's amplitude and phase variations are nonnegligible across the array, $h(n_h, n_v)$ in model 1 will have different amplitude and phase expressions for different $(n_h, n_v)$. That is to say which can be obtained by applying the geometrical information of antenna $(n_h, n_v)$ in (10). Channel model 1 is referred to as the spherical wave channel model.

2) Channel Model 2:
This model is for the region of $d_{\mathrm{Fresnel}} \le d \le d_{\mathrm{Rayleigh}}$, where the variation of amplitude is negligible across the array. The same $|h(n_h, n_v)|$ is shared by all $(n_h, n_v)$ and is simplified as $|h|$, whose value can be assigned by is then expressed as follows: Channel model 2 is referred to as the reduced spherical wave channel model.

3) Channel Model 3:
This model is for the region of $d > d_{\mathrm{Rayleigh}}$, where the variations of amplitude and phase are both negligible across the array. A common $|h(n_h, n_v)|$ is still applied here. Moreover, a uniform value $\angle h$ is shared by all $(n_h, n_v)$. Similarly, we set $\angle h = \angle h(0, 0)$. Model 3 of $h(n_h, n_v)$ is written as follows: We should note that according to (56), all the antennas of the UPA experience the same channel with no difference among them. This stems from the UPA orientation, which is parallel with the xy plane, while its center is $\mathbf{p}_c = [0, 0, d]^T$. That is to say, the source is exactly on the normal line of the UPA which passes through the UPA center. Then, no path difference exists when the wave arrives at different antennas, and thus no phase difference is introduced among the channels on these antennas, as illustrated in Fig. 4(a). The model in (56) also means that the incident wave seen by each antenna comes from the same direction. That is, a plane wave instead of a spherical wave is experienced at the UPA. Hence, channel model 3 is referred to as the plane wave channel model.

Consider a more general case that the UPA is parallel with the xy plane and $\mathbf{p}_c = [x_c, y_c, z_c]^T$. As shown in Fig. 3, the included angle between the incident wave and a column of the UPA is The included angle between the projection of the incident wave on the UPA and a row of the UPA is The position of the $(n_h, n_v)$th antenna is is the difference between $\angle h(n_h, n_v)$ and $\angle h$ caused by the path difference shown in Fig. 4(b). If $x_c = y_c = 0$, then $\theta = \phi = \pi/2$ and $\phi(n_h, n_v) = 0$, which corresponds to the case in (56).

Channel model 3 has been widely applied in fourth-generation and 5G systems, since the aperture of antenna arrays is not large and users are in the far-field region of the array [31], [32], [33]. However, in 6G systems, extra large-aperture arrays, such as Example 2 in Table I, will be employed.
Then, users probably fall in the near-field region or the Fresnel region, and channel models 1 and 2 should be utilized. The presence of spherical waves instead of plane waves is one of the major unique characteristics of extra large-scale MIMO systems. Thus, far-field channel models will become inaccurate in the practical near or Fresnel field [11].
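The difference between the three array channel models can be illustrated with a small sketch. It assumes a broadside source at the origin, a UPA parallel with the xy plane and centered at (0, 0, d), element centers on a half-wavelength grid, the model 1 point response evaluated at each element center, and the center distance d as the reference for the common amplitude and common phase. The indexing of the elements, the reference choice, and all function names are assumptions made for illustration and are not taken from the cited references.

```python
import cmath
import math

ORIGIN = (0.0, 0.0, 0.0)


def upa_positions(n_h, n_v, wavelength, d):
    """Element centers of a UPA parallel to the xy plane, centered at (0, 0, d)."""
    half = wavelength / 2.0
    return {
        (i, j): ((i - (n_h - 1) / 2.0) * half, (j - (n_v - 1) / 2.0) * half, d)
        for i in range(n_h)
        for j in range(n_v)
    }


def array_channel(n_h, n_v, wavelength, d, model):
    """Per-element channel under model 1 (spherical), 2 (reduced), or 3 (plane)."""
    k = 2.0 * math.pi / wavelength
    scale = math.sqrt(4.0 * math.pi)
    channel = {}
    for idx, p in upa_positions(n_h, n_v, wavelength, d).items():
        r = math.dist(p, ORIGIN)
        if model == 1:    # element-specific amplitude and phase
            amp, phase = 1.0 / (scale * r), -k * r
        elif model == 2:  # common amplitude, element-specific phase
            amp, phase = 1.0 / (scale * d), -k * r
        elif model == 3:  # common amplitude and common phase (broadside source)
            amp, phase = 1.0 / (scale * d), -k * d
        else:
            raise ValueError("model must be 1, 2, or 3")
        channel[idx] = amp * cmath.exp(1j * phase)
    return channel


def max_phase_gap(n_h, n_v, wavelength, d):
    """Largest phase error made by the plane wave model across the array."""
    k = 2.0 * math.pi / wavelength
    positions = upa_positions(n_h, n_v, wavelength, d)
    return max(k * (math.dist(p, ORIGIN) - d) for p in positions.values())


wavelength = 0.01                       # hypothetical 1 cm wavelength
aperture = 0.5 * wavelength * math.hypot(16, 16)
d_rayleigh = 2.0 * aperture ** 2 / wavelength
gap_far = max_phase_gap(16, 16, wavelength, 10.0 * d_rayleigh)
gap_near = max_phase_gap(16, 16, wavelength, 0.1 * d_rayleigh)
h_plane = array_channel(16, 16, wavelength, 10.0 * d_rayleigh, model=3)
h_reduced = array_channel(16, 16, wavelength, d_rayleigh, model=2)
```

Well beyond the Rayleigh distance the phase error of the plane wave model stays below π/8, whereas well inside it the error exceeds π/8, which is why models 1 and 2 are needed for extra large-aperture arrays.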

III. VISIBILITY REGION
When a user is very close to an extra large-aperture array, most of the channel power can be captured by only a part of the array. This part of the array is referred to as the VR of the user w.r.t. the array. The VR is another key characteristic in extra large-scale MIMO systems [3], [6], [34], [35]. In this section, we will make a comprehensive study on the origins, definition, and modeling of the VR.

A. Origins of the VR
The VR reflects the uneven distribution of the channel power over the array. There are two major causes of the VR [6]. One is the unequal path loss across different antennas of the array. The other is the blockage stemming from the obstacles between the user and the array.
1) Unequal Path Loss: When the distance between a user and the array is below $d_{\mathrm{Fresnel}}$, channel model 1 should be applied. Under this condition, $|h(n_h, n_v)|$, which reflects the path loss on the antenna array, varies significantly with $(n_h, n_v)$. Let us revisit the UPA in Fig. 3, which is parallel with the xy plane while its center is at The source $\mathbf{s}$ is still located at the origin of the coordinate system. We analyze the value of $|h(n_h, n_v)|$ across the UPA under the three cases in Section II-B. For antenna $(n_h, n_v)$ in case 1, by applying $\mathbf{p}_{n_h,n_v}$ in (12), we obtain With $d$ fixed and given (59), $\|\mathbf{p}_{n_h,n_v} - \mathbf{s}\|$ has the minimum value at $n_h = n_v = 0$ and the maximum value at Recalling (51) and Table I A similar phenomenon can be observed when the UPA is customized as described in cases 2 and 3. Considering that we have This phenomenon can be observed in Fig. 5, where the source is very close to the center of the array. We see that in case 3, the value of $|h(n_h, n_v)|$ is significantly larger for smaller $|n_h|$.
In the extreme case where the ratio of the minimum to the maximum of $|h(n_h, n_v)|$ approaches zero, the channel power on a portion of the antennas in the array is negligible; for example, the first and the last columns of the array in case 3 of Fig. 5. The channel power can then be captured by only a portion of the antennas in the array. In particular, the channel power is concentrated on the antennas that are close to the source. The channels on the antennas that are much farther from the source are significantly weaker.
2) Blockage Due to Obstacles: An extra large-aperture array can be widely spread on the wall of a building in an urban city. Then, $D_{\mathrm{UPA}}$ will be large. Users are usually crowded in an urban environment and may be very close to the array. Trees, cars, and infrastructure can be seen everywhere and can all be possible obstacles in the channel between the array and a certain user.

Fig. 6. Blockage of the channels on part of the array caused by obstacles, such as trees and cars. Red and gray squares represent the antennas whose channels are connected and blocked, respectively.
Unlike in the far field, where the entire channel is blocked, in the near field or the Fresnel region, only a part of the array may be blocked. The blocked part of the array is determined by the geometry of the array, user, and obstacle, as shown in Fig. 6. Assume that only a line-of-sight (LoS) path exists. For antenna $(n_h, n_v)$, if the line between $\mathbf{p}_{n_h,n_v}$ and $\mathbf{s}$ passes through the obstacle, then the channel on antenna $(n_h, n_v)$ is blocked, i.e., $|h(n_h, n_v)| = 0$. The blocked part of the array also reflects the shape of the obstacle. As illustrated in Fig. 6, the blocked subarrays in gray take the same shapes as the tree and the car, respectively. The uneven channel power distribution caused by blockage is independent of that resulting from the unequal path loss.
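The geometric blockage rule described above can be sketched as a segment-obstacle intersection test. The example below models the obstacle as a sphere, which is a simplifying assumption made only to keep the test short (the obstacles in Fig. 6 are trees and cars), and uses a hypothetical 8 × 8 array with 0.2 m spacing placed 2 m from a source at the origin. The function names and all numerical values are illustrative.

```python
import math


def segment_hits_sphere(p, s, center, radius):
    """Return True if the segment from s to p passes within radius of center."""
    direction = [pi - si for pi, si in zip(p, s)]
    to_center = [ci - si for ci, si in zip(center, s)]
    length_sq = sum(x * x for x in direction)
    if length_sq == 0.0:
        t = 0.0
    else:
        # Parameter of the point on the segment that is closest to the center.
        t = sum(a * b for a, b in zip(direction, to_center)) / length_sq
        t = max(0.0, min(1.0, t))
    closest = [si + t * di for si, di in zip(s, direction)]
    return math.dist(closest, center) <= radius


def blocked_map(n_h, n_v, spacing, d, source, center, radius):
    """Map each antenna index to True when its LoS path to the source is blocked."""
    result = {}
    for i in range(n_h):
        for j in range(n_v):
            p = ((i - (n_h - 1) / 2.0) * spacing,
                 (j - (n_v - 1) / 2.0) * spacing,
                 d)
            result[(i, j)] = segment_hits_sphere(p, source, center, radius)
    return result


source = (0.0, 0.0, 0.0)
obstacle_center = (0.25, 0.0, 1.0)  # hypothetical obstacle halfway to the array
obstacle_radius = 0.15
blocked = blocked_map(8, 8, 0.2, 2.0, source, obstacle_center, obstacle_radius)
num_blocked = sum(blocked.values())
```

Because the obstacle sits between the source and only part of the array, some antennas are blocked while the rest keep their LoS path, which is the partial blockage that distinguishes the near field from the far field.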
1) VR of User w.r.t. the Array: In the literature, the VR of a user w.r.t. the array is defined as the part of the array that captures the biggest proportion of the channel power over the entire array [6], [34], [35], [36]. It reflects the sparsity of a user channel in the antenna domain. Denote the VR of a user w.r.t. the array as UA . Then, UA is a set that contains the indices of antennas that the channel power of this user is concentrated on. The following property holds: where 0 < ζ ≤ 1 is a threshold with value close to 1. Note that UA contains the minimum number of antennas that satisfy the requirement of (67). The size of UA is denoted by | UA |.
We first consider the channel under unequal path loss but without blockage. The VR caused simply by a spherical wavefront covers a continuous part of the array. Recall the example in Fig. 5, where UA covers the middle part of the UPA. Note that ζ = 1 only holds when UA covers the entire array. However, when we set ζ < 1, we can still find a proper UA to achieve (67). The UA obtained here corresponds to the smallest sliding window over the array whose antennas capture a fraction ζ of the channel power. Antennas outside the window still receive some power, even though they are excluded from UA. The VR becomes more pronounced if a blockage occurs. At the first antenna in the blocked subarray, a sharp decrease of the channel power can be observed. Consider now the LoS channel case without any non-LoS (NLoS) paths. Then, the channel power on each antenna in the blocked subarray is zero.
In (67), ζ = 1 can be achieved even though Under this condition, UA contains the antennas that are not blocked by obstacles. If we set ζ < 1, then the size of UA can be even smaller by discarding the antennas with the smallest power. Notably, a VR caused by blockage may not be continuous. One obstacle may block the channels on a continuous subarray. If the blocked subarray is at the array center, then the VR will not be continuous. If multiple obstacles exist, the VR may be composed of several discontinuous subarrays.
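The VR definition can be sketched as a small selection routine. The sketch assumes that the requirement in (67) amounts to "the power collected over the VR is at least ζ times the total power", which is an interpretation of the surrounding text rather than a reproduction of the equation; the function name and the power profiles are hypothetical.

```python
def visibility_region(powers, zeta):
    """Smallest set of antenna indices whose power sum is >= zeta * total power."""
    total = sum(powers)
    # Strongest antennas first; Python's sort is stable, so ties keep index order.
    order = sorted(range(len(powers)), key=lambda n: powers[n], reverse=True)
    vr, collected = [], 0.0
    for n in order:
        if collected >= zeta * total:
            break
        vr.append(n)
        collected += powers[n]
    return sorted(vr)

# Unequal path loss only: power peaks in the middle of the array.
print(visibility_region([0, 0, 1, 4, 1, 0], zeta=0.6))
# Blockage in the array center: zeta = 1 is reached by a discontinuous VR.
print(visibility_region([1, 1, 0, 0, 1, 1], zeta=1.0))
```

Picking antennas greedily by power gives the minimum-size set; restricting the result to a contiguous window, as in the sliding-window description, would be a separate design choice.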
2) Two-Tier VRs: In the previous context, we focused on the case that only the LoS path exists in the channel. In practice, the wireless propagation environment is composed of various scatterers. Signals can be scattered and then arrive at the array along NLoS paths as well. Unlike the obstacles that block the signal propagation, scatterers provide new propagation paths and act as intermediate nodes. Then, the one-tier user-array channel becomes a two-tier user-scatterer and scatterer-array channel. Accordingly, the VR of a user w.r.t. the array is further partitioned by the VR of a scatterer w.r.t. the array and the VR of a user w.r.t. the scatterers [6], [39], [46], [47], [48], [49].
The scatterers are usually grouped into multiple clusters. Each cluster includes one or multiple neighboring scatterers. Scatterers in a cluster see the same antennas in the array and can be simultaneously observed by a user. The VR of a cluster w.r.t. the array, denoted by CA , contains the antennas that can be seen by the cluster. This definition is similar to that of the VR of a user w.r.t. the array. A cluster here corresponds to the user above; CA is also named as the cluster VR and is assumed to cover a continuous subarray [47]. The central antenna in CA has the highest channel power [47]. Furthermore, if CA includes all the array antennas, then the scatterers in this cluster are referred to as entirely visible scatterers; otherwise, they are referred to as partially visible scatterers [46].
The VR of a user w.r.t. the clusters, denoted by UC, contains the clusters that can be seen by the user. This is similar to the original concept of VR in COST channel models, which refers to a geometric region in which the same set of scatterer clusters can be seen by the user [50]. If the user moves to another position, then the set of clusters that can be seen by this user changes. Note that UC is also named the user VR in [47].
By cascading the two-tier VRs, the VR of a user w.r.t. the array can be obtained. For user k, its one-tier VR and two-tier VRs have the following relation: $\mathrm{UA},k = \bigcup_{c \in \mathrm{UC},k} \mathrm{CA},c$, where the one-tier VR UA,k denotes the VR of user k w.r.t. the array, while the two-tier VRs UC,k and CA,c represent the VR of user k w.r.t. the clusters and the VR of cluster c w.r.t. the array, respectively. As an example in Fig. 7, UC,1 = {1} and UC,2 = {2, 3}. Thus, we have UA,1 = CA,1 and UA,2 = CA,2 ∪ CA,3.
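The cascading of the two tiers reduces to a set union, which the following sketch illustrates. The cluster-to-antenna sets are made-up values for illustration and are not read from Fig. 7; only the user-to-cluster sets follow the example in the text.

```python
def user_array_vr(user_clusters, cluster_vrs):
    """VR of a user w.r.t. the array: union of the VRs of the clusters it sees."""
    vr = set()
    for c in user_clusters:
        vr |= cluster_vrs[c]
    return vr

# Hypothetical cluster VRs (antenna indices seen by each cluster).
cluster_vrs = {1: {0, 1, 2, 3}, 2: {6, 7, 8}, 3: {8, 9, 10, 11}}
# User-to-cluster VRs as in the example: UC,1 = {1}, UC,2 = {2, 3}.
user_clusters = {1: {1}, 2: {2, 3}}

ua_1 = user_array_vr(user_clusters[1], cluster_vrs)
ua_2 = user_array_vr(user_clusters[2], cluster_vrs)
print(sorted(ua_1), sorted(ua_2))
```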

1) Channel Covariance Matrix With VR:
A channel covariance matrix reflects the statistical covariance of channels across different antennas. It has been widely applied in the modeling of multiantenna channels. When the channel experiences correlated Rayleigh fading, the channel between the single-antenna user k and the N-dimensional array can be modeled as [51], [52], [53] $\mathbf{h}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k)$, where $\mathbf{h}_k \in \mathbb{C}^{N \times 1}$ is the multiantenna complex channel with zero mean, and $\mathbf{R}_k \in \mathbb{C}^{N \times N}$ is the channel covariance matrix satisfying $\mathbf{R}_k = \mathbb{E}\{\mathbf{h}_k \mathbf{h}_k^H\}$. This model is equivalent to $\mathbf{h}_k = \mathbf{R}_k^{1/2} \mathbf{h}_{w,k}$ (71),
where $\mathbf{h}_{w,k} \in \mathbb{C}^{N \times 1}$ is the small-scale fading coefficient vector, whose entries are independent, identically distributed (i.i.d.) complex Gaussian random variables with zero mean and unit variance. In traditional multiantenna systems, the diagonal entries of $\mathbf{R}_k$ are nonzero. However, if the VR is introduced, then only the diagonal entries in the VR are nonzero [35], [36], [37], [38], [40], [41], [42], [44]. Moreover, $\mathbf{R}_k$ shows block sparsity. For $n_1, n_2 \in [1, N]$, if $n_1 \notin$ UA,k or $n_2 \notin$ UA,k, then $\mathbf{R}_k(n_1, n_2) = 0$. In a typical case that UA,k covers a continuous region, $\mathbf{R}_k$ has the following structure: $\mathbf{R}_k = \begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_{\mathrm{UA},k} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix}$, where $\mathbf{R}_{\mathrm{UA},k} \in \mathbb{C}^{|\mathrm{UA},k| \times |\mathrm{UA},k|}$ is the covariance submatrix with nonzero entries. Given the block sparsity of $\mathbf{R}_k$, the channel model (71) can be further rewritten as follows: where When scatterer clusters are further considered, the scatterers can be regarded as a virtual antenna array [46]. In traditional multiantenna systems, the covariance matrix-based scattering channel model is [52] $\mathbf{h}_k = \mathbf{R}_A^{1/2} \mathbf{H}_w \mathbf{R}_S^{1/2} \mathbf{h}_{w,k}$ (74), where $\mathbf{R}_A \in \mathbb{C}^{N \times N}$ and $\mathbf{R}_S \in \mathbb{C}^{S \times S}$ are the covariance matrices at the array and the scatterer side, respectively, S is the number of scatterers, and $\mathbf{H}_w \in \mathbb{C}^{N \times S}$ and $\mathbf{h}_{w,k} \in \mathbb{C}^{S \times 1}$ are small-scale fading matrices (vectors). In extra large-aperture array systems, since different clusters of scatterers have different VRs, (74) needs to be rewritten. Assume that the number of clusters is C. In cluster c, there are $S_c$ scatterers, satisfying $\sum_{c=1}^{C} S_c = S$. The total number of scatterers that can be seen by user k is $\bar{S}_k = \sum_{c \in \mathrm{UC},k} S_c$.
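The single-tier covariance model with a VR (before scatterer clusters are introduced) can be sketched as follows. Several choices here are assumptions made for illustration: an exponential correlation profile inside the VR, a real-valued correlation coefficient `rho`, and a Cholesky factor in place of the matrix square root (both factors yield the same channel covariance). All names are hypothetical.

```python
import math
import random

def vr_covariance(n_total, vr, rho):
    """N x N covariance: rho**|a-b| inside the VR, zero elsewhere (assumed profile)."""
    in_vr = set(vr)
    return [[rho ** abs(a - b) if a in in_vr and b in in_vr else 0.0
             for b in range(n_total)] for a in range(n_total)]

def cholesky(m):
    """Lower-triangular L with L @ L.T == m, for a real positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def sample_channel(n_total, vr, rho, rng):
    """Draw h with covariance vr_covariance(...): zero outside the VR."""
    vr = sorted(vr)
    sub = [[rho ** abs(a - b) for b in vr] for a in vr]
    low = cholesky(sub)
    std = math.sqrt(0.5)  # unit-variance complex Gaussian entries
    h_w = [complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in vr]
    h = [0j] * n_total
    for i, n in enumerate(vr):
        h[n] = sum(low[i][k] * h_w[k] for k in range(i + 1))
    return h

R = vr_covariance(6, [2, 3, 4], 0.5)
h = sample_channel(6, [2, 3, 4], 0.5, random.Random(0))
print([round(abs(x), 3) for x in h])
```

Only the |VR|-dimensional submatrix is factorized, which hints at how the block sparsity reduces the processing dimension.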
By cascading the array-scatterer channel and the scatterer-user channel together, the channel between the array and user k is expressed as follows: where $\mathbf{D}_{\mathrm{UC},k} \in \{0, 1\}^{S \times \bar{S}_k}$ is the selection matrix that selects the scatterers that can be seen by user k, $\mathbf{h}_{w,k} \in \mathbb{C}^{\bar{S}_k \times 1}$ is the small-scale fading vector, while is the separate channel between the array and cluster c, $\mathbf{D}_{\mathrm{CA},c} \in \{0, 1\}^{N \times |\mathrm{CA},c|}$ selects the antennas that can be seen by cluster c, $\mathbf{R}_{\mathrm{CA},c} \in \mathbb{C}^{|\mathrm{CA},c| \times |\mathrm{CA},c|}$ is the covariance matrix across antennas within CA,c, while $\mathbf{H}_{w,c} \in \mathbb{C}^{|\mathrm{CA},c| \times S_c}$ models the small-scale fading. This model has been applied in [46], [47], and [48]. Channel covariance matrix-based channel models pave the way for the analysis of key performance indicators, such as the signal-to-interference-plus-noise ratio (SINR) [35] and the ergodic capacity [46], which further helps the design of transceivers.

2) Steering Vectors With VR:
The discrete physical model is another widely used multiantenna channel model [54], [55]. It focuses on the distinguishable paths in the environment. The discrete physical channel model is expressed as follows: where $\beta_{c,s}$ is the complex coefficient of the path resulting from scatterer s in cluster c, which also represents the response of this path on the reference antenna, and $\mathbf{a}_{c,s} \in \mathbb{C}^{N \times 1}$ is the steering vector of the path, which captures the difference of the response on each antenna w.r.t. the reference antenna. In traditional multiantenna systems, the plane wave channel model 3 in (60) is utilized to construct the steering vector $\mathbf{a}_{c,s}$, satisfying $[\mathbf{a}_{c,s}]_n = e^{j\phi_n}$, where $\phi_n$ is expressed in (61). Each element of $\mathbf{a}_{c,s}$ has amplitude equal to 1.
In extra large-aperture array systems, when introducing the concept of VR, the limited dimensional channel model becomes [39], [49] where $\mathbf{p}_c \in \{0, 1\}^{N \times 1}$ is the VR mask vector of cluster c with the following entries: The steering vector with VR mask, i.e., $\mathbf{a}_{c,s} \odot \mathbf{p}_c$, can be regarded as the effective steering vector. Notably, when an extra large-aperture array is deployed, $\mathbf{a}_{c,s}$ takes the forms of the spherical wave channel models 1 and 2 in (54) and (55). In fact, when applying channel model 1, the entries in the steering vector $\mathbf{a}_{c,s}$ have different amplitudes, directly reflecting the VR caused by unequal path loss.
Depending on whether blockage happens or not, the VR mask p c should be set in two different ways. Take an example in Fig. 8. The amplitude of a c,s varies significantly across the array as shown by the blue circles. If an obstacle exists, part of the array is blocked; then, p c covers the red windows where the blockage does not take effect. However, if there is no obstacle, then p c selects the green window that captures the majority of power with the minimum window size. Notably, in the case of no blockage, p c can be an all-one vector, contributing to a precise extra large-aperture array channel model. Introducing a zero-one mask vector will result in an approximated channel model with a reduced dimension, which further helps to reduce the complexity of transceiver design.
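The two ways of setting the VR mask can be sketched for a uniform linear array. The plane-wave phase expression used for the steering vector, the elementwise masking, and all names and numbers are assumptions for illustration; they are not the article's equations (60) and (61).

```python
import cmath
import math

def steering_vector(n, spacing_wl, theta):
    """Unit-amplitude plane-wave steering vector (assumed ULA phase profile)."""
    return [cmath.exp(2j * math.pi * spacing_wl * k * math.sin(theta))
            for k in range(n)]

def apply_mask(a, mask):
    """Effective steering vector: elementwise product with a zero-one mask."""
    return [x * m for x, m in zip(a, mask)]

def min_window_mask(amplitudes, zeta):
    """No-blockage case: narrowest contiguous window with >= zeta of the power."""
    power = [abs(x) ** 2 for x in amplitudes]
    total, n = sum(power), len(power)
    for width in range(1, n + 1):
        window = sum(power[:width])
        best, best_start = window, 0
        for start in range(1, n - width + 1):
            window += power[start + width - 1] - power[start - 1]
            if window > best:
                best, best_start = window, start
        if best >= zeta * total:
            return [1 if best_start <= k < best_start + width else 0
                    for k in range(n)]
    return [1] * n

amps = [0.1, 0.2, 1.0, 1.0, 0.2, 0.1]       # hypothetical amplitude profile
window_mask = min_window_mask(amps, 0.9)    # no obstacle: power-based window
blocked_mask = [1, 1, 0, 0, 1, 1]           # obstacle: unblocked antennas only
a_eff = apply_mask(steering_vector(6, 0.5, math.pi / 6), window_mask)
print(window_mask, [round(abs(x), 3) for x in a_eff])
```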

D. Spatial Nonstationarity
The spherical wave propagation as well as the VR caused by blockages contribute to spatial nonstationarity, which is the new channel property that appears in extra large-aperture array systems. The concept of spatial stationarity of a multiantenna channel is derived from the wide sense stationarity of a stochastic process [56], where the stochastic process becomes the multiantenna channel h here. Note that the spatial stationarity of a multiantenna channel is different from the stationarity of a time-varying channel [57]. The multiantenna channel h is spatially stationary if the correlation of any two distinct elements of h only depends on the difference of the two-element indices. That is to say holds for arbitrary l ∈ [0, N − 1]. Otherwise, the multiantenna channel h is spatially nonstationary.
If the VR of the user w.r.t. the array does not cover the entire array, then the channel is definitely spatially nonstationary. This is because holds regardless of which channel model from (73), (76), and (78) is applied. Thereafter, we have The requirement for spatial stationarity cannot be satisfied when UA does not cover the whole index range [1, N].
If the VR of the user w.r.t. the array covers the entire array, but the user is in the near field or Fresnel region of the array, then the multiantenna channel $\mathbf{h}$ is still spatially nonstationary [58]. Under this condition, the channel models (54) and (55) should be utilized. More specifically, when applying (54), $\mathbb{E}\{[\mathbf{h}]_n\}$ has unequal amplitudes for different n due to the unequal path loss. Furthermore, the phase of $[\mathbf{h}]_n$ depends on the index n whenever (54) or (55) is applied, and an equal phase difference between two adjacent channel entries cannot be maintained. Thus, $\mathbb{E}\{[\mathbf{h}]^*_{l+m} [\mathbf{h}]_{l+n}\}$ depends on the particular l, m, and n, rather than only on $m - n$. Spatial nonstationarity of a near-field channel has been mathematically verified in [59] and experimentally observed through measurements in [60] and [61].
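The stationarity condition can be checked numerically: the correlation between two distinct entries must depend only on their index difference, so every off-diagonal entry of the correlation matrix must equal its neighbor along the same diagonal. The sketch below applies this test to an assumed exponential correlation profile with and without a VR; the names and the profile are hypothetical.

```python
def is_spatially_stationary(corr, tol=1e-9):
    """True if corr[m][n] for m != n depends only on n - m."""
    n = len(corr)
    for m in range(1, n):
        for k in range(1, n):
            if m != k and abs(corr[m][k] - corr[m - 1][k - 1]) > tol:
                return False
    return True

def exp_corr(n, rho, vr=None):
    """Assumed correlation rho**|a-b|, zeroed outside the VR when one is given."""
    visible = set(range(n)) if vr is None else set(vr)
    return [[rho ** abs(a - b) if a in visible and b in visible else 0.0
             for b in range(n)] for a in range(n)]

full_array = exp_corr(8, 0.7)
with_vr = exp_corr(8, 0.7, vr=range(2, 6))
print(is_spatially_stationary(full_array), is_spatially_stationary(with_vr))
```

The test only covers the correlation of distinct entries, following the definition in the text; checking the diagonal as well would additionally require equal power on all antennas.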

IV. LOW-COST EXTRA LARGE-APERTURE ARRAY ARCHITECTURES
The new channel properties brought by an extra large-aperture array will inform the hardware and transceiver design. The multiantenna arrays used in traditional systems do not have a large size, and a fully digital architecture is widely employed to connect each active antenna with a unique RF chain. However, with the increase of the antenna array size, the fully digital architecture with high resolution will be expensive and not suitable for practical applications. Low-cost architecture designs are of great importance for the commercial deployment of extra large-aperture arrays. Moreover, for an active antenna array, each antenna is driven by a power amplifier (PA) or a low-noise amplifier (LNA) and has the ability to transmit and receive wireless signals. Therefore, the power consumption of an active antenna array is usually large as well. Fortunately, the new channel features provide room for cost reduction. By jointly considering the hardware and power cost as well as the new channel properties, in this section, we introduce the potential low-cost extra large-aperture array architectures.

A. Active Arrays With Less RF Chains
Research on this type of architecture dates back to the beginning of the 5G era [62], [63], [64], [65], [66], [67], [68], [69], [70], [71]. A large array with massive active antennas is controlled by a small number of RF chains. The numbers of active antennas and RF chains are denoted as N and $N_{\mathrm{RF}}$, respectively, satisfying $N \gg N_{\mathrm{RF}}$. One RF chain is connected to one or multiple antennas and controls them through RF devices, such as phase shifters (PSs) and/or switches. After analog processing, such as analog beamforming, combining, and selection in the RF module, baseband (BB) processing is further applied to the signals on these RF chains. Therefore, a hybrid RF and BB structure is modeled as follows: $\mathbf{r} = \mathbf{F}_{\mathrm{BB}} \mathbf{F}_{\mathrm{RF}} \mathbf{y}$, where $\mathbf{y} \in \mathbb{C}^{N \times 1}$ is the signal received at the antennas, $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_{\mathrm{RF}} \times N}$ is the RF processing matrix, $\mathbf{F}_{\mathrm{BB}} \in \mathbb{C}^{K \times N_{\mathrm{RF}}}$ is the BB processing matrix, K is the number of data streams, and $\mathbf{r} \in \mathbb{C}^{K \times 1}$ is used for signal detection. The format of $\mathbf{F}_{\mathrm{RF}}$ is determined by the type of connections among the RF chains and antennas as well as the type of RF devices on each connection.

1) Connection Type:
The connection type directly determines the hardware cost, transceiver design, transmission performance, as well as the scalability of the architecture. Generally, there are two main types. One is the single-RF chain single-antenna type, and the other is the single-RF chain multiple-antenna type [69], [70].
1) Single-RF Chain Single Antenna: When this connection type is adopted, a single RF chain can be connected to only a single antenna. A switch is required at each RF chain to enable antenna selection, that is, to determine whether this RF chain is activated and which antenna it is connected with. If the RF chain is activated, then only one antenna will be connected with it. A total of $N_{\mathrm{RF}}$ switches are deployed. No PSs are needed because beamforming is solely implemented at the BB module. Antenna selection can be further categorized into two types: full array selection and partial array selection. Full array selection enables an RF chain to connect with any antenna in the array. Partial array selection means that each RF chain can select from a subarray that is physically closest to it. For a certain RF chain, the partial array for antenna selection is usually fixed, and the size of the partial array is determined by the sweeping space of the switch at the RF chain side. Partial arrays corresponding to different RF chains can be disjoint or overlapped. If two RF chains select antennas from the same partial array, then their selection strategies need to be different.

2) Single-RF Chain Multiple Antennas: When applying this connection type, a single RF chain can be connected with multiple antennas. Signal combination or beamforming is achieved at the RF module, and then the array gain can be harvested. Most studies focus on this connection type. Similar to antenna selection, one RF chain can be connected with the full array or a partial array close to it, corresponding to the full array connection structure and the partial array connection structure, respectively. In the full array connection structure, each antenna can be connected with all the RF chains and vice versa. A unique physical link is established between each RF chain and each antenna.
In each link, a PS can be deployed at the antenna side to enable analog beamforming, or an ON/OFF switch can be deployed at the antenna side to reduce the cost and achieve a simple signal combination. A total of N RF N PSs or ON/OFF switches are required in the full array connection structure.
In the partial array connection structure, one RF chain can be connected with a proportion of antennas, but one antenna can be connected with only one RF chain. For a certain RF chain, the partial array that it can be connected with is either fixed or dynamic. In the former case, a physical link exists between the RF chain and each antenna in the partial array. In each link, a PS or an ON/OFF switch can be deployed at the antenna side as well. The size of each partial array or subarray is fixed, and a total of N PSs or ON/OFF switches are required in the fixed subarray structure. In the latter case, apart from these PSs or ON/OFF switches, an extra switch is employed at each antenna to determine which RF chain it will be connected with. Notably, unlike the ON/OFF switch, this switch is used for RF chain selection and its sweeping space covers all the RF chains. No further switch is needed at the RF chain side for antenna selection. The size of each subarray can be adjusted in real time. This structure is more suitable for extra large-aperture array systems under spatial nonstationarity.
2) Component Type: Now, we turn our attention to the three components mentioned above, namely, the PS, the ON/OFF switch, and the switch for selection.

1) PS:
A PS can adjust the phase of an RF signal. It is a key enabler of analog beamforming in multiantenna systems. When PSs are deployed, the RF matrix $\mathbf{F}_{\mathrm{RF}}$ is called the analog beamforming matrix, contributing to the hybrid beamforming structure together with the BB precoding. However, the cost of a PS grows with its operating frequency as well as its resolution.

2) ON/OFF Switch:
An ON/OFF switch can be turned ON or OFF to determine whether the signal can pass through the connection. When a switch is in the physical link between one RF chain and one antenna, this connection can be activated or deactivated by choosing the ON and OFF status, respectively. The cost of an ON/OFF switch is significantly lower than that of a PS, but the insertion loss is a major problem.

3) Switch for selection:
A switch for selection has a sweeping space and can be connected to one of the physical links in this sweeping space. It can be deployed at the RF chain side to achieve antenna selection, or be deployed at the antenna side to perform RF chain selection. A switch for selection is more expensive than an ON/OFF switch.

3) State-of-the-Art Architectures:
The various connection types and device types can jointly form many different combinations, each corresponding to a particular architecture. Here, we introduce the architectures that have appeared in existing studies, which are listed in Table II, and analyze their signal models, advantages, and drawbacks.

a) Single-RF chain single antenna in full array selection: This is the traditional antenna selection architecture as shown in Fig. 9(i). In this architecture, we have [ Since the sweeping space of a switch is confined, this architecture is widely adopted in traditional multiantenna systems due to the limited array size. However, when employing an extra large-aperture array, it may be impractical to find a switch that could be connected to all antennas in a massive array.

b) Single-RF chain single antenna in partial array selection: This architecture is more easily implemented in an extra large-aperture array system. Considering the scalability issue as well, a subarray-based antenna selection architecture is naturally considered. As shown in Fig. 9(ii), the entire array is composed of multiple subarrays. Each subarray has exactly the same topology, including the number of antennas and the number of RF chains. Denote the number of subarrays as B. Then, each subarray has $(N/B)$ antennas and $(N_{\mathrm{RF}}/B)$ RF chains. One RF chain can select no more than one antenna within the same subarray, and one antenna can be selected by no more than one RF chain within the same subarray. Thus, $\mathbf{F}_{\mathrm{RF}}$ has a block diagonal structure $\mathbf{F}_{\mathrm{RF}} = \mathrm{blkdiag}\{\mathbf{F}_{\mathrm{RF},1}, \ldots, \mathbf{F}_{\mathrm{RF},B}\}$ (85), with one submatrix $\mathbf{F}_{\mathrm{RF},b} \in \mathbb{C}^{(N_{\mathrm{RF}}/B) \times (N/B)}$ per subarray. When applying this architecture, multiple identical subarrays can be directly combined together to construct an extra large-aperture array. This scalability facilitates the design, fabrication, and production of the array. Moreover, local antenna selection within a single subarray is supported, thereby giving room for complexity reduction. However, the selected antennas are usually discontinuous and cannot cover a continuous VR. Thus, the array gain will be compromised.
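The subarray-based antenna selection can be sketched by building the zero-one RF matrix directly from per-subarray choices. The input format (`choices`), the function names, and the numbers are hypothetical; the two checks encode the constraints stated in the text, namely that each RF chain selects at most one antenna and each antenna is selected at most once within its subarray.

```python
def selection_matrix(n_ant, n_rf, n_sub, choices):
    """Block diagonal zero-one RF matrix for partial array selection.

    choices[b][i] is the local antenna index picked by RF chain i of
    subarray b, or None if that RF chain is not activated.
    """
    ant_per, rf_per = n_ant // n_sub, n_rf // n_sub
    f_rf = [[0] * n_ant for _ in range(n_rf)]
    for b, picks in enumerate(choices):
        used = set()
        for i, a in enumerate(picks):
            if a is None:
                continue
            if not 0 <= a < ant_per:
                raise ValueError("antenna index outside the subarray")
            if a in used:
                raise ValueError("antenna already selected in this subarray")
            used.add(a)
            f_rf[b * rf_per + i][b * ant_per + a] = 1
    return f_rf

def apply_rf(f_rf, y):
    """RF-stage output F_RF @ y for a received vector y."""
    return [sum(w * v for w, v in zip(row, y)) for row in f_rf]

# N = 8 antennas, N_RF = 4 RF chains, B = 2 subarrays.
F = selection_matrix(8, 4, 2, [[0, 3], [1, None]])
print(apply_rf(F, [10, 11, 12, 13, 14, 15, 16, 17]))
```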
c) Single-RF chain multiple antennas in full array connection with PSs: This is the widely studied full-connection hybrid beamforming architecture in 5G millimeter wave systems [65], [66], [67], [68], [69], [70] and has been considered in the extra large-aperture array system as in [29]. As shown in Fig. 9(iii), each RF chain is connected with all antennas through PSs. The RF matrix F RF has the following format: where f RF,i ∈ C N×1 is the analog beamforming vector in the ith RF chain with [f RF,i ] j = e jφ i,j , and φ i,j ∈ [0, 2π ] is the phase shift introduced by the PS in the physical link between RF chain i and antenna j. In sparse channel conditions, the performance of this architecture can be very close to that of the fully digital architecture. However, this architecture has the following drawbacks. First of all, both the cost and the energy consumption of N RF N PSs are high. Second, with the increase of the array size, the length of the transmission line that connects the antenna array edges grows, and the transmission latency differs significantly across the array. Then, the synchronization across antennas in an RF chain becomes problematic. Third, this architecture lacks scalability. If the array is expanded and more antennas are added to the array, then an equal amount of components need to be added to each RF chain as well, and hence, the structure of each RF chain will change. Finally, integrating such a large number of PSs in an RF module is difficult. Therefore, this architecture is not recommended for extra large-aperture array systems. d) Single-RF chain multiple-antennas in full array connection with ON/OFF switches: This architecture is a reduced version of architecture iii by replacing the expensive PSs with low-cost ON/OFF switches as illustrated in Fig. 9(iv). The RF matrix F RF sustains the format in (87). The difference is that [f RF,i ] j ∈ {0, 1} for i = 1, . . . , N RF and j = 1, . . . , N. 
Note that this architecture is not subject to the antenna selection constraints in (84). It can simultaneously achieve antenna selection and dynamic partial array connection. However, it also entails integration, synchronization, and scalability challenges. e) Single-RF chain multiple antennas in fixed partial array connection with PSs: This is the well known subarray hybrid beamforming architecture [63], [64], [68], [69], [70]. In Fig. 9(v), each subarray has equal size with only one RF chain and (N/N RF ) antennas. The RF matrix F RF also has a block diagonal structure where f RF,i ∈ C (N/N RF )×1 is the RF vector in the ith subarray with [f RF,i ] j = e jφ i,j for i = 1, . . . , N RF and j = 1, . . . , (N/N RF ). Given an equal number of RF chains, the performance of this architecture is inferior to that of architecture iii. However, the number of PSs in this architecture is much smaller, thereby the cost is greatly reduced. Moreover, the synchronization problem ceases to exist, since antennas connected with the same RF chain are within a single subarray, whose size is usually limited. Besides, the array size can be easily scaled up by using more subarrays. For all these reasons, this architecture finally managed to earn a commercial deployment opportunity in 5G. f) Single-RF chain multiple antennas in fixed partial array connection with ON/OFF switches: This architecture is deduced from architecture v by replacing PSs with ON/OFF switches as shown in Fig. 9(vi). The RF matrix F RF still follows the format (88). The difference is that [f RF,i ] j ∈ {0, 1} for i = 1, . . . , N RF and j = 1, . . . , (N/N RF ). The constraints in (86) do not need to be considered. Notably, when the array size is large, the channel power will be concentrated on the VR of the user w.r.t. the extra large-aperture array. Note that the VR caused by an unequal path loss usually covers a continuous part of the array. 
To increase the energy efficiency, only the continuous part of the array in VR can be turned ON. That is, the effective size of each subarray is dynamic as well. This architecture also has the advantages of easy synchronization and scalability as well as the lowest hardware cost (only N ON/OFF switches), thereby becoming suitable for extra large-aperture array systems [44].
g) Single-RF chain multiple antennas in dynamic partial array connection with PSs: The concept of a dynamic partial array or dynamic subarray appeared in [71]. It is an improved version of architecture v. As the name suggests, segmentation of the subarrays can be flexibly adjusted instead of being fixed. That is, even though F RF follows the format in (88), the size of f RF,i varies with i. Each antenna can be flexibly connected to an arbitrary RF chain or be deactivated. This architecture not only harvests the array gain but adjusts the effective subarray sizes based on the real-time channel condition. Equally importantly, when a VR exists, dynamic subarrays are verified to achieve better performance than fixed subarrays [6].
However, this architecture has the following drawbacks. First, it is hard to implement. There are two solutions, denoted as architectures vii-1 and vii-2, respectively. The first solution is to deploy a switch for selection at each antenna to select one of the $N_{\mathrm{RF}}$ RF chains. An extra combiner is actually needed at each RF chain to enable the connection with multiple antennas, as shown in Fig. 9(vii-1). However, it is difficult to connect such a combiner with N switches for selection at the antenna side. The second solution is to modify architecture iv by adding a PS before each antenna, as shown in Fig. 9(vii-2). However, the integration of $N N_{\mathrm{RF}}$ ON/OFF switches and N PSs and the synchronization among them are challenging. Moreover, the lack of scalability further makes this architecture hard to deploy in practical extra large-aperture array systems.

4) Proposed Double-Layer Architecture: Considering the advantage of dynamic subarrays as well as the practical implementation and scalability, in this article, we integrate the full-connection and subarray structures and propose a double-layer architecture, which is referred to as architecture viii. As shown in Fig. 9(viii), the outer layer follows the fixed subarray structure, and the inner layer follows the dynamic subarray structure. The extra large-aperture array is composed of B physical subarrays. Each physical subarray has the same hardware topology, including $(N/B)$ antennas and $(N_{\mathrm{RF}}/B)$ RF chains. Architecture vii is adopted in each physical subarray. For convenient implementation, a physical link is established between each RF chain and each antenna in the common physical subarray. An ON/OFF switch is deployed in each physical link. For a certain antenna, only one RF chain can be selected, and thus no more than one physical link connected with this antenna is finally turned ON. To enable analog beamforming, each antenna is further equipped with a PS. A total of $(N_{\mathrm{RF}} N / B^2)$ ON/OFF switches and $(N/B)$ PSs are integrated in a physical subarray.
In the proposed double-layer architecture, the RF matrix $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_{\mathrm{RF}} \times N}$ follows the block diagonal structure in (85). The submatrix $\mathbf{F}_{\mathrm{RF},b} \in \mathbb{C}^{(N_{\mathrm{RF}}/B) \times (N/B)}$ has the following format: The column vector is the phase shifting vector. If the ith RF chain is activated, then it controls an effective subsubarray. The effective subsubarray has a dynamic size, which depends on the number of antennas whose physical links with the ith RF chain are turned ON. Analog beamforming is also supported within the effective subsubarray, and thus the array gain can be harvested. The proposed double-layer architecture retains the easy synchronization and scalability of the subarray structure. Equally importantly, the hardware cost is greatly reduced compared with architecture vii. The insertion loss is substantially mitigated by using far fewer switches. This architecture can also harvest the full array gain by activating all antennas in a subarray simultaneously. Alternatively, in spatially nonstationary channel conditions, we can activate only the antennas on which the largest proportion of the channel power is concentrated. For the above-mentioned reasons, this is a potential architecture for extra large-aperture arrays. Table III summarizes the hardware cost, advantages, and disadvantages of the nine architectures, including the two solutions of architecture vii and the proposed architecture viii. Considering the scalability, architectures ii, v, vi, and viii are promising in the deployment of an extra large-aperture active array with fewer RF chains.
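To make the hardware comparison concrete, the following sketch tallies the RF components of each architecture using only the counts stated in the text above (it does not reproduce Table III). Combiners and RF chains are left out, the dictionary keys are hypothetical labels, and the totals for architecture viii are obtained by multiplying the per-subarray counts by B.

```python
def component_counts(n, n_rf, b):
    """RF component totals per architecture for N antennas, N_RF chains, B subarrays."""
    if n % b or n_rf % b:
        raise ValueError("B must divide both N and N_RF")
    return {
        "i, ii": {"selection_switches": n_rf},
        "iii": {"phase_shifters": n_rf * n},
        "iv": {"on_off_switches": n_rf * n},
        "v": {"phase_shifters": n},
        "vi": {"on_off_switches": n},
        "vii-1": {"phase_shifters": n, "selection_switches": n},
        "vii-2": {"phase_shifters": n, "on_off_switches": n * n_rf},
        # Per physical subarray: (N_RF * N / B^2) switches and (N / B) PSs.
        "viii": {"phase_shifters": n, "on_off_switches": n_rf * n // b},
    }

counts = component_counts(n=1024, n_rf=16, b=4)
for name, parts in counts.items():
    print(name, parts)
```

With these example values, the double-layer architecture needs a factor of B fewer ON/OFF switches than architecture vii-2 while keeping the same number of PSs.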

B. Reconfigurable Intelligent Surfaces
Another low-cost extra large-aperture array is the RIS [73], [74], [75], [76], [77], which is also known as metasurface [75], [78], [79], or intelligent reflecting surface (IRS) [80]. An RIS is composed of low-cost near passive unit cells, each with independently tunable EM responses controlled by external signals. An incident EM wave can be reflected or refracted by the RIS, or the reflection and the refraction happen simultaneously [81], [82], [83]. An RIS flexibly adjusts the amplitude, phase, or polarization of the incident EM wave in real time. Then, a preferable EM propagation environment can be customized by properly controlling the RIS.
The most widely studied category of RISs reflects the EM waves toward the desired directions by adjusting their phases. An RIS works as a controllable reflector in the wireless environment, providing an additional controllable link between the transmitter and the receiver to assist the wireless communication. Suppose that the transmitter and the receiver are each equipped with a single antenna, and that the number of unit cells in the RIS is N. Then, the signal at the receiver can be modeled as follows: where s is the transmitted signal, $g \in \mathbb{C}$ is the direct channel between the transmitter and receiver, $\mathbf{h}_1, \mathbf{h}_2 \in \mathbb{C}^{N \times 1}$ are the channel between the transmitter and RIS and the channel between RIS and the receiver, respectively, while include the phase shift of signal introduced by the RIS, $\phi_n$ is the phase shift on the nth unit cell, and z is the complex Gaussian noise. Apart from the direct link g, an RIS-assisted link through $\mathbf{h}_1$ and $\mathbf{h}_2$ is added. If the direct link is blocked by obstacles, then the RIS can reconstruct the wireless link and recover the communication service. The effective channel in an RIS-assisted wireless communication system is

1) Fully Passive RIS:
Most existing RISs that work in the reflection mode are fully passive regardless of the low external control voltage. No signal processing module exists at the RIS, and, thus, the RIS is not able to transmit or receive wireless signals. Since the individual channels h 1 and h 2 are cascaded together, channel estimation can only be applied at the receiver side. Under this condition, it is convenient to directly estimate the effective channel g eff . Alternatively, by rewriting it is feasible to estimate the cascaded channel h T 2 diag{h 1 }. The estimate of h T 2 diag{h 1 } can further guide the design of v. However, the training overhead required to estimate h T 2 diag{h 1 } ∈ C 1×N at the single-antenna receiver is large. Therefore, the fully passive RIS faces intrinsic difficulties in channel estimation.
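The role of the cascaded channel can be illustrated with a short numerical sketch. It assumes that the RIS applies a unit-modulus factor $e^{j\phi_n}$ on the nth unit cell and that the effective channel is the direct link plus the sum of the per-cell products, which is an interpretation of the surrounding text; the co-phasing rule is a standard choice rather than a design taken from the article, and all names and channel values are hypothetical.

```python
import cmath

def effective_channel(g, h1, h2, phases):
    """Direct link plus the RIS link, with phase shift phases[n] on unit cell n."""
    return g + sum(b * cmath.exp(1j * p) * a for a, b, p in zip(h1, h2, phases))

def cascaded_channel(h1, h2):
    """Per-cell product of the two hops: the quantity a passive RIS system can estimate."""
    return [a * b for a, b in zip(h1, h2)]

def co_phase(g, h1, h2):
    """Phase shifts that align every reflected path with the direct link."""
    ref = cmath.phase(g)
    return [ref - cmath.phase(c) for c in cascaded_channel(h1, h2)]

g = 1 + 1j
h1 = [1j, 0.5]
h2 = [2, -1j]
phases = co_phase(g, h1, h2)
print(abs(effective_channel(g, h1, h2, [0.0, 0.0])),
      abs(effective_channel(g, h1, h2, phases)))
```

Note that `co_phase` needs only the cascaded products and the direct link, not the two hops individually, which is why estimating the cascaded channel suffices to design the phase shifts.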
2) Semi-Passive RISs: To tackle the channel estimation problem, semi-passive RISs were proposed in [84], [85], [86], and [87]. As shown in Fig. 10, a semi-passive RIS introduces a few active sensors that can receive signals to enable channel estimation at the RIS. These active sensors are connected with RF chains and have two modes. One is the reflection mode, which is the same as that of a common RIS unit cell. The other is the reception mode, in which the incident signals are received and conveyed to the signal processing module through RF chains. Suppose $\bar{N}$ unit cells are active sensors, satisfying $1 \le \bar{N} \le N$. Under this condition, the two individual channels from the transmitter and from the receiver to the RIS, denoted by $\bar{\mathbf{h}}_1 \in \mathbb{C}^{\bar{N} \times 1}$ and $\bar{\mathbf{h}}_2 \in \mathbb{C}^{\bar{N} \times 1}$, respectively, can be estimated at the RIS. By leveraging the sparsity of channels and the correlation among different unit cells, the large-dimensional channels $\mathbf{h}_1$ and $\mathbf{h}_2$ can be extrapolated from their reduced dimensional versions $\bar{\mathbf{h}}_1$ and $\bar{\mathbf{h}}_2$ when the channel experiences spatial stationarity. In practice, one of the transmitter and the receiver can be the BS or an access point (AP). Considering that the locations of the BS/AP and the RIS are fixed, the channel between them (denoted as $\mathbf{h}_1$) remains unchanged over a long time period. Therefore, $\mathbf{h}_1$ does not need to be frequently estimated, saving a great amount of training overhead. However, the channel extrapolation method may not work well in frequency-division duplexing systems, where reciprocity does not hold between the uplink and downlink channels.
The above low-cost architectures enable the deployment of extra large-aperture arrays. Active antenna arrays and RISs can be jointly applied to satisfy specific service requirements in different application scenarios.

V. LOW-COMPLEXITY PROCESSING AND COMPUTATION
Apart from the problem of high cost, the implementation of an extra large-aperture array also requires high-complexity processing and computations. In multiantenna systems, the computational complexity of the widely used linear signal processing algorithms usually has an order of $O(N)$, where $N$ is the number of antennas. If a matrix multiplication or inversion is further involved, then the order of computational complexity grows. When $N$ grows large, the complexity of these algorithms that jointly process signals across all antennas will grow explosively. The high-complexity processing and computations usually result in unacceptably high latency. The centralized control over the entire extra large-aperture array requires an extremely powerful central processing unit (CPU) as shown in Fig. 11(a). Therefore, in extra large-aperture array systems, low-complexity processing and computation design is also a key objective.

A. Complexity Reduction at CPU
One method is to directly reduce the complexity of some high-complexity algorithms for their simplified or scalable implementation in the CPU. Complexity reduction in massive MIMO systems is not a novel concept [88], [89], [90], [91], [92]. Some of these methods can be extended to fit in extra large-aperture array systems.
There have been studies focusing on the complexity reduction in the CPU of extra large-aperture array systems [36], [40], [93], [94], [95]. Most of these studies focus on zero-forcing (ZF), which is a widely used linear signal processing method in multiuser multiantenna systems. The ZF precoder and combiner can be applied at the transmitter and the receiver, respectively, to cancel out the interuser interference. For example, let us denote the downlink channel between the extra large-aperture array at the BS and the single-antenna user $k$ as $\mathbf{h}_k \in \mathbb{C}^{N \times 1}$. The channels of $K$ users are stacked together as $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_K] \in \mathbb{C}^{N \times K}$. Then, the ZF precoder is calculated as $\mathbf{W}_{\mathrm{ZF}} = \mathbf{H}(\mathbf{H}^H \mathbf{H})^{-1}$. Since matrix multiplication and inversion are involved, the computational complexity of calculating $\mathbf{W}_{\mathrm{ZF}}$ reaches $O(NK^2) + O(K^3)$. To reduce the complexity, Ribeiro et al. [93] proposed a double-layer precoding method. The inner-layer precoder is applied to a group of users that share a similar elevation angle. The outer-layer precoder decreases the interference among different user groups. For each user group, channels on a column of antennas are summed up for the calculation of the inner and outer layer precoders. Suppose $\bar{\mathbf{h}}_k \in \mathbb{C}^{N_h \times 1}$, whose $i$th entry represents the sum of channels on the $i$th column of antennas. Then, complexity reduction is achieved by utilizing the low-dimensional $\bar{\mathbf{h}}_k$ instead of the extra large-dimensional $\mathbf{h}_k$. Another example in [40] focused on accelerating the calculation of the ZF combiner at the receiver. The acceleration problem was formulated from the perspective of linear equation systems and solved by the randomized Kaczmarz (RK) algorithm.
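To make the two ingredients above concrete, the following sketch computes a ZF precoder in pseudo-inverse form and runs a plain randomized Kaczmarz iteration on a noise-free uplink system $\mathbf{y} = \mathbf{H}\mathbf{s}$. The i.i.d. Rayleigh channel, the problem sizes, the iteration count, and the function name `randomized_kaczmarz` are assumptions made for illustration; this is the generic RK iteration, not necessarily the exact formulation of [40].

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 128, 8  # antennas and single-antenna users (illustrative sizes)

# Assumed i.i.d. Rayleigh channel matrix H = [h_1, ..., h_K].
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

# ZF in pseudo-inverse form: the Gram matrix costs O(N K^2), its inverse O(K^3).
W_zf = H @ np.linalg.inv(H.conj().T @ H)

def randomized_kaczmarz(A, b, n_iter, rng):
    """Solve the consistent system A x = b by random row projections."""
    row_energy = np.sum(np.abs(A) ** 2, axis=1)
    prob = row_energy / row_energy.sum()  # rows are drawn proportionally to their energy
    x = np.zeros(A.shape[1], dtype=complex)
    for i in rng.choice(A.shape[0], size=n_iter, p=prob):
        residual = b[i] - A[i] @ x
        x += residual / row_energy[i] * A[i].conj()  # project onto the hyperplane of row i
    return x

# Noise-free uplink example: y = H s, so ZF combining and RK should both return s.
s = np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, K) + 1))  # QPSK symbols
y = H @ s
s_zf = W_zf.conj().T @ y
s_rk = randomized_kaczmarz(H, y, n_iter=2000, rng=rng)

print(np.max(np.abs(s_zf - s)), np.max(np.abs(s_rk - s)))
```

Each RK iteration touches a single row of $\mathbf{H}$ and costs $O(K)$ operations, so no Gram matrix or inverse is formed. With noise the system is no longer consistent, and the plain iteration above would need modification.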
In addition to the ZF receiver, variational message passing (VMP) is another widely used multiuser MIMO detector, which has lower complexity than ZF because no matrix inversion is involved. In this context, Amiri et al. [36] applied VMP in the extra large-aperture array system under spatial nonstationary channel conditions, and further utilized a maximal ratio combiner (MRC) for initialization. The complexity of VMP and MRC is linear with N and K, which is much smaller than that of ZF.
Some other works focused on the complexity reduction of antenna selection [94] and user scheduling [95] in extra large-aperture array systems. Given the number of antennas $N$ and the number of RF chains $N_{\mathrm{RF}}$, the exhaustive searching-based antenna selection method requires a search over $O(N^{N_{\mathrm{RF}}})$ combinations of antennas and RF chains, which is unacceptably high in extra large-aperture array systems. To reduce the complexity, [94] proposed a suboptimal method, which initially sets a coarse antenna selection result and then iteratively refines it based on a closed-form analytical expression of the energy efficiency, effectively avoiding the exhaustive search over a huge combination set. Moreover, when the EM wave experiences spherical propagation, the channel is determined by both the distance and the angle of the incident signal. Based on this channel feature, [95] introduced an effective distance between a user and the extra large-aperture array and then proposed a low-complexity user scheduling scheme that simply compares the effective distances of different users.

B. Distributed Processing and Computation
Assigning all the processing and computation tasks to a single CPU is not a reasonable choice in the extra large-aperture array system. An alternative is to partition the entire array into multiple subarrays and distribute the tasks to the subarrays [6], [34], [37], [38], [39], [41], [42], [43], [96]. This logical concept of a subarray is different from the physical subarray above. A logical subarray may have a fully digital physical architecture, but it has its own local processing unit (LPU) as shown in Fig. 11(b) and (c). Some processing and computation tasks of an individual logical subarray, such as channel estimation and antenna selection, can be handled by its own LPU. When LPUs exist, they can be arranged in two logical architectures.
1) Single Layer With LPUs: This logical architecture is illustrated in Fig. 11(b) and solely composed of LPUs. That is to say, all the processing and computation tasks are distributed and performed at the LPUs, without a centralized control over the LPUs. Since no CPU exists, this architecture can be easily scaled up.
Notably, some tasks are local tasks and can be handled by a single LPU. A typical example of a local task is channel estimation. The channel across the entire array can be uniformly partitioned into $B$ subarray channels, and each LPU estimates the channel on its own subarray. We write the uplink signal model in a time-division duplexing system as $\mathbf{y} = \sum_{k=1}^{K} \mathbf{h}_k s_k + \mathbf{n}$, where $s_k$ is the transmit signal from user $k$ and $\mathbf{n}$ is the noise. The task of signal detection is to estimate $\mathbf{s} = [s_1, \ldots, s_K]^T$ from $\mathbf{y} \in \mathbb{C}^{N \times 1}$, which is the signal received by the entire array. Correspondingly, $\mathbf{y}_b = \sum_{k=1}^{K} \mathbf{h}_{b,k} s_k + \mathbf{n}_b$ is the signal received by subarray $b$, where $\mathbf{h}_{b,k}$ and $\mathbf{n}_b$ are the channel of user $k$ and the noise on subarray $b$, respectively. If LPU $b$ independently performs signal detection based on $\mathbf{y}_b$, then there is a high probability that different LPUs provide different estimates of $\mathbf{s}$. This is because the channel vectors $\mathbf{h}_{b,k}$ and the random noise $\mathbf{n}_b$ vary across different $b$, especially in multipath propagation scenarios and when spatial nonstationarity exists. Considering that only one final detection result is required, while the CPU that can make the final decision is absent, a serial detection method was proposed in [41]. VMP is normally combined with belief propagation for multiuser data detection. The output of LPU $b < B$ is the soft information of $\mathbf{s}$ and serves as an input of LPU $b + 1$. The outputs of LPU $B$ are the estimates of $\mathbf{s}$ and serve as the final detection result. The serial cooperation among the LPUs brings the benefit of easy scalability, but still suffers from high processing latency. Moreover, the working procedure among the LPUs is fixed and cannot be flexibly adjusted according to practical channel conditions.
2) Double Layers With CPU and LPUs: A more reasonable and widely studied logical architecture is the double-layer architecture with LPUs in the lower layer and CPU in the upper layer as shown in Fig. 11(c). When spatial nonstationarity holds, different users have different VRs w.r.t. the array. If subarray b is not in the VR of user k, then the CPU can inform LPU b to deactivate the processing and computation related to user k. Therefore, a more efficient transceiver design can be deployed at the CPU, thereby enabling complexity reduction.
In this architecture, each LPU is connected with the CPU. Having completed the distributed processing and calculation, each LPU feeds its local result back to the CPU. Then, the CPU integrates the local results from all the LPUs and obtains the final global result by means of hard decision or data fusion [6]. At the receiver, [37] decentralizes the RK-ZF algorithm and applies it to multiuser signal detection in extra large-scale MIMO systems. LPU $b$ calculates its local linear combiner matrix $\mathbf{V}_b \in \mathbb{C}^{K \times (N/B)}$, applies it to the signal $\mathbf{y}_b$ received on subarray $b$, and computes the estimate of $\mathbf{s}$ at LPU $b$ as $\hat{\mathbf{s}}_b = \mathbf{V}_b \mathbf{y}_b$. If the VR of user $k$ does not cover subarray $b$, then the entries in the $k$th row of $\mathbf{V}_b$ are zero. Thereafter, $\hat{\mathbf{s}}_b$ is sent to the CPU. Having received all the estimates from the $B$ LPUs, the CPU integrates $\hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_B$ and makes the final decision through data fusion. Similarly, [42] and [47] applied VMP for multiuser signal detection, and LPU $b$ outputs the symbol probability $q_b(\mathbf{s})$ instead of the estimate $\hat{\mathbf{s}}_b$. The estimates of multiuser signals are only obtained at the CPU. The concept of LPUs of subarrays can be extended to LPUs of users. In [38], transmit antenna selection and user mapping were studied. Considering that different users have unequal VRs, parallel user mapping convolutional neural networks (CNNs) were proposed to learn the selected antennas for each user independently. The $k$th CNN outputs $N_{\max}$ antennas for user $k$. The CPU further makes antenna selection from the $N_{\max}$ antennas for user $k$ by jointly considering the sum-rate of all $K$ users. In the above studies [6], [37], [38], [42], [47], the LPUs work independently in parallel, and information exchange between one LPU and the CPU occurs only once. Therefore, the working procedure has relatively low latency.
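The one-shot LPU-to-CPU procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: each LPU uses an exact local pseudo-inverse rather than the RK-ZF iteration of [37], the VR pattern is made up, and the CPU fusion rule (averaging over the LPUs that see each user) is a placeholder, since the text does not specify the fusion rule. The helper name `lpu_detect` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, B = 256, 4, 8  # antennas, users, logical subarrays (illustrative)
N_sub = N // B

H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

# Assumed VR pattern: user 0 is not visible on the first two subarrays.
visible = np.ones((B, K), dtype=bool)
visible[:2, 0] = False
for b in range(B):
    H[b * N_sub:(b + 1) * N_sub, ~visible[b]] = 0.0

s = np.sign(rng.standard_normal(K)) + 0j  # BPSK symbols
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = H @ s + noise

def lpu_detect(H_b, y_b, vis_b):
    """Local ZF at one LPU; rows of V_b for users outside the VR stay zero."""
    V_b = np.zeros((H_b.shape[1], H_b.shape[0]), dtype=complex)
    V_b[vis_b] = np.linalg.pinv(H_b[:, vis_b])
    return V_b @ y_b

# Each LPU works on its own subarray; the loop stands in for parallel execution.
local = np.stack([lpu_detect(H[b * N_sub:(b + 1) * N_sub],
                             y[b * N_sub:(b + 1) * N_sub],
                             visible[b]) for b in range(B)])

# CPU fusion (assumed rule): average the local estimates over the LPUs that see each user.
s_hat = (local * visible).sum(axis=0) / visible.sum(axis=0)
decisions = np.sign(s_hat.real)

print(decisions, s.real)
```

Each LPU inverts only an $(N/B) \times K$ system, and the single exchange of a length-$K$ vector per LPU is what keeps the latency low in this architecture.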
Some recent works proposed the information exchange among LPUs or iterations between CPU and LPUs to gradually improve the performance. Information exchange between two distinct LPUs can be achieved with the assistance of CPU, or, a direct connection can be further established between the two LPUs. At the receiver, the LPUs in [34] performed ZF-based signal detection on a per user basis, while the detection results of a certain user were shared by the LPUs for the detection of signal from the next user. This serial interference cancelation method was also applied in [47], where VMP is employed in each LPU. Notably, given the VR of each user, the operation order of LPUs as well as the detection order of user signals can be initially determined by CPU [34], which further improves the detection performance.
Apart from ZF and VMP, expectation propagation (EP) is another effective algorithm that has been utilized at the receiver for multiuser signal detection in extra large-aperture array systems [97], [98]. EP is a centralized processing strategy that has excellent performance and moderate complexity. In this context, [97] first implemented EP in a decentralized manner and made efforts to reduce the computational complexity and the amount of information exchange, while [98] further refined the decentralized EP by approximating the matrix inversion at the CPU, whose complexity is $O(K^3)$, with a polynomial expansion. Given that EP is an iterative algorithm, the decentralized EP method also requires information exchange between the CPU and the LPUs.
In [96], antenna selection and resource allocation were considered at the downlink transmitter. Even though the LPUs operate in parallel in this work, back-and-forth information exchange between the CPU and the LPUs occurs since a genetic algorithm was adopted. Successive operation of LPUs and iterative optimization between the two layers inevitably increase the latency.
Multilayer processing can be further applied in extra large-aperture RIS-assisted mobile communication systems [43]. The RIS can be uniformly partitioned into $B$ logical subarrays, corresponding to the lowest processing layer. In the design of the RIS reflection codebook, a reduced dimensional local subcodebook can be first designed for each subarray. Then, the subcodebooks in the second lowest layer are obtained from the ones in the lowest layer. Through this sequential design, the fully dimensional codebook can be finally derived in the highest layer. The multilayer processing reduces not only the complexity, but also the huge training overhead caused by the extra large-aperture RIS.

VI. LOW-OVERHEAD COMMUNICATION AND SENSING
In this section, we focus on low-overhead design in extra large-aperture array systems. Training is an effective and reliable approach to acquire CSI. With the increase of user equipments and the diversification of device types that are connected to the extra large-aperture array system, the amount of pilots required will grow prohibitively high if independent training is performed across them. Furthermore, for an extra large-aperture array with massive active antennas but fewer RF chains, estimation of the huge dimensional channel on each antenna inevitably involves a beam sweeping or antenna switching process, which will be time consuming if the number of RF chains is much smaller than the number of active antennas.
Fortunately, the directionality and sparsity of propagation channels create room for overhead reduction, which will be explained in detail in the following part of this section. Furthermore, the extra large-aperture array has an extremely high spatial resolution, and the high-dimensional channel contains the environment information, such as knowledge about the user location and surrounding obstacles. Therefore, sensing can be achieved together with communication during the training process [99]. In this section, we study the low-overhead communication and sensing paradigm.

A. Directionality and Channel Sparsity
In a traditional multiantenna system, the serving area of a BS is large, and users are in the far-field region of the array. The plane wave channel model (60) is then applied, and the plane wave is expressed by its AoA/AoD as shown in (61). Due to the high spatial resolution of the large-aperture array, and the much smaller number of propagation paths than the number of antennas, the channel shows significant sparsity and directionality in the angular domain. In an extra large-aperture array system, there is a high probability that the distance between a user and the BS is smaller than the Rayleigh distance. Under these conditions, the spherical wave channel model (54) should be introduced, and the spherical wave is expressed by the position of the source (28). Moreover, the VR kicks in when blockage exists, which means that the effective array size is reduced. Then, whether the channel sparsity and directionality hold becomes a question.
Assume the BS is equipped with an extra large-aperture uniform linear array (ULA) with $N$ elements lying on the x-axis, where $N$ is even to simplify the analysis. Considering that the horizontal ULA has flexible control over only the xz plane, we describe the positions through $(x, z)$ coordinates. The center of the ULA is at the origin of the coordinate system, and the position of antenna $n$ is $(-([2n+1]/2)d, 0)$, where $n = -(N/2), \ldots, (N/2) - 1$. User $k$ is located at $\mathbf{s}_k = (x_k, z_k)$. By applying the limited dimensional channel model (78) and assuming that only the LoS path exists, the channel between the BS and user $k$ can be simplified as $\mathbf{h}_k = \beta_k \mathbf{a}(\mathbf{s}_k) \odot \mathbf{p}(\Phi_k)$, where $\beta_k$ is the gain of the LoS path, $\mathbf{a}(\mathbf{s}) \in \mathbb{C}^{N \times 1}$ is the steering vector in (98), whose $n$th entry is determined by $d_{k,n}$ when applying (12), $d_{k,n} = \sqrt{(x_k + ([2n+1]/2)d)^2 + z_k^2}$ in (99) is the distance between the source and antenna $n$, $\Phi_k$ is the VR of the user w.r.t. the array, and $\mathbf{p}(\Phi)$ follows the structure in (79).

1) Angular Domain:
We start by investigating whether the directionality and sparsity hold for $\mathbf{h}_k$ in the angular domain when the VR covers the entire array. The angular domain transformation is derived from the plane wave model, where equal phase deviation is experienced by each pair of adjacent antennas as shown in (61). Therefore, the discrete Fourier transformation (DFT) matrix is usually adopted as the angular domain transformation matrix. Denote the $N$-dimensional DFT matrix as $\mathbf{U}_A \in \mathbb{C}^{N \times N}$, where $[\mathbf{U}_A]_{n_1,n_2} = e^{-j2\pi(n_1/N)n_2}$, $n_1, n_2 = 0, \ldots, N-1$. The $n$th row corresponds to the direction with an angle of $\theta = \arccos(n/N)$. The rows of $\mathbf{U}_A$ are orthogonal to each other. Then, the angular domain channel of user $k$ is written as $\tilde{\mathbf{h}}_{A,k} = \mathbf{U}_A \mathbf{h}_k$, where $\tilde{\mathbf{h}}_{A,k} \in \mathbb{C}^{N \times 1}$ has the same dimension as $\mathbf{h}_k$. Under the plane wave model, the amplitude of $[\tilde{\mathbf{h}}_{A,k}]_n$ will be large if the angle of the LoS path is close to $\arccos(n/N)$, and, thus, $\tilde{\mathbf{h}}_{A,k}$ would have a sparse pattern. However, under the spherical wave model, the entire array does not experience a common angle, and a significant angular spread appears. As shown in Fig. 12(a), $\tilde{\mathbf{h}}_{A,k}$ shows directionality around $\cos\theta = 0$ when $z_k = 5000\lambda$. As $z_k$ decreases and user $k$ moves toward the array, the angular spread increases, and $\tilde{\mathbf{h}}_{A,1}$ has more continuous nonzero entries than $\tilde{\mathbf{h}}_{A,2}$ and $\tilde{\mathbf{h}}_{A,3}$. In the extreme but impractical case that $z_k = 0$, the angular spread will cover the entire angle value region, and then directionality and sparsity no longer exist in $\tilde{\mathbf{h}}_{A,k}$.
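The growth of the angular spread can be reproduced with a short numerical sketch. The steering vector below uses a $1/d_{k,n}$ amplitude and an $e^{-j2\pi d_{k,n}/\lambda}$ phase, which is an assumed form of the spherical wave response and may differ from (98). The array size, half-wavelength spacing, the 10% amplitude threshold, and the helper names are likewise illustrative choices.

```python
import numpy as np

wavelength = 1.0
N, d = 256, 0.5 * wavelength              # illustrative ULA size and spacing
n = np.arange(-(N // 2), N // 2)
ant_x = -(2 * n + 1) / 2 * d              # antenna n sits at (-(2n+1)d/2, 0)

def steering_vector(x, z):
    """Spherical-wave LoS response; the 1/d amplitude and phase sign are assumptions."""
    dist = np.sqrt((x - ant_x) ** 2 + z ** 2)
    return np.exp(-2j * np.pi * dist / wavelength) / dist

def angular_support(z, threshold=0.1):
    """Number of DFT bins whose amplitude exceeds threshold * max, for a user at (0, z)."""
    h = steering_vector(0.0, z)
    h_angular = np.fft.fft(h)             # identical to U_A @ h for the DFT matrix U_A
    amp = np.abs(h_angular)
    return int(np.sum(amp > threshold * amp.max()))

for z in (5000.0, 500.0, 50.0):
    print(z, angular_support(z))          # the count grows as the user approaches the array
```

The count of significant angular bins serves as a crude proxy for the angular spread: it is small when the user is far from the array and grows toward the full DFT size as $z_k$ shrinks.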
2) Cartesian Domain: From (98), we see that $[\mathbf{a}(\mathbf{s})]_n$ is determined by the 2-D Cartesian coordinate $(x_k, z_k)$, instead of a 1-D angle $\theta$. Therefore, under the spherical wave model, it is more reasonable to transform $\mathbf{h}_k$ to a 2-D domain than to a 1-D domain. Paper [39] proposed to transform the radio channel to the Cartesian domain. The transformation matrix $\mathbf{U}_c \in \mathbb{C}^{N_c \times N}$ is composed of $N_c$ row vectors of $\mathbf{a}^H(\bar{x}, \bar{z}) / \|\mathbf{a}(\bar{x}, \bar{z})\|$, where $\bar{x}$ and $\bar{z}$ are the samples of $x$ and $z$, respectively, and $N_c$ is the number of sample pairs $(\bar{x}, \bar{z})$.
Let $N_c = N_x N_z$, where $N_x$ and $N_z$ are the numbers of $x$ and $z$ samples, respectively, by uniformly and separately sampling $x$ and $z$ as $\bar{x} \in \{x_{\min}, x_{\min} + \Delta_x, \ldots, x_{\max}\}$ and $\bar{z} \in \{z_{\min}, z_{\min} + \Delta_z, \ldots, z_{\max}\}$, where $x_{\min}$, $x_{\max}$, $z_{\min}$, and $z_{\max}$ jointly define the rectangular region that users may appear in, while $\Delta_x$ and $\Delta_z$ are the sampling steps in the x and z axes, respectively. Different from $\mathbf{U}_A$, the orthogonality among the rows of $\mathbf{U}_c$ cannot be guaranteed. The channel in the Cartesian domain is obtained by $\tilde{\mathbf{h}}_{C,k} = \mathbf{U}_c \mathbf{h}_k$, whose dimension is $N_x N_z$, i.e., not equal to that of $\mathbf{h}_k$. The $N_x N_z$-dimensional vector $\tilde{\mathbf{h}}_{C,k}$ can be rearranged to an $N_x \times N_z$-dimensional matrix $\tilde{\mathbf{H}}_{C,k}$. Fig. 12(b) illustrates the normalized amplitudes of the 2-D matrices $\tilde{\mathbf{H}}_{C,k}$, $k = 1, 2, 3$. For the sample pair which satisfies $(\bar{x}, \bar{z}) = (x_k, z_k)$, the corresponding entry of $\tilde{\mathbf{H}}_{C,k}$ has the largest amplitude as expected, demonstrating the directionality in the Cartesian domain. When $z_k = 50\lambda$, even though most entries of $\tilde{\mathbf{H}}_{C,k}$ are nonzero, their amplitudes are still obviously lower than the maximal one. With the increase of $z_k$, the number of nonzero entries decreases. The sparsity of $\tilde{\mathbf{H}}_{C,k}$ gradually becomes significant and can be found solely in the x-domain.
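A minimal sketch of the Cartesian-domain transformation follows. The user region, sampling steps, array size, and the steering vector form ($1/d$ amplitude, $e^{-j2\pi d/\lambda}$ phase) are illustrative assumptions. The user is placed exactly on a grid point so that the location of the peak can be checked; because the rows are normalized, the Cauchy-Schwarz inequality guarantees that the matched row gives the largest amplitude.

```python
import numpy as np

wavelength = 1.0
N, d = 128, 0.5 * wavelength
n = np.arange(-(N // 2), N // 2)
ant_x = -(2 * n + 1) / 2 * d               # antenna n sits at (-(2n+1)d/2, 0)

def steering_vector(x, z):
    """Spherical-wave LoS response; the 1/d amplitude and phase sign are assumptions."""
    dist = np.sqrt((x - ant_x) ** 2 + z ** 2)
    return np.exp(-2j * np.pi * dist / wavelength) / dist

# Uniform and separate sampling of x and z over an assumed rectangular user region.
x_grid = np.linspace(-40.0, 40.0, 41)      # N_x samples, step of 2 wavelengths
z_grid = np.linspace(20.0, 200.0, 19)      # N_z samples, step of 10 wavelengths

# Rows of U_c are a^H(x, z) / ||a(x, z)||, ordered with x as the slow index.
rows = []
for x in x_grid:
    for z in z_grid:
        a = steering_vector(x, z)
        rows.append(a.conj() / np.linalg.norm(a))
U_c = np.array(rows)                       # shape (N_x * N_z, N)

# User placed exactly on a grid point so that the peak location can be verified.
ix_true, iz_true = 25, 3
h_k = steering_vector(x_grid[ix_true], z_grid[iz_true])

# Cartesian-domain channel, rearranged into an N_x x N_z amplitude map.
H_cart = np.abs(U_c @ h_k).reshape(len(x_grid), len(z_grid))
ix_hat, iz_hat = np.unravel_index(np.argmax(H_cart), H_cart.shape)
print((int(ix_hat), int(iz_hat)), (ix_true, iz_true))
```

When the user does not sit on a grid point, the peak lands on a nearby sample instead, which is one reason the sampling steps matter in practice.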
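3) Polar Domain: The spherical wave channel is more frequently expressed by the polar coordinates $(D_k, \theta_k)$, where $D_k$ and $\theta_k$ represent the distance and angle between the ULA's center and user $k$, respectively, satisfying $x_k = D_k \sin\theta_k$ and $z_k = D_k \cos\theta_k$. Then, $d_{k,n}$ in (99) is calculated by $d_{k,n} = \sqrt{D_k^2 + (2n+1) d D_k \sin\theta_k + ([2n+1]/2)^2 d^2}$. The polar transformation matrix can be defined as $\mathbf{U}_P \in \mathbb{C}^{N_P \times N}$ with row vectors of $\mathbf{a}^H(\bar{D}\sin\bar{\theta}, \bar{D}\cos\bar{\theta}) / \|\mathbf{a}(\bar{D}\sin\bar{\theta}, \bar{D}\cos\bar{\theta})\|$, where $\bar{D}$ and $\bar{\theta}$ are samples of $D$ and $\theta$, respectively, taken as $\lg\bar{D} \in \{\lg D_{\min}, \lg D_{\min} + \Delta_D, \ldots, \lg D_{\max}\}$ and $\bar{\theta} \in \{\theta_{\min}, \theta_{\min} + \Delta_\theta, \ldots, \theta_{\max}\}$, $N_P = N_D N_\theta$ with $N_D$ and $N_\theta$ being the numbers of distance and angle samples, $D_{\min}$, $D_{\max}$, $\theta_{\min}$, and $\theta_{\max}$ define the fan-shaped region that users may appear in, and $\Delta_D$ and $\Delta_\theta$ are the sampling steps of $\lg D$ and $\theta$, respectively. Here, $\lg D$ instead of $D$ is uniformly sampled. This is because with the increase of $D$, the spherical wave channel becomes less sensitive to $D$, and, thus, the sampling interval of $D$ can grow with $D$. Similar to $\mathbf{U}_c$, the orthogonality among different rows of $\mathbf{U}_P$ cannot be guaranteed. Thereafter, we obtain the polar domain channel as $\tilde{\mathbf{h}}_{P,k} = \mathbf{U}_P \mathbf{h}_k$, whose dimension is $N_D N_\theta$. Similarly, the $N_D N_\theta$-dimensional vector $\tilde{\mathbf{h}}_{P,k}$ can be rearranged to an $N_D \times N_\theta$-dimensional matrix $\tilde{\mathbf{H}}_{P,k}$. As shown in Fig. 12(c), the entry corresponding to $(\bar{D}, \bar{\theta}) = (D_k, \theta_k)$ has the maximal amplitude, verifying the directionality in the polar domain. Moreover, even though the sparsity of the channel in the polar domain is not obvious when $D$ is small, the amplitudes of the nonzero entries are much lower than the maximum one. The sparsity becomes apparent with the increase of $D$, and is shown only in the angular domain when $D = 5000\lambda$.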
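The polar sampling rule, uniform in $\lg D$ and in $\theta$, can be sketched as below. The region limits and sample counts are illustrative assumptions, and `polar_grid` is a hypothetical helper name. The mapping back to Cartesian coordinates follows the argument order $(\bar{D}\sin\bar{\theta}, \bar{D}\cos\bar{\theta})$ used for the rows of the polar transformation matrix.

```python
import numpy as np

def polar_grid(D_min, D_max, N_D, theta_min, theta_max, N_theta):
    """Sample lg D and theta uniformly and return all (D, theta) sample pairs."""
    D_samples = np.logspace(np.log10(D_min), np.log10(D_max), N_D)
    theta_samples = np.linspace(theta_min, theta_max, N_theta)
    D_mesh, theta_mesh = np.meshgrid(D_samples, theta_samples, indexing="ij")
    return D_samples, theta_samples, D_mesh, theta_mesh

# Distances in wavelengths; the fan-shaped region below is an assumed example.
D_samples, theta_samples, D_mesh, theta_mesh = polar_grid(
    D_min=10.0, D_max=10000.0, N_D=7,
    theta_min=-np.pi / 3, theta_max=np.pi / 3, N_theta=13)

# Cartesian positions of the samples, using x = D sin(theta), z = D cos(theta).
x_mesh = D_mesh * np.sin(theta_mesh)
z_mesh = D_mesh * np.cos(theta_mesh)

print(np.round(D_samples, 1))          # the spacing in D grows with D
print(np.diff(np.log10(D_samples)))    # constant step in lg D
```

With seven samples between $10\lambda$ and $10000\lambda$, the step in $\lg D$ is 0.5, so the grid is dense near the array, where the channel is most sensitive to distance, and coarse far away.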
To decrease the correlation among rows of $\mathbf{U}_P$, a joint angle and distance sampling grid was proposed in [29], where $\theta$ is uniformly sampled with $N_\theta = N$ and $\Delta_\theta = \pi/N$. Specifically, the sampling of the distance depends on that of the angle. For a particular sample of angle $\bar{\theta}$, we acquire a unique sample set of $\bar{D}$, where the obtained vectors of $\mathbf{a}^H(\bar{D}\sin\bar{\theta}, \bar{D}\cos\bar{\theta}) / \|\mathbf{a}(\bar{D}\sin\bar{\theta}, \bar{D}\cos\bar{\theta})\|$ are nearly orthogonal to each other. To achieve this near orthogonality, the size of the distance sample set varies with the value of $\bar{\theta}$. When $\cos\bar{\theta}$ approaches 0, the sample set of the distance is expanded. Otherwise, the size of the sample set of distance decreases, resulting in an insufficient sampling grid of the entire space. Despite this drawback, the row vectors of $\mathbf{U}_P$ are approximately orthogonal to each other under this setting, and the polar domain channel $\tilde{\mathbf{h}}_{P,k}$ shows sparsity.

4) Antenna Domain:
When the user is very close to the array as in the example in Fig. 5, or severe blockage happens as illustrated in Fig. 6, the VR of the user w.r.t. the array is a small-scale subset of antennas in the array. Then, the channel shows sparsity in the antenna domain. In the simplest case that the VR of user $k$ is a continuous subarray, the channel can be approximated as $\mathbf{h}_k \approx [\mathbf{0}^T, \mathbf{h}_{k,\mathrm{VR}}^T, \mathbf{0}^T]^T$, where $\mathbf{h}_{k,\mathrm{VR}}$ is the subvector of $\mathbf{h}_k$ corresponding to the entries within the VR.

B. Low-Overhead Design
Channel directionality and sparsity in the transformation domains provide room for overhead reduction. More particularly, channel directionality guarantees the accuracy of user localization, which further supports channel reconstruction and sensing. Channel sparsity enables the application of compressed sensing techniques in the estimation of channels and the orthogonal transceiver design among multiple users. Details are given as follows.
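Consider an extra large-aperture array system with fewer RF chains than active antennas at the BS. The spatially nonstationary channel $\mathbf{h}_k$ follows the limited dimensional model in (77) and can be rewritten as $\mathbf{h}_k = \sum_{l=1}^{L_k} \beta_{k,l} \mathbf{a}(\mathbf{s}_{k,l}) \odot \mathbf{p}(\Phi_{k,l})$, where $L_k$ is the number of paths in the channel of user $k$, while $\mathbf{s}_{k,l}$ and $\Phi_{k,l}$ are determined by the scatterers, reflectors, and obstacles in the environment. In the uplink training phase, user $k$ transmits a pilot sequence to the BS for channel estimation and sensing. The received pilot sequence at the BS at time instance $t$ is expressed as $\mathbf{Y}_t = \sqrt{P}\, \mathbf{F}_{\mathrm{RF},t} \sum_{k=1}^{K} \mathbf{h}_k \mathbf{x}_k^H + \mathbf{F}_{\mathrm{RF},t} \mathbf{N}_t$, where $\mathbf{Y}_t \in \mathbb{C}^{N_{\mathrm{RF}} \times Q}$ is the received pilot sequence with length $Q$ on $N_{\mathrm{RF}}$ RF chains at time instance $t$, $P$ is the transmit power of each user, $\mathbf{F}_{\mathrm{RF},t} \in \mathbb{C}^{N_{\mathrm{RF}} \times N}$ is the RF matrix at time instance $t$, $\mathbf{x}_k \in \mathbb{C}^{Q \times 1}$ is the pilot sequence of user $k$ satisfying $\mathbf{x}_k^H \mathbf{x}_k = 1$ and $\mathbf{x}_k^H \mathbf{x}_j = 0$, $j \neq k$, and $\mathbf{N}_t \in \mathbb{C}^{N \times Q}$ is the noise matrix with i.i.d. entries, each following a complex Gaussian distribution with zero mean and unit variance. A total of $T$ time instances are used for uplink pilot transmission. By stacking $\mathbf{Y}_t$, $t = 1, \ldots, T$ together and multiplying them with $\mathbf{x}_k$, we have $\mathbf{y}_k = \sqrt{P}\, \mathbf{F} \mathbf{h}_k + \mathbf{n}_k$, where $\mathbf{F} = [\mathbf{F}_{\mathrm{RF},1}^T, \ldots, \mathbf{F}_{\mathrm{RF},T}^T]^T \in \mathbb{C}^{T N_{\mathrm{RF}} \times N}$ and $\mathbf{n}_k$ stacks the noise terms $\mathbf{F}_{\mathrm{RF},t} \mathbf{N}_t \mathbf{x}_k$.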

1) Localization Based on Directionality:
When an LoS path exists between user $k$ and the BS, it is usually set as $l = 1$ in (105), and then $\mathbf{s}_{k,1}$ is the position of user $k$. The LoS path has stronger power than the NLoS components due to the smallest pathloss. Given the directionality of the near-field channel in the Cartesian and polar domains, the matching method of [39] can be applied to find the position $\mathbf{s}_{k,1}$ from $\mathbf{y}_k$. Applying (105) in (107), the received pilot can be rewritten as $\mathbf{y}_k = \sum_{l=1}^{L_k} \sqrt{P} \beta_{k,l} \mathbf{F} (\mathbf{a}(\mathbf{s}_{k,l}) \odot \mathbf{p}(\Phi_{k,l})) + \mathbf{n}_k$.
Utilizing the directionality, we obtain $(\hat{x}_{k,1}, \hat{z}_{k,1})$ in the Cartesian domain or $(\hat{D}_{k,1}, \hat{\theta}_{k,1})$ in the polar domain as the arg max of the matching metric over the sample pairs $(\bar{x}, \bar{z})$ in (109) or $(\bar{D}, \bar{\theta})$ in (110), and the localization result is $\hat{\mathbf{s}}_{k,1} = (\hat{x}_{k,1}, \hat{z}_{k,1})$ or $(\hat{D}_{k,1} \sin\hat{\theta}_{k,1}, \hat{D}_{k,1} \cos\hat{\theta}_{k,1})$. This localization method can work well when $T \ll (N/N_{\mathrm{RF}})$. For sensing, given the estimates of position and VR, we can generally decide where the obstacle is. With more paths interacting with a common obstacle, the location, size, and even shape of the obstacle can be more accurately determined from the positions and VRs of these paths. Then, the environment can be identified.
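The uplink training and matching steps above can be sketched end to end. Everything specific in this example is an assumption made for illustration: LoS-only channels with a full VR, users placed on grid points, random unit-modulus RF matrices, scaled DFT pilots, a low noise level, a normalized-correlation matching metric (which may differ from the exact metric in (109)), and the helper name `localize`.

```python
import numpy as np

rng = np.random.default_rng(3)
wavelength = 1.0
N, N_rf, T, Q, K = 64, 8, 4, 4, 2          # T * N_rf = 32 < N observations per user
P = 1.0
n = np.arange(-(N // 2), N // 2)
ant_x = -(2 * n + 1) / 2 * 0.5 * wavelength

def steering_vector(x, z):
    """Spherical-wave LoS response; the 1/d amplitude and phase sign are assumptions."""
    dist = np.sqrt((x - ant_x) ** 2 + z ** 2)
    return np.exp(-2j * np.pi * dist / wavelength) / dist

x_grid = np.linspace(-20.0, 20.0, 21)
z_grid = np.linspace(10.0, 100.0, 10)
true_idx = [(4, 2), (15, 7)]                # users on grid points, LoS only, full VR
H = np.stack([steering_vector(x_grid[i], z_grid[j]) for i, j in true_idx], axis=1)

# Orthonormal pilots (columns of a scaled DFT matrix): x_k^H x_k = 1, x_k^H x_j = 0.
X = np.fft.fft(np.eye(Q))[:, :K] / np.sqrt(Q)

# Assumed RF matrices: random unit-modulus phase shifters, redrawn at every time instance.
F_rf = np.exp(2j * np.pi * rng.random((T, N_rf, N))) / np.sqrt(N)
sigma = 1e-4                                # low noise level, chosen for illustration
Y = [np.sqrt(P) * F_rf[t] @ H @ X.conj().T
     + F_rf[t] @ (sigma * (rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q))))
     for t in range(T)]

F = F_rf.reshape(T * N_rf, N)               # stacked RF matrices

def localize(k):
    """Despread the pilot of user k, then match against projected steering vectors."""
    y_k = np.concatenate([Y_t @ X[:, k] for Y_t in Y])   # sqrt(P) F h_k + n_k
    metric = np.zeros((len(x_grid), len(z_grid)))
    for i, x in enumerate(x_grid):
        for j, z in enumerate(z_grid):
            atom = F @ steering_vector(x, z)
            metric[i, j] = np.abs(atom.conj() @ y_k) / np.linalg.norm(atom)
    best = np.unravel_index(np.argmax(metric), metric.shape)
    return tuple(int(v) for v in best)

print([localize(k) for k in range(K)], true_idx)
```

Only $T N_{\mathrm{RF}} = 32$ compressed observations are used per user here, half the number of antennas, yet the strongest match still identifies the grid point of each user because a single dominant path is being located rather than the full $N$-dimensional channel.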
2) Channel Estimation Based on Sparsity: In practical environments, when the system works in higher frequency bands, the NLoS paths become fewer due to the severe pathloss and blockage. In an extra large-aperture array system, we usually have $L_k \ll N$. Therefore, the large dimensional channel $\mathbf{h}_k$ can be expressed by a limited number of paths. In the Cartesian or polar domain, most of the channel power is concentrated on nearly $L_k$ entries. Based on whether the orthogonality holds among the rows of the transformation matrix, there are two categories of low-cost channel estimation methods. One is channel reconstruction, and the other is compressed sensing. Channel reconstruction focuses on the estimation of the limited number of path parameters instead of the large-dimensional channel [39]. The parameters to be estimated include $\beta_{k,l}$, $\mathbf{s}_{k,l}$, and $\Phi_{k,l}$. When an LoS path exists, the user position can be obtained by the above matching method in (109) or (110). If the LoS component $\sqrt{P} \beta_{k,1} \mathbf{F}(\mathbf{a}(\mathbf{s}_{k,1}) \odot \mathbf{p}(\Phi_{k,1}))$ is removed from $\mathbf{y}_k$ in (108), then the second largest path component can be extracted from the residual of $\mathbf{y}_k$ through the same matching method. The $L_k$ paths can thus be iteratively extracted from their mixture. Finally, the large-dimensional channel $\mathbf{h}_k$ can be reconstructed by applying the estimates of $\beta_{k,l}$, $\mathbf{s}_{k,l}$, and $\Phi_{k,l}$ into (105). The training overhead of channel reconstruction is comparable to that of localization based on directionality.
Compressed sensing aims to estimate the reduced dimensional sparse channel in a transformation domain. The precondition is that the row vectors of the transformation matrix maintain the orthogonality between them, which can be achieved by the polar domain transformation in [29]. For $\tilde{\mathbf{h}}_{P,k} \in \mathbb{C}^{N_P \times 1}$, we denote the indices of its nonzero entries as $\Upsilon_k = \{n_{k,1}, \ldots, n_{k,\tilde{N}_k}\}$, $\tilde{N}_k \ll N_P$. Then, the reduced dimension subchannel $[\tilde{\mathbf{h}}_{P,k}]_{\Upsilon_k}$ contains almost all the information in $\mathbf{h}_k$. When $\mathbf{U}_P$ and $[\mathbf{U}_P]_{\Upsilon_k,:}$ have full ranks, (107) can be further written as $\mathbf{y}_k = \sqrt{P}\, \mathbf{F} \mathbf{U}_P^{\dagger} \tilde{\mathbf{h}}_{P,k} + \mathbf{n}_k \approx \sqrt{P}\, \mathbf{F} [\mathbf{U}_P]_{\Upsilon_k,:}^{\dagger} [\tilde{\mathbf{h}}_{P,k}]_{\Upsilon_k} + \mathbf{n}_k$, where $(\cdot)^{\dagger}$ denotes the pseudo-inverse. Then, the objective becomes to estimate the reduced dimensional channel $[\tilde{\mathbf{h}}_{P,k}]_{\Upsilon_k}$, which can be realized through compressed sensing. The key point lies in the identification of $\Upsilon_k$ from $\{1, \ldots, N_P\}$. Following the compressed sensing-based channel estimation methods in millimeter wave hybrid beamforming systems, the estimates $\hat{\Upsilon}_k$ and $[\hat{\mathbf{h}}_{P,k}]_{\hat{\Upsilon}_k}$ can be obtained through the orthogonal matching pursuit (OMP) algorithm, where the matching step is the same as (109). Then, the large-dimensional channel can be obtained by $\hat{\mathbf{h}}_k = [\mathbf{U}_P]_{\hat{\Upsilon}_k,:}^{\dagger} [\hat{\mathbf{h}}_{P,k}]_{\hat{\Upsilon}_k}$. Notably, since the sampling grid cannot cover the entire space, there is a high probability that the positions estimated by OMP are not the real positions, and a further refinement of the estimated positions toward the real positions is required [29] if localization needs to be achieved simultaneously.
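A generic OMP routine of the kind referred to above can be sketched as follows. It is a textbook implementation rather than the specific estimator of [29]. The dictionary is built from normalized steering vectors on a coarse polar grid (so its columns correspond to the conjugate-transposed rows of the polar transformation matrix), and the RF matrix, grid, path gains, and steering vector form are illustrative assumptions. The example is noise-free with two on-grid paths; exact support recovery is not guaranteed in general and is not assumed here.

```python
import numpy as np

def omp(y, A, n_atoms):
    """Orthogonal matching pursuit: greedily pick columns of A, then least-squares refit."""
    col_norm = np.linalg.norm(A, axis=0)
    support, residual = [], y.copy()
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(n_atoms):
        corr = np.abs(A.conj().T @ residual) / col_norm  # matching step
        corr[support] = 0.0                              # never pick an atom twice
        support.append(int(np.argmax(corr)))
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    return support, coeffs, residual

rng = np.random.default_rng(4)
wavelength = 1.0
N, M = 64, 32                               # antennas, compressed observations (T * N_RF)
n = np.arange(-(N // 2), N // 2)
ant_x = -(2 * n + 1) / 2 * 0.5 * wavelength

def steering_vector(D, theta):
    """Spherical-wave response at polar position (D, theta); assumed 1/d amplitude."""
    dist = np.sqrt((D * np.sin(theta) - ant_x) ** 2 + (D * np.cos(theta)) ** 2)
    return np.exp(-2j * np.pi * dist / wavelength) / dist

D_grid = np.logspace(1, 3, 5)
theta_grid = np.linspace(-np.pi / 3, np.pi / 3, 16)
grid = [(D, th) for D in D_grid for th in theta_grid]
atoms = np.stack([steering_vector(D, th) for D, th in grid], axis=1)  # N x N_P
atoms /= np.linalg.norm(atoms, axis=0)

F = np.exp(2j * np.pi * rng.random((M, N))) / np.sqrt(N)  # assumed stacked RF matrix
A = F @ atoms                                             # sensing dictionary

true_support = [10, 60]
gains = np.array([1.0, 0.6 * np.exp(1j * 0.7)])
y = A[:, true_support] @ gains              # noise-free received pilot

support, coeffs, residual = omp(y, A, n_atoms=2)
h_hat = atoms[:, support] @ coeffs          # channel rebuilt from the selected atoms
print(sorted(support), true_support, np.linalg.norm(residual) / np.linalg.norm(y))
```

The least-squares refit after each selection is what distinguishes OMP from plain matching pursuit; it keeps the residual orthogonal to all atoms selected so far, so the residual norm cannot increase from one iteration to the next.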

3) Multiuser Pilot Transmission Based on Sparsity:
The nonoverlapping sparsity of different users' antenna-domain channels enables the simultaneous transmission of pilots from or to these users. A common pilot sequence can be shared among users that have nonoverlapping VRs, and orthogonal pilot sequences are assigned to users with overlapping VRs. Given the limited number of orthogonal pilot sequences, the nonoverlapping sparsity among users creates potential for reducing the overall training time. By knowing the VR of user $k$, i.e., $\Upsilon_{A,k}$, the BS directly transmits or receives the pilot of user $k$ through $\Upsilon_{A,k}$. For instance, suppose $\Upsilon_{A,1}, \ldots, \Upsilon_{A,B}$ cover subarrays $1, \ldots, B$, respectively, and do not overlap with each other, while $\Upsilon_{A,B+1}$ covers the entire array. Then, pilot sequences $\mathbf{x}_1$ and $\mathbf{x}_2$ are assigned to users $1, \ldots, B$ and user $B + 1$, respectively. In the uplink, the received pilots at the BS from all users can be expressed as $\mathbf{Y} = \sqrt{P}\, \mathbf{F}_{\mathrm{RF}} (\sum_{b=1}^{B} \mathbf{h}_b \mathbf{x}_1^H + \mathbf{h}_{B+1} \mathbf{x}_2^H) + \mathbf{F}_{\mathrm{RF}} \mathbf{N}$. By multiplying $\mathbf{Y}$ with $\mathbf{x}_1$, the pilots from users $1, \ldots, B$ are extracted: $\mathbf{y} = \mathbf{Y} \mathbf{x}_1 = \sqrt{P}\, \mathbf{F}_{\mathrm{RF}} \sum_{b=1}^{B} \mathbf{h}_b + \mathbf{n}$, where $\mathbf{y} = [\mathbf{y}_1, \ldots, \mathbf{y}_B]^T$ contains the received pilot on each subarray, and $\mathbf{n} = [\mathbf{n}_1, \ldots, \mathbf{n}_B]^T = \mathbf{F}_{\mathrm{RF}} \mathbf{N} \mathbf{x}_1$. By further recalling (88) and (104), we can rewrite $\mathbf{y}_b$ as $\mathbf{y}_b = \sqrt{P}\, \mathbf{F}_{\mathrm{RF},b} \mathbf{h}_{b,\mathrm{VR}} + \mathbf{n}_b$, which involves only the channel of user $b$; here, $\mathbf{F}_{\mathrm{RF},b}$ denotes the RF matrix of subarray $b$. That is to say, only two instead of $B + 1$ orthogonal pilot sequences are required for multiuser training without introducing interference among them. The nonoverlapping sparsity in the antenna domain has been utilized in [45], where the overhead for random access was greatly reduced and the efficiency was enhanced.
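The two-pilot example above can be verified with a small simulation. To keep it short, the sketch assumes fully digital reception (the RF matrix is the identity), no noise, and exactly nonoverlapping VRs that coincide with the subarrays; these simplifications are assumptions of the example, not of the scheme itself.

```python
import numpy as np

rng = np.random.default_rng(5)
N, B, Q = 64, 4, 2            # antennas, users with disjoint VRs, pilot length
N_sub, P = N // B, 1.0

def crandn(n):
    """i.i.d. unit-variance complex Gaussian samples (assumed channel statistics)."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Users 1..B: channel is nonzero only on their own subarray; user B+1 sees the whole array.
H = np.zeros((N, B + 1), dtype=complex)
for b in range(B):
    H[b * N_sub:(b + 1) * N_sub, b] = crandn(N_sub)
H[:, B] = crandn(N)

# Two orthonormal pilots: x_1 is shared by users 1..B, x_2 is used by user B+1.
X = np.fft.fft(np.eye(Q)) / np.sqrt(Q)
pilots = np.stack([X[:, 0]] * B + [X[:, 1]], axis=1)   # Q x (B+1)

# Received pilots with F_RF = I and no noise (both simplifying assumptions).
Y = np.sqrt(P) * H @ pilots.conj().T

y_shared = Y @ X[:, 0]        # superposition of users 1..B, separated by subarray
y_full = Y @ X[:, 1]          # user B+1 alone

h_hat = np.zeros_like(H)
for b in range(B):
    rows = slice(b * N_sub, (b + 1) * N_sub)
    h_hat[rows, b] = y_shared[rows] / np.sqrt(P)
h_hat[:, B] = y_full / np.sqrt(P)

print(np.max(np.abs(h_hat - H)))   # all B+1 channels recovered from two pilots
```

Users sharing $\mathbf{x}_1$ are separated in space by their VRs, while the user with the full-array VR is separated in the code domain by $\mathbf{x}_2$, so the pilot length stays at two regardless of $B$.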
In extra large-aperture RIS-assisted systems, directionality and channel sparsity still hold in the angular, Cartesian, polar, and RIS unit domains at the RIS side. Therefore, the low-cost designs are also applicable in RIS-assisted systems. Notably, when applying the multiuser pilot transmission scheme, the RIS should be equipped with signal reception capabilities.

VII. CONCLUSION
We investigated the new channel properties of spatial nonstationarity, including the spherical wave propagation and the VR, and made a survey about existing works in the context of hardware cost, processing and computation complexity, and training overhead for extra large-scale MIMO systems. We also studied the origins of spatial nonstationarity and illustrated the modifications of channel modeling when spatial nonstationarity was considered. This new property paves the way for low-cost hardware architectures. Through a detailed comparison, we proposed a double-layer architecture and the RIS as the most promising implementation architecture of an extra large-aperture array. Then, the complexity reduction problem was investigated and the distributed solution with one CPU and multiple LPUs demonstrated the most promising potential. Finally, the low-overhead communication and sensing strategies were investigated, which can be realized given the directionality and sparsity of the channel in the Cartesian, polar, and antenna domains. Summarizing, this article reviewed the early stage research efforts of extra large-scale MIMO, and highlighted the importance of low-cost designs in future practical implementations.