Channel Training & Estimation for Reconfigurable Intelligent Surfaces: Exposition of Principles, Approaches, and Open Problems

Reconfigurable intelligent surfaces (RIS) are passive controllable arrays of small reflectors that direct electromagnetic energy towards or away from the target nodes, thereby allowing better management of signals and interference in a wireless network. The RIS has the potential for significantly improving the performance of wireless networks. Unfortunately, RIS also multiplies the number of Channel State Information (CSI) coefficients between the transmitter and receiver, which magnifies the challenges in estimating and communicating the channel state information. Furthermore, the simplicity and cost-effectiveness of the passive RIS also implies that the incoming links are not locally estimated at the RIS, and fresh pilots are not inserted into outgoing RIS links. This introduces new challenges for training and estimation of channel state information. The rapid growth of the literature on CSI acquisition in RIS-aided systems has been accompanied by variations in the underlying assumptions, models, and notation, which can obscure the similarities and differences of various techniques, and their relative merits. This paper presents a comprehensive exposition of principles and approaches in RIS channel estimation. The basic ideas underlying each class of techniques are reduced to their simplest form under a unified model and notation, and various approaches within each class are discussed. Several open problems in this area are identified and highlighted.


I. INTRODUCTION
The fifth generation (5G) wireless systems are being successfully deployed across the globe, achieving high data rates using large antenna arrays [1], [2]. 6G systems will aim for even higher data rates, lower latency, and better reliability, at low cost/complexity and high energy efficiency [3], [4]. This requires innovations in the physical layer technologies. The continuing migration to the millimeter wave (mmWave) spectrum, while improving data rates, has challenges including channel blockage and intermittent availability. Increasing the network density can solve some of these problems, but it injects more power into the network, increases the interference levels, escalates cost of deployment and operation, and raises scalability concerns.
Reconfigurable intelligent surfaces (RIS) are passive, controllable arrays of small reflectors that direct radio waves toward or away from a target node, enabling better management of signals and interference in wireless networks [5]-[9]. They are often interpreted as a mechanism that achieves software-defined control of the wireless propagation environment (Fig. 1). This approach has the potential to address several of the challenges mentioned above. Judiciously altering the channel characteristics in real time as a part of the system operation, to achieve a favorable propagation environment, is an attractive idea to handle challenging channel conditions. Unlike a relay, RIS does not inject more transmit power into the network, and its operation is independent of the details of PHY signaling other than operating frequency and bandwidth. The combination of lower power consumption and simpler construction makes it cheaper to build and operate, thus helping scalability.
RIS-induced channel coefficients must be estimated at the receiver for coherent communication, and shared with the transmitter (and RIS) for beamforming. The RIS channel estimation problem is distinct from the MIMO (multiple-input multiple-output) case because: (1) RIS is a two-hop channel observable only end-to-end, due to passivity of RIS, (2) the number of channel gain coefficients is multiplied by the RIS size, making for a larger vector to be estimated, and (3) in RIS channel estimation, training occurs through pilots plus passive RIS training states, and the latter is without a direct counterpart in traditional channel training and estimation.

A. Contributions and Distinctions of the Present Work
The present work provides a thorough exposition of the ideas underlying the rapidly expanding literature on RIS channel estimation, in a way that is both comprehensive and accessible to a wider audience. An up-to-date discussion of the available estimation techniques and related issues, ranging from classical least squares to the most recent artificial intelligence and machine learning (AI/ML)-based methods, is provided.
One of the contributions of the present work is to bring a unified notation and system model to the mathematical expression of various RIS channel estimation problems and algorithms. This goes beyond the enumeration of works in a conventional survey, and is greatly helpful in the interpretation and comparison of results in a rapidly expanding literature. Among other features, this work illuminates the commonality and differences between RIS channel estimation results, thus exposing synergies and facilitating the generation of new ideas.
In addition to the exposition of various estimation approaches, this paper also explores critical assumptions, explicit or implicit in RIS channel estimation, that are in need of further investigation and validation, e.g., channel reciprocity or perfect deactivation of RIS elements. Other important issues, such as the near-field effect in very large RIS, or the dependence of reflection gains on phases, are also discussed.
To put this paper in context and highlight its distinctions, we briefly review related works. In a concise letter, Wei et al. [10] explored sparsity, user correlation, and time scales for RIS channel estimation. Swindlehurst et al. [10] consider the identifiability of the models as a function of the pilots and RIS training states, and further consider special cases such as single-input single-output (SISO) and MIMO, availability or unavailability of a direct link, and narrowband vs. wideband. Noh et al. [11] concentrate on characteristics of RIS channels in terahertz (THz) and mmWave channels. Zheng et al. [12] provide a survey of RIS channel estimation that enumerates the problems and outcomes, but does not dwell on methodology or characterization of the methods. In comparison, the present work provides a simple yet sufficiently detailed exposition of ideas for a wider audience. Compared with the works mentioned above, the present work also addresses new dimensions, including the impact of RIS channel estimation overhead on spectral efficiency, and the implications of optimizing spectral efficiency (through channel estimation constraints) on the size of the RIS array. The present work also includes coverage of machine learning methods for RIS channel estimation. Finally, as mentioned earlier, underlying assumptions with potential impact on practical implementations are examined.
This paper also presents several new interpretations and connections that have been unavailable in the literature thus far. Among them, (a) Section VII-A provides an elegant explanation for available savings in multiuser RIS systems, leading to suggestions for future work, (b) Section VII-C discusses the differences and similarities of techniques that infrequently update the RIS coefficients, compared with traditional MIMO statistical CSI methods, opening avenues for future work, and (c) Section VII-D clarifies for the first time that the so-called "codebook methods" for RIS actually have deep connections with opportunistic communication.
The organization of this paper is as follows. Section II presents RIS system and channel models. Section III formulates the RIS channel estimation problem and presents discussions on pilots, training states, and linear estimation methods. Section IV presents the effect of training overhead and accuracy on the spectral efficiency of RIS, and discusses its impact on the choice of RIS dimensionality. Section V presents the formulation of RIS channel estimation as a sparse recovery problem when operating in high-frequency (mmWave/THz) channels, and discusses the solutions proposed in the literature. Section VI presents estimation techniques for RIS-aided orthogonal frequency-division multiplexing (OFDM) in wideband channels. Section VII presents various approaches for reducing estimation overhead. Section VIII presents estimation approaches based on machine learning. Section IX outlines practical issues of contemporary interest and investigation in RIS channel modeling.

II. SYSTEM AND CHANNEL MODELS
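For easy reference, the mathematical terms and the key variables used in this paper are summarized in

Then, the received signal is given by
$$\mathbf{y} = \sqrt{\rho}\,(\mathbf{H}\boldsymbol{\Phi}\mathbf{G} + \mathbf{H}_d)\,\mathbf{x} + \mathbf{w}, \tag{1}$$
where $\mathbf{G}$ and $\mathbf{H}$ denote the Tx-RIS and RIS-Rx channels, $\boldsymbol{\Phi}$ is the diagonal matrix of RIS reflection coefficients, $\mathbf{H}_d$ is the direct Tx-Rx channel, the transmit signal $\mathbf{x} \in \mathbb{C}^{M_t \times 1}$ satisfies $E\|\mathbf{x}\|^2 = 1$, the transmit power is $\rho$, and the receiver Gaussian noise is $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{M_r})$.
In the microwave L and S bands, and parts of the C band, a suitable model of rich scattering involves many multipath components, resulting in independent and identically distributed (i.i.d.) Gaussian gains for the direct and RIS-aided channels. When a direct line-of-sight (LoS) path exists between any two nodes, a Rician fading model applies. In that case, the Tx-RIS channel can be decomposed as follows:
$$\mathbf{G} = \sqrt{\frac{K_g}{K_g + 1}}\,\bar{\mathbf{G}} + \sqrt{\frac{1}{K_g + 1}}\,\tilde{\mathbf{G}}, \tag{2}$$
where $K_g$ is the Rician factor, $\bar{\mathbf{G}}$ is the deterministic LoS component, and $\tilde{\mathbf{G}}$ is the random non-LoS component.
Recently, RISs have been studied in higher frequencies (mmWave and THz) where the channels are represented in parametric form; for example, a transmitter-RIS channel with $L$ resolvable paths is given by
$$\mathbf{G} = \sum_{\ell=1}^{L} \alpha_\ell\, \mathbf{a}_2(\phi_{2,\ell}, \theta_{2,\ell})\, \mathbf{a}_1^H(\phi_{1,\ell}, \theta_{1,\ell}), \tag{3}$$
where $\alpha_\ell$ is the complex gain of path $\ell$, $\mathbf{a}_1(\cdot)$ and $\mathbf{a}_2(\cdot)$ are steering vectors at the transmitter and RIS, respectively, with $\phi_{1,\ell}$ ($\theta_{1,\ell}$) and $\phi_{2,\ell}$ ($\theta_{2,\ell}$) being azimuth (elevation) angles of departure and arrival for path $\ell$. In matrix form:
$$\mathbf{G} = \mathbf{A}_2\, \mathrm{diag}(\boldsymbol{\alpha})\, \mathbf{A}_1^H. \tag{4}$$
This matrix representation will be revisited in Section V for channel estimation.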

B. Wideband/Frequency-Selective Channels
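Consider an RIS-aided communication between a single-antenna transmitter and receiver (for ease of exposition) in frequency-selective channels using OFDM modulation with $M_c$ orthogonal sub-carriers. Let $L$ denote the number of channel taps for both the direct and cascaded channels. Let $\check{\mathbf{x}}$ be the $M_c \times 1$ transmit frequency-domain OFDM symbol and $\mathbf{x}$ be its $M_c$-point inverse discrete Fourier transform (IDFT). At the OFDM transmitter, $\check{\mathbf{x}}$ is first transformed to $\mathbf{x}$ and transmitted over the frequency-selective channel. The signal reaching the receiver due to the direct path is given by $\mathbf{x} \circledast \mathbf{h}_d$, where $\circledast$ denotes circular convolution. The signal reaching the receiver via reflection from RIS element $n$ is $\psi_n (\mathbf{x} \circledast \mathbf{g}_n) \circledast \mathbf{h}_n$, where $\psi_n$ is the induced reflection coefficient. The overall received signal due to the direct path and the RIS reflections is
$$\mathbf{y} = \mathbf{x} \circledast \mathbf{h}_d + \sum_{n=1}^{N} \psi_n\, (\mathbf{x} \circledast \mathbf{g}_n) \circledast \mathbf{h}_n + \mathbf{w}. \tag{5}$$
The OFDM receiver performs the DFT operation on $\mathbf{y}$ to obtain $\check{\mathbf{y}} = \mathbf{F}_{M_c}\mathbf{y}$, where $\mathbf{F}_{M_c}$ is the $M_c$-point DFT matrix. Let $\check{\mathbf{h}}_d = \mathbf{F}_{M_c}\mathbf{h}_d$ denote the frequency-domain Tx-Rx channel. Similarly, let $\check{\mathbf{g}}_n$ and $\check{\mathbf{h}}_n$ denote the frequency-domain RIS-aided channels due to element $n$. Then, the received signal in the frequency domain is given by
$$\check{\mathbf{y}} = \mathbf{X}\Big(\sum_{n=1}^{N} \psi_n\, \check{\mathbf{g}}_n \odot \check{\mathbf{h}}_n + \check{\mathbf{h}}_d\Big) + \check{\mathbf{w}}, \tag{6}$$
where $\mathbf{X} = \mathrm{diag}(\check{\mathbf{x}})$ and $\odot$ denotes the Hadamard (elementwise) product. The channel gains can follow either rich or sparse scattering models depending on the propagation environment and frequency of operation.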
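The equivalence between the time-domain convolutional model and the per-sub-carrier product model can be checked numerically. The following is a minimal noiseless sketch with toy values chosen here for illustration (8 sub-carriers, 2 RIS elements, 2-tap channels); the unnormalized DFT convention and all variable names are assumptions, not taken from the paper.

```python
import cmath

def dft(v):
    """Unnormalized DFT of a sequence of complex numbers."""
    m = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
            for k in range(m)]

def circ_conv(a, b):
    """Circular convolution of two equal-length sequences."""
    m = len(a)
    return [sum(a[t] * b[(n - t) % m] for t in range(m)) for n in range(m)]

def pad(taps, m):
    """Zero-pad an L-tap impulse response to length m."""
    return list(taps) + [0] * (m - len(taps))

M_C = 8                                              # sub-carriers (toy value)
x_time = [1, -1, 1j, 1, -1j, -1, 1, 1j]              # time-domain OFDM symbol
h_d = pad([0.9, 0.3 - 0.2j], M_C)                    # direct channel taps
g = [pad([1.0, 0.5j], M_C), pad([0.7, -0.1], M_C)]   # Tx-RIS taps, 2 elements
h = [pad([0.6, 0.2], M_C), pad([0.8j, 0.1], M_C)]    # RIS-Rx taps
psi = [1j, -1]                                       # reflection coefficients

# Time domain: direct path plus one doubly convolved term per RIS element.
y_time = circ_conv(x_time, h_d)
for n in range(len(psi)):
    reflected = circ_conv(circ_conv(x_time, g[n]), h[n])
    y_time = [a + psi[n] * b for a, b in zip(y_time, reflected)]

# Frequency domain: elementwise products replace the convolutions.
x_f, hd_f = dft(x_time), dft(h_d)
g_f = [dft(v) for v in g]
h_f = [dft(v) for v in h]
y_freq = [x_f[k] * (hd_f[k] + sum(psi[n] * g_f[n][k] * h_f[n][k]
                                   for n in range(len(psi))))
          for k in range(M_C)]

# Largest mismatch between DFT(time-domain output) and the product model.
max_err = max(abs(a - b) for a, b in zip(dft(y_time), y_freq))
```

With a unitary (normalized) DFT instead, the same relation holds up to fixed scaling factors that can be absorbed into the channel definitions.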

III. PILOTS, TRAINING STATES, AND LINEAR ESTIMATION
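A narrowband RIS-aided system is modeled by Eq. (1), or alternatively,
$$\mathbf{y} = \sqrt{\rho}\,(\mathbf{x}^T \otimes \mathbf{I}_{M_r})\,\mathrm{vec}(\mathbf{H}\boldsymbol{\Phi}\mathbf{G} + \mathbf{H}_d) + \mathbf{w}. \tag{7}$$
Recall that $\otimes$ and $\mathrm{vec}(\cdot)$ denote the Kronecker product and vectorization, respectively. Define
$$\mathbf{H}_c \triangleq \big[\,\mathrm{vec}(\mathbf{H}_d),\; \mathbf{G}^T \diamond \mathbf{H}\,\big], \tag{8}$$
where $\diamond$ is a columnwise Kronecker product, which is a special case of the Khatri-Rao product. We further define $\mathbf{h}_c \triangleq \mathrm{vec}(\mathbf{H}_c)$. We organize the individual reflection coefficients $\psi_n$ into a vector $\boldsymbol{\psi}$, which carries the same information as the diagonal matrix $\boldsymbol{\Phi} = \mathrm{diag}(\boldsymbol{\psi})$. Further, define $\tilde{\boldsymbol{\psi}} \triangleq [1, \boldsymbol{\psi}^T]^T$. Then, Eq. (7) is expressed as follows:
$$\mathbf{y} = \sqrt{\rho}\,(\tilde{\boldsymbol{\psi}}^T \otimes \mathbf{x}^T \otimes \mathbf{I}_{M_r})\,\mathbf{h}_c + \mathbf{w} = \sqrt{\rho}\,\mathbf{Z}\,\mathbf{h}_c + \mathbf{w}, \tag{9}$$
where $\mathbf{Z}$ represents a matrix that includes the transmit vector as well as the RIS training state, and $\mathbf{h}_c$ represents the overall channel (including both cascaded and direct channels), whose estimation is necessary for coherent detection at the receiver and beamforming at the transmitter and RIS. The transmission frame includes a training phase and a data transmission phase. During the training phase, $j = 1, \ldots, J$, the pilot signals $\mathbf{x}_j$ and RIS training states $\boldsymbol{\psi}_j$ give rise to the overall training matrix $\mathbf{Z}_j = \tilde{\boldsymbol{\psi}}_j^T \otimes \mathbf{x}_j^T \otimes \mathbf{I}_{M_r}$ at the receiver, resulting in $\mathbf{y}_j = \sqrt{\rho}\,\mathbf{Z}_j \mathbf{h}_c + \mathbf{w}$.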
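The Kronecker structure of the training matrix can be traced through two standard vectorization identities, $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$ and $\mathrm{vec}(\mathbf{A}\,\mathrm{diag}(\mathbf{d})\,\mathbf{B}) = (\mathbf{B}^T \diamond \mathbf{A})\,\mathbf{d}$. The sketch below is a worked derivation under stated assumptions: the noiseless signal is taken as $(\mathbf{H}\boldsymbol{\Phi}\mathbf{G} + \mathbf{H}_d)\mathbf{x}$, the symbol $\diamond$ is used here for the columnwise Kronecker product, and the ordering of the extended vector (direct-path entry first) is a choice made for illustration.

```latex
\begin{align*}
(\mathbf{H}\boldsymbol{\Phi}\mathbf{G} + \mathbf{H}_d)\,\mathbf{x}
  &= (\mathbf{x}^T \otimes \mathbf{I}_{M_r})\,
     \mathrm{vec}(\mathbf{H}\boldsymbol{\Phi}\mathbf{G} + \mathbf{H}_d) \\
  &= (\mathbf{x}^T \otimes \mathbf{I}_{M_r})
     \big[\mathrm{vec}(\mathbf{H}_d) + (\mathbf{G}^T \diamond \mathbf{H})\,\boldsymbol{\psi}\big] \\
  &= (\mathbf{x}^T \otimes \mathbf{I}_{M_r})\,\mathbf{H}_c\,\tilde{\boldsymbol{\psi}},
  \qquad \mathbf{H}_c = \big[\mathrm{vec}(\mathbf{H}_d),\ \mathbf{G}^T \diamond \mathbf{H}\big],
  \quad \tilde{\boldsymbol{\psi}} = [1,\ \boldsymbol{\psi}^T]^T \\
  &= (\tilde{\boldsymbol{\psi}}^T \otimes \mathbf{x}^T \otimes \mathbf{I}_{M_r})\,
     \mathrm{vec}(\mathbf{H}_c).
\end{align*}
```

The last line shows why the pilot $\mathbf{x}$ and the RIS training state $\tilde{\boldsymbol{\psi}}$ enter the observation symmetrically: both act as known "probing" vectors on the unknown $\mathbf{h}_c = \mathrm{vec}(\mathbf{H}_c)$.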

A. Linear Estimation
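Since $\mathbf{y}_j$ is $M_r$-dimensional and $\mathbf{h}_c$ is $M_r M_t (N + 1)$-dimensional, least squares and minimum mean-square error (MMSE) estimation require at least $J = M_t (N + 1)$ pilots. Define:
$$\mathbf{y} \triangleq [\mathbf{y}_1^T, \ldots, \mathbf{y}_J^T]^T, \qquad \mathbf{Z} \triangleq [\mathbf{Z}_1^T, \ldots, \mathbf{Z}_J^T]^T.$$
Then, the least squares estimate of $\mathbf{h}_c$ is given by
$$\hat{\mathbf{h}}_c = \frac{1}{\sqrt{\rho}}\,\mathbf{Z}^\dagger \mathbf{y}, \tag{10}$$
where $\mathbf{Z}^\dagger$ is the pseudo-inverse of $\mathbf{Z}$. Let $\mathbf{R}_{h_c}$ denote the covariance matrix of $\mathbf{h}_c$. Then, for receiver noise with identity covariance, the LMMSE estimate of $\mathbf{h}_c$ is given by
$$\hat{\mathbf{h}}_c = \sqrt{\rho}\,\mathbf{R}_{h_c}\mathbf{Z}^H \big(\rho\,\mathbf{Z}\mathbf{R}_{h_c}\mathbf{Z}^H + \mathbf{I}\big)^{-1}\mathbf{y}. \tag{11}$$
Training Process: As illustrated in Fig. 2, channel training occurs by the RIS assuming a training state $\boldsymbol{\psi}_i$ that remains fixed over $M_t$ consecutive slots, while the transmitter emits $M_t$ linearly independent (preferably orthonormal) pilots. This process repeats $N + 1$ times with different RIS training states³ that generate linearly independent extended training vectors $\tilde{\boldsymbol{\psi}}$.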

B. RIS Training States
The accuracy of channel estimates depends on the choice of RIS training states. We consider the canonical, DFT, and Hadamard training states. For simplicity of exposition, it is assumed that the direct path is absent, and the channel gains follow i.i.d. $\mathcal{CN}(0, 1)$, resulting in $\mathbf{R}_{h_c} = \mathbf{I}_{M_t M_r N}$. The identity in Eq. (12) is used in the derivation of the least squares and MMSE estimators.

1) Canonical Training:
Canonical training activates one RIS element in each training state, deactivating the remaining $N - 1$ elements.⁴ Thus, the RIS training vectors constitute a so-called standard basis or canonical basis, as follows:
$$[\boldsymbol{\psi}_i]_j = \delta_{i,j}, \qquad i, j = 1, \ldots, N,$$
where $\delta_{i,j}$ is the Kronecker delta function. This results in $\boldsymbol{\Psi}\boldsymbol{\Psi}^H = \mathbf{I}_N$, and hence, by the identity in Eq. (12), $\mathbf{Z}^H\mathbf{Z} = M_t \mathbf{I}_{M_t M_r N}$. Therefore, from (10), the least squares estimate with canonical training states is
$$\hat{\mathbf{h}}_c = \frac{1}{\sqrt{\rho}\,M_t}\,\mathbf{Z}^H\mathbf{y}.$$
From (11), the MMSE estimate with canonical training is
$$\hat{\mathbf{h}}_c = \frac{\sqrt{\rho}}{\rho M_t + 1}\,\mathbf{Z}^H\mathbf{y}.$$

³ Because of N RIS elements and one direct path, constituting N + 1 degrees of freedom.
⁴ Nulling the reflection of an RIS element with respect to all angles has not been adequately addressed in the literature and remains an open issue. See Section IX for further details.

2) DFT Training:
In DFT training, each RIS training state is a column of the standard $N \times N$ DFT matrix. The orthogonality $\boldsymbol{\Psi}\boldsymbol{\Psi}^H = N\mathbf{I}_N$ combined with the identity (12) gives $\mathbf{Z}^H\mathbf{Z} = M_t N\,\mathbf{I}_{M_t M_r N}$. The least squares estimate with DFT training states is therefore
$$\hat{\mathbf{h}}_c = \frac{1}{\sqrt{\rho}\,M_t N}\,\mathbf{Z}^H\mathbf{y}.$$
The MMSE estimate is
$$\hat{\mathbf{h}}_c = \frac{\sqrt{\rho}}{\rho M_t N + 1}\,\mathbf{Z}^H\mathbf{y}.$$

Therefore, $\mathbf{G}$ and $\mathbf{H}$ cannot be uniquely resolved by observing pilots via the channel $\mathbf{H}\boldsymbol{\Phi}\mathbf{G}$. However, as noted from Eq. (9), the knowledge of the cascaded channel $\mathbf{h}_c$ is sufficient for RIS beamforming, while also being more efficient to estimate compared with estimating $\mathbf{G}$ and $\mathbf{H}$ separately.
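The effect of the training-state choice on least squares accuracy can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions made here: a single-antenna transmitter and receiver, no direct path, one scalar observation per training state, and arbitrary toy values for the array size, power, and noise level. Function names are illustrative.

```python
import cmath
import random

def canonical_states(n):
    """Training state j activates only element j (rows of the identity)."""
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

def dft_states(n):
    """Training state j is column j of the unnormalized n x n DFT matrix."""
    return [[cmath.exp(-2j * cmath.pi * i * j / n) for i in range(n)]
            for j in range(n)]

def observe(h, states, rho, sigma, rng):
    """One observation per state: y_j = sqrt(rho) * psi_j^T h + w_j."""
    ys = []
    for psi in states:
        w = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        ys.append(rho ** 0.5 * sum(p * c for p, c in zip(psi, h)) + w)
    return ys

def ls_estimate(ys, states, rho):
    """Least squares for orthogonal, equal-energy states: Z^H y / (sqrt(rho) * energy)."""
    n = len(states[0])
    energy = sum(abs(p) ** 2 for p in states[0])   # 1 for canonical, n for DFT
    return [sum(states[j][i].conjugate() * ys[j] for j in range(len(states)))
            / (rho ** 0.5 * energy) for i in range(n)]

def mse(states, rho, sigma, trials, seed):
    """Empirical per-coefficient squared error averaged over random channels."""
    rng = random.Random(seed)
    n = len(states[0])
    total = 0.0
    for _ in range(trials):
        h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        h_hat = ls_estimate(observe(h, states, rho, sigma, rng), states, rho)
        total += sum(abs(a - b) ** 2 for a, b in zip(h, h_hat)) / n
    return total / trials

N, RHO = 8, 1.0
mse_canonical = mse(canonical_states(N), RHO, sigma=0.5, trials=200, seed=1)
mse_dft = mse(dft_states(N), RHO, sigma=0.5, trials=200, seed=1)
```

In this setting the DFT states collect energy from all N elements in every slot, so the noise in each coefficient estimate is averaged over N observations, whereas canonical training sees each coefficient only once.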

Remark 2. (Channel Training under Finite Precision Reflection Coefficients)
The precision of RIS coefficients may be limited in practical implementations, with only a finite number of quantized phase shifts being available, affecting both training and beamforming. Quantization of phase shifts has no effect on channel estimation under canonical training states. Hadamard training requires only 1-bit phase precision, without any compromise in estimation accuracy. DFT training, however, requires $N$ phase shifts, which may not be available in practice for a large RIS array. When $L$ phase shifts ($L < N$) are available at the RIS, a quantized-DFT training can be achieved by mapping the phases of the DFT matrix $\mathbf{F}_N$ to the nearest phases in the quantized phase set $\mathcal{P}$ to obtain the quantized-DFT matrix $\mathbf{F}_{q,N}$ as follows:
$$[\mathbf{F}_{q,N}]_{m,n} = e^{j\phi^\star_{m,n}}, \qquad \phi^\star_{m,n} = \arg\min_{\phi \in \mathcal{P}} \big| e^{j\phi} - [\mathbf{F}_N]_{m,n} \big|.$$
Unfortunately, the quantized-DFT matrix is non-orthogonal, resulting in degraded channel estimates. This is especially an issue for low-precision RIS implementations.
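The loss of orthogonality under phase quantization can be seen on a small example. The sketch below is illustrative only: it assumes a uniformly spaced phase set, picks a 7-element array with 4 available phases (toy values chosen so that no DFT phase sits exactly halfway between two quantized phases), and measures the largest inner product between distinct columns.

```python
import cmath
import math

def dft_matrix(n):
    """Unnormalized n x n DFT matrix with unit-modulus entries."""
    return [[cmath.exp(-2j * math.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def quantize(z, levels):
    """Map a unit-modulus entry to the nearest of `levels` uniformly spaced phases."""
    step = 2 * math.pi / levels
    return cmath.exp(1j * step * round(cmath.phase(z) / step))

def max_cross_correlation(mat):
    """Largest |inner product| between two distinct columns (0 if orthogonal)."""
    n = len(mat)
    return max(abs(sum(mat[r][a].conjugate() * mat[r][b] for r in range(n)))
               for a in range(n) for b in range(a + 1, n))

N = 7                                   # RIS size (toy value, odd to avoid ties)
F = dft_matrix(N)
F_q = [[quantize(z, 4) for z in row] for row in F]   # 2-bit phase resolution

exact_leak = max_cross_correlation(F)       # numerically zero
quantized_leak = max_cross_correlation(F_q) # clearly non-zero
```

A non-zero cross-correlation means the Gram matrix of the training states is no longer a scaled identity, so the simple matched-filter form of the least squares estimate no longer applies and noise is amplified by the inversion.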

Remark 3. (Grouping the RIS Elements)
Under some scenarios, the estimation of all channel gain parameters induced by the RIS may be prohibitive in terms of time, power, or both. A remedy has been proposed [16]-[23] that constrains groups of RIS elements to have the same reflection coefficient. Then, it is not difficult to see that the relevant estimation parameter is an aggregate channel gain corresponding to the total reflection produced by the RIS elements in each group (which share an identical reflection coefficient). This scenario is effectively similar to an RIS with fewer (virtual) reflective elements. Each of these virtual elements, representing multiple physical elements, will have a stronger reflection and therefore a stronger channel gain. An example of this kind is discussed in Section VI.
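The reduction to virtual elements can be verified directly in the simplest case. The following sketch assumes a single-antenna link with no direct path and groups of adjacent elements of equal size; the channel values and names are arbitrary illustrations.

```python
def cascaded_gain(psi, g, h):
    """End-to-end gain sum_n psi_n * h_n * g_n of a SISO RIS link (no direct path)."""
    return sum(p * hn * gn for p, hn, gn in zip(psi, h, g))

def aggregate_channels(g, h, group_size):
    """Sum the per-element cascaded gains h_n * g_n inside each group."""
    c = [hn * gn for hn, gn in zip(h, g)]
    return [sum(c[i:i + group_size]) for i in range(0, len(c), group_size)]

# Toy channel values for N = 8 elements.
g = [1 + 1j, 0.5, -0.3j, 0.8, 1j, -1, 0.2 + 0.1j, 0.4]
h = [0.9, -0.2j, 0.7, 1, 0.3, 0.5j, -0.6, 1 - 1j]

group_size = 4
group_psi = [1j, -1]                       # one coefficient per group (N_g = 2)
psi = [p for p in group_psi for _ in range(group_size)]   # expanded to all elements

full = cascaded_gain(psi, g, h)            # physical model with 8 elements
virtual_channels = aggregate_channels(g, h, group_size)
virtual = sum(p * c for p, c in zip(group_psi, virtual_channels))  # 2 virtual elements
```

Only the two aggregate gains in `virtual_channels` need to be estimated, at the cost of losing the ability to steer elements within a group independently.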

IV. ESTIMATION VS. SPECTRAL EFFICIENCY
Because the RIS channel is not known ahead of time, channel resources must be spent to train and acquire the channel state information. Pilots require transmit power, and transmission time is occupied for generating independent channel observations, commensurate with the number of channel parameters being estimated. Any channel resource used for training becomes unavailable for data transmission. A larger RIS can improve beamforming gain, which is beneficial for capacity, but also requires more training resources, which is detrimental for capacity. This gives rise to an interesting and important tradeoff between the size of the RIS and its effect on spectral efficiency, studied in this section.
This section presents the training-based spectral efficiency results for RIS-aided single-antenna transmitters and receivers without a direct path, whose insights carry over to multiple-antenna systems as well. The developments in this section follow [24]. Let $T$ be the coherence interval of all channels, also used as the block length, of which $T_d$ channel symbols are dedicated to data transmission. In the absence of a direct path, a minimum of $N$ temporal degrees of freedom are needed for training. Let $\rho_\tau$ and $\rho_d$ denote the training and data powers, respectively, and let $\rho$ denote the average power. Then, by conservation of energy:
$$\rho_\tau (T - T_d) + \rho_d T_d = \rho T.$$
With these conditions, one rate, Eq. (19), is achievable under canonical training [24], and another, Eq. (20), is achievable under DFT training and Hadamard training. The spectral efficiency is a function of $\rho_\tau$ and $\rho_d$, and each of Eqs. (19) and (20) has its own optimizing power allocation.
Figure 4 shows the RIS array size that maximizes the spectral efficiency at each signal-to-noise ratio (SNR).
At low SNR, power is at a premium, while degrees of freedom are less important. Therefore, it is beneficial to estimate the channel induced by the entire (available) RIS array, even though the training requires degrees of freedom. Conversely, at high SNR, degrees of freedom are more important; therefore, from a capacity perspective, it may be beneficial to utilize only part of an available RIS, so that fewer time slots are utilized for training.

V. SPARSE CHANNEL ESTIMATION
Whenever the channel has a sparse multipath structure, fewer channel parameters need to be estimated, and the overhead incurred in transmission of pilots and feedback of channel coefficients is reduced. This is especially relevant for higher frequencies (mmWave/THz), wherein the RIS has the most impact. To capture the efficiencies arising from sparse channel structure, the RIS channel estimation is cast in the form of sparse vector recovery and solved with compressive sensing algorithms that reduce the number of required channel measurements compared with traditional channel estimation [25], [26].
Sparse multipath channels are characterized by a geometric model involving angles of arrival/departure and complex gains of the signal paths. The goal of sparse channel recovery is to estimate the parameters of the angular representation of the channel described in Sec. II-A, Eq. (4), i.e.,
$$\mathbf{G} = \mathbf{A}_2\, \mathrm{diag}(\boldsymbol{\alpha})\, \mathbf{A}_1^H,$$
which involves angles of arrival/departure captured in the left and right matrices, and the path strengths captured in the diagonal matrix. Even though $\mathbf{G}$ might be (highly) rank deficient, it is not (yet) expressed in a suitable format for compressive sampling algorithms. In Eq. (4), the basis vectors (columns of $\mathbf{A}_1$ and $\mathbf{A}_2$) can take values over an uncountably infinite set. To recast the problem in a friendly format for compressive sampling, the candidate angles are restricted to a finite set of size $G$, often corresponding to a uniform grid in a prescribed coordinate system.
The basis vectors corresponding to the discretized angles are collected into dictionary matrices $\bar{\mathbf{A}}_1 \in \mathbb{C}^{M_t \times G}$ and $\bar{\mathbf{A}}_2 \in \mathbb{C}^{N \times G}$. With sufficiently good quantization of angles, the matrices $\mathbf{A}_1$ and $\mathbf{A}_2$ are approximately submatrices of $\bar{\mathbf{A}}_1$ and $\bar{\mathbf{A}}_2$. A sparse matrix $\boldsymbol{\Lambda}_g$ can select the appropriate columns from $\bar{\mathbf{A}}_1$ and $\bar{\mathbf{A}}_2$ so that:
$$\mathbf{G} = \bar{\mathbf{A}}_2\, \boldsymbol{\Lambda}_g\, \bar{\mathbf{A}}_1^H.$$
The problem is now in a standard form for compressive sampling. With pre-determined dictionaries $\bar{\mathbf{A}}_1$ and $\bar{\mathbf{A}}_2$, the objective is to estimate the sparse matrix $\boldsymbol{\Lambda}_g$ from a noisy linear observation of $\mathbf{G}$ (via pilots).
If the transmitter-to-RIS channel $\mathbf{G}$ has $L$ paths, the discretized grid of angles must have $G \gg L$ elements.
Ignoring for now any grid mismatch issues, $\boldsymbol{\Lambda}_g$ is a $G \times G$ sparse matrix with $L$ non-zero elements. Similarly, let $P$ denote the sparsity of the RIS-to-receiver channel $\mathbf{H}$. Consider a discretized set of candidate angles with size $H \gg P$, and collect the steering vectors corresponding to these angles into dictionary matrices, so that $\mathbf{H}$ admits an analogous representation in which $\boldsymbol{\Lambda}_h$ is an $H \times H$ sparse matrix with $P$ non-zero elements. For simplicity of exposition, we assume no direct path exists between transmitter and receiver. Thus, the received signal in Eq. (1) takes the form of Eq. (24). Using a series of vectorization operations and substituting in Eq. (24), the observation becomes linear in $\boldsymbol{\lambda} \triangleq \mathrm{vec}(\boldsymbol{\Lambda}_g^T \otimes \boldsymbol{\Lambda}_h)$, the $(GH)^2 \times 1$ sparse vector with $LP$ non-zero elements, through an effective measurement matrix that is a function of the pilots and RIS training states. During the training phase, the input sequence $(\mathbf{x}_j, \boldsymbol{\psi}_j)$ takes $J$ distinct values, and the output sequence $\mathbf{y}_j$ is observed, yielding the measurements in (26). The sparse channel vector $\boldsymbol{\lambda}$ can be reconstructed from these measurements using standard sparse recovery algorithms such as orthogonal matching pursuit (OMP) [25] and subspace pursuit [27]. The alternating direction method of multipliers (ADMM) [28] and approximate message passing [29] have also been explored for sparse channel estimation in RIS. Noh et al. [30] show that, for an RIS-aided single-antenna system employing $J$ pilots ($J < N$) for sparse channel estimation, using $J$ equi-spaced columns of the $N \times N$ DFT matrix as training states produces lower mean squared error compared with canonical training states and with the first $J$ columns of the DFT matrix.
Training Overhead and Complexity: To reconstruct an $LP$-sparse vector of length $(GH)^2$, orthogonal matching pursuit requires $O(LP \log(GH))$ measurements [31]. Since each pilot provides $M_r$ measurements, the required number of pilots for sparse RIS channel estimation is $O\big(\frac{LP}{M_r} \log(GH)\big)$. Subspace pursuit requires even fewer measurements; specifically, it requires $O\big(LP \log\big(\frac{GH}{\sqrt{LP}}\big)\big)$ measurements [31], and hence $O\big(\frac{LP}{M_r} \log\big(\frac{GH}{\sqrt{LP}}\big)\big)$ pilots. In general, $L$ and $P$ are small at high frequencies, and hence the contribution of $LP$ to the training overhead is small. Also, due to the logarithmic dependence on $GH$, the induced overhead is low, especially when $M_r$ is large.
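To get a feel for these scaling laws, the order expressions can be evaluated for representative dimensions and compared with the pilot count of linear estimation. The sketch below is only indicative: big-O expressions hide constants, so the constant `c` (set to 1 here), the use of the natural logarithm, and the example dimensions are all assumptions made for illustration.

```python
import math

def pilots_linear(m_t, n):
    """Minimum pilots for linear (LS/MMSE) estimation: M_t * (N + 1)."""
    return m_t * (n + 1)

def pilots_omp(l, p, g, h, m_r, c=1.0):
    """Pilot count suggested by O((LP / M_r) * log(GH)); c is a hypothetical constant."""
    return math.ceil(c * l * p * math.log(g * h) / m_r)

def pilots_subspace_pursuit(l, p, g, h, m_r, c=1.0):
    """Pilot count suggested by O((LP / M_r) * log(GH / sqrt(LP))); c is hypothetical."""
    return math.ceil(c * l * p * math.log(g * h / math.sqrt(l * p)) / m_r)

# Example: 3 paths per hop, 64-point angle grids, 16 receive antennas,
# 4 transmit antennas, 64 RIS elements (all toy values).
linear = pilots_linear(m_t=4, n=64)
omp = pilots_omp(l=3, p=3, g=64, h=64, m_r=16)
sp = pilots_subspace_pursuit(l=3, p=3, g=64, h=64, m_r=16)
```

The comparison is meaningful only in order of magnitude; practical recovery guarantees depend on the measurement matrix and on constants that the order notation suppresses.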
A thorough characterization of optimal training with sparse recovery, and the associated training-based capacity (as in Sec. IV), is still open. The complexity of estimation using orthogonal matching pursuit (and subspace pursuit) is $O((LP\,GH)^2 \log(GH))$, while linear estimation has a complexity of $O((M_t M_r N)^3)$. With a suitable choice of $G$ and $H$, sparse recovery can reduce the complexity when compared with linear estimation.
Beyond sparsity, other structural properties can be exploited for further reducing the training overhead. For broadband channel estimation in RIS-aided mmWave massive MIMO systems, Wan et al. [32] exploit the common sparsity shared by different sub-carriers and propose a distributed orthogonal matching pursuit to reduce the overhead. In the context of an RIS-aided multiuser downlink setting, Wei et al. [26] show that the angular cascaded channels associated with different users have exactly the same non-zero rows and some common non-zero columns (termed double-structured sparsity). The adaptation of orthogonal matching pursuit to this double-sparse structure is shown to further reduce overhead. Zhou et al. [33] consider uplink channel estimation in RIS-aided mmWave massive MIMO and exploit the fact that in many scenarios the angles of arrival/departure between RIS and base station remain unchanged over multiple coherence blocks. Therefore, the base-station-to-RIS channel parameters, which are more numerous in massive MIMO, need fewer updates, which can reduce pilot overhead. Lin et al. [34] decompose the sparse channel recovery into three components: recovery of angle of arrival, angle of departure, and complex gains. A semi-passive RIS with a few receiver chains at the RIS was proposed in [35], [36]. A semi-passive RIS allows for receiving pilots and channel estimation at the RIS. The transmitter-RIS channel is estimated with the aid of the few RIS on-site measurements, utilizing compressive sensing.
Matrix factorization and matrix completion [37] can also be used for estimating rank-deficient, sparse RIS channels. The key idea of this approach can be explained as follows. With the pilots collected in $\mathbf{X}_\tau$ and the RIS training states collected in $\boldsymbol{\Psi}_\tau$, the received training signal is given by Eq. (27), where $\mathbf{N} \in \mathbb{C}^{M_r \times J}$ is the additive noise matrix. Equation (27) can be equivalently written in a factored form in terms of $\mathbf{H}$ and $\mathbf{A} \triangleq \boldsymbol{\Psi}_\tau \odot (\mathbf{G}\mathbf{X}_\tau)$. With this representation, the estimation is achieved using a two-stage process. In the first stage (matrix factorization), the matrices $\mathbf{H}$ and $\mathbf{A}$ are estimated based on $\mathbf{Y}_\tau$. In the second stage (matrix completion), $\mathbf{G}$ is estimated based on the estimate of $\mathbf{A}$. The success of this method requires $\mathbf{A}$ to be sparse and $\mathbf{H}$ to be a low-rank matrix. The sparsity of $\mathbf{A}$ can be satisfied by selecting the RIS training matrix $\boldsymbol{\Psi}_\tau$ to be sparse, i.e., with most of its coefficients set to zero. High-frequency (mmWave/THz) channels $\mathbf{H}$ have low rank due to the dominance of reflections over scattering. He and Yuan [37] achieve the matrix factorization step using the bilinear generalized approximate message passing algorithm [38], and the matrix completion step using a Riemannian manifold gradient-based algorithm [39]. Other methods based on similar ideas can be found in [40]-[44].
The majority of the literature on sparse channel estimation assumes that the true angles of arrival/departure lie on a discretized grid (i.e., on the discrete steering angles of the dictionary matrices). In practice, when the angles of arrival/departure do not coincide with the discrete angles in the dictionary, the sampling process leads to many non-zero sample measurements, degrading the sparse recovery algorithm. The sensitivity of sparse recovery to grid mismatch was systematically analyzed in [45], but this analysis has not been widely adopted in the sparse channel estimation literature. In an alternative approach, He et al. [46] propose atomic norm minimization for RIS channel estimation. The atomic norm is a convex function that generalizes the $\ell_1$ norm for sparse recovery and the nuclear norm (i.e., sum of singular values) for low-rank matrix completion. Atomic norm minimization works in the continuous domain and avoids discretization, therefore eliminating the grid mismatch problem [47]. Its solution is often obtained via semidefinite programming.
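The grid mismatch effect is easy to reproduce on a single-path channel. The sketch below makes several assumptions for illustration: a uniform linear array with half-wavelength spacing, a dictionary whose grid is uniform in spatial frequency with as many atoms as array elements (so the atoms are mutually orthogonal), and a path placed exactly halfway between two grid points as the worst case.

```python
import cmath
import math

def steering(n, u):
    """Response of an n-element half-wavelength ULA at spatial frequency u = sin(angle)."""
    return [cmath.exp(1j * math.pi * k * u) for k in range(n)]

def angular_coefficients(channel, grid):
    """Project the channel onto each dictionary atom (normalized inner product)."""
    n = len(channel)
    return [sum(a.conjugate() * c for a, c in zip(steering(n, u), channel)) / n
            for u in grid]

def significant(coeffs, threshold=0.01):
    """Number of dictionary coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coeffs if abs(c) > threshold)

N = 16
grid = [-1 + 2 * i / N for i in range(N)]          # N orthogonal atoms

on_grid = steering(N, grid[3])                     # path exactly on a grid point
off_grid = steering(N, grid[3] + 1.0 / N)          # path halfway between two atoms

count_on = significant(angular_coefficients(on_grid, grid))
count_off = significant(angular_coefficients(off_grid, grid))
```

The on-grid path is represented by a single dictionary coefficient, whereas the off-grid path leaks into every atom, so the vector handed to the sparse recovery algorithm is no longer sparse in the strict sense.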

VI. WIDEBAND CHANNEL ESTIMATION
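The RIS-aided OFDM model was discussed in Sec. II-B, where the frequency-domain input-output relation was provided by Eq. (6). Based on this model, the present section provides the main ideas involved in channel estimation for RIS-aided OFDM. The system model in Eq. (6) can be equivalently written as
$$\check{\mathbf{y}} = \mathbf{X}\,(\mathbf{B}\boldsymbol{\psi} + \check{\mathbf{h}}_d) + \check{\mathbf{w}},$$
where $\mathbf{B}$ is an $M_c \times N$ matrix whose columns are $\check{\mathbf{g}}_n \odot \check{\mathbf{h}}_n$. The $M_c \times T$ frequency-time frame is divided into two sub-frames: a training sub-frame of size $M_c \times (N + 1)$ and a data-transmission sub-frame of size $M_c \times (T - N - 1)$.
In the training sub-frame, the received pilot sequence is given by
$$\check{\mathbf{y}}_j = \sqrt{\rho}\,\mathbf{X}_j (\mathbf{B}\boldsymbol{\psi}_j + \check{\mathbf{h}}_d) + \mathbf{w}_j, \qquad j = 1, \ldots, N + 1.$$
Using the pilot sequence $\mathbf{X}_j = \mathbf{I}_{M_c}$ for $j = 1, \ldots, N + 1$, the received training sequence $\check{\mathbf{Y}} = [\check{\mathbf{y}}_1, \ldots, \check{\mathbf{y}}_{N+1}]$ is given by
$$\check{\mathbf{Y}} = \sqrt{\rho}\,\mathbf{C}\boldsymbol{\Psi} + \mathbf{W},$$
where $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_{N+1}]$. For convenience of notation we define $\mathbf{C} \triangleq [\check{\mathbf{h}}_d, \mathbf{B}]$ and $\boldsymbol{\Psi} \triangleq [\tilde{\boldsymbol{\psi}}_1, \ldots, \tilde{\boldsymbol{\psi}}_{N+1}]$ with $\tilde{\boldsymbol{\psi}}_j \triangleq [1, \boldsymbol{\psi}_j^T]^T$. From this, the matrix $\mathbf{C}$ containing the direct and cascaded channels can be estimated as
$$\hat{\mathbf{C}} = \frac{1}{\sqrt{\rho}}\,\check{\mathbf{Y}}\boldsymbol{\Psi}^{-1}.$$
It has been shown in [18] that choosing $\boldsymbol{\Psi} = \mathbf{F}_{N+1}$ results in the least error variance. In the above method, pilots were inserted on all the $M_c$ sub-carriers of the $N + 1$ training OFDM symbols. In practice, when the channel is correlated in the frequency domain, fewer pilots may be employed and the channel estimates may then be interpolated across sub-carriers. The system model in Eq. (29) is equivalent to a per-sub-carrier scaling of the transmitted symbols by the effective channel $\mathbf{q} \triangleq \mathbf{B}\boldsymbol{\psi} + \check{\mathbf{h}}_d$. Now, as shown in Fig. 5, in the training sub-frame, $N_p$ pilots ($N_p < M_c$) are inserted in each OFDM symbol with a spacing of $\Delta = M_c / N_p$. Let $\mathcal{P} = \{0, \Delta, \ldots, (N_p - 1)\Delta\}$ denote the indices of the sub-carriers containing pilots. Let $\check{\mathbf{x}}_{\mathcal{P}}$ and $\check{\mathbf{y}}_{\mathcal{P}}$ denote the transmitted and received pilots on sub-carriers indexed by $\mathcal{P}$, respectively. Also, let $\mathbf{X}_{\mathcal{P}} = \mathrm{diag}(\check{\mathbf{x}}_{\mathcal{P}})$. Then, the estimate of $\mathbf{q}_{\mathcal{P}}$ can be obtained as
$$\hat{\mathbf{q}}_{\mathcal{P}} = \frac{1}{\sqrt{\rho}}\,\mathbf{X}_{\mathcal{P}}^{-1}\,\check{\mathbf{y}}_{\mathcal{P}}.$$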
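Using $\hat{\mathbf{q}}_{\mathcal{P}}$, the estimate of $\mathbf{q}$ (denoted by $\hat{\mathbf{q}}$) is obtained via interpolation along the sub-carriers. The work in [18] applies DFT/IDFT-based interpolation on the pilot sequence. Now, in order to resolve $\mathbf{B}$ and $\check{\mathbf{h}}_d$ from $\hat{\mathbf{q}}$, the RIS training states $\boldsymbol{\psi}_j$ are adjusted during each training OFDM symbol, and the corresponding $\hat{\mathbf{q}}_j$ is given by
$$\hat{\mathbf{q}}_j = \mathbf{B}\boldsymbol{\psi}_j + \check{\mathbf{h}}_d + \mathbf{v}_j = \mathbf{C}\tilde{\boldsymbol{\psi}}_j + \mathbf{v}_j,$$
where $\mathbf{C} = [\check{\mathbf{h}}_d, \mathbf{B}]$, $\tilde{\boldsymbol{\psi}}_j = [1, \boldsymbol{\psi}_j^T]^T$, and $\mathbf{v}_j$ is the error in estimating $\mathbf{q}$ during the pilot slot $j$. Then,
$$\hat{\mathbf{Q}} \triangleq [\hat{\mathbf{q}}_1, \ldots, \hat{\mathbf{q}}_{N+1}] = \mathbf{C}\boldsymbol{\Psi} + \mathbf{V},$$
with $\boldsymbol{\Psi} = [\tilde{\boldsymbol{\psi}}_1, \ldots, \tilde{\boldsymbol{\psi}}_{N+1}]$ and $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_{N+1}]$. Using the above equation, the estimate of the matrix $\mathbf{C}$ containing the direct and RIS-aided channels can be obtained as
$$\hat{\mathbf{C}} = \hat{\mathbf{Q}}\boldsymbol{\Psi}^{-1}.$$
To further reduce the estimation overhead, one may group the neighboring RIS elements and use the same reflection coefficient for all the elements in each group. This solution has been suggested for both flat and frequency-selective channels [16]-[23]. If the $N$ RIS elements are divided into $N_g$ groups ($N_g < N$), with each group containing $N/N_g$ elements, then the channel estimation requires estimating only $N_g$ aggregated RIS-aided channels, one per group, instead of estimating all the $N$ cascaded channels corresponding to each RIS element.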
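The comb-pilot estimate followed by DFT/IDFT-based interpolation can be sketched for a fixed RIS training state as follows. This is a minimal noiseless illustration, not the exact procedure of [18]: it assumes that the effective channel has no more taps than there are pilots, uses an unnormalized DFT, and picks toy values for the dimensions, pilots, and taps.

```python
import cmath

def dft(v, inverse=False):
    """Unnormalized DFT, or inverse DFT (with 1/m scaling) when inverse=True."""
    m = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[t] * cmath.exp(sign * 2j * cmath.pi * k * t / m) for t in range(m))
           for k in range(m)]
    return [o / m for o in out] if inverse else out

def estimate_on_pilots(y_p, x_p, rho):
    """Per-pilot least squares: q_hat = y / (sqrt(rho) * x) on each pilot sub-carrier."""
    return [y / (rho ** 0.5 * x) for y, x in zip(y_p, x_p)]

def interpolate(q_p, m_c):
    """IDFT the pilot estimates to time-domain taps, zero-pad, DFT back to m_c sub-carriers."""
    taps = dft(q_p, inverse=True)
    return dft(taps + [0] * (m_c - len(taps)))

M_C, N_P, RHO = 16, 4, 2.0                 # sub-carriers, pilots, power (toy values)
delta = M_C // N_P                          # pilot spacing
pilot_idx = [p * delta for p in range(N_P)]

taps_true = [0.8, 0.3 - 0.4j, -0.2j]        # effective channel taps for one RIS state
q_true = dft(taps_true + [0] * (M_C - len(taps_true)))

x_p = [1, -1, 1j, -1j]                      # pilot symbols on the comb
y_p = [RHO ** 0.5 * x * q_true[k] for x, k in zip(x_p, pilot_idx)]   # noiseless

q_hat = interpolate(estimate_on_pilots(y_p, x_p, RHO), M_C)
max_err = max(abs(a - b) for a, b in zip(q_hat, q_true))
```

Repeating this for each of the RIS training states yields the columns from which the direct and cascaded channels are subsequently separated.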
Therefore, the training overhead is reduced from $N + 1$ OFDM symbol durations to $N_g + 1$ OFDM symbol durations, where each OFDM symbol duration is composed of $(M_c + L_{cp})$ time-slots, with $L_{cp}$ being the length of the cyclic prefix. The grouping strategy trades off accuracy for overhead in order to improve the overall spectral efficiency, which is analytically characterized in [16]. Zheng et al. [48] extend this estimation technique to multiuser orthogonal frequency-division multiple access (OFDMA) systems. The same authors [49] propose a fast channel estimation scheme for reducing the training overhead in RIS-aided OFDM. The key idea is to use short OFDM symbols during the training phase, which reduces the number of time-slots consumed per OFDM training symbol relative to the $(M_c + L_{cp})$ time-slots required by [18].
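The overhead reduction from grouping follows directly from the symbol counts above. A small sketch with toy dimensions (chosen here for illustration) is:

```python
def training_time_slots(n_virtual, m_c, l_cp):
    """(n_virtual + 1) training OFDM symbols, each lasting (m_c + l_cp) time-slots."""
    return (n_virtual + 1) * (m_c + l_cp)

N, N_G = 64, 8          # physical RIS elements and number of groups (toy values)
M_C, L_CP = 128, 16     # sub-carriers and cyclic prefix length (toy values)

full = training_time_slots(N, M_C, L_CP)       # every element estimated separately
grouped = training_time_slots(N_G, M_C, L_CP)  # one aggregate channel per group
```

The saving in time-slots comes at the price of coarser RIS control, since all elements in a group must share one reflection coefficient during data transmission as well.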
The above works assume an ideal reflection model in which the RIS elements achieve the same amplitude and phase response across the entire OFDM band. Wenhao et al. [50] show that the practical response of RIS is tightly related to the frequency of the signal. Based on this, Yang et al. [51] study channel estimation for RIS-aided OFDM under a practical reflection model and finite-precision coefficients. While differing in its modeling, the estimation technique of [51] is similar to [18], as outlined above, and is omitted for brevity.

VII. REDUCING THE RIS ESTIMATION OVERHEAD
We begin by exploring savings in pilots and estimation overhead that arise from the multi-user nature of the RIS channel. An RIS-aided uplink system with $K$ single-antenna users and an $M$-antenna base station (BS) must estimate $KMN + KM$ links for RIS beamforming and equalization, which can be prohibitive either under massive MIMO or in large cells. Linear estimation techniques (Sec. III) require $K(N + 1)$ pilot transmission slots, growing linearly with the size of the RIS. We discuss avenues for reducing the pilot overhead.

A. Common RIS-BS Channel
Consider a scenario where a base station is aided by a single RIS for communication with multiple users. To describe channel estimation in this multi-user scenario, we adapt the system model from Eq. (1) for the multi-user uplink channel: where h_dk is the direct channel from a single-antenna user k to a multi-antenna base station, and g_k ∈ C^N is the channel from user k to the RIS. Recall that the combined (direct and cascaded) RIS-aided channel gains to be estimated were collected into a single matrix in Eq. (8); a specialization of that matrix for the case of a single-antenna user is given by: Among the quantities participating in this expression, the two vectors h_dk and g_k are distinct for different users, but the BS-RIS matrix H is common to all users. The user-by-user uplink channel estimation requires one pilot for estimating the direct channel and N pilots for estimating the cascaded channel, for a total of K(N + 1) pilot transmission slots. But the commonality of H among users hints at possible savings in the total number of needed pilot slots, which we now explore.
To begin with, the direct channels of all users are estimated by deactivating the RIS. This requires one pilot slot per user, but this step may not be crucial, because it is often the absence of a direct path that makes the RIS an attractive choice. In the next step, the cascaded channel H diag(g_1) is measured by emitting N pilots from User 1 to the base station. This is accomplished via N successive training states at the RIS, whose details are omitted for brevity. For User 2, we now need to measure H diag(g_2). This new matrix has columns that are co-directional with the columns of H diag(g_1), thus only one scalar scaling factor per column needs to be measured.
For N columns, this requires N new observation samples; however, reception at the multi-antenna base station provides M independent observations per pilot transmission. Therefore, after obtaining H diag(g_1), only ⌈N/M⌉ pilot transmissions are needed per additional user, as long as training states are designed properly. The design of training states that ensure the requisite linearly independent observations has been explained in [52], [53]. When M > N, following the above argument, it is easy to see that one pilot per user is sufficient for estimating the cascaded channels of Users 2, . . ., K. Therefore, the total training overhead of this scheme is given by where ⌈·⌉ denotes the smallest integer bigger than or equal to the argument. For massive MIMO systems with M > N, the overhead is J = 2K + N − 1, meaning that each user beyond the first one requires two pilot slots. This provides significant savings over the N + 1 slots needed conventionally.

Guan et al. [54] propose a slight modification of this technique, in which a few stationary nodes, called anchor nodes, are assumed to exist in the network. The anchor nodes transmit pilot signals and the base station estimates the anchor-RIS-BS channels. Due to the common RIS-BS channel, the User-RIS-BS channels are subsequently estimated with fewer pilot transmissions. Since the anchor nodes are stationary, the estimation of the anchor-RIS-BS channels is done less frequently than for the single reference user in [52], [53], which was not assumed to be stationary, thus resulting in additional savings. Guo and Lao [55] also explore the possibility of exploiting the common RIS-BS channel without requiring a reference user.
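The pilot counts discussed above can be tallied in a few lines. The sketch below is an assumption assembled from the counts stated in the text (one direct-channel pilot per user, N pilots for User 1, and ⌈N/M⌉ pilots for each additional user); it is not claimed to be the exact overhead expression of [52], [53], and the function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def conventional_overhead(k: int, n: int) -> int:
    """User-by-user estimation: (N + 1) pilot slots for each of the K users."""
    return k * (n + 1)


def common_channel_overhead(k: int, n: int, m: int) -> int:
    """Assumed tally: K direct-channel pilots, N pilots for User 1,
    and ceil(N / M) pilots for each of the remaining K - 1 users."""
    return k + n + (k - 1) * ceil_div(n, m)


if __name__ == "__main__":
    K, N, M = 8, 100, 128  # hypothetical massive MIMO example with M > N
    print("conventional  :", conventional_overhead(K, N))
    print("common channel:", common_channel_overhead(K, N, M))
    print("2K + N - 1    :", 2 * K + N - 1)
```

With M > N the assumed tally reduces to 2K + N − 1, which matches the massive MIMO figure quoted in the text.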
The methods in [52], [53] estimate the direct channel, and subsequently subtract it from the measurement intended for the cascaded channel, which is not MMSE optimal. A joint estimate of [h_d1, H diag(g_1)], i.e., User 1's direct and cascaded channels, has a lower mean squared error (MSE). Wei et al. [56] propose this joint estimation, and then the remaining user channels are estimated via the same technique as [52], [53].
This modification acknowledges and addresses the propagation of the error in the estimation of h_d1 when estimating the cascaded channel of User 1. The estimate of H is used for constructing the cascaded channels of other users too; therefore, in a sense, the errors committed in estimating the channel of User 1 can propagate into the estimation of other users' channels. However, since User 2 and subsequent users employ fewer pilots than User 1, it is not obvious that their channel measurements can be used to improve the estimate of H.

B. Slowly Varying BS-RIS Channel
Since the BS and RIS are static, the channel between them varies slowly. In comparison, the BS-user and RIS-user channels are more dynamic because of the mobility of the user. The high-dimensional but slowly varying BS-RIS channel can therefore be estimated less frequently, while the low-dimensional BS-user and RIS-user channels are estimated more frequently. To isolate the estimation of the BS-RIS link, [57] assumes a full-duplex base station. The base station emits pilots and listens for the reflection from the RIS. The self-interference of the full-duplex reception must be dealt with, and the BS-RIS channel recovered. Given the BS-RIS channel estimate, the direct BS-user channel and the RIS-user channel are estimated conventionally. The latter estimates are more frequent, but also require smaller overhead. If T_L denotes the coherence time of the BS-RIS channel and T_S denotes the coherence time of the BS-user and RIS-user channels such that T_L = αT_S, then the overhead of the two-timescale method is In practice, α ≫ 1 and hence the first term is small. For massive MIMO systems with many base station antennas M > N, the overhead becomes 2(N + 1)/α + 2K, i.e., after estimating the BS-RIS channel, each user needs two pilots.
Under α > 2, this method has smaller overhead compared with the method of Section VII-A, although one must be careful that the two methods address different channels and different base station capabilities, so they are not directly comparable.
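The comparison between the two overheads can be checked numerically. The following minimal sketch uses only the two massive MIMO (M > N) expressions quoted in the text, namely 2(N + 1)/α + 2K for the two-timescale method and 2K + N − 1 for the common-channel method of Section VII-A; the function names and sample values are hypothetical.

```python
def two_timescale_overhead(k: int, n: int, alpha: float) -> float:
    """Massive MIMO (M > N) overhead per short coherence interval: 2(N+1)/alpha + 2K."""
    return 2.0 * (n + 1) / alpha + 2.0 * k


def common_channel_overhead(k: int, n: int) -> int:
    """Massive MIMO (M > N) overhead of the common RIS-BS channel method: 2K + N - 1."""
    return 2 * k + n - 1


def break_even_alpha(n: int) -> float:
    """Value of alpha at which the two expressions coincide: 2(N+1)/(N-1)."""
    return 2.0 * (n + 1) / (n - 1)


if __name__ == "__main__":
    K, N = 8, 100  # hypothetical example
    for alpha in (2.0, 3.0, 10.0, 100.0):
        print(alpha, two_timescale_overhead(K, N, alpha), common_channel_overhead(K, N))
    print("break-even alpha:", break_even_alpha(N))
```

Equating the two expressions gives a break-even ratio of 2(N + 1)/(N − 1), which is slightly above 2 and approaches 2 as N grows, consistent with the threshold stated above.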

C. Infrequent RIS Coefficient Updates
Another source of potential savings in RIS-induced channels is to deliberately reduce the frequency with which RIS reflection coefficients are updated. As long as RIS coefficients are not updated, the RIS blends into the channel and effectively the system is reduced to a (multi-user) MIMO system, with conventional channel training and pilots.
Of course, this involves a tradeoff: fewer pilot slots are needed, but also, the match of RIS coefficients to the channel will go stale, therefore part of the beamforming gains of RIS will be lost.
Ideally, an analysis of this situation requires a temporally varying channel model with a corresponding temporal correlation. However, the work in this area has taken a different direction, by considering a channel model with line-of-sight and rich scattering components. For the Tx-RIS channel G, this means the Rician model, which we also saw in Eq. (2). A similar model is utilized for the RIS-Rx channels.
It is assumed that the infrequent update of RIS is able to fully capture the line-of-sight component, while not capturing anything about the rich scattering part of the model, even immediately following the pilot transmission.
This approximation is different from the common modeling of temporal variance in most wireless channels, in which channel knowledge is accurate at times that are proximate to the pilots, but it has the advantage of removing the complexities involved in the temporal dynamics of the channel. Thus, it reduces the problem to an equivalent problem involving a channel state that is partially known. The literature [58]-[62] refers to this new formulation of channel temporal dynamics as statistical channel state information.

The central idea of [58]-[62] is that the line-of-sight component has a longer coherence interval than the rich scattering component. Single-antenna mobiles estimate the end-to-end uplink channel with a single pilot at the smaller coherence interval, and the base-station beamforming is also updated at the smaller coherence interval.
However, the RIS coefficients are updated only at the longer line-of-sight coherence intervals. This creates significant savings, since most of the pilot slots in the RIS-induced channel, especially for single-antenna mobiles, are needed for estimating and updating the RIS coefficients.
Several works have attempted to maximize the ergodic downlink rates to single-antenna mobiles with beamforming f, either in the single-user or multiple-user scenarios. For the single-user case, the received signal is given by The best beamformer f is found for a given set of RIS coefficients Φ, and Φ is then set to the (fixed) RIS coefficients that are statistically the best over the variations of the channel.
Han et al. [58] achieve the inner maximization in Eq. (37) via maximal ratio transmission, and adjust the reflection coefficients based on an outer bound on the ergodic rate. Hu et al. [59] maximize Eq. (37) via an alternating optimization method, Zhao et al. [60] use a penalty dual decomposition method, Zhi et al. [61] achieve minimum user rate maximization via a genetic algorithm, and Gan et al. [62] propose methods based on ADMM, fractional programming, and alternating optimization.
Several important points and open problems remain for consideration in this area. To begin with, these methods are based on the assumption that the RIS will be changed infrequently, but also calculate and optimize ergodic capacity. Therefore, the practical implementation of these techniques requires an outer code that goes across many coherence intervals of the slower channel. In many such cases, outage capacity or throughput may be a more suitable metric for optimization, and there is room for future work in this area.
Another useful direction is to find simplifications and approximations of the expression in Eq. (37) in order to recognize trends and/or suggest different approaches. In this area, there is a need for achievable rate (inner bound) expressions rather than outer bounds. Inner bounds for this expression have not been developed at the time of the writing of this paper.

D. Opportunistic RIS
Another strategy for reducing the estimation overhead is inspired by an idea that harks back to the concept of opportunistic transmission [65]-[67]. Q randomly selected vectors {ψ_1, . . ., ψ_Q} are assigned one-by-one as RIS phase vectors. In each instance, the RIS changes the scattering environment randomly, so there is no beamforming in the usual sense of the word. For each of these Q scattering conditions, the end-to-end multiple-input single-output (MISO) channel is measured using a few pilots, and the best one is chosen for one block of transmission. An and Gan [68] propose the above approach in narrowband channels, and study bounds on its ergodic performance.
The set {ψ_1, . . ., ψ_Q} is called a codebook in [68]; however, this is a slight misnomer since this set need not be determined or agreed upon ahead of time, is statistically independent of signals emitted from transmit antennas, and is not needed at the receiver for decoding. The connection of this class of techniques with opportunistic transmission is evidenced by the appearance of the order statistics of (induced) channels in [68, Proposition 1]. An et al. [69] extend this idea to OFDM transmission.
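The selection step of the opportunistic approach can be sketched as follows. This is a simplified illustration, not the scheme of [68]: it assumes a single-antenna transmitter (so the end-to-end channel is the scalar h_d + Σ_n h_n ψ_n g_n), Rayleigh-distributed channel coefficients, and noiseless measurement of each induced channel. All function names are hypothetical.

```python
import cmath
import random


def rayleigh(n, rng):
    """n i.i.d. unit-variance circularly symmetric complex Gaussian coefficients."""
    s = 2 ** -0.5
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]


def random_phase_vector(n, rng):
    """Unit-modulus RIS coefficients with independent uniform phases."""
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi)) for _ in range(n)]


def effective_gain(h_d, h, g, psi):
    """Power gain |h_d + sum_n h_n * psi_n * g_n|^2 of the induced scalar channel."""
    return abs(h_d + sum(hn * pn * gn for hn, pn, gn in zip(h, psi, g))) ** 2


def opportunistic_select(h_d, h, g, q, rng):
    """Try q random RIS phase vectors and keep the one with the largest gain."""
    candidates = [random_phase_vector(len(g), rng) for _ in range(q)]
    gains = [effective_gain(h_d, h, g, psi) for psi in candidates]
    best = max(range(q), key=gains.__getitem__)
    return candidates[best], gains[best], gains


if __name__ == "__main__":
    rng = random.Random(1)
    N, Q = 32, 16  # hypothetical sizes
    h, g, h_d = rayleigh(N, rng), rayleigh(N, rng), rayleigh(1, rng)[0]
    _, best_gain, gains = opportunistic_select(h_d, h, g, Q, rng)
    print("mean gain over candidates:", sum(gains) / Q)
    print("selected (best) gain     :", best_gain)
```

The pilot cost scales with Q rather than with N, because each candidate requires only the end-to-end channel to be measured, not the N cascaded coefficients.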

VIII. MACHINE LEARNING BASED CHANNEL ESTIMATION
Machine learning is being actively investigated for channel estimation; this section explores machine learning channel estimation in the context of RIS.
Among the early attempts at using machine learning for RIS channel estimation was a non-parametric convolutional neural network estimator by Elbir et al. [70], applied to RIS-aided downlink mmWave channels. The estimation is achieved in two phases (see Fig. 6), somewhat similar to other methods seen earlier. In the first phase, the base station transmits pilots while the RIS elements are inactive (turned off) so that the direct channel(s) are estimated at the receivers, each of them operating an instance of the convolutional neural network. In the second phase, the RIS assumes (several) training states while the base station transmits pilots. Each of the users employs the received pilots in this phase, and in combination with the estimated direct channels (obtained in the previous phase), produces an estimate of the cascaded channel using the same convolutional neural network architecture.

A. Post Processing Least Squares Estimates
Motivated by image denoising using neural networks, Kundu and McKay [71] model the problem of RIS channel estimation as that of denoising the least squares solution (see Fig. 7). Specifically, in the first step, a least squares estimate of the direct and cascaded channels is obtained using pilots and DFT training states. The obtained least squares estimate is viewed as a noisy version of the original channel. This is followed by a post-processing step in which the Denoising Convolutional Neural Network (DnCNN) [72] or the Fast & Flexible Denoising Network (FFDNet) [73] is used. The least squares channel estimate is the input to the neural network, whose output is an estimate of the least squares estimation error. The post-processed estimate is obtained by subtracting the estimate of the least squares estimation error from the least squares estimate.
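The two-step structure described above (least squares with DFT training states, followed by subtraction of a predicted error) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are not taken from [71]: single-antenna transmitter and receiver, unit pilot symbols, and a channel vector c = [h_d, c_1, ..., c_N] of length N + 1. The trained network is replaced by a hypothetical placeholder callable `error_predictor`.

```python
import cmath


def dft_training_matrix(n):
    """n x n DFT matrix; row j holds the coefficients seen by c in pilot slot j.
    The first column is all ones and multiplies the direct channel."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def observe(psi, c, noise):
    """Received pilots y_j = sum_k psi[j][k] * c[k] + noise[j]."""
    return [sum(p * ck for p, ck in zip(row, c)) + nj for row, nj in zip(psi, noise)]


def least_squares(psi, y):
    """c_hat = Psi^H y / n, which is the LS solution because Psi^H Psi = n I."""
    n = len(psi)
    return [sum(psi[j][k].conjugate() * y[j] for j in range(n)) / n for k in range(n)]


def residual_denoise(ls_estimate, error_predictor):
    """Post-processing: subtract the predicted LS error from the LS estimate.
    `error_predictor` stands in for a trained denoising network (placeholder)."""
    predicted_error = error_predictor(ls_estimate)
    return [c - e for c, e in zip(ls_estimate, predicted_error)]


if __name__ == "__main__":
    c_true = [0.3 + 0.1j, -0.2j, 0.5 + 0j, 0.1 - 0.4j, -0.7 + 0.2j]  # N = 4
    psi = dft_training_matrix(len(c_true))
    noise = [0.01 + 0j, -0.02j, 0.03 + 0j, 0j, 0.01j]
    c_ls = least_squares(psi, observe(psi, c_true, noise))
    # An untrained placeholder that predicts zero error leaves the LS estimate unchanged.
    c_post = residual_denoise(c_ls, lambda est: [0j] * len(est))
    print(max(abs(a - b) for a, b in zip(c_post, c_true)))
```

Predicting the error rather than the channel itself mirrors the residual-learning structure described in the text; the quality of the final estimate depends entirely on how well the network predicts that error.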
The optimal minimum mean squared error estimator of the channel gains is the conditional mean, but since the cascaded channel is non-Gaussian, this estimator is non-linear and difficult to characterize. This has been the main motivation mentioned in [71] for a neural network approach. One can infer that the LMMSE estimate, being linear, is akin to a first-order term of a Taylor series expansion for the conditional mean estimator, and the neural network attempts to approximate the higher-order terms.
Chang et al. [74] propose convolutional deep residual networks for denoising the least squares solution. Nipuni et al. [75] employ neural network denoising for wideband channel estimation in RIS-aided OFDM. Shicong et al. [36] propose an RIS architecture with a few active elements for initially estimating the low-dimensional channel, compressive sensing reconstruction of the complete high-dimensional channel, and a further refinement with a convolutional neural network. Mao et al. [76] refine the channel estimates produced by orthogonal matching pursuit using deep residual networks in RIS-aided mmWave channels. Ye et al. [77] and Jin et al. [78] explore generative adversarial networks for estimation in RIS-aided mmWave massive MIMO.

B. Partial CSI
In order to reduce the training overhead in deep learning RIS channel estimation, Gao et al. [79] propose a predictive neural network in the context of RIS-aided uplink massive MIMO. As shown in Fig. 8, the proposed method works in three stages. In the first stage, the RIS is turned off and the direct channel is estimated using the least squares method, and is further refined using a fully connected neural network. In the second stage, only a part of the RIS elements is activated (N_1 < N), and the cascaded channels corresponding to the activated RIS elements are estimated using the least squares method with N_1 × N_1 DFT training states and further refined using a denoising convolutional neural network. In the third stage, the cascaded channels corresponding to the inactive RIS elements are predicted using a fully connected inactive-RIS channel prediction neural network. A geometric channel model is employed for the base-station to RIS channel, in which the channel matrix is generated from a geometric model that assigns the same gains and angles of arrival/departure to different RIS elements, with an implicit underlying assumption that the scatterers are far from the RIS and the base station. The proposed estimation needs N_1 + 1 pilots instead of N + 1 pilots, reducing the overhead.

RIS channel estimation in multiuser settings relies on the principle of channel reciprocity for reducing the estimation and feedback overhead, since reciprocity enables downlink precoding using uplink pilots/estimation in time division duplex operation [57], [83]-[85]. Channel reciprocity holds for many boundary conditions occurring in wireless communication, including reflections from large objects as well as scattering. However, in the case of RIS-assisted systems, the literature is inconclusive (see Fig. 9). Chen et al. [86] claim, based on an equivalent circuit model, that angle reciprocity only holds for small angles with respect to the normal. In the opposite direction, Tang et al. [87] invoke the Rayleigh-Carson theorem to conclude that RISs enjoy reciprocity, but do not elaborate.
Liang et al. [88] state that the angle reciprocity depends on the design of the RIS surface, and propose a structure that achieves reciprocity for a wide range of angles.
A resolution of the differences between these results, and a conclusive determination of the conditions under which RIS-induced channels are reciprocal or non-reciprocal, will be welcomed. The system model and applications for a non-reciprocal RIS may be of interest in future applications, but are in need of verifiable theory and/or experimental evidence.

B. Mutual Coupling
A common assumption in RIS modeling is that the passive elements are spaced half a wavelength apart and the mutual coupling among the elements is negligible, allowing them to be controlled individually and independently. However, in practical planar RIS structures with fixed aperture, it is desirable to increase the number of elements by reducing the inter-element spacing in order to increase the directivity of the reflected waves and thereby improve the received power. Reducing the inter-element spacing results in a non-negligible dependency/connection between the impedances of neighboring elements, which affects the channel model, estimation, and the design of reflection coefficients [89], [90]. Gradoni and Di Renzo [91] have proposed an electromagnetic-compliant end-to-end channel model for RIS-aided communication that accounts for mutual impedance among RIS elements, while also including the effects of antenna elements at the transmitter and receiver. This impedance-based communication model is utilized in [92] to maximize the end-to-end received power by optimizing the RIS tunable load impedances in a SISO system, which is further generalized in [93] to MIMO systems.

In the end-to-end channel modeling of the aforementioned references, the statistical components of the Tx-RIS and RIS-Rx channels are intertwined with the circuit model parameters, which is not desirable from the signal processing/system design perspective. A more interesting and useful model is one that retains the factored form GΦH, separating the statistical part of the Tx-RIS and RIS-Rx channels, but incorporating the effects of mutual impedances and coupling into the RIS matrix Φ, making it potentially non-diagonal and a function of circuit model parameters. Obtaining such a factored model and the associated estimation technique is a potential direction for future research.

C. Perfect Absorption/Reflection at RIS
Several channel estimation methods, such as those based on matrix completion and channel decomposition, depend on the ability to completely deactivate some RIS elements. Some methods depend on separately estimating the direct channel between the transmitter and the receiver, by eliminating the effect of RIS reflections, which requires deactivating all the elements. This requires the incident energy to either be completely absorbed by the RIS, or the incident waves to pass through the RIS. The feasibility of perfect absorption is debatable, with partial results whose applicability to communication systems remains unverified. Some works that study the electromagnetics of metasurfaces suggest that perfect absorption is possible at resonant frequencies with proper tuning of impedance [94]-[97], but it is unclear if/how the proposed structures and the associated methods can be utilized in the context of RIS-aided communication. The issue of perfectly deactivating the passive elements is raised in [10], [37], [70], but the question of its physical feasibility remains unresolved. Mishra and Johansson [98] assume that perfect reflection and absorption are unrealizable and incorporate two constants as implementation errors in the system model. To the best of the authors' knowledge, no currently available study in the open literature offers an in-depth and conclusive treatment of the feasibility of perfect deactivation of RIS elements and related design issues. Also, analyzing the effect of imperfect deactivation on the accuracy of estimation methods is an important future direction for research.
In a related direction, several works explore whether and how the phase and amplitude responses of an RIS element are related [99]-[101]. A few studies in reflectarrays and metasurfaces [99], [102]-[104] aim at designing structures that allow near-independent control of reflection amplitude and phase; however, their applicability to RIS-aided communication has not been established.

D. Frequency Dependence of RIS Elements
RIS-assisted wideband communications [32], [50], [105] require RIS elements to operate efficiently throughout the frequencies of the band. In typical RIS constructions, however, the reflection coefficients are tuned by switching on or off various reactive elements or patterns that are connected to the RIS element. The effect of these tuning devices can be modeled by an equivalent circuit whose load impedance varies with the carrier frequency. The phase shift applied to the incident wave via the tunable elements is calculated at a specific frequency, often the resonant frequency of the RIS element. Within a small deviation of this center frequency, the phase shift remains linear, but across wider frequencies of operation, the phase shift might vary nonlinearly. In that case, the array factor will vary across frequency, and the beam may not retain sharpness across the band of frequencies in which the RIS must operate. Several remedies have been proposed in the neighboring literature on reflectarrays, e.g., coupling the elements to true time delay lines [106] or coupling multiple resonance elements [107]-[109]. The applicability of these methods to RIS-aided communications is yet to be explored, and is a direction for future study.

E. Near Field Issues
The transmitter/receiver is said to be in the far field of the RIS if it is at a distance greater than the Fraunhofer distance 2D_RIS^2/λ_c, where D_RIS is the largest aperture of the RIS and λ_c is the wavelength corresponding to the carrier frequency f_c [110]. With the far-field assumption, the incident/reflected wave from the RIS can be assumed planar, which simplifies calculations (see Fig. 10). Larger RIS structures can achieve superior SNR [7], [111], but if the transmitter and receiver locations are fixed, a sufficiently large RIS array will violate the far-field assumption [112].
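The far-field condition is easy to evaluate numerically. The sketch below computes the Fraunhofer distance 2D_RIS^2/λ_c with λ_c = c/f_c; the function names and the example aperture and carrier frequency are hypothetical values chosen for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fraunhofer_distance(aperture_m: float, carrier_hz: float) -> float:
    """Far-field boundary 2 * D^2 / lambda for an aperture D at carrier frequency f_c."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 2.0 * aperture_m ** 2 / wavelength


def in_far_field(distance_m: float, aperture_m: float, carrier_hz: float) -> bool:
    """True if a node at the given distance lies beyond the Fraunhofer distance."""
    return distance_m > fraunhofer_distance(aperture_m, carrier_hz)


if __name__ == "__main__":
    # Hypothetical example: a 1 m aperture at 28 GHz.
    print(fraunhofer_distance(1.0, 28e9))
    print(in_far_field(50.0, 1.0, 28e9))
```

Because the boundary grows with the square of the aperture, doubling the RIS dimension quadruples the distance below which the planar-wave approximation is no longer justified. For the example above the boundary is roughly 187 m, so a user 50 m away would be in the near field.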
The near-field scenario arises, for example, when a large RIS is mounted on a large portion of the facade of a building for servicing users in the street. Without the far-field assumption, the incident waves at different elements will have unequal angular directions and polarization. The modeling of RIS in the near-field scenario is considered in [113]-[115]. The work in [116] accounts for the difference in the effective area of the elements from different

This work was supported in part by the National Science Foundation under Grants 1956213 and 2148211.

3) Hadamard Training: Another choice of RIS training states is to use the columns of an N × N Hadamard matrix, which again results in $\Psi\Psi^H = N I_N$ and therefore $Z^H Z = M_t N I_{M_t M_r N}$ via the identity (12). The least squares and MMSE estimators with Hadamard training are the same as with DFT training and hence are omitted for brevity.

Remark 1. (Ambiguity Problem) For any invertible diagonal matrix $D \in \mathbb{C}^{N \times N}$, $\tilde{G} \triangleq D^{-1} G$, and $\tilde{H} \triangleq H D$, we have $H \Phi G = \tilde{H} \Phi \tilde{G}$.
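The orthogonality property of Hadamard training states can be verified directly. The sketch below builds a Hadamard matrix with the Sylvester construction, which is one common choice and an assumption here since the text does not specify a construction; it requires N to be a power of two. Because the entries are real (±1), the Hermitian transpose reduces to the ordinary transpose.

```python
def sylvester_hadamard(n: int):
    """n x n Hadamard matrix via the Sylvester recursion H_2m = [[H, H], [H, -H]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("the Sylvester construction needs n to be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def gram(psi):
    """Psi Psi^T for a real matrix given as a list of rows."""
    n = len(psi)
    return [[sum(psi[i][k] * psi[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    N = 8  # hypothetical number of RIS elements
    psi = sylvester_hadamard(N)
    g = gram(psi)
    print(all(g[i][j] == (N if i == j else 0) for i in range(N) for j in range(N)))
```

A practical attraction of ±1 training states is that they need only two phase levels (0 and π) at each RIS element, whereas DFT training states in general require N distinct phase levels.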

Fig. 3. Training-based bounds on capacity with canonical and DFT training.

Figure 3 shows the training-based bounds on the capacity of an RIS-assisted system with 32 RIS elements when the channel coherence interval is T = 150. The spectral efficiency with DFT training is higher compared with canonical training. Specifically, with equal power allocation between training and data, DFT training achieves a gain of 3.5 bits/s/Hz compared with canonical training. With optimal power allocation for both, DFT training achieves a gain of 2 bits/s/Hz over canonical training. The reason for the under-performance of canonical training is that the magnitude of the RIS training states multiplies the pilot power; therefore, the zero coefficients in canonical training reduce the received signal-to-noise ratio for pilots, and induce a penalty. On the other hand, DFT training activates all the RIS elements in each training time slot, thereby efficiently utilizing the available pilot power.
Under low SNR conditions, this technique claims better performance than a (corresponding) two-phase least squares technique. Under high SNR conditions, the neural network technique has a performance ceiling, while the least squares techniques do not. The experiments involved a 64-antenna base station, a 100-element RIS, 8 single-antenna users, and a geometric channel with 10 paths. The neural network has an input layer, an output regression layer providing complex-valued channel estimates, three convolutional layers, each with 256 filters of size 3 × 3, and two fully connected layers with 1024 and 2048 nodes. The input layer has size √M × √M × 3 for direct channel estimation

Fig. 7. RIS channel estimation via denoising of least squares estimate using neural networks.

Table I
Symbol   Meaning
c        end-to-end RIS-induced channel
H_d      direct (not through RIS) channel
Z        training matrix, includes pilots & RIS training states

TABLE I: Nomenclature

A. Narrowband/Frequency-Flat Channels
Consider a narrowband point-to-point communication between a transmitter with M_t antennas and a receiver with M_r antennas, assisted by an RIS with N elements. Let G ∈ C^{N×M_t} denote the channel between the transmitter and RIS, H ∈ C^{M_r×N} the channel between the RIS and the receiver, and H_d ∈ C^{M_r×M_t} the direct channel between the transmitter and the receiver. The RIS reflection is represented with a matrix Φ = diag(ψ_1, ψ_2, . . ., ψ_N) whose passive elements

explore generative adversarial networks for estimation in RIS-aided mmWave massive MIMO.
Xu et al. [80] propose an ordinary differential equation (ODE) based convolutional neural network for predicting the channel corresponding to inactive RIS elements. Shtaiwi et al. [81] assume a multi-user uplink channel in which the channels of different users are highly correlated, so that only a few users need to transmit pilots, and the channels of the remaining users may be predicted from the first few. A neural network is employed for the prediction. In RIS-aided uplink communication, Xu et al. [82] use spatial correlation between the channels of different RIS elements, as well as temporal correlation of time-varying channels, to reduce the communication overhead. For exploiting spatial correlation, some RIS elements are turned off, hence the corresponding channels are not directly estimated by pilots, but are interpolated from other RIS channel estimates using a convolutional neural network. For temporal correlation, a recurrent neural network is used to interpolate the channel values between two pilot transmissions.

IX. FRONTIERS OF RIS CHANNEL MODELING
Powerful channel estimation techniques depend on accurate, yet convenient, channel models. RISs are relatively new devices whose channel modeling brings together aspects of electromagnetic engineering, hardware constraints, and communication system concepts. Certain frontiers of RIS channel modeling are still being explored; this section outlines several issues of contemporary interest and investigation in RIS channel modeling. A summary of the contents of this section appears in Table II.

TABLE II: Frontiers of RIS channel modeling

A. Channel Reciprocity