Semi-Blind Channel Estimation for Intelligent Reflecting Surfaces in Massive MIMO Systems

An intelligent reflecting surface (IRS) is considered a promising technology for enhancing the transmission rate in cellular networks. The improvement comes from using a large IRS with a high number of passive reflecting elements, optimized to focus the incident beams towards the receiver. However, to achieve this beamforming gain, the channel state information (CSI) must be efficiently acquired at the base station (BS). Unfortunately, traditional pilot-based estimation is challenging because the passive IRS has no radio frequency (RF) chains and the number of channel coefficients is proportional to the number of IRS elements. In this paper, we propose a novel semi-blind channel estimation method in which the reflected channels are estimated using not only pilot but also data symbols, reducing the channel estimation overhead. The performance of the system is analytically investigated in terms of the uplink achievable sum-rate. The proposed scheme achieves higher energy and spectrum efficiency while being robust to channel estimation errors. For instance, the proposed scheme achieves an 80% increase in spectrum efficiency compared to pilot-only schemes for IRSs with $N=32$ elements.


I. INTRODUCTION
Next-generation networks are envisioned to support high data rates, low latency, and low power consumption. Such requirements are essential to support demanding new use cases such as extended reality, haptic communications, and enhanced broadband services. Unfortunately, conventional techniques that optimize only the transceivers cannot cope with these new requirements.
In this regard, the concept of smart radio environments (SREs), where the propagation channel can also be controlled, has been introduced to enhance communication performance. The IRS can be seen as a step toward SREs, opening the door for green and sustainable future cellular networks [1], [2], [3]. An IRS relies on low-cost passive elements that reflect the received signal in a specific direction by adjusting the phase shift at each element, such that the desired signals add constructively and the interference signals add destructively at the user side.
The associate editor coordinating the review of this manuscript and approving it for publication was Parul Garg.
To properly optimize the IRS phases, the CSI must be efficiently estimated. The CSI is usually acquired, within the coherence period, using orthogonal pilot sequences that are assigned to each user. Nevertheless, channel estimation in IRS-aided environments is challenging for two main reasons: i) traditional pilot-based schemes cannot be directly used to estimate the IRS channels, e.g., the IRS-BS and IRS-user links, because the IRS is passive and has no RF chains; ii) the number of IRS elements is typically high, which escalates the number of channel coefficients to be estimated and, consequently, the number of pilots needed.
To address these challenges, we propose a new semi-blind channel estimation method, where the direct channels (BS-users) are estimated using orthogonal pilots (overhead symbols), while the reflected channels (BS-IRS-users) are estimated using data (information) symbols. The proposed approach allows increasing the number of users and IRS elements without reducing the achievable rate. In the following, we start with the related work and then summarize the main contributions.

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. RELATED WORK
In this subsection, we summarize the existing IRS channel estimation techniques and survey blind and semi-blind channel estimation algorithms. A simple ON/OFF reflection pattern was proposed in [4], [5], and [6], where only one IRS element is switched ON in each time slot. The reflected channel is thus estimated without interference from the other IRS elements. However, this ON-OFF technique yields a low achievable rate because most of the coherence period is consumed by pilot transmission. Another approach was proposed in [7], where a three-phase channel estimation method is suggested for IRS multi-user communication. This approach exploits the fact that the channel between each IRS element and the BS is common to all users. Therefore, this channel can be estimated once and reused for the other users. The authors in [8] propose a two-phase channel estimation strategy, where the direct and reflected channels associated with a typical user are estimated in the first phase. In the second phase, the CSI associated with the other users is estimated. A low-complexity channel estimation scheme consisting of direction-of-arrival and path gain estimation was proposed in [9]. The authors in [10] present a new sparse recovery problem for downlink channel estimation, which exploits the sparsity of massive MIMO channels at the BS side to significantly reduce computational complexity and training overhead.
Another challenge in IRS channel estimation arises from its low-cost passive elements, which cannot transmit pilot/training signals to assist channel estimation. There are two approaches to IRS channel estimation, depending on the IRS configuration: 1) semi-passive IRS, where sensing devices (receive RF chains) are attached to the IRS; 2) fully-passive IRS, where no sensing devices are used in the IRS. For semi-passive IRS, several channel estimation approaches have been proposed [11], [12], [13].
For semi-passive IRS, an alternating optimization approach was proposed in [11], where analog combining and random spatial sampling are used for IRS channel estimation. In [12], a direction-of-angle estimation issue was investigated in an IRS-aided MIMO system, where a deep neural network was proposed to reduce the quantization error caused by the analog-to-digital converter (ADC) used in each RF chain. An IRS channel estimation algorithm based on compressed sensing and deep learning was proposed in [13] for IRS-aided SISO systems.
Regarding fully-passive IRSs, researchers have studied channel estimation for various scenarios: single-user/multi-user, narrow-band/broadband, and continuous/discrete phase shifts [5], [6], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. The authors in [14] and [15] overcome the power loss issue that arises from ON/OFF schemes, where only one element is switched ON in each time slot. They proposed an all-ON IRS pattern with orthogonal reflection coefficients drawn from the DFT matrix. A simple grouping algorithm was developed in [5], where IRS elements with spatially correlated channels are grouped. This grouping technique decreases the training overhead of IRS channel estimation. Moreover, to further improve the training efficiency, a novel approach based on the joint design of the pilot sequence and the IRS reflection pattern was proposed in [16]. In the multi-user MIMO scenario, the training overhead is proportional to the number of both BS antennas and IRS elements. To handle this issue, different IRS channel properties (sparsity, channel correlation, and low rank) can be exploited to reduce the channel estimation overhead [6], [17], [18], [19], [20]. Furthermore, deep learning-aided downlink/uplink channel acquisition approaches and hierarchical search codebook designs have been developed to improve channel estimation in passive MIMO/MISO systems [21], [22], [23], [24], [25]. Unfortunately, these approaches require a large overhead in terms of training pilots.
For blind and semi-blind channel estimation, several approaches have been used in large-scale MIMO networks [26], [27], [28], [29]. The received user power is subject to an ambiguity in these approaches; however, this ambiguity can be resolved using partial training (i.e., a data-aided rather than a totally blind approach) [26], [29] and/or by imposing a power margin between the interfering signals and the signals of interest [27]. In blind channel estimation [26], [27], the estimation is done without any pilot transmission. In general, semi-blind channel estimation [28], [29] exploits both the pilot sequences and the data stream, which makes it more robust than blind channel estimation. The authors in [28] and [29] jointly estimate all the channel coefficients using the maximum a posteriori estimator. These semi-blind schemes have two main drawbacks: 1) the computational complexity is high [28]; 2) the accuracy of the channel estimation is low, especially when the channels of users who share the same set of training sequences are correlated. In this regard, we propose a novel semi-blind channel estimation method that addresses these drawbacks.

B. MAIN CONTRIBUTIONS
In this paper, we propose a novel semi-blind (data-aided) channel estimation method for multi-user MIMO networks assisted by an IRS. This semi-blind approach improves the achievable rate by reducing the channel estimation overhead due to pilot transmission. More precisely, data (information symbols) are transmitted instead of pilots to estimate some of the channel coefficients. Using data rather than pilots brings several challenges for traditional channel estimation: i) the data is not known a priori at the receiver, so traditional channel estimation methods (e.g., least squares (LS), minimum mean-square error (MMSE)) cannot be directly applied; ii) the data symbols are not typically orthogonal, causing interference at the receiver.
In this regard, we propose to estimate the direct (user-BS) channels using a conventional pilot transmission technique. Then, to overcome the previous challenges, we exploit two features of the signals: i) signals arrive from two distinct paths (i.e., direct and reflected) through quasi-orthogonal channels; ii) the data symbols from different users are orthogonal for sufficiently long sequences.
To analyze the performance of the proposed scheme, we derive the conditional achievable sum rate during the coherence time, accounting for the channel estimation overhead. To simplify the expression, we obtain a deterministic equivalent for the SNR and the rate as the problem dimensions tend to infinity. As a benchmark, the achievable rate of traditional pilot-based channel estimation is derived. The novel scheme enhances the spectral efficiency of multi-user MIMO wireless communication networks employing passive IRS elements. We note that the proposed semi-blind scheme is particularly useful when: 1) a large number of IRS elements is considered; 2) the cell has many users; 3) the coherence time of the channel is small [30].
The contributions of this paper are summarized as follows.

1) A semi-blind information-aided channel estimation technique is proposed to improve the achievable rate.
2) We derive the conditional signal-to-noise ratio (SNR) and achievable uplink rate of the proposed scheme.
3) We obtain a lower bound expression for the outage probability.
4) We derive asymptotic closed-form approximations for the SNR and the rate.
5) We show the advantage of the proposed scheme in terms of the achievable rate over previous schemes: the ON-OFF scheme [4], [5], [6], the three-phase channel estimation method [7], and the two-phase channel estimation scheme [8].
Notation: We use boldface lower-case letters for column vectors, $\mathbf{x}$, and boldface upper-case letters for matrices, $\mathbf{X}$. Let $\mathbf{X}^{-1}$, $\mathbf{X}^{H}$, $\mathbf{X}^{T}$, and $\mathbf{X}^{\dagger}$ denote the inverse, conjugate transpose (Hermitian), transpose, and pseudo-inverse of $\mathbf{X}$, respectively. The matrix trace is denoted by $\mathrm{tr}(\mathbf{X})$, and $\|\cdot\|$ is the Euclidean norm. A circularly symmetric complex Gaussian random vector $\mathbf{x}$ is denoted by $\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\phi})$, where $\boldsymbol{\mu}$ is the mean and $\boldsymbol{\phi}$ is the covariance matrix. The set of all complex numbers is denoted by $\mathbb{C}$, with $\mathbb{C}^{N \times 1}$ and $\mathbb{C}^{N \times M}$ being the generalizations to vectors and matrices, respectively. The $M \times M$ identity matrix is written as $\mathbf{I}_M$.

II. SYSTEM MODEL
We consider a synchronous cellular system consisting of a single cell with one BS. The BS, equipped with $M$ antennas, serves $K$ single-antenna users. To improve the spectral efficiency, an IRS with $N$ passive reflecting elements is deployed (cf. Figure 1). A block-fading channel is assumed, where each channel is static for a coherence period of length $S = W_c \times T_c$ symbols, where $T_c$ is smaller than the channel coherence time of all users and $W_c$ is smaller than the coherence bandwidth of all users. Let $\mathbf{h}_j \in \mathbb{C}^{M \times 1}$ be the direct channel between user $j$ and the BS. The direct channel is modelled as a zero-mean circularly symmetric complex Gaussian channel, $\mathbf{h}_j \sim \mathcal{CN}(\mathbf{0}, d(z_j)\mathbf{I}_M)$, where $z_j \in \mathbb{R}^2$ is the location of user $j$ and $d(z)$ is the channel variance that accounts for the shadowing effect and the path loss between the BS and a user at location $z$. The uplink reflected channel from user $j$ to the BS through the $n$-th IRS element is given by

$$\mathbf{g}_{jn} = t_{jn}\,\mathbf{r}_n, \tag{1}$$

where $t_{jn} \in \mathbb{C}^{1 \times 1}$ is the channel coefficient between user $j$ and the $n$-th IRS element, and $\mathbf{r}_n \in \mathbb{C}^{M \times 1}$ is the channel between the $n$-th IRS element and the BS. The reflected channel $\mathbf{g}_{jn}$ has a covariance matrix equal to $d_n(z_j)\mathbf{I}_M$, where $d_n(z_j)$ is the channel variance that accounts for the shadowing effect and the path loss of the reflected channel. In practice, an initial training phase is required to obtain the first estimates of the variances $d_n(z_j)$. We assume that the users' positions change slowly, so the BS can track the values of $d(z_j)$ and $d_n(z_j)$ over a long period of time [31], [32].
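The channel model above can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function names `crandn` and `draw_channels` are hypothetical, the IRS-BS vectors $\mathbf{r}_n$ are assumed to have unit-variance entries, and the user-IRS coefficients $t_{jn}$ are assumed to carry the whole variance $d_n(z_j)$, so that the product $t_{jn}\mathbf{r}_n$ has covariance $d_n(z_j)\mathbf{I}_M$ as stated in the text.

```python
import numpy as np

def crandn(rng, *shape):
    """Draw i.i.d. circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def draw_channels(rng, M, d_direct, d_reflect):
    """Return direct channels h (K, M) and reflected channels g (K, N, M).

    d_direct[j]     : variance d(z_j) of the direct channel of user j
    d_reflect[j, n] : variance d_n(z_j) of the reflected channel via element n
    """
    d_direct = np.asarray(d_direct, dtype=float)
    d_reflect = np.asarray(d_reflect, dtype=float)
    K, N = d_reflect.shape
    # Direct channels: h_j ~ CN(0, d(z_j) I_M)
    h = np.sqrt(d_direct)[:, None] * crandn(rng, K, M)
    # IRS element n -> BS, unit variance per entry (assumption of this sketch)
    r = crandn(rng, N, M)
    # User j -> IRS element n, carrying the variance d_n(z_j) (assumption)
    t = np.sqrt(d_reflect) * crandn(rng, K, N)
    # Reflected channel through element n: g_jn = t_jn * r_n
    g = t[:, :, None] * r[None, :, :]
    return h, g
```

Because every user reflects through the same $\mathbf{r}_n$, the vectors $\mathbf{g}_{jn}$ of different users through one element are collinear in this sketch; only the scalar $t_{jn}$ differs.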

A. UPLINK CHANNEL ESTIMATION
The total number of channel coefficients that need to be estimated is $KM + KMN$ for the direct and reflected channels together. To estimate all these channels, we use a pilot-based scheme for the direct channels and a data-based scheme for the reflected channels. Our framework is divided into $N + 2$ stages (cf. Figure 2). In stage-0, pilot transmission is used to estimate the direct channels $\mathbf{h}_j$, $\forall j \in \{1, 2, \cdots, K\}$, while all IRS elements are OFF. The reflected channels $\mathbf{g}_{jn}$, $\forall n \in \{1, 2, \cdots, N\}$, are estimated sequentially in stage-1 through stage-$N$ using data transmission (carrying information) rather than pilots. More precisely, in stage-1, only one IRS element is turned ON (it can be any element; for simplicity, we assume it is the first IRS element).
In stage-$n$, for $n \in \{2, 3, \cdots, N\}$, the first and the $n$-th elements are turned ON, while all the remaining elements are kept OFF. In the last stage ($N + 1$), data is sent from all $K$ users while all IRS elements are ON. This procedure is discussed in detail below.
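The staged framework described above can be written down as a simple schedule. The sketch below is illustrative only: the function name `stage_schedule` and the dictionary layout are hypothetical, and it assumes the symbol budget $S = \tau + N\alpha + \beta$ with pilot length $\tau = K$ by default, as used later in the paper.

```python
def stage_schedule(N, K, alpha, S, tau=None):
    """List the N + 2 stages with their active IRS elements and symbol counts.

    Elements are indexed from 1, following the text. tau defaults to K
    (one orthogonal pilot symbol per user), which is an assumption here.
    """
    tau = K if tau is None else tau
    beta = S - tau - N * alpha  # symbols left for stage N + 1
    if beta < 0:
        raise ValueError("coherence block too short for this (N, alpha, tau)")
    # Stage 0: pilots, all IRS elements OFF
    stages = [{"stage": 0, "on": [], "symbols": tau, "kind": "pilot"}]
    # Stage 1: data, only the first element ON
    stages.append({"stage": 1, "on": [1], "symbols": alpha, "kind": "data"})
    # Stages 2..N: data, first and n-th elements ON
    for n in range(2, N + 1):
        stages.append({"stage": n, "on": [1, n], "symbols": alpha, "kind": "data"})
    # Stage N + 1: data, all elements ON
    stages.append({"stage": N + 1, "on": list(range(1, N + 1)),
                   "symbols": beta, "kind": "data"})
    return stages
```

For example, with $N = 32$, $K = 10$, $\alpha = 10$ and $S = 500$, the last stage is left with $\beta = 500 - 10 - 320 = 170$ symbols.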

1) STAGE-0 [DIRECT CHANNEL ESTIMATION USING PILOTS]
All the IRS elements are turned OFF, and each user transmits an orthogonal pilot sequence of length $\tau$. The received uplink pilot signal at the BS in stage-0 can be expressed as in (2), where $p_i$ is the received power for user $i$, and $\mathbf{N}_p \in \mathbb{C}^{M \times \tau}$ is an i.i.d. noise matrix with elements distributed as $\mathcal{CN}(0, \sigma_n^2)$, where $\sigma_n^2$ is the noise variance. The channel of user $j$ is estimated using LS estimation as in (3), where $\tilde{\mathbf{h}}_j$ is the channel estimation error for user $j$. We also define the covariance matrices of the channel estimation error and of the estimated channel. These covariance matrices are important for the rate analysis discussed later. The covariance matrix of the LS channel estimation error is given in (4), while the covariance matrix of the LS channel estimate is given in (5).
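To make the stage-0 step concrete, the following sketch implements LS estimation of the direct channels from orthogonal pilots. Several details are assumptions of this sketch rather than statements from the paper: the pilots are rows of a DFT matrix with $\boldsymbol{\Psi}\boldsymbol{\Psi}^H = \tau\mathbf{I}_K$, the received pilot block is modelled as $\mathbf{Y}_0 = \sum_i \sqrt{p_i}\,\mathbf{h}_i\boldsymbol{\psi}_i + \mathbf{N}_p$, and the names `dft_pilots` and `ls_direct_estimate` are hypothetical.

```python
import numpy as np

def dft_pilots(K, tau):
    """K mutually orthogonal pilot rows of length tau (rows of a DFT matrix)."""
    if tau < K:
        raise ValueError("need tau >= K for orthogonal pilots")
    k = np.arange(K)[:, None]
    s = np.arange(tau)[None, :]
    return np.exp(-2j * np.pi * k * s / tau)  # pilots @ pilots^H = tau * I_K

def ls_direct_estimate(Y0, pilots, p):
    """LS estimate of the direct channels from the stage-0 pilot block.

    Y0     : (M, tau) received pilot block at the BS
    pilots : (K, tau) orthogonal pilot rows
    p      : (K,) received powers p_i
    Returns an array of shape (K, M) whose j-th row estimates h_j.
    """
    tau = pilots.shape[1]
    # Correlate with each pilot; orthogonality removes the other users
    corr = Y0 @ pilots.conj().T / tau                      # (M, K)
    return (corr / np.sqrt(np.asarray(p, dtype=float))[None, :]).T
```

In the noiseless case the estimate equals the true channel; with noise, the per-entry error variance under these assumptions is $\sigma_n^2/(\tau p_j)$.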

2) STAGE-1 [REFLECTED CHANNEL ESTIMATION USING DATA, ONLY FIRST IRS ELEMENT IS TURNED ON]
After estimating the $KM$ coefficients of the direct channels in stage-0, the reflected channel $\mathbf{g}_{j1}$ is estimated. In stage-1, only the first IRS element is turned ON, and all other elements are turned OFF. We use data (information) symbols rather than pilots for this estimation to improve the achievable rate. In particular, each user sends data symbols $\mathbf{x}_j \in \mathbb{C}^{1 \times \alpha}$ with elements drawn from a Gaussian codebook with variance $\sigma_x^2$ (see Footnote 3). The received uplink data at the BS in stage-1 from all users can be expressed as in (6), where $\mathbf{N}_d \in \mathbb{C}^{M \times \alpha}$ represents the noise, with elements distributed as $\mathcal{CN}(0, \sigma_n^2)$.

Traditional channel estimation methods cannot be used to estimate the reflected channel of the first IRS element because we transmit data, not pilots. Thus, in the following, we provide a novel approach to estimate the reflected channel using the unknown data. More precisely, we propose to first remove the effect of the unknown data $\mathbf{x}_i$ by computing the Gram matrix of the received uplink data, given in (7). Since the transmitted symbols from different users are mutually independent, for large problem dimensions we have $\frac{\mathbf{x}_j \mathbf{x}_j^H}{\sigma_x^2 \alpha} \approx 1$ and $\frac{\mathbf{x}_i \mathbf{x}_j^H}{\sigma_x^2 \alpha} \approx 0$ for $i \neq j$ [38]. Therefore, (7) can be approximated as in (8).

For large $M$, the users' channels are quasi-orthogonal. This follows from the massive MIMO properties, i.e., $\frac{1}{M}\mathbf{h}_i^H \mathbf{h}_j \approx 0$ for $i \neq j$ [39]. To apply these properties to (8), we would need to multiply each term by $\frac{\mathbf{h}_j}{M}$. However, $\mathbf{h}_j$ is unknown, so we multiply (8) by $\frac{\hat{\mathbf{h}}_j}{M}$ instead and simplify the result using the massive MIMO properties. To estimate $\mathbf{g}_{j1}$, we then subtract $p_j \hat{\mathbf{h}}_j \times \frac{1}{M}\hat{\mathbf{h}}_j^H \hat{\mathbf{h}}_j$ from the resulting expression. Dividing the result by $p_j d(z_j)$, we obtain (11). Equation (11) shows that the reflected channel of the first IRS element for user $j$ can be estimated using the estimated direct channel $\hat{\mathbf{h}}_j$ as in (12) [40], where the channel estimation error is given in (13) and $\tilde{\mathbf{h}}_j$ is defined in (3).

Footnote 3: The Gaussian codebook for the data is usually adopted as an approximation for higher-order modulation [28], [29], [34], [35], [36], [37].
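The stage-1 procedure described in prose above (Gram matrix, projection onto $\hat{\mathbf{h}}_j/M$, subtraction of the direct-path term, division by $p_j d(z_j)$) can be sketched as follows. This is a reading of the textual steps, not a transcription of the paper's equation (12). The names `semi_blind_stage1`, `nmse` and `stage1_demo` are hypothetical, and the demo makes simplifying assumptions that are not in the paper: unit powers and variances, a Gaussian reflected channel, and a perfectly known direct channel.

```python
import numpy as np

def crandn(rng, *shape):
    """Draw i.i.d. circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def semi_blind_stage1(Y1, h_hat_j, p_j, d_j, sigma_x2=1.0):
    """Estimate g_j1 from the stage-1 data block without knowing the data.

    Y1      : (M, alpha) received data block, first IRS element ON
    h_hat_j : (M,) estimated direct channel of user j (from stage-0)
    p_j     : received power of user j
    d_j     : direct-channel variance d(z_j)
    """
    M, alpha = Y1.shape
    # Normalized Gram matrix: averages out the unknown data symbols
    gram = Y1 @ Y1.conj().T / (sigma_x2 * alpha)
    # Projection on h_hat_j / M isolates user j via quasi-orthogonality
    projected = gram @ h_hat_j / M
    # Direct-path contribution p_j * h_hat_j * (h_hat_j^H h_hat_j) / M
    direct_part = p_j * h_hat_j * (np.vdot(h_hat_j, h_hat_j).real / M)
    return (projected - direct_part) / (p_j * d_j)

def nmse(estimate, truth):
    """Normalized squared error of one channel estimate."""
    return np.linalg.norm(estimate - truth) ** 2 / np.linalg.norm(truth) ** 2

def stage1_demo(M=512, K=3, alpha=512, noise_var=0.01, seed=0):
    """Toy run with p_i = d = sigma_x^2 = 1 and Gaussian g (simplifications)."""
    rng = np.random.default_rng(seed)
    h = crandn(rng, K, M)            # direct channels, one row per user
    g1 = crandn(rng, K, M)           # reflected channels via element 1
    x = crandn(rng, K, alpha)        # unknown Gaussian data symbols
    noise = np.sqrt(noise_var) * crandn(rng, M, alpha)
    Y1 = (h + g1).T @ x + noise      # received block, shape (M, alpha)
    # Perfect direct-channel knowledge is assumed here for clarity
    g_hat = semi_blind_stage1(Y1, h[0], p_j=1.0, d_j=1.0)
    return nmse(g_hat, g1[0])
```

Forming the $M \times M$ Gram matrix costs on the order of $\alpha M^2$ operations per stage. The accuracy depends on both $\alpha$ and $M$ being large, since the approximations $\frac{\mathbf{x}_j\mathbf{x}_j^H}{\sigma_x^2\alpha} \approx 1$ and $\frac{1}{M}\mathbf{h}_i^H\mathbf{h}_j \approx 0$ only hold asymptotically.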
We compute the covariance matrices of both the channel estimation error and the estimated channel. These covariance matrices are used later in the rate analysis. The covariance matrix of the reflected channel estimation error of the first IRS element, $\mathbf{C}_{j1}$, is derived using (13) and is given in (14). The covariance matrix of the reflected channel estimate is given in (15), where $E\{\mathbf{g}_{j1}\mathbf{g}_{j1}^H\} = d_1(z_j)\mathbf{I}_M$.

In stage-1, the first reflected channel $\mathbf{g}_{j1}$ is estimated. The remaining reflected channels $\mathbf{g}_{jn}$, $n \in \{2, 3, \cdots, N\}$, are estimated sequentially in stage-$n$. In each of these stages, two IRS elements are turned ON (the first and the $n$-th). The received uplink data at the BS in stage-$n$ can be expressed as in (16), where $\mathbf{q}_{i1} = \mathbf{h}_i + \mathbf{g}_{i1}$, the combined channel estimate is defined as $\hat{\mathbf{q}}_{i1} = \hat{\mathbf{h}}_i + \hat{\mathbf{g}}_{i1}$, and the combined channel estimation error is $\tilde{\mathbf{q}}_{i1} = \tilde{\mathbf{h}}_i - \tilde{\mathbf{g}}_{i1}$. By exploiting the estimated direct channel $\hat{\mathbf{h}}_j$ (defined in (3)) and the estimated channel of the first reflecting element $\hat{\mathbf{g}}_{j1}$ (defined in (12)), the channel estimate of the $n$-th IRS element is given in (17). Note that the channel estimation error $\tilde{\mathbf{g}}_{je}$ does not depend on $n$ (the index of the active IRS element). Again for the later rate analysis, we use (17) to compute the covariance matrix of the channel estimation error in the $n$-th stage, which is given in (18). This covariance matrix is the same for all of these IRS elements ($c_{j2} = c_{j3} = \cdots = c_{jN} = c_{je}$).

Note that the value of $\alpha$ (the data length in stage-$n$, $n \in \{1, 2, \cdots, N\}$) has an impact on the estimation of the reflected channels. It affects the user rate as well as the estimation accuracy: a higher estimation accuracy requires a larger $\alpha$. However, a larger $\alpha$ represents an overhead, as not all IRS elements are turned ON during these stages. In the following sections, we obtain the data rate and use the outage probability to analyze the impact of $\alpha$ on the estimation accuracy.

B. UPLINK DATA DETECTION
After estimating the channels, the uplink data detection phase (see Footnote 5) starts in stage-$(N + 1)$ to detect the transmitted data $\mathbf{x}_i \in \mathbb{C}^{1 \times \beta}$, $i \in \{1, 2, \cdots, K\}$, from all $K$ users. The received uplink data at the BS in stage-$(N + 1)$ is given in (19), where $\beta = S - \tau - N\alpha$. To detect the transmitted data $\mathbf{x}_j$ from user $j$, the BS performs normalized matched filter (MF) detection using a linear detector $\mathbf{w}_j^{MF} \in \mathbb{C}^{1 \times M}$ given by

$$\mathbf{w}_j^{MF} = \frac{1}{\sqrt{p_j}}\hat{\mathbf{w}}_j^{\dagger} = \frac{1}{\sqrt{p_j}}\left(\hat{\mathbf{w}}_j^H \hat{\mathbf{w}}_j\right)^{-1}\hat{\mathbf{w}}_j^H, \tag{20}$$

where $\hat{\mathbf{w}}_j = \hat{\mathbf{h}}_j + \sum_{m=1}^{N}\hat{\mathbf{g}}_{jm}$ is the sum of all the estimated channels. Using (19) and (20), the detected signal of user $j$ is given by $\hat{\mathbf{x}}_j = \mathbf{w}_j^{MF}\mathbf{Y}_{N+1}$. Note that the same data detection approach can be used to detect the transmitted data in stage-$n$, $\forall n \in \{1, 2, \cdots, N\}$. Also note that the design of the reflecting channel coefficients affects both the channel estimation and the data detection. In both phases, we deal with the reflected channel $\mathbf{g}_{jn}$, which has a covariance matrix equal to $d_n(z_j)\mathbf{I}_M$.

Footnote 5: The BS performs only data detection in stage-$(N + 1)$, unlike the previous stages 1 to $N$, where it conducts both channel estimation and data detection.
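A minimal sketch of the normalized MF detection step is given below. It assumes the detector is the scaled pseudo-inverse of the combined estimated channel, as the text describes; the function names `combined_estimate` and `mf_detect` are hypothetical.

```python
import numpy as np

def combined_estimate(h_hat_j, g_hat_j):
    """Sum of the estimated direct channel and all estimated reflected channels.

    h_hat_j : (M,) estimated direct channel of user j
    g_hat_j : (N, M) estimated reflected channels of user j, one row per element
    """
    return h_hat_j + g_hat_j.sum(axis=0)

def mf_detect(Y, w_hat_j, p_j):
    """Normalized matched-filter detection of user j's data.

    Y       : (M, beta) received data block
    w_hat_j : (M,) combined estimated channel of user j
    p_j     : received power of user j
    Returns the detected symbols, shape (beta,).
    """
    # For a column vector, the pseudo-inverse is w^H / (w^H w)
    w_mf = w_hat_j.conj() / (np.vdot(w_hat_j, w_hat_j).real * np.sqrt(p_j))
    return w_mf @ Y
```

With a single user, no noise and a perfect channel estimate, `mf_detect` returns the transmitted symbols exactly; with several users, the residual terms are the interference and estimation-error contributions analyzed in the next section.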

III. ACHIEVABLE RATE
In this section, we obtain the achievable sum rate and its deterministic expression for the uplink MIMO system. This achievable rate is used as a performance criterion to show the advantage of the proposed channel estimation scheme over other schemes. To obtain the achievable sum rate, we need to calculate the signal-to-interference-plus-noise ratio (SINR). More precisely, the SINR is derived from the data detection that takes place in phase II and phase III.
In the following, we analyze the achievable sum rate for the considered uplink MIMO system with the proposed channel estimation technique.

A. DATA DETECTION IN PHASE II
Phase II includes stage-$n$, $\forall n \in \{1, 2, \cdots, N\}$. In each stage, we first detect the data using normalized MF detection. Then, we use the detected data to determine a lower bound on the SINR for each user. After that, we use massive MIMO properties and a Taylor approximation to get a deterministic expression of the SINR as the problem dimensions tend to infinity with a fixed ratio. Finally, we obtain the achievable rate and its closed form. In the following, we describe this in detail.

1) STAGE-1
The first IRS element is turned ON. The data of user $j$ in stage-1 is detected using (6) and (20), as given in (21), where the combined actual channel is $\mathbf{q}_{j1} = \mathbf{h}_j + \mathbf{g}_{j1}$ and the combined estimated channel is $\hat{\mathbf{q}}_{j1} = \hat{\mathbf{h}}_j + \hat{\mathbf{g}}_{j1}$. Using (3) and (13), the combined channel estimation error is $\tilde{\mathbf{q}}_{j1} = \tilde{\mathbf{h}}_j - \tilde{\mathbf{g}}_{j1}$, where $\hat{\mathbf{q}}_{j1} = \mathbf{q}_{j1} + \tilde{\mathbf{q}}_{j1}$. Similar to the analysis of the worst-case uncorrelated noise in [41], we can obtain a lower bound on the SINR of user $j$ in phase II (stage-1), given in (22) (cf. also [39], [40]).

To obtain a deterministic expression of the SINR in (22), we first approximate the pseudo-inverse term in the MF detector, $\hat{\mathbf{q}}_{j1}^{MF} = \frac{1}{\sqrt{p_j}}\hat{\mathbf{q}}_{j1}^{\dagger} = \frac{1}{\sqrt{p_j}}\left(\hat{\mathbf{q}}_{j1}^H \hat{\mathbf{q}}_{j1}\right)^{-1}\hat{\mathbf{q}}_{j1}^H$ with $\hat{\mathbf{q}}_{j1} = \mathbf{q}_{j1} + \tilde{\mathbf{q}}_{j1}$, using a first-order Taylor approximation around zero (cf. [42]). Second, further simplification is achieved by applying the massive MIMO properties, which yields a closed-form expression for the SINR [43].

Lemma 1: When the number of BS antennas is large and the transmitted symbols from different users are mutually independent, a deterministic expression of the SINR for user $j$ in phase II (stage-1) is given by (23).

Proof: We start the proof by applying the Taylor approximation to the normalized MF decoder. Then, by applying both the square matrix properties and the massive MIMO concept, we obtain a lower bound on the SINR. Please refer to Appendix VI-A for more detail.

2) STAGE-2 TO STAGE-N
Two IRS elements (the first and the $n$-th, $\forall n \in \{2, 3, \cdots, N\}$) are ON. We keep only two IRS elements ON in each stage to reduce the channel estimation error and increase the rate gain due to the IRS (as discussed before). In contrast to stage-1, the data is received at the BS from three different paths (cf. (16)). The data of user $j$ in stage-$n$ is detected using (16) and (20), as given in (24), where the combined actual channel is $\mathbf{q}_{jn} = \mathbf{h}_j + \mathbf{g}_{j1} + \mathbf{g}_{jn} = \mathbf{q}_{j1} + \mathbf{g}_{jn}$ and the combined estimated channel is $\hat{\mathbf{q}}_{jn} = \hat{\mathbf{q}}_{j1} + \hat{\mathbf{g}}_{jn}$. Using (3), (12), and (17), the combined channel estimation error is $\tilde{\mathbf{q}}_{jn} = \tilde{\mathbf{h}}_j - \tilde{\mathbf{g}}_{j1} - \tilde{\mathbf{g}}_{je}$, where $\hat{\mathbf{q}}_{jn} = \mathbf{q}_{jn} + \tilde{\mathbf{q}}_{jn}$. Similar to the analysis leading to (22), the lower bound on the SINR of user $j$ in phase II (stage-$n$) is given in (25). The deterministic expression of the SINR in stage-$n$ is stated in the following lemma.

Lemma 2: The closed form of the SINR for user $j$ in phase II (stage-$n$, $\forall n \in \{2, 3, \cdots, N\}$) is given by (26), where $d_j = d(z_j) + d_1(z_j) + d_n(z_j)$ and $c_{jn} = c_{je}$ (cf. (18)).

B. DATA DETECTION IN PHASE III
In stage-$(N + 1)$, all IRS elements are ON and all the channels have been estimated. Using (19) and (20), the detected data of user $j$ in stage-$(N + 1)$ is given in (27). We follow the same mathematical analysis as in the previous subsection; however, here all the IRS elements are ON, so the MF decoding vector includes all the estimated channels. The lower bound on the SINR of user $j$ in phase III (stage-$(N + 1)$) is given in (28). Similar to (23), a deterministic form of the SINR in (28) is obtained using the Taylor approximation and massive MIMO properties. The following lemma gives the deterministic expression of the SINR.
Lemma 3: The SINR achieved for user $j$ in phase III (stage-$(N + 1)$) is given by (29).
Having obtained the lower bounds on the SINRs and their deterministic expressions in phase II and phase III, we can now obtain the achievable rate and its approximate expression. Using (22), (25), and (28), the achievable sum rate for user $j$ is given in (30). The approximate uplink achievable rate, using (23), (26), and (29), is given in (31).
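To illustrate how the per-stage SINRs combine into a rate that accounts for the estimation overhead, the sketch below weights each stage by its share of the coherence block. The weighting is an assumption of this sketch: each of the $N$ phase-II stages contributes $\alpha/S$ and phase III contributes $\beta/S$ with $\beta = S - \tau - N\alpha$. It is not a transcription of (30), and the name `overhead_aware_rate` is hypothetical.

```python
import numpy as np

def overhead_aware_rate(sinr_phase2, sinr_phase3, alpha, tau, S):
    """Per-user rate over one coherence block of S symbols (bits/symbol).

    sinr_phase2 : length-N sequence, SINR of the user in stage-1 .. stage-N
    sinr_phase3 : SINR of the user in stage-(N + 1)
    alpha       : data length of each phase-II stage
    tau         : pilot length of stage-0 (carries no data)
    """
    sinr_phase2 = np.asarray(sinr_phase2, dtype=float)
    N = sinr_phase2.size
    beta = S - tau - N * alpha  # symbols left for phase III
    if beta < 0:
        raise ValueError("coherence block too short for this (N, alpha, tau)")
    # Phase II: data is also sent while the reflected channels are estimated
    rate2 = (alpha / S) * np.sum(np.log2(1.0 + sinr_phase2))
    # Phase III: all IRS elements ON
    rate3 = (beta / S) * np.log2(1.0 + sinr_phase3)
    return float(rate2 + rate3)
```

Under this weighting, only the $\tau$ pilot symbols of stage-0 carry no information, which is the overhead reduction the semi-blind scheme aims for.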

IV. OUTAGE PROBABILITY
The outage probability, defined as the probability that the communication link cannot be closed, is an important parameter for characterizing the system performance. We use the outage probability to evaluate the proposed channel estimation scheme. Moreover, the impact of the data length $\alpha$ (used in the semi-blind approach to estimate the channels in phase II, stage-1 to stage-$N$) can be studied in terms of the channel estimation error (provided in the simulation section) and the system outage probability, which we now discuss. In the following, we provide a detailed analysis of how to obtain a lower bound on the outage probability as a function of $\alpha$.
We start by determining upper bounds on the SINRs in the different phases. We then obtain an upper bound on the achievable rate, which is used to obtain a lower bound on the outage probability. The outage probability is given in (32), where $R_{th}$ is the threshold target rate. We obtain a deterministic expression for the outage probability as a function of $R_{th}$ and $\alpha$, assuming high SNR. We start by obtaining the estimated channel when only the first IRS element is ON (stage-1). The BS performs LS channel estimation for the direct channel (cf. Equation (3)). Using (12), the estimated reflected channel can be written as in (33). Equation (33) can be simplified by assuming high SNR ($\tilde{\mathbf{h}}_j = \mathbf{0}$, $c_j = 0$), and it can be expressed as $\hat{\mathbf{g}}_{j1} = r\mathbf{g}_{j1} + (r - 1)\mathbf{h}_j$, where $r = \frac{\mathbf{x}_j \mathbf{x}_j^H}{\sigma_x^2 \alpha}$. Similarly, (17) can be expressed as $\hat{\mathbf{g}}_{jn} = r\mathbf{g}_{jn} + (r - 1)(\mathbf{h}_j + \mathbf{g}_{j1})$.
Lemma 4: The SINR of user $j$ admits upper bounds in both phase II and phase III, expressed in terms of $r$.

Proof: We start the proof by assuming a high SNR, under which many terms can be approximated. Then, by applying massive MIMO properties, we obtain an upper bound on the SINR in both phase II and phase III. Further simplification is possible by assuming a small value of $r - 1$. Please refer to Appendix VI-B for more detail.
As shown in Lemma 4, the upper bounds on the SINR are functions of $r$. To obtain a bound on the outage probability, we need the probability density function (PDF) of $r$. The random variable $r = \frac{\mathbf{x}_j \mathbf{x}_j^H}{\sigma_x^2 \alpha}$ is treated as exponentially distributed, with mean $E[r] = \mu = 1/\alpha$ and variance $\mathrm{Var}[r] = \sigma^2 = 1/\alpha^2$, and $F_r(x)$ denotes its cumulative distribution function. Using the upper bounds on the SINR obtained in Lemma 4, we can derive a lower bound on the outage probability. Starting from (32) and applying the upper bounds on the SINR, we get an upper bound on the achievable rate, $R_j|_{\text{upper bound}}$, which we use to obtain the lower bound on the outage probability in (37). There, $Q(x)$ is the Q-function, which can be written as $Q(x) = 1 - \phi(x)$, where $\phi(x)$ is the cumulative distribution function (CDF) of the standard normal distribution. Equation (37) can be simplified when we consider a large number of BS antennas ($M$), as stated in the following lemma.

Lemma 5: For a large number of BS antennas, the lower bound on the outage probability in (37) admits a simplified expression in terms of the target rate $R_{th}$. We use $\eta$ to set $R_{th}$ as a fraction of the maximum achievable sum rate, i.e., $R_{th} = \eta \times \max(R)$.

We evaluate the outage probability by means of simulations. Furthermore, the proposed estimation scheme for the reflected channels (Equation (12)) has a computational complexity of $O(\alpha N M^2)$, which is the same as that of the benchmark scheme (LS). Compared to the benchmark scheme, the proposed scheme provides better performance in terms of the achievable sum rate, as data is transmitted during the estimation phase.
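Two ingredients of this section are easy to illustrate numerically: the Q-function, $Q(x) = 1 - \phi(x)$, and an empirical outage estimate with the threshold chosen as $R_{th} = \eta \times \max(R)$. The sketch below is not the paper's closed-form bound. The function names are hypothetical, and taking the maximum over the simulated rate samples is an assumption of this sketch.

```python
from math import erfc, sqrt

import numpy as np

def q_function(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x), via the identity
    Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))

def empirical_outage(rates, eta):
    """Fraction of rate samples that fall below R_th = eta * max(rates).

    rates : 1-D sequence of simulated achievable rates (e.g. Monte Carlo runs)
    eta   : fraction in (0, 1] defining the target rate
    """
    rates = np.asarray(rates, dtype=float)
    r_th = eta * rates.max()  # target rate as a fraction of the best sample
    return float(np.mean(rates < r_th))
```

An empirical estimate of this kind can be compared against an analytical lower bound when sweeping $\alpha$, since a valid lower bound should not exceed the simulated outage.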

V. SIMULATION RESULTS AND ANALYSIS
In this section, we conduct several Monte Carlo simulations to evaluate the performance of the new semi-blind channel estimation in an IRS-assisted multi-user communication system. A hexagonal cell with a radius of half a kilometer is considered, and the minimum distance between each user and its serving BS is $0.14 \times 0.5$ km. The channel variance $d(z) = \frac{C}{\|z - b\|^{\kappa}}$ is used to model the path loss, where $b$ is the location of the BS, $z$ is the location of the user, and $C$ is the shadow fading, modelled as $10\log_{10} C \sim \mathcal{N}(0, \sigma_{sf}^2)$. We also use the following parameters: the path-loss exponent is $\kappa = 3.2$ and $\kappa = 2.2$ for $\mathbf{h}$ (user-BS) and $\mathbf{g}$ (user-IRS-BS), respectively, and $\sigma_{sf}^2 = 5$. The $K$ single-antenna users transmit over a coherence block of length $S = 500$ symbols through $N = 32$ IRS elements. The length of the pilot sequence is $\tau = K$, and the data length of each stage in phase II is $\alpha$. For power control, we apply channel inversion for both the pilot sequence and the data stream, $p_j = \frac{\rho}{d(z_j)}$ [31], [32], [44], where $\rho > 0$ is a design parameter. Therefore, the average channel gain between the users and the BS in the uplink phase is constant: $E\{p_j\|\mathbf{h}_j\|^2\} = M\rho$. Thus, the average received SNR at the BS is $\rho/\sigma_n^2$. In our simulation, $\rho/\sigma_n^2$ is set to 0 dB in order to allow for decent channel estimation accuracy.
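The large-scale fading model and the channel-inversion power control used in the simulations can be sketched as follows. The function names are hypothetical, and two points are assumptions of this sketch: $\sigma_{sf}^2$ is interpreted as the variance of the shadowing in dB, and distances are expressed in whatever unit the simulation uses for the cell geometry.

```python
import numpy as np

def channel_variance(rng, z, b, kappa, sigma_sf2=5.0):
    """Channel variance d(z) = C / ||z - b||^kappa with log-normal shadowing.

    z         : (K, 2) user locations
    b         : (2,) BS location
    kappa     : path-loss exponent (3.2 direct, 2.2 reflected in the text)
    sigma_sf2 : variance of 10*log10(C), assumed here to be in dB^2
    """
    z = np.asarray(z, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shadow fading: 10 * log10(C) ~ N(0, sigma_sf2)
    shadow_db = rng.normal(0.0, np.sqrt(sigma_sf2), size=len(z))
    C = 10.0 ** (shadow_db / 10.0)
    dist = np.linalg.norm(z - b, axis=1)
    return C / dist ** kappa

def channel_inversion_power(d, rho):
    """Channel-inversion power control p_j = rho / d(z_j)."""
    return rho / np.asarray(d, dtype=float)
```

With this power control, $p_j d(z_j) = \rho$ for every user, so $E\{p_j\|\mathbf{h}_j\|^2\} = M\rho$ regardless of the user location.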
The simulation section is divided into three main parts: 1) we evaluate the proposed semi-blind channel estimation scheme by plotting the mean square error (MSE); 2) we show the tightness of the deterministic expression of the achievable rate with respect to the actual value for different system parameters. Moreover, we compare the proposed channel estimation scheme with other schemes; 3) we evaluate the outage probability for various system parameters.
In our model, we have two types of channels: direct channels and reflected channels. For the direct channels, we use LS channel estimation (cf. Equation (3)) with pilot length $\tau = K$, so there is no pilot contamination. For the reflected channels, we use semi-blind channel estimation. Figure 3 shows the variation of the NMSE of the reflected channel estimation with the number of users for $M = 128$ antennas and different values of the data length $\alpha$. The NMSE decreases with increasing $\alpha$, i.e., a higher estimation accuracy is obtained, because a larger $\alpha$ ensures that the sequences transmitted by different users are nearly orthogonal, so that the proposed data-aided channel estimation scheme (Equation (12)) works properly. The behavior is different when the number of users increases: more users means more interference, so the NMSE increases. Moreover, smaller values of $\alpha$ prevent the proposed channel estimation scheme from working properly, so the behaviour for small $\alpha$ cannot be predicted reliably. Figure 3b shows the variation of the NMSE with the data length $\alpha$ for a fixed number of BS antennas, $M = 512$. The NMSE decreases with increasing data length $\alpha$ and, for smaller values of $\alpha$, also with increasing $K$. However, this behavior with $K$ changes for the higher value $\alpha = 11$. Thus, the values of the system parameters should be carefully chosen to obtain a lower channel estimation error. Figures 4a and 4b show the variation of the NMSE with the number of BS antennas $M$ for different values of $\alpha$ and $K$, respectively. In conclusion, the NMSE decreases with increasing data length $\alpha$, increasing number of BS antennas $M$, or decreasing number of users $K$.
In the second part of the simulation, we simulate the achievable rate when using LS channel estimation for the direct channels and data-aided channel estimation for the reflected channels, for different numbers of users K, different values of α, and different numbers of BS antennas M. Figures 5a and 5b show how tight the approximate rate (Equation (31)) is to the exact rate (Equation (30)) for different values of the system parameters. Figure 5a shows that the achievable rate improves when either the number of BS antennas M increases or the number of users K decreases, for data length α = K. Figure 5b shows that the achievable rate improves only slightly with increasing α until α = 10, where there is a dramatic improvement. In order to simulate a higher number of IRS elements (N), we need to increase the coherence block length to S = 1800 symbols, since S = τ + Nα + β (cf. …). Figures 6 and 7 show the effect of the number of IRS elements on the NMSE and the achievable sum rate, respectively. Figure 6 shows that the NMSE of the reflected channel estimation decreases with increasing the number of IRS elements (N) and then saturates. This decrease in the NMSE is due to the variation in the correlation between the reflected channels, whereby some reflected channels may be estimated more accurately than others. Figure 7 shows that the uplink achievable sum rate first increases with the number of IRS elements (N) and then decreases. Note that increasing N acts as an overhead in phase II, as only two IRS elements are turned ON for a duration of Nα. On the other hand, increasing N acts as a gain in the achievable rate in phase III, as all N elements are turned ON (cf. Figure 2). Therefore, in our proposed data-aided channel estimation scheme, the number of IRS elements added to the network needs to be selected carefully in order to keep the performance high (this is left for future work).
We compare the proposed scheme (semi-blind channel estimation) with three other schemes: 1) the traditional ON-OFF scheme [6], 2) the three-phase channel estimation scheme [7], and 3) the two-phase channel estimation scheme [8], for α = K.
In Figure 8, we compare our proposed approach with the traditional ON-OFF scheme [4], [5], [6], a three-phase channel estimation scheme [7], and a two-phase channel estimation method [8]. Our proposed approach provides a considerable improvement in the achievable rate at different values of the system parameters (M and N). Figures 9a and 9b show the outage probability and its lower bound with respect to the number of BS antennas M and the number of users K, respectively. Again, we use LS channel estimation for the direct channels and MF for data detection. The outage probability increases with increasing either the number of BS antennas M or the number of users K. The outage probability also increases with the data length α.
In conclusion, the data length α in phase II is an important parameter in our proposed data-aided channel estimation algorithm, since a larger value of α ensures that the transmitted symbols from different users are mutually independent (which is the key idea behind our proposed scheme). Figure 4a shows that for K = 11 users and α ∈ {K, K − 1, K − 2, K − 3} = {11, 10, 9, 8}, the NMSE of the reflected channel estimation decreases with increasing the number of BS antennas M (massive MIMO regime). However, for the small value α = K − 4 = 7, the NMSE of the reflected channel estimation increases with M. Thus, to obtain a small NMSE, the value of α should be greater than 7. On the other hand, a larger value of α introduces an overhead, as only two IRS elements are turned ON during this time. This affects the outage probability (cf. Figure 9), as higher values of α are associated with higher values of the outage probability. Therefore, the lowest suitable value of α is K − 3 = 8, which ensures that the data symbols from different users are mutually independent and that our proposed algorithm works properly.
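The overhead trade-off in α can be made concrete with the coherence-block budget S = τ + Nα + β used in the simulations. The sketch below splits a block into the three phases, assuming τ = K as for the direct channels and taking β to be whatever remains for phase III; the function name phase_lengths and the value N = 32 in the usage lines are illustrative assumptions, not settings confirmed for any particular figure.

```python
def phase_lengths(S, K, N, alpha):
    """Split a coherence block of S symbols as S = tau + N*alpha + beta.

    Assumes tau = K (phase I pilots) and that beta is the remainder left
    for phase III; both are reading assumptions, not confirmed settings.
    """
    tau = K
    phase2 = N * alpha  # phase II: data-aided estimation of reflected channels
    beta = S - tau - phase2
    if beta < 0:
        raise ValueError("coherence block too short for these parameters")
    return tau, phase2, beta


if __name__ == "__main__":
    S, K, N = 1800, 11, 32  # N = 32 is an illustrative choice
    for alpha in (8, 9, 10, 11):
        tau, phase2, beta = phase_lengths(S, K, N, alpha)
        print(f"alpha={alpha}: phase II = {phase2} symbols, "
              f"phase III = {beta} symbols ({beta / S:.1%} of the block)")
```

Each extra unit of α costs N symbols of phase II in this accounting, which is one way to see why both α and N have to be chosen with the block length in mind.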

VI. CONCLUSION
In this paper, we study channel estimation in an IRS-assisted multi-user communication system and propose a new semi-blind channel estimation approach. We use pilot-based channel estimation for the direct channels and data-aided channel estimation for the reflected channels. The new channel estimation scheme improves the achievable rate compared with other channel estimation schemes at different values of the system parameters. Moreover, we analytically obtain a deterministic expression for the sum rate and show through simulation how tight this closed form is to the exact value. Furthermore, we obtain an expression for a lower bound on the outage probability using massive MIMO properties and polynomial approximation.

Starting from Equation (21) and using the Taylor approximation, i.e., $\hat{q}^{MF}_{j1} \approx \frac{1}{\sqrt{p_j}} q^{\dagger}_{j1} (I_M - \tilde{q}_{j1} q^{\dagger}_{j1})$, the detected signal can be written as

Using (40), we can determine the lower bound on the SINR of user $j$ from

$$\gamma^{II}_{j1} = \frac{\sigma_x^2}{E\{n_j n_j^H \mid \hat{q}_{j1}\}} = \frac{\sigma_x^2}{P_1 + P_2 + P_3} \quad (41)$$

such that $P_1$ is given by

Equation (42) is obtained by exploiting the following lemma and using the facts, i.e., $q^{\dagger}$

Lemma 6: Let $A \in \mathbb{C}^{K \times K}$. Then for any matrix $X \in \mathbb{C}^{M \times K} \sim \mathcal{CN}(0, \sigma_x^2)$, we have [42]

$$E\{X A X^H\} = \sigma_x^2 \, \mathrm{tr}(A) \, I_M$$

By applying massive MIMO properties, i.e., $\frac{1}{M} q^H_{j1} q_{j1} = d(z_j) + d_1(z_j)$ and $\frac{1}{M} \tilde{q}^H_{j1} \tilde{q}_{j1} = c_j - c_{j1}$, $P_2$ and $P_3$ are given by