Design of an Intelligent Reflecting Surface aided mmWave Massive MIMO using X-Precoding

In this paper, we consider an intelligent reflecting surface (IRS) aided millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) system using hybrid beamforming/combining. To enhance error performance, we adopt X-code (or X-precoder), a low-complexity precoding technique for traditional MIMO channels, to encode information symbols. We first derive an upper bound on word error rate (WER), based on which we design jointly IRS phase shifts and X-code (or X-precoder) to minimize WER. Specifically, we propose two algorithms for IRS design exploiting alternating optimization and gradient ascent optimization methods. Then we devise X-code and X-precoder, respectively, by minimizing average WER over all channel realizations and WER for each channel realization. We also provide their diversity analysis. Further, we present the procedure of decoupling fully digital beamformer/combiner at transceiver into the optimal hybrid one. Finally, simulation results show that both IRS optimization algorithms have similar WERs whereas gradient ascent approach has a lower computational complexity. Simulations demonstrate that the designed X-code (or X-precoder) provides a significant performance gain.


I. INTRODUCTION
Driven by the ever-increasing demand for transmission capacity and spectral efficiency, emerging technologies such as millimeter wave (mmWave) communications and massive multiple-input multiple-output (MIMO) have been enabled in the fifth-generation (5G) network [1]-[4]. It is known that mmWave bands can offer orders of magnitude greater bandwidth, but mmWave transmission also incurs high path loss, leading to an unfavorable communication environment [5], [6]. To compensate for the severe propagation loss, large antenna arrays in massive MIMO are utilized to achieve high beamforming gains and coverage extension. The short wavelengths associated with high frequencies facilitate the use of massive MIMO thanks to the reduced sizes of antenna arrays [4]. Moreover, mmWave signals are sensitive to blockage: even a small obstacle, such as a human body, can result in a 20 dB decrease in signal strength [7]. Hence, it is necessary to develop a new technology to cope with this issue.

A. INTELLIGENT REFLECTING SURFACE
Recently, a promising technology, the intelligent reflecting surface (IRS), has drawn increasing attention as a means to combat blockage in mmWave and to assist massive MIMO in achieving better performance [8], [9]. An IRS is a two-dimensional metasurface which consists of a large array of low-cost passive scattering elements. Each element is controlled by an embedded controller to change the electromagnetic properties (e.g., amplitudes and phase shifts) of incident radio frequency (RF) signals [10]. By smartly adjusting the passive elements at the IRS, RF signals can be added coherently to achieve advantageous propagation conditions, such as improving energy efficiency [11]-[14], increasing capacity [15]-[25], decreasing error rate [26], and mitigating user interference [27]-[29].

B. IRS-ASSISTED MASSIVE MIMO
A lot of research has been conducted for the optimization design and performance analysis of IRS-aided mmWave massive MIMO systems (see [11]- [30] and references therein).
It is worth mentioning that the authors in [18] developed an alternating optimization algorithm to maximize the capacity. This algorithm iteratively optimizes one reflecting phase of the IRS at a time while the others are fixed, which however incurs large computational complexity. Another research focus is the joint design of hybrid beamforming and IRS reflection elements (see [14], [22]-[26] and references therein). For example, the authors in [23] jointly designed active and passive beamforming to maximize spectral efficiency by proposing a manifold optimization algorithm. This approach has the advantage of creating a favorable channel matrix with a small condition number. On the other hand, the error performance of IRS-aided mmWave massive MIMO has rarely been investigated. In [26], the authors focused on broadband transmission and proposed a geometric mean decomposition-based algorithm, leading to a satisfactory bit error rate (BER) performance.

C. PRECODING USING X-CODE (OR X-PRECODER)
To further enhance the performance of IRS-aided mmWave massive MIMO without introducing significant extra complexity, we consider using X-code (or X-precoder), a low-complexity precoding technique proposed in [31], [32] for traditional MIMO. In [31], [32], a MIMO channel was first decomposed into a number of parallel subchannels via a singular value decomposition (SVD) operation. Then X-code (or X-precoder) was used to pair the subchannels with good and bad diversities to improve overall diversity, and an optimal pairing strategy was proposed. Essentially, X-code and X-precoder share the same precoding matrix structure, in which the parameters are designed differently. The parameters for X-precoder were designed to optimize the error performance for each channel realization, while those for X-code were designed to optimize the average error performance over all channel realizations. Finally, low-complexity maximum likelihood (ML) detection was applied to each pair rather than to the entire MIMO channel.

D. OUR CONTRIBUTIONS
In this paper, we consider an IRS-assisted mmWave massive MIMO, where both transmitter (Tx) and receiver (Rx) are deployed with large numbers of antennas, followed by RF chains. The information symbols are precoded by X-code (or X-precoder) [31], and processed by digital/analog active beamforming. The data transmission arrives at the Rx via an IRS, followed by analog/digital combining. Finally, ML detection is used to recover the information symbols. Under such a setting, we perform a joint design of IRS, X-code (or X-precoder), as well as transceiver beamforming to improve overall error performance. Detailed contributions are summarized below.
• We conduct performance analysis and derive an upper bound on word error rate (WER). As a result, we propose a system design criterion, where we design jointly IRS phase shifts and X-code (or X-precoder) to minimize the upper bound on WER.
• Based on the design criterion, we first design the IRS phase shifts. To find the optimized IRS phase shifts, we explore two optimization methods: alternating optimization and gradient ascent optimization. We also compare their computational complexity and convergence. By simulations, we verify that both methods lead to similar performance, whereas the gradient ascent approach has lower computational complexity than the alternating approach.
• We then design the X-code (or X-precoder) to improve error performance. It is known from [31] that both X-code and X-precoder adopt the same precoding matrix structure (see (4)), but their parameters in (4) should be designed differently. Hence, for X-code, we design the parameters in (4) to minimize the average WER over all channel realizations, while for X-precoder, we design them to minimize the WER for each channel realization. For both schemes, we provide diversity analysis. Simulation results show a significant performance gain.
• We also design the digital (or hybrid) active beamforming to improve the overall system performance. Simulations demonstrate the effectiveness of our design.
The rest of the paper is organized as follows. Section II introduces the system model. Section III presents the problem formulation. Section IV presents the IRS phase shift design. Section V provides the X-precoding design, followed by the design of hybrid precoding/combining in Section VI. Simulation results and conclusions are given in Sections VII and VIII.
Notations: Vectors are boldface letters and matrices are boldface capital letters. Scalars are lower case letters. $A_{i,j}$ represents the entry in the i-th row and j-th column of matrix $\mathbf{A}$. $\|\mathbf{A}\|_F$ denotes the Frobenius norm of $\mathbf{A}$. Superscripts $T$ and $H$ represent transpose and conjugate transpose, respectively. $\mathbb{R}^{a \times b}$ and $\mathbb{C}^{a \times b}$ represent the spaces of $a \times b$ matrices with real and complex entries, respectively. The notation tr(·) denotes the trace operation, and $|\mathcal{A}|$ denotes the cardinality of a set $\mathcal{A}$. The operator diag{$a_1, \ldots, a_n$} creates an $n \times n$ matrix with the given values along the diagonal. The operators $\Re(\mathbf{u})$ and $\Im(\mathbf{u})$ are used to obtain the real and imaginary parts of a complex vector $\mathbf{u}$. The notation E[·] denotes the expectation operation. ${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x)$ stands for a hypergeometric function [33] and Q(·) stands for the Q-function. Also, $f'(\cdot)$ and $f''(\cdot)$ denote the first and second order derivatives of the function $f(\cdot)$, and $\mathcal{CN}(0, \sigma^2)$ is the complex Gaussian distribution with zero mean and variance $\sigma^2$.

II. SYSTEM MODEL
We consider a point-to-point mmWave massive MIMO IRS-aided wireless communication system, transmitting $N_s$ data streams (see Fig. 1). At the Tx, there are $N_t$ transmit antennas with $N_t^{RF}$ RF chains, while at the Rx, there are $N_r$ receive antennas with $N_r^{RF}$ RF chains, where $N_s \le \min(N_t^{RF}, N_r^{RF}) \ll \min(N_t, N_r)$. An IRS equipped with $M$ passive elements is deployed to enhance communication between the Tx and Rx.

A. TRANSMITTER SITE
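At the Tx, the information symbol vector $\mathbf{u} = [u_1, \cdots, u_{N_s}]^T \in \mathbb{C}^{N_s}$ is assumed to be drawn from the normalized M-quadrature amplitude modulation (QAM) square constellation, i.e.,
$$u_i \in \left\{ \frac{u_R + j u_I}{\sqrt{2(M-1)/3}} \right\}, \quad u_R, u_I \in \{\pm 1, \pm 3, \cdots, \pm(\sqrt{M}-1)\}, \qquad (1)$$
where the factor $\sqrt{2(M-1)/3}$ is used such that the average symbol energy $E_s = 1$, and $u_R$ and $u_I$ are the real and imaginary parts of the non-normalized QAM symbol associated with $u_i$. Then the information vector $\mathbf{u}$ is first precoded by the X-code (or X-precoder) proposed in [31] (see details in Section II-A1), yielding
$$\mathbf{s} = \mathbf{\Phi}\mathbf{u}, \qquad (2)$$
where $\mathbf{\Phi} \in \mathbb{R}^{N_s \times N_s}$ is the X-code matrix [31] (see (6) in Section II-A1). The symbol vector $\mathbf{s}$ is further processed by a digital baseband precoder $\mathbf{F}_{BB} \in \mathbb{C}^{N_t^{RF} \times N_s}$ and an analog RF precoder $\mathbf{F}_{RF} \in \mathbb{C}^{N_t \times N_t^{RF}}$, leading to the transmitted signal vector
$$\mathbf{x} = \mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{s}, \qquad (3)$$
where $\mathbf{s}$ is assumed to satisfy $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_{N_s}$. Also, the transmit power constraint is enforced by normalizing $\mathbf{F}_{BB}$ such that $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_s$.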

1) Precoding using X-code
As introduced above, the information symbol vector is precoded by the X-code to improve the error performance and gain better channel diversity. We first recall the X-code proposed in [31] for MIMO systems, which achieves the best diversity order, and then show how we adapt it to our system. Without loss of generality, we consider an even $N_s$. In [31], a MIMO channel was decomposed (via an SVD operation) into a number of parallel subchannels, and the X-code was used to pair the subchannels with good and bad diversities, yielding a total of $N_s/2$ subchannel pairs. Further, an optimal pairing strategy was proposed in [31], where the k-th pair has the pairing list
$$(k, \; N_s - k + 1), \qquad (4)$$
for $k = 1, \ldots, N_s/2$, which is further encoded by the following 2 × 2 precoding submatrix
$$\mathbf{A}_k = \begin{bmatrix} \cos\phi_k & \sin\phi_k \\ -\sin\phi_k & \cos\phi_k \end{bmatrix}, \qquad (5)$$
where $\phi_k$ denotes the phase rotation in $\mathbf{A}_k$. Collecting all $\mathbf{A}_k$, ∀k, we form the entire X-code precoding matrix $\mathbf{\Phi}$, whose only nonzero entries are
$$\Phi_{k,k} = \cos\phi_k, \quad \Phi_{k,N_s-k+1} = \sin\phi_k, \quad \Phi_{N_s-k+1,k} = -\sin\phi_k, \quad \Phi_{N_s-k+1,N_s-k+1} = \cos\phi_k, \qquad (6)$$
for $k = 1, \ldots, N_s/2$. In addition, in [31], the phases $\phi_k$ in (6) and the pairing strategy are designed to improve the overall MIMO diversity.
In this paper, we adopt the same pairing strategy as (4) and the same precoding matrix structure as (6) with submatrices (5). Further, to avoid increasing the transmit power, we impose an orthogonality constraint on each $\mathbf{A}_k$, which is therefore parameterized by a single phase $\phi_k$, for $k = 1, \cdots, N_s/2$. Later in Section V, we will design the optimal $\phi_k$, ∀k, to achieve improved error performance and channel diversity.
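To make the X-shaped structure concrete, the following minimal sketch builds the precoding matrix from a list of pair phases and checks that it is orthogonal, so that precoding does not change the transmit power. The function names `x_code_matrix` and `gram` are hypothetical, and the sign convention of the 2 × 2 rotation is an assumption.

```python
import math

def x_code_matrix(phases):
    """Build the N_s x N_s X-code matrix from N_s/2 pair phases.

    Pair k (1-based) couples subchannels k and N_s-k+1 through the
    rotation [[cos, sin], [-sin, cos]] (sign convention assumed).
    """
    n_s = 2 * len(phases)
    phi = [[0.0] * n_s for _ in range(n_s)]
    for k, angle in enumerate(phases):        # k is 0-based here
        i, j = k, n_s - 1 - k                 # indices of the paired subchannels
        c, s = math.cos(angle), math.sin(angle)
        phi[i][i], phi[i][j] = c, s
        phi[j][i], phi[j][j] = -s, c
    return phi

def gram(matrix):
    """Return matrix^T * matrix, which is the identity for an orthogonal matrix."""
    n = len(matrix)
    return [[sum(matrix[r][a] * matrix[r][b] for r in range(n))
             for b in range(n)] for a in range(n)]

# Example with N_s = 4 (two pairs) and illustrative phases.
phi = x_code_matrix([math.pi / 4, math.pi / 6])
```

The nonzero entries lie on the main diagonal and the anti-diagonal, which is where the name "X-code" comes from.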

B. IRS
As shown in Fig. 1, an IRS is used to assist communications between the Tx and Rx. The IRS has $M$ passive elements, each of which can be adjusted to change its phase shift by an embedded controller. Let $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_M]$ and let $\mathbf{\Theta} = \mathrm{diag}\{\beta e^{j\theta_1}, \beta e^{j\theta_2}, \cdots, \beta e^{j\theta_M}\} \in \mathbb{C}^{M \times M}$ be the diagonal phase shift matrix of the IRS, where $\theta_m \in [0, 2\pi)$ and $\beta \in [0, 1]$ are the phase and amplitude introduced by each IRS element. For simplicity, we set $\beta = 1$ to maximize signal reflection in the remainder of this paper.

C. CHANNEL MODEL
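From Fig. 1, we let $\mathbf{G} \in \mathbb{C}^{M \times N_t}$ and $\mathbf{R} \in \mathbb{C}^{N_r \times M}$ denote the Tx-IRS and IRS-Rx links, respectively. Then $\mathbf{H} \triangleq \mathbf{R}\mathbf{\Theta}\mathbf{G} \in \mathbb{C}^{N_r \times N_t}$ stands for the cascade channel from the Tx to the Rx through the IRS. Following the Saleh-Valenzuela (SV) channel model [34] for mmWave, and assuming uniform planar arrays (UPAs) at the Tx and Rx, the Tx-IRS and IRS-Rx channels are given by
$$\mathbf{G} = \sqrt{\frac{N_t M}{N_{cl} N_{ray}}} \sum_{i=1}^{N_{cl}} \sum_{l=1}^{N_{ray}} \alpha_{i,l} \, \mathbf{a}_{r,\mathrm{IRS}}(\eta^r_{i,l}, \vartheta^r_{i,l}) \, \mathbf{a}_{t,\mathrm{Tx}}(\eta^t_{i,l}, \vartheta^t_{i,l})^H,$$
$$\mathbf{R} = \sqrt{\frac{M N_r}{N_{cl} N_{ray}}} \sum_{i=1}^{N_{cl}} \sum_{l=1}^{N_{ray}} \beta_{i,l} \, \mathbf{a}_{r,\mathrm{Rx}}(\eta^r_{i,l}, \vartheta^r_{i,l}) \, \mathbf{a}_{t,\mathrm{IRS}}(\eta^t_{i,l}, \vartheta^t_{i,l})^H, \qquad (7)$$
where $N_{cl}$ and $N_{ray}$ are the number of clusters and the number of rays per cluster, $\alpha_{i,l}$ and $\beta_{i,l}$ are the complex channel gains of the l-th ray in the i-th cluster, $\eta^r_{i,l}$ ($\eta^t_{i,l}$) and $\vartheta^r_{i,l}$ ($\vartheta^t_{i,l}$) represent the azimuth and elevation angles of arrival (departure) associated with the Rx (Tx), and $\mathbf{a}_{r,T_1}$ with $T_1 \in \{\mathrm{IRS}, \mathrm{Rx}\}$ and $\mathbf{a}_{t,T_2}$ with $T_2 \in \{\mathrm{Tx}, \mathrm{IRS}\}$ denote the normalized receive and transmit array response vectors, respectively. The array response vector of a half-wavelength spaced UPA with $N_h \times N_v$ elements can be expressed as
$$\mathbf{a}(\eta, \vartheta) = \frac{1}{\sqrt{N_h N_v}} \left[1, \cdots, e^{j\pi(h \sin\eta \sin\vartheta + v \cos\vartheta)}, \cdots, e^{j\pi((N_h-1)\sin\eta \sin\vartheta + (N_v-1)\cos\vartheta)}\right]^T,$$
where $0 \le h < N_h$ and $0 \le v < N_v$ index the array elements, $\mathbf{a}(\eta, \vartheta) \triangleq \mathbf{a}_{I,T}(\eta, \vartheta)$ with $I \in \{\mathrm{Rx}, \mathrm{Tx}\}$ and $T \in \{T_1, T_2\}$, and $\eta$, $\vartheta$ denote the azimuth and elevation angles.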
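As an illustration of the two building blocks described above, the sketch below generates a normalized UPA array response vector and the diagonal IRS phase shift matrix. It assumes the standard half-wavelength UPA phase profile; the function names `upa_response` and `irs_phase_matrix` and the numerical angles are hypothetical.

```python
import cmath
import math

def upa_response(azimuth, elevation, n_h, n_v):
    """Normalized response of a half-wavelength spaced N_h x N_v UPA.

    Assumed phase of element (h, v):
    pi * (h * sin(azimuth) * sin(elevation) + v * cos(elevation)).
    """
    scale = 1.0 / math.sqrt(n_h * n_v)
    return [
        scale * cmath.exp(1j * math.pi * (h * math.sin(azimuth) * math.sin(elevation)
                                          + v * math.cos(elevation)))
        for h in range(n_h) for v in range(n_v)
    ]

def irs_phase_matrix(theta, beta=1.0):
    """Diagonal reflection matrix diag{beta * exp(j * theta_m)}."""
    m = len(theta)
    return [[beta * cmath.exp(1j * theta[i]) if i == j else 0j for j in range(m)]
            for i in range(m)]

# Illustrative values only: a 4 x 4 UPA and a 3-element IRS.
a = upa_response(0.3, 1.1, n_h=4, n_v=4)
big_theta = irs_phase_matrix([0.0, math.pi / 2, math.pi])
```

With the normalization used here, every response vector has unit norm regardless of the angles, and setting `beta=1.0` gives unit-modulus reflection coefficients.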

D. RECEIVER SITE
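At the Rx, assuming that CSI is available [35]-[37], the received signal is then given by
$$\mathbf{y} = \mathbf{W}_{BB}^H \mathbf{W}_{RF}^H \mathbf{H} \mathbf{F}_{RF} \mathbf{F}_{BB} \mathbf{s} + \mathbf{W}_{BB}^H \mathbf{W}_{RF}^H \mathbf{n}, \qquad (8)$$
where $\mathbf{W}_{RF} \in \mathbb{C}^{N_r \times N_r^{RF}}$ is the analog RF combining matrix, $\mathbf{W}_{BB} \in \mathbb{C}^{N_r^{RF} \times N_s}$ is the digital baseband combining matrix, and $\mathbf{n}$ is zero-mean additive white Gaussian noise (AWGN) with noise power $N_0$, i.e., $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, N_0 \mathbf{I}_{N_r})$. The Tx signal-to-noise ratio (SNR) is defined as $\mathrm{SNR} = N_s/N_0$. It should be noted that the RF precoder and combiner are implemented using analog phase shifters, and therefore the entries of $\mathbf{F}_{RF}$ and $\mathbf{W}_{RF}$ have constant modulus. Further, at the Rx, an ML detector is applied to each pair of subchannels to recover the information symbol vector (see details in Section III).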

III. PROBLEM FORMULATION AND PERFORMANCE ANALYSIS
Here we aim to jointly design the X-code matrix $\mathbf{\Phi}$, the IRS phase shift matrix $\mathbf{\Theta}$, and the hybrid analog-digital precoding/combining matrices to minimize the error probability. Such a joint optimization problem is difficult to solve; thus, for simplicity, we first consider fully connected hybrid analog-digital beamforming and ignore the constraint introduced by the analog phase shifters. Then, we conduct performance analysis by deriving an upper bound on WER and present the system design criterion.

A. DIGITAL PRECODING/COMBINING BEAMFORMING
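Let $\mathbf{F} = \mathbf{F}_{RF}\mathbf{F}_{BB} \in \mathbb{C}^{N_t \times N_s}$ and $\mathbf{W} = \mathbf{W}_{RF}\mathbf{W}_{BB} \in \mathbb{C}^{N_r \times N_s}$ denote the digital precoder and combiner [34]. To obtain the optimal $\mathbf{F}$ and $\mathbf{W}$, we apply an SVD to $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, yielding
$$\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H = [\mathbf{U}_1 \; \mathbf{U}_2] \begin{bmatrix} \mathbf{\Sigma}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{\Sigma}_2 \end{bmatrix} [\mathbf{V}_1 \; \mathbf{V}_2]^H, \qquad (9)$$
where $\mathbf{U} \in \mathbb{C}^{N_r \times N_r}$ and $\mathbf{V} \in \mathbb{C}^{N_t \times N_t}$ are unitary matrices, and $\mathbf{\Sigma} \in \mathbb{C}^{N_r \times N_t}$ is a diagonal matrix whose diagonal entries are the singular values in decreasing order. We have submatrices $\mathbf{U}_1 \in \mathbb{C}^{N_r \times N_s}$, $\mathbf{\Sigma}_1 \in \mathbb{C}^{N_s \times N_s}$ and $\mathbf{V}_1 \in \mathbb{C}^{N_t \times N_s}$. The submatrices $\mathbf{U}_2$, $\mathbf{\Sigma}_2$ and $\mathbf{V}_2$ are not used in the computation. Assuming equal power allocation, a fixed X-code matrix $\mathbf{\Phi}$ and an IRS phase shift matrix $\mathbf{\Theta}$, the optimal solution is given by [34]
$$\mathbf{F}_{opt} = \mathbf{V}_1, \quad \mathbf{W}_{opt} = \mathbf{U}_1. \qquad (10)$$
Using (10), we can decouple the analog $\mathbf{F}_{RF}$ ($\mathbf{W}_{RF}$) and digital $\mathbf{F}_{BB}$ ($\mathbf{W}_{BB}$) beamformers according to the strategies in [34], [38]-[40], which indicate that the hybrid approach performs similarly to the fully digital one with a small number of RF chains.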

B. PERFORMANCE ANALYSIS AND SYSTEM DESIGN CRITERION
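Substituting (10) into (8) yields
$$\mathbf{y} = \mathbf{\Sigma}_1 \mathbf{\Phi} \mathbf{u} + \tilde{\mathbf{n}} = \mathbf{B}\mathbf{u} + \tilde{\mathbf{n}}, \qquad (11)$$
where $\mathbf{B} = \mathbf{\Sigma}_1 \mathbf{\Phi}$ is the effective channel matrix, and $\mathbf{\Phi}$ is given in (6). Here, $\tilde{\mathbf{n}} = \mathbf{U}_1^H \mathbf{n}$ is the noise vector, which has the same statistics as $\mathbf{n}$.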
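The diagonalization behind this substitution can be checked in a few lines. The following worked step is a sketch that assumes the fully digital solution $\mathbf{F} = \mathbf{V}_1$, $\mathbf{W} = \mathbf{U}_1$, the precoded vector $\mathbf{s} = \mathbf{\Phi}\mathbf{u}$, and a received signal of the form $\mathbf{W}^H \mathbf{H} \mathbf{F} \mathbf{s} + \mathbf{W}^H \mathbf{n}$; the symbol $\mathbf{y}$ is used here only for illustration.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{U}_1^H \mathbf{H} \mathbf{V}_1 \mathbf{\Phi}\mathbf{u} + \mathbf{U}_1^H \mathbf{n} \\
&= \mathbf{U}_1^H \left( \mathbf{U}_1 \mathbf{\Sigma}_1 \mathbf{V}_1^H
   + \mathbf{U}_2 \mathbf{\Sigma}_2 \mathbf{V}_2^H \right) \mathbf{V}_1 \mathbf{\Phi}\mathbf{u}
   + \tilde{\mathbf{n}} \\
&= \mathbf{\Sigma}_1 \mathbf{\Phi}\mathbf{u} + \tilde{\mathbf{n}},
\end{aligned}
```

The last line uses $\mathbf{U}_1^H \mathbf{U}_1 = \mathbf{I}$, $\mathbf{V}_1^H \mathbf{V}_1 = \mathbf{I}$, $\mathbf{U}_1^H \mathbf{U}_2 = \mathbf{0}$ and $\mathbf{V}_2^H \mathbf{V}_1 = \mathbf{0}$, which hold because $\mathbf{U}$ and $\mathbf{V}$ are unitary. Since $\mathbf{U}_1$ has orthonormal columns, the projected noise keeps the per-entry variance $N_0$.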
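Let $\sigma_1 > \sigma_2 > \cdots > \sigma_{N_s}$ denote the diagonal elements of $\mathbf{\Sigma}_1$ (i.e., the singular values determined by the cascade channel matrix $\mathbf{H}$). Following the X-code pairing strategy in (4), each pair has an equivalent subchannel matrix $\mathbf{B}_k \in \mathbb{R}^{2 \times 2}$, for $k = 1, 2, \cdots, N_s/2$, given by
$$\mathbf{B}_k = \begin{bmatrix} \sigma_k & 0 \\ 0 & \sigma_{N_s-k+1} \end{bmatrix} \mathbf{A}_k, \qquad (12)$$
where $\mathbf{A}_k$ is given in (5). Let the k-th pair be $\mathbf{u}_k = [u_k, u_{N_s-k+1}]^T$. As discussed before, an ML detector is adopted at the Rx to detect the real and imaginary components of $\mathbf{u}_k$ as
$$\Re(\hat{\mathbf{u}}_k) = \arg\min_{\Re(\mathbf{v}_k)} \|\Re(\mathbf{y}_k) - \mathbf{B}_k \Re(\mathbf{v}_k)\|^2, \quad \Im(\hat{\mathbf{u}}_k) = \arg\min_{\Im(\mathbf{v}_k)} \|\Im(\mathbf{y}_k) - \mathbf{B}_k \Im(\mathbf{v}_k)\|^2, \qquad (13)$$
where $\mathbf{y}_k$ collects the k-th and $(N_s-k+1)$-th entries of the received vector in (11), $\mathbf{v}_k$ ranges over all candidate symbol pairs, and $\hat{\mathbf{u}}_k$ is the ML detector output for the k-th pair.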
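The per-pair detection step can be sketched as an exhaustive search over the real parts of one subchannel pair. This is a minimal illustration, not the authors' implementation: the helper names `pam_levels`, `pair_channel` and `ml_detect_pair`, the numerical singular values, and the sign convention of the rotation are all assumptions.

```python
import itertools
import math

def pam_levels(order):
    """Per-dimension levels of unit-energy square QAM (e.g. order 4 or 16)."""
    side = math.isqrt(order)
    scale = math.sqrt(2.0 * (order - 1) / 3.0)
    return [(2 * i - side + 1) / scale for i in range(side)]

def pair_channel(sigma_a, sigma_b, phi):
    """B_k = diag(sigma_a, sigma_b) * A_k with an assumed rotation A_k."""
    c, s = math.cos(phi), math.sin(phi)
    return [[sigma_a * c, sigma_a * s], [-sigma_b * s, sigma_b * c]]

def ml_detect_pair(y, b_k, levels):
    """Exhaustive ML search over the real parts of one subchannel pair."""
    def metric(cand):
        # Squared Euclidean distance between y and B_k * cand.
        return sum((y[r] - b_k[r][0] * cand[0] - b_k[r][1] * cand[1]) ** 2
                   for r in range(2))
    return min(itertools.product(levels, repeat=2), key=metric)

# Illustrative values: 16-QAM, one pair, a small fixed disturbance.
levels = pam_levels(16)
b_k = pair_channel(3.0, 0.5, math.pi / 4)
sent = (levels[0], levels[2])
received = [b_k[r][0] * sent[0] + b_k[r][1] * sent[1] + 0.01 for r in range(2)]
detected = ml_detect_pair(received, b_k, levels)
```

The search visits only the candidate pairs of per-dimension levels (16 for 16-QAM), which is why detecting pair by pair is far cheaper than a joint search over all streams. The imaginary parts are detected in the same way.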
Let $\hat{P}$ denote the WER for the real component for a given channel realization, which is expressed in (14) through $\hat{P}_k$, the WER for the real component of the k-th pair.
Since the WERs for the real and imaginary parts are the same, the overall WER $P$ for a given channel $\mathbf{B}_k$ follows from $\hat{P}$ as in (15). Consider the real component of the k-th pair. For a given channel $\mathbf{B}_k$, let $P(\Re(\mathbf{u}_k) \to \Re(\mathbf{v}_k))$ denote the pairwise error probability (PEP), i.e., the probability that $\Re(\mathbf{v}_k)$ is detected when $\Re(\mathbf{u}_k)$ is transmitted, and let the pairwise difference vector of the real component of the k-th pair be defined as in (16). We obtain the following upper bound.
Proposition 1. For a fixed channel realization $\mathbf{B}_k$, the WER for the real component of the k-th pair, $\hat{P}_k$, is upper bounded as in (17), where $|S_k|$ denotes the cardinality of $S_k$ (the set of signal points for the real components of the k-th pair) and $|S_k| = M$ for M-QAM signalling.
Proof. The first inequality follows the standard union bound on PEP for a given channel realization [41]. Second step follows direct computation of PEP with Gaussian noise.
and obtain the upper bound on WER for 4-QAM and a given B k , asP Remark 1. (System design criterion) To minimize overall WER P , we need to minimizeP k , ∀k, and the upper bound in (17) by jointly designing X-code matrix Φ and the IRS phase matrix Θ (or equivalently, IRS phase shift vector θ).
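The union-bound computation in Proposition 1 can be sketched numerically as follows. This is an illustration under stated assumptions rather than the paper's exact expression: it assumes $\mathbf{B}_k = \mathrm{diag}(\sigma_a, \sigma_b)\mathbf{A}_k$ with the rotation sign convention used earlier, and real noise of variance $N_0/2$ per dimension, so that each PEP equals $Q(\|\mathbf{B}_k \mathbf{d}\| / \sqrt{2 N_0})$. The names `q_function` and `pair_union_bound` and all numerical values are hypothetical.

```python
import itertools
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pair_union_bound(sigma_a, sigma_b, phi, levels, n0):
    """Union bound on the real-component WER of one subchannel pair.

    Assumption: PEP = Q(||B_k d|| / sqrt(2 * n0)) for difference vector d.
    """
    c, s = math.cos(phi), math.sin(phi)
    points = list(itertools.product(levels, repeat=2))
    total = 0.0
    # Sum the PEP over all ordered pairs of distinct signal points.
    for u, v in itertools.permutations(points, 2):
        d0, d1 = u[0] - v[0], u[1] - v[1]
        e0 = sigma_a * (c * d0 + s * d1)
        e1 = sigma_b * (-s * d0 + c * d1)
        total += q_function(math.sqrt((e0 * e0 + e1 * e1) / (2.0 * n0)))
    return total / len(points)

# Illustrative 4-QAM example at two noise levels.
levels_4qam = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
bound_low_snr = pair_union_bound(3.0, 0.5, math.pi / 4, levels_4qam, n0=1.0)
bound_high_snr = pair_union_bound(3.0, 0.5, math.pi / 4, levels_4qam, n0=0.1)
```

Because every PEP term shrinks as the noise power decreases, the bound is smaller at the higher SNR. A union bound can exceed one at low SNR, in which case it is loose but still valid.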

IV. IRS PHASE SHIFT MATRIX DESIGN
Based on the design criterion, we design IRS phase shift matrix Θ with fixed X-precoding matrix Φ and obtain the following Proposition.
Proposition 2. For a fixed B k and M-QAM signalling, the WER for the real component is upper bounded bŷ where κ is a very small positive real number.
Proof. See Appendix A.
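Hence, our objective problem becomes
$$\max_{\boldsymbol{\theta}} \; \mathrm{tr}(\mathbf{H}\mathbf{H}^H) \quad \text{s.t.} \; \theta_m \in [0, 2\pi), \; m = 1, 2, \cdots, M. \qquad (20)$$
Below we provide two different approaches to solve the problem in (20).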

A. ALTERNATING OPTIMIZATION
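Let $\mathbf{r}_i \in \mathbb{C}^{N_r}$ be the i-th column of $\mathbf{R}$ and $\mathbf{g}_i \in \mathbb{C}^{1 \times N_t}$ be the i-th row of $\mathbf{G}$. We express $\mathbf{H}$ as
$$\mathbf{H} = \sum_{i=1}^{M} e^{j\theta_i} \mathbf{r}_i \mathbf{g}_i. \qquad (21)$$
Similar to [18], we can obtain (22).
Proposition 3. With any given $\{\theta_i\}_{i=1, i \ne m}^{M}$, the optimal value of $\theta_m$ is given by (23).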

Proof. The result can be easily obtained by observing (22).
Within this context, we can first generate an initial set of IRS phase shifts $\boldsymbol{\theta}$ and then iteratively optimize $\theta_m$, for $m = 1, 2, \cdots, M$, one at a time with the other angles fixed. This procedure is summarized in Algorithm 1.

Algorithm 1
1: Generate an initial set of IRS phase shifts $\boldsymbol{\theta}$.
2: for m = 1 to M do
3: Obtain the optimal value of $\theta_m$ of the IRS according to (23).
4: end for
5: Check convergence. If yes, stop; if not, go to step 2.
6: Obtain the optimal IRS phase shift vector $\boldsymbol{\theta}$.
Note that in Algorithm 1 the value of the objective function in (20) is non-decreasing over the iterations. Since the objective is also bounded above for unit-modulus reflection coefficients, this guarantees the convergence of Algorithm 1.
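The alternating procedure can be sketched in a few lines of Python. The closed-form update below is a derivation made for this sketch, not necessarily the paper's expression: writing $\mathrm{tr}(\mathbf{H}\mathbf{H}^H) = \sum_{i,j} e^{j(\theta_i - \theta_j)} C_{i,j}$ with $C_{i,j} = (\mathbf{g}_i \mathbf{g}_j^H)(\mathbf{r}_j^H \mathbf{r}_i)$, the terms that depend on $\theta_m$ are $2\Re\{e^{j\theta_m} z_m\}$ with $z_m = \sum_{j \ne m} e^{-j\theta_j} C_{m,j}$, so the maximizer is $\theta_m = -\arg(z_m)$. All function names and the random test channel are hypothetical.

```python
import cmath
import math
import random

def coupling_matrix(r, g):
    """C[i][j] = (g_i g_j^H)(r_j^H r_i), so tr(HH^H) = sum_ij e^{j(th_i-th_j)} C[i][j]."""
    m, n_r = len(g), len(r)
    def gg(i, j):
        return sum(a * b.conjugate() for a, b in zip(g[i], g[j]))
    def rr(j, i):
        return sum(r[k][j].conjugate() * r[k][i] for k in range(n_r))
    return [[gg(i, j) * rr(j, i) for j in range(m)] for i in range(m)]

def objective(c, theta):
    """Value of tr(HH^H) for the phase vector theta."""
    m = len(theta)
    return sum((cmath.exp(1j * (theta[i] - theta[j])) * c[i][j]).real
               for i in range(m) for j in range(m))

def alternating_optimization(c, theta, sweeps=50, tol=1e-9):
    """Update one phase at a time with the others fixed until convergence."""
    theta = list(theta)
    value = objective(c, theta)
    for _ in range(sweeps):
        for m in range(len(theta)):
            z = sum(cmath.exp(-1j * theta[j]) * c[m][j]
                    for j in range(len(theta)) if j != m)
            theta[m] = (-cmath.phase(z)) % (2 * math.pi)
        new_value = objective(c, theta)
        converged = new_value - value <= tol * max(1.0, value)
        value = new_value
        if converged:
            break
    return theta, value

# Small random example (sizes are illustrative, not the paper's setup).
rng = random.Random(0)
def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
n_r, n_t, m = 4, 5, 6
R = [[cgauss() for _ in range(m)] for _ in range(n_r)]   # N_r x M
G = [[cgauss() for _ in range(n_t)] for _ in range(m)]   # M x N_t
C = coupling_matrix(R, G)
theta0 = [rng.uniform(0, 2 * math.pi) for _ in range(m)]
theta_opt, value_opt = alternating_optimization(C, theta0)
```

Each single-phase update maximizes the objective exactly over that phase, so the objective cannot decrease from one sweep to the next, in line with the convergence remark above.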

B. GRADIENT ASCENT OPTIMIZATION
Let $\hat{\mathbf{G}} = \mathbf{G}\mathbf{G}^H$, and we obtain (24), where $Q$ is a real number given by (25). It can be seen that (24) is a multi-variable function $F(\boldsymbol{\theta})$, which increases fastest when $\boldsymbol{\theta}$ moves in the direction of the positive gradient. The gradient of $F(\boldsymbol{\theta})$ with respect to each $\theta_m$, $m = 1, 2, \cdots, M$, is given by (26). After obtaining the gradient, we update the set of IRS phase shifts $\boldsymbol{\theta}$ with a step size $\delta$, where $\delta \in \mathbb{R}$ is a small positive number. Accordingly, we summarize the above procedure in Algorithm 2.
In Algorithm 2, the step size $\delta$ is chosen to ensure that $\mathrm{tr}(\mathbf{H}\mathbf{H}^H)^{(n+1)} \ge \mathrm{tr}(\mathbf{H}\mathbf{H}^H)^{(n)}$ in each iteration, which means that the objective function in (20) converges to at least a local maximum.
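A gradient ascent iteration with a step size that is halved until the objective does not decrease can be sketched as follows. The gradient expression is derived for this sketch from $F(\boldsymbol{\theta}) = \sum_{i,j} e^{j(\theta_i - \theta_j)} C_{i,j}$ with $C_{i,j} = \hat{G}_{i,j}\hat{R}_{j,i}$ and $\hat{\mathbf{R}} = \mathbf{R}^H\mathbf{R}$, which gives $\partial F / \partial \theta_m = -2\Im\{e^{j\theta_m} \sum_{j \ne m} e^{-j\theta_j} C_{m,j}\}$; it is an assumption and may differ in form from the paper's gradient. The backtracking rule, the function names and the random channel are likewise hypothetical.

```python
import cmath
import math
import random

def coupling_matrix(r, g):
    """C[i][j] = G_hat[i][j] * R_hat[j][i], with G_hat = G G^H and R_hat = R^H R."""
    m, n_r = len(g), len(r)
    g_hat = [[sum(a * b.conjugate() for a, b in zip(g[i], g[j]))
              for j in range(m)] for i in range(m)]
    r_hat = [[sum(r[k][i].conjugate() * r[k][j] for k in range(n_r))
              for j in range(m)] for i in range(m)]
    return [[g_hat[i][j] * r_hat[j][i] for j in range(m)] for i in range(m)]

def objective(c, theta):
    """F(theta) = tr(HH^H)."""
    m = len(theta)
    return sum((cmath.exp(1j * (theta[i] - theta[j])) * c[i][j]).real
               for i in range(m) for j in range(m))

def gradient(c, theta):
    """Partial derivatives of F with respect to each phase."""
    m = len(theta)
    grad = []
    for k in range(m):
        z = sum(cmath.exp(-1j * theta[j]) * c[k][j] for j in range(m) if j != k)
        grad.append(-2.0 * (cmath.exp(1j * theta[k]) * z).imag)
    return grad

def gradient_ascent(c, theta, step=0.1, iterations=200):
    """Ascent with step halving so that the objective never decreases."""
    theta = list(theta)
    value = objective(c, theta)
    for _ in range(iterations):
        grad = gradient(c, theta)
        delta = step
        while delta > 1e-12:
            trial = [t + delta * gk for t, gk in zip(theta, grad)]
            trial_value = objective(c, trial)
            if trial_value > value:
                theta, value = trial, trial_value
                break
            delta *= 0.5
        else:
            break  # no improving step found: treat as converged
    return [t % (2 * math.pi) for t in theta], value

# Small random example (sizes are illustrative, not the paper's setup).
rng = random.Random(1)
def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
R = [[cgauss() for _ in range(6)] for _ in range(4)]   # N_r x M
G = [[cgauss() for _ in range(5)] for _ in range(6)]   # M x N_t
C = coupling_matrix(R, G)
theta0 = [rng.uniform(0, 2 * math.pi) for _ in range(6)]
theta_grad, value_grad = gradient_ascent(C, theta0)
```

All phases are updated simultaneously in each iteration, in contrast to the one-at-a-time updates of the alternating approach.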

C. COMPUTATIONAL COMPLEXITY COMPARISON
With the same initial set of $\boldsymbol{\theta}$, the dominant computational complexity comes from the IRS phase shift optimization in each iteration. The complexity of the alternating optimization approach can be described as $O(N_r N_t M I_1)$, where $I_1$ represents the total number of iterations until convergence. By contrast, the complexity of the gradient ascent approach is given by $O(M I_2)$, where $I_2$ represents the total number of iterations required to reach convergence. With $N_t = 36$ and $N_r = 16$ antennas employed at the Tx and Rx, respectively, the convergence behavior of the two IRS optimization algorithms is shown in Fig. 2. The terms "Alter" and "Grad" respectively represent the alternating and gradient ascent optimization approaches. It can be seen that the alternating algorithm converges faster than the gradient ascent approach: the former attains the maximum value of $\mathrm{tr}(\mathbf{H}\mathbf{H}^H)$ within 5 iterations, while the latter attains it within 25 iterations. Nevertheless, the gradient ascent approach has much lower computational complexity in each iteration and thus lower overall complexity, as shown in Fig. 3. With the same number of passive elements at the IRS, the value of $N_r N_t I_1$ is much higher than $I_2$, even if $I_2$ is larger than $I_1$. Thus, the alternating optimization algorithm has much higher complexity, especially for large-array Tx and Rx.
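As a rough numerical reading of the two complexity orders, the sketch below plugs in the array sizes and iteration counts quoted in this section, together with $M = 100$ IRS elements from the simulation setup. The function names are hypothetical, and constant factors hidden by the $O(\cdot)$ notation are ignored, so the ratio is indicative only.

```python
def alternating_cost(n_r, n_t, m, iterations):
    """Operation count of order N_r * N_t * M * I_1."""
    return n_r * n_t * m * iterations

def gradient_cost(m, iterations):
    """Operation count of order M * I_2."""
    return m * iterations

alt = alternating_cost(n_r=16, n_t=36, m=100, iterations=5)
grad = gradient_cost(m=100, iterations=25)
ratio = alt / grad
```

Under these assumptions the alternating approach costs on the order of 288000 operations against 2500 for gradient ascent, even though gradient ascent needs five times as many iterations.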

V. PRECODING DESIGN: X-CODE AND X-PRECODER
As discussed before, both X-code and X-precoder employ the same pairing strategy in (4) and the same precoding matrix structure in (6) (or equivalently, the submatrix structure in (5)). Note that the submatrix in (5) is parameterized by a single phase $\phi_k$, which needs to be designed in both cases to improve the error performance. The only difference between these two designs is that i) X-code chooses $\phi_k$ according to the average WER over all channel realizations, whereas ii) X-precoder chooses $\phi_k$ according to the WER of each channel realization. Next, we introduce how to choose the best $\phi_k$, $k = 1, 2, \cdots, N_s/2$, for both schemes, given fixed IRS phase shifts, i.e., fixed optimized singular values $\sigma_1 > \sigma_2 > \cdots > \sigma_{N_s}$. We also provide diversity analysis for both schemes.

FIGURE 3: Computational complexity comparison between two IRS optimization algorithms.

A. X-CODE DESIGN
As discussed above, we aim to find φ k , ∀k, which can minimize the average WER.

1) Finding φ k
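Using the distribution fitting tool in MATLAB, as shown in Fig. 4, it is found that the optimized singular values $\sigma_i$ of the cascade channel matrix $\mathbf{H}$ (i.e., the diagonal elements of $\mathbf{\Sigma}_1$) can be approximated using the following probability density
$$f(\sigma_i) = \frac{\sigma_i^{\lambda_i - 1} e^{-\sigma_i/\varphi_i}}{\Gamma(\lambda_i)\, \varphi_i^{\lambda_i}}, \qquad (28)$$
where $\sigma_i \sim \Gamma(\lambda_i, \varphi_i)$ follows the gamma distribution, and $\lambda_i$, $\varphi_i$ represent the shape and scale parameters defining this gamma distribution.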
Recall (16), and we can obtain the following Proposition.

Proof. See Appendix B.
We let $h(\phi_k)$ denote the upper bound in (29), a function of $\phi_k$. To find the $\phi_k$ that minimizes $h(\phi_k)$, we need to solve
$$\phi_k^{\star} = \arg\min_{\phi_k} h(\phi_k),$$
which unfortunately does not have an explicit analytical solution. However, since the X-precoding matrix is fixed over all channel realizations, the optimal $\phi_k$ can be computed numerically off-line. Alternatively, we can adopt $\phi_k = \pi/4$, ∀k, instead of the exact $\phi_k$, for simplicity. This choice approaches the error performance of the exact optimal $\phi_k$, as demonstrated in Section VII.
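The off-line numerical search mentioned above can be sketched with a Monte Carlo average over gamma-distributed singular values followed by a grid search over the phase. This is an illustration only: it averages a union bound with an assumed PEP scaling $Q(\|\mathbf{B}_k \mathbf{d}\| / \sqrt{2 N_0})$ instead of the closed-form bound in (29), it draws the two singular values independently, and the gamma parameters, noise level and function names are hypothetical.

```python
import itertools
import math
import random

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pair_union_bound(sigma_a, sigma_b, phi, levels, n0):
    """Union bound for one pair (assumed PEP scaling, see lead-in)."""
    c, s = math.cos(phi), math.sin(phi)
    points = list(itertools.product(levels, repeat=2))
    total = 0.0
    for u, v in itertools.permutations(points, 2):
        d0, d1 = u[0] - v[0], u[1] - v[1]
        e0 = sigma_a * (c * d0 + s * d1)
        e1 = sigma_b * (-s * d0 + c * d1)
        total += q_function(math.sqrt((e0 * e0 + e1 * e1) / (2.0 * n0)))
    return total / len(points)

def average_bound(phi, gamma_a, gamma_b, levels, n0, draws=200, seed=0):
    """Monte Carlo average of the bound over gamma-distributed singular values."""
    rng = random.Random(seed)          # same draws for every candidate phase
    total = 0.0
    for _ in range(draws):
        sigma_a = rng.gammavariate(*gamma_a)   # (shape, scale)
        sigma_b = rng.gammavariate(*gamma_b)
        total += pair_union_bound(sigma_a, sigma_b, phi, levels, n0)
    return total / draws

def offline_phase(gamma_a, gamma_b, levels, n0, grid_size=20):
    """Grid search over (0, pi/4] for the phase with the smallest average bound."""
    grid = [(i + 1) * (math.pi / 4) / grid_size for i in range(grid_size)]
    return min(grid, key=lambda phi: average_bound(phi, gamma_a, gamma_b, levels, n0))

# Hypothetical gamma parameters and noise level, chosen only for illustration.
levels_4qam = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
phi_star = offline_phase((8.0, 0.5), (3.0, 0.2), levels_4qam, n0=0.5)
```

Reusing the same random draws for every candidate phase keeps the comparison between phases free of sampling noise, and because the channel statistics are fixed the search only needs to be run once.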

2) Diversity analysis for X-code
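Here, we provide diversity analysis for the X-coded system. We have the following Proposition.
Proposition 5. The diversity of the k-th pair is lower bounded by $\max\left(\frac{\lambda_k}{2}, \frac{\lambda_{N_s-k+1}}{2}\right)$. Further, the lower bound on the diversity of the system is determined by that of the $k'$-th pair, i.e., $\max\left(\frac{\lambda_{k'}}{2}, \frac{\lambda_{N_s-k'+1}}{2}\right)$, where $k'$ is defined as
$$k' = \arg\min_{k \in \mathcal{K}} \max\left(\frac{\lambda_k}{2}, \frac{\lambda_{N_s-k+1}}{2}\right),$$
where $\mathcal{K} = \{1, \ldots, N_s/2\}$.
Proof. See Appendix C.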
Later in Section VII, we will evaluate the proposed diversity analysis via simulations.

B. OPTIMAL DESIGN OF X-PRECODER
As mentioned before, in contrast to X-code, the phase $\phi_k$ of the X-precoder submatrix in (5) changes with every channel realization. Thus we need to design the $\phi_k$, ∀k, that minimizes the upper bound on WER in (17). To simplify notation, we define $f(\phi_k)$, a function of $\phi_k$, as the upper bound on the WER of the real components in the k-th pair given in (17). We only consider $\phi_k \in (0, \pi/4]$. The first derivative of $f(\phi_k)$ is given by (34), where $a$, $b$ are given in (18) and $d_k$, $d_{N_s-k+1}$ are given in (16). We then show the selection of the optimal $\phi_k$ in the following Proposition.
Proposition 6. The optimal φ k of X-precoding of k-th pair is given by where φ 0 is root of function f (φ k ) in (34) and φ 0 / ∈ {0, π 4 }. τ is threshold which is root of second derivative of function f (φ k = π 4 ), which is given by Proof. It can be seen that is monotonically decreasing and has minimum value when φ k = π 4 . In the case of . Also, we have f (φ k = π 4 ) < 0 under this case. Thus, we force (34) to zero to determine the value of φ 0 and force (36) to zero at φ k = π 4 to find the value of threshold τ , where both problems can be solved in MATLAB.
Solving for $\phi_k$ in the X-precoder is complex in general. Nonetheless, we are still able to derive an analytic solution for lower order modulation, as discussed below.
Proposition 7. The optimal $\phi_k$ of X-precoding under 4-QAM can be expressed as in (37).
Proof. See Appendix D.
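For a single channel realization, the phase selection can also be approximated by a direct grid search over $(0, \pi/4]$, which is a simple numerical alternative to solving for the root of the derivative. The sketch below is an illustration, not the closed-form solution of Propositions 6 and 7: it minimizes a union bound with an assumed PEP scaling $Q(\|\mathbf{B}_k \mathbf{d}\| / \sqrt{2 N_0})$, and the function names and singular values are hypothetical.

```python
import itertools
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pair_union_bound(sigma_a, sigma_b, phi, levels, n0):
    """Union bound for one pair (assumed PEP scaling, see lead-in)."""
    c, s = math.cos(phi), math.sin(phi)
    points = list(itertools.product(levels, repeat=2))
    total = 0.0
    for u, v in itertools.permutations(points, 2):
        d0, d1 = u[0] - v[0], u[1] - v[1]
        e0 = sigma_a * (c * d0 + s * d1)
        e1 = sigma_b * (-s * d0 + c * d1)
        total += q_function(math.sqrt((e0 * e0 + e1 * e1) / (2.0 * n0)))
    return total / len(points)

def x_precoder_phase(sigma_a, sigma_b, levels, n0, grid_size=200):
    """Per-realization phase: grid search over (0, pi/4] for the smallest bound."""
    grid = [(i + 1) * (math.pi / 4) / grid_size for i in range(grid_size)]
    return min(grid, key=lambda phi: pair_union_bound(sigma_a, sigma_b, phi, levels, n0))

# Two illustrative realizations of one pair's singular values (4-QAM).
levels_4qam = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
phi_a = x_precoder_phase(3.0, 0.5, levels_4qam, n0=0.2)
phi_b = x_precoder_phase(1.2, 1.0, levels_4qam, n0=0.2)
```

The phase is recomputed whenever the singular values change, which is the operational difference from the X-code, whose phase is fixed off-line.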
Remark 2. (Diversity discussions) It can be easily shown that the lower bounds on the diversity gain of X-code in Proposition 5 (for each pair and the entire system) can also be used for the X-precoder case, since the derivation uses the phase $\pi/4$ for all pairs, which may be a suboptimal solution for some pairs in the X-precoder case (see (35), (37)).

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3187171

VI. HYBRID PRECODING/COMBINING DESIGN
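After designing the IRS phase shift matrix $\mathbf{\Theta}_{opt}$, we have the optimal $\mathbf{H}_{opt} = \mathbf{R}\mathbf{\Theta}_{opt}\mathbf{G} = \mathbf{U}_{opt}\mathbf{\Sigma}_{opt}\mathbf{V}_{opt}^H$. Then the optimal fully digital precoding/combining matrices can be obtained according to (10). In the following, we need to decouple the fully digital precoder/combiner into a hybrid analog and digital precoder/combiner. Such a problem can be formulated as
$$\min_{\mathbf{F}_{RF}, \mathbf{F}_{BB}} \|\mathbf{F}_{opt} - \mathbf{F}_{RF}\mathbf{F}_{BB}\|_F \quad \text{s.t.} \; |[\mathbf{F}_{RF}]_{i,j}| = 1, \; \forall i, j, \quad \|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_s.$$
The same formulation applies to the combiner at the Rx. Specifically, we can apply the manifold optimization based algorithm in [39] to solve this decoupling problem (see details in [39]). The overall algorithm for the joint optimization of the transceiver and IRS is summarized in Algorithm 3.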

Algorithm 3 Proposed Algorithm for Joint Design of Transceiver and IRS
Input: $\mathbf{R}$, $\mathbf{G}$, $N_s$, $N_r^{RF}$, $N_t^{RF}$, $N_0$
1: Obtain the optimal passive beamforming matrix $\mathbf{\Theta}_{opt}$ at the IRS based on Algorithm 1/2 in Section IV.
2: Obtain the optimal X-code or X-precoder phases $\phi_k$ as in Section V.
3: Calculate the optimal fully digital precoding/combining matrices, $\mathbf{F}_{opt}$ and $\mathbf{W}_{opt}$.
4: Obtain the hybrid analog and digital precoding matrices $\mathbf{F}_{RF}$, $\mathbf{F}_{BB}$, and the hybrid combining matrices $\mathbf{W}_{RF}$, $\mathbf{W}_{BB}$, based on the manifold optimization approach.

VII. SIMULATION RESULTS
In this section, we evaluate the WER performance of the IRS-aided mmWave massive MIMO system with (or without) X-code (or X-precoder). We model the propagation environment with $N_{cl} = 8$ clusters and $N_{ray} = 10$ rays per cluster. The azimuth and elevation angles of arrival and departure follow the Laplacian distribution [34], [42]. The complex channel gains are distributed as $\alpha_{i,l}, \beta_{i,l} \sim \mathcal{CN}(0, 1)$. The inter-element spacing is half a wavelength. In all simulations, we set $N_s = 4$ and the transmit power at the Tx as $P_t = N_s$. There are $N_t = 36$ and $N_r = 16$ antennas employed at the Tx and the Rx, respectively. The number of passive elements at the IRS is set to $M = 10 \times 10$. The numbers of RF chains are set to $N_t^{RF} = N_r^{RF} = 6$. In total, $10^6$ Monte-Carlo MATLAB simulation runs are used for averaging.
In Fig. 5, we show the average WERs versus transmit SNRs of the IRS-aided scheme under 4-QAM, where "digital" and "hybrid" respectively represent fully digital precoding/combining and hybrid analog and digital precoding/combining. The terms "withXP" and "withoutXP" respectively represent the system scheme with and without X-precoder, where the X-precoding matrices change based on (37). We compare two IRS design methods, the alternating ("alter") and gradient ascent ("grad") optimization algorithms. We observe that the X-precoder achieves significantly better WER performance than the IRS-aided scheme without X-precoder. For example, at a WER of $10^{-3}$, the IRS scheme with X-precoder outperforms its counterpart without X-precoder by approximately 6.2 dB. It can also be seen that hybrid precoding/combining performs closely to fully digital precoding/combining, indicating that hybrid precoding/combining with a small number of RF chains can approach fully digital precoding/combining. In addition, the error performance of the two IRS optimization approaches is similar. However, our proposed gradient ascent algorithm has lower overall computational complexity, as shown in Fig. 3. In Fig. 6, we illustrate the WERs of the system using X-code and X-precoder, respectively. Using the distribution fitting tool in MATLAB, the optimized singular values have the following distributions: $\sigma_1 \sim \Gamma(14.81, 45.86)$, $\sigma_2 \sim \Gamma(7.28, 26.18)$, $\sigma_3 \sim \Gamma(10.71, 9.28)$ and $\sigma_4 \sim \Gamma(12.44, 5.02)$. Then the optimal X-code is obtained as in (31). We first compare the WERs of the optimal X-code and the X-code with $\pi/4$ for all pairs. The results show that both X-codes (the optimal one and the one with $\pi/4$) have similar performance, demonstrating that the simple choice of $\pi/4$ is a good practical solution.
Secondly, we compare the WERs of these two X-codes and the X-precoder and find that they perform similarly at practical SNRs (i.e., up to a WER of $10^{-3}$), while the X-code becomes slightly worse than the X-precoder at high SNRs (i.e., beyond a WER of $10^{-4}$). This is because the X-code's phase design targets the average WER, whereas the X-precoder's phase design is performed for the WER of each channel realization. In addition, we evaluate the simulated diversity against the diversity lower bound in Proposition 5 for the optimal X-coded system, where we have $k' = 2$. We find that the system diversity is lower bounded by $\max\left(\frac{\lambda_2}{2}, \frac{\lambda_3}{2}\right) = 5.36$, while the simulated diversity is about 5.37, demonstrating a close match between the diversity lower bound and the simulated result.
In Fig. 7, we observe that the two IRS optimization algorithms achieve similar results, and that hybrid precoding/combining performs closely to fully digital precoding/combining. These observations can also be found in Fig. 8 with 16-QAM signalling and X-precoder.
Finally, in Fig. 9, we show the error performance of the IRS-aided communication system with different numbers of IRS elements. Assuming a UPA is employed at the IRS, we observe that the WERs decrease as $M$ increases.
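The diversity lower bound quoted above for the optimal X-coded system can be reproduced with simple arithmetic from the fitted shape parameters. The sketch assumes, as read from Proposition 5 and the numbers in the text, that the bound for pair $k$ is $\max(\lambda_k/2, \lambda_{N_s-k+1}/2)$ and that the system bound is the smallest per-pair bound; the variable and function names are hypothetical.

```python
# Fitted gamma shape parameters lambda_i of the singular values, from the text.
shape = {1: 14.81, 2: 7.28, 3: 10.71, 4: 12.44}
n_s = 4

def pair_diversity_bound(k):
    """Assumed per-pair bound: max(lambda_k / 2, lambda_{N_s-k+1} / 2)."""
    return max(shape[k] / 2, shape[n_s - k + 1] / 2)

pair_bounds = {k: pair_diversity_bound(k) for k in range(1, n_s // 2 + 1)}
worst_pair = min(pair_bounds, key=pair_bounds.get)   # pair that limits the system
system_bound = pair_bounds[worst_pair]
```

The first pair gives 7.405 and the second pair gives 5.355, so the second pair limits the system and the bound rounds to the quoted value of 5.36.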

VIII. CONCLUSIONS
In this paper, we considered an IRS-aided mmWave massive MIMO system with hybrid beamforming/combining, where the information symbols are precoded by X-code (or X-precoder). We first designed the transceiver digital (or hybrid) active beamforming. Then we derived an upper bound on WER, based on which we jointly optimized the IRS phase shifts and X-code (or X-precoder) to enhance WER performance. Specifically, we devised two IRS optimization algorithms based upon alternating and gradient ascent optimization approaches and compared their computational complexity and convergence. Further, we designed X-code and X-precoder to minimize the average WER and the per-realization WER, respectively. We also provided their diversity analysis. Finally, by simulations, we observed that gradient ascent optimization achieves similar performance to alternating optimization, but with lower computational complexity. We also observed that adopting the X-code and X-precoder techniques can significantly improve error performance.

APPENDIX A PROOF OF PROPOSITION 2
For M-QAM signalling, using the improved exponential bounds in [43] Q then (17) becomeŝ To simplify notation, we let y = ||B k d k || 2
Substituting (43) into (42) yieldŝ For M-QAM signalling, we have set E s = 1. Using (14), we haveP where (I) becomes equality when N s = rank(H) and the result in (19) can be easily observed.

APPENDIX C PROOF OF PROPOSITION 5
For the k-th pair, we set the near-optimal phase φ k = π 4 , 4-QAM signalling, and (19) becomeŝ At high SNRs, we can discard the first part in the summation due to its smaller value compared to the sum of the other two parts. We also have From [45, 13.2.18], we can obtain that U (a, b, z) = Γ(b − 1) Γ(a)