Dual-Polarized RSMA for Massive MIMO Systems

This letter proposes a novel dual-polarized rate-splitting multiple access (RSMA) technique for massive multiple-input multiple-output (MIMO) networks. The proposed strategy transmits common and private symbols in parallel through dynamic polarization multiplexing and does not require successive interference cancellation (SIC) at the receiver. To assist the design of dual-polarized MIMO-RSMA systems, we propose a deep neural network (DNN) framework for predicting the ergodic sum-rates. An efficient DNN-aided adaptive power allocation policy is also developed to maximize the ergodic sum-rates. Simulation results validate the effectiveness of the DNNs for sum-rate prediction and power allocation and reveal that the dual-polarized MIMO-RSMA strategy can significantly outperform conventional baseline schemes.

I. INTRODUCTION
Rate-splitting multiple access (RSMA) has recently emerged as a powerful downlink transmission technique for multiple-input multiple-output (MIMO) systems. At the base station (BS), RSMA encodes the data messages of different users into common and private symbols and transmits them through linear precoding. At the receiver, users rely on successive interference cancellation (SIC) to recover their original messages. These features of RSMA enable attractive performance improvements, such as higher data rates and robustness to imperfect channel state information (CSI). When RSMA is combined with massive MIMO, in which the BS is equipped with a large number of antennas, further improvements can be achieved, outperforming conventional techniques such as time-division multiple access (TDMA), space-division multiple access (SDMA), and non-orthogonal multiple access (NOMA) [1], [2].
Despite the advantages of RSMA, there are still unsolved issues and room for improvement. In particular, SIC introduces interference in the decoding process of RSMA, which is detrimental to the system spectral efficiency. Moreover, SIC error propagation can occur in practice, which further deteriorates the system performance. The recent work in [3] has shown that dual-polarized antenna arrays can be harnessed to alleviate SIC issues and improve user multiplexing. In addition, dual-polarized antenna arrays are widely employed in commercial cellular systems and have been adopted as standard in the 3rd Generation Partnership Project (3GPP) Long-Term Evolution Advanced (LTE-A) and 5G New Radio (NR) specifications [3]. These facts imply that the polarization domain is a practical and abundantly available resource that offers promising opportunities for enhancing the performance of next-generation communication systems. Motivated by these observations, we propose a dual-polarized RSMA strategy for dual-polarized massive MIMO systems, a concept not yet reported in the technical literature. With the goal of maximizing the system sum-rate, common and private symbols are multiplexed dynamically in the polarization domain. Our low-complexity strategy enables users to detect common and private symbols simultaneously from orthogonal polarizations without SIC, which reduces the overall interference experienced in the system. Due to the dynamic nature of the system model, classical performance analysis and optimization become intractable. Instead, we propose a deep neural network (DNN) framework for predicting the ergodic sum-rates of the proposed scheme. The DNN sum-rate prediction framework can serve as an efficient tool for assisting the design of dual-polarized MIMO-RSMA systems. To further improve the ergodic sum-rate, we also develop a DNN-aided adaptive power allocation framework, which adaptively splits the transmit power between common and private symbols.
Simulation results validate the effectiveness of the DNN frameworks and confirm that remarkable performance improvements are achievable with the dual-polarized MIMO-RSMA strategy.
Notation: The transpose and the Hermitian transpose of a matrix $\mathbf{A}$ are represented by $\mathbf{A}^T$ and $\mathbf{A}^H$, respectively. $\mathbf{I}_M$ is the $M \times M$ identity matrix, $\mathbf{0}_{M,N}$ is the $M \times N$ matrix with all zero entries, and $\otimes$ is the Kronecker product. Moreover, the cardinality of a set $\mathcal{A}$ is represented by $|\mathcal{A}|$, $\circ$ represents function composition, and $\mathbb{E}[\cdot]$ denotes expectation.

II. SYSTEM MODEL
We consider a downlink single-cell scenario in which one base station (BS) employing $M/2$ co-located pairs of dual-polarized antennas (with vertical (v) and horizontal (h) polarizations) communicates with multiple users, each equipped with a single pair of dual-polarized antennas. The BS clusters the users into $G$ groups, with the $g$th group containing $N_g$ users. Users within a given group are assumed to share a common covariance matrix given by $\mathbf{R}_g = \mathbf{I}_2 \otimes \boldsymbol{\Sigma}_g = \mathbf{I}_2 \otimes (\mathbf{Q}_g \boldsymbol{\Delta}_g \mathbf{Q}_g^H)$, where $\boldsymbol{\Sigma}_g \in \mathbb{C}^{\frac{M}{2} \times \frac{M}{2}}$ denotes the covariance matrix of rank $r_g$ observed in each polarization, $\boldsymbol{\Delta}_g$ is a real-valued $\bar{r}_g \times \bar{r}_g$ diagonal matrix containing $\bar{r}_g < r_g$ nonzero eigenvalues of $\boldsymbol{\Sigma}_g$, and $\mathbf{Q}_g$ is a matrix comprising the corresponding eigenvectors.
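As a small illustration of the covariance structure $\mathbf{R}_g = \mathbf{I}_2 \otimes \boldsymbol{\Sigma}_g$, the following Python sketch (plain lists, hypothetical helper names, toy values for $\boldsymbol{\Sigma}_g$) builds the Kronecker product and shows that it places one copy of $\boldsymbol{\Sigma}_g$ in each polarization's diagonal block, with zero cross-polarization blocks.

```python
# Minimal sketch (hypothetical helpers): R_g = I_2 (x) Sigma_g with plain lists.

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy per-polarization covariance with M/2 = 2 (illustrative values only).
sigma_g = [[2.0, 0.5], [0.5, 1.0]]
R_g = kron(identity(2), sigma_g)
for row in R_g:
    print(row)
```

Here $M/2 = 2$ only for readability; the printed $4 \times 4$ matrix has $\boldsymbol{\Sigma}_g$ in both diagonal blocks and zeros elsewhere.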
In the channel model (1) for the $n$th user in the $g$th group, $\mathbf{g}_{gn}^{ij} \in \mathbb{C}^{\bar{r}_g}$ denotes the reduced-dimension fast-fading channel vector from polarization $i$ to polarization $j$, with $i, j \in \{v, h\}$, and $\chi \in [0, 1]$ is the inverse cross-polar discrimination, which measures the ratio of cross-polar to co-polar signal powers.

A. CSI Estimation and Acquisition
Due to quantization errors and other impairments, the acquisition of $\mathbf{g}_{gn}^{ij}$ at the BS is imperfect. As in [2], we model the corrupted estimate $\hat{\mathbf{g}}_{gn}^{ij}$ of $\mathbf{g}_{gn}^{ij}$ by (2), where $\mathbf{z}_{gn}^{ij}$ is a complex standard Gaussian random vector independent of $\mathbf{g}_{gn}^{ij}$, and $\tau$ is a factor that quantifies the quality of the CSI estimate, such that $\tau = 0$ corresponds to the perfect CSI case, and $\tau = 1$ models the extreme scenario in which the estimate $\hat{\mathbf{g}}_{gn}^{ij}$ is statistically independent of $\mathbf{g}_{gn}^{ij}$.
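Since equation (2) is not reproduced in this text, the sketch below assumes a Gauss-Markov-type error model, $\hat{\mathbf{g}} = \sqrt{1-\tau^2}\,\mathbf{g} + \tau\mathbf{z}$, which is one model consistent with the limiting cases described above (it may differ in form from (2)). The helper names are hypothetical; the script checks that the sample correlation between the true and estimated channels tracks $\sqrt{1-\tau^2}$.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def estimate_channel(g, tau, rng):
    """Assumed CSI error model: g_hat = sqrt(1 - tau^2) * g + tau * z, z ~ CN(0, I)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    a = math.sqrt(1.0 - tau ** 2)
    return [a * gk + tau * complex_gaussian(rng) for gk in g]


def correlation(x, y):
    """Normalized sample correlation |x^H y| / (||x|| ||y||)."""
    num = sum(a.conjugate() * b for a, b in zip(x, y))
    den = math.sqrt(sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y))
    return abs(num) / den


rng = random.Random(0)
g = [complex_gaussian(rng) for _ in range(20000)]  # long vector for stable averages
for tau in (0.0, 0.4, 1.0):
    g_hat = estimate_channel(g, tau, rng)
    print(f"tau={tau}: correlation={correlation(g, g_hat):.3f}, "
          f"sqrt(1-tau^2)={math.sqrt(1 - tau ** 2):.3f}")
```

Under this assumed model, $\tau = 0$ returns the true channel, while $\tau = 1$ yields an estimate that is uncorrelated with it.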
In contrast, we assume that $\boldsymbol{\Sigma}_g$ is perfectly known at the BS. In particular, the one-ring model [3] is adopted for generating $\boldsymbol{\Sigma}_g$ in this letter.

B. Dual-Polarized Rate-Splitting Multiple Access
The proposed dual-polarized MIMO-RSMA strategy works as follows. First, each user's message is split into a common and a private part. Then, the BS encodes the common parts into a single super-symbol, denoted by $x_g^c$, and the private parts into private symbols, denoted by $x_{gn}^p$. The symbol $x_g^c$ is intended for all users within the $g$th group, whereas $x_{gn}^p$ should be decoded only by the intended $n$th user. In the original RSMA technique, $x_g^c$ and $x_{gn}^p$ are linearly precoded and superimposed in the power domain for transmission, which requires SIC at the receiver. In contrast to conventional RSMA, our proposed technique transmits common and private symbols in parallel data streams via the polarization domain. More specifically, the BS transmits the signal in (3), where $\mathbf{P}_g \in \mathbb{C}^{M \times \bar{M}}$ is the precoding matrix for cancelling inter-group interference, in which $\bar{M}$ determines the dimension of the transformed channel. The parameter $P$ denotes the total transmit power, $\zeta_{gn}$ is the large-scale fading coefficient for the $n$th user in the $g$th group, and $\alpha_g^c$ and $\alpha_{gn}^p$ are the power allocation coefficients for the common and private symbols, subject to the constraint $\alpha_g^c + \sum_{n=1}^{N_g} \alpha_{gn}^p = 1$. In turn, $\mathbf{w}_g^c \in \mathbb{C}^{\bar{M}}$ and $\mathbf{w}_{gn}^p \in \mathbb{C}^{\bar{M}}$ are precoding vectors responsible for multiplexing the common and private symbols in polarizations $i_g^c$ and $i_g^p$, respectively, where $i_g^c, i_g^p \in \{v, h\}$ and $i_g^c \neq i_g^p$. The polarizations $i_g^c$ and $i_g^p$ are assigned dynamically by the BS at each coherence interval. To this end, based on the estimated CSI modeled by (2), the BS predicts the instantaneous rates of the common and private symbols experienced by the users, denoted by $\hat{R}_{gn}^c$ and $\hat{R}_{gn}^p$, and selects the polarizations according to the criterion in (6). After computing (6), the BS signals $i_g^c$ and $i_g^p$ to the users. At the receiver, users within the $g$th group detect the common message from polarization $i_g^c$ and their private messages from polarization $i_g^p$.
A simplified diagram of the proposed scheme is presented in Fig. 1.
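Criterion (6) is not reproduced above, so the following Python snippet is only a hypothetical illustration of how a BS could turn predicted per-user rates into a polarization assignment: it compares the two orthogonal assignments $(i_g^c, i_g^p) \in \{(v,h), (h,v)\}$ and keeps the one with the larger predicted group sum-rate, counting the common rate once at the level of the weakest user (in line with the rate definitions in Section III-A). The rule, function names, and rate values are assumptions.

```python
# Hypothetical illustration only: not necessarily the criterion in (6).

def predicted_group_rate(common_rates, private_rates):
    """The common stream must be decodable by every user, so it is limited by the
    weakest user; private rates add up across users."""
    return min(common_rates) + sum(private_rates)


def assign_polarizations(predicted):
    """predicted maps each candidate (i_c, i_p) to (common_rates, private_rates),
    i.e., per-user rates predicted from the estimated CSI."""
    for i_c, i_p in predicted:
        if i_c == i_p:
            raise ValueError("common and private streams need different polarizations")
    return max(predicted, key=lambda pair: predicted_group_rate(*predicted[pair]))


# Toy predicted rates (bpcu) for a group of three users; values are made up.
predicted = {
    ("v", "h"): ([2.1, 1.8, 2.4], [3.0, 2.5, 3.2]),
    ("h", "v"): ([0.9, 2.6, 2.2], [3.4, 2.9, 3.1]),
}
print(assign_polarizations(predicted))
```

In this toy case, the (h, v) assignment has larger private rates, but its weak common rate for the first user makes (v, h) the better choice.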

C. Precoding for Inter-Group Interference Cancellation
After the signal in (3) has passed through the channel in (1), the $n$th user in the $g$th group receives the signal in (7), where $n_{gn}^i$ denotes the additive noise observed by the $n$th user in polarization $i \in \{v, h\}$, which follows a complex Gaussian distribution with zero mean and variance $\sigma^2$.
From (7), it is clear that the inter-group interference can be cancelled if the condition in (8) is satisfied for all $g' \neq g$, where $\bar{\mathbf{P}}_g \in \mathbb{C}^{\frac{M}{2} \times \bar{M}}$ is the precoding matrix for each polarization, in which $\bar{M} \le M/2$. To this end, $\bar{\mathbf{P}}_g$ can be constructed by concatenating $\bar{M}$ basis vectors of the null space of the matrix associated with condition (8). With this design, the signal in (7) can be simplified as in (9). As a result, the $n$th user decodes the common message with the signal-to-interference-plus-noise ratio (SINR) in (10), where $\rho = P/\sigma^2$ denotes the signal-to-noise ratio (SNR), and the first term in the denominator models the cross-polar interference from polarization $i_g^p$ to polarization $i_g^c$. In turn, the SINR observed by the $n$th user in the $g$th group when decoding its private message can be represented by (11), where the first term in the denominator corresponds to the cross-polar interference, and the term
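The null-space construction of $\bar{\mathbf{P}}_g$ can be sketched as follows. We assume, as in JSDM-style designs, that the matrix whose null space is taken stacks $\mathbf{Q}_{g'}^H$ for all interfering groups $g' \neq g$; this is our reading of condition (8), not a reproduction of it. The sketch uses pure-Python complex arithmetic (reduced row echelon form followed by Gram-Schmidt) with toy dimensions and hypothetical names; a practical implementation would typically rely on an SVD routine instead.

```python
import math
import random


def inner(u, v):
    """Complex inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize vectors (modified Gram-Schmidt, u^H v inner product)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = inner(q, w)
            w = [a - c * b for a, b in zip(w, q)]
        n = math.sqrt(sum(abs(a) ** 2 for a in w))
        if n > tol:
            basis.append([a / n for a in w])
    return basis


def null_space(A, tol=1e-10):
    """Orthonormal basis of {x : A x = 0}, via reduced row echelon form."""
    R = [[complex(x) for x in row] for row in A]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(R[i][c]))  # partial pivoting
        if abs(R[p][c]) < tol:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(rows):
            if i != r:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    vectors = []
    for free in (c for c in range(cols) if c not in pivots):
        x = [0j] * cols
        x[free] = 1 + 0j
        for i, pc in enumerate(pivots):
            x[pc] = -R[i][free]  # solve the pivot variables for this free variable
        vectors.append(x)
    return gram_schmidt(vectors)


def outer_precoder(Q_others, M_bar):
    """Columns of P_bar_g: M_bar orthonormal vectors orthogonal to every column of
    each Q_g' (g' != g). Each Q in Q_others is given as a list of columns."""
    A = [[x.conjugate() for x in col] for Q in Q_others for col in Q]  # rows of Q^H
    basis = null_space(A)
    if len(basis) < M_bar:
        raise ValueError("null space is smaller than the requested M_bar")
    return basis[:M_bar]


rng = random.Random(3)


def rand_col(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


half_M, r_bar, M_bar = 8, 2, 3  # toy sizes: M/2 = 8, two interfering groups
Q_others = [[rand_col(half_M) for _ in range(r_bar)] for _ in range(2)]
P_bar = outer_precoder(Q_others, M_bar)
leak = max(abs(inner(q, p)) for Q in Q_others for q in Q for p in P_bar)
print(len(P_bar), f"{leak:.1e}")
```

The printed leakage is at numerical-precision level, confirming that every column of the toy $\bar{\mathbf{P}}_g$ is orthogonal to the interfering groups' eigenvectors.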

D. Precoding for the Common and Private Symbols
The precoding vector $\mathbf{w}_{gn}^{p,i_g^p} \in \mathbb{C}^{\bar{M}}$ should be designed to cancel the remaining inter-user interference observed in the assigned polarization $i_g^p \in \{v, h\}$ within each group. Mathematically, the private precoder for one user must be near-orthogonal (orthogonal with perfect CSI) to the effective channels of the other users in the same group. The common precoder $\mathbf{w}_g^{c,i_g^c}$, in turn, is obtained from the unit-norm-constrained design problem in (12), formulated in terms of the estimated effective channels. However, the problem in (12) is non-convex and NP-hard for a general number of transmit antennas [4]. Fortunately, when $M \to \infty$, the asymptotically optimal $\mathbf{w}_g^{c,i_g^c}$ is given by a linear combination of the effective channel vectors, as in (13) [5], which consists of a weighted matched filter (MF) precoder for the channels of polarization $i_g^c \in \{v, h\}$, where $\mu_{gn}$ is the weight for the $n$th user in the $g$th group. As in [5], we employ an equally weighted MF precoder, i.e., $\mu_{g1} = \cdots = \mu_{gN_g} = \mu_g$. By defining $\boldsymbol{\omega}_g$ as the sum of the effective channel vectors of the group, the common weight becomes $\mu_g = 1/\sqrt{\boldsymbol{\omega}_g^H \bar{\mathbf{P}}_g^H \bar{\mathbf{P}}_g \boldsymbol{\omega}_g}$, which satisfies the unity norm constraint in (12).
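The equally weighted MF design above can be sketched as follows. The snippet assumes that each user observes the scalar gain $\mathbf{h}_n^H\mathbf{w}$ on its effective channel $\mathbf{h}_n$ (a convention chosen for illustration) and normalizes the combined vector to unit norm; this matches the unity norm constraint when $\bar{\mathbf{P}}_g$ has orthonormal columns, since then $\|\bar{\mathbf{P}}_g\mathbf{w}\| = \|\mathbf{w}\|$. Function names and channel values are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(a) ** 2 for a in v))


def mf_common_precoder(eff_channels, weights=None):
    """Weighted matched filter: w = sum_n mu_n h_n / ||sum_n mu_n h_n||.

    With weights=None every user gets the same weight, so the common scalar
    mu_g cancels in the normalization."""
    if weights is None:
        weights = [1.0] * len(eff_channels)
    dim = len(eff_channels[0])
    w = [sum(mu * h[k] for mu, h in zip(weights, eff_channels)) for k in range(dim)]
    scale = norm(w)
    return [a / scale for a in w]


def gain(h, w):
    """Scalar gain h^H w seen by a user with effective channel h (assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(h, w))


h1, h2 = [1 + 0j, 0j], [0j, 1j]  # two toy orthogonal effective channels
w = mf_common_precoder([h1, h2])
print([round(abs(gain(h, w)), 4) for h in (h1, h2)])
```

With two orthogonal toy channels, the snippet prints [0.7071, 0.7071]: both users obtain the same gain $1/\sqrt{2}$, and scaling all weights by the same $\mu_g$ leaves the normalized precoder unchanged.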

III. ERGODIC SUM-RATE ANALYSIS AND ADAPTIVE POWER ALLOCATION WITH DEEP NEURAL NETWORKS
A. Ergodic Sum-Rate
The instantaneous data rate for the $n$th user in the $g$th group is given by the sum of its private and common rates, which are calculated as $R_{gn}^p = \log_2(1 + \gamma_{gn}^p)$ and $R_{gn}^c = \min_{\forall l \in \{1, \ldots, N_g\}} \{\log_2(1 + \gamma_{gl}^c)\}$, respectively. Thus, the ergodic sum-rate for the $g$th group can be obtained analytically through (14), where $f_{\gamma_{gn}^p}(x)$ is the probability density function (PDF) of $\gamma_{gn}^p$, and $f_{\gamma_{g(1)}^c}(y)$ denotes the PDF of the first order statistic of $\gamma_{gn}^c$, i.e., the PDF of $\min_{\forall l}\{\gamma_{gl}^c\}$. However, due to the correlated gains in the SINRs in (10) and (11), obtaining exact expressions for $f_{\gamma_{gn}^p}(x)$ and $f_{\gamma_{g(1)}^c}(y)$ is a convoluted task, which makes the derivation of (14) intractable. Instead, we exploit the approximation capabilities of DNNs to predict the desired sum-rate.
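Because (14) is intractable, a Monte Carlo average over SINR draws is the natural numerical substitute. The following sketch estimates the ergodic group sum-rate as the mean of $\sum_n \log_2(1+\gamma_{gn}^p) + \min_l \log_2(1+\gamma_{gl}^c)$, i.e., it assumes that the common rate is counted once per group. The SINR sampler is a hypothetical placeholder with independent exponential SINRs; it does not reproduce the correlated SINRs in (10) and (11).

```python
import math
import random


def instantaneous_group_rate(gamma_p, gamma_c):
    """Sum of private rates plus one common rate limited by the weakest user."""
    private = sum(math.log2(1.0 + g) for g in gamma_p)
    common = min(math.log2(1.0 + g) for g in gamma_c)
    return private + common


def ergodic_group_rate(sample_sinrs, n_realizations, rng):
    """Monte Carlo estimate of the ergodic group sum-rate (average over draws)."""
    total = 0.0
    for _ in range(n_realizations):
        gamma_p, gamma_c = sample_sinrs(rng)
        total += instantaneous_group_rate(gamma_p, gamma_c)
    return total / n_realizations


def toy_sampler(rng, mean_p=(20.0, 15.0, 10.0), mean_c=(40.0, 30.0, 20.0)):
    """Hypothetical stand-in for (10)-(11): independent exponential SINRs."""
    gamma_p = [rng.expovariate(1.0 / m) for m in mean_p]
    gamma_c = [rng.expovariate(1.0 / m) for m in mean_c]
    return gamma_p, gamma_c


print(round(ergodic_group_rate(toy_sampler, 2000, random.Random(7)), 2))
```

Replacing `toy_sampler` with a simulator of (10) and (11) would turn this estimator into a generator of sum-rate targets for training.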
Given that the input parameters of $\bar{R}_g$ form a compact subset, denoted by $\mathcal{X}_g$, and that $\bar{R}_g$ is a real-valued continuous function, the universal approximation theorem [6, Th. 2.2] ensures that a DNN with at least one hidden layer can approximate $\bar{R}_g$ to any degree of accuracy, i.e., for every $\epsilon > 0$ there exists a DNN such that $\sup_{\mathbf{x}_g \in \mathcal{X}_g} |\bar{R}_g(\mathbf{x}_g) - \hat{R}_g(\mathbf{x}_g)| < \epsilon$, where $\hat{R}_g$ is the function that models the DNN, and $\mathbf{x}_g \in \mathcal{X}_g \subseteq \mathbb{R}^{b_{\mathcal{X}_g}}$ represents the feature vector with $b_{\mathcal{X}_g}$ input parameters of the sum-rate function. This theorem provides theoretical support for adopting DNNs as predictors of the intricate multivariate expression in (14).

B. DNN for Ergodic Sum-Rate Prediction
We consider a DNN model with $L$ dense layers, namely one input layer, one output layer, and $L - 2$ hidden layers, where the $l$th layer has $Q_l$ neurons. To reduce the training complexity, we address the ergodic sum-rate of each spatial group separately. More specifically, the training dataset for users within the $g$th group is represented by $\mathcal{D}_g = \{(\mathbf{x}_{g,i}, \bar{R}_{g,i}) \mid \mathbf{x}_{g,i} \in \mathcal{X}_g, \bar{R}_{g,i} \in \mathbb{R}, i = 1, \ldots, |\mathcal{D}_g|\}$, where $\bar{R}_{g,i}$ is the target output, i.e., the actual ergodic sum-rate, of the $i$th training sample in $\mathcal{D}_g$, and $\mathbf{x}_{g,i}$ is the $i$th input sample vector containing $b_{\mathcal{X}_g} = 2N_g + 7$ system parameters, which are structured as $\mathbf{x}_{g,i} = [\ldots, \rho]^T$. Note that the entries of $\mathbf{x}_{g,i}$ lie within different ranges and that $\bar{R}_{g,i}$ can take values from a broad interval, which can lead to unstable and slow training convergence. To avoid this limitation, the training samples are scaled to the unit range. Under these considerations, the ergodic sum-rate prediction function for the $g$th group can be expressed by (16), where $r_l(\cdot)$ denotes the transformation applied to the input data in the $l$th layer, defined by $r_l(\mathbf{x}) = \pi_l(\mathbf{W}_l \mathbf{x} + \mathbf{b}_l)$ (17), in which $\mathbf{x}$ is the input to the $l$th layer, $\mathbf{W}_l \in \mathbb{R}^{Q_l \times Q_{l-1}}$ is the weight matrix connecting the $l$th and $(l-1)$th layers, and $\mathbf{b}_l \in \mathbb{R}^{Q_l}$ and $\pi_l: \mathbb{R}^{Q_l} \to \mathbb{R}^{Q_l}$ represent, respectively, the bias vector and activation function of the $l$th layer. In the hidden layers, we use the rectified linear unit (ReLU) as the activation function, i.e., $\pi_l(\mathbf{x}) = \max(\mathbf{0}, \mathbf{x})$ applied element-wise, $\forall l \in \{1, \ldots, L-2\}$, and in the output layer, a linear activation function is adopted.
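The scaling and layer-wise mapping described above can be illustrated with a dependency-free Python sketch: min-max scaling to the unit range, the layer map $r_l(\mathbf{x}) = \pi_l(\mathbf{W}_l\mathbf{x} + \mathbf{b}_l)$, and the composition of ReLU hidden layers with a linear output layer. The weights below are toy values, not trained parameters, and the helper names are hypothetical.

```python
def minmax_fit(samples):
    """Per-feature minimum and maximum over a list of feature vectors."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]


def minmax_scale(x, lo, hi):
    """Map each feature to [0, 1]; constant features are mapped to 0."""
    return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(x, lo, hi)]


def minmax_unscale(y, lo, hi):
    """Undo the scaling of a scalar target, e.g., a predicted sum-rate."""
    return lo + y * (hi - lo)


def relu(z):
    return [max(0.0, v) for v in z]


def linear(z):
    return list(z)


def layer(x, W, b, activation):
    """One dense layer: r_l(x) = pi_l(W_l x + b_l)."""
    return activation([sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)])


def predict(x, layers):
    """Composition of the layer maps, applied from the first layer to the output."""
    for W, b, activation in layers:
        x = layer(x, W, b, activation)
    return x


samples = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
lo, hi = minmax_fit(samples)
x = minmax_scale([5.0, 20.0], lo, hi)
layers = [
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0], relu),  # toy hidden layer
    ([[2.0, 3.0]], [0.5], linear),                  # toy linear output layer
]
print(x, predict(x, layers))
```

The scaled input [0.5, 0.5] passes through one ReLU layer and a linear output, giving [1.5]; in practice, predictions made on the scaled target would be mapped back with `minmax_unscale`.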
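For training, the data samples in $\mathcal{D}_g$ are randomly selected and partitioned into $J$ batches. As a result, the mean-squared error (MSE) loss function to be minimized for the $j$th batch, $\forall j \in \{1, \ldots, J\}$, can be written as $\mathcal{L}_j = \frac{1}{S}\sum_{s=1}^{S} \big(\bar{R}_{g,s} - \hat{R}_g(\mathbf{x}_{g,s})\big)^2$ (18), where $\mathcal{D}_{g,j} = \{(\mathbf{x}_{g,s}, \bar{R}_{g,s}) \mid s = 1, \ldots, S\} \subseteq \mathcal{D}_g$ represents the subset corresponding to the $j$th data batch, in which $S$ denotes the cardinality of $\mathcal{D}_{g,j}$, i.e., the batch size.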
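The batching and per-batch MSE loss described above can be sketched as follows, assuming a fixed batch size $S$ so that $J = \lceil |\mathcal{D}_g| / S \rceil$ and the last batch may be smaller. The data and the predictor are toy placeholders with hypothetical names.

```python
import random


def make_batches(dataset, batch_size, rng):
    """Shuffle (x, target) pairs and split them into ceil(|D| / S) batches."""
    order = list(range(len(dataset)))
    rng.shuffle(order)
    return [[dataset[i] for i in order[k:k + batch_size]]
            for k in range(0, len(order), batch_size)]


def batch_mse(batch, model):
    """MSE over one batch: (1/S) * sum_s (target_s - model(x_s))^2."""
    return sum((target - model(x)) ** 2 for x, target in batch) / len(batch)


def model(x):
    """Toy predictor that misses every target below by exactly 1."""
    return 2.0 * x[0]


data = [([float(i)], 2.0 * i + 1.0) for i in range(10)]  # toy (x, target) pairs
batches = make_batches(data, batch_size=4, rng=random.Random(0))
print([len(b) for b in batches])
print([batch_mse(b, model) for b in batches])
```

With 10 toy samples and $S = 4$, this yields batches of sizes 4, 4, and 2, and every batch has an MSE of 1.0 because the toy predictor is off by exactly one.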

C. DNN-Aided Adaptive Power Allocation
Following the work in [5], the power coefficients of the private symbols are computed by $\alpha_{gu}^p = (1 - \alpha_g^c)/N_g$, i.e., a uniform allocation policy given as a function of the power coefficient of the common symbol. Therefore, the challenge with this strategy lies in determining the coefficient $\alpha_g^c$. In particular, our goal is to maximize the ergodic sum-rate, which can be formulated as (19). However, due to the dynamic polarization multiplexing, the coupled SINRs in (10) and (11), and the coupled coefficients $\alpha_g^c$ and $\alpha_{gu}^p$, a closed-form optimal solution to (19) cannot be obtained. The desired coefficient could also be determined through an exhaustive search; however, brute-force strategies can be computationally expensive, which is not ideal for real-time communication. DNNs, on the other hand, offer a short run-time once trained. With this motivation, we propose a DNN framework for approximating the optimal power coefficient. To this end, for each spatial group, we train a DNN model with one input layer, one output layer, and $D - 2$ hidden layers, with the $d$th layer having $V_d$ neurons.
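For reference, the exhaustive-search baseline mentioned above can be sketched as a grid search over $\alpha_g^c \in [0, 1]$ combined with the uniform private split $\alpha_{gu}^p = (1 - \alpha_g^c)/N_g$. The sum-rate callable, its concave toy shape, and the grid resolution are assumptions for illustration; in practice each evaluation would be a costly Monte Carlo estimate of the ergodic sum-rate, which is what motivates the DNN.

```python
def private_coefficients(alpha_c, n_users):
    """Uniform private allocation alpha_p = (1 - alpha_c) / N_g for every user."""
    return [(1.0 - alpha_c) / n_users] * n_users


def exhaustive_alpha_c(sum_rate, n_grid=101):
    """Brute-force baseline: evaluate sum_rate on a uniform grid over [0, 1]."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    return max(grid, key=sum_rate)


def toy_sum_rate(alpha_c):
    """Hypothetical concave surrogate peaking at alpha_c = 0.3 (illustration only)."""
    return 10.0 - (alpha_c - 0.3) ** 2


best = exhaustive_alpha_c(toy_sum_rate)
print(best, private_coefficients(best, 3))
```

The cost of this baseline grows linearly with the grid size, and every grid point requires one sum-rate evaluation.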
Specifically, to train the DNN for power allocation, we use the dataset $\mathcal{M}_g = \{(\mathbf{z}_{g,i}, \alpha_{g,i}^{c*}) \mid \mathbf{z}_{g,i} \in \mathcal{Z}_g, \alpha_{g,i}^{c*} \in \mathbb{R}, i = 1, \ldots, |\mathcal{M}_g|\}$, where $\alpha_{g,i}^{c*}$ is the target power coefficient that maximizes the ergodic sum-rate for the $i$th input vector $\mathbf{z}_{g,i} \in \mathcal{Z}_g \subseteq \mathbb{R}^{b_{\mathcal{Z}_g}}$, which is defined by $\mathbf{z}_{g,i} = [M, \bar{M}, N_g, \chi, \tau, [\zeta_{g1}, \ldots, \zeta_{gN_g}], \rho]^T$. Both $\mathbf{z}_{g,i}$ and $\alpha_{g,i}^{c*}$ are obtained from the existing datasets $\mathcal{D}_g$. That is, for each sample $\mathbf{z}_{g,i}$, we select the power coefficient that maximizes the corresponding sum-rate in $\mathcal{D}_g$ and create the new dataset $\mathcal{M}_g$. As in Section III-B, the vectors $\mathbf{z}_{g,i}$ are scaled to the unit range. As a result, the function that predicts the optimal power allocation coefficient can be written as (20), where $r_d(\cdot)$ is defined as in (17), with ReLU activation functions in the hidden layers and a linear function in the output layer. We also adopt the MSE loss function in this model. Moreover, to satisfy the constraint in (19), the power coefficient is computed by $\alpha_g^{c*} = \min\{1, \Lambda_g(\mathbf{z}_{g,i})\}$.
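The construction of $\mathcal{M}_g$ from $\mathcal{D}_g$ and the final clipping step can be sketched as follows. Each record is assumed to hold a parameter tuple $\mathbf{z}$ (without power coefficients), a candidate $\alpha_g^c$, and the corresponding ergodic sum-rate; keys, values, and function names are toy placeholders. Following the text, only the upper bound $\min\{1, \cdot\}$ is applied to the DNN output.

```python
def build_power_dataset(records):
    """records: (z, alpha_c, ergodic_sum_rate) triples taken from D_g, where z is a
    hashable tuple of system parameters. Returns (z, alpha_c*) pairs, keeping the
    alpha_c with the largest sum-rate for each z."""
    best = {}
    for z, alpha_c, rate in records:
        if z not in best or rate > best[z][1]:
            best[z] = (alpha_c, rate)
    return [(z, alpha) for z, (alpha, _) in best.items()]


def clip_alpha_c(raw_prediction):
    """Enforce alpha_c <= 1 on the DNN output, i.e., min{1, Lambda_g(z)}."""
    return min(1.0, raw_prediction)


# Toy records: two parameter settings, three candidate alpha_c values each.
records = [
    (("z1",), 0.2, 9.1), (("z1",), 0.5, 9.8), (("z1",), 0.8, 9.4),
    (("z2",), 0.2, 7.0), (("z2",), 0.5, 6.1), (("z2",), 0.8, 5.2),
]
print(build_power_dataset(records))
print(clip_alpha_c(1.07), clip_alpha_c(0.42))
```

This prints [(('z1',), 0.5), (('z2',), 0.2)] followed by 1.0 0.42: each parameter setting keeps its best coefficient, and an out-of-range prediction is capped at 1.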

D. Complexity Remarks
Note that we implement one DNN model for each spatial group. The main implication of this choice is that the covariance matrices, which have large dimensions, are not required for designing and training the DNNs. Consequently, we can considerably simplify the model architecture and decrease the training complexity. In practice, DNNs can be trained very efficiently on specialized hardware. Therefore, the complexity of the testing phase is more relevant for the practical operation of the proposed scheme. Specifically, the computational complexity of one forward pass can be expressed in terms of floating-point operations [7]. Under this analysis, the DNN for sum-rate prediction has a complexity of $\mathcal{O}(\sum_{l=1}^{L} Q_{l-1} Q_l)$. For the DNN-aided power allocation strategy, on the other hand, the complexity associated with generating the dataset of optimal power coefficients must also be considered. More specifically, we need to perform an exhaustive search on the dataset $\mathcal{D}_g$ to construct $\mathcal{M}_g$, which imposes additional complexity. Nevertheless, this search needs to be executed only once before training; thus, it is a computationally affordable task. After the dataset $\mathcal{M}_g$ is generated and the DNN is trained, the desired power coefficient is computed with a complexity of $\mathcal{O}(\sum_{d=1}^{D} V_{d-1} V_d)$.
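To make the forward-pass cost concrete, the following snippet counts the multiply-accumulate operations $\sum Q_{l-1} Q_l$ over adjacent layer pairs for the layer widths reported in Section III-E. Bias additions and activations, which add only $\mathcal{O}(\sum_l Q_l)$ operations, are ignored; counting multiplications and additions separately would roughly double the totals.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulate count of one forward pass: sum of Q_{l-1} * Q_l."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))


# Layer widths from Section III-E (input, hidden layers, scalar output).
sum_rate_dnn = [13, 128, 256, 256, 256, 128, 1]  # b_X = 13, five hidden layers
power_dnn = [9, 128, 256, 256, 128, 1]           # b_Z = 9, four hidden layers
print(dense_macs(sum_rate_dnn), dense_macs(power_dnn))
```

This gives 198,400 and 132,352 multiply-accumulates per prediction for the sum-rate and power-allocation DNNs, respectively.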

E. Datasets Generation and DNN Implementation
Due to the unknown PDFs of $\gamma_{gn}^p$ and $\min_{\forall l}\{\gamma_{gl}^c\}$, we cannot generate the datasets $\mathcal{D}_g$ and $\mathcal{M}_g$, $g \in \{1, \ldots, G\}$, with the expression in (14). Instead, we used Monte Carlo simulations to obtain the required data samples, implementing the proposed MIMO-RSMA network in the high-performance Julia programming language [8]. For generating the training data, we set the number of groups to $G = 3$ and the number of users within each group to $N_1 = \cdots = N_G = 3$. Consequently, the input vectors $\mathbf{x}_{g,i}$ and $\mathbf{z}_{g,i}$ had $b_{\mathcal{X}_g} = 13$ and $b_{\mathcal{Z}_g} = 9$ features, respectively. Then, we extensively varied the system parameters and generated, for each group, a total of 6,561,000 samples for $\mathcal{D}_g$ and 72,900 samples for $\mathcal{M}_g$, where each sample was obtained by averaging over $2 \times 10^3$ random channel realizations. Moreover, 90% of the samples were used for training and 10% for testing.
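The feature counts and the 90%/10% split above can be cross-checked with a short sketch. The count $b_{\mathcal{Z}_g} = N_g + 6$ follows from the definition of $\mathbf{z}_{g,i}$ in Section III-C, and the split is a plain random partition; the helper names are hypothetical, and the list of integers stands in for the 72,900 samples of $\mathcal{M}_g$.

```python
import random


def feature_counts(n_users):
    """b_X = 2 N_g + 7 (Section III-B); b_Z = N_g + 6 counts the entries of
    z = [M, M_bar, N_g, chi, tau, zeta_1..zeta_Ng, rho]."""
    return 2 * n_users + 7, n_users + 6


def train_test_split(samples, train_fraction, rng):
    """Random split into disjoint training and test subsets."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    cut = round(train_fraction * len(samples))
    return [samples[i] for i in order[:cut]], [samples[i] for i in order[cut:]]


print(feature_counts(3))
train, test = train_test_split(list(range(72_900)), 0.9, random.Random(0))
print(len(train), len(test))
```

For $N_g = 3$ this reproduces $(13, 9)$ features and splits the 72,900 samples into 65,610 training and 7,290 test samples.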
The DNN models were implemented and trained in Python 3.9.11 using TensorFlow Metal 2.8.0. The DNN for sum-rate prediction was implemented with five hidden layers, with the first and last hidden layers comprising 128 neurons and the remaining layers comprising 256 neurons each. In turn, the DNN for power allocation was implemented with four hidden layers, with the first and last hidden layers also containing 128 neurons and the remaining layers containing 256 neurons. To train the DNNs, we adopted the adaptive moment estimation (ADAM) optimizer. Moreover, the batch sizes for sum-rate prediction and power allocation were set to 1000 and 100 samples, respectively, and both DNNs were trained for 80 epochs. Fig. 2 presents the training convergence of the two DNNs in terms of root-mean-squared error (RMSE) for different learning rates. As can be seen, the learning rate of 0.001 achieves the lowest RMSE. Thus, this value is adopted in the next section.

IV. SIMULATION RESULTS
The DNNs for sum-rate prediction and power allocation are evaluated in this section. The superior performance of the proposed dual-polarized MIMO-RSMA scheme is also demonstrated against conventional baseline systems, including single-polarized MIMO-RSMA, MIMO-TDMA, MIMO-SDMA, MIMO-NOMA, and the dual-polarized MIMO-NOMA approach proposed in [9]. In all systems, we configure the BS with $M = 64$ transmit antennas and consider users distributed within $G = 3$ spatial groups. Without loss of generality, we present results for the first group, which contains $N = 3$ users, is located at an azimuth angle of 20°, and has an angular spread of 11°. Moreover, the distances from the BS to users 1, 2, and 3 are set to $d_1 = 115$ m, $d_2 = 100$ m, and $d_3 = 85$ m, respectively. Under this setting, the large-scale fading coefficient of each user is modeled by $\zeta_n = \delta d_n^{-\eta}$, where $\delta$ is an array gain parameter set to 40 dB, and $\eta$ is the path-loss exponent set to 2.7. Furthermore, we set $\bar{M} = 6$ and the total transmit power to $P = 1$ W. Unless otherwise stated, when fixed power allocation is employed, we set $\alpha^c = 0.5$ and $\alpha_n^p = (1 - \alpha^c)/N \approx 0.17$ for the MIMO-RSMA schemes, whereas, for the MIMO-NOMA counterpart, we set the coefficients of users 1, 2, and 3 to 5/8, 2/8, and 1/8, respectively. In turn, uniform power allocation is employed in the MIMO-SDMA systems, and in MIMO-TDMA, the full transmit power $P$ is used in each time slot.
Fig. 3 validates the DNN framework for ergodic sum-rate prediction under fixed power allocation. As can be seen, the predicted curves follow the simulated ones with high accuracy in all considered cases. This figure also provides first insights into the performance behavior of the proposed dual-polarized MIMO-RSMA scheme. Fig. 3(a), for instance, shows that the power coefficient of the common message plays an important role in the ergodic sum-rate performance and that the optimal power coefficient changes with the SNR. Fig. 3(b) reveals how imperfect CSI impacts the sum-rate of the proposed strategy. As can be seen, even though the system performance deteriorates as $\tau$ increases, a sum-rate of more than 30 bits per channel use (bpcu) is still achieved when $\tau = 0.4$, which confirms robustness to imperfect CSI. On the other hand, as can be seen in Fig. 3(c), the dual-polarized MIMO-RSMA scheme is severely impacted by cross-polar interference when fixed power allocation is employed.
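As a quick numerical check of the simulation setup, the snippet below evaluates $\zeta_n = \delta d_n^{-\eta}$ with $\delta = 40$ dB (converted to a linear power ratio, an assumption about how the dB value is applied) and $\eta = 2.7$ for the three user distances, together with the fixed RSMA coefficients. Helper names are hypothetical.

```python
def db_to_linear(value_db):
    """Convert a power ratio in dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def large_scale_fading(distances_m, delta_db=40.0, eta=2.7):
    """zeta_n = delta * d_n^(-eta), with delta given in dB."""
    delta = db_to_linear(delta_db)
    return [delta * d ** (-eta) for d in distances_m]


zeta = large_scale_fading([115.0, 100.0, 85.0])  # users 1, 2, 3
alpha_c = 0.5
alpha_p = (1.0 - alpha_c) / 3  # uniform private split for N = 3 users
print([f"{z:.4f}" for z in zeta], round(alpha_p, 2))
```

For instance, user 2 at 100 m obtains $\zeta_2 = 10^4 \cdot 100^{-2.7} = 10^{-1.4} \approx 0.040$, and $\alpha_n^p = 0.5/3 \approx 0.17$.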
The ergodic sum-rates achieved with the dual-polarized MIMO-RSMA scheme and with the conventional systems are compared in Fig. 4. As shown in Fig. 4(a), when $\chi = 0$, the dual-polarized MIMO-RSMA systems always achieve the best performance. However, with $\chi = 0.2$, the dual-polarized MIMO-RSMA scheme with fixed power allocation becomes less spectrally efficient than the single-polarized MIMO-RSMA and MIMO-SDMA counterparts. In contrast, by adaptively splitting the transmit power between the private and common streams, the proposed dual-polarized approach with DNN-aided power allocation outperforms all conventional baseline schemes despite the strong cross-polar interference. The effectiveness of our proposal is further corroborated in Fig. 4(b), where we plot the ergodic sum-rates versus the CSI quality factor $\tau$. As can be seen, the dual-polarized MIMO-RSMA scheme with DNN-aided power allocation achieves the highest sum-rates for all considered values of $\tau$ and $\chi$. This robustness stems from the DNN mitigating the effects of both imperfect CSI and cross-polar interference by adaptively adjusting $\alpha_g^c$. For instance, to tackle cross-polar interference when $\chi$ becomes excessively high, the DNN assigns power to only one polarization, and it allocates more power to the common stream when the CSI quality degrades. Last, Fig. 4(c) compares the sum-rate performance of the dual-polarized MIMO-RSMA and of SIC-based schemes under the effects of SIC error propagation. The sum-rates of the schemes that rely on SIC degrade strongly as the SIC error factor increases, whereas the dual-polarized MIMO-RSMA scheme, which does not employ SIC, is unaffected.

V. CONCLUSION
We have proposed a novel low-complexity dual-polarized massive MIMO-RSMA scheme, which is free from the interference issues of SIC and robust to imperfect CSI. We have also developed DNN frameworks for ergodic sum-rate prediction and efficient power allocation, which ensured high performance even under strong cross-polar interference.