A Versatile Low-Complexity Feedback Scheme for FDD Systems via Generative Modeling

We propose a versatile feedback scheme for both single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. In particular, we propose utilizing a Gaussian mixture model (GMM) with a reduced number of parameters for codebook construction, feedback encoding, and precoder design. The GMM is fitted offline at the base station (BS) to uplink training samples to approximate the channel distribution of all possible mobile terminals (MTs) within the BS cell. Subsequently, a codebook is constructed, with each element based on one GMM component. Extracting directional information from the codebook or exploiting the GMM's sample generation ability facilitates joint precoder design for a multi-user MIMO system using state-of-the-art precoding algorithms. After the GMM is offloaded to the MTs, they can easily determine their feedback by selecting the index of the GMM component with the highest responsibility for their received pilot signal. This strategy exhibits low complexity and supports parallelization. Simulations demonstrate that, in terms of spectral efficiency or sum-rate, the proposed approach outperforms conventional methods, which either estimate the channel and utilize a Lloyd codebook or use a deep neural network to determine the feedback. The performance gains can be exploited to deploy systems with fewer pilots or feedback bits.


I. INTRODUCTION
In multiple-input multiple-output (MIMO) communication systems, channel state information (CSI) has to be acquired at the base station (BS) in regular time intervals. In frequency division duplex (FDD) mode, the BS and the mobile terminal (MT) transmit in the same time slot but at different frequencies. This breaks the reciprocity between the instantaneous uplink (UL) CSI and downlink (DL) CSI. Accordingly, acquiring DL CSI in FDD operation is difficult [2]. The most common solution is to avoid direct feedback of the CSI by using only a small number of feedback bits, i.e., limited feedback systems are considered [3]-[5].
In conventional approaches, the DL CSI is first estimated at the MT, and the feedback information is determined subsequently. For instance, the feedback can be used as an index for a predefined codebook of precoders or can represent quantized information (channel directions) about the DL CSI [3]-[5]. Thus, conventional methods heavily rely on accurate CSI estimation. To obtain accurate DL CSI at the MTs, many pilots typically need to be sent from the BS to the MTs. However, in massive MIMO FDD systems with typically many antenna elements at the BS, the pilot overhead to fully illuminate the channel is unaffordable [6]. Therefore, algorithms that yield relatively good system performance with a low pilot overhead, i.e., in cases where the number of pilots is less than the number of transmit antennas, are of great interest. It is therefore desirable to circumvent explicit DL CSI estimation at the MT and instead infer the feedback information directly from the pilot observations.
In this context, in recent work [7]-[13], a variety of end-to-end deep neural network (DNN) techniques have been proposed, which process the pilot observations through neural network modules into feedback information. In particular, in [7], a DNN is employed to determine the feedback information for a single-stream transmission in a single-user MIMO system, and it outperforms conventional approaches. The work in [8] also uses a DNN for feedback encoding but supports transmissions over multiple streams with a single DNN for all signal-to-noise ratio (SNR) values. A similar approach was used in [9] for the multi-user case with single-antenna MTs. In [10], a variational autoencoder was used to provide the BS with statistical information of each MT in combination with a stochastic iterative precoding algorithm to jointly design the precoders, again for the multi-user case with single-antenna MTs. More recently, it was investigated in [11] how existing feedback schemes can be leveraged in an end-to-end limited feedback approach for the multi-user single-antenna case. An extension to the multi-user MIMO (MU-MIMO) case, i.e., MTs with multiple antennas, was proposed in [12]. Therein, the authors considered the joint design of the feedback and the precoders but compared their approach exclusively to non-iterative precoding techniques. Another end-to-end DNN-based approach was proposed in [13], where submodules of the DNN were inspired by state-of-the-art iterative precoding algorithms.
Although the aforementioned end-to-end DNN approaches in [7]-[13] are optimized for a particular setting, they face some challenges which may hinder their application in practical systems. Specifically, the DNN approaches are inflexible and support only either the single- or the multi-user mode, i.e., different task-dependent DNNs are required for each mode. This includes the adaptation to different SNR values, numbers of pilots, and numbers of users, which usually requires additional and specifically trained networks. Moreover, the number of DNN parameters in [7]-[13] scales quadratically with the product of transmit and receive antennas, which increases the training time and makes convergence more difficult. In fact, in [12], [13], only relatively small antenna configurations are considered, which are not in accordance with trends towards massive MIMO. Even more disadvantageously, the amount of data that has to be offloaded to communicate the DNN parameters from the BS to the MTs increases drastically with the number of antennas, the supported SNR range, the transmission modes, and a varying number of pilots, resulting in unaffordable signaling overhead.
Machine learning (ML) techniques typically require a representative dataset of channels stemming from the BS cell for their training phase. In FDD mode, the MTs would have to collect large amounts of DL CSI and would either need to perform the training themselves or share the collected data with the BS. The corresponding computation and signaling overhead is generally unaffordable in practice. Recently, in [14], it has been shown that DL CSI training data can be replaced with UL CSI training data even for the design of DL functionalities. This completely eliminates the aforementioned overhead. The UL CSI can be acquired at the BS during the regular UL transmission. The observation in [14] was confirmed for various DL functionalities, e.g., in [8], [15]-[17]. Consequently, in this work, we also utilize the idea of centrally learning DL-related functionalities at the BS using UL training data.
Gaussian mixture models (GMMs) are widely adopted in the wireless communications literature. For example, in [18], [19], and [20], GMMs are used for predicting channel states, multipath clustering, and pilot optimization, respectively. In [21], a GMM is used to approximate the true but unknown channel probability density function (PDF) and a powerful channel estimator is derived. The strong performance is justified by the universal approximation ability of GMMs, cf. [22]. The primary motivations for leveraging GMMs in this work to propose a versatile low-complexity feedback scheme for point-to-point MIMO and MU-MIMO FDD systems, apart from their universal approximation property, are the following. On the one hand, GMMs comprise a discrete latent space, which enables the clustering of channels and makes the inference of the latent variable, given an observation, tractable. These properties are exploited to design a novel codebook and propose a limited feedback scheme. On the other hand, GMMs are generative models. Generative models are techniques that aim to learn the underlying distribution of a training dataset with the goal of enabling the generation of new samples that resemble the original distribution. We propose to utilize this sample generation ability in combination with the feedback information to enhance the precoder design via a stochastic algorithm. In recent years, other generative concepts such as generative adversarial networks [23] and variational autoencoders [24] have also gained a lot of attention. In the context of wireless communications, these generative models were utilized for, e.g., channel estimation [25], precoding [10], and as a channel modeling framework [26], [27].
The contributions of this work are summarized as follows:
1) The GMM can be centrally fitted at the BS using solely UL training data. We propose to cluster the training data according to the GMM components and design a codebook entry per GMM component, which yields a scenario-specific codebook and supports the single-user mode. The GMM is offloaded to the MTs upon entering the coverage area of the BS. As feedback information, we propose to use the index of the GMM component that yields the highest responsibility for the observed pilot signal of an MT. The responsibility evaluates the probability that the channel to a particular MT stems from the corresponding GMM component.
2) We further propose two approaches to support the multi-user mode. By extracting directional information as relevant features of the constructed codebook, jointly designing precoders for a MU-MIMO system with an arbitrary precoding algorithm, cf. [28]-[34], is possible. Alternatively, we propose to leverage the GMM's sample generation ability in order to provide statistical information of each MT to the BS and design the precoders using a state-of-the-art stochastic iterative precoding algorithm, e.g., [35], [36]. Thus, the proposed scheme makes it possible to influence the complexity of the precoder design at the BS depending on the selected transmission mode and the precoding algorithm.
3) The complexity of determining the feedback at the MT side by using the GMM does not scale with the number of transmit antennas, in contrast to conventional approaches, and even allows for parallelization. Due to the analytic representation of the GMM, the feedback scheme can be straightforwardly adapted to any SNR, pilot configuration, and number of users without retraining, which is in contrast to the end-to-end DNN-based approaches [12], [13]. Moreover, model-based insights can be leveraged to drastically decrease the training time and the offloading overhead, and to scale conveniently to larger antenna dimensions. Thus, the GMM-based feedback scheme is particularly beneficial for massive MIMO systems.
4) Despite exhibiting a lower complexity, the proposed GMM-based feedback scheme provides high robustness against CSI imperfections and outperforms conventional single- and multi-user precoding approaches, especially in settings with a low pilot overhead. With extensive simulations, we show that the performance gains achieved with our proposed scheme can be leveraged to deploy system setups with, e.g., a reduced number of pilots or a smaller number of feedback bits, as compared to classical approaches.
The paper is structured as follows. The system models are introduced in Section II. In Section III and Section IV, we discuss conventional methods and present the proposed approaches. In Section V, we discuss the versatility of the proposed scheme. In Section VI, channel estimators are discussed, and in Section VII a complexity analysis is provided. Simulation results are provided in Section VIII, and in Section IX conclusions are drawn.
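Notation: Matrices and vectors are denoted with bold uppercase and bold lowercase letters, respectively. The transpose and conjugate transpose of a matrix $\mathbf{A}$ are denoted by $\mathbf{A}^{\mathrm{T}}$ and $\mathbf{A}^{\mathrm{H}}$, respectively. The all-zeros vector and the identity matrix with appropriate dimensions are denoted by $\mathbf{0}$ and $\mathbf{I}$, respectively. The Euclidean norm of a vector $\mathbf{a} \in \mathbb{C}^{N}$ is denoted by $\|\mathbf{a}\|$. The cardinality of a set $\mathcal{V}$ is denoted by $|\mathcal{V}|$. A complex-valued normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{C}$ is denoted by $\mathcal{N}_{\mathbb{C}}(\boldsymbol{\mu}, \mathbf{C})$, and $\sim$ stands for "distributed as". The determinant and the trace of a matrix $\mathbf{A}$ are given by $\det(\mathbf{A})$ and $\operatorname{tr}(\mathbf{A})$, respectively. The vectorization (stacking columns) of a matrix $\mathbf{A} \in \mathbb{C}^{m \times N}$ is written as $\mathbf{a} = \operatorname{vec}(\mathbf{A}) \in \mathbb{C}^{mN}$, and the reverse operation is denoted by $\mathbf{A} = \operatorname{unvec}(\mathbf{a})$.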

A. Data Transmission - Point-to-Point MIMO System
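The DL received signal of a point-to-point MIMO system can be expressed as $\mathbf{y}' = \mathbf{H}\mathbf{x} + \mathbf{n}'$, where $\mathbf{y}' \in \mathbb{C}^{N_{\mathrm{rx}}}$ is the receive vector, $\mathbf{x} \in \mathbb{C}^{N_{\mathrm{tx}}}$ is the transmit vector sent over the MIMO channel $\mathbf{H} \in \mathbb{C}^{N_{\mathrm{rx}} \times N_{\mathrm{tx}}}$, and $\mathbf{n}' \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma_n^2 \mathbf{I}_{N_{\mathrm{rx}}})$ denotes the additive white Gaussian noise (AWGN). In this paper, we consider configurations with $N_{\mathrm{rx}} < N_{\mathrm{tx}}$. The BS is equipped with a uniform rectangular array (URA) and the MT is equipped with a uniform linear array (ULA). If perfect CSI is known to both the transmitter and the receiver, and assuming transmit data with a zero-mean Gaussian distribution, the capacity of the MIMO channel is
$$C = \max_{\mathbf{Q} \succeq \mathbf{0},\ \operatorname{tr}(\mathbf{Q}) \leq \rho} \log_2 \det\left(\mathbf{I} + \frac{1}{\sigma_n^2} \mathbf{H}\mathbf{Q}\mathbf{H}^{\mathrm{H}}\right),$$
e.g., [37, page 326], where $\mathbf{Q} \in \mathbb{C}^{N_{\mathrm{tx}} \times N_{\mathrm{tx}}}$ is the transmit covariance matrix, $\rho$ is the transmit power, and the transmit vector is given by $\mathbf{x} = \mathbf{Q}^{1/2}\mathbf{s}$ with $\mathrm{E}[\mathbf{s}\mathbf{s}^{\mathrm{H}}] = \mathbf{I}_{N_{\mathrm{tx}}}$ [3]. The optimal transmit covariance matrix $\mathbf{Q}^\star$ achieves the capacity and can be obtained by decomposing the channel into $N_{\mathrm{rx}}$ parallel streams and employing water-filling [38]. Since channel reciprocity does not hold in FDD systems, only the MT could compute the optimal transmit covariance matrix $\mathbf{Q}^\star$, provided that the DL CSI is estimated perfectly. This makes some form of feedback from the MT to the BS necessary. Ideally, the MT would feed the complete DL CSI back to the BS, which is infeasible in general. Instead, a small number of $B$ bits is fed back to the BS. The $B$ feedback bits are commonly used for encoding an index $k^\star$ that specifies an element of a codebook of $2^B$ pre-computed transmit covariance matrices. Finally, the BS employs the transmit covariance matrix $\mathbf{Q}_{k^\star}$ for data transmission [3].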
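The water-filling step mentioned above can be illustrated with a minimal NumPy sketch. It assumes the standard textbook formulation (per-stream gains equal to the squared singular values of the channel divided by the noise variance); the function names `water_filling` and `optimal_covariance` are hypothetical and not taken from the paper.

```python
import numpy as np

def water_filling(gains, rho):
    """Powers p_i >= 0 with sum(p) = rho that maximize sum(log2(1 + gains_i * p_i))."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]            # strongest stream first
    inv = 1.0 / gains[order]
    powers = np.zeros_like(gains)
    for n_active in range(len(gains), 0, -1):
        level = (rho + inv[:n_active].sum()) / n_active   # candidate water level
        p = level - inv[:n_active]
        if p[-1] >= 0:                         # weakest active stream is still non-negative
            powers[order[:n_active]] = p
            break
    return powers

def optimal_covariance(H, rho, sigma2):
    """Q = V diag(p) V^H from the SVD of H with water-filling power loading."""
    _, s, Vh = np.linalg.svd(H, full_matrices=False)
    keep = s > 1e-12                           # drop numerically zero streams
    V = Vh.conj().T[:, keep]
    p = water_filling(s[keep] ** 2 / sigma2, rho)
    return (V * p) @ V.conj().T
```

For example, `water_filling([4.0, 1.0], 1.0)` loads both streams, whereas a smaller power budget switches the weaker stream off.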

B. Data Transmission - Multi-user MIMO System
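We consider a single-cell MU-MIMO system in the DL, where linear precoding is adopted. The system consists of a BS equipped with $N_{\mathrm{tx}}$ transmit antennas and $J$ MTs. Each MT $j \in \mathcal{J} = \{1, 2, \dots, J\}$ is equipped with $N_{\mathrm{rx}}$ antennas. Let the transmit signal vector corresponding to MT $j$ be $\mathbf{s}_j \in \mathbb{C}^{d_j}$, where $d_j$ is the number of data streams. We assume that $\mathrm{E}[\mathbf{s}_j] = \mathbf{0}$ and $\mathrm{E}[\mathbf{s}_j \mathbf{s}_j^{\mathrm{H}}] = \mathbf{I}_{d_j}$. Furthermore, the symbols sent to each MT are assumed to be independent of each other. The overall precoded DL data vector is $\mathbf{x} = \sum_{j=1}^{J} \mathbf{M}_j \mathbf{s}_j$, where $\mathbf{M}_j \in \mathbb{C}^{N_{\mathrm{tx}} \times d_j}$ is the precoding matrix applied at the BS to process the transmit signal of MT $j$ (without loss of generality, we set $d_j = N_{\mathrm{rx}}$ for all $j$ in the following, unless stated otherwise). The precoders satisfy the transmit power constraint $\operatorname{tr}(\sum_{j=1}^{J} \mathbf{M}_j \mathbf{M}_j^{\mathrm{H}}) = \rho$. Thus, the received signal at MT $j$ is
$$\mathbf{y}'_j = \mathbf{H}_j \sum_{i=1}^{J} \mathbf{M}_i \mathbf{s}_i + \mathbf{n}'_j,$$
where $\mathbf{H}_j \in \mathbb{C}^{N_{\mathrm{rx}} \times N_{\mathrm{tx}}}$ is the MIMO channel from the BS to MT $j$ and $\mathbf{n}'_j \in \mathbb{C}^{N_{\mathrm{rx}}}$ with $\mathbf{n}'_j \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma_j^2 \mathbf{I}_{N_{\mathrm{rx}}})$ denotes the AWGN of MT $j$. The instantaneous achievable rate of MT $j$ can be written as
$$R_j^{\mathrm{inst}} = \log_2 \det\left(\mathbf{I} + \Big(\sigma_j^2 \mathbf{I} + \sum_{i \neq j} \mathbf{H}_j \mathbf{M}_i \mathbf{M}_i^{\mathrm{H}} \mathbf{H}_j^{\mathrm{H}}\Big)^{-1} \mathbf{H}_j \mathbf{M}_j \mathbf{M}_j^{\mathrm{H}} \mathbf{H}_j^{\mathrm{H}}\right).$$
If the BS had access to the perfect DL CSI of each of the MTs, it could employ common non-iterative algorithms such as block diagonalization (BD) [28], regularized block diagonalization (RBD) [29], or regularized channel inversion (RCI) [30], [31], or iterative algorithms such as the iterative weighted minimum mean square error (WMMSE) algorithm [32]-[34], in order to jointly design the precoders $\mathbf{M}_j$ of all MTs, $j \in \mathcal{J}$. However, in the considered limited feedback setting, each MT $j$ is assumed to encode an index $k_j^\star$ with $B$ bits, representing quantized information regarding the DL CSI, and to feed this information back to the BS.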
In the seminal work [5], such a limited feedback system was investigated, where quantized information regarding the CSI of each MT is fed back to the BS after determining the best entry of a randomly generated MT-specific codebook. The random quantization codebook of each MT is assumed to be perfectly known to the BS [5]. Then, BD was employed at the BS in order to jointly design the precoders. Multi-user systems with $J = \frac{N_{\mathrm{tx}}}{N_{\mathrm{rx}}} \geq 2$ were considered, i.e., the total number of receive antennas $J N_{\mathrm{rx}}$ equals the number of transmit antennas $N_{\mathrm{tx}}$, so that no subset of MTs has to be selected for transmission [5]. We will also consider such setups in our simulations.

C. Pilot Transmission Phase
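In the pilot transmission phase, the DL received signal of each MT $j \in \mathcal{J}$ is
$$\mathbf{Y}_j = \mathbf{H}_j \mathbf{P} + \mathbf{N}_j \in \mathbb{C}^{N_{\mathrm{rx}} \times n_p}, \tag{3}$$
where $n_p$ is the number of transmitted pilots and $\mathbf{N}_j = [\mathbf{n}'_{j,1}, \dots, \mathbf{n}'_{j,n_p}] \in \mathbb{C}^{N_{\mathrm{rx}} \times n_p}$ collects the AWGN vectors $\mathbf{n}'_{j,p} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma_n^2 \mathbf{I}_{N_{\mathrm{rx}}})$. Since we employ a URA at the BS, the pilot matrix $\mathbf{P} \in \mathbb{C}^{N_{\mathrm{tx}} \times n_p}$ is a 2D-DFT (sub)matrix, constructed by the Kronecker product of two discrete Fourier transform (DFT) matrices, $\mathbf{P} = \mathbf{P}_h \otimes \mathbf{P}_v$, see, e.g., [39]. Each column $\mathbf{p}_p$ of $\mathbf{P}$, for $p \in \{1, 2, \dots, n_p\}$, is normalized such that $\|\mathbf{p}_p\|^2 = \rho$ to fulfill the power constraint. In this work, we consider $n_p \leq N_{\mathrm{tx}}$, i.e., the number of pilots is less than or equal to the number of transmit antennas. For what follows, it is convenient to vectorize (3), yielding $\mathbf{y}_j = \mathbf{A}\mathbf{h}_j + \mathbf{n}_j$, with the definitions $\mathbf{h}_j = \operatorname{vec}(\mathbf{H}_j)$, $\mathbf{y}_j = \operatorname{vec}(\mathbf{Y}_j)$, $\mathbf{n}_j = \operatorname{vec}(\mathbf{N}_j)$, $\mathbf{A} = \mathbf{P}^{\mathrm{T}} \otimes \mathbf{I}_{N_{\mathrm{rx}}}$, and $\mathbf{n}_j \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \sigma_n^2 \mathbf{I}_{N_{\mathrm{rx}} n_p}$. In case of a point-to-point MIMO system, we drop the index $j$ for notational convenience and end up with
$$\mathbf{y} = \mathbf{A}\mathbf{h} + \mathbf{n}. \tag{4}$$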
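The vectorized pilot model can be checked numerically. The following NumPy sketch builds a 2D-DFT pilot matrix and verifies that $\operatorname{vec}(\mathbf{H}\mathbf{P}) = (\mathbf{P}^{\mathrm{T}} \otimes \mathbf{I})\operatorname{vec}(\mathbf{H})$. Which $n_p$ columns of the 2D-DFT matrix are used as pilots is not specified in the text; taking the first $n_p$ columns is an assumption made here for illustration, and all function names are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized n x n DFT matrix."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def pilot_matrix(n_h, n_v, n_p, rho):
    """First n_p columns (assumed choice) of P_h kron P_v, each scaled to squared norm rho."""
    P = np.kron(dft_matrix(n_h), dft_matrix(n_v))[:, :n_p]
    return np.sqrt(rho) * P / np.linalg.norm(P, axis=0, keepdims=True)

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

def unvec(x, n_rows):
    """Inverse of vec for a matrix with n_rows rows."""
    return x.reshape(n_rows, -1, order="F")

def observation_matrix(P, n_rx):
    """A = P^T kron I_{n_rx}, so that vec(H P) = A vec(H)."""
    return np.kron(P.T, np.eye(n_rx))
```

With `P = pilot_matrix(4, 2, 4, rho)` and a channel `H` of shape `(2, 8)`, the noiseless observation `vec(H @ P)` coincides with `observation_matrix(P, 2) @ vec(H)`.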

D. Channel Model and Data Generation
The QuaDRiGa channel simulator [40], [41] is used to generate CSI for the UL and DL domains in an urban macrocell (UMa) scenario. The carrier frequencies are 2.53 GHz for the UL and 2.73 GHz for the DL, such that there is a frequency gap of 200 MHz. The BS uses a URA with "3GPP-3D" antennas and the MTs use ULAs with "omni-directional" antennas. The BS covers a 120° sector and is placed at a height of 25 m. The minimum and maximum distances between the MTs and the BS are 35 m and 500 m, respectively. In 80 % of the cases, the MTs are located indoors at different floor levels. The outdoor MTs have a height of 1.5 m. A QuaDRiGa MIMO channel is given by $\mathbf{H} = \sum_{\ell=1}^{L} \mathbf{G}_\ell \, e^{-2\pi \mathrm{j} f_c \tau_\ell}$, with $\ell$ being the path number, $L$ the number of multi-path components (MPCs), $f_c$ the carrier frequency, and $\tau_\ell$ the $\ell$th path delay. The number $L$ depends on whether there is line of sight (LOS), non-line of sight (NLOS), or outdoor-to-indoor (O2I) propagation: $L_{\mathrm{LOS}} = 37$, $L_{\mathrm{NLOS}} = 61$, or $L_{\mathrm{O2I}} = 37$ [41]. The coefficient matrix $\mathbf{G}_\ell$ consists of one complex entry for each antenna pair and comprises the attenuation of a path, the antenna radiation pattern weighting, and the polarization. The generated channels are post-processed to remove the path gain [41]. In the following, we denote by
$$\mathcal{H} = \{\mathbf{h}_m\}_{m=1}^{M} \tag{5}$$
the training dataset consisting of $M$ channels from the scenario described above.

A. Conventional Codebook Construction and Encoding Scheme
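In an offline training phase, one can construct a codebook $\mathcal{Q}$ with $K = 2^B$ elements. A standard codebook construction approach uses Lloyd's algorithm [4], [42]. Given a training dataset of channels $\mathcal{H}$, see (5), the iterative Lloyd clustering algorithm alternates between two stages until a convergence criterion is met. Note that we use the channel matrix $\mathbf{H}$ and its vectorized expression $\mathbf{h}$ interchangeably in the following for ease of notation. We write $\mathcal{Q}^{(i)} = \{\mathbf{Q}_k^{(i)}\}_{k=1}^{K}$ for the codebook in iteration $i$. The two stages in iteration $i$ are:
1) Partition the training data into $K$ clusters:
$$\mathcal{V}_k^{(i)} = \{\mathbf{H} \in \mathcal{H} \mid r(\mathbf{H}, \mathbf{Q}_k^{(i)}) \geq r(\mathbf{H}, \mathbf{Q}_j^{(i)}) \text{ for } k \neq j\}. \tag{6}$$
2) Update the codebook:
$$\mathbf{Q}_k^{(i+1)} = \arg\max_{\mathbf{Q} \succeq \mathbf{0}} \frac{1}{|\mathcal{V}_k^{(i)}|} \sum_{\mathbf{H} \in \mathcal{V}_k^{(i)}} r(\mathbf{H}, \mathbf{Q}) \quad \text{subject to } \operatorname{tr}(\mathbf{Q}) \leq \rho, \tag{7}$$
where, for a channel matrix $\mathbf{H}$ and a covariance matrix $\mathbf{Q}$,
$$r(\mathbf{H}, \mathbf{Q}) = \log_2 \det\left(\mathbf{I} + \frac{1}{\sigma_n^2} \mathbf{H}\mathbf{Q}\mathbf{H}^{\mathrm{H}}\right) \tag{8}$$
is the spectral efficiency. The optimization problem in stage 2) is solved via projected gradient ascent (PGA), cf. [8], [43].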
To initialize the algorithm, stage 1) is replaced with a random partition of $\mathcal{H}$ in the first iteration. Lau's heuristic [4]: In order to avoid solving the costly optimization problem in stage 2) in every iteration, a heuristic for the codebook update is given in [4]: A representative matrix is computed for every cluster $\mathcal{V}_k^{(i)}$, and then these representative matrices are decomposed into $N_{\mathrm{rx}}$ parallel streams and water-filling is employed, yielding the updated codebook entries $\mathbf{Q}_k^{(i+1)}$. However, as attested by the simulation results in [1], the heuristic approach leads to a performance loss as compared to the PGA approach. Thus, we will restrict our analysis to the PGA approach in this work.
In the online phase, following the pilot transmission phase, the MT is assumed to estimate the DL channel $\hat{\mathbf{H}}$ and then uses it to determine the best codebook entry $\mathbf{Q}_{k^\star}$ of the commonly shared codebook $\mathcal{Q}$ via:
$$k^\star = \arg\max_{k \in \{1, \dots, K\}} r(\hat{\mathbf{H}}, \mathbf{Q}_k). \tag{9}$$
The feedback consists of the index $k^\star$ encoded by $B$ bits, and the BS employs the transmit covariance matrix $\mathbf{Q}_{k^\star}$ for data transmission.
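The conventional index selection amounts to an exhaustive search over the codebook using the estimated channel. A minimal NumPy sketch, assuming the spectral efficiency is the usual log-det expression and the codebook is a list of Hermitian positive semidefinite matrices (function names are hypothetical):

```python
import numpy as np

def spectral_efficiency(H, Q, sigma2):
    """log2 det(I + H Q H^H / sigma2) for a channel H and a transmit covariance Q."""
    n_rx = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n_rx) + H @ Q @ H.conj().T / sigma2)
    return logdet / np.log(2)

def select_codebook_index(H_est, codebook, sigma2):
    """Zero-based index of the codebook entry with the highest spectral efficiency."""
    rates = [spectral_efficiency(H_est, Q, sigma2) for Q in codebook]
    return int(np.argmax(rates))
```

Note that every call evaluates one determinant per codebook entry, which is the cost the responsibility-based feedback avoids.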

B. Proposed Codebook Construction and Encoding Scheme
The channel characteristics of the whole propagation environment of a BS cell can be described by means of a PDF $f_{\mathbf{h}}$. This PDF $f_{\mathbf{h}}$ describes the stochastic nature of all channels in the whole coverage area of the BS. The channel of any MT within the BS cell is a realization of a random variable with PDF $f_{\mathbf{h}}$. The main problem is that this PDF is typically not available analytically. In this setting, ML approaches play an increasingly important role. They aim to implicitly learn the underlying PDF from representative data samples stemming from the BS cell, cf. [8], [11]. In contrast, motivated by the universal approximation properties of GMMs [22], we fit a GMM $f_{\mathbf{h}}^{(K)}$ with $K$ components in order to approximate the unknown channel PDF $f_{\mathbf{h}}$ in an analytic form, similar to [21], [44]. In this work, the training dataset stems from a stochastic-geometric channel simulator [40], similarly as in [11], [21]. Alternatively, training data can be acquired, for instance, from a measurement campaign [44] or by using ray-tracing software [16]. In [45], it was shown that a GMM can even be learned from imperfect data. The analysis with these different training data sources and data imperfections is out of the scope of this work.
A GMM is a PDF of the form [46, Subsection 2.3.9]
$$f_{\mathbf{h}}^{(K)}(\mathbf{h}) = \sum_{k=1}^{K} p(k)\, \mathcal{N}_{\mathbb{C}}(\mathbf{h}; \boldsymbol{\mu}_k, \mathbf{C}_k), \tag{10}$$
where every summand is one of its $K$ components. Maximum likelihood estimates of the parameters of a GMM, viz., the means $\boldsymbol{\mu}_k$, the covariances $\mathbf{C}_k$, and the mixing coefficients $p(k)$, can be computed using a training dataset $\mathcal{H}$, see (5), and an expectation maximization (EM) algorithm, see [46, Subsection 9.2.2], which can be summarized with the following four steps: i) Initialize the parameters of the GMM and calculate the initial value of the log-likelihood; ii) E-step: Determine the responsibilities, which evaluate the probability that a given data point belongs to (or is explained by) a particular component of the GMM; iii) M-step: Update the parameters using the current responsibilities; iv) Evaluate the log-likelihood with the updated parameters and repeat the E-step and M-step until the convergence of the log-likelihood. After a GMM is fitted, we can determine the likelihood that a particular channel $\mathbf{h}$ stems from one of the components by evaluating the responsibilities [46, Section 9.2]:
$$p(k \mid \mathbf{h}) = \frac{p(k)\, \mathcal{N}_{\mathbb{C}}(\mathbf{h}; \boldsymbol{\mu}_k, \mathbf{C}_k)}{\sum_{i=1}^{K} p(i)\, \mathcal{N}_{\mathbb{C}}(\mathbf{h}; \boldsymbol{\mu}_i, \mathbf{C}_i)}. \tag{11}$$
Due to the joint Gaussianity of each GMM component and the AWGN, the approximate PDF of the observations is straightforwardly computed using the GMM from (10) as
$$f_{\mathbf{y}}^{(K)}(\mathbf{y}) = \sum_{k=1}^{K} p(k)\, \mathcal{N}_{\mathbb{C}}(\mathbf{y}; \mathbf{A}\boldsymbol{\mu}_k, \mathbf{A}\mathbf{C}_k\mathbf{A}^{\mathrm{H}} + \boldsymbol{\Sigma}), \tag{12}$$
which is also a GMM (GMM of the observations). Since GMMs allow the responsibilities to be calculated by evaluating Gaussian likelihoods, we can compute:
$$p(k \mid \mathbf{y}) = \frac{p(k)\, \mathcal{N}_{\mathbb{C}}(\mathbf{y}; \mathbf{A}\boldsymbol{\mu}_k, \mathbf{A}\mathbf{C}_k\mathbf{A}^{\mathrm{H}} + \boldsymbol{\Sigma})}{\sum_{i=1}^{K} p(i)\, \mathcal{N}_{\mathbb{C}}(\mathbf{y}; \mathbf{A}\boldsymbol{\mu}_i, \mathbf{A}\mathbf{C}_i\mathbf{A}^{\mathrm{H}} + \boldsymbol{\Sigma})}. \tag{13}$$
The idea of the proposed method is to compute a codebook transmit covariance matrix $\mathbf{Q}_k$ for every component of the GMM and to use the responsibilities $p(k \mid \mathbf{y})$ to determine the feedback index. In detail, in an offline training phase, we take $K = 2^B$ as the number of GMM components, use a training dataset of channels $\mathcal{H}$ to fit a $K$-component GMM $f_{\mathbf{h}}^{(K)}$, and compute a codebook $\mathcal{Q} = \{\mathbf{Q}_k\}_{k=1}^{K}$ of transmit covariance matrices (one matrix for every GMM component) exclusively at the BS. We explain the codebook construction in a separate paragraph below. During the online phase, when the objective is to determine a feedback index, we bypass explicit channel estimation and directly determine a feedback index using the responsibilities computed via $\mathbf{y}$:
$$k^\star = \arg\max_{k} \, p(k \mid \mathbf{y}), \tag{14}$$
i.e., the index of the component with the highest responsibility for the observed pilot signal of an MT serves as the feedback information. Thus, we find the feedback index $k^\star$ without requiring (estimated) CSI. Note that we thereby also avoid the evaluation of the $\log_2 \det$ in (9). Furthermore, knowledge of the codebook at the MT is not required. The MT only requires the parameters of the GMM to compute (14). We can think of $p(k \mid \mathbf{y})$ as an approximation of $p(k \mid \mathbf{h})$ from (11) because of the fixed noise covariance of every component. That is, since there is a true underlying channel $\mathbf{h}$ leading to the current observation $\mathbf{y} = \mathbf{A}\mathbf{h} + \mathbf{n}$, the responsibility $p(k \mid \mathbf{y})$ can be seen as an approximation of the probability $p(k \mid \mathbf{h})$ that the channel $\mathbf{h}$ stems from the $k$th GMM component. To investigate the influence of using $p(k \mid \mathbf{y})$ instead of $p(k \mid \mathbf{h})$, it is interesting to look at the performance of the feedback information calculated as
$$k^\star = \arg\max_{k} \, p(k \mid \mathbf{h}). \tag{15}$$
Note that this approach is infeasible in practice because the channel $\mathbf{h}$ would have to be known in the online phase. Nevertheless, it serves as a baseline for the performance analysis.
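The feedback rule based on the responsibilities of the observation can be sketched in a few lines of NumPy. The sketch assumes that the GMM parameters (mixing weights, means, covariances) are already available at the MT and that the noise covariance is $\sigma_n^2 \mathbf{I}$; the function names are hypothetical, and the log-domain normalization is an implementation choice made here for numerical stability, not something stated in the text.

```python
import numpy as np

def complex_gaussian_logpdf(y, mean, cov):
    """Log-density of a circularly symmetric complex Gaussian N_C(mean, cov) at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = np.real(d.conj() @ np.linalg.solve(cov, d))
    return -len(y) * np.log(np.pi) - logdet - quad

def observation_responsibilities(y, A, weights, means, covs, sigma2):
    """Responsibilities p(k | y) under the GMM of the observations."""
    n_obs = A.shape[0]
    log_post = np.array([
        np.log(w) + complex_gaussian_logpdf(
            y, A @ mu, A @ C @ A.conj().T + sigma2 * np.eye(n_obs))
        for w, mu, C in zip(weights, means, covs)
    ])
    log_post -= log_post.max()        # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def feedback_index(y, A, weights, means, covs, sigma2):
    """Zero-based index of the component with the highest responsibility for y."""
    return int(np.argmax(observation_responsibilities(y, A, weights, means, covs, sigma2)))
```

Since the $K$ log-likelihoods are independent of each other, the list comprehension could be evaluated in parallel, which matches the parallelization remark in the text.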
Proposed codebook construction: Once the training dataset $\mathcal{H}$ has been used to fit a $K$-component GMM, we cluster the training data according to their GMM responsibilities, i.e., channels exhibiting high similarities measured in terms of the responsibilities are assigned to the same component and thereby form a cluster. That is, we partition $\mathcal{H}$ into $K$ disjoint sets denoted by
$$\mathcal{V}_k = \{\mathbf{h} \in \mathcal{H} \mid p(k \mid \mathbf{h}) \geq p(j \mid \mathbf{h}) \text{ for } k \neq j\} \tag{16}$$
for $k = 1, \dots, K$. We now determine the codebook $\mathcal{Q} = \{\mathbf{Q}_k\}_{k=1}^{K}$ by computing every transmit covariance matrix $\mathbf{Q}_k$ such that it maximizes the summed rate in $\mathcal{V}_k$:
$$\mathbf{Q}_k = \arg\max_{\mathbf{Q} \succeq \mathbf{0}} \frac{1}{|\mathcal{V}_k|} \sum_{\operatorname{vec}(\mathbf{H}) \in \mathcal{V}_k} r(\mathbf{H}, \mathbf{Q}) \tag{17}$$
subject to $\operatorname{tr}(\mathbf{Q}) \leq \rho$, where $r(\mathbf{H}, \mathbf{Q})$ is the spectral efficiency defined in (8). This optimization problem is solved via a PGA algorithm similar to the one in Section III-A, cf. [8], [43]. Analogously, we can replace the optimization problem in (17) with Lau's heuristic from [4] (cf. Section III-A) to compute a transmit covariance matrix for every GMM component, which degrades the performance [1]. Thus, we again restrict our analysis to the PGA approach. In summary, the GMM is used twice: once for codebook construction (done offline) and afterwards for the determination of a feedback index (done online). For the latter, it is not necessary to estimate the channel, and evaluating (9) is avoided. This is particularly beneficial for the online computational complexity, which is discussed in Section VII. Moreover, the GMM of the observations, see (12), can be straightforwardly adapted at the MT to any SNR and pilot configuration by simply updating the means and covariances, cf. (12), without retraining.
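To make the per-cluster optimization more concrete, the following NumPy sketch shows one generic way to run projected gradient ascent on the average spectral efficiency under the constraints $\mathbf{Q} \succeq \mathbf{0}$ and $\operatorname{tr}(\mathbf{Q}) \leq \rho$. It is not the PGA variant of [8], [43]: the fixed step size, the uniform initialization, the iteration count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def project_eigenvalues(v, rho):
    """Euclidean projection of v onto {x >= 0, sum(x) <= rho}."""
    x = np.maximum(v, 0.0)
    if x.sum() <= rho:
        return x
    u = np.sort(v)[::-1]                       # otherwise project onto the simplex sum(x) = rho
    css = np.cumsum(u) - rho
    idx = np.arange(1, len(v) + 1)
    k = idx[u - css / idx > 0][-1]
    return np.maximum(v - css[k - 1] / k, 0.0)

def project_covariance(Q, rho):
    """Projection onto the Hermitian PSD matrices with trace at most rho."""
    lam, U = np.linalg.eigh((Q + Q.conj().T) / 2)
    return (U * project_eigenvalues(lam, rho)) @ U.conj().T

def average_rate_gradient(channels, Q, sigma2):
    """Gradient of the mean of log2 det(I + H Q H^H / sigma2) with respect to Q."""
    n_rx = channels[0].shape[0]
    grad = np.zeros_like(Q)
    for H in channels:
        inner = np.eye(n_rx) + H @ Q @ H.conj().T / sigma2
        grad += H.conj().T @ np.linalg.solve(inner, H) / (sigma2 * np.log(2))
    return grad / len(channels)

def fit_codebook_entry(channels, rho, sigma2, step=0.1, n_iter=200):
    """PGA sketch for one cluster of channel matrices (step and n_iter are assumed values)."""
    n_tx = channels[0].shape[1]
    Q = rho / n_tx * np.eye(n_tx, dtype=complex)
    for _ in range(n_iter):
        Q = project_covariance(Q + step * average_rate_gradient(channels, Q, sigma2), rho)
    return Q
```

In practice, the step size would have to be tuned (or chosen by a line search) for the channel statistics at hand.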

A. Conventional Method
In [5], the authors considered limited feedback in the MU-MIMO setting, where BD was applied as the precoding algorithm and a uniform power allocation policy was adopted, i.e., no water-filling across streams was conducted. Accordingly, the feedback conveys information regarding the spatial direction of each MT's channel to the BS, and no magnitude information is fed back.
Consider the singular value decomposition (SVD) of the (estimated) channel $\hat{\mathbf{H}}_j = \mathbf{U}_j \mathbf{S}_j \mathbf{V}_j^{\mathrm{H}}$ of MT $j$, where $\mathbf{S}_j$ contains the singular values in descending order on its diagonal, and let $\bar{\mathbf{V}}_j^{\mathrm{H}}$ contain the first $N_{\mathrm{rx}}$ rows of $\mathbf{V}_j^{\mathrm{H}}$, cf., e.g., [47]. Given the matrix $\bar{\mathbf{V}}_j^{\mathrm{H}}$, the idea from [5] is to feed back the information regarding $\bar{\mathbf{V}}_j^{\mathrm{H}}$ to the BS by using a random quantization codebook [48]. In particular, each MT-specific random quantization codebook is fixed beforehand and is known to the BS. That is, the codebook of each MT, cf. (18), consists of $2^B$ elements, which are chosen independently and are uniformly distributed over the Grassmann manifold [5], [49]. The elements of the random quantization codebook are thus constructed by generating an $N_{\mathrm{tx}} \times N_{\mathrm{rx}}$ dimensional matrix with i.i.d. complex Gaussian entries and then computing an $N_{\mathrm{tx}} \times N_{\mathrm{rx}}$ dimensional basis of the subspace spanned by the matrix using the procedure described in [50]. The selection method considered in [5] was the chordal distance metric, i.e., the codebook entry with the smallest chordal distance to $\bar{\mathbf{V}}_j$ is selected, and other metrics were not investigated in [5]. We also considered the chordal distance in our simulations but found that using a capacity-inspired selection metric (as in [47]) performed consistently better. Thus, we used the same evaluation principle per MT as in (9), but replaced the transmit covariance matrices by ones constructed from the entries of the random quantization codebook. In this way, we also account for the SNR, in contrast to the chordal distance metric, which does not depend on the SNR. For the sake of brevity, we will not show the results obtained by using the chordal distance metric in Section VIII, since we compare to the consistently better baseline.
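A random quantization codebook and the chordal-distance selection can be sketched with NumPy as follows. The procedure of [50] is not reproduced in the text; using a reduced QR decomposition to obtain an orthonormal basis of the span of a complex Gaussian matrix is an assumption made here, as are the function names.

```python
import numpy as np

def random_subspace_codebook(n_tx, n_rx, n_bits, rng):
    """2**n_bits matrices with orthonormal columns spanning isotropic random subspaces."""
    codebook = []
    for _ in range(2 ** n_bits):
        G = rng.standard_normal((n_tx, n_rx)) + 1j * rng.standard_normal((n_tx, n_rx))
        basis, _ = np.linalg.qr(G)             # reduced QR: n_tx x n_rx orthonormal basis
        codebook.append(basis)
    return codebook

def chordal_distance(F1, F2):
    """Chordal distance between the subspaces spanned by the orthonormal columns of F1 and F2."""
    n = F1.shape[1]
    overlap = np.linalg.norm(F1.conj().T @ F2, "fro") ** 2
    return np.sqrt(max(n - overlap, 0.0))

def quantize_direction(H_est, codebook):
    """Zero-based index of the entry closest to the dominant right singular subspace of H_est."""
    _, _, Vh = np.linalg.svd(H_est, full_matrices=False)
    V_bar = Vh.conj().T                        # first n_rx right singular vectors as columns
    return int(np.argmin([chordal_distance(V_bar, W) for W in codebook]))
```

As noted in the text, this metric ignores the SNR, which is one reason a capacity-inspired selection rule can perform better.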
Each MT reports $k_j^\star$ to the BS, and the BS then represents each MT's channel by the selected codebook entry. Given the quantized directional information regarding each MT's channel, the BS can employ a common precoding algorithm in order to jointly design the precoders. The authors of [5] focused their analysis on BD and adopted the uniform power allocation policy. In contrast, we will consider a broad range of precoding algorithms, including the non-iterative algorithms RBD with the uniform power allocation policy [29] (an extension of BD) and RCI [31], as well as the iterative WMMSE algorithm [34, Algorithm 1], in order to jointly design the precoders $\mathbf{M}_j$ of all MTs $j \in \mathcal{J}$.

B. Proposed Subspace-based Method
Inspired by the approach from [5], we propose to use the following approach to obtain quantized information regarding each MT's channel. Each transmit covariance matrix of the Lloyd codebook, cf. Section III-A, or the GMM codebook, cf. Section III-B, contains quantized information regarding the spatial directions of each MT's channel and the power loadings per stream. We can extract the directional information of each codebook entry by simply performing an SVD of each transmit covariance matrix, that is,
$$\mathbf{Q}_k = \mathbf{X}_k \mathbf{T}_k \mathbf{X}_k^{\mathrm{H}}, \tag{19}$$
where $\mathbf{T}_k$ contains the singular values in descending order, and taking the matrix $\bar{\mathbf{X}}_k$, which collects the first $N_{\mathrm{rx}}$ vectors of $\mathbf{X}_k$, as the respective directional information. Accordingly, we obtain the directional codebook
$$\{\bar{\mathbf{X}}_k\}_{k=1}^{K}. \tag{20}$$
Note that we have to take a codebook constructed at a sufficiently large SNR in order to guarantee that the matrices $\mathbf{Q}_k$ all have a rank larger than or equal to $N_{\mathrm{rx}}$. In our simulations, this was the case with the codebooks constructed for an SNR of 25 dB.
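Extracting the directional information from a codebook entry is a single SVD. A minimal NumPy sketch, including the rank check motivated above (the function name and the relative tolerance are hypothetical choices):

```python
import numpy as np

def directional_information(Q, n_rx, tol=1e-10):
    """First n_rx left singular vectors of the Hermitian PSD matrix Q."""
    X, t, _ = np.linalg.svd(Q)                 # singular values are returned in descending order
    if np.sum(t > tol * t[0]) < n_rx:
        raise ValueError("codebook entry has rank below n_rx; use a higher-SNR codebook")
    return X[:, :n_rx]
```

Applying this function to every entry of a codebook constructed at high SNR yields the directional codebook.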
The described subspace-based method can be used in combination with both the Lloyd codebook, cf. Section III-A, and the proposed GMM-based codebook, cf. Section III-B. In case of the subspaces extracted from the Lloyd codebook, the feedback per MT is calculated after channel estimation by a search over the codebook entries using the estimated channel. When using the GMM-based approach, each MT determines its feedback by simply evaluating the responsibilities:
$$k_j^\star = \arg\max_{k} \, p(k \mid \mathbf{y}_j). \tag{21}$$
In both cases, each MT reports $k_j^\star$ to the BS, which represents each MT's channel using the subspace information $\bar{\mathbf{X}}_{k_j^\star}^{\mathrm{H}}$ associated with the respective codebook entry. The BS can then employ RBD with the uniform power allocation policy, RCI, or the iterative WMMSE algorithm [34, Algorithm 1], in order to jointly design the precoders $\mathbf{M}_j$ of all MTs $j \in \mathcal{J}$.

C. Proposed Generative Modeling-based Method
As an alternative to the aforementioned subspace-based method, the channel matrix of each MT may be modeled as a random variable, similar to [35], [36], and the average/ergodic achievable rate $R_j = \mathrm{E}[R_j^{\mathrm{inst}}]$ of each MT $j \in \mathcal{J}$ is considered (note that the expectation is taken with respect to the channel distribution). Then, the ergodic sum-rate maximization problem can be written as [35], [36]
$$\max_{\{\mathbf{M}_j\}_{j=1}^{J}} \sum_{j=1}^{J} R_j \quad \text{subject to } \operatorname{tr}\Big(\sum_{j=1}^{J} \mathbf{M}_j \mathbf{M}_j^{\mathrm{H}}\Big) = \rho. \tag{22}$$
In [35], [36], the authors proposed the stochastic WMMSE (SWMMSE) algorithm. It was shown that the algorithm is guaranteed to converge to the set of stationary points of the stochastic sum-rate maximization problem almost surely [35], [36]. In each iteration step, the SWMMSE algorithm requires channel samples that represent statistical information about each MT.

Algorithm 1 Generative Modeling-based MU-MIMO Precoder Design
1: Set $i = 0$, set the maximum iteration number $I_{\max}$, and randomly initialize the precoders such that $\operatorname{tr}(\sum_{j=1}^{J} \mathbf{M}_j \mathbf{M}_j^{\mathrm{H}}) = \rho$.
2: repeat
3: $\mathbf{Z}_j \leftarrow \mathbf{M}_j$, $\forall j$
8: $\mathbf{B}_j \leftarrow \mathbf{B}_j + \beta \mathbf{Z}_j + \mathbf{H}_j^{i,\mathrm{H}} \mathbf{U}_j \mathbf{W}_j$, $\forall j$
10: $i \leftarrow i + 1$
12: until convergence or $i \geq I_{\max}$
The discussed conventional methods are unable to perform the SWMMSE iterations since they lack a generative model, i.e., a model that learns the underlying PDF of the channels and makes it possible to generate new samples that resemble the original channel distribution. In contrast, the proposed GMM approach is able to generate samples following the channel's distribution due to the GMM's sample generation ability. This makes it possible to jointly design the precoders via the SWMMSE algorithm by exploiting statistical information about the MTs given their feedback information. In particular, given the feedback index $k_j^\star$ of each MT, see (21), one can draw samples from the respective GMM components via
$$\mathbf{h}_{j,\mathrm{sample}} \sim \mathcal{N}_{\mathbb{C}}(\boldsymbol{\mu}_{k_j^\star}, \mathbf{C}_{k_j^\star}), \tag{23}$$
which represents statistical information about the channel of MT $j$. The BS can subsequently design the precoders with the SWMMSE algorithm based on the samples generated by the GMM. In order to feed the channel samples to the SWMMSE algorithm, the channels have to be reshaped as $\mathbf{H}_{j,\mathrm{sample}} = \operatorname{unvec}(\mathbf{h}_{j,\mathrm{sample}})$; see [35], [36] for more details on the SWMMSE algorithm. A summary of the proposed generative modeling-based precoder design method is given in Algorithm 1.
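Drawing channel samples from the GMM component indicated by the feedback index can be sketched with NumPy as follows. The sketch assumes a circularly symmetric complex Gaussian component with a positive definite covariance, sampled through a Cholesky factor; the function names are hypothetical.

```python
import numpy as np

def sample_component(mean, cov, n_samples, rng):
    """Draw n_samples vectors from N_C(mean, cov); one sample per row."""
    n = mean.shape[0]
    L = np.linalg.cholesky(cov)                # cov = L L^H, requires positive definiteness
    w = (rng.standard_normal((n_samples, n))
         + 1j * rng.standard_normal((n_samples, n))) / np.sqrt(2)   # unit-variance complex noise
    return mean + w @ L.T                      # row-wise version of mean + L w

def unvec(h, n_rx):
    """Reshape a vectorized channel (stacked columns) into an n_rx x n_tx matrix."""
    return h.reshape(n_rx, -1, order="F")
```

For an MT with feedback index `k`, the BS would call `sample_component(means[k], covs[k], n_samples, rng)` and pass `unvec(h, n_rx)` for each sample `h` to the stochastic precoding iterations.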

V. DISCUSSION ON THE VERSATILITY OF THE PROPOSED FEEDBACK SCHEME
As discussed earlier, after fitting the GMM centrally at the BS, codebooks to support the point-to-point transmission mode can be constructed, cf. Section III-B. Then, the GMM of the channels, cf. (10), is offloaded to every MT within the coverage area of the BS. In the online phase, the BS regularly sends pilots to the MTs. Depending on the SNR and the pilots, the GMM of the observations, cf. (12), can be straightforwardly constructed from the offloaded GMM of the channels. With the help of the GMM of the observations, the received observations at each MT are processed to a feedback index $k_j^\star$ by evaluating the responsibilities via (21). Given the feedback information of each MT, the BS can decide between the point-to-point and the multi-user mode. In the point-to-point mode, the BS can simply select the codebook entry, cf. (17), associated with the MT $j$ that should be served for data transmission, and no further processing is required. In the multi-user mode, the BS can represent each MT's channel using the subspace information associated with the respective $X^{\mathrm{H}}_{k_j^\star}$ extracted from the high-SNR codebook, cf. (19), and jointly design the precoders for multi-user transmission using either non-iterative (RBD or RCI) or iterative (WMMSE) approaches, a choice that determines the processing time required for designing the precoders. Alternatively, the BS can exploit the generative modeling capability of the GMM and design the precoders using the SWMMSE via sampling, cf. (23). The proposed versatile feedback scheme is summarized as a flowchart in Fig. 1, where red (blue) colored nodes represent processing steps that are performed at the BS (MTs).
This flexibility is not provided by the discussed state-of-the-art approaches, since the feedback for the point-to-point mode associated with the current SNR is determined by selecting an element via (9) out of a codebook constructed for this particular SNR (cf. Section III-A), whereas in the multi-user case, the codebook entry selection is based on another codebook, i.e., the random quantization codebook, cf. (18), or the directional codebook, cf. (20).

VI. BASELINE CHANNEL ESTIMATORS
Conventionally, the DL channel is first estimated at each MT and, subsequently, the best fitting codebook entry is determined. Thus, in this section, we present the baseline channel estimators which we consider in our simulations. Since channel estimation takes place at each MT separately, we present the estimators from an MT perspective and drop the index $j$ in the following for brevity. The mean squared error (MSE)-optimal channel estimate for the model (4) is given by the conditional mean estimator (CME) $\mathbb{E}[h \mid y]$, cf., e.g., [51, Section 8.1]. However, the true channel PDF is generally not known and, therefore, the CME can generally not be calculated analytically. Even if the true channel PDF were known, the CME $\mathbb{E}[h \mid y]$ might still not have an analytic expression.
In this work, we use the recently proposed GMM-based channel estimator $\hat{h}_{\mathrm{GMM}}$, see (24), from [21] as a baseline. The GMM-based channel estimator is proven to asymptotically converge to the optimal CME as the number of GMM components $K$ is increased, with the restriction that $A$ is invertible, see (4). In our case, this would require $n_p = N_{\mathrm{tx}}$. However, even if $A$ is not invertible ($n_p < N_{\mathrm{tx}}$) and for a moderate number of components $K$, the GMM-based channel estimator is a powerful estimator, as shown in [21]. The GMM-based channel estimator utilizes the same GMM as found in Section III-B. In particular, the MT can use the GMM (obtained through offloading) to estimate the channel by evaluating

$$\hat{h}_{\mathrm{GMM}}(y) = \sum_{k=1}^{K} p(k \mid y)\, \hat{h}_{\mathrm{LMMSE},k}(y), \tag{24}$$

with the responsibilities $p(k \mid y)$ from (13) and the component-wise linear minimum mean square error (LMMSE) estimates $\hat{h}_{\mathrm{LMMSE},k}(y)$ from (25). Accordingly, the estimator $\hat{h}_{\mathrm{GMM}}$ is given by a weighted sum of $K$ LMMSE estimators, one for each component. The weights $p(k \mid y)$ are the probabilities that the current observation $y$ corresponds to the $k$th component.
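The structure of this estimator can be made concrete with a short NumPy sketch. It relies on assumptions that are not spelled out in this section: a linear observation model $y = Ah + n$ with white noise of variance `noise_var`, and the textbook per-component LMMSE form $\mu_k + C_k A^{\mathrm{H}} (A C_k A^{\mathrm{H}} + \sigma^2 I)^{-1} (y - A\mu_k)$. The function name `gmm_estimate` and all toy parameters are hypothetical.

```python
import numpy as np

def gmm_estimate(y, A, weights, means, covs, noise_var):
    """Responsibility-weighted sum of per-component LMMSE estimates."""
    L = A.shape[0]
    log_scores, estimates = [], []
    for w, mu, C in zip(weights, means, covs):
        # Covariance and mean of the observation under component k.
        C_y = A @ C @ A.conj().T + noise_var * np.eye(L)
        d = y - A @ mu
        sol = np.linalg.solve(C_y, d)
        _, log_det = np.linalg.slogdet(C_y)
        # Log of weight times complex Gaussian density (constant terms dropped).
        log_scores.append(np.log(w) - log_det - np.real(d.conj() @ sol))
        # LMMSE estimate under component k.
        estimates.append(mu + C @ A.conj().T @ sol)
    log_scores = np.array(log_scores)
    # Normalize in the log domain for numerical stability.
    resp = np.exp(log_scores - log_scores.max())
    resp /= resp.sum()
    return resp @ np.array(estimates), resp

# Toy check: with A = I and almost no noise, the estimate should reproduce h.
rng = np.random.default_rng(1)
N, K = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
covs = np.stack([(k + 1) * np.eye(N, dtype=complex) for k in range(K)])
A = np.eye(N)
h = means[0] + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = A @ h  # noise-free observation, chosen to keep the demo deterministic
h_hat, resp = gmm_estimate(y, A, weights, means, covs, noise_var=1e-8)
```

The responsibilities computed inside the loop are the same quantities that the feedback scheme uses directly, which is why the feedback computation can skip the LMMSE filtering step.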
Another baseline is the LMMSE estimator, which utilizes the sample covariance matrix $C = \frac{1}{M} \sum_{m=1}^{M} h_m h_m^{\mathrm{H}}$, calculated from the same set of training samples that is used to fit the GMM, and calculates channel estimates according to (26), cf. [15], [21]. Lastly, compressive sensing approaches commonly assume that the channel exhibits a certain structure: $h \approx Dt$, where $D = D_{\mathrm{rx}} \otimes (D_{\mathrm{tx,h}} \otimes D_{\mathrm{tx,v}})$ is a dictionary with oversampled DFT matrices $D_{\mathrm{rx}}$, $D_{\mathrm{tx,h}}$, and $D_{\mathrm{tx,v}}$ (cf., e.g., [52]), because we have a URA at the BS and a ULA at the MT. A compressive sensing algorithm like orthogonal matching pursuit (OMP) [53] can then be used to obtain a sparse vector $t$, and the estimated channel is given by $\hat{h}_{\mathrm{OMP}} = Dt$, cf. (27).
Since the sparsity order is not known, but the algorithm's performance crucially depends on it, we use a genie-aided approach to obtain a bound on the performance of the algorithm. Specifically, we use the true channel (perfect CSI knowledge) to choose the optimal sparsity order.
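The genie-aided selection of the sparsity order can be sketched as follows. This is a generic OMP implementation rather than the exact routine of [53]; the toy dictionary is a single unitary DFT matrix instead of the Kronecker-structured oversampled dictionary, the observation is noise-free with $A = I$, and the function names `omp` and `genie_omp` are hypothetical.

```python
import numpy as np

def omp(y, Phi, sparsity):
    """Greedy OMP: select `sparsity` atoms of Phi and refit by least squares."""
    residual = y.copy()
    support = []
    coef = np.zeros(0, dtype=complex)
    for _ in range(sparsity):
        corr = np.abs(Phi.conj().T @ residual)
        if support:
            corr[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    t = np.zeros(Phi.shape[1], dtype=complex)
    t[support] = coef
    return t

def genie_omp(y, A, D, h_true, max_sparsity):
    """Try all sparsity orders and keep the one closest to the true channel."""
    Phi = A @ D
    best = None
    for s in range(1, max_sparsity + 1):
        h_hat = D @ omp(y, Phi, s)
        err = np.linalg.norm(h_true - h_hat)
        if best is None or err < best[0]:
            best = (err, s, h_hat)
    return best[2], best[1]

# Toy setting: unitary DFT dictionary and an exactly 3-sparse channel.
N = 16
D = np.fft.fft(np.eye(N)) / np.sqrt(N)
t_true = np.zeros(N, dtype=complex)
t_true[[1, 5, 11]] = [1.0, -0.5j, 0.25 + 0.25j]
h_true = D @ t_true
A = np.eye(N)
y = A @ h_true
h_omp, s_best = genie_omp(y, A, D, h_true, max_sparsity=6)
```

Because the genie needs `h_true`, this selection is not available in practice and only serves as a performance bound for the OMP baseline.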

VII. COMPLEXITY ANALYSIS
The responsibilities in (13) are calculated by evaluating Gaussian densities. Since the GMM's covariance matrices and mean vectors do not change for different observations, the inverses and the determinants of the covariance matrices can be pre-computed. Therefore, the online evaluation of the responsibilities $p(k \mid y)$ in (13) is dominated by matrix-vector multiplications and has a complexity of $\mathcal{O}(L^2)$ per GMM component, with $L = N_{\mathrm{rx}} n_p$.
Correspondingly, determining the feedback using the GMM via (14) has a complexity of $\mathcal{O}(K N_{\mathrm{rx}}^2 n_p^2)$. Recall that in this case, no channel estimation needs to be conducted. A major advantage is that the complexity does not scale with the number of transmit antennas $N_{\mathrm{tx}}$. This is particularly beneficial if the BS is equipped with many antennas, as is the case for massive MIMO systems. Moreover, the proposed method allows for parallelization with respect to the number of components $K$, i.e., all $K$ responsibilities can be evaluated in parallel.
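A minimal sketch of the offline/online split behind this complexity figure is given below. It assumes the observation model $y = Ah + n$ with white noise of variance `noise_var`, so that component $k$ of the GMM of the observations has mean $A\mu_k$ and covariance $A C_k A^{\mathrm{H}} + \sigma^2 I$; the function names `precompute` and `feedback_index` and the toy parameters are hypothetical.

```python
import numpy as np

def precompute(weights, means, covs, A, noise_var):
    """Offline: cache everything that does not depend on the observation y."""
    L = A.shape[0]
    obs_means = [A @ mu for mu in means]
    obs_covs = [A @ C @ A.conj().T + noise_var * np.eye(L) for C in covs]
    inv_covs = [np.linalg.inv(C) for C in obs_covs]
    log_dets = [np.linalg.slogdet(C)[1] for C in obs_covs]
    return np.log(weights), obs_means, inv_covs, log_dets

def feedback_index(y, log_w, obs_means, inv_covs, log_dets):
    """Online: one L x L matrix-vector product per component, then an argmax."""
    scores = []
    for lw, m, P, ld in zip(log_w, obs_means, inv_covs, log_dets):
        d = y - m
        # Log of weight times complex Gaussian density, constants dropped.
        scores.append(lw - ld - np.real(d.conj() @ (P @ d)))
    # The component with the largest score has the highest responsibility.
    return int(np.argmax(scores))

# Toy GMM with identical covariances and well-separated means.
rng = np.random.default_rng(2)
N, L, K = 8, 4, 5
A = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
weights = np.full(K, 1.0 / K)
means = 3.0 * (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N)))
covs = [np.eye(N, dtype=complex) for _ in range(K)]
cache = precompute(weights, means, covs, A, noise_var=0.1)

y = A @ means[3]  # observation placed exactly at the mean of component 3
k_star = feedback_index(y, *cache)
```

The loop over the components has no data dependencies between iterations, so the $K$ scores could be evaluated in parallel, and none of the online operations involve an array whose size depends on $N_{\mathrm{tx}}$.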
When using the conventional approach of first estimating the channel and then searching for the best codebook entry, the complexity depends on the channel estimation complexity and the complexity of the selection method from (9). Among all considered conventional approaches, the GMM estimator from (24) in combination with the selection method that maximizes the rate expression from (9) performed best in our simulations, cf. Section VIII. Evaluating the GMM estimator from (24) has a complexity of $\mathcal{O}(K N_{\mathrm{rx}}^2 n_p^2 + K N_{\mathrm{rx}}^2 N_{\mathrm{tx}} n_p)$, due to the calculation of the responsibilities $p(k \mid y)$ and the evaluation of the LMMSE filters from (25) [21]. The responsibilities $p(k \mid y)$ from (13), which determine the feedback using the GMM according to (14), are the same responsibilities that are needed to evaluate the GMM estimator from (24). Evaluating the GMM estimator further requires the calculation of the LMMSE filters from (25). Thus, in terms of floating-point operations (FLOPs), our proposed method from (14) always has a lower complexity for the MT than evaluating the GMM estimator from (24). In addition, with the conventional approach, the estimated channel has to be further processed to a feedback index by evaluating (9). The complexity of this selection method is $\mathcal{O}(K N_{\mathrm{tx}} N_{\mathrm{rx}}^2 + K N_{\mathrm{rx}}^3)$ when exploiting the QR decomposition [54, Section 5.2].
This complexity analysis also holds for the multi-user case since the feedback is determined at each MT separately by conducting similar steps, i.e., by first estimating the channel and then evaluating either (18) in the case of random codebooks or (20) in the case of the directional codebook.
Kronecker Approximation for Reducing the Offloading Amount: In order for an MT to be able to compute feedback indices, the parameters of the GMM $f_h^{(K)}$ need to be offloaded to the MT upon entering the BS's coverage area. As demonstrated in a numerical example in Table I, the number of GMM parameters can be quite large. This is mainly due to the large number of parameters of the GMM's covariance matrices. In order to reduce the number of GMM parameters, we can incorporate model-based insights without influencing the online computational complexity. For spatial correlation scenarios, a well-known assumption is that the scattering in the vicinity of the transmitter and the scattering in the vicinity of the receiver are independent of each other, cf. [55]. Similarly to [21], we use this assumption to constrain the GMM covariance matrices to a Kronecker factorization with fewer parameters, i.e., we construct a GMM consisting of covariance matrices of the form $C_k = C_{\mathrm{tx},k} \otimes C_{\mathrm{rx},k}$. Thus, instead of fitting a single unconstrained GMM with $N \times N$-dimensional covariances, a transmit-side (receive-side) GMM with $N_{\mathrm{tx}} \times N_{\mathrm{tx}}$ ($N_{\mathrm{rx}} \times N_{\mathrm{rx}}$)-dimensional covariances and $K_{\mathrm{tx}}$ ($K_{\mathrm{rx}}$) components is fitted. Thereafter, a GMM with $K = K_{\mathrm{tx}} K_{\mathrm{rx}}$ components and $N \times N$-dimensional covariances is obtained by computing all Kronecker products of the respective transmit- and receive-side covariance matrices. It was observed in [21] that the Kronecker GMM performs almost as well as the unconstrained GMM. The advantages of the Kronecker GMM are a lower offline training complexity, the ability to parallelize the fitting process, and the need for fewer training samples since the Kronecker GMM has far fewer parameters.
Table I illustrates the difference in the number of GMM covariance parameters (taking symmetries into account), where we plug in the simulation parameters of one of the settings with $B = 6$, which we consider in Section VIII. We can see that, with the Kronecker GMM, the number of parameters that need to be offloaded is drastically reduced. For the remaining settings considered in Section VIII, the reduction factors due to the Kronecker GMM are on the order of approximately $10^2$ to $10^3$. For this reason, we solely consider the Kronecker GMM in Section VIII.
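The construction and the resulting parameter savings can be illustrated with a short sketch. The counting convention is an assumption: each $n \times n$ Hermitian covariance is counted with $n(n+1)/2$ entries (those on and above the diagonal), which may differ from the convention used in Table I, so the numbers below are only indicative. The function names `kronecker_covariances` and `num_cov_params` are hypothetical.

```python
import numpy as np

def kronecker_covariances(covs_tx, covs_rx):
    """All K_tx * K_rx Kronecker products C_tx (x) C_rx."""
    return np.stack([np.kron(c_tx, c_rx) for c_tx in covs_tx for c_rx in covs_rx])

def num_cov_params(dim, n_components):
    """Entries on and above the diagonal of a dim x dim matrix, per component."""
    return n_components * dim * (dim + 1) // 2

def random_covs(k, n, rng):
    """Random Hermitian positive definite matrices, used only as toy inputs."""
    f = rng.standard_normal((k, n, n)) + 1j * rng.standard_normal((k, n, n))
    return f @ f.conj().transpose(0, 2, 1) + np.eye(n)

# Small toy construction: 2 transmit-side and 2 receive-side covariances.
rng = np.random.default_rng(3)
covs = kronecker_covariances(random_covs(2, 3, rng), random_covs(2, 2, rng))

# Parameter count for N_tx = 32, N_rx = 16, K_tx = 16, K_rx = 4 (so K = 64).
n_tx, n_rx, k_tx, k_rx = 32, 16, 16, 4
full = num_cov_params(n_tx * n_rx, k_tx * k_rx)
kron = num_cov_params(n_tx, k_tx) + num_cov_params(n_rx, k_rx)
reduction = full / kron
```

Under this counting convention, the unconstrained model needs 8,404,992 covariance entries while the Kronecker model needs 8,992, a reduction by a factor of roughly 935.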

VIII. SIMULATION RESULTS
With trends towards massive MIMO, both the BS and the MTs are equipped with many antennas [56]. The BS, equipped with a URA, has in total $N_{\mathrm{tx}} = N_{\mathrm{tx,h}} N_{\mathrm{tx,v}}$ antenna elements, with $N_{\mathrm{tx,h}}$ horizontal and $N_{\mathrm{tx,v}}$ vertical elements. At the MT, we have a ULA with $N_{\mathrm{rx}}$ elements. We consider $B$ feedback bits and thus $K = 2^B$. We generate datasets with $30 \cdot 10^3$ channels for both the UL and DL domain of the scenario described in Section II-D: $\mathcal{H}^{\mathrm{UL}}$ and $\mathcal{H}^{\mathrm{DL}}$. The data samples are normalized such that $\mathbb{E}[\|h\|^2] = N = N_{\mathrm{tx}} N_{\mathrm{rx}}$ holds for the vectorized channels. We further set $\rho = 1$, which allows us to define the SNR as $1/\sigma_n^2$ for all MTs, i.e., when $\sigma_j^2 = \sigma_n^2$, $\forall j \in \mathcal{J}$. We split each of the two sets $\mathcal{H}^{\mathrm{UL}}$ and $\mathcal{H}^{\mathrm{DL}}$ into a training set with $M = 20 \cdot 10^3$ samples, and the remaining samples constitute an evaluation set, viz., $\mathcal{H}^{\mathrm{UL}}_{\mathrm{train}}$, $\mathcal{H}^{\mathrm{UL}}_{\mathrm{eval}}$, $\mathcal{H}^{\mathrm{DL}}_{\mathrm{train}}$, and $\mathcal{H}^{\mathrm{DL}}_{\mathrm{eval}}$. The following transmit strategies are always evaluated on $\mathcal{H}^{\mathrm{DL}}_{\mathrm{eval}}$, i.e., in the DL domain. When we fit the GMM based on $\mathcal{H}^{\mathrm{UL}}_{\mathrm{train}}$, we transpose all elements of the set to emulate a DL, cf. [8], [15], [17].
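The data preparation described above can be sketched in a few lines. Several details are assumptions made for the illustration: the normalization constant is estimated from the whole dataset, the split is random, UL channels are stored as $N_{\mathrm{tx}} \times N_{\mathrm{rx}}$ matrices, and random Gaussian matrices stand in for the simulated channels. The function names `normalize`, `split`, and `emulate_dl` are hypothetical.

```python
import numpy as np

def normalize(channels):
    """Scale the dataset such that the mean squared norm equals N = rows * cols."""
    n_samples, rows, cols = channels.shape
    mean_sq_norm = np.mean(np.sum(np.abs(channels.reshape(n_samples, -1)) ** 2, axis=1))
    return channels * np.sqrt(rows * cols / mean_sq_norm)

def split(channels, n_train, rng):
    """Random split into a training and an evaluation set."""
    perm = rng.permutation(channels.shape[0])
    return channels[perm[:n_train]], channels[perm[n_train:]]

def emulate_dl(channels_ul):
    """Transpose every UL channel matrix to emulate a DL channel."""
    return channels_ul.transpose(0, 2, 1)

# Scaled-down toy dataset (3000 instead of 30 * 10^3 samples).
rng = np.random.default_rng(4)
n_tx, n_rx = 8, 2
H_ul = 2.5 * (rng.standard_normal((3000, n_tx, n_rx))
              + 1j * rng.standard_normal((3000, n_tx, n_rx)))
H_ul = normalize(H_ul)
H_ul_train, H_ul_eval = split(H_ul, 2000, rng)
H_train_as_dl = emulate_dl(H_ul_train)  # shape (2000, n_rx, n_tx)

snr_db = 5.0
noise_var = 10 ** (-snr_db / 10)  # SNR = 1 / sigma_n^2 with rho = 1
```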

A. Point-to-point MIMO
In the single-user case, we depict the normalized spectral efficiency (nSE) as the performance measure. The spectral efficiency achieved with a given transmit covariance matrix is normalized by the spectral efficiency achieved with the optimal transmit covariance matrix, which is given by decomposing the channel into $N_{\mathrm{rx}}$ parallel streams and employing water-filling [38]. The empirical cCDF $P(\mathrm{nSE} > s)$ of the normalized spectral efficiency corresponds to the empirical probability that the nSE exceeds a specific value $s$.
We consider the following baseline transmit strategies: The curves labeled "uni pow cov" represent uniform power allocation where the transmit covariance matrix is given by $Q = \frac{\rho}{N_{\mathrm{tx}}} I$. In this case, no CSI knowledge or codebook is used. Moreover, "uni pow eigsp" corresponds to the transmit strategy where a transmit covariance matrix is calculated by allocating equal power to the eigenvectors of the channel. That is, the channel is decomposed into $N_{\mathrm{rx}}$ parallel streams and a power of $\rho / N_{\mathrm{rx}}$ is allocated to each stream. Note that this approach is infeasible because the BS would require full knowledge of the DL channel (or its eigenvectors).
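The performance measure and the "uni pow cov" baseline can be illustrated with a compact sketch. It assumes the standard spectral efficiency expression $\log_2 \det(I + \frac{1}{\sigma_n^2} H Q H^{\mathrm{H}})$, full-rank channels, and random Gaussian matrices as toy channels; the function names `spectral_efficiency`, `water_filling_covariance`, and `empirical_ccdf` are hypothetical.

```python
import numpy as np

def spectral_efficiency(H, Q, noise_var):
    """log2 det(I + H Q H^H / noise_var) in bits per channel use."""
    n_rx = H.shape[0]
    M = np.eye(n_rx) + H @ Q @ H.conj().T / noise_var
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def water_filling_covariance(H, rho, noise_var):
    """Optimal transmit covariance for a known full-rank channel H."""
    _, s, Vh = np.linalg.svd(H, full_matrices=False)
    gains = s ** 2 / noise_var  # stream gains, sorted in descending order
    # Drop the weakest streams until all remaining powers are positive.
    for m in range(len(gains), 0, -1):
        level = (rho + np.sum(1.0 / gains[:m])) / m
        powers = level - 1.0 / gains[:m]
        if powers[-1] > 0:
            break
    V = Vh.conj().T[:, :m]
    return (V * powers) @ V.conj().T

def empirical_ccdf(values, s):
    """Fraction of values that exceed s."""
    return float(np.mean(np.asarray(values) > s))

rng = np.random.default_rng(5)
n_rx, n_tx, rho, noise_var = 4, 8, 1.0, 1.0  # SNR = 0 dB
nse_uniform = []
for _ in range(200):
    H = (rng.standard_normal((n_rx, n_tx))
         + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    Q_opt = water_filling_covariance(H, rho, noise_var)
    Q_uni = rho / n_tx * np.eye(n_tx)  # "uni pow cov"
    nse_uniform.append(spectral_efficiency(H, Q_uni, noise_var)
                       / spectral_efficiency(H, Q_opt, noise_var))

prob_above_80 = empirical_ccdf(nse_uniform, 0.8)
```

Evaluating `empirical_ccdf` over a grid of thresholds $s$ yields one cCDF curve; codebook-based strategies would simply replace `Q_uni` by the selected codebook entry.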
In the following, the simulation parameters are $N_{\mathrm{tx}} = 32$ ($N_{\mathrm{tx,h}} = 8$, $N_{\mathrm{tx,v}} = 4$), $N_{\mathrm{rx}} = 16$, and $B = 6$ bits, thus $K = 64$ ($K_{\mathrm{tx}} = 16$, $K_{\mathrm{rx}} = 4$). In Fig. 2(a), we set SNR = 0 dB. The conventional codebook construction approach (cf. Section III-A) is denoted by "Lloyd UL/DL", depending on whether $\mathcal{H}^{\mathrm{UL}}_{\mathrm{train}}$ or $\mathcal{H}^{\mathrm{DL}}_{\mathrm{train}}$ is used as training data to construct the codebooks, respectively. With these approaches, the codebook is known to the BS and the MT and, additionally, perfect CSI is assumed at the MT. Each MT then selects the best possible codebook entry by evaluating (9). We can observe that using DL or UL training data results in approximately the same performance. The proposed codebook construction and encoding scheme is denoted by "GMM UL/DL", where we either use $\mathcal{H}^{\mathrm{UL}}_{\mathrm{train}}$ or $\mathcal{H}^{\mathrm{DL}}_{\mathrm{train}}$ as training data to fit the GMM and to construct the codebook as described in Section III-B. With our proposed approach, the knowledge of the codebook at the MT is not required. After offloading the GMM to the MT and given perfect CSI knowledge, the MT can then determine the feedback index by evaluating (15). Again, using DL or UL training data results in approximately the same performance, which is in accordance with the findings from [8], [15], [17]. The proposed GMM approach performs slightly worse than the Lloyd clustering approach, which is a consequence of the perfect CSI assumption. In Fig. 2(b), we set SNR = 10 dB and observe similar results.

Fig. 4: The probability that the nSE of a certain transmit strategy exceeds s = 80% of the optimal transmit strategy's spectral efficiency for a varying number of pilots, for a system with Ntx = 32, Nrx = 16, and B = 6 bits.
However, assuming perfect CSI at the MT is not feasible. In the following, we consider imperfect CSI, i.e., systems with reduced pilot overhead ($n_p \leq N_{\mathrm{tx}}$). In the remainder, we consider UL training data exclusively. Thus, we omit "UL" in the legend from now on.
In Fig. 3(a), the SNR is 0 dB and we have $n_p = 8$. We depict results for the conventional Lloyd codebook construction approach (cf. Section III-A), where we first estimate the channel either via OMP (27), the LMMSE approach (26), or the GMM estimator (24), and then select a transmit covariance matrix by evaluating (9) given the estimated channel: "Lloyd, $\hat{h}_{\mathrm{OMP}}$", "Lloyd, $\hat{h}_{\mathrm{LMMSE}}$", or "Lloyd, $\hat{h}_{\mathrm{GMM}}$", respectively. Moreover, we compare to the SNR-independent DNN approach from [8], denoted by "DNN $y$", where a classifier is employed to directly map the observation to a feedback index that specifies an element from the Lloyd codebook. During the training phase, the DNN was provided with input-output pairs $\{(Y_m = \operatorname{unvec}(y_m), k_m^\star)\}_{m=1}^{M}$ for an SNR range of 0 dB to 25 dB, with 5 dB steps. We employ random search [57] to determine the hyperparameters of the DNN. The DNN consists of $D_{\mathrm{CM}}$ convolutional modules, each of which comprises a convolutional layer, a batch normalization, and an activation function, where $D_{\mathrm{CM}}$ is randomly chosen from the range $[2, 5]$. Each of the convolutional layers consists of $D_K$ kernels, where $D_K$ is randomly chosen within $[16, 64]$. After a subsequent two-dimensional max-pooling, the features are flattened, and a fully connected layer is employed with an output dimension of $K$.
Depending on the randomly drawn parameters, the number of DNN parameters is at least as large as the number of GMM parameters, and increases with the number of pilots. Moreover, note that a separate DNN per pilot configuration is needed. The complexity of the DNN approach is $\mathcal{O}(K D_K N_{\mathrm{rx}} n_p)$. Further details can be found in [8]. As can be seen, estimating the channel via the GMM estimator gives the best performance among the conventional channel estimation-based approaches. The DNN approach, which, similar to our proposed GMM-based feedback scheme, does not require any codebook knowledge, achieves a performance comparable to "Lloyd, $\hat{h}_{\mathrm{OMP}}$" or "Lloyd, $\hat{h}_{\mathrm{LMMSE}}$". In contrast, with the proposed approach (cf. Section III-B) denoted by "GMM, $y$", where we bypass channel estimation and directly evaluate (14) for determining a feedback index, we achieve better performance than any of the conventional approaches. With the curves "Lloyd, $h$" and "GMM, $h$", we depict the results for the utopian case of assuming perfect CSI knowledge at the MT. Although the Lloyd approach performs well if perfect CSI is available at the MT, the performance suffers significantly from CSI imperfections (due to noise and low pilot overhead). In contrast, the proposed GMM-based feedback scheme is superior when only imperfect CSI is available at the MT, which resembles practical system deployments. A similar observation can also be made in Fig. 3(b), where the SNR is 15 dB and we only have $n_p = 4$ pilots.
In Fig. 4(a), we set SNR = 0 dB and in Fig. 4(b), we have SNR = 5 dB, where we fix $s = 0.8$, i.e., we consider $P(\mathrm{nSE} > 0.8)$ for a varying number of pilots $n_p$. We see that our proposed low-complexity feedback scheme is beneficial in the regime of few pilots and outperforms both the conventional approaches, which require channel estimation followed by the feedback evaluation via (9), and the DNN-based approach.

B. Multi-user MIMO
In this subsection, we present simulation results for the multi-user setup and depict the sum-rate as the performance measure, which is given by $\sum_{j=1}^{J} R_j^{\mathrm{inst}}$, cf. (2). We depict the results for 2,500 constellations, where for each constellation, we draw $J$ MTs randomly from our evaluation set $\mathcal{H}^{\mathrm{DL}}_{\mathrm{eval}}$. The empirical cCDF $P(\mathrm{SR} > s)$ of the sum-rate is used to depict the empirical probability that the sum-rate (SR) exceeds a specific value $s$. In the following, with "GMM, $h$" and "GMM, $y$" we denote the cases where either perfect CSI $h_j$ is assumed or the observations $y_j$ are used at each MT $j$ to determine a feedback index using the GMM feedback encoding approach, cf. (21). We omit the index $j$ in the legend for notational convenience. The channel of each MT is then represented by the subspace information extracted from the high-SNR GMM codebook, cf. Section IV-B. With "Lloyd, $h$", "Lloyd, $\hat{h}_{\mathrm{GMM}}$", "Lloyd, $\hat{h}_{\mathrm{OMP}}$", and "Lloyd, $\hat{h}_{\mathrm{LMMSE}}$", or with "Random, $h$", "Random, $\hat{h}_{\mathrm{GMM}}$", "Random, $\hat{h}_{\mathrm{OMP}}$", and "Random, $\hat{h}_{\mathrm{LMMSE}}$", we denote the cases where either perfect CSI is assumed at each MT, or the channel is first estimated at each MT and then the index of the best fitting subspace entry of the high-SNR Lloyd codebook or of the random codebook is fed back from each MT to the BS, cf. Sections IV-B and IV-A.
The above-mentioned approaches are evaluated using either RBD, RCI, or the iterative WMMSE to jointly design the precoders $M_j$, $\forall j \in \mathcal{J}$. In the case of RBD and RCI, a regularization factor is used and the precoders are normalized to satisfy the transmit power constraint, cf. [29]-[31]. In the case of the iterative WMMSE, we use [34, Algorithm 1]. Additionally, with "GMM samples, $h$" and "GMM samples, $y$", we denote the cases where we generate samples which represent each MT's distribution using the GMM and feed them to the SWMMSE algorithm, cf. Section IV-C. In all iterative approaches, we set $I_{\max} = 300$ iterations.
In the following, the simulation parameters are $N_{\mathrm{tx}} = 16$ ($N_{\mathrm{tx,h}} = 4$, $N_{\mathrm{tx,v}} = 4$), $N_{\mathrm{rx}} = 4$, and $B = 6$ bits, thus $K = 64$ ($K_{\mathrm{tx}} = 16$, $K_{\mathrm{rx}} = 4$). Accordingly, we have $J = 4$ MTs. In Fig. 5(a), the SNR is 5 dB and we have $n_p = 8$ pilots. In this case, we use RBD in order to jointly design the precoders. We can observe that the random codebook performs worst. Even with perfect CSI assumed at the MTs, the random codebook approach cannot compete with the environment-aware approaches. Among the conventional approaches, the Lloyd directional codebook approach yields the best performance if the GMM-based channel estimator from (24) is used prior to codebook entry selection. Similar to the point-to-point MIMO case, we can observe that the chosen channel estimator significantly impacts the performance. That is, using the LMMSE estimator from (26) or the genie OMP from (27) yields worse results than the GMM-based channel estimator. Furthermore, our proposed GMM-based feedback approach ("GMM, $y$") even outperforms the best conventional approach ("Lloyd, $\hat{h}_{\mathrm{GMM}}$"). A similar behavior can be observed in Fig. 5(b), where we increased the SNR to 10 dB, decreased the number of pilots to $n_p = 4$, and use RCI in order to jointly design the precoders.
In the following two figures (Fig. 6 and Fig. 7), we restrict our analysis to RBD as the precoder design algorithm for the sake of brevity. The purpose of these two figures is to quantify the performance gains obtained with our proposed approach from different perspectives. In particular, in Fig. 6 we have SNR = 10 dB and $n_p = 4$, and we consider systems with $B = 4$ bits, thus $K = 16$ ($K_{\mathrm{tx}} = 8$, $K_{\mathrm{rx}} = 2$), or $B = 8$ bits, thus $K = 256$ ($K_{\mathrm{tx}} = 32$, $K_{\mathrm{rx}} = 8$), and compare the performance of our proposed GMM-based feedback approach ("GMM, $y$, $B \in \{4, 8\}$") to the best performing conventional Lloyd directional codebook ("Lloyd, $\hat{h}_{\mathrm{GMM}}$, $B \in \{4, 8\}$") and random codebook ("Random, $\hat{h}_{\mathrm{GMM}}$, $B \in \{4, 8\}$") approaches, which use the GMM-based channel estimator in the channel estimation phase. We can observe that our proposed feedback approach is superior to the conventional methods. In particular, our proposed feedback approach with only $B = 4$ bits ("GMM, $y$, $B = 4$") even outperforms the conventional Lloyd directional codebook with twice as many bits, i.e., $B = 8$ ("Lloyd, $\hat{h}_{\mathrm{GMM}}$, $B = 8$").
In Fig. 7, we consider a system with more transmit antennas and, accordingly, more MTs. The simulation parameters are $N_{\mathrm{tx}} = 64$ ($N_{\mathrm{tx,h}} = 8$, $N_{\mathrm{tx,v}} = 8$), $N_{\mathrm{rx}} = 4$, and $B = 6$ bits, thus $K = 64$ ($K_{\mathrm{tx}} = 16$, $K_{\mathrm{rx}} = 4$), and the SNR is 5 dB. Accordingly, we have $J = 16$ MTs. This time, we depict the performance for a varying number of pilots $n_p$. For a fixed number of pilots $n_p$, our proposed approach ("GMM, $y$, $n_p \in \{2, 6, 12\}$") outperforms the conventional approaches ("Lloyd, $\hat{h}_{\mathrm{GMM}}$, $n_p \in \{2, 6, 12\}$" and "Lloyd, $\hat{h}_{\mathrm{OMP}}$, $n_p = 12$"). Moreover, with only $n_p = 2$ pilots, our proposed feedback approach ("GMM, $y$, $n_p = 2$") almost achieves the same performance as the conventional approaches, which require $n_p = 6$ pilots in the case of "Lloyd, $\hat{h}_{\mathrm{GMM}}$, $n_p = 6$", or even $n_p = 12$ in the case of "Lloyd, $\hat{h}_{\mathrm{OMP}}$, $n_p = 12$" (i.e., the MTs are unaware of the GMM and use the OMP channel estimator).
If random codebooks are used, the performance is poor even with a large pilot overhead, i.e., $n_p = 64$ pilots ("Random, $\hat{h}_{\mathrm{GMM}}$, $n_p = 64$"). Thus, with our proposed approach, systems with lower pilot overhead can be deployed, which would inherently increase the system throughput. Additionally, with fewer pilots, the complexity of determining the feedback index at the MTs with the proposed GMM-based approach decreases, cf. Section VII. So far, we have only considered non-iterative precoding algorithms, i.e., RBD and RCI. In the remainder, we focus our analysis on the iterative WMMSE and the SWMMSE precoding techniques. With random codebooks, the Lloyd directional codebook from Section IV-B, and the GMM directional codebook from Section IV-B, only channel directional information is used, i.e., no channel magnitude information is fed back to the BS, as in [5] (cf. Section IV-A). As a consequence, changing the number of streams $d$ impacts the overall sum-rate which can be achieved using the iterative WMMSE. In fact, we observed that depending on the SNR, the number of pilots $n_p$, the chosen channel estimator (for the conventional approaches), and the selected codebook, the performance can be improved by varying $d \in \{1, 2, \dots, N_{\mathrm{rx}}\}$ and then setting $d$ to the value which gives the best average performance. In a practical deployment, the parameter $d$ can be pre-adjusted in the offline phase at the BS by emulating a DL system (for example, using the set $\mathcal{H}^{\mathrm{UL}}_{\mathrm{eval}}$). However, the situation is different with the SWMMSE. There, we observed that setting $d = N_{\mathrm{rx}}$ always yields the best performance. Intuitively, due to the sampling involved in the design procedure of the SWMMSE, (average) channel magnitude information is provided to and exploited by the SWMMSE algorithm, which enables the SWMMSE to adjust the stream powers accordingly.
In the remainder, the simulation parameters are again $N_{\mathrm{tx}} = 16$ ($N_{\mathrm{tx,h}} = 4$, $N_{\mathrm{tx,v}} = 4$), $N_{\mathrm{rx}} = 4$, yielding $J = 4$ MTs, and $B = 6$ bits, thus $K = 64$ ($K_{\mathrm{tx}} = 16$, $K_{\mathrm{rx}} = 4$). In Fig. 8, the SNR is 5 dB and we have $n_p = 8$ pilots. This is the same simulation setting as in Fig. 5(a), where RBD was used in order to design the precoders. By comparing Fig. 8 and Fig. 5(a), we can conclude that by using the iterative precoding techniques, the performance of both the conventional and the proposed feedback approaches improves considerably. We can observe that also in the case of iterative precoding techniques, the random codebook approach performs worst. Also in this case, among the conventional approaches, the Lloyd directional codebook approach yields the best performance if the GMM-based channel estimator from (24) is used prior to codebook entry selection, whereas using the LMMSE estimator from (26) or the genie OMP from (27) deteriorates the performance. Furthermore, our proposed low-complexity GMM-based feedback approach ("GMM, $y$") again outperforms the best conventional approach ("Lloyd, $\hat{h}_{\mathrm{GMM}}$"). With our generative modeling-based approach from Section IV-C, i.e., the SWMMSE with samples generated by the GMM, denoted by "GMM samples, $h$" or "GMM samples, $y$", we even outperform the performance bound of the Lloyd directional codebook approach (which uses the iterative WMMSE) with perfect CSI assumed at the MTs ("Lloyd, $h$"). This shows the great potential of the generative modeling ability of the GMM.
In Fig. 9, we still consider a setting with $n_p = 8$ pilots but vary the SNR. We depict the sum-rate averaged over all constellations. We can see that our proposed GMM-based feedback approach, either exploiting directional information ("GMM, $y$") or using the generative modeling-based approach ("GMM samples, $y$"), outperforms the conventional approaches. We can observe that in this case, up to an SNR of approximately 15 dB, the generative modeling-based method performs better, and for larger SNR values, the directional approach is superior. This is illustrated by the arrows in Fig. 9. Thus, the results suggest that jointly designing the precoders by solving the ergodic sum-rate maximization problem from (22) with the SWMMSE is beneficial for low to medium SNR values, and the directional approach, which exploits the iterative WMMSE, is superior for larger SNR values.
This observation is also supported by the results in Fig. 10, where we depict the performance of our proposed approaches "GMM, $y$" and "GMM samples, $y$" and compare them to the best performing conventional method "Lloyd, $\hat{h}_{\mathrm{GMM}}$" and to the random codebook-based approach which uses the OMP estimator "Random, $\hat{h}_{\mathrm{OMP}}$" (i.e., no environment awareness) for a varying SNR and $n_p \in \{2, 8, 16\}$. There, dotted curves represent $n_p = 2$, dashed curves $n_p = 8$, and solid curves $n_p = 16$ pilots. The approach with no environment awareness, i.e., "Random, $\hat{h}_{\mathrm{OMP}}$", performs worst. Both of our proposed approaches, i.e., "GMM, $y$" and "GMM samples, $y$", with only $n_p = 2$ pilots, outperform the conventional "Lloyd, $\hat{h}_{\mathrm{GMM}}$" approach with four times as many pilots, i.e., $n_p = 8$. When the number of pilots is equal to the number of transmit antennas (large pilot overhead), i.e., $n_p = N_{\mathrm{tx}} = 16$, our proposed approaches, which solely require the GMM at the MTs, can at least compete with the conventional "Lloyd, $\hat{h}_{\mathrm{GMM}}$" approach, which requires both the GMM and the Lloyd directional codebook at each MT. For SNRs up to about 15 dB, our proposed generative modeling-based approach even outperforms the conventional method based on the Lloyd directional codebook.
Finally, in Fig. 11, we present how the sum-rate evolves over the number of iterations in the case of the iterative WMMSE (solid curves), or over the number of drawn samples (per MT) in the case of the SWMMSE (dashed curves), for a setting with an SNR of 5 dB and $n_p = 8$ pilots. For comparison, we depict the performance of the case where we applied RBD (dotted curves) in order to jointly design the precoders. Note that since RBD is a non-iterative approach, the respective curves are constant over the iterations. We can observe that in the case of random codebooks ("Random, $\hat{h}_{\mathrm{GMM}}$"), the performance gains achieved by using the iterative WMMSE, compared to using RBD, are relatively small. In contrast, using the Lloyd ("Lloyd, $\hat{h}_{\mathrm{GMM}}$") or GMM ("GMM, $y$") directional codebook approaches, we obtain large performance gains when we use the iterative precoding techniques. In these cases, a small number of iterations is already enough to reach the performance maximum. Interestingly, a small overshoot can be observed. This artifact is possibly due to the fact that the iterative WMMSE is designed for perfect CSI, but here we are restricted to using directional information due to the limited feedback. In contrast, when we use the generative modeling-based approach ("GMM samples, $y$"), we can observe that the performance steadily improves over the number of drawn samples. We observed this behavior consistently for different SNR values and numbers of pilots.

IX. CONCLUSION
In this work, we have investigated a novel GMM-based feedback scheme for FDD systems. In particular, we proposed to use a GMM for codebook construction, feedback encoding, and as a generator, which provides statistical information about the channels of the MTs to the BS. The proposed scheme exhibits lower computational complexity than state-of-the-art approaches, and even allows for parallelization at the MTs. Moreover, the proposed scheme stands out through its versatility. That is, given the feedback information of the MTs at the BS, it is flexible in deciding between the single-user and the multi-user transmission mode. The versatility is even more pronounced through a convenient adaptation at the MTs to any desired SNR and pilot configuration without retraining the GMM. This is a major advantage over existing end-to-end DNN approaches, which do not provide this versatility. Numerical results have demonstrated that the proposed feedback scheme outperforms conventional methods, especially in configurations with reduced pilot overhead. The achieved performance gains of the proposed scheme can be leveraged to deploy systems with lower pilot overhead or even fewer feedback bits as compared to state-of-the-art methods.

Fig. 1: Flowchart of the proposed versatile feedback scheme. Red (blue) colored nodes are processed at the BS (MTs).

Fig. 2: Empirical complementary cumulative distribution functions (cCDFs) of the normalized (by the optimal transmit strategy) spectral efficiencies achieved with different codebooks and transmit strategies evaluated with perfect CSI, for a system with Ntx = 32, Nrx = 16, and B = 6 bits.

Fig. 3: Empirical cCDFs of the normalized (by the optimal transmit strategy) spectral efficiencies achieved with different codebooks and transmit strategies evaluated for a system with Ntx = 32, Nrx = 16, different SNRs, different numbers of pilots np, and B = 6 bits.

TABLE I: Analysis of the number of parameters of the (structured) GMM.
Fig. 8: Empirical cCDFs of the sum-rate (Ntx = 16, Nrx = 4, and J = 4 MTs) achieved with different feedback approaches when the iterative WMMSE or the SWMMSE are employed, for a system with an SNR = 5 dB, np = 8 pilots, and B = 6 bits.