
Diffusion-Based Generative Prior for Low-Complexity MIMO Channel Estimation


Abstract:

This letter proposes a novel channel estimator based on diffusion models (DMs), one of the currently top-rated generative models, with provable convergence to the mean square error (MSE)-optimal estimator. A lightweight convolutional neural network (CNN) with positional embedding of the signal-to-noise ratio (SNR) information is designed to learn the channel distribution in the sparse angular domain. Combined with an estimation strategy that avoids stochastic resampling and truncates the reverse diffusion steps that account for a lower SNR than that of the given pilot observation, the resulting DM estimator combines low computational complexity with low memory overhead. Numerical results show better performance than state-of-the-art estimators.
Published in: IEEE Wireless Communications Letters ( Volume: 13, Issue: 12, December 2024)
Page(s): 3493 - 3497
Date of Publication: 04 October 2024


CC BY: IEEE is not the copyright holder of this material. This article is licensed under the Creative Commons Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).
SECTION I.

Introduction

Generative models have shown great success in learning complex data distributions, and this prior information has subsequently been leveraged for wireless communication applications. This success matters because it allows inferring the unknown and generally complex channel distribution of, e.g., a base station (BS) environment from a representative dataset. Consequently, advanced channel estimation methodologies have been developed that primarily rely on state-of-the-art generative models such as Gaussian mixture models (GMMs) [1], mixtures of factor analyzers (MFAs) [2], generative adversarial networks (GANs) [3], or variational autoencoders (VAEs) [4].

Recently, DMs [5] and score-based models [6] have been identified as among the most powerful generative models. The two are closely related: both learn the data distribution by corrupting clean samples with additive (Gaussian) noise and learning the reverse process to generate new samples from pure noise. Their advantages over alternative generative models are their great training stability, their powerful generative ability (e.g., outperforming GANs [7]), and their natural modeling of a full SNR range, which allows for strong generalization. However, the huge computational overhead associated with these models, i.e., a large number of neural network (NN) forward passes together with resampling after each step in the reverse process, makes their direct use in real-time applications such as channel estimation non-trivial.

Nevertheless, DMs have been used in wireless communications, e.g., for channel coding [8] and semantic communications [9]. The work in [10] proposed to utilize a score-based model to perform channel estimation through posterior sampling. However, the approach has several disadvantages that hinder its use in practical applications, e.g., a number of network parameters on the order of millions, a huge number of reverse steps on the order of thousands that have to be evaluated online, and the stochasticity of the estimate.

Recently, a deterministic denoising strategy was proposed in [11] that utilizes a DM in which the observation's SNR level is matched with the corresponding intermediate DM step, drastically reducing the number of reverse steps without requiring resampling. Moreover, this denoising strategy was shown to be asymptotically MSE-optimal as the number of DM steps grows large. For practical distributions, it was shown that a moderate number of DM steps already suffices to achieve a strong denoising performance close to the utopian bound of the conditional mean estimator (CME). Despite these advantages, the deployed NN, which is based on a sophisticated architecture with parameters on the order of millions, might still be impractical for channel estimation. However, multiple-input multiple-output (MIMO) channels exhibit structural properties, e.g., sparsity in the angular/beamspace domain, particularly at the high frequencies of the mmWave or THz bands used in ultra-massive MIMO systems [12], which can be leveraged to design lightweight NNs of reduced size.

In this work, we provide the following contributions:

  • We propose a novel channel estimation algorithm that uses a DM, one of the currently top-rated generative models, as generative prior.

  • To achieve low complexity and memory overhead together with high robustness in the inference, we design a DM with a lightweight CNN by learning the channel distribution in the sparse angular domain.

  • We show the asymptotic convergence of the DM-based channel estimator to the MSE-optimal CME.

  • We evaluate the proposed DM-based estimator on different channel models, showing its versatile applicability. Furthermore, even though the presented DM has drastically fewer parameters and a drastically lower online computational complexity than state-of-the-art estimators, it exhibits better estimation performance.

SECTION II.

Preliminaries

A. MIMO System Model

Consider an $N_{\mathrm{tx}}$-antenna mobile terminal (MT) sending $N_{\mathrm{p}}$ pilots to the $N_{\mathrm{rx}}$-antenna BS, yielding
\begin{equation*} \boldsymbol{Y} = \boldsymbol{H}\boldsymbol{P} + \boldsymbol{N} \in \mathbb{C}^{N_{\mathrm{rx}} \times N_{\mathrm{p}}} \tag{1}\end{equation*}
where $\boldsymbol{H} \in \mathbb{C}^{N_{\mathrm{rx}} \times N_{\mathrm{tx}}}$ is the wireless channel following an unknown distribution, $\boldsymbol{P} \in \mathbb{C}^{N_{\mathrm{tx}} \times N_{\mathrm{p}}}$ is the unitary pilot matrix, i.e., $\boldsymbol{P}\boldsymbol{P}^{\mathrm{H}} = \mathbf{I}$, and $\boldsymbol{N}$ is additive white Gaussian noise (AWGN) whose $N_{\mathrm{p}}$ columns are distributed as $\boldsymbol{n}_i \sim \mathcal{N}_{\mathbb{C}}(\boldsymbol{0}, \eta^2\mathbf{I})$. We consider $N_{\mathrm{p}} = N_{\mathrm{tx}}$, i.e., full pilot observations. We interchangeably use vectorized expressions $\boldsymbol{h} = \mathrm{vec}(\boldsymbol{H}) \in \mathbb{C}^{N}$ with $N = N_{\mathrm{rx}}N_{\mathrm{tx}}$. We assume $\mathbb{E}[\boldsymbol{h}] = \boldsymbol{0}$ and $\mathbb{E}[\|\boldsymbol{h}\|_2^2] = N$, which allows us to define $\mathrm{SNR}(\boldsymbol{Y}) = 1/\eta^2$. Although we discuss a MIMO system, the proposed DM-based estimator can be deployed in other system configurations as well.
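
The system model (1) can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not the letter's simulation code: it uses a unitary DFT matrix as pilot matrix (the letter only requires $\boldsymbol{P}\boldsymbol{P}^{\mathrm{H}} = \mathbf{I}$) and a placeholder channel with i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$ entries so that $\mathbb{E}[\|\boldsymbol{h}\|_2^2] = N$. The helper names `crandn`, `dft_pilots`, and `observe` are hypothetical.

```python
import numpy as np

def crandn(shape, rng):
    """i.i.d. CN(0, 1) samples: real and imaginary parts each with variance 1/2."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def dft_pilots(n_tx):
    """Unitary DFT pilot matrix (assumption); P P^H = I since N_p = N_tx makes P square."""
    k = np.arange(n_tx)
    return np.exp(-2j * np.pi * np.outer(k, k) / n_tx) / np.sqrt(n_tx)

def observe(H, P, snr_db, rng):
    """Simulate Y = H P + N from (1); with E||h||^2 = N, SNR(Y) = 1 / eta^2."""
    eta2 = 10.0 ** (-snr_db / 10.0)
    noise = np.sqrt(eta2) * crandn((H.shape[0], P.shape[1]), rng)
    return H @ P + noise, eta2

rng = np.random.default_rng(0)
n_rx, n_tx = 64, 16
P = dft_pilots(n_tx)                   # N_p = N_tx: full pilot observations
H = crandn((n_rx, n_tx), rng)          # placeholder channel with E||h||^2 = N, not a realistic model
Y, eta2 = observe(H, P, snr_db=10.0, rng=rng)
```

Any other unitary pilot matrix would serve equally well here; the DFT choice only keeps the example deterministic and self-contained.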

B. Diffusion Model as Generative Prior

We briefly review the formulation of DMs from [5]. Given a data sample $\boldsymbol{h} := \boldsymbol{h}_0 \sim p(\boldsymbol{h}_0)$, the DM's forward process produces latent variables $\boldsymbol{h}_1$ through $\boldsymbol{h}_L$ by adding Gaussian noise at step $\ell$ with the hyperparameters $\alpha_\ell \in (0, 1)$ such that
\begin{equation*} \boldsymbol{h}_\ell = \sqrt{\alpha_\ell}\,\boldsymbol{h}_{\ell-1} + \sqrt{1-\alpha_\ell}\,\boldsymbol{\varepsilon}_{\ell-1} = \sqrt{\bar{\alpha}_\ell}\,\boldsymbol{h}_0 + \sqrt{1-\bar{\alpha}_\ell}\,\boldsymbol{\varepsilon}_0 \tag{2}\end{equation*}
with $\boldsymbol{\varepsilon}_i \sim \mathcal{N}_{\mathbb{C}}(\boldsymbol{0}, \mathbf{I})$ and $\bar{\alpha}_\ell = \prod_{i=1}^{\ell}\alpha_i$. Like the forward process, the reverse process is a Markov chain, but with parameterized Gaussian transitions
\begin{equation*} p_{\boldsymbol{\theta}}(\boldsymbol{h}_{\ell-1} \mid \boldsymbol{h}_\ell) = \mathcal{N}_{\mathbb{C}}(\boldsymbol{h}_{\ell-1}; \boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{h}_\ell, \ell), \sigma_\ell^2\mathbf{I}), \tag{3}\end{equation*}
motivated by the fact that, for sufficiently small noise increments, the forward and reverse processes of a Markov chain have the same functional form [13].
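
Both sides of (2) can be checked numerically. The sketch below assumes a hypothetical linear schedule $1-\alpha_\ell \in [10^{-4}, 0.2]$ with $L = 100$ (not the schedule of [11]) and unit-variance complex Gaussian data as a stand-in for $\boldsymbol{h}_0$; it shows that the stepwise recursion and the closed form both keep unit variance (variance preservation) and both retain the correlation $\sqrt{\bar{\alpha}_\ell}$ with $\boldsymbol{h}_0$. Function names are illustrative.

```python
import numpy as np

def crandn(shape, rng):
    """Standard circularly-symmetric complex Gaussian samples, CN(0, I)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def forward_closed_form(h0, alpha_bar_l, rng):
    """Right-hand side of (2): sqrt(alpha_bar_l) h_0 + sqrt(1 - alpha_bar_l) eps_0."""
    return np.sqrt(alpha_bar_l) * h0 + np.sqrt(1 - alpha_bar_l) * crandn(h0.shape, rng)

def forward_stepwise(h0, alphas, rng):
    """Left-hand side of (2), applied recursively for l = 1, ..., len(alphas)."""
    h = h0
    for a in alphas:
        h = np.sqrt(a) * h + np.sqrt(1 - a) * crandn(h.shape, rng)
    return h

rng = np.random.default_rng(1)
alphas = 1 - np.linspace(1e-4, 0.2, 100)   # hypothetical schedule, not the one of [11]
alpha_bar = np.cumprod(alphas)             # alpha_bar_l = prod_{i <= l} alpha_i
h0 = crandn((20000,), rng)                 # unit-variance stand-in for the data
h_step = forward_stepwise(h0, alphas[:30], rng)
h_closed = forward_closed_form(h0, alpha_bar[29], rng)
```

The two results agree in distribution, not sample by sample, because the noise $\boldsymbol{\varepsilon}_0$ on the right-hand side of (2) aggregates all per-step noises.
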
The true reverse transitions $q(\boldsymbol{h}_{\ell-1} \mid \boldsymbol{h}_\ell)$ are generally intractable to compute; thus, the transitions in (3) are learned via an NN through the variational inference principle by utilizing the forward posteriors, which are tractable when conditioned on $\boldsymbol{h}_0$ [5], i.e.,
\begin{equation*} q(\boldsymbol{h}_{\ell-1} \mid \boldsymbol{h}_\ell, \boldsymbol{h}_0) = \mathcal{N}_{\mathbb{C}}(\boldsymbol{h}_{\ell-1}; \tilde{\boldsymbol{\mu}}(\boldsymbol{h}_\ell, \boldsymbol{h}_0), \sigma_\ell^2\mathbf{I}), \tag{4}\end{equation*}
where the conditional variance $\sigma_\ell^2 = \frac{(1-\alpha_\ell)(1-\bar{\alpha}_{\ell-1})}{1-\bar{\alpha}_\ell}$ (with $\bar{\alpha}_0 := 1$) is a constant that does not depend on the data. Thus, an NN is trained to parameterize the conditional mean $\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{h}_\ell, \ell)$ from (3), approximating $\tilde{\boldsymbol{\mu}}(\boldsymbol{h}_\ell, \boldsymbol{h}_0)$ in (4) for a given latent $\boldsymbol{h}_\ell$ at its input. We denote this NN function as $f_{\boldsymbol{\theta},\ell}^{(L)}(\boldsymbol{h}_\ell) := \boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{h}_\ell, \ell)$. The DM is trained by maximizing the evidence lower bound (ELBO) on the log-likelihood $\log p(\boldsymbol{h}_0)$; cf. [5] for a detailed derivation.
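
The letter does not restate the posterior mean $\tilde{\boldsymbol{\mu}}$ in (4). For this Gaussian forward process, standard Gaussian conditioning gives the well-known closed form $\tilde{\boldsymbol{\mu}}(\boldsymbol{h}_\ell, \boldsymbol{h}_0) = \frac{\sqrt{\bar{\alpha}_{\ell-1}}(1-\alpha_\ell)}{1-\bar{\alpha}_\ell}\boldsymbol{h}_0 + \frac{\sqrt{\alpha_\ell}(1-\bar{\alpha}_{\ell-1})}{1-\bar{\alpha}_\ell}\boldsymbol{h}_\ell$. The sketch below computes these coefficients together with $\sigma_\ell^2$; the schedule is again a hypothetical placeholder, and `posterior_params` is an illustrative name.

```python
import numpy as np

def posterior_params(alphas, l):
    """Coefficients of q(h_{l-1} | h_l, h_0) in (4) for the 1-based step l.

    Returns (c0, cl, var) with mu_tilde = c0 * h_0 + cl * h_l and var = sigma_l^2,
    using the convention alpha_bar_0 = 1.
    """
    alpha_bar = np.concatenate(([1.0], np.cumprod(alphas)))  # index 0 holds alpha_bar_0 = 1
    a_l, ab_l, ab_prev = alphas[l - 1], alpha_bar[l], alpha_bar[l - 1]
    c0 = np.sqrt(ab_prev) * (1 - a_l) / (1 - ab_l)
    cl = np.sqrt(a_l) * (1 - ab_prev) / (1 - ab_l)
    var = (1 - a_l) * (1 - ab_prev) / (1 - ab_l)   # sigma_l^2 as stated below (4)
    return c0, cl, var

alphas = 1 - np.linspace(1e-4, 0.2, 100)           # hypothetical schedule
c0, cl, var = posterior_params(alphas, l=1)
# At l = 1 the posterior collapses onto h_0: c0 = 1, cl = 0, var = 0.
```

A quick sanity check is that, at $\ell = 1$, the posterior is a point mass at $\boldsymbol{h}_0$, which is exactly what the coefficients return.
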
The diffusion steps are interpreted as different SNR steps by defining the DM's SNR of step $\ell$ as
\begin{equation*} \mathrm{SNR}_{\mathrm{DM}}(\ell) = \frac{\mathbb{E}[\|\sqrt{\bar{\alpha}_\ell}\,\boldsymbol{h}_0\|_2^2]}{\mathbb{E}[\|\sqrt{1-\bar{\alpha}_\ell}\,\boldsymbol{\varepsilon}_0\|_2^2]} = \frac{\bar{\alpha}_\ell}{1-\bar{\alpha}_\ell}, \tag{5}\end{equation*}
cf. (2), where the last equality uses $\mathbb{E}[\|\boldsymbol{h}_0\|_2^2] = \mathbb{E}[\|\boldsymbol{\varepsilon}_0\|_2^2] = N$. Since $\bar{\alpha}_\ell$ decreases with $\ell$, $\mathrm{SNR}_{\mathrm{DM}}(\ell)$ monotonically decreases for increasing $\ell$.
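
Equation (5) turns the noise schedule into an SNR grid. The sketch below evaluates it for a hypothetical linear schedule with $L = 100$ (the actual values of [11, Table I] are not reproduced here) and shows the monotonic decrease over $\ell$.

```python
import numpy as np

def snr_dm(alpha_bar):
    """DM SNR per step from (5): alpha_bar_l / (1 - alpha_bar_l)."""
    return alpha_bar / (1 - alpha_bar)

L = 100
alphas = 1 - np.linspace(1e-4, 0.2, L)   # hypothetical linear schedule (not the values of [11])
alpha_bar = np.cumprod(alphas)
snr_db = 10 * np.log10(snr_dm(alpha_bar))
print(f"SNR_DM(1) = {snr_db[0]:.1f} dB, SNR_DM(L) = {snr_db[-1]:.1f} dB")
```

With these placeholder values, the first step corresponds to roughly 40 dB, and the grid spans well below 0 dB at $\ell = L$, so a typical range of pilot SNRs falls inside it.
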

SECTION III.

Channel Estimation

The MSE-optimal channel estimator is the CME $\mathbb{E}[\boldsymbol{h} \mid \boldsymbol{y}]$, cf. [14, Appendix A.3]. The CME is generally intractable to compute if the prior distribution $p(\boldsymbol{h})$ is unknown, which motivates using the DM as a generative prior.

A. Diffusion-Based Channel Estimator

Wireless channels have unique structural properties that differ from those of natural signals in other domains, e.g., images. A well-known property is that the channel can be transformed into its angular/beamspace domain representation via a Fourier transform [14, Sec. 7.3.3]. In massive MIMO, the angular domain representation is sparse or highly compressible, especially if the number of multipath propagation clusters and the angular spread are low, which is the case, e.g., in mmWave or THz bands [12]. Consequently, learning the channel distribution in the sparse angular domain can reasonably be expected to enable the DM to require fewer parameters, which, in turn, makes training and inference faster and more stable. However, we note that we do not make any specific assumptions about the sparsity level or the structure of the channel distribution, making the proposed approach viable for any propagation scenario.
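
A minimal illustration of angular-domain sparsity, assuming a single-path channel $\boldsymbol{H} = \boldsymbol{a}_{\mathrm{rx}}\boldsymbol{a}_{\mathrm{tx}}^{\mathrm{H}}$ whose spatial frequencies lie exactly on the DFT grid of a uniform linear array (ULA). This is a deliberately idealized toy, not a 3GPP or QuaDRiGa channel: with off-grid angles or several clusters, energy leaks into neighboring bins, which is why the text speaks of "sparse or highly compressible". The unitary 2D FFT is realized with NumPy's `norm="ortho"` option, and `ula_response` is a hypothetical helper.

```python
import numpy as np

def ula_response(n, k):
    """ULA steering vector whose spatial frequency lies on DFT bin k (simplifying assumption)."""
    return np.exp(2j * np.pi * k * np.arange(n) / n)

n_rx, n_tx = 64, 16
# Single-path channel with on-grid angles (illustrative only)
H = np.outer(ula_response(n_rx, 5), ula_response(n_tx, 3).conj())
H_ang = np.fft.fft2(H, norm="ortho")          # unitary 2D FFT into the angular domain
support = np.argwhere(np.abs(H_ang) > 1e-9)
print(support)  # a single nonzero entry at (5, (-3) mod 16) = (5, 13)
```

All $N = 1024$ entries of $\boldsymbol{H}$ are nonzero, whereas its angular representation has a single nonzero entry carrying the full energy, since the unitary FFT preserves the Frobenius norm.
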

Therefore, we transform the channels of a given training dataset $\mathcal{H} = \{\boldsymbol{H}_m\}_{m=1}^{M_{\mathrm{train}}}$ with $M_{\mathrm{train}}$ training samples into the angular domain as $\tilde{\mathcal{H}} = \{\tilde{\boldsymbol{H}}_m\}_{m=1}^{M_{\mathrm{train}}}$, where $\tilde{\boldsymbol{H}}_m = \mathrm{fft}(\boldsymbol{H}_m)$ denotes the two-dimensional fast Fourier transform (FFT). Afterward, the DM is trained offline in the usual way by maximizing the ELBO, cf. [5].

For the online channel estimation, we adopt the reverse process from [11] with several changes. First, the pilots are decorrelated by computing the least squares (LS) solution
\begin{equation*} \hat{\boldsymbol{H}}_{\mathrm{LS}} = \boldsymbol{Y}\boldsymbol{P}^{\mathrm{H}} = \boldsymbol{H} + \tilde{\boldsymbol{N}} \tag{6}\end{equation*}
where $\tilde{\boldsymbol{N}} = \boldsymbol{N}\boldsymbol{P}^{\mathrm{H}}$ is AWGN with variance $\eta^2$ because $\boldsymbol{P}$ is unitary. Assuming knowledge of the observation's SNR, the LS estimate is normalized as $\hat{\boldsymbol{H}}_{\mathrm{init}} = (1+\eta^2)^{-1/2}\hat{\boldsymbol{H}}_{\mathrm{LS}}$ since the deployed DM is variance-preserving, cf. (2): with $\mathbb{E}[\|\boldsymbol{h}\|_2^2] = N$, the entries of $\hat{\boldsymbol{H}}_{\mathrm{LS}}$ have an average variance of $1+\eta^2$, and the normalized observation takes the form of (2) with $\bar{\alpha} = 1/(1+\eta^2)$, i.e., with an SNR of $\bar{\alpha}/(1-\bar{\alpha}) = 1/\eta^2$. Afterward, the observation is transformed into the angular domain via $\hat{\boldsymbol{H}}_{\mathrm{ang}} = \mathrm{fft}(\hat{\boldsymbol{H}}_{\mathrm{init}})$. Note that the noise distribution is unaltered by the unitary Fourier transform.
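
The preprocessing chain (LS solution (6), variance normalization, angular-domain transform) can be sketched as follows. Assumptions: unitary DFT pilots, a placeholder i.i.d. unit-variance channel, and a unitary 2D FFT via NumPy's `norm="ortho"`; the function name `preprocess` is hypothetical.

```python
import numpy as np

def crandn(shape, rng):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def preprocess(Y, P, eta2):
    """LS solution (6), variance normalization, and transform into the angular domain."""
    H_ls = Y @ P.conj().T                        # (6): H + N P^H
    H_init = H_ls / np.sqrt(1 + eta2)            # match the variance-preserving DM
    H_ang = np.fft.fft2(H_init, norm="ortho")    # unitary 2D FFT keeps the noise white
    return H_ls, H_ang

rng = np.random.default_rng(2)
n_rx, n_tx, eta2 = 64, 16, 0.5
k = np.arange(n_tx)
P = np.exp(-2j * np.pi * np.outer(k, k) / n_tx) / np.sqrt(n_tx)  # unitary DFT pilots (assumption)
H = crandn((n_rx, n_tx), rng)                    # placeholder unit-variance channel
Nz = np.sqrt(eta2) * crandn((n_rx, n_tx), rng)
Y = H @ P + Nz
H_ls, H_ang = preprocess(Y, P, eta2)
```

After normalization, the angular-domain observation has (empirically) unit average power per entry, matching the variance-preserving forward process (2).
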

Due to the same functional form of the DM's forward process (2) and the (decorrelated) noisy observation (6), we can interpret the LS solution (6) as an intermediate step in the DM. Utilizing the SNR description of the DM in (5), the DM step $\hat{\ell}$ that best matches the observation's SNR is found as
\begin{equation*} \hat{\ell} = \mathop{\arg\min}_{\ell} \left|\mathrm{SNR}(\boldsymbol{Y}) - \mathrm{SNR}_{\mathrm{DM}}(\ell)\right|. \tag{7}\end{equation*}
Subsequently, the DM's reverse process is initialized by setting $\hat{\boldsymbol{H}}_{\hat{\ell}} = \hat{\boldsymbol{H}}_{\mathrm{ang}}$. As a consequence, the higher the observation's SNR, the fewer DM reverse steps have to be performed and, in turn, the lower the latency of the channel estimation. This is in sharp contrast to the work in [10], where the inference process is initialized with i.i.d. Gaussian noise and a full reverse sampling process is employed, irrespective of the SNR.
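
The step selection (7) is a one-line search over the SNR grid from (5). The sketch below assumes the SNR difference is taken on a linear scale, as written in (7), and uses the same hypothetical schedule as before; `match_step` is an illustrative name returning the 1-based step $\hat{\ell}$, which equals the number of reverse steps to run.

```python
import numpy as np

def match_step(snr_obs, alpha_bar):
    """Eq. (7): 1-based DM step whose SNR_DM(l) is closest to the observation's SNR (linear scale)."""
    snr_dm = alpha_bar / (1 - alpha_bar)
    return int(np.argmin(np.abs(snr_obs - snr_dm))) + 1

alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.2, 100))  # hypothetical schedule, L = 100
for snr_db in (-10, 0, 10, 20, 30):
    print(snr_db, "dB ->", match_step(10 ** (snr_db / 10), alpha_bar), "reverse steps")
```

The printed step counts shrink as the SNR grows, which is the latency behavior described above.
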

After initializing the DM's SNR step $\hat{\ell}$, the stepwise conditional mean of (3) is iteratively evaluated from $\ell = \hat{\ell}$ down to $\ell = 1$, without drawing a stochastic sample from $p_{\boldsymbol{\theta}}(\boldsymbol{h}_{\ell-1} \mid \boldsymbol{h}_\ell)$, ultimately yielding an estimate of $\boldsymbol{H}_0$. This can be written as the composition of the NN functions
\begin{equation*} \hat{\boldsymbol{H}}_0 = f_{\boldsymbol{\theta},1}^{(L)}\left(f_{\boldsymbol{\theta},2}^{(L)}\left(\cdots f_{\boldsymbol{\theta},\hat{\ell}}^{(L)}(\hat{\boldsymbol{H}}_{\hat{\ell}})\cdots\right)\right) = f_{\boldsymbol{\theta},1:\hat{\ell}}^{(L)}(\hat{\boldsymbol{H}}_{\hat{\ell}}). \tag{8}\end{equation*}
Finally, the resulting estimate is transformed back into the spatial domain via the two-dimensional inverse FFT, yielding $\hat{\boldsymbol{H}} = \mathrm{ifft}(\hat{\boldsymbol{H}}_0)$. The complete offline training and online estimation procedures are summarized in Algorithm 1.

Algorithm 1 Channel Estimation via a DM as Generative Prior

Offline DM Training Phase

Require: Training dataset $\mathcal{H} = \{\boldsymbol{H}_m\}_{m=1}^{M_{\mathrm{train}}}$
1: Transform into angular domain $\tilde{\mathcal{H}} = \{\mathrm{fft}(\boldsymbol{H}_m)\}_{m=1}^{M_{\mathrm{train}}}$
2: Train DM $\{f_{\boldsymbol{\theta},\ell}^{(L)}\}_{\ell=1}^{L}$ with $\tilde{\mathcal{H}}$, cf. [5]

Online DM-based Channel Estimation Phase

Require: $\{f_{\boldsymbol{\theta},\ell}^{(L)}\}_{\ell=1}^{L}$, $\boldsymbol{Y}$, $\eta^2$, $\boldsymbol{P}$
3: Compute LS estimate $\hat{\boldsymbol{H}} \leftarrow \boldsymbol{Y}\boldsymbol{P}^{\mathrm{H}}$
4: Normalize observation's variance $\hat{\boldsymbol{H}} \leftarrow (1+\eta^2)^{-1/2}\hat{\boldsymbol{H}}$
5: Transform into angular domain $\hat{\boldsymbol{H}} \leftarrow \mathrm{fft}(\hat{\boldsymbol{H}})$
6: $\hat{\ell} \leftarrow \arg\min_{\ell} |\mathrm{SNR}(\boldsymbol{Y}) - \mathrm{SNR}_{\mathrm{DM}}(\ell)|$
7: Initialize DM reverse process $\hat{\boldsymbol{H}}_{\hat{\ell}} \leftarrow \hat{\boldsymbol{H}}$
8: for $\ell = \hat{\ell}$ down to $1$ do
9:     $\hat{\boldsymbol{H}}_{\ell-1} \leftarrow f_{\boldsymbol{\theta},\ell}^{(L)}(\hat{\boldsymbol{H}}_\ell)$
10: end for
11: Transform back into spatial domain $\hat{\boldsymbol{H}} \leftarrow \mathrm{ifft}(\hat{\boldsymbol{H}}_0)$
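
The online phase of Algorithm 1 (steps 3 to 11) can be sketched as a short function that accepts any per-step denoiser. Since the trained CNN is not reproduced here, the example plugs in a hypothetical stand-in: for an i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$ prior (which stays i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$ under the unitary FFT), the exact stepwise conditional mean is $\mathbb{E}[\boldsymbol{h}_{\ell-1} \mid \boldsymbol{h}_\ell] = \sqrt{\alpha_\ell}\,\boldsymbol{h}_\ell$. With the noise level placed exactly on a DM step, the deterministic reverse chain then reproduces the CME $\hat{\boldsymbol{H}}_{\mathrm{LS}}/(1+\eta^2)$ of this toy prior. Schedule, pilots, and names are assumptions.

```python
import numpy as np

def dm_estimate(Y, P, eta2, alphas, denoiser):
    """Online phase of Algorithm 1 with a generic per-step denoiser f(H_l, l)."""
    alpha_bar = np.cumprod(alphas)
    H = Y @ P.conj().T                              # step 3: LS estimate
    H = H / np.sqrt(1 + eta2)                       # step 4: variance normalization
    H = np.fft.fft2(H, norm="ortho")                # step 5: angular domain (unitary FFT assumed)
    snr_dm = alpha_bar / (1 - alpha_bar)
    l_hat = int(np.argmin(np.abs(1 / eta2 - snr_dm))) + 1  # step 6, eq. (7), 1-based
    for l in range(l_hat, 0, -1):                   # steps 8-10: deterministic, no resampling
        H = denoiser(H, l)
    return np.fft.ifft2(H, norm="ortho"), l_hat     # step 11: back to the spatial domain

alphas = 1 - np.linspace(1e-4, 0.2, 100)            # hypothetical schedule, L = 100

def gaussian_denoiser(H, l):
    """Hypothetical stand-in for f_{theta,l}: exact E[h_{l-1} | h_l] for an i.i.d. CN(0, 1) prior."""
    return np.sqrt(alphas[l - 1]) * H

rng = np.random.default_rng(3)
n_rx, n_tx = 64, 16
k = np.arange(n_tx)
P = np.exp(-2j * np.pi * np.outer(k, k) / n_tx) / np.sqrt(n_tx)   # unitary DFT pilots
ab = np.cumprod(alphas)
eta2 = (1 - ab[39]) / ab[39]          # noise level chosen to coincide with DM step 40
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
Nz = np.sqrt(eta2 / 2) * (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx)))
Y = H @ P + Nz
H_hat, l_hat = dm_estimate(Y, P, eta2, alphas, gaussian_denoiser)
H_cme = Y @ P.conj().T / (1 + eta2)   # CME for this toy Gaussian prior
```

For non-Gaussian channel distributions, the composition of stepwise conditional means is no longer exactly the CME for finite $L$; that gap is what the asymptotic result in the next subsection addresses.
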

B. Asymptotic Optimality

The work in [11] proved that a DM-based estimator similar to ours converges to the ground-truth CME as the number of diffusion steps $L$ of the DM grows large, under the assumption of a well-trained DM. We show that this asymptotic convergence analysis is applicable to the channel estimation problem, i.e., under the mild assumptions of [11, Th. 4.4] and for every given observation $\boldsymbol{Y}$, it holds that
\begin{equation*} \lim_{L\to\infty} \left\| \mathbb{E}[\boldsymbol{H} \mid \boldsymbol{Y}] - \mathrm{ifft}\left(f_{\boldsymbol{\theta},1:\hat{\ell}}^{(L)}\left(\mathrm{fft}\left(\tfrac{1}{\xi}\boldsymbol{Y}\boldsymbol{P}^{\mathrm{H}}\right)\right)\right) \right\| = 0 \tag{9}\end{equation*}
where $\xi = \sqrt{1+\eta^2}$. The proof follows directly from [11, Th. 4.4] together with the bijectivity of the pilot decorrelation and of the Fourier transforms. This asymptotic performance guarantee justifies the proposed inference strategy, in contrast to the existing work in [10], where a full reverse process with stochastic resampling is employed for every observation.
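
One ingredient behind the limit in (9) can be visualized directly: as $L$ grows, the SNR grid $\{\mathrm{SNR}_{\mathrm{DM}}(\ell)\}$ becomes denser, so the step selected by (7) matches the observation's SNR ever more closely. The sketch below assumes a hypothetical linear schedule whose increments are scaled by $100/L$, so that the total added noise stays roughly constant while the grid is refined; it reports the worst-case mismatch in dB over pilot SNRs from $-10$ to $30$ dB. It illustrates the discretization effect only and is not a proof of (9).

```python
import numpy as np

def alpha_bar_schedule(L, beta_min=1e-4, beta_max=0.2, L_ref=100):
    """Hypothetical linear schedule refined with L (increments scaled by L_ref / L)."""
    betas = np.linspace(beta_min, beta_max, L) * (L_ref / L)
    return np.cumprod(1 - betas)

def worst_mismatch_db(L, snr_grid_db):
    """Largest |SNR(Y) - SNR_DM(l_hat)| in dB over a grid of pilot SNRs, l_hat from (7)."""
    ab = alpha_bar_schedule(L)
    snr_dm = ab / (1 - ab)
    worst = 0.0
    for s_db in snr_grid_db:
        s = 10 ** (s_db / 10)
        idx = np.argmin(np.abs(s - snr_dm))       # eq. (7), 0-based index
        worst = max(worst, abs(s_db - 10 * np.log10(snr_dm[idx])))
    return worst

grid = np.linspace(-10, 30, 401)
for L in (100, 300, 1000):
    print(L, round(worst_mismatch_db(L, grid), 3))
```
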

We note that the asymptotic behavior holds for any given SNR value of the observation; however, in a practical system, it can be reasonably assumed that the BS receives pilots within a limited SNR range, which raises the question of how many diffusion steps are practically necessary for a strong estimation performance. As we show in Section IV, in contrast to the score-based channel estimator from [10], a small to moderate number of diffusion steps $L$ already suffices to outperform state-of-the-art generative prior-aided channel estimators.

C. Diffusion Model Network Architecture

Instead of utilizing a sophisticated NN architecture commonly used for DMs, cf. [5], we design a lightweight CNN. Enabled by the sparse structure of typical wireless channel distributions in the angular domain, cf. Section III-A, the lightweight design addresses the need for low memory overhead and computational feasibility in real-time wireless channel estimation. The network architecture is depicted in Fig. 1 and explained in the following. We use parameter sharing across all DM steps, i.e., only a single network is deployed for all DM steps. To this end, the SNR information is encoded with the sinusoidal positional embedding of the Transformer, cf. [15, Sec. 3.5] for details, yielding $\bar{\boldsymbol{\ell}} \in \mathbb{R}^{C_{\mathrm{init}}}$, which is passed through a linear layer and subsequently split into a scaling vector $\boldsymbol{\ell}_{\mathrm{s}} \in \mathbb{R}^{C_{\mathrm{max}}}$ and a bias vector $\boldsymbol{\ell}_{\mathrm{b}} \in \mathbb{R}^{C_{\mathrm{max}}}$.
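
The SNR conditioning path can be sketched as follows: a sinusoidal positional encoding as in [15, Sec. 3.5], a linear layer to $2C_{\mathrm{max}}$ outputs, and a split into $\boldsymbol{\ell}_{\mathrm{s}}$ and $\boldsymbol{\ell}_{\mathrm{b}}$. Several details are assumptions: the embedded quantity is taken to be the DM step index $\ell$ (which maps one-to-one to $\mathrm{SNR}_{\mathrm{DM}}(\ell)$ via (5)); the linear layer uses random placeholder weights; and the vectors are assumed to modulate the feature maps channel-wise as $\boldsymbol{\ell}_{\mathrm{s}} \odot \boldsymbol{x} + \boldsymbol{\ell}_{\mathrm{b}}$ (FiLM-style), which the letter does not spell out.

```python
import numpy as np

def sinusoidal_embedding(pos, dim):
    """Transformer positional encoding [15, Sec. 3.5]:
    PE(pos, 2i) = sin(pos / 10000^(2i/dim)), PE(pos, 2i+1) = cos(pos / 10000^(2i/dim))."""
    i = np.arange(dim // 2)
    angles = pos / (10000 ** (2 * i / dim))
    emb = np.empty(dim)
    emb[0::2] = np.sin(angles)
    emb[1::2] = np.cos(angles)
    return emb

C_init, C_max = 16, 64
rng = np.random.default_rng(4)
# Hypothetical linear layer R^{C_init} -> R^{2 C_max}; random weights stand in for trained ones
W = rng.standard_normal((2 * C_max, C_init)) / np.sqrt(C_init)
b = np.zeros(2 * C_max)

def snr_conditioning(step):
    """Embed the DM step, apply the linear layer, and split into scale and bias vectors."""
    out = W @ sinusoidal_embedding(step, C_init) + b
    return out[:C_max], out[C_max:]          # l_s, l_b

# Assumed usage: channel-wise modulation of a (C_max, N_rx, N_tx) feature map
features = rng.standard_normal((C_max, 64, 16))
l_s, l_b = snr_conditioning(step=37)
modulated = l_s[:, None, None] * features + l_b[:, None, None]
```

Because one network serves all steps, this conditioning is the only place where the step (and hence the SNR) enters the model.
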

Fig. 1.

DM architecture with a lightweight CNN and positional embedding of the SNR information. The NN parameters are shared across all DM steps.

After stacking the real and imaginary parts of the input $\hat{\boldsymbol{H}}_\ell$ into two convolutional channels, we employ two 2D convolution layers with a kernel size of $k = 3$ in both dimensions and rectified linear unit (ReLU) activations, gradually increasing the number of convolutional channels up to $C_{\mathrm{max}}$, after which the SNR embedding is incorporated. Afterward, three 2D convolution layers map to the estimate $\hat{\boldsymbol{H}}_{\ell-1}$ while gradually decreasing the number of convolutional channels. The proposed network architecture was found by a random search over the hyperparameters and can be flexibly adapted to different configurations. We choose the same linear schedule of $\alpha_\ell$ as in [11, Table I]. Further details and hyperparameters can be found in the publicly available simulation code.¹
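
To see why such a CNN is lightweight, its parameters can be counted directly. The letter fixes only the input width (2), $C_{\mathrm{max}} = 64$, $C_{\mathrm{init}} = 16$, and $k = 3$; the intermediate widths below (2, 32, 64 up; 64, 32, 16, 2 down) are a hypothetical choice, so the resulting number is illustrative and not the count reported in Table I.

```python
def conv2d_params(c_in, c_out, k=3):
    """Weights plus biases of one 2D convolution layer with a k x k kernel."""
    return k * k * c_in * c_out + c_out

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

C_init, C_max = 16, 64
# Hypothetical widths: two layers up to C_max, three layers back down to 2 (real/imag)
encoder = [(2, 32), (32, C_max)]
decoder = [(C_max, 32), (32, 16), (16, 2)]
conv = sum(conv2d_params(ci, co) for ci, co in encoder + decoder)
embed = linear_params(C_init, 2 * C_max)   # embedding -> (scale, bias) vectors
total = conv + embed
print(conv, embed, total)  # 42482 2176 44658
```

Since convolution weights are shared across all positions of the $N_{\mathrm{rx}} \times N_{\mathrm{tx}}$ grid, this count does not depend on the number of antennas, unlike the $N^2$ parameters of a full covariance matrix.
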

SECTION IV.

Numerical Results

We consider a massive MIMO scenario with $(N_{\mathrm{rx}}, N_{\mathrm{tx}}) = (64, 16)$. For the DM, we choose $C_{\mathrm{init}} = 16$ and $C_{\mathrm{max}} = 64$. All data-aided approaches are trained on $M_{\mathrm{train}} = 100{,}000$ training samples, and the MSE is normalized by $N = N_{\mathrm{rx}}N_{\mathrm{tx}}$.

A. Channel Models

First, we work with the 3rd Generation Partnership Project (3GPP) spatial channel model [16], where the random vector $\boldsymbol{\delta}$ collects the angles of arrival/departure and path gains of the non-line-of-sight (NLOS) propagation clusters between an MT and the BS to construct the channel covariance matrix $\boldsymbol{C}_{\boldsymbol{\delta}}$, cf. [1, eq. (26)]. For every channel sample, we draw a new $\boldsymbol{\delta}$ and subsequently draw the sample as $\boldsymbol{h} \mid \boldsymbol{\delta} \sim \mathcal{N}_{\mathbb{C}}(\boldsymbol{0}, \boldsymbol{C}_{\boldsymbol{\delta}})$, which results in an overall non-Gaussian channel distribution [1].

Second, version 2.4 of the QuaDRiGa channel simulator [17] is used. We simulate an urban macrocell scenario at a frequency of 6 GHz. The BS is mounted at a height of 25 meters and covers a 120° sector. The distances between the MTs and the BS range from 35 to 500 meters, and we consider a line-of-sight (LOS) scenario with additional multipath components. The BS and the MTs are equipped with uniform linear arrays with half-wavelength spacing. The channels are post-processed to remove the effective path gain [17].

B. Baseline Approaches

We compare the DM-based estimator with the following classical and generative prior-aided baselines. The LS solution in (6) is denoted by “LS”. A linear minimum mean square error (MMSE) estimate based on the global sample covariance matrix $\boldsymbol{C} = \frac{1}{M_{\mathrm{train}}}\sum_{m=1}^{M_{\mathrm{train}}}\boldsymbol{h}_m\boldsymbol{h}_m^{\mathrm{H}}$ is computed as $\hat{\boldsymbol{h}}_{\mathrm{Scov}} = \boldsymbol{C}(\boldsymbol{C} + \eta^2\mathbf{I})^{-1}\boldsymbol{y}$, labeled “Scov”. Assuming genie knowledge of the ground-truth covariance matrix $\boldsymbol{C}_{\boldsymbol{\delta}}$ from the 3GPP model, we evaluate the estimator $\hat{\boldsymbol{h}}_{\mathrm{genie}} = \boldsymbol{C}_{\boldsymbol{\delta}}(\boldsymbol{C}_{\boldsymbol{\delta}} + \eta^2\mathbf{I})^{-1}\boldsymbol{y}$, which yields a utopian bound for all estimators, cf. [1], labeled “genie”.
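
Both the “Scov” and the “genie” baselines apply the same linear MMSE formula and differ only in the covariance matrix they use. The sketch below implements that formula with a linear solve (rather than an explicit inverse) plus the global sample covariance. The toy data with a diagonal covariance $\mathrm{diag}(\boldsymbol{d})$ is a hypothetical example with a single covariance, whereas the 3GPP genie uses a per-sample $\boldsymbol{C}_{\boldsymbol{\delta}}$; $\boldsymbol{y}$ is treated as the noisy vectorized observation in the form $\boldsymbol{h} + \boldsymbol{n}$.

```python
import numpy as np

def lmmse(C, eta2, y):
    """Linear MMSE estimate C (C + eta^2 I)^{-1} y, as used by "Scov" and "genie"."""
    n = C.shape[0]
    # Solve (C + eta^2 I) x = y instead of forming the inverse; the estimate is C x
    return C @ np.linalg.solve(C + eta2 * np.eye(n), y)

def sample_covariance(h_train):
    """Global sample covariance (1/M) sum_m h_m h_m^H from an (M, N) array of vectorized channels."""
    return h_train.T @ h_train.conj() / h_train.shape[0]

rng = np.random.default_rng(5)
d = np.array([2.0, 0.5, 1.0, 0.1])         # hypothetical per-entry channel variances
M = 50000
h_train = np.sqrt(d / 2) * (rng.standard_normal((M, 4)) + 1j * rng.standard_normal((M, 4)))
C_hat = sample_covariance(h_train)
y = np.ones(4, dtype=complex)
h_scov = lmmse(C_hat, 0.1, y)               # "Scov": sample covariance
h_genie = lmmse(np.diag(d), 0.1, y)         # "genie": true covariance
```

For a diagonal covariance, the filter reduces to entrywise shrinkage $d_i/(d_i + \eta^2)$, which makes the output easy to verify.
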

We additionally compare the proposed DM to the GMM-based estimator from [1] with either full covariance matrices with $K = 128$ components (“GMM”) or the Kronecker version thereof with $(K_{\mathrm{rx}}, K_{\mathrm{tx}}) = (16, 8)$ components (“GMM Kron”), cf. [18]. We also evaluate the score-based channel estimator from [10], labeled “Score”, using a convolutional RefineNet architecture with $D = 6$ residual blocks with $J = 7$ layers and $W = 32$ channels after the first layer, a maximum channel size of $C_{\mathrm{max}} = 128$, a kernel size of $k = 3$, and $L_{\mathrm{sc}} = 7 \cdot 10^3$ reverse steps with resampling. After training, we perform a hyperparameter search on the test data and take the best MSE value over all reverse steps, yielding a genie-aided bound on its performance. For our simulations, we found $\alpha_0 = 10^{-8}$ and $\beta = 10^{-2}$, cf. [10, eq. (16)], to yield the best performance. Since the GAN-based estimator [3] is outperformed by the “Score” model [10], it is not evaluated in this letter.

C. Complexity and Memory Analysis

Table I analyzes the number of model parameters, which determines the memory overhead, and the online complexity of the channel estimation. The “Scov” estimator requires $N^2 = N_{\mathrm{rx}}^2 N_{\mathrm{tx}}^2$ parameters, which scales poorly in massive MIMO systems. Additionally, the individual filters must be pre-computed for each SNR value to achieve the stated order of complexity, which might be infeasible in practice due to memory constraints. The “GMM” model has a prohibitively high number of parameters and is thus prone to overfitting, as seen later. Therefore, the stated order of complexity of the $K$ linear MMSE filters with pre-computed inverses for each SNR value, as proposed in [1], may not be realistic to achieve in massive MIMO. Although the “GMM Kron” version needs far fewer model parameters, it has the same online complexity, cf. [18], and thus the same memory issues regarding parallelization and pre-computation. The “Score” model from [10] also has a high number of parameters. In addition, its online complexity, dominated by a vast number of forward passes through a sophisticated deep NN, is very high, which detrimentally affects the latency.

TABLE I Memory and Complexity Analysis for $(N_{\mathrm{rx}}, N_{\mathrm{tx}}) = (64, 16)$

Compared to the discussed baselines, the proposed DM with $L = 100$ DM steps has a low memory overhead with several orders of magnitude fewer parameters. In addition, it scales conveniently with the number of antennas due to the use of FFTs and convolutional layers. Since the number of inference steps is truncated depending on the SNR, the number of forward passes $\hat{\ell} < L \ll L_{\mathrm{sc}}$ is much lower than in the comparable score-based model [10], yielding a much lower overall complexity. An evaluation of $\hat{\ell}$ is shown in Fig. 4. Together with the great potential for parallelization on GPUs due to the exceptionally low memory overhead, this highlights the excellent scaling properties of the proposed DM in massive MIMO applications and its practically viable online complexity.
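
The scaling argument can be made concrete by counting multiply-accumulate operations (MACs). For a stride-1 convolution with "same" padding, each layer costs $k^2 C_{\mathrm{in}} C_{\mathrm{out}}$ MACs per grid position, so one forward pass grows linearly in $N = N_{\mathrm{rx}}N_{\mathrm{tx}}$ while the parameter count stays fixed; the two 2D FFTs add $\mathcal{O}(N \log N)$, and the total per observation is $\hat{\ell}$ forward passes. The layer widths are the same hypothetical choice as in the parameter-count sketch above, so the absolute numbers are illustrative only; biases and activations are ignored.

```python
def conv_macs_per_pass(n_rx, n_tx, layers, k=3):
    """MACs of one forward pass of a stride-1, 'same'-padded CNN on an n_rx x n_tx grid."""
    return sum(k * k * c_in * c_out * n_rx * n_tx for c_in, c_out in layers)

def n_params(layers, k=3):
    """Convolution weights plus biases; independent of the array size."""
    return sum(k * k * c_in * c_out + c_out for c_in, c_out in layers)

layers = [(2, 32), (32, 64), (64, 32), (32, 16), (16, 2)]   # hypothetical widths (C_max = 64)
for n_rx in (64, 128, 256):
    print(n_rx, n_params(layers), conv_macs_per_pass(n_rx, 16, layers))
```

Doubling $N_{\mathrm{rx}}$ doubles the per-pass cost and leaves the parameter count unchanged.
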

Fig. 2.

MSE performance for the 3GPP channel model with three propagation clusters and $L = 100$ DM steps.

Fig. 3.

MSE performance for the QuaDRiGa channel model for $L = 100$.

Fig. 4.

MSE performance over the number of DM steps $L$ (top) and of the intermediate DM steps (bottom) for $L = 300$ (solid) and $L = 100$ (dashed) for the QuaDRiGa model.

D. Performance Evaluation

Fig. 2 assesses the MSE performance for the 3GPP model, cf. Section IV-A, with three propagation clusters. The best estimator over the whole SNR range is the proposed DM with $L = 100$ steps, which achieves an estimation performance close to the utopian bound “genie” and thus shows that the asymptotic optimality from Section III-B is approached already with a moderate number of DM steps $L$. Additionally, we evaluate the case of a mismatch in the SNR information, where the ground-truth SNR is corrupted with a uniformly distributed offset in the range $[-3, 3]$ dB, labeled “DM ± 3dB”; it shows only marginal performance losses, highlighting the robustness of the proposed DM for channel estimation. For “DM spatial”, we trained a DM with exactly the same architecture and hyperparameters in the spatial domain, i.e., without transforming into the sparse angular domain, which leads to a significant performance loss and shows the large impact of this transformation step. We note that, in the spatial domain, a deeper network with more parameters is necessary to achieve the same performance as the presented lightweight model. The “Score” model performs comparably to the DM in the high-SNR regime but degrades in the low-SNR regime. The “GMM” variant overfits severely due to its immense number of model parameters, cf. Table I; in contrast, the “GMM Kron” version performs better, but with a considerable gap to the DM estimator, especially in the high-SNR regime.
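
The effect of the SNR mismatch on the step selection (7) can be sketched as follows. The letter states only that the offset is uniformly distributed in $[-3, 3]$ dB; the sketch assumes it is added to the SNR in dB, uses the same hypothetical schedule as before, and tracks only the selected step $\hat{\ell}$ (in Algorithm 1, the corrupted value would also enter the normalization in step 4). Because $\mathrm{SNR}_{\mathrm{DM}}(\ell)$ is monotonic, the perturbed steps stay within the steps selected at the two ends of the offset range.

```python
import numpy as np

def match_step(snr_lin, alpha_bar):
    """Eq. (7): 1-based DM step closest to the given SNR (linear scale)."""
    snr_dm = alpha_bar / (1 - alpha_bar)
    return int(np.argmin(np.abs(snr_lin - snr_dm))) + 1

rng = np.random.default_rng(6)
alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.2, 100))   # hypothetical schedule, L = 100
true_snr_db = 10.0
offsets_db = rng.uniform(-3.0, 3.0, size=1000)            # uniform offset in [-3, 3] dB
l_true = match_step(10 ** (true_snr_db / 10), alpha_bar)
l_mis = np.array([match_step(10 ** ((true_snr_db + o) / 10), alpha_bar) for o in offsets_db])
print("true step:", l_true, "perturbed steps range:", l_mis.min(), "to", l_mis.max())
```
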

Fig. 3 evaluates the QuaDRiGa model, cf. Section IV-A. The qualitative results are similar to those of Fig. 2; however, the “Score” and “GMM Kron” methods perform almost equally well in this case. Moreover, the DM outperforms the baseline approaches by an even larger margin, with a gain of up to 5 dB in SNR compared to the “Score” model, even in the case of a mismatch in the SNR information. Again, the “DM spatial” approach shows a significant performance loss compared to the proposed variant with the angular domain transformation. In combination with the low memory overhead and computational complexity, cf. Table I, this highlights the great potential of the proposed DM estimator for practical applications.

Fig. 4 evaluates the MSE for the QuaDRiGa model for a varying total number of DM steps $L$ (top) and over the intermediate DM channel estimates in the reverse process from $\ell = \hat{\ell}$ to $\ell = 0$ (bottom). On the one hand, already a low number of DM steps $L$ suffices for reasonable performance, which saturates beyond $L = 100$ for all SNR values, demonstrating that a moderate number of DM steps $L$ is sufficient for a strong estimation performance. On the other hand, the intermediate channel estimates of the DM's reverse process improve almost monotonically, which confirms the stability of the reverse process.

SECTION V.

Conclusion

This letter introduced a novel MIMO channel estimator based on a DM as generative prior that is provably asymptotically MSE-optimal. By learning the channel distribution in the highly compressible angular domain and employing an estimation strategy whose latency decreases toward higher SNR values, the proposed DM-based estimator combines low memory overhead with low computational complexity and achieves a significantly improved estimation performance compared to state-of-the-art estimators based on generative priors.
