Introduction
Generative models have shown great success in learning complex data distributions and in leveraging this prior information for wireless communication applications. This success rests on the importance of inferring the unknown and generally complex channel distribution of, e.g., a base station (BS) environment from a representative dataset. Consequently, advanced channel estimation methods have been developed that rely primarily on state-of-the-art generative models such as Gaussian mixture models (GMMs) [1], mixtures of factor analyzers (MFAs) [2], generative adversarial networks (GANs) [3], or variational autoencoders (VAEs) [4].
Recently, diffusion models (DMs) [5] and score-based models [6] have been identified as among the most powerful generative models. The two are closely related: both learn the data distribution by corrupting clean samples with additive (Gaussian) noise and learning the reverse process to generate new samples from pure noise. Their advantages over alternative generative models are high training stability, strong generative ability (e.g., beating GANs [7]), and a natural modeling of the full signal-to-noise ratio (SNR) range, which allows for strong generalization. However, the large computational overhead associated with these models, i.e., a large number of neural network (NN) forward passes together with resampling after each step in the reverse process, makes their direct application non-trivial in a real-time task like channel estimation.
Nevertheless, DMs have been used in wireless communications, e.g., for channel coding [8] and semantic communications [9]. The work in [10] proposed a score-based model that performs channel estimation through posterior sampling. However, the approach has several disadvantages that hinder its use in practical applications, e.g., a number of network parameters on the order of millions, a number of reverse steps on the order of thousands that have to be evaluated online, and the stochasticity of the estimate.
Recently, a deterministic denoising strategy was proposed in [11] that utilizes a DM where the observation’s SNR level is matched with the corresponding intermediate DM step, drastically reducing the number of reverse steps without requiring resampling. Moreover, this denoising strategy was shown to be asymptotically optimal in terms of the mean square error (MSE) as the number of DM steps grows large. For practical distributions, it was shown that a moderate number of DM steps already suffices to achieve a strong denoising performance close to the utopian bound of the conditional mean estimator (CME). Despite these advantages, the deployed NN, which is based on a sophisticated architecture with parameters on the order of millions, may still be impractical for channel estimation. However, multiple-input multiple-output (MIMO) channels exhibit structural properties, e.g., sparsity in the angular/beamspace domain, particularly at the high frequencies of mmWave or THz bands in ultra-massive MIMO systems [12], that can be leveraged to design lightweight NNs of reduced size.
In this work, we provide the following contributions:
We propose a novel channel estimation algorithm that uses a DM, currently among the most powerful generative models, as generative prior.
To achieve low complexity and memory overhead together with high robustness during inference, we design a DM with a lightweight convolutional neural network (CNN) by learning the channel distribution in the sparse angular domain.
We show the asymptotic convergence of the DM-based channel estimator to the MSE-optimal CME.
We evaluate the proposed DM-based estimator on different channel models, showing its versatile applicability. Furthermore, even though the presented DM has a drastically reduced number of parameters and online computational complexity compared to state-of-the-art estimators, it exhibits a better estimation performance.
Preliminaries
A. MIMO System Model
Consider an \begin{equation*} {\boldsymbol {Y}} = {\boldsymbol {H}} {\boldsymbol {P}} + {\boldsymbol {N}} \in \mathbb {C}^{{N_{\mathrm {rx}}} \times N_{\mathrm {p}}} \tag {1}\end{equation*}
B. Diffusion Model as Generative Prior
We briefly review the formulations of DMs from [5]. Given a data distribution \begin{equation*} {\boldsymbol {h}}_{\ell }=\sqrt {\alpha _{\ell }} {\boldsymbol {h}}_{\ell -1} + \sqrt {1-\alpha _{\ell }} {\boldsymbol {\varepsilon }}_{\ell -1} = \sqrt {\bar {\alpha }_{\ell }} {\boldsymbol {h}}_{0} + \sqrt {1-\bar {\alpha }_{\ell }} {\boldsymbol {\varepsilon }}_{0} \tag {2}\end{equation*}
\begin{equation*} p_{\boldsymbol {\theta }}\left ({{{\boldsymbol {h}}_{\ell -1} | {\boldsymbol {h}}_{\ell }}}\right) = {\mathcal {N}}_{\mathbb {C}}\left ({{{\boldsymbol {h}}_{\ell -1}; {\boldsymbol {\mu }}_{\boldsymbol {\theta }}({\boldsymbol {h}}_{\ell }, \ell), \sigma ^{2}_{\ell }{\mathrm {\mathbf {I}}}}}\right), \tag {3}\end{equation*}
\begin{equation*} q\left ({{{\boldsymbol {h}}_{\ell -1} | {\boldsymbol {h}}_{\ell }, {\boldsymbol {h}}_{0}}}\right)={\mathcal {N}}_{\mathbb {C}}\left ({{{\boldsymbol {h}}_{\ell -1}; \tilde {\boldsymbol {\mu }}({\boldsymbol {h}}_{\ell }, {\boldsymbol {h}}_{0}), \sigma _{\ell }^{2}{\mathrm {\mathbf {I}}}}}\right), \tag {4}\end{equation*}
\begin{equation*} \rm {SNR}_{\mathrm {DM}}\left ({{\ell }}\right) = \frac {\mathbb {E}\left [{{\|\sqrt {\bar {\alpha }_{\ell }} {\boldsymbol {h}}_{0}\|_{2}^{2}}}\right ]}{\mathbb {E}\left [{{\|\sqrt {1 - \bar {\alpha }_{\ell }} {\boldsymbol {\varepsilon }}_{0}\|_{2}^{2}}}\right ]} = \frac {\bar {\alpha }_{\ell }}{1 - \bar {\alpha }_{\ell }}, \tag {5}\end{equation*}
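As a minimal numerical sketch of (2) and (5), the following NumPy snippet draws a noisy sample at step ℓ directly from a clean sample and evaluates the DM's SNR for every step. The linear variance schedule, its endpoints `beta_min` and `beta_max`, and all function names are illustrative assumptions that are not taken from the letter. The closed form in (5) additionally presumes that the channel and the noise have equal average power.

```python
import numpy as np

def linear_alpha_bar(num_steps, beta_min=1e-4, beta_max=0.035):
    """Cumulative products alpha_bar_l for a linear schedule (assumed, hypothetical values)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)

def forward_sample(h0, alpha_bar_l, rng):
    """Draw h_l from h_0 in one shot, using the right-hand side of (2)."""
    eps = (rng.standard_normal(h0.shape) + 1j * rng.standard_normal(h0.shape)) / np.sqrt(2)
    return np.sqrt(alpha_bar_l) * h0 + np.sqrt(1.0 - alpha_bar_l) * eps

def snr_dm(alpha_bar):
    """SNR of each DM step as in (5), assuming unit-power channel and noise entries."""
    return alpha_bar / (1.0 - alpha_bar)

alpha_bar = linear_alpha_bar(100)
snr = snr_dm(alpha_bar)  # strictly decreasing in the step index
```

Because the cumulative product decreases with every step, the SNR in (5) decreases monotonically, so each step of the DM corresponds to exactly one SNR level.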
Channel Estimation
The MSE-optimal channel estimator is the CME
A. Diffusion-Based Channel Estimator
Wireless channels have unique structural properties that differ from natural signals in other domains, e.g., images. A well-known property is that the channel can be transformed into the angular/beamspace domain representation via a Fourier transform [14, Sec. 7.3.3]. In massive MIMO, the angular domain representation is sparse or highly compressible, especially if the number of multipath propagation clusters and the angular spread are low, which is the case, e.g., in mmWave or THz bands [12]. Consequently, learning the channel distribution in the sparse angular domain allows the DM to get by with fewer parameters, which, in turn, makes training and inference faster and more stable. However, we note that we do not make any specific assumptions about the sparsity level or the structure of the channel distribution, making the proposed approach viable for any propagation scenario.
Therefore, we transform the channels from a given training dataset
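The transform into the angular domain can be sketched with a unitary two-dimensional DFT. This is a minimal illustration, assuming that the channel is stored as a receive-by-transmit matrix and that the Fourier transform is applied along both antenna dimensions. The function names and the toy single-path channel with on-grid angles are hypothetical and only serve to make the sparsity visible.

```python
import numpy as np

def to_angular(H):
    """Spatial -> angular/beamspace domain via a unitary 2D DFT over the last two axes."""
    return np.fft.fft2(H, norm="ortho")

def to_spatial(H_ang):
    """Angular -> spatial domain (inverse of to_angular)."""
    return np.fft.ifft2(H_ang, norm="ortho")

def single_path_channel(n_rx, n_tx, k_rx, k_tx):
    """Toy rank-one channel whose angles lie exactly on the DFT grid (assumption)."""
    a_rx = np.exp(2j * np.pi * k_rx * np.arange(n_rx) / n_rx)
    a_tx = np.exp(2j * np.pi * k_tx * np.arange(n_tx) / n_tx)
    return np.outer(a_rx, a_tx.conj())

H = single_path_channel(16, 4, k_rx=3, k_tx=1)
H_ang = to_angular(H)
# Count the beamspace entries that carry energy.
num_active = int(np.sum(np.abs(H_ang) > 1e-8))
```

For this idealized channel, all energy collapses into a single beamspace entry. Realistic channels with off-grid angles and angular spread leak into neighboring entries, so they are compressible rather than exactly sparse.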
For the online channel estimation, we adopt the reverse process from [11] with several changes. First, the pilot matrix is decorrelated by computing the least squares (LS) solution \begin{equation*} \hat {\boldsymbol {H}}_{\mathrm {LS}} = {\boldsymbol {Y}} {\boldsymbol {P}}^{\mathrm {H}} = {\boldsymbol {H}} + \tilde {\boldsymbol {N}} \tag {6}\end{equation*}
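The decorrelation in (6) can be illustrated as follows. The sketch assumes a unitary pilot matrix with as many pilots as transmit antennas, here a DFT matrix, so that multiplying by its conjugate transpose recovers the channel plus noise. The pilot design, the dimensions, and the noise variance are hypothetical choices for illustration.

```python
import numpy as np

def dft_pilots(n_tx):
    """Unitary DFT pilot matrix with P @ P^H = I (an assumed pilot design)."""
    return np.fft.fft(np.eye(n_tx), norm="ortho")

def ls_estimate(Y, P):
    """Decorrelate the pilots as in (6): H_LS = Y P^H."""
    return Y @ P.conj().T

rng = np.random.default_rng(0)
n_rx, n_tx, noise_var = 16, 4, 0.1  # hypothetical dimensions and noise variance

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(n_rx, n_tx)
P = dft_pilots(n_tx)
Y = H @ P + np.sqrt(noise_var) * crandn(n_rx, n_tx)
H_ls = ls_estimate(Y, P)  # equals H plus noise of unchanged variance
```

Since the pilot matrix is unitary in this sketch, the transformed noise stays white with the same variance, which is what allows the LS solution to be read as a noisy sample of the channel.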
Due to the same functional form of the DM’s forward process (2) and the (decorrelated) noisy observation (6), we can interpret the LS solution (6) as an intermediate step in the DM. Utilizing the SNR description of the DM in (5), the DM’s step \begin{equation*} {\hat {\ell }} = \mathop {\arg \,\min }\limits _{\ell } |\rm {SNR}\left ({{\boldsymbol {Y}}}\right) - \rm {SNR}_{\mathrm {DM}}\left ({{\ell }}\right)|. \tag {7}\end{equation*}
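The step matching in (7) amounts to a nearest-neighbor search over the per-step SNR values from (5). The sketch below assumes that the observation's SNR is known in linear scale and uses a hypothetical linear variance schedule; the function name `match_dm_step` is likewise illustrative.

```python
import numpy as np

def match_dm_step(snr_obs, alpha_bar):
    """Return the 1-based DM step whose SNR from (5) is closest to snr_obs, as in (7)."""
    snr_steps = alpha_bar / (1.0 - alpha_bar)
    return int(np.argmin(np.abs(snr_obs - snr_steps))) + 1

# Hypothetical linear variance schedule with 100 steps.
betas = np.linspace(1e-4, 0.035, 100)
alpha_bar = np.cumprod(1.0 - betas)

snr_db = 10.0  # assumed observation SNR in dB
step = match_dm_step(10.0 ** (snr_db / 10.0), alpha_bar)
```

A higher observation SNR maps to an earlier step, so fewer reverse steps remain to be evaluated.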
After initializing the DM’s SNR-step \begin{equation*} \hat {\boldsymbol {H}}_{0} = f_{{\boldsymbol {\theta }},1}^{(L)}\left ({{ f^{(L)}_{{\boldsymbol {\theta }}, 2}(\cdots f^{(L)}_{{\boldsymbol {\theta }}, {\hat {\ell }}}(\hat {\boldsymbol {H}}_{\hat {\ell }}) \cdots)}}\right) = f^{(L)}_{{\boldsymbol {\theta }}, 1:{\hat {\ell }}}\left ({{\hat {\boldsymbol {H}}_{\hat {\ell }}}}\right). \tag {8}\end{equation*}
Algorithm 1 Channel Estimation via a DM as Generative Prior
Offline DM Training Phase
Training dataset
Transform into angular domain
Train DM
Online DM-based Channel Estimation Phase
Compute LS estimate
Normalize observation’s variance
Transform into angular domain
Initialize DM reverse process
for
end for
Transform back into spatial domain
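The online phase of Algorithm 1 can be sketched end to end as below. This is an illustration under several assumptions and not the authors' implementation: each reverse step is taken to return the mean of (3) in the standard noise-prediction parameterization of [5] without resampling, the noise variance is assumed known, the variance normalization divides by the square root of one plus the noise variance (which matches (2) for unit-power channels), and the variance schedule is hypothetical. The trained network is replaced by `gaussian_noise_predictor`, a closed-form stand-in that is exact only for a standard complex Gaussian prior.

```python
import numpy as np

def make_schedule(num_steps, beta_min=1e-4, beta_max=0.035):
    """Hypothetical linear variance schedule: returns alpha_l and alpha_bar_l."""
    alphas = 1.0 - np.linspace(beta_min, beta_max, num_steps)
    return alphas, np.cumprod(alphas)

def gaussian_noise_predictor(H_l, step, alpha_bar):
    """Stand-in for the trained NN: exact E[eps | H_l] if the entries of H_0 are CN(0, 1)."""
    return np.sqrt(1.0 - alpha_bar[step - 1]) * H_l

def dm_channel_estimate(Y, P, noise_var, alphas, alpha_bar, noise_predictor):
    # Compute LS estimate, cf. (6); assumes unitary pilots.
    H_ls = Y @ P.conj().T
    # Normalize the observation's variance (assumed unit-power channel entries).
    H_norm = H_ls / np.sqrt(1.0 + noise_var)
    # Transform into angular domain with a unitary 2D DFT.
    H_l = np.fft.fft2(H_norm, norm="ortho")
    # Initialize the reverse process at the SNR-matched step, cf. (7).
    snr_steps = alpha_bar / (1.0 - alpha_bar)
    step = int(np.argmin(np.abs(1.0 / noise_var - snr_steps))) + 1
    # Deterministic reverse steps from the matched step down to step 1.
    for l in range(step, 0, -1):
        a, ab = alphas[l - 1], alpha_bar[l - 1]
        eps_hat = noise_predictor(H_l, l, alpha_bar)
        H_l = (H_l - (1.0 - a) / np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(a)
    # Transform back into spatial domain.
    return np.fft.ifft2(H_l, norm="ortho")
```

With the Gaussian stand-in and an observation SNR that coincides with one of the DM steps, the chain of reverse steps reduces to a scalar scaling of the LS estimate by one over one plus the noise variance, which is the CME for that prior. A trained network would replace the stand-in for realistic channel distributions.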
B. Asymptotic Optimality
The work in [11] proved that a DM-based estimator similar to the one considered here converges to the ground-truth CME as the number of diffusion steps L of the DM grows large, assuming a well-trained DM. We demonstrate that the asymptotic convergence analysis provided in [11] is applicable to the channel estimation problem, i.e., under the mild assumptions of [11, Th. 4.4] and for every given observation \begin{equation*} \lim _{L\to \infty } \left \|{{\mathbb {E}\left [{{{\boldsymbol {H}} |{\boldsymbol {Y}}}}\right ] - \rm {ifft}\left ({{f_{{\boldsymbol {\theta }},1:{\hat {\ell }}}^{(L)}\left ({{\rm {fft}\left ({{\tfrac {1}{\xi }{\boldsymbol {Y}}{\boldsymbol {P}}^{\mathrm {H}}}}\right)}}\right)}}\right) }}\right \| = 0 \tag {9}\end{equation*}
We note that the asymptotic behavior holds for any given SNR value of the observation; however, in a practical system, it can be reasonably assumed that the BS receives pilots from a limited SNR range, raising the question of how many diffusion steps are practically necessary for a strong estimation performance. As we show in Section IV, a small to moderate number of diffusion steps L already suffices to outperform state-of-the-art generative prior-aided channel estimators, in contrast to the score-based channel estimator from [10].
C. Diffusion Model Network Architecture
Instead of utilizing a sophisticated NN architecture commonly used for DMs, cf. [5], we design a lightweight CNN. Enabled by the sparse structure of typical wireless channel distributions in the angular domain, cf. Section III-A, the lightweight design addresses the importance of low memory overhead and computational feasibility in real-time wireless channel estimation. The network architecture is detailed in Fig. 1, which is explained in the following. We use parameter sharing across all DM steps, i.e., only a single network is deployed for all DM steps. To this end, a Transformer sinusoidal position embedding of the SNR information is utilized to yield
Fig. 1. DM architecture with a lightweight CNN and positional embedding of the SNR information. The NN parameters are shared across all DM steps.
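A sinusoidal embedding of the DM step, which encodes the SNR information for the shared network, can be sketched as follows. The layout (all sine terms followed by all cosine terms), the embedding dimension, and the `max_period` constant follow one common variant of the Transformer position embedding and are assumptions; the letter's exact configuration is not reproduced here.

```python
import numpy as np

def sinusoidal_embedding(step, dim, max_period=10000.0):
    """Embed a scalar DM step into a vector of length dim (dim assumed even)."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = step * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

emb = sinusoidal_embedding(step=5, dim=16)
```

Because the step index is the only quantity that differs between DM steps, feeding this vector to the network is what allows a single set of parameters to serve all steps.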
After stacking the real and imaginary parts of the input
Numerical Results
We consider a massive MIMO scenario with
A. Channel Models
First, we work with the 3rd Generation Partnership Project (3GPP) spatial channel model [16] where the random vector
Second, version 2.4 of the QuaDRiGa channel simulator [17] is used. We simulate an urban macrocell scenario at a frequency of 6 GHz. The BS’s height is 25 meters, covering a 120° sector. The distances between the MTs and the BS are 35–500 meters, and we consider a line-of-sight (LOS) scenario with additional multi-path components. The BS and the MTs are equipped with a uniform linear array with half-wavelength spacing. The channels are post-processed to remove the effective path gain [17].
B. Baseline Approaches
We compare the DM-based estimator with the following classical and generative prior-aided baselines. The LS solution in (6) is denoted by “LS”. A linear minimum mean square error (MMSE) estimate based on the global sample covariance matrix
We additionally compare the proposed DM to the GMM-based estimator from [1] with either full covariance matrices with
C. Complexity and Memory Analysis
The number of necessary model parameters, determining the memory overhead, and the online complexity for the channel estimation are analyzed in Table I. The “Scov” estimator requires
Compared to the discussed baselines, the proposed DM with
Fig. 2. MSE performance for the 3GPP channel model with three propagation clusters and
Fig. 4. MSE performance over the number of DM steps L (top) and of the intermediate DM steps (bottom) for
D. Performance Evaluation
Fig. 2 assesses the MSE performance for the 3GPP model, cf. Section IV-A, with three propagation clusters. The best estimator over the whole SNR range is the proposed DM with
Fig. 3 evaluates the QuaDRiGa model, cf. Section IV-A. The qualitative results are similar to those of Fig. 2; however, the “Score” and the “GMM Kron” methods perform almost equally well in this case. Moreover, the DM has an even larger gap to the baseline approaches, with up to 5 dB gain in SNR compared to the “Score” model, even in the case of a mismatch in the SNR information. Again, the “DM spatial” approach shows a significant performance loss relative to the proposed variant with the angular domain transformation. In combination with the low memory overhead and computational complexity, cf. Table I, this highlights the potential of the proposed DM estimator in practical applications.
Fig. 4 evaluates the MSE for the QuaDRiGa model for varying total DM steps L (top) and over the intermediate DM channel estimates in the reverse process from
Conclusion
This letter introduced a novel MIMO channel estimator based on the DM as generative prior that is provably asymptotically MSE-optimal. By learning the channel distribution in the highly compressible angular domain and employing an estimation strategy whose latency decreases toward higher SNR values, the proposed DM-based estimator combines low memory overhead with low computational complexity and achieves a significantly improved estimation performance compared to state-of-the-art estimators based on generative priors.