Introduction
Following the commercial launch of fifth-generation (5G) networks in 2020, the vision and planning of sixth-generation (6G) networks, which aim to provide communication services for the 2030s, have begun [1]. 6G wireless systems are anticipated to attain a peak data rate of 1 Tbps, a tenfold increase over 5G, with spectral efficiency expected to double to 60 bps/Hz. Significant enhancement in end-to-end performance is also expected, with a packet error of
Millimeter-wave (mm-Wave) (30-300 GHz) communications below 100 GHz have been utilized in 5G systems to support higher data rates. However, it is challenging for mm-Wave systems to achieve a Tbps-level data rate due to the limited bandwidth (up to 20 GHz) and spectral efficiency (below 30 bps/Hz) [2]. Hence, the Terahertz (THz) band (0.1 - 10 THz) is nominated among all available frequency bands due to its ultra-wide bandwidth of up to hundreds of GHz, which is mostly unlicensed [2]. However, the THz band suffers from strong free-space path loss due to the high operating frequency, as well as severe absorption loss caused by atmospheric gases [3]. On the other hand, the ultra-high operating frequency allows the integration of thousands of tightly packed antennas within an area of 1 mm², enabling ultra-massive multiple-input multiple-output (UM-MIMO) systems to achieve high beamforming gain that compensates for the high path loss and enables Tbps wireless communications [4]. Moreover, extremely large-scale antenna arrays can be used for spatial multiplexing to increase spectral efficiency [5].
Beamforming and spatial multiplexing rely heavily on accurate channel state information (CSI) obtained through channel estimation (CE). However, CE in the THz band faces significant challenges, and conventional CE techniques cannot provide satisfactory performance for the following reasons. First, channel modeling in the THz band is more complex due to the near-field effect [6], [7], [8]. Second, the computational complexity of conventional compressed sensing (CS)-based methods is extremely high, given the huge channel and dictionary matrices. Moreover, conventional schemes suffer from high communication overhead, since prior knowledge of channel statistics is often needed but usually unavailable [9].
A. Related Works
In the following, we present state-of-the-art CE schemes considering both traditional methods and emerging deep learning (DL)-based techniques.
1) Conventional Methods
Beyond the conventional least squares (LS) and linear minimum mean square error (LMMSE) solutions, traditional CE methods involve dictionary-based CS. More precisely, these methods assume that the angle of arrival (AoA) and angle of departure (AoD) are taken from a fixed grid and exploit the channel sparsity in the angular domain, where CS algorithms such as orthogonal matching pursuit (OMP) [10] and approximate message passing (AMP) [11] can be utilized. Moreover, the authors in [12] propose a generalized simultaneous OMP (GSOMP) algorithm, a variant of the traditional OMP algorithm, by exploiting the common support property and casting the CE problem as a generalized multiple measurement vector (GMMV) problem, where multiple sensing matrices are employed. To overcome the off-grid problem, where the actual AoA and AoD deviate from the grid, in [13], the estimates of the angles obtained from GSOMP are refined using electromagnetic (EM)-based methods. Due to the reduced communication distance, several works are tailored to cross-field CE problems. In [14], a simultaneous OMP (SOMP)-based algorithm with dictionary reduction is proposed to solve the cross-field CE problem with lower complexity. In [15], the authors propose an on-grid polar-domain SOMP (P-SOMP) algorithm to estimate channels in the near field effectively. To accommodate the model mismatch for the line-of-sight (LoS) path in the near field, in [16], a LoS-NLoS-separated CE algorithm is proposed, where the former is based on parameter estimation and the latter on P-SOMP. LoS sensing-based CE in unmanned aerial vehicle (UAV)-assisted systems is also investigated in [17].
Conventional dictionary-based methods rely heavily on accurate channel modeling and dictionary construction. Methods are being developed that either use more accurate channel models or refine the dictionary to mitigate model-mismatch problems. However, these methods may not perform as well in the THz band as at lower frequencies: as the antenna dimension grows, the dictionary becomes larger and the computational load much heavier.
2) Deep-Learning-Based Methods
The success of DL in various fields makes it a promising candidate for MIMO CE. DL-based CE algorithms can be categorized into two classes: data-driven methods and model-driven methods. Data-driven approaches, also known as black-box methods, aim to provide an end-to-end mapping from the received signal to the full channel matrix or its corresponding parameters. The authors in [18] utilize the deep kernel learning (DKL) algorithm based on Gaussian process regression, where second-order statistics are learned using a multilayer perceptron (MLP) neural network. Moreover, deep convolutional neural networks (DCNNs) are extensively exploited in the literature [19], [20], [21], [22], since channel matrices are analogous to pixel-based images, whose intrinsic properties can be captured by convolutional layers. In [19], the channel is first coarsely estimated using AMP and then fed to a DCNN for refinement. In addition, insignificant neurons are pruned to reduce inference complexity. In [20], the authors propose a federated learning (FL)-based framework for CE to reduce communication overhead. The works mentioned above leverage the far-field dictionary, which results in high estimation errors in the near field. To tackle the near-field CE problem, the channel matrix is first built upon planar-wave assumptions in [21] and then multiplied by a correction matrix representing the phase difference between planar and spherical waves. In [22], the authors create a near-field dictionary with an additional dimension for distance, yielding a denser grid at higher complexity. Apart from DCNN structures, generative adversarial network (GAN)-based CE schemes have emerged in recent years [23], [24], [25], [26]. The authors in [23] propose a GAN-based CE framework for wideband channels, where the GAN is trained to learn the channel distribution from data and generate channel samples from the learned distribution.
Then, gradient descent and its variants are used to find the best-generated channel based on the measurements. The authors in [24] and [25] employ the Wasserstein GAN with gradient penalty (WGAN-GP) algorithm to stabilize training and propose an FL-based algorithm that distributes training across multiple users, respectively. In [26], the authors propose a score-based generative model to avoid adversarial training, relying on less stringent assumptions regarding the low-dimensional characteristics of wireless channels. DL-based CE in orthogonal frequency-division multiplexing (OFDM) and orthogonal time frequency space (OTFS) systems is studied in [27] and [28], respectively. In contrast to the data-driven approaches mentioned above, model-driven methods intertwine DL with domain knowledge by unfolding traditional algorithms and substituting non-linear estimators (NLEs) with neural networks. Early deep unfolding methods truncate an iterative algorithm to a fixed number of layers, incurring instability and unaffordable training costs [29], [30]. In [9], a fixed point network (FPN) featuring a cyclical topology that offers an adjustable trade-off between accuracy and computational complexity is proposed based on the orthogonal approximate message passing (OAMP) algorithm. In addition to minimizing the mean-square error (MSE) loss in training, a constraint on input and output is imposed to enhance the stability of the algorithm. In [31], a joint data-driven and model-driven CE approach is investigated under imperfect hardware considerations. Despite all the merits of DL-based methods, current DL-based estimators often prioritize estimation accuracy while neglecting computational complexity, which is critical in practical implementation. In this paper, we aim to maintain high estimation accuracy while minimizing the inference complexity by using a principle similar to meta-learning but with only one task dataset [32].
B. Contribution
The GAN-based estimator [23], [24], [25], [33] has demonstrated superior performance to traditional methods such as OMP [34], the least absolute shrinkage and selection operator (LASSO) [35], and expectation-maximization Gaussian-mixture approximate message passing (EM-GM-AMP) [36], as well as DL-based methods such as ResNet [37]. Specifically, [23] shows that the GAN estimator achieves more than 5 dB lower normalized mean squared error (NMSE) than ResNet while using only 6% of the model parameters. In low compression-ratio scenarios, [24] demonstrates that the GAN estimator outperforms LASSO and EM-GM-AMP by a margin of more than 5 dB. The superiority of the GAN estimator is further validated in [25] across all considered 3GPP delay profiles (CDL-A through CDL-E), where it consistently achieves higher accuracy than EM-GM-AMP. The GAN model can create a highly compressed latent representation of high-dimensional channels. It also offers low communication overhead and high estimation accuracy, especially at low signal-to-noise ratio (SNR). However, the GAN-based estimator is not optimized for CS tasks. First, reconstruction with a GAN-based estimator is slow, involving hundreds to thousands of gradient descent steps with several random restarts [38]. Second, a GAN is difficult to train due to the nature of the problem, i.e., finding a Nash equilibrium between two neural networks [39]. The adversarial loss also provides little insight into model performance in CS, making it difficult to evaluate the model during training. Third, training convergence is slow, since most computational resources are spent on the discriminator, while the performance of the CS model relies only on the generator.
To address the aforementioned problems with GAN estimators, we propose deep compressed sensing (DCS), a CE framework for frequency-selective THz UM-MIMO CE tasks. The main contribution can be summarized as follows.
We propose DCS, a CE framework based on generative neural networks for THz UM-MIMO systems. We train a neural network that can produce realistic THz channel samples, eliminating complex channel modeling, especially in the near field.
We demonstrate how to integrate the CS framework into training a generative neural network. Our model is trained to adapt to the CE task with at least 8 dB lower NMSE compared to conventional estimators, namely LS, LMMSE, and OMP. Furthermore, our model achieves around 3 dB lower NMSE in comparison to the GAN estimator while reducing the number of online inference steps by one order of magnitude, which solves the most detrimental problem of the GAN estimator.
We design a loss function that accounts for both the reconstruction error and enforces the restricted isometry property (RIP) to ensure successful channel reconstruction with high probability. Moreover, the designed loss function provides a good measure of model performance compared to adversarial loss in GAN training, which is less related to the performance in CE tasks.
Our model waives the training of a discriminator that consumes the majority of the training computation. The task-aware nature of our model results in 4 times faster training convergence compared to GAN.
C. Notations and Structure of the Paper
Throughout this paper, we adhere to the following notation: A represents a matrix, a is a scalar, and a is a vector. The ℓ2-norm of a is denoted as
The remainder of the paper is organized as follows. In Section II, the system model is described. In Section III, the proposed DCS-based channel estimator is explained. In Section IV, the numerical results are presented. Finally, in Section V, conclusions are provided.
System Model
In this section, we consider a multi-carrier THz UM-MIMO communication system with hybrid beamforming and combining techniques [40], as shown in Figure 1.
Illustration of a THz UM-MIMO system with
A. Array-of-Subarrays (AoSA)
We assume that transmitter (Tx) and receiver (Rx) both deploy an array-of-subarrays (AoSA) structure with planar antenna arrays distributed on the Y-Z plane of their local Cartesian coordinates, where the origins lie in the centers of Tx and Rx. The total number of sub-arrays (SAs) is
B. Signal Model
To estimate wideband THz channels, we adopt OFDM and allocate pilot signals with \begin{align*} \mathbf {x}[k]=& \mathbf {F}_{\mathrm {RF}} \mathbf {F}_{\mathrm {BB}}[k] \mathbf {s}[k], \\=& \mathbf {F}[k]\mathbf {s}[k], \tag {1}\end{align*}
Due to the partially-connected structure, in which the AEs in an SA share a single RF chain, the analog beamforming matrix \begin{align*} \mathbf {F}_{\mathrm {RF}}= \left [{{\begin{array}{cccc} \mathbf {f}_{\mathrm {RF}}^{1} & \quad \mathbf {0} & \quad \cdots & \quad \mathbf {0} \\ \mathbf {0} & \quad \mathbf {f}_{\mathrm {RF}}^{2} & \quad \cdots & \quad \mathbf {0} \\ \vdots & \quad \vdots & \quad \ddots & \quad \vdots \\ \mathbf {0} & \quad \mathbf {0} & \quad \cdots & \quad \mathbf {f}_{\mathrm {RF}}^{N_{\mathrm {RF},\mathrm {t}}} \end{array}}}\right ] \text {,}~ \tag {2}\end{align*}
\begin{equation*} \mathbf {f}_{\mathrm {RF}}^{j}\left [{{i}}\right ] = \frac {1}{\sqrt {N_{\mathrm {AE},\mathrm {t}}}} e^{j \psi _{i, j}}, \forall i \in \left \{{{1, 2,\ldots, N_{\mathrm {AE},\mathrm {t}}}}\right \}, \tag {3}\end{equation*}
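As a concrete illustration, the block-diagonal structure of (2) with the unit-modulus entries of (3) can be sketched in a few lines; the subarray count and size below are illustrative, not the paper's values.

```python
import numpy as np

# Hedged sketch of the block-diagonal analog beamformer F_RF in (2), whose
# per-subarray vectors f_RF^j have the unit-modulus entries of (3).
# N_rf (subarrays / RF chains) and N_ae (AEs per subarray) are illustrative.
def analog_beamformer(N_rf, N_ae, rng):
    F_rf = np.zeros((N_rf * N_ae, N_rf), dtype=complex)
    for j in range(N_rf):
        psi = rng.uniform(0.0, 2.0 * np.pi, N_ae)   # phase-shifter states psi_{i,j}
        F_rf[j * N_ae:(j + 1) * N_ae, j] = np.exp(1j * psi) / np.sqrt(N_ae)
    return F_rf

F_rf = analog_beamformer(N_rf=4, N_ae=8, rng=np.random.default_rng(0))
# Each column is confined to its own subarray block and has unit norm.
```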
The received signal at the Rx is processed by the analog combiner \begin{equation*} \mathbf {y}[k]=\mathbf {C}^{\mathrm {H}}[k]\mathbf {H}[k] \mathbf {x}[k]+\mathbf {C}^{\mathrm {H}}[k]\mathbf {n}[k], \tag {4}\end{equation*}
C. Channel Model
The frequency-selective THz channel model is built upon the division of multiple subcarriers, each consisting of multiple sub-bands. Within this framework, we consider the presence of both LoS and non-line-of-sight (NLoS) components for each subcarrier. Considering a delay tap with length \begin{equation*} \mathbf {H}^{u}=\mathbf {H}_{\mathrm {LoS}}^{u} + \mathbf {H}_{\mathrm {NLoS}}^{u}, \tag {5}\end{equation*}
In near-field UM-MIMO systems, each Tx-Rx AE pair experiences unique propagation paths for LoS components due to spherical wave propagation [16]. Thus, we model LoS channel under the geometric free space propagation assumption for each pair of Tx and Rx AEs instead of using array response vectors as commonly used in Saleh-Valenzuela (SV) model. Specifically, letting \begin{equation*} \mathbf {H}_{\mathrm {LoS}}^{u}\left [{{n_{\mathrm {r}}, n_{\mathrm {t}}}}\right ] = g_{0}\left ({{f_{k}, d_{n_{\mathrm {r}}, n_{\mathrm {t}}}}}\right) \delta \left ({{u T_{s}-\tau _{n_{\mathrm {r}}, n_{\mathrm {t}}}}}\right), \tag {6}\end{equation*}
\begin{equation*} \mathbf {H}_{\mathrm {NLoS}}^{u} = \sum _{\ell =1}^{L} g_{\ell } \mathbf {a}_{\mathrm {r}} \mathbf {a}_{\mathrm {t}}^{\mathrm {H}} \delta \left ({{u T_{s}-\tau _{\ell }}}\right), \tag {7}\end{equation*}
\begin{align*} \mathbf {a}_{\mathrm {t}}\left ({{\theta _{\mathrm {t}}^{\ell }, d_{\mathrm {t}}^{\ell }}}\right)=& \frac {1}{\sqrt {N_{\mathrm {t}}}}\left [{{e^{-\mathrm {j} \frac {2 \pi }{\lambda }\left ({{d_{\mathrm {t}}^{\ell }(1)-d_{\mathrm {t}}^{\ell }}}\right)}, \ldots, e^{-\mathrm {j} \frac {2 \pi }{\lambda }\left ({{d_{\mathrm {t}}^{\ell }\left ({{N_{\mathrm {t}}}}\right)-d_{\mathrm {t}}^{\ell }}}\right)}}}\right ]^{\mathrm {H}}, \tag {8}\\ \mathbf {a}_{\mathrm {r}}\left ({{\theta _{\mathrm {r}}^{\ell }, d_{\mathrm {r}}^{\ell }}}\right)=& \frac {1}{\sqrt {N_{\mathrm {r}}}}\left [{{e^{-\mathrm {j} \frac {2 \pi }{\lambda }\left ({{d_{\mathrm {r}}^{\ell }(1)-d_{\mathrm {r}}^{\ell }}}\right)}, \ldots, e^{-\mathrm {j} \frac {2 \pi }{\lambda }\left ({{d_{\mathrm {r}}^{\ell }\left ({{N_{\mathrm {r}}}}\right)-d_{\mathrm {r}}^{\ell }}}\right)}}}\right ]^{\mathrm {H}}, \tag {9}\end{align*}
The frequency-domain channel response is related to the time-domain response via Fourier transform (FT) as\begin{align*} \mathbf {H}[k]=& \sum _{u=0}^{N_{u}-1} \left ({{\mathbf {H}_{\mathrm {LoS}}^{u} + \mathbf {H}_{\mathrm {NLoS}}^{u}}}\right) e^{-j \frac {2 \pi k}{K} u}, \\=& \mathbf {H}_{\mathrm {LoS}}[k] + \mathbf {H}_{\mathrm {NLoS}}[k], \\=& \mathbf {H}_{\mathrm {LoS}}[k] + \sum _{\ell =1}^{L} g_{\ell } \mathbf {a}_{\mathrm {r}} \mathbf {a}_{\mathrm {t}}^{\mathrm {H}} e^{-j 2 \pi \frac {k B}{K} \tau _{\ell }}, \tag {10}\end{align*}
\begin{equation*} \mathbf {H}_{\mathrm {LoS}}[k]\left [{{n_{\mathrm {r}}, n_{\mathrm {t}}}}\right ] = g_{0}\left ({{f_{k}, d_{n_{\mathrm {r}}, n_{\mathrm {t}}}}}\right) e^{-j 2 \pi \frac {k B}{K} \tau _{n_{\mathrm {r}}, n_{\mathrm {t}}}}. \tag {11}\end{equation*}
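The tap-to-subcarrier relation in (10) is a plain DFT over the delay taps, which can be checked numerically; the tap count and array sizes below are illustrative.

```python
import numpy as np

# Hedged sketch of (10): the frequency-domain channel H[k] is the DFT of
# the N_u time-domain delay taps. Sizes are illustrative, not the paper's.
rng = np.random.default_rng(6)
N_u, K, N_r, N_t = 8, 16, 4, 4
H_taps = (rng.standard_normal((N_u, N_r, N_t))
          + 1j * rng.standard_normal((N_u, N_r, N_t)))

# Direct evaluation of the sum in (10) for each subcarrier k.
H_freq = np.array([sum(H_taps[u] * np.exp(-1j * 2 * np.pi * k * u / K)
                       for u in range(N_u)) for k in range(K)])

# Equivalent: zero-pad the taps to K and take an FFT along the tap axis.
H_fft = np.fft.fft(
    np.concatenate([H_taps, np.zeros((K - N_u, N_r, N_t))]), axis=0)
```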
D. Problem Formulation
Here, the channel coherence time is assumed to be much longer than the symbol duration and is divided into two stages: training and data transmission. This assumption is valid since the symbol duration is on the order of picoseconds [2], while the channel coherence time spans milliseconds [45]. For simplicity, the subcarrier index is dropped from here on.
During the training phase, \begin{equation*} \mathbf {y}_{m}=\mathbf {C}_{m}^{\mathrm {H}}\mathbf {H}\mathbf {x}_{m}+\mathbf {C}_{m}^{\mathrm {H}}\mathbf {n}_{m}, \tag {12}\end{equation*}
Vectorizing the channel matrix, we obtain the linear system as\begin{align*} \mathbf {y}_{m}=& \underbrace {\left ({{\mathbf {x}_{m}^{\top }\otimes \mathbf {C}_{m}^{\mathrm {H}}}}\right)}_{\boldsymbol {\Phi }_{m}} \mathbf {h}+ \mathbf {C}_{m}^{\mathrm {H}}\mathbf {n}_{m} \\=& \boldsymbol {\Phi }_{m} \mathbf {h}+\mathbf {C}_{m}^{\mathrm {H}}\mathbf {n}_{m}, \tag {13}\end{align*}
\begin{align*} \underbrace {\left [{{\begin{array}{c} \mathbf {y}_{1} \\ \mathbf {y}_{2} \\ \vdots \\ \mathbf {y}_{N_{p}} \end{array}}}\right ]}_{\tilde {\mathbf {y}}}= \underbrace {\left [{{\begin{array}{c} \boldsymbol {\Phi }_{1} \\ \boldsymbol {\Phi }_{2} \\ \vdots \\ \boldsymbol {\Phi }_{N_{p}} \end{array}}}\right ]}_{\boldsymbol {\Phi }} \mathbf {h} + \underbrace {\mathrm {diag}\left ({{\mathbf {C}_{1}^{\mathrm {H}},\ldots,\mathbf {C}_{N_{p}}^{\mathrm {H}}}}\right)}_{\boldsymbol {\Psi }} \underbrace {\left [{{\begin{array}{c} \mathbf {n}_{1} \\ \mathbf {n}_{2} \\ \vdots \\ \mathbf {n}_{N_{p}} \end{array}}}\right ]}_{\tilde {\mathbf {n}}}. \tag {14}\end{align*}
Simplifying the expression, we obtain the linear system used for CE as\begin{equation*} \tilde {\mathbf {y}}=\boldsymbol {\Phi } \mathbf {h}+\boldsymbol {\Psi }\tilde {\mathbf {n}}, \tag {15}\end{equation*}
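The vectorization leading to (13)-(15) rests on the identity vec(AXB) = (B^T ⊗ A) vec(X), which can be verified numerically; all dimensions below are illustrative, and noise is omitted for the check.

```python
import numpy as np

# Hedged numerical check of (13)-(14): each pilot slot m gives
# Phi_m = x_m^T kron C_m^H, and the N_p slots are stacked into Phi.
rng = np.random.default_rng(0)
N_t, N_r, N_rf, N_p = 8, 8, 4, 6

H = rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))
h = H.flatten(order='F')                       # vec(H), column-major stacking

Phi_blocks, y_blocks = [], []
for _ in range(N_p):
    x = rng.standard_normal(N_t) + 1j * rng.standard_normal(N_t)    # pilot x_m
    C = (rng.standard_normal((N_r, N_rf))
         + 1j * rng.standard_normal((N_r, N_rf)))                   # combiner C_m
    Phi_blocks.append(np.kron(x, C.conj().T))  # x_m^T kron C_m^H, as in (13)
    y_blocks.append(C.conj().T @ H @ x)        # noiseless y_m from (12)

Phi = np.vstack(Phi_blocks)
y_tilde = np.concatenate(y_blocks)
# vec(A X B) = (B^T kron A) vec(X) guarantees y_tilde = Phi @ h.
```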
Conventional methods can be used to solve the linear system. The LS solution can be given as [46]\begin{equation*} \hat {\mathbf {h}}_{\mathrm {LS}}=\boldsymbol {\Phi }^{\dagger }\tilde {\mathbf {y}}, \tag {16}\end{equation*}
\begin{equation*} \hat {\mathbf {h}}_{\mathrm {LMMSE}} = \mathbf {R}_{h h} \boldsymbol {\Phi }^{\mathrm {H}} \left ({{\boldsymbol {\Phi } \mathbf {R}_{h h} \boldsymbol {\Phi }^{\mathrm {H}} +\boldsymbol {\Psi } \mathbf {R}_{nn} \boldsymbol {\Psi }^{\mathrm {H}} }}\right)^{-1} \tilde {\mathbf {y}}, \tag {17}\end{equation*}
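A minimal sketch of the LS (16) and LMMSE (17) baselines on a small random overdetermined system; the dimensions, the i.i.d. priors, and the noise level are illustrative assumptions, and the combiner matrix Ψ is taken as identity.

```python
import numpy as np

# Hedged sketch of the baseline estimators (16) and (17); all numbers here
# are illustrative, not the paper's setup.
rng = np.random.default_rng(1)
M, N = 96, 64                                   # measurements, channel length
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = Phi @ h + n

# LS (16): Moore-Penrose pseudo-inverse solution.
h_ls = np.linalg.pinv(Phi) @ y

# LMMSE (17), assuming i.i.d. priors R_hh = 2I and R_nn = 2e-4 I
# (variance 2 = unit-variance real part + unit-variance imaginary part),
# with the combiner Psi taken as identity.
R_hh = 2.0 * np.eye(N)
R_nn = 2e-4 * np.eye(M)
h_lmmse = R_hh @ Phi.conj().T @ np.linalg.solve(
    Phi @ R_hh @ Phi.conj().T + R_nn, y)
```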
Deep Compressed Sensing for Thz Channel Estimation
In this section, we propose a novel CE framework, DCS, based on a task-aware generative neural network. First, we provide the core idea and inference process of DCS. Then, we propose a novel training algorithm targeting the drawbacks of GAN estimators.
A. Structural Constraint via Neural Networks
Conventional CS schemes exploit channel sparsity in the angular domain and impose a structural constraint on the channel matrix to solve the underdetermined system. However, their performance is limited. First, the reconstruction quality depends heavily on how well the actual AoAs, AoDs, and distances match the constructed grid. Second, THz systems may operate in the near-field region due to the joint effect of wavelength and array aperture [47], which adds a distance dimension to channel modeling, expands the dictionary search space, and increases the algorithm execution time. Third, the SV model may not accurately capture LoS characteristics, causing higher estimation error [16]. Although the authors in [16] propose a LoS-NLoS-separated estimation algorithm, the additional computation for estimating the LoS channel makes the inference time even more critical. Finally, some properties of THz channels are not yet captured by mathematical models, further degrading performance in practice; such properties, however, could be learned from data by DL models.
The idea of DCS is to replace the sparsity constraint with the structural constraint imposed by a generative neural network, e.g., variational autoencoder (VAE) and GAN, which provides a mapping from a latent representation to the signal space. Thus, instead of requiring sparse signals, the neural network implicitly constrains its output in a low-dimensional manifold via its weights and biases learned from the data [48]. In other words, the channel estimate belongs to a space defined by a neural network \begin{equation*} \hat {\mathbf {h}} = G_{\vartheta }\left ({{\mathbf {z}}}\right), \tag {18}\end{equation*}
B. Inference
In the context of CE, the neural network \begin{align*} \hat {\mathbf {z}}=& \underset {\mathbf {z}}{\arg \min }\left \|{{\tilde {\mathbf {y}}-\boldsymbol {\Phi }G_{\vartheta }\left ({{\mathbf {z}}}\right)}}\right \|_{2}^{2} \\=& \underset {\mathbf {z}}{\arg \min }\,{\mathcal {L}}_{\vartheta }\left ({{\tilde {\mathbf {y}}, \boldsymbol {\Phi },{\mathbf {z}}}}\right), \tag {19}\end{align*}
\begin{equation*} \hat {\mathbf {h}} = G_{\vartheta }\left ({{\hat {\mathbf {z}}}}\right). \tag {20}\end{equation*}
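The inference step (19)-(20) can be sketched with a real-valued linear stand-in generator G(z) = Wz, for which the gradient of the loss is available in closed form; the stand-in network, dimensions, step size, and step count here are all illustrative assumptions, not the trained model.

```python
import numpy as np

# Hedged sketch of DCS inference (19)-(20): gradient descent on the latent
# vector z with the generator frozen. G(z) = W z replaces the trained
# network so that the gradient of ||y - Phi G(z)||^2 is closed-form.
rng = np.random.default_rng(2)
N, M, d = 64, 32, 8                       # channel, measurement, latent dims

W = rng.standard_normal((N, d))           # frozen "generator" weights
z_true = rng.standard_normal(d)
h = W @ z_true                            # ground-truth channel on the manifold
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ h                               # noiseless measurements

A = Phi @ W
alpha = 0.5 / np.linalg.norm(A, ord=2) ** 2   # step size chosen for stability
z = rng.standard_normal(d)                # random initial latent vector
for _ in range(500):                      # inner loop solving (19)
    grad = -2 * A.T @ (y - A @ z)         # gradient of ||y - Phi W z||^2 w.r.t. z
    z -= alpha * grad

h_hat = W @ z                             # channel estimate (20)
```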
C. Training
The main goal of training is to learn the channel distribution from a dataset of real THz channels. We first introduce GAN training preliminaries before presenting our training method.
1) GAN Preliminaries
A GAN can be trained to learn the channel distribution from a channel dataset. Training of GAN involves a competition between two networks, i.e., a generator G that converts a noise source into a fake sample and a discriminator/critic D to differentiate genuine and generated samples. The discriminator D is trained using both true and fake channel samples to encourage D to discriminate against them. The generator G is trained to produce higher quality samples to fool the discriminator into classifying the fake samples as valid. Through this alternating training process, both networks progressively improve, leading G to generate increasingly realistic channel samples. Formally, the training involves an adversarial game of the following min-max problem as\begin{equation*} \min _{G} \max _{D} \underset {\mathbf {h} \sim \mathbb {P}_{r}}{\mathbb {E}}\left [{{\log \left ({{D(\mathbf {h})}}\right)}}\right ]+\underset {\tilde {\mathbf {h}} \sim \mathbb {P}_{g}}{\mathbb {E}}\left [{{\log \left ({{1-D\left ({{\tilde {\mathbf {h}}}}\right)}}\right)}}\right ], \tag {21}\end{equation*}
While GAN-based CE has demonstrated superior performance over traditional techniques [23], [24], [25], it faces several critical limitations. The primary drawback is its poor run-time efficiency, as GAN training is not optimized for CS tasks. Reconstruction with a GAN estimator is slow and sensitive to the initial latent vector that is sampled from a known distribution. Reconstruction of a single sample typically requires hundreds to thousands of gradient descent steps with multiple random restarts [38], making it impractical for THz CE where channel coherence time is limited to milliseconds. Furthermore, GAN training suffers from computational inefficiency, with substantial resources devoted to training a discriminator that is discarded after training. The model’s ability for CE is also difficult to evaluate during training, as the adversarial loss only indicates the discriminator’s perceived similarity between learned and real distributions.
To address these limitations while retaining the advantages of structural constraints imposed by generative neural networks, we propose integrating the CS framework into the training process. This approach enables the neural network not only to learn the THz channel distribution but also to facilitate rapid inference by jointly optimizing the latent vector and training the generator.
2) Deep Compressed Sensing
We propose that training the latent optimization process in (19) could enhance the run-time efficiency while maintaining an equivalent level of estimation accuracy. This approach involves back-propagating through gradient descent to update the model parameters
Algorithm 1 DCS Training Algorithm
Require: Initial generator parameters
while
for
Make noiseless measurements of the channel
Sample
for
end for
end for
Compute
end while
This approach incorporates a dual-loop framework designed to optimize both the latent vector z and the model parameters \begin{equation*} {\mathbf {z}} \leftarrow {\mathbf {z}}-\alpha _{1} \nabla _{\mathbf {z}} {\mathcal {L}}_{\vartheta }\left ({{\tilde {\mathbf {y}}, \boldsymbol {\Phi },{\mathbf {z}}}}\right), \tag {22}\end{equation*}
\begin{equation*} \mathbf {z} \leftarrow \mathbf {z}/\|\mathbf {z}\|_{2}. \tag {23}\end{equation*}
The outer loop updates the model parameter \begin{equation*} {\mathcal {L}}_{G}=\mathbb {E}_{\mathbf {h} \sim \mathbb {P}_{\mathrm {r}}}\left \{{{\rm {NMSE}\left [{{\mathbf {h},G_{\vartheta }\left ({{\hat {\mathbf {z}}}}\right)}}\right ]}}\right \}, \tag {24}\end{equation*}
However, merely minimizing (24) would fail since the generator would exploit the measurement matrix \begin{equation*} \left ({{1-\delta }}\right)\|\mathbf {h}\|_{2}^{2} \leq \|\boldsymbol {\Phi } \mathbf {h}\|_{2}^{2} \leq \left ({{1+\delta }}\right)\|\mathbf {h}\|_{2}^{2}, \tag {25}\end{equation*}
\begin{align*} {\mathcal {L}}_{F}=\mathbb {E}_{\mathbf {h}_{1}, \mathbf {h}_{2}\sim \left \{{{\mathbb {P}_{\mathrm {r}}, \mathbb {P}_{\mathrm {g}}}}\right \}}\left [{{\left ({{\left \|{{\boldsymbol {\Phi }\left ({{\mathbf {h}_{1}-\mathbf {h}_{2}}}\right)}}\right \|_{2}-\left \|{{\mathbf {h}_{1}-\mathbf {h}_{2}}}\right \|_{2}}}\right)^{2}}}\right ], \tag {26}\end{align*}
\begin{equation*} \vartheta \gets \vartheta - \alpha _{2} \nabla _{\vartheta }\left ({{{\mathcal {L}}_{G} + {\mathcal {L}}_{F}}}\right), \tag {27}\end{equation*}
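A minimal sketch of the two training loss terms, the NMSE objective (24) and the RIP penalty (26), evaluated on a single sample pair; plain numpy stands in for the autodiff framework, and all sizes are illustrative.

```python
import numpy as np

# Hedged sketch of the training objective: the reconstruction loss of (24)
# plus the RIP-enforcing penalty of (26), on one illustrative sample pair.
def nmse_loss(h, h_hat):
    # Single-sample NMSE inside the expectation of (24).
    return np.sum(np.abs(h - h_hat) ** 2) / np.sum(np.abs(h) ** 2)

def rip_loss(Phi, h1, h2):
    # Single-pair penalty inside the expectation of (26).
    d = h1 - h2
    return (np.linalg.norm(Phi @ d) - np.linalg.norm(d)) ** 2

rng = np.random.default_rng(3)
N, M = 64, 48
h_real = rng.standard_normal(N)                   # sample from P_r
h_gen = h_real + 0.1 * rng.standard_normal(N)     # stand-in generator output
Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # near-isometric measurement

# The sum L_G + L_F is what the parameter update (27) differentiates.
total = nmse_loss(h_real, h_gen) + rip_loss(Phi, h_real, h_gen)
```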
The overall DCS algorithm for THz CE is given in Algorithm 2. During the offline phase, we train the DCS model using a THz channel dataset. Then, the model is uploaded to the Rx for online channel inference without local training.
Algorithm 2 Channel Estimation via DCS
Require: Channel dataset.
Train DCS model using Algorithm 1
Extract the model
for Each channel coherent time do
Sample
for
end for
end for
3) Remarks
The proposed CE framework learns the channel distribution via a generative model that captures the underlying statistical properties of THz channels. More importantly, by constraining the number of optimization steps T to be small, we enforce an implicit optimization of the generator’s latent space. In contrast, the latent space of a GAN is not optimized specifically for CE tasks, resulting in unpredictable convergence behavior with varying numbers of optimization steps and high sensitivity to initial points. Moreover, our model waives the training of a discriminator, which typically consumes the majority of computing resources in GAN training while being discarded during the inference phase. Lastly, our training loss function provides insights into how the model performs in CE tasks since the NMSE is directly incorporated into the optimization objective, indicating the estimation accuracy. It is hard to predict the performance of a GAN during the training phase simply because the adversarial loss only indicates the discriminator’s perceived similarity between real and learned channel distribution, without directly measuring the estimation accuracy.
Performance Evaluation
In this section, we evaluate the proposed DCS framework in terms of convergence, estimation error, and run-time efficiency. The estimation error is evaluated using NMSE given as\begin{equation*} \mathrm {NMSE}=\frac {\sum _{k=0}^{K-1}\|\mathbf {h}[k]-\hat {\mathbf {h}}[k]\|_{2}^{2}}{\sum _{k=0}^{K-1}\|\mathbf {h}[k]\|_{2}^{2}}. \tag {28}\end{equation*}
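The metric (28) can be implemented directly; the test sizes and the conversion to dB below are illustrative.

```python
import numpy as np

# Hedged sketch of the evaluation metric (28): NMSE summed over the K
# subcarriers, usually reported in dB.
def nmse(h_list, h_hat_list):
    num = sum(np.linalg.norm(h - h_hat) ** 2
              for h, h_hat in zip(h_list, h_hat_list))
    den = sum(np.linalg.norm(h) ** 2 for h in h_list)
    return num / den

rng = np.random.default_rng(4)
K, N = 4, 16
h_list = [rng.standard_normal(N) for _ in range(K)]
h_hat_list = [h + 0.1 * rng.standard_normal(N) for h in h_list]   # toy estimates
nmse_db = 10 * np.log10(nmse(h_list, h_hat_list))                 # value in dB
```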
A. Dataset Preparation
Since no standardized datasets exist yet, several THz channel simulators or community datasets can be adopted to obtain training data, e.g., TeraMIMO [8], Remcom Wireless InSite [53], and DeepMIMO [54].
In this paper, we adopt TeraMIMO, a statistical THz channel simulator built on the measurement data in [55], [56] for low THz band. The main simulation parameters are summarized in Table 2. In our setup, the Tx and Rx antenna aperture \begin{equation*} \frac {2\left ({{D_{\mathrm {t}}+D_{\mathrm {r}}}}\right)^{2}}{\lambda } = 1.35\,\mathrm {m}. \tag {29}\end{equation*}
We use TeraMIMO to generate the channel dataset in the frequency domain, where the channel realization is of size \begin{align*} \mathbf {H}\left [{{:,c,:,:}}\right ] \gets \frac {\mathbf {H}\left [{{:,c,:,:}}\right ]-\mathrm {mean}~\left [{{c}}\right ]}{\mathrm {std}~\left [{{c}}\right ]}, \forall c = 1, 2,\ldots, 2K, \tag {30}\end{align*}
\begin{align*} \mathrm {mean}\left [{{c}}\right ]=& \frac {\sum _{n_{\mathrm {t}}}^{N_{\mathrm {t}}}\sum _{n_{\mathrm {r}}}^{N_{\mathrm {r}}}\sum _{n}^{N}\mathbf {H}~\left [{{n,c,n_{\mathrm {r}},n_{\mathrm {t}}}}\right ]}{N_{\mathrm {t}}N_{\mathrm {r}}N}, \tag {31}\\ \mathrm {std}\left [{{c}}\right ]=& \sqrt {\frac {\sum _{n_{\mathrm {t}}}^{N_{\mathrm {t}}}\sum _{n_{\mathrm {r}}}^{N_{\mathrm {r}}}\sum _{n}^{N}\left |{{\mathbf {H}\left [{{n,c,n_{\mathrm {r}},n_{\mathrm {t}}}}\right ]-\mathrm {mean}\left [{{c}}\right ]}}\right |^{2}}{N_{\mathrm {t}}N_{\mathrm {r}}N-1}}. \tag {32}\end{align*}
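The standardization (30)-(32) amounts to per-channel mean/std normalization over the dataset and antenna axes; the tensor sizes below are illustrative.

```python
import numpy as np

# Hedged sketch of (30)-(32): each of the 2K real/imaginary channels c is
# normalized by its mean and standard deviation computed over the dataset
# index n and the antenna axes. Shape (N, 2K, N_r, N_t) follows the dataset
# layout described in the text; the sizes are illustrative.
rng = np.random.default_rng(5)
N, twoK, N_r, N_t = 10, 4, 8, 8
H = rng.normal(3.0, 2.0, size=(N, twoK, N_r, N_t))

mean = H.mean(axis=(0, 2, 3), keepdims=True)            # (31), per channel c
std = H.std(axis=(0, 2, 3), ddof=1, keepdims=True)      # (32), ddof=1 matches N-1
H_norm = (H - mean) / std                               # (30)
```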
To generate the measurement matrix
B. Neural Network Architecture & Training
We use deep convolutional GAN (DCGAN) [59] for the generator and discriminator and provide a general neural network structure that can be used for different input sizes. The neural network architecture for the generator G and discriminator D is summarized in Table 3, following standard notation in PyTorch. In image processing tasks, a
The optimized hyperparameters in our experiment are summarized in Table 4. We use the Adam optimizer [49] for all optimization processes. Both models use a learning rate of 0.1 for the latent optimization and 0.0002 for the model update. Although we use the same batch size for both models, the GAN model is trained for more epochs because it takes longer to converge according to empirical results. For GAN training, we update the discriminator (critic) 5 times for each generator update
All experiments are performed in Ibex computing clusters at King Abdullah University of Science and Technology (KAUST) with an AMD EPYC 7713P 64-core CPU and an NVIDIA A100 GPU. The dataset is generated in a MATLAB (R2022a) environment, while the training and testing are implemented in a PyTorch environment.
C. Numerical Results
1) Latent Dimension
To evaluate the pure compression capability of our generative model, we first perform experiments at high SNR (40 dB), where measurement noise has a negligible impact. This quasi-noiseless setting allows us to isolate and analyze the fundamental trade-off between latent dimension and reconstruction accuracy. Figure 3 shows the relationship between NMSE and compression ratio
NMSE of DCS as a function of pilot length for various latent dimensions:
2) Optimization Steps
We investigate the impact of optimization steps T on the gradient descent process during inference. As shown in Figure 4, increasing the number of steps from 5 to 20 improves NMSE across all SNR regimes, indicating better convergence to the optimal latent vector. However, beyond 20 steps, the improvement diminishes or even degrades at low SNR. To better understand this performance degradation beyond 20 steps at low SNR, we examine the convergence trajectories for a testing sample using 40 steps at SNR
3) Pilot Length
To evaluate pilot efficiency, we examine the performance of our model with varying pilot lengths during inference while maintaining a fixed training pilot length of
Figure 6 demonstrates both the strong generalization capacity and the pilot efficiency of our approach across different pilot lengths. Even at a low SNR of −5 dB, the model exhibits good performance with
In the following experiments, we use a latent dimension of 100, 20 optimization steps, and a pilot length of 100 as defaults unless otherwise stated.
4) Random Restarts
We show an example of channel inference using DCS and GAN in Figure 7. We test both models with five random restarts, where each restart takes 100 steps for GAN and 20 steps for DCS. Figure 7 makes clear that the GAN estimator is highly dependent on the initial point. While the second, third, and fifth starting points yield an NMSE of around −13 dB, inference from the first and fourth initial points exhibits an extremely high estimation error of around 0 dB. This demonstrates why random restarts are necessary for the GAN estimator: the impact of bad initial points is detrimental. In contrast, our model barely depends on the initial point, with the NMSE across all restarts showing no measurable difference. Thus, our model does not need random restarts, which is key to reducing the total number of steps.
An example NMSE trajectory with 5 random restarts for GAN and DCS, showing that the GAN estimator is highly dependent on the initial point.
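The random-restart strategy used by the GAN estimator can be sketched generically as follows: run several independent latent optimizations and keep the candidate with the smallest measurement residual. The snippet uses an identity "generator" and illustrative dimensions, purely to show the selection logic:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((12, 16))   # pilot matrix (illustrative sizes)
y = Phi @ rng.standard_normal(16)     # noiseless measurements of a random channel

def latent_opt(z0, steps=20, lr=0.01):
    """Toy surrogate for latent optimization: gradient descent on ||y - Phi z||^2.
    Here the 'generator' is the identity map, purely for illustration."""
    z = z0.copy()
    for _ in range(steps):
        z += 2 * lr * Phi.T @ (y - Phi @ z)   # negative-gradient step
    return z

# Random restarts: run several independent optimizations and keep the
# candidate with the smallest measurement residual.
starts = [rng.standard_normal(16) for _ in range(5)]
finals = [latent_opt(z0) for z0 in starts]
best = min(finals, key=lambda z: np.linalg.norm(y - Phi @ z))

res0 = np.linalg.norm(y - Phi @ starts[0])
res_best = np.linalg.norm(y - Phi @ best)
print(res_best < res0)  # -> True
```

When, as for DCS, every restart converges to essentially the same residual, this selection step (and hence the extra restarts) adds no value, which is the observation made above.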
For a fair comparison, we evaluate our DCS estimator against GAN under different configurations, as shown in Table 5. The average NMSE is measured at SNR = 0 dB. With an identical configuration (20 steps and 1 trial), GAN achieves −1.61 dB NMSE, which is practically unusable compared to −14.33 dB for DCS. Although increasing both the steps per restart and the number of restarts enhances GAN's performance, we observe diminishing returns beyond 50 steps with 10 restarts (500 total steps). The improvement from 500 to 1000 total steps is merely 0.11 dB, which does not justify doubling the computational cost. We therefore adopt the configuration of 50 steps and 10 restarts for GAN in subsequent experiments. Even with this optimized setting, GAN's performance (−11.75 dB) still falls short of DCS by approximately 2.6 dB, despite using 25 times more steps. Even with just 10 steps, DCS achieves −12.82 dB, surpassing GAN by 1 dB while using only 2% of the steps. These results clearly demonstrate that DCS not only provides better estimation accuracy but does so with remarkably fewer steps than the GAN-based estimator.
5) Estimation Accuracy
We compare our proposed DCS estimator with several benchmarks: standard LS, LMMSE with estimated second-order statistics, OMP with an on-grid angular dictionary, and the pre-trained GAN estimator with 10 random restarts and 50 steps per restart. Our model is evaluated with 20 steps and no random restarts. As shown in Figure 8, our method demonstrates superior performance across the entire SNR range from −20 dB to 15 dB. Consistent with [19], OMP with an on-grid angular dictionary performs poorly for THz channels, especially in the near field, performing worse than LMMSE. Both LS and OMP lag significantly behind, with nearly 20 dB higher NMSE at low SNR and an over 12 dB gap at high SNR. Our method outperforms LMMSE by approximately 8 dB in NMSE while avoiding the computationally expensive matrix inversion of LMMSE. While GAN achieves similar NMSE at −20 dB SNR, our method shows an increasing advantage as SNR improves, maintaining at least a 2.5 dB lead from −5 dB SNR onward. Notably, since GAN and DCS use identical generator architectures, the computation per step is equivalent, and this superior performance is achieved with only 20 optimization steps versus GAN's 500, demonstrating both better accuracy and higher efficiency.
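For concreteness, the LS and LMMSE baselines can be sketched for a generic linear measurement model; the dimensions, pilot matrix, and (identity) channel covariance below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 32                      # channel dimension, pilot length
Phi = rng.standard_normal((m, n))  # pilot/measurement matrix (illustrative)
h = rng.standard_normal(n)         # true channel
sigma2 = 0.01                      # noise variance
y = Phi @ h + np.sqrt(sigma2) * rng.standard_normal(m)

# LS: pseudo-inverse of the measurement matrix, no prior statistics needed.
h_ls = np.linalg.pinv(Phi) @ y

# LMMSE with a known channel covariance R_h (identity here, an assumption);
# the m-by-m solve is the cubic-cost step referred to in the complexity analysis.
R_h = np.eye(n)
h_lmmse = R_h @ Phi.T @ np.linalg.solve(Phi @ R_h @ Phi.T + sigma2 * np.eye(m), y)

def nmse(h_true, h_est):
    return np.linalg.norm(h_est - h_true) ** 2 / np.linalg.norm(h_true) ** 2

print(nmse(h, h_ls) < 0.05, nmse(h, h_lmmse) < 0.05)  # -> True True
```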
6) Computational Complexity
We summarize the computational complexity of the aforementioned CE algorithms in Table 6. The computational complexity of the DCS model in the online inference stage mainly comes from computing the gradient in (22). The gradient can be written as\begin{equation*} \nabla _{\mathbf {z}}\left \|{{\tilde {\mathbf {y}}-\boldsymbol {\Phi }G_{\vartheta }\left ({{\mathbf {z}}}\right)}}\right \|_{2}^{2} = -2\left ({{\tilde {\mathbf {y}}-\boldsymbol {\Phi }G_{\vartheta }(\mathbf {z})}}\right)^{\top }\boldsymbol {\Phi }\nabla _{\mathbf {z}}G_{\vartheta }\left ({{\mathbf {z}}}\right). \tag {33}\end{equation*}
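The chain-rule expression above can be cross-checked numerically. The sketch below compares the analytic gradient (the column-vector transpose of (33)) with PyTorch autograd for a toy linear generator; all dimensions and the generator itself are illustrative:

```python
import torch

torch.manual_seed(0)
m, n, k = 6, 10, 4
Phi = torch.randn(m, n)               # measurement matrix
W = torch.randn(n, k)                 # toy linear generator G(z) = W z
y = torch.randn(m)                    # measurements

# Autograd gradient of ||y - Phi G(z)||^2 with respect to z.
z = torch.randn(k, requires_grad=True)
loss = torch.sum((y - Phi @ (W @ z)) ** 2)
loss.backward()

# Analytic gradient from (33), transposed to a column vector:
# -2 * (dG/dz)^T Phi^T (y - Phi G(z)), with dG/dz = W for the linear toy G.
with torch.no_grad():
    g_analytic = -2.0 * W.T @ Phi.T @ (y - Phi @ (W @ z))

print(torch.allclose(z.grad, g_analytic, atol=1e-5))  # -> True
```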
To compute the forward pass, i.e., compute
From this complexity analysis, traditional LS and LMMSE methods exhibit cubic complexity
7) Training Convergence
The convergence behavior of the proposed DCS estimator is shown in Figures 9 and 10, which plot the training and testing loss as a function of the iteration count and the NMSE of the model at various training epochs. One epoch refers to a complete training cycle over the entire training dataset; with 5000 training data points and a batch size of 100, one epoch in Figure 10 corresponds to 50 iterations in Figure 9. As shown in Figure 9, the training and testing losses align well, indicating that our model does not suffer from overfitting. Furthermore, our proposed model converges faster than the GAN estimator. With only 15 training epochs, DCS already matches the NMSE of the fully converged GAN estimator. DCS reaches optimal performance around epoch 150 and remains stable thereafter, while GAN requires 600 epochs and exhibits larger ongoing fluctuations. This significantly reduces training overhead, with 4 times faster convergence than the GAN estimator. The fast convergence stems from the task-aware nature of our model: it learns fast inference within a limited number of steps during training. In contrast, GAN training focuses only on learning the distribution of the channel dataset, leaving the inference task aside, which makes its computation expensive. The slow convergence of GAN is also due to the substantial computation spent training a discriminator, which is unfortunately useless after training.
Another advantage of our proposed model is the introduction of a more meaningful loss function tied to the inference task, where a lower loss indicates better model performance. In contrast, as shown in Figure 11, the baseline GAN model has no task-specific loss due to the adversarial game between the generator and the discriminator. The adversarial losses merely reflect the dynamics between the generator and discriminator during training and do not directly indicate channel estimation accuracy. This is evident when comparing Figure 11 with Figure 10: despite the fluctuating adversarial losses, the NMSE steadily improves and converges to −11.56 dB after 600 epochs. This disconnect between the adversarial training objectives and the actual channel estimation performance highlights a limitation of the GAN approach, as model optimization is not directly guided by channel estimation accuracy.
8) Generalization Capability
In [23], the generalization capability of the GAN estimator was evaluated by testing with varying numbers of clusters and rays. While our dataset inherently incorporates random numbers of clusters and rays, we extend the generalization analysis to two additional out-of-distribution (OOD) scenarios: measurement distribution shift and channel distribution shift.
For measurement distribution shift, as shown in Figure 6, we train our model with
It is worth noting that OOD generalization is not a critical concern for our approach. As a model-free estimation scheme, the generative model learns directly from data without requiring explicit channel models. This means the model can be efficiently retrained when deployment scenarios differ significantly from the training distribution, providing a flexible solution for practical applications.
Conclusion
In this paper, we propose the DCS model for THz UM-MIMO CE. Unlike the GAN-based estimator, our model is tailored to the CE task and inherently learns fast inference without random restarts by jointly optimizing the latent dimension and network parameters. The proposed DCS model outperforms conventional techniques, achieving at least 8 dB lower NMSE. In addition, our model overcomes the detrimental problems of the GAN-based channel estimator while delivering even better performance. First, DCS provides around 3 dB lower estimation error than the GAN-based estimator with one order of magnitude lower computational complexity. Second, we design an informative loss function that allows easier model evaluation during training, addressing a common challenge with GANs. Finally, our model empirically converges 4 times faster than GAN. In future work, we plan to extend our framework to end-to-end system performance metrics such as bit error rate (BER) through hardware-aware co-design, considering practical beamforming and decoding schemes in resource-constrained systems.