Terahertz-Band Channel and Beam Split Estimation via Array Perturbation Model

For the demonstration of ultra-wideband bandwidth and pencil-beamforming, the terahertz (THz)-band has been envisioned as one of the key enabling technologies for the sixth generation networks. However, the acquisition of the THz channel entails several unique challenges such as severe path loss and beam-split. Prior works usually employ ultra-massive arrays and additional hardware components comprised of time-delayers to compensate for these loses. In order to provide a cost-effective solution, this paper introduces a sparse-Bayesian-learning (SBL) technique for joint channel and beam-split estimation. Specifically, we first model the beam-split as an array perturbation inspired from array signal processing. Next, a low-complexity approach is developed by exploiting the line-of-sight-dominant feature of THz channel to reduce the computational complexity involved in the proposed SBL technique for channel estimation (SBCE). Additionally, based on federated-learning, we implement a model-free technique to the proposed model-based SBCE solution. Further to that, we examine the near-field considerations of THz channel, and introduce the range-dependent near-field beam-split. The theoretical performance bounds, i.e., Cramér-Rao lower bounds, are derived both for near- and far-field parameters, e.g., user directions, beam-split and ranges. Numerical simulations demonstrate that SBCE outperforms the existing approaches and exhibits lower hardware cost.


I. INTRODUCTION
The definition of the THz band varies among different IEEE communities.While IEEE Terahertz Technology and Applications Committee focus on 0.3 − 3 THz band, the IEEE Transactions on Terahertz Science and Technology Journal studies 0.3 − 10 THz [1,2].On the other hand, 0.1 − 10 THz band has been used for THz band in the recent studies on wireless communications including a large overlap with millimeter wave (mmWave), e.g., 0.03 − 0.3 THz [1].Furthermore, the Federal Communication Commission (FCC) has designated the frequency bands of 116−123 GHz, 174.8−182GHz, 185 − 190 GHz and 244 − 246 GHz for the unlicensed use for the future spectrum horizon [3].
Compared to its mmWave counterpart, the THz-band channel exhibits several THz-specific challenges that should be taken into account.These unique THz features, among others, include high path loss due to spreading loss and molecular absorption, shorter transmission rage, distance-dependent bandwidth and beam-split (see, e.g., Fig. 1) [1,4,5].Furthermore, the THz channels are extremely-sparse.Hence, they are usually characterized as line-of-sight (LoS)-dominant and non-LoS (NLoS)-assisting models [1,[6][7][8][9][10][11].In comparison, the mmWave channel involves both LoS and NLoS paths, which are significant while LoS path is dominant in THz model [12].In addition, to compensate the severe path loss, analogous to massive multiple-input multiple-output (MIMO) array in mmWave design, ultra-massive (UM) MIMO array configurations are proposed for THz [2,13].The UM-MIMO design comprises huge number of antennas which are denselypositioned (e.g., 5 × 5 cm 2 ) due to extremely short wavelength [5].As a result, the usage of dedicated radio-frequency (RF) chain for each antenna element involves extreme hardware cost and labor.The hybrid design configurations, i.e., joint usage of analog and digital beamformers seems to be major possible solution as it was first envisioned for mmWave massive MIMO systems in 5G [14][15][16][17][18].In order to exploit low cost system design, the hybrid architecture involve a few (large) number of digital (analog) beamformers.Although the digital beamformers are subcarrier-dependent (SD), the analog phase shifters are designed as subcarrier-independent (SI) components.This causes beam-split phenomenon, i.e., the beams generated by the analog beamformer according to a single frequency split at different subcarriers and they look at different directions (Fig. 1) because of ultra-wide bandwidth and large number of antennas [7,19].In mmWave, at which the subcarrier frequencies are relatively closer than THz, beam-squint is the term broadly used to describe the same phenomenon [12,20,21].The beam-split affects the system performance and causes severe degradations in terms of spectral efficiency, normalized mean-squared-error (NMSE) and bit-error-rate (BER).For instance, the beam-split is approximately 4 • (0.4 • ) for 0.3 THz with 30 GHz (60 GHz with 1 GHz) bandwidth, respectively for a broadside user [5].Fig. 1b shows beam-split (in degrees) with respect to the carrier frequency for different bandwidths.Large beam-split occurs at lower frequencies since the bandwidth is relatively larger with respect to the carrier frequency, and it becomes smaller as the carrier frequency increases.With the aforementioned THz-specific features, THz channel estimation is regarded as an even more challenging problem than that of mmWave.In prior studies, THz channel estimation has been investigated in [19,[22][23][24][25][26][27][28][29][30].However, most of these works either ignore the effect of beam-split [22][23][24][25]31] or consider only the narrowband case [26][27][28], without exploiting the ultra-wide bandwidth property, the key reason of climbing to THz-band.Further to that, an machine learning (ML)-based SBL architecture used in [32] for mmWave channel estimation in the presence of beam-squint assumes SI steering vectors to construct the wideband channel.Hence, its performance is limited and, indeed, no performance improvement was observed by enhancing the grid resolution.In fact, the beam-split compensation requires certain signal processing or hardware techniques to be handled properly.
Despite the prominence of THz channel estimation, in the literature, there are only a few recent works on beam-split mitigation.The existing solutions are categorized into two classes, i.e., hardware-based techniques [29] and algorithmic methods [19,30].The first category of solutions consider employing time delayer (TD) networks together with phase shifters to realize virtual SD analog beamformers to mitigate beam-split.In particular, [29] devises a generalized simultaneous orthogonal matching pursuit (GSOMP) technique by exploiting the SD information collected via TD network hence achieves close to minimum MSE (MSE) performance.However, these solutions require additional hardware, i.e., each phase shifter is connected to multiple TDs, each of which consumes approximately 100 mW, which is more than that of a phase shifter (40 mW) in THz [5].The second category of solutions do not employ additional hardware components.Instead, advanced signal processing techniques have been proposed to compensate beam-split.Specifically, an OMP-based beam-split pattern detection (BSPD) approach was proposed in [19] for the recovery of the support pattern among all subcarriers in the beamspace and construct oneto-one match between the physical and spatial (i.e., deviated due to beam-split in the beamspace) directions.Also, [33] proposed an angular-delay rotation method, which suffers from coarse beam-split estimation and high training overhead due to the use of complete discrete Fourier transform (DFT) matrix.
In [30], a beamspace support alignment (BSA) technique was introduced to align the deviated spatial beam directions among the subcarriers.Although both BSPD and BSA are based on OMP, the latter exhibits lower NMSE for THz channel estimation.Nevertheless, both methods suffer from inaccurate support detection and low precision for estimating the physical channel directions.
Due to short transmission range in THz, near-field sphericalwave models are considered for THz applications [5,34].In far-field model, the transmitted signal arrives at the users as plane-wave whereas the plane wavefront is spherical in the near-field if the transmission range is shorter than the Fraunhofer distance [34,35].Thus, the near-field effects should be taken into account for accurate channel modeling.In [34], THz near-field beamforming problem has been considered and TD network-based approach was proposed while the THz channel was assumed to be known a prior.In a different line of research, the beam-squint problem is investigated in [36] for the massive MIMO system with low resolution analog-digital converters.
In this paper, THz channel estimation in the presence of beam-split is examined.By exploiting the extreme sparsity of THz channel, we first approach the problem from the sparse recovery optimization perspective.Then, we introduce a novel approach to model the beam-split as an array perturbation as inspired by the array imperfection models, e.g., mutual coupling, gain-phase mismatch, in array signal processing context [37][38][39].The array perturbation model allows us to establish a linear transformation between the nominal (constructed via physical directions) and actual (beam-split corrupted) steering vectors.Next, we propose a sparse Bayesian learning (SBL) approach to jointly estimate both THz channel and beam-split.The SBL method has been shown to be effective for sparse signal reconstruction from underdetermined observations [40][41][42].Compared to other existing signal estimation techniques, e.g., multiple signal classification (MUSIC) [43] and the p -norm (0 ≤ p ≤ 1) techniques such as compressed sensing (CS) [44], SBL outperforms these techniques in terms of precision and convergence [45].Although SBL has been widely used for both channel estimation [39] and directionof-arrival (DoA) estimation [38,46], the proposed SBL-based channel estimation (SBCE) approach differentiates from the these studies by jointly estimating multiple hyperparameters, e.g., physical channel directions and beam-split as well as array the perturbation-based beam-split model.The proposed SBCE approach is advantageous in terms of complexity since it does not require any additional hardware components as in TD-based works, and it exhibits close to optimum performance for both THz channel and beam-split estimation.Furthermore, the estimation of beam-split allows one to design the hybrid beamformers in massive/UM MIMO systems more accurately in a simpler way.The main contributions of this work are summarized as follows: 1) We propose a novel approach to model beam-split as an array perturbation, and design a transformation matrix between the nominal and actual steering vectors in order to estimate both THz channel and beam-split.2) An SBL approach is devised to jointly estimate the physical channel directions and beam-split.In order to reduce the computational complexity involved in SBL iterations, we exploit the LoS-dominant feature of THz channel and design the array perturbation matrix based on a single unknown parameter, i.e., the beam-split.Thus, instead of designing the array perturbations via a full matrix with distinct elements, we consider a diagonal structure, that can be easily represented via only beamsplit knowledge.3) We also propose a model-free approach based on federated learning (FL) to ease the communication overhead while maintaining satisfactory NMSE performance.Different from our preliminary work in [30], we consider SBCE method for data labeling, hence it is called sparse Bayesian FL (SBFL).4) We investigate the near-field considerations for THz transmission, and derive the near-field beam-split, which is both range-and direction-dependent. 5) In addition, the theoretical performance bounds for both physical channel directions and beam-split estimation are examined and the corresponding Cramér-Rao lower bounds (CRBs) have been derived.
Paper Organization: The rest of the paper is organized as follows.In Sec.II, we present the array signal model and formulate the THz channel estimation problem.Next, our proposed SBCE approach is introduced in Sec.III.The nearfield model for beam-split and the proposed SBFL approach are given in Sec.IV and Sec.V, respectively.We present extensive numerical simulations in Sec.VI, and finalize the paper with conclusions in Sec.VII.

II. SIGNAL MODEL
We consider a downlink scenario for multi-user wideband MIMO system with OFDM (orthogonal frequency division multiplexing) employing M subcarriers.We assume that the base station (BS) employs N T antennas together with N RF RF chains to communicate with K single-antenna users.Hence, the data symbols, is constructed in the time domain by employing an M -point inverse DFT (IDFT).Then, the SI analog beamformers F RF ∈ C NT×NRF (N RF = K < N T ) are used to process the cyclicprefix (CP)-added signal.The analog beamformers, which are SI and realized with phase-shifters, have constant-modulus constraint, i.e., F = M K enforces the transmission power.We can define the N T × 1 transmitted signal as which is then received at the kth user as where w k [m] ∈ C corresponds to the additive white Gaussian noise (AWGN), and we have w k [m] ∼ CN (0, µ 2 ).

A. THz Channel Model
The THz channel is modeled by combination of a single LoS path together with a few NLoS paths, which are weak due to reflection, scattering and refraction [1,[6][7][8].In addition, the ray-tracing techniques for THz communication also assume the extreme sparsity of the THz channel and state that the channel is dominated by the LoS component for the graphenebased nano-transceivers [1] while the other channel models, e.g., 3GPP channel model [47,48] are also widely used for THz transmission.Then, the THz delay-d MIMO channel is given in time domain as where α l k ∈ C denotes the complex path gain and L is the total number of paths.The time delay of the lth path is denoted by τ k,l , for which l = 1 corresponds to the LoS path, and p(•) denotes the pulse-shaping function for T s -signaling.Also, a(ϑ k,l ) ∈ C NT denotes the array steering vector corresponding to the physical direction for the lth path of the kth user ϑ k,l = sin θk,l with θk,l In particular, for a uniform linear array (ULA) 1 , the steering vector a(ϑ k,l ) ∈ C NT is defined as where λ c = c0 fc , for which c 0 is speed of light and f c denotes the carrier frequency.Furthermore, d = λc 2 the halfwavelength array element spacing.Performing M -point DFT of the delay-d channel in (3) yields where D ≤ M is the CP length [22,49].Define the mth subcarrier frequency as , where B is the bandwidth.
Due to large bandwidth in THz systems, the channel direction ϑ k,l becomes SD in spatial domain since f m = f c [7,19,50].Therefore, the physical direction ϑ k,1 deviates to the spatial direction θ k,m,l , which is defined as Notice that ϑ k,l = θ k,m,l if f m = f c , i.e., the narrowband scenario.Therefore, we define the channel vector of the kth user at the mth subcarrier in frequency domain as where a (θ k,m,l ) ∈ C NT denotes the SD array steering vector under the effect of beam-split and it is defined as Then, the channel model in ( 7) can be expressed in a compact form as where In the following lemma, we show that the beam-split causes deviations in the channel directions such that maximum array gain is achieved at the beam-split deviated directions. 1 Although ULA is considered in this work, the proposed approach can straightforwardly be extended for other array geometries, e.g., uniform rectangular array (URA), array-of-subarray (AoSA) [1,13] or group-of-subarrays (GoSA) [2].
Lemma 1.Let a (θ m ) and a(ϑ) be the actual and nominal steering vectors for an arbitrary physical direction ϑ and subcarrier m ∈ M as defined in ( 8) and ( 4), respectively.Then, a(θ m ) achieves the maximum array gain, i.e., Notice that both spatial and physical directions converges as θ k,m,l ≈ ϑ k,l if f m ≈ f c , as shown in Fig. 1(c).This allows one to design SI analog beamformers (F RF ) for m ∈ M in conventional wideband mmWave systems [12,21,51].However, beam-split implies that the physical directions ϑ k,l deviate and completely split from the spatial directions θ k,m,l as the system bandwidth widens, as shown in Fig. 1(b).Hence, we define the beam-split as the difference between the spatial and physical directions as which only depends on the frequency ratio fm fc and physical direction ϑ k,l .

B. Problem Formulation
Due to the impact of beam-split, the spatial channel directions differ at each subcarrier.Therefore, a subcarrier-wise channel estimation is required to accurately estimate the channel.In downlink, the channel estimation stage is performed simultaneously by all the users during channel training.Since the BS employs hybrid beamforming architecture, it activates only a single RF chain in each channel use to transmit the known pilot signals during channel acquisition [7,19,22].Then, the BS employs P beamformer vectors as Hence, the pilot signals of P frames are used to reconstruct the channel.By combining the received signals at the singleantenna users for p = 1, • • • , P , the P × 1 received signal at the kth user is where we have Defining the identical pilot signals for all subcarriers, i.e., B = F[m] and S[m] = I P [29,49,52], (12) becomes Estimating the channel from ( 13) is performed via the traditional least squares (LS) and MMSE techniques as where } is the channel covariance matrix.The estimators in ( 14) may need additional priori information, e.g., covariance matrix or require P ≥ N T , which can be computationally prohibitive for the computation of matrix inversions as well as entailing high channel training overhead and time for large arrays.Further to that, the LS approach does not take into account the effect of beam-split, and prior information on the channel covariance is needed in MMSE estimator.
Hence, our goal is to efficiently estimate h k [m] for m ∈ M in the presence of beam-split when the number of received pilot signals is small.To this end, we exploit the sparsity of the THz channel and introduce an SBL-based approach in the following.

III. SBL FOR THZ CHANNEL ESTIMATION
The SBL method has been shown to be effective for sparse signal reconstruction.In this section, we first present the sparse THz signal model and introduce the proposed array perturbation model and an efficient approach for joint THz channel and beam-split estimation.

A. Sparse THz Channel Model
Due to employing UM number of antennas to compensate the significant path loss in THz frequencies, the channel is extremely-sparse (L N T ) in the beamspace [6,13].In order to exploit the sparsity of THz-band, the channel in ( 7) is expressed as where NT to represent the SI beamformer matrix at the BS.Since there is a single RF chain at the user, the received pilot data during P frames are stacked in a P × 1 vector as By exploiting the sparsity of x k [m], the following 1 -norm sparse recovery (SR) problem is formulated, i.e., where the residual term is bounded by ≤ µ P + κ √ 2P , for which κ corresponds to an adjustable tuning parameter in order to control the corruption due to noise [37,44].One can solve (17) for the L-sparse vector xk [m], and the channel can be reconstructed as ĥSR which does not include beam-split correction.

B. Array Perturbation Based Beam-Split Model
Due to beam-split, the true physical channel directions are observed with deviations in the spatial domain at different subcarrier frequencies since frequency of the digital beamformer changes while the frequency of the analog beamformer remains the same.Therefore, conventional channel estimation approaches fail and the effect of beam-split should be taken into account.To accurately mitigate the effect of beam-split, we approach the problem from an array calibration point of view, wherein the beam-split is modeled as an array perturbation, e.g., mutual coupling, gain/phase mismatch [37][38][39].Define U k [m] ∈ C NT×NT as the array perturbation matrix which maps the nominal array steering matrix, i.e., A k to A k [m] which is perturbed due to SD beam-split.Then, the channel model in ( 7) is where provides a linear transformation between the steering matrices corresponding to physical and spatial directions as Now, our aim is to rewrite the received signal y k [m] as the linear combination of receiver output corresponding to the physical channel directions and some perturbation term.Therefore, we first write where Qk [m] is an N T × N 2 T matrix with the following structure, i.e., where , which includes the array perturbation terms in U k [m].Now, we can rewrite the received signal model in ( 16) by using (20) and (21) as Now, our aim is to rewrite the received signal model with the combination of the nominal steering vectors (i.e., BA k ) and the array perturbation parameters (i.e., u).
The perturbed array formulation in (23) explicitly shows the relationship between the received signal y k [m] and the perturbation parameters corresponding to the beam-split u k [m].Hence, one can minimize the fitting error between y k [m] and Pk xk [m] to determine the physical channel directions as well as estimating the perturbation due to beam-split, as will be introduced in the following.

C. SBL
In the proposed SBL framework, we rewrite (23) in an overcomplete form.Hence, only in this part, we first drop the subscript (•) k and [m] for notational simplicity, and discuss the channel estimation stage for multi-user multi-subcarrier case later.Next, we use Q ∈ C P ×N 2 T instead of Qk [m], and define P = BF ∈ C P ×N as the overcomplete version of Pk ∈ C P ×L covering the whole angular domain with φ n , n = 1, • • • , N .Also, we have P = BUF ∈ C P ×N , i.e., the overcomplete version of the perturbed steering matrix 23) is expressed as follows where x, u and w are N × 1, N 2 T × 1 and P × 1 vectors, respectively, as defined earlier with removed subscripts.Now, we introduce a new set of variables σ = [σ 1 , • • • , σ N ] T as the variance of sparse vector x ∼ CN (0, Σ), where Σ = diag{σ}.Then, we derive the statistical dependence of the received signal y on the unknown parameters, i.e., the sparse vector x, perturbation parameter u, and the noise variance µ 2 .Toward this end, we look for the maximum likelihood estimate of these parameters to reconstruct the channel vector from the support of x as ĥ = Dx.
The likelihood function of the received signal in ( 24) is By using (25), we can write the probability density function (pdf) of y with respect to the hyperparameters σ, u and µ 2 as where Π y ∈ C P ×P is the covariance of y as and R y = yy H ∈ C P ×P .Define the unknown parameters in a hyperparameter set S = {σ, u, µ 2 }.Then, S is estimated by maximizing the pdf in (26), which is non-concave and computationally intractable due to the nonlinearities.Hence, we resort to the expectationmaximization (EM) algorithm, which iteratively converges to a local optimum [38][39][40][41][42].Each EM iteration comprises two steps: E-step for inference, and M-step for hyperparameter estimation by maximizing the Bayesian expectation of the complete probability p(y, x, σ, u, µ 2 ).Specifically, the E-step involves the evaluation of the log-likelihood of the complete probability p(y, x, σ, u, µ 2 ), i.e., L y, x; σ, u, µ 2 .Then in the M-step, it is maximized to estimate the hyperparameters {σ, u, µ 2 }.This is done by first computing the partial derivations of L y, x; σ, u, µ 2 with respect to each of the hyperparameters and equating them to zero.Finally, we obtain the closed-form expressions to update each of the hyperparameters as discussed below.
1) E-Step (Inference): In the E-step of the SBL algorithm, the log-likelihood of the probability p(y, x, σ, u, µ 2 ) is evaluated, i.e, L y, x; σ, u, µ 2 = E{ln p(y, x; σ, u, µ 2 )}, (28) which can be written as L y, x; σ, u, µ 2 = E{ln p(y|x; u, µ 2 ) + ln p(x; σ)}, (29) since p(y|x; u, µ 2 ) and p(x; σ) do not depend on σ and {u, µ 2 }, respectively [38,39].Then, the first term of ( 29) can be obtained from (25) as and the second term in ( 29) is given by since x ∼ CN (0, Σ) [53].Combining ( 30) and ( 31), ( 28) becomes 2) M-Step (Hyperparameter Estimation): Now, we consider estimating the hyperparameter set S by maximizing L{y, x; σ, u, µ 2 }.To this end, we first compute the partial derivatives of (32) with respect to the unknown parameters as where n = 1, • • • , N .By setting the above derivatives to zero, we can get the following expressions, i.e., The expression E{ y − P x 2 2 } in (36) can be written as by utilizing the posterior density of x is p(x|y) ∼ CN (z, Π), Furthermore, the (i, j)th element of the first expectation term in (38) is computed as where M i ∈ C P ×NT denotes the derivative of BU with respect to u i as [46].In other words, the ith column of Q is given by Further, using (24), we can compute E{xx H } in (40) as Then, (40) becomes Also, the ith element of the expectation E Q H (y − Px) in ( 38) is obtained as T .Finally, we write the expectation-free expressions for the EM algorithm in closed-form as for n = 1, • • • , N and i, j = 1, • • • , N2 T .By computing ( 46)-( 48) iteratively, one can estimate the parameters σ, u and µ 2 .Since the EM algorithm is proved to be convergent [38][39][40][41], most of the terms in σ and x converge to 0, yielding to a sparse profile.

D. Low-Complexity Approach for Array Perturbation Update
Updating u involves the computation of (Q (38) which are computationally prohibitive due to large number of unknown parameters, i.e., N 2 T , especially when the number of antennas is high.Therefore, we propose a low-complexity approach in the following.
Instead of employing an N T × N T full matrix U k [m] in (18) to represent the array perturbations, we exploit the LoSdominant feature of the THz channel, we assume that L = 1 by neglecting the NLoS paths, which are approximately 10 dB weaker than the LoS paths [1,8].This allows us to model where a k [m] ∈ C NT and a k ∈ C NT correspond to the perturbed and nominal array steering vectors for the LoS paths, respectively.In (49), corresponds to the perturbation due to beam-split and its ith entry is defined as Notice that this approach involves only a single unknown parameter, i.e., ∆ k [m] to perform the update from a k [m] to a k [m] whereas N 2 T parameters are involved in (38).Furthermore, the transformation in ( 49) also allows us to accurately estimate the beam-split as presented in the following lemma.Lemma 2. Let θ k,m be the spatial channel direction, then the beam-split introduced at the mth subcarrier is uniquely recovered as where Proof: Consider the steering vector a(θ k,m ).Then, using (11), the ith entry of a(θ k,m ) is given by Now, we find the angle of ] T } to ensure not losing information due to the [−π, π] periodicity of the exponential 2 .Using Ω k [m] and (11), the ith element of the steering vector corresponding to the physical direction Substituting ( 53) and ( 52) into (49 Then, taking average of Lemma 2 allows us to implement the SBL iterations with significantly lower complexity.We present the algorithmic steps of SBCE in Algorithm 1.In particular, the SBCE is  Π y (t) = P (t) Σ (t) P (t) H + µ 2 (t) I P .9: 10: 11: 12: Update σ 13: Find φ (t) 14: Construct a(φ Update C (t) as a( fm fc φ Update P (t) = BC (t) D.

17:
Update 55)-( 60), find the refined directions θk .24: Beam-split estimate: k [m] involves the array perturbations due to beam-split and providing a linear transformation between the nominal and actual steering vectors as a( fm fc φ Once the SBCE converges, the beam-split and physical direction θk are estimated.Then, the channel support from ( 15) is reconstructed as In the following, we discuss refining the estimated physical directions.

E. Refined Direction Estimation
The estimation accuracy of the direction estimates obtained via the support of x is subject to the angular resolution of the selected grid, i.e., ρ = 1 N .Hence, once the EM algorithm terminates, we use the resulting estimates and search for the refined directions by employing a finer grid [38,39].Define the finer grid for the lth physical direction ϑ l as where the grid interval is 2ϕ.Then, we rewrite the covariance of the received signal Π y as where g (ϑ l ) = BCa(ϑ l ) ∈ C P and η l is the power of the lth signal.Π y\l denotes to the covariance matrix in ( 27) by excluding the columns of P corresponding to ϑ l .By substituting ( 55) into ( 26), we can obtain the refined directions by maximizing (25), wherein we substitute η l and ϑ l as { θl , ηl } = arg max η l ,ϑ l f (η l , ϑ l ), where f (η l , ϑ l ) is for which the equivalent minimization problem is given by { θl , ηl } = arg min η l ,ϑ l f (η l , ϑ l ), where To simplify (57), we first set the derivative ∂ f (η l ,ϑ l ) ∂η l to zero, which yields and insert (58 where ġ (ϑ l ) = ∂g (ϑ l ) ∂ϑ l .Using (59), we can write the refined direction estimation problem as

F. Computational Complexity
The complexity of SBCE is mainly due to the matrix computations in Algorithm 1, which are ) and O(N P + P N 2 + P 2 N ) (Step 17), respectively.Hence, the overall complexity order is O(T P N (5N 2 + 3P 2 + P N 3 T + 2) + 2P 3 + 3N 3 ).In addition, the complexity of the direction refinement process in Sec.III-E is proportional to the number of grid points to compute (60).

IV. NEAR-FIELD CONSIDERATIONS
Due to operating at high frequencies as well as employing extremely small array aperture, THz-band transmission may encounter near-field phenomenon for close-proximity users.Specifically, the far-field model involves the reception of the transmitted signal at the users as plane-wave whereas the plane wavefront is spherical in the near-field if the transmission range is shorter than the Fraunhofer distance, i.e., R = 2G 2 fc c0 where G is the array aperture [4,54,55].For ULA, we have G = (N T −1) c0 2fc , for which the Fraunhofer distance becomes R ≈ 2fc for large N T .Compared to its far-field counterpart, the near-field model involves additional range parameter.Define r as the distance between the user and the array center at the BS.Then, the ith element of the steering vectors for ULA in near-and far-field are given by [34] [a NF (ϑ, r)] i = e −j2π fm c 0 (r (i) −r) (61) [a FF (ϑ)] i = e j2π d fm c 0 where r (i) is the distance between the user and the ith BS antenna as Applying Taylor series expansion to the right hand side of (63) [35,54], we get where the last term tends to 0 in the far-field scenario, i.e., e −j2π fm c 0 ((r−(i−1) dϑ)−r) = e jπ fm fc (i−1)ϑ .Now, we calculate beam-split in near-field.Using (64) and d = c0 2fc , we rewrite (61) as )) = e jπ(i−1)θr,m,i , where θ r,m,i denotes the distance-dependent near-field spatial channel direction as Then, the near-field beam-split is defined as Notice that the near-field beam-split also depends on the antenna index i with an additional distance-dependent term while it converges to far-field beam-split i.e., ∆ r,i [m] ≈ ∆[m] as the user distance increases since lim r→inf fmc0(i−1) cos 2 θ 4f 2 c r = 0.

V. SBFL FOR THZ CHANNEL ESTIMATION
Since the THz system involves the UM number of antennas, the channel estimation stage may be computationally complex, for which the data-driven techniques, such as ML and FL can be employed to ease the computational burden.Hence, we propose learning-based approach, i.e., SBFL, for THz channel estimation.While traditional ML method can also be considered, FL is communication-efficient since it does not involve dataset transmission [56].Therefore, we consider an FL-based training scheme to provide a communicationefficient solution for THz channel estimation.
Denote ξ ∈ R Q as the vector of real-valued model parameters, and define D k as the dataset of the kth user.By training the model ξ on D k allows us to construct the nonlinear function f (ξ) as stands for the number of samples for the kth dataset.Also, k , respectively, denote the input and output data for the ith sample of the kth dataset.Thus, we have

A. Training
To begin with, we construct the input-output data used for training.We define the input data by X k ∈ R P ×3 , which involves the real, imaginary parts and angle of the received pilot signal y k [m].Note that we use a threechannel input data for enhanced feature extraction performance [56,57].As a result, we have In addition, the channel estimate obtained via SBCE, i.e., ĥk [m], is used to construct the output data.Thus, we define the output as where i = 1, . . ., D k and k = 1 . . ., K. In (68), L k (ξ) = k || 2 F denotes to the loss function at the kth user and the wireless channel estimate at the kth user is given by f (X k .The FL problem in (68) is effectively solved via stochastic gradient descent algorithms, for which the learning model parameters ξ is iteratively computed by each user, and then aggregated at the cloud server via BS 3 .Therefore, we consider the model update rule for the jth iteration as for j = 1, . . ., J. Here, the gradients of the model parameter ξ j and ε is the learning rate.

B. Communication Overhead
Similar to the previous works, the communication overhead is defined as the amount of transmitted data during model training [57,59,60].In particular, the overhead of CL mainly due to the dataset transmission of K users whereas the overhead of FL involves the model parameter transmission of K users for J iterations.As a result, we define the overhead of CL and FL as T FL = 2QJK and

VI. NUMERICAL EXPERIMENTS
In this section, the performance of our SBCE technique is evaluated and compared with the state-of-the-art techniques such as LS and MMSE in (14), GSOMP [29], OMP, BSPD [19] and BSA [30].Note that MMSE is selected as benchmark since it relies on the channel covariance matrix at each subcarrier, hence it presents beam-splitfree performance.The NMSE is computed as NMSE = For SBFL, the FL toolbox in MATLAB is employed for data generation and training.The learning model is a CNN with 12 layers and Q = 603, 648 learnable parameters [30,57].Specifically, the first layer has the size of P × 3 for input.The {2, 4, 6, 8}th layers are convolutional layers, and they are constructed with 128 kernels of size 3 × 3.Each convolutional layer is followed by a normalization layer in the {3, 5, 7, 9}th layers.There is a fully-connected layer in the 10th layer of size 1024, which is followed by a dropout layer.The last 12th layer is the output layer of size 2N T ×1.During data generation, 100 channel realizations are generated.In order to provide robust performance against the imperfections in the receive data, we added AWGN on the input received pilot data for three signalto-noise-ratio (SNR) levels, i.e., SNR = {15, 20, 25} dB and 50 noisy realizations [57].The CNN is trained for J = 100 with the rate of ε = 0.001.Fig. 2 shows the NMSE performance of the competing algorithms with respect to SNR.We observe that the proposed SBCE approach attains better NMSE as compared to other algorithmic-based approaches, i.e., GSOMP, BSA and BSPD.The direct application of both OMP and LS yields poor NMSE since they do not involve any specific mechanism to mitigate beam-split.While BSPD and BSA are beam-split-aware techniques, they suffer from inaccurate support detection, which leads to precision loss as SNR increases.Although GSOMP employs SD dictionary matrices with TDs, SBCE exhibits lower NMSE than that of GSOMP.This is mainly due to two reasons that SBCE performs better support recovery than GSOMP, which is an OMP-based algorithm [45,53,61], and GSOMP does not include refined direction estimation stage.In order to assess impact of refined direction estimation, Fig. 3 shows the channel estimation performance with coarse and refined direction estimates.During simulations the same coarse dictionary is employed for OMP, GSOMP and SBCE.Then, refined direction estimation method in Sec.III-E is run based on the estimate obtained from each algorithm.We can see that no significant improvement is obtained for OMP algorithm since it does not take into account the impact of beam-split.However, improved NMSE is achieved, especially in high SNR, for GSOMP and SBCE.Again, we obtain similar observations as in Fig. 2, that is, SBCE has better performance than that of GSOMP in both coarse and refined direction scenario thanks to its accurate support recovery.
In order to demonstrate the physical direction estimation performance, RMSE of the estimated path directions is presented in Fig. 4.While the proposed SBCE approach follows the CRB closely, the remaining methods suffer from grid error and yield approximately 0.1 • RMSE.
Comparing SBCE and SBFL, we observe that the latter , which exhibits approximately 11 times lower than that of CL.Thus, the proposed SBFL approach is an efficient tool especially for THz-related datasets, which usually involve large number of antennas.In Fig. 5, beam-split estimation RMSE is shown with respect to SNR.As it is seen, the proposed approach effectively estimates the beam-split and follows the statistical lower bound CRB, which is derived in Appendix C, with a slight performance gap (e.g., ∼ 0.01 • in 0 dB).The channel estimation techniques, i.e., GSOMP, BSPD and BSA, however, do not have the capability to obtain the beam-split.
Fig. 6 shows the channel estimation NMSE against bandwidth for the interval of [1,30] GHz at 300 GHz carrier frequency.We can see that all of the algorithms provide approximately 0.02 NMSE for the bandwidth B < 7 GHz.However, the performance of LS and OMP degrades as the bandwidth widens since these method do not involve beamsplit mitigation.Also, BSPD yields a slight NMSE loss at wide bandwidth while BSA has robust performance.On the other hand, the proposed SBCE method enjoys robustness against the increase of the bandwidth.
Finally, we investigate the deviation from far-field model in terms of transmission range and frequency.Fig. 7   various frequencies and corresponding Fraunhofer distances.As it is seen, the NMSE curve crosses the Fraunhofer distance approximately at 0.0013 for all frequencies at different ranges.Specifically, NMSE crosses the Fraunhofer distance at 32 m for f c = 300 GHz, which shows that the near-field model should be considered for r < 32 m when N T = 256.Note that this distance is smaller if rectangular arrays are used.For instance, a 16×16 URA has the array aperture of G = 16 √ 2 d leading to R = 0.256 m, which yields much safer use of farfield model in shorter distance.Fig. 7(b) illustrates the channel estimation NMSE with respect to transmission range when farfield model is used in GSOMP and SBCE.We observe that relatively large error for r < R due to mismatch between the far-and near-field steering vectors.Although fixing the direction information may yield less error [29], which only due to range-mismatch, the mismatch in the range also causes inaccurate direction estimates.

VII. CONCLUSIONS
In this work, we proposed an SBL-based approach for joint THz channel and beam-split estimation.The proposed method is based on modeling the beam-split as an array perturbation.We have shown that the proposed SBCE approach effectively estimates THz channel as compared to existing methods without requiring additional hardware components such as TD networks.The SBCE technique is also capable of effectively estimating the beam-split.This is particularly helpful to design the hybrid beamformers more easily with the of beam-split.Also, a model-free technique, is introduced to realize the from ML perspective for computationaland communication-efficiency.Furthermore, we examined the near-field considerations of the THz channel and evaluate the performance of SBCE with far-field model.The estimation of the near-field THz channel is one particular problem that we reserve to study for future work.

APPENDIX A PROOF OF LEMMA 1
The array gain varies across the whole bandwidth as By using ( 8), ( 70) is rewritten as The array gain in (71) implies that most of the power is focused only on a small portion of the beamspace due to the power-focusing capability of ξ(a), which substantially reduces across the subcarriers as |f m − f c | increases.Furthermore, |ξ(µ m )| 2 gives peak when µ m = 0, i.e., f c θ m − m ϑ = 0. Thus, we have θ m = η m ϑ, which completes the proof.
APPENDIX B DERIVATION OF (39) In order to obtain (39), we utilize the Bayes rule to obtain the posterior density of x as p(x|y) ∼ CN (z, Π) where the mean and the variance are and respectively [38,62].Now, E{ y − P x 2 2 } is rewritten as In order to compute E{(x − z)(x − z) H }, we first obtain E{zz H } by utilizing (72) as where Π from (73) can be written via applying matrix inversion lemma as

APPENDIX C CRAM ÉR-RAO LOWER BOUND
In order to the CRB, we assume a single user case, i.e., K = 1 without loss of generality.The CRB expressions can easily be extended for the K > 1 case by using the decoupledness of the unknown parameters of multiple users.Define the unknown parameter vector as the physical directions and beam-splits corresponding to L paths.Hence, we have a 2L×1 unknown vector as Consider the log-likelihood function of the joint pdf in (26) with respect to v as In order to obtain the CRB, we first need to find the Fisher information matrix (FIM), which measures the amount of information contained for the unknown variables.Let FIM ∈ R 2L×2L be the FIM.Then, the (i, j)th entry of FIM is calculated as the second derivative of L(v) as Since E{R y } = Π y , (85) is rewritten as which is the general FIM expression for Gaussian signals.
Then, by following the steps in [37,63], the CRB expressions corresponding to the (i, j)th entries of the FIM is given by where M = Σ H A H Π −1 y A Σ ∈ C L×L , where Σ is an L × L matrix comprised of signal powers.K ij ∈ C L×L includes the derivative of the actual steering matrix A with respect to the unknown parameters, and it is given by In particular, the ith entry of the derivative of a (ϑ l ), i.e., the lth column of A with respect to the physical direction θl and beam-split ∆ l are respectively given by where we have via ( 8) and ( 11) that [a ] i = e jπ(i−1) fm fc sin θl = e jπ(i−1)(∆ l +sin θl ) .
In case of near-field scenario, the range parameters are also added to the unknown vector defined in (83), hence its size becomes 3L × 1.By following the steps from (84) to (90), the CRB for the near-field scenario can be derived, wherein the derivatives of the steering vector element in (65), i.e., [a NF ] i = e j(π fm fc (i−1)(ϑ l − c 0 (i−1) cos 2 θl 4fc r )) = e jπ(i−1)θr,m,i = e jπ(i−1)(∆ r,l +sin θl ) with respect to θl , r l and near-field beamsplit ∆ r,l are

Fig. 1 .
Fig. 1.THz-band characteristics: (a) path loss (in dB) due to molecular absorption for various transmission ranges, (b) beam-split (in degrees) with respect to the carrier frequency fc ∈ [50, 1000] GHz for different bandwidths, and (c) normalized array gain with respect to spatial direction at low, center and high end subcarriers for (Left) 3.5 GHz, B = 0.1 GHz; (Middle) 28 GHz, B = 2 GHz; and (Right) 300 GHz, 30 GHz, respectively.
Notation: (•) T and (•) H to represent the transpose and conjugate-transpose operations, respectively.For a matrix A; [A] ij and [A] k correspond to the (i, j)th entry and kth column while A † denotes the Moore-Penrose pseudo-inverse of A. A unit matrix of size N is represented by I N , ∇ represents the gradient operation, ξ(a) = sin N πa N sin πa is the Dirichlet sinc function, and Tr{•} stands for the trace operation.||•|| 0 , ||•|| 1 , || • || 2 and || • || F denote the 0 , 1 , 2 and Frobenius norms, respectively.

1 N
an overcomplete dictionary matrix composed of the steering vectors denoted by a(φ n ) ∈ C NT with φ n = 2n−N −for n = 1, • • • , N .Hence, the resolution of D is ρ = 1/N .Next, we assume that the users collects only P = N RF pilot signals, where L ≤ P N T .Further, we define B ∈ C P ×NT with |[B] i,j | = 1 √

2 ,
(a)  shows the normalized difference between near-and far-field steering vectors, i.e., with respect to transmission range for

Fig. 7 .
Fig. 7. (a) NMSE between near-and far-field steering vectors, and (b) channel estimation NMSE for the usage of far-field model with respect to transmission range.