Quadratic Displacement Operators—Theory and Application to Millimeter-Wave Channel Tracking

First, a family of quadratic displacement operators based on group Fourier Transform has been proposed for joint distribution analysis. Second, considering the quadratic displacement operators, a novel millimeter-wave massive MIMO channel tracking has been proposed in Time-Angle (TA) plane. Due to a poor scattering environment, millimeter-wave communication suffers from an ill-conditioned channel matrix. Computationally effective compressed sensing has been proposed to estimate a few dominant paths. These methods rely on orthogonal pilot signals to estimate the channel reliably. However, applying compressed-sensing solutions to the continuous channel leads to significant pilot overhead. To reduce the pilot overhead, one needs to consider the channel dynamics, particularly the spectral overlap between the channel samples. Considering that channel spectral properties can be represented as a joint time-angle distribution, we propose a novel quadratic displacement operator in TA-plane. Considering group Fourier transform in TA-plane, we show that the subderivative of an angular parameter is the dual group of the original angular signal. For a group transform defined over the finite Abelian group, with respect to the finite-dimensional antenna arrays, the purposed time-angle representation matches the cover space of manifold defined in the virtual channel model recommended by A. Sayeed. In TA-plane, the quadratic displacement operator maps old time-angle coordinate onto a new coordinate by taking to account the local spectral properties of the channel samples. By applying the purposed quadratic displacement operator to the Saleh-Valenzuela channel model, the time evolution operator for directional millimeter-wave channel has been derived. Assuming independently identically distributed multipath contribution, we show that channel Wigner distribution is the direct sum of the cluster-wise Wigner distribution. Accordingly, we have shown that the continuous linear time-variant channel transfer function can be represented as the Hadamard product between Wigner distribution of the channel and Saleh-Valenzuela model. Considering the fact that the density matrix is a Weyl correspondence of Wigner distribution, a union of subspaces method has been employed to construct the density matrix for channels samples with slightly different states. Numerical results have been presented to evaluate the performance of the proposed channel tracking method.


I. INTRODUCTION
T HE main challenge in millimeter-wave signal processing arises due to the sparsity of channel. While sub-GHz channels are multipath reach, directional communication schemes consist of few dominant multipaths. Consequently, millimeter-wave channel estimation turns to be a challenge because of its extremely ill-conditioned matrix representation. Considering large Transmitter (TX) and Receiver (RX) antennas, estimating all channel elements while it has been concentrated in a lower dimension is not efficient from computational and communicational points.
Compressed sensing-based solutions are recommended to overcome the sparsity challenge. However, they require orthogonal pilot transmission in each channel coherent block, which leads to computational and pilot overheads. If channel varies significantly, i.e., the Wide-Sense Stationary (WSS) and Uncorrelated Scattering (US) assumptions are satisfied over very short coherent blocks, then it needs to be estimated over shorter time and frequency intervals to keep beams between TX and RX aligned. Since wireless channels are continuous physical quantities, one can relieve channel estimation overhead by considering channel dynamics in terms of spectral overlap [1], [2]. The spectral overlap or correlation determines how fast eigenfunctions vary in TF-plane, and it can be used to obtain priori knowledge about channel states to update angular parameters, which reduces estimation overhead.
In this work, we propose quadratic displacement operators in TA-plane. The quadratic operators are applied to millimeter-wave channel to propose a novel channel tracking method. The notation is as in the following. ⊕ represents direct sum, and ⊗ is Kronecker product. ·, · is inner product, and || stands for vertical concatenation. For matrix A, A T denotes matrix transpose, A * is conjugate transpose, and vec (A) represents vectorized matrix. a T and a * are the transpose and conjugate transpose of vector a. The subscript c, l are used for cluster and subpath. The subscript t, and r specify a variable as T X and RX. The subscripts H refers to a channel, and H c refers to subchannel corresponding to cluster c. An operator is represented as A α and α is a displacement parameter. A superscript v, q, and w refer to virtual channel, quadratic operator, and Wigner distribution. Sets are included inside a curved brackets, such as {V }. A notation L 2 denotes square summable signal in the Hilbert-space. RF stands for Radio Frequency chain. SVD refers to Singular Value Decomposition. NMSE stands for Normalized Mean Square Error. SE refers to the Spectral Efficiency. DM and WD refer to density matrix and Wigner distribution. Gaus. denotes Gaussian transform.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ SRHT stands for Subsampled Randomized Hadamard Transform. In this letter, the expressions "tracking", "localization", "displacement" are used interchangeably.
In [3], the authors have recommended updating channel parameters using maximum likelihood. The last state of the channel is obtained by optimizing the log-likelihood function and applying the gradient ascent function. Although the recommended method is promising, its performance degrades rapidly at low SNR regions. The authors in [4] have recommended a separable channel tracking model where the channel is constructed as the product of priori estimated channel and time-variant steering vectors. The temporal variation of the steering vectors has been obtained by modeling AoD and AoA as autoregressive random processes. The value of convergence in mean square is computed locally to either perform channel estimation or tracking.
One can leverage the geometry of the RX trajectory. In [5], a geometry-stochastic approach has been recommended to model AoD and AoA as time-varying signals. The authors have derived a linear model for AoD and AoA by obtaining the slopes for certain RX trajectory. A similar approach has been adopted by the NYUSIM millimeter-wave channel simulator [6] to model spatial consistency for the 5G network. The linear variant model relies on deterministic geometric assumptions about the RX trajectory, which may not be satisfied at the tracking time.
Using a side-channel for channel tracking has been recommended for vehicular communication since the vehicle can afford to be equipped with additional hardware, such as GPS and radar. However, since a user device may not be equipped with the required additional hardware and software, a sidechannel-based solution may not be applicable for all scenarios. In [7], the authors have recommended beamforming by solving the state-space equation, where initial states are obtained via a GPS. The main drawback of this method is that additional steps are necessary to integrate GPS data into communication data, adding to the complexity of the problem. Joint radarmillimeter-wave channel estimation for indoor scenarios has been proposed in [8]. The authors have recommended probing the environment using a series of chirp signals to determine the reflection and delay profiles of the environment. Then, the best paths from the TX to RX are determined by classifying targets in the room. The recommended method's main challenge is its feasibility under the variety of reflection scenarios. In addition, feedback links are required to inform the TX and RX about the selected paths, which reduce the achievable spectral efficiency.
Fingerprinting approaches have been proposed in the literature to find optimum beams [9], [10]. Because of limited space, we do not discuss them in detail in this work.

B. Union of Subspaces
Conventionally, Nyquist sampling has been used to sample a signal that lies on a single subspace. However, in many applications such as multiband and directional communications, the signal lies in a lower-dimensional subspace that can be represented as the union of disjoint subspaces. For these signals, classical reconstruction of signal using Nyquist rate is not the most efficient sampling method. In [11], [12], a union of subspaces has been recommended to sample the signal using a low-dimension sampling operator that can almost perfectly reconstruct the signal. Using union of subspaces, a sparse space H with s dominant components and additional structure of statistically independent subspaces can be reconstructed as a direct sum of irreducible subspaces

C. Wigner Distribution, Weyl Symbol, and Density Matrix
The Wigner distribution defined in (2) is a joint distribution analysis tool widely used to study non-stationary signals W (t, f ) has some favorable properties. The most important property is finite support property, that is, for finite support signals, W (t, f ) is zero before and after the signal and does not scramble the spectral properties of the window and signal. The density matrix is an important class of positive definite operators that we use in section VII-B to compute Wigner distribution of a channel. Density matrix ρ ∈ L 2 -space is a probability distribution of the eigenfunctions of a system. It has been shown that the Wigner distribution has a Weyl symbol with respect to density matrix in the phase-space plane [13]. The Weyl correspondence of W (t, f ) with respect to density matrix is a projective operator where Θ (t) is a function quantized by the density matrix, and ψ Θ(t) = ρ, Θ is the set of basis in Hilbert-space Ψ obtained by projecting Θ (t) onto density matrix ρ.

III. MAIN IDEAS AND CONTRIBUTIONS
The main idea that we aim to answer in this work can be expressed in the following question. Can we design a measurement operator or equivalently learn a dictionary to update the current states of channel directly from the received signal? Although conventional compressed sensing algorithms such as Orthogonal Matching Pursuit (OMP) can estimate the current state of channel by projecting the received signal onto the sampling matrix, they use the entire spectrum of dictionary matrix. However, with a prior knowledge about the last states of a channel, one can predict a solution space that contains only the most possible spectrum. Consequently, one can decrease the computational and communicational overheads by avoiding unnecessary beamforming. The important point is the fact that the Doppler shift and changes in scatterer locations are reflected in the slope of angular function through

Observation 3.1 (Channel Dynamics and Tangent Bundles):
On the one hand, Θ (t) can be approximated locally using tangents such that local dynamics of channel can be modeled by the first subderivative operator D (1) θ for angular parameter θ at the neighborhood of a certain point. Then, we can assume that there exists a local solution space that contains all candidate tangents, as shown in Fig. 1a for Θ (t) by linearly approximating a path between θ 1 and θ 2 using the local tangents. Then, all points on line S 1 can be tracked using the best candidate tangent, as shown in Fig.1b. A set of all disjoint optimum tangents along all points on multipath angular functions provides tangent bundles of a channel. Consequently, a piece-wise linear approximation of the angular path using local tangents mitigates the channel estimation pilot overhead significantly such that the path between θ 1 and θ 2 can be modeled with eight estimations.
On the other hand, since the spatial frequencies lie in the column space of the dictionary matrix, local tangent bundles of Θ (t) are in correspondence with the eigenfunctions of channel. Knowing the possible tangents helps to measure support sets Ω AoD , and Ω AoA directly from the received signal. Then, channel tracking turns to determine the best tangent in a solution space. In order to find the optimum local tangent, channel needs to be estimated at two locations to determine the local dynamics of each angular function. In other words, infinitesimal generators need to be frequently estimated along the angular functions to find the optimum tangent bundles.
Observation 3.2 (Tangent Operators): Modeling the local dynamics of a system using tangents is not a new idea and has been known as the Taylor power series. In this work, we will show that localized parameters can be derived quadratically (5) enables us to define quadratic phase components in (28) and derive a family of quadratic displacement operators. Considering joint time-angle representation of the signal, we purpose to design quadratic displacement operators using a quadratic tangent operator. In particular, the quadratic modulation operator is represented by Wigner distribution, and it acts on a received signal to update support sets Ω AoD and Ω AoA . Unlike conventional compressed sensing algorithms such as Orthogonal Matching Pursuit (OMP), where the full spectral domain is projected on a measured signal to estimate a few dominant parameters, the purposed quadratic modulation operator projects received signal onto the solution space that contains the best tangent bundles to extract a few numbers of spectral components. This idea is formulated in the following Theorem.

Observation 3.3 (Subderivative Operators as Dual Groups):
Using group Fourier transform [14], we prove that the first-order subderivative operator is the dual group of the angular function Θ (t) with respect to modulation function m (t) = t. Equivalently, the local tangent bundles defined using infinitesimal generators are dual of the Θ (t) in a given manifold.
Theorem 3.1 (Channel Tracking Using Quadratic Localization Operators:) Let y ∈ L 2 R d be a measurement on channel transfer function H (t, f ). Also, let M w H be quadratic modulation operator acting on channel transfer function H (t, f ) displace covariantly in time-angle space with respect to displacement parameters (t, θ t , θ r ) → (t , θ t , θ r ) where θ t ∈ [−ι/2, ι/2] C and θ r ∈ [−ι/2, ι/2] C are TX and RX angular displacement, and C is the total number of i.i.d. multipath clusters. The warping operator W ρ is the unitary representation of a time translation of H (t, f ) with respect to τ ∈ R C , such that (W ρ H) ((t, f ) = H (t − τ , f). Then, a sequence of local extrema Ω AoD and Ω AoA are obtained as where M w y is a modified quadratic modulation operator that acts directly on the received signal y. Similarly, ρ y is a modified density matrix directly acts on y. Here, the expression directly is used to differentiate M w y that acts on the received signal from M w that acts on H (t, f ). Also, the notation refers to equivalent relation. M w H is the channel Wigner distribution for cluster c with respect to quadratic modulation operator where T and D are given in (36). Proof: The proof is given in section IV-B-(50), and section VII.

Theorem 3.2 (Continuous Linear Time-Variant Channel):
Let H (t, f ) be a channel transfer function. Also, let P q p , M w H , and W ρ be one-sided quadratic covariant displacement operator, a channel Wigner distribution, and a Warping operator, respectively. Considering time-angle displacement parameter p = (τ, θ), the continuous linear time-variant channel can be represented as the Hadamard product of the channel's Wigner distribution and Saleh-Valenzuela channel as In (8), denotes Hadamard produce, and ⊕ C−1 c=0 M w Hc W ρ c is the time evolutionary statement.
Proof: The proof is given in section IV-B and (49).

Observation 3.3 (Solution for Wigner Distributions):
In section VII, two solutions have been proposedwhere τ ∈ R C is a translation in time to compute the Wigner distribution of a channel. The first solution is based on the direct computation of Wigner distribution M w H from estimated AoD and AoA. In order to apply M w H to the received signal y, we purpose dimension matching using Subsampled Randomized Hadamard Transform (SRHT) SRHT matches dimension of M w H with y without changing the internal structure of the M w H . Accordingly, we will drive M w y,T X and M w y,RX that are acting on measurement vector y directly to track support sets Ω AoD and Ω AoA , respectively.
The second solution utilizes density matrix to compute ρ y,T X and ρ y,RX as equivalent but not equal operators of M w y,T X and M w y,RX . The density matrices ρ y,T X and ρ y,RX can be computed as direct sum of distinct eigenfunctions sets estimated at t − τ 1 and t + τ 2 .
Observation 3.4 (Non-Associative Property of Quadratic Displacement Operators:) In section IX, we prove that quadratic localization operators is non-associative and it is commutative in the lowest possible category of displacement. It will be proved that, the displacement operators developed in section IV-A (as a subcategory of quadratic operators) are commutative and associative for millimeter-wave channel with a short memory. However, at a higher category, quadratic operators obey internally and externally commutative property but not intra-commutative.
To the best of our knowledge, we are the first group to (1) derive Wigner distribution as a time evolution operator in the angle domain for the direction millimeter-wave communication and (2) formulate commutative and non-associative properties of quadratic localization operators.

A. Subderivative-Based Time-Angle Displacement Operators
We have employed tangent operators to approximate angular function Θ (t) at the neighborhood of certain coordinates (t, θ). In the following, generalized tangent approximation S (K) ρ,θ is defined as a polynomial segment that approximates K-times continuous differentiable function Θ (t) as a linear sum of subderivative operators D (K) θ computed locally for certain delay components. However, in the remaining of the paper, we will set K = 1. In the following discussion, H (t, f ) represents channel transfer function in Hilbert space L 2 C Nr×Nt for a wireless channel with TX and RX antennas of size N t and N r . ρ,θ acting on Θ (t) is defined as where D (K) θ is the subderivative operator representing group D := (D, ). For K = 1, (10) decays to (5). The resultant approximation error is directly proportional to the radius of a open neighborhood at a point t. Note that a set {D 1} defines a tangent bundle for a given angular function. Accordingly, the set of tangent bundles for C angular functions defines the tangent bundles for the antenna manifold.
In order to apply Definition 4.1 to steering vectors, directional cosine term is substituted with first-order tangent operator S (1) ρ,θ Θ (t) and the result has the following properties: where directional cosine is given as sin (θ). Fig. 2b shows examples for on a unit circle, S 1 . It follows that S 1 is the cover space of an angular function. For a multipath channel with C clusters, the cover space is a C-dimensional torus.
Proof: The modulation operator acting on H (t, f ) can be obtained as where m (t) = t is the modulation function, D and ψ T (m (t)) = t − τ are the homomorphisms for groups D and T, θ is the angular parameter, τ is time translation parameter, d is antenna spacing, λ is wavelength, and n is antenna element position. As depicted in Fig. 2, (12) has been localized at the certain point in TA-plane through intersect statement In general, ψ T (m (t)) and ψ D (θ) do not commute. Then, we need to include uncertainty statement as If time and angle variables commute, the uncertainty statement e j2πn d λ D (0) θ ρ2/2 disappears. For the underspread millimeterwave channel, τ max ν max 1 implies that M θ is commutative operator.
The exponential characteristic function is necessary to localize the tangent bundles at certain coordinate (τ, θ) as shown in Fig. 2. The principle bundle from [−ι/2, ι/2] into a unique point on S 1 is given as The domain of directional cosine sin (θ) needs to be defined over θ ∈ [−ι/2, ι/2], otherwise the identity and inverse elements on S 1 will not be unique. For a domain θ ∈ If the first tangent approximation error is negligible, modulation operation will be the unitary representation of the group Equivalently, we can say that (1) binary operation θ 1 θ 2 on [−ι/2, ι/2] is commutative, (2) inverse and identity elements are unique, and (3) where C is the number of i.i.d. multipaths, and M θi is The unitary representation for TX and RX in angle domain can be decomposed to the finite numbers of irreducible subspaces as where θ denotes either θ t or θ r , and U is the unitary DFT matrix. For doubly directional communication, a joint displacement in sparse angle domain is given as U r M θr M * θt U * t . Considering group Fourier structure of modulation operator, 1 Reader should note that m −1 indicates Fourier structure over finite field for steering vector. it follows that t represents signal domain and subderivative operator, D (1) θ , is the dual group of the Θ (t).

Lemma 4.2 (Warping Operator):
The warping operator W ρ corresponding to modulation function m (t) = t is a unitary representation of group (T, ) homomorphic to (R, +) through ψ T (τ ). Warping operator is equivalent to translation acting on H (t, f ) as Proof: Warping function with respect to displacement parameter τ ∈ T and modulation function m (t) = t is defined as If linear space H can be decomposed as independently identically distributed (i.i.d.) subspaces H 0 , H 1 , . . . , H C−1 , then each subspace can be described by restricted representation (16), W ρ is completely reducible and can be decomposed as Proof: where τ = t 0 + wτ r , t 0 refers to channel estimation time, τ r is channel tracking interval, and w ∈ [1, W ] is the tracking window index (refer to section VIII and Fig. 5).
In the next theorem, we show that the covariant displacement operator is a projective representation acting on H (t, f ). For this, we have adopted the following theorem from [16].
To prove composition law, let g 1 • g 2 = (τ 1 τ 2 , θ 1 θ 2 ), then we have where (a) is because (D, ) and (T, ) do not commute. If millimeter-wave channel varies slowly and channel experiences negligible delay-Doppler spread, i.e., τ max ν max 1, then the product statement D As shown in Fig. 3 and implied by (24), the representation error of G g is proportional to an area of triangle with sides D (1) θ and τ 2 , where D (1) θ is directly determined by instantaneous rate of variation of Θ (t) and has a range of −2/τ r ≤ D (1) θ = Δθ/τ ≤ 2/τ r . From (16) and (21), unitary and completely reducible covariant displacement operator G is obtained as a direct sum of restricted representations for covaraint displacement parame- Unitary equivalent of G g is formulated as

B. Expansion of Linear Operators to Quadratic Displacement Operators
This section expands tangent (10), modulation (11), warping (18), and covariant displacement (22) operators into their quadratic counterparts in (30), (31), and (38), (40), respectively. We have also proposed a one-sided quadratic covariant operator in (42) that models joint-distribution displacement as a product of quadratic modulation and warping operators. While quadratic modulation and warping operators respectively map marginal distributions onto angular and time spaces, the quadratic covariant displacement operator and its one-sided version map time-angle joint distribution onto new coordinates in the given joint time-angle distribution. Unlike their counterpart in section IV-A, the quadratic operators map old coordinates into new ones in a quadratic form. This leads to well-known Wigner distribution and provides a powerful tool to analyze joint-distribution signals.
Let assume that angular parameters of the channel are in quadratic form Θ(t) = at 2 +bt+c, where a, b, c ∈ R. At some arbitrary coordinate (t, θ), the first-order tangent operator is given in (5). In order to derive the quadratic tangent operator, let dt ρ . By substituting (5) into first order derivative, we obtain where Θ (±τ ) = D θ cannot be simplified further because of the localization restriction. We refer to this property as quadratic localization is not intra-commutative. We will discuss further the commutative and associative properties of quadratic operators in section IX. From (28), we define quadratic complex exponential function for joint time-angle distribution as where τ −1 = −τ and θ −1 = −θ are the inverse elements of groups T and D. Then, a class of quadratic tangent operators Q (1) η acting locally on Θ (t) is defined as where S (1) * ρ2,θ −1 is defined in (5). Note that because of the non-associative property of the (28) (refer to section IX), S The lack of association justifies defining quadratic operators over quasigroup. The quasigroup definition does not impose any additional structure onto the homomorphic relation ψ Proof: The important difference between the quadratic modulation operator M q θ and modulation operator M θ is that the former is defined on the non-associative quasiogroup while the latter is defined on the group. As a result of non-associative property, composition low for a quadratic modulation operator with four parameters A, B, A * , B * obeys the homomorphism e (A+B)+(A * +B * ) = e [A+B, . The parenthesis implies that the parameters A, B, A * , and B * cannot be interleaved.
For channel tracking, in general, we expect the tracking error to increase as the dynamics of channel increases. As shown in Fig. 3, the error is directly proportional to the ΔD = |D θ1 − D θ2 |. If the error in (33) approaches zero, i.e., if D (1) θ1 = D (1) θ2 , then channel tracking points are lying on the same tangent for Δτ = 0. In this work, we are interested in a scenario that D (1) If the error is large, Theorem 4.1 should be rephrased for distinct θ 1 and θ −1 2 as By taking the Fourier transform of (29) and setting τ 1 = τ 2 = τ/2, Wigner distribution for quadratic modulation operator is obtained as where D and T are Wigner distribution for quadratic modulation operators is obatined by extending (35) with respect to joint TX-RX angular parameters Θ t (t) and Θ r (t) as Accordingly, the unitary equivalent of (35) and (37) is defined as UM w θ U * . Theorem 4.2 (Quadratic Warping Operator): Quadratic warping operator as a projective representation of T 2 , + is acting on H (t, f ) from left with respect to modulation f (t) = t as Proof: Note the quadratic warping operator is the covaraince of the channel transfer function.  To prove composition law, let g = g 1 • g 2 = τ −1 1 τ 2 , θ θ −1 , then we can write where (a) is due error given in (33), (b) is because M θ and W ρ may not commute, and (c) is because as D (1) θ2 → D (1) θ1 , then the error goes to zero.
Theorem 4.4 (One-Sided Quadratic Covariant Displacement Operator:) A class of one-sided quadratic covariant displacement operators P q p for p = {τ, θ} as a unitary representation is acting on H (t, f ) as Proof: The proof is similar to the Theorem IV-B.

V. MILLIMETER-WAVE CHANNEL TRACKING USING QUADRATIC OPERATORS
This section applies quadratic displacement operators to track directional millimeter-wave channel. Assume that TX and RX are equipped with Uniform Linear Array antennas of size N t and N r . Let C N t N r and L be the number of clusters and subpaths in each cluster. Discrete Saleh-Valenzuela channel is expressed as the linear combination of i.i.d. multipaths In (43), τ c,l and ν c,l are the delay and Doppler shift per subpath. In addition, a t (θ t,c,l ) and a r (θ r,c,l ) are the TX and RX steering vectors that defined for a given desired directions θ = υ + θ as where θ stands for both θ t and θ r , and N is the array size. Let Θ t (t) = υ t (t) + θ t and Θ r (t) = υ r (t) + θ r be functions that denote the time-variant AoD and AoA. We update steering vectors by applying quadratic modulation operator M q θ from left to the steering vectors to obtain a time-variant steering vector. Finally, we derive a continuous linear time-variant channel model for doubly-directional channel where the steering vectors' time evolution characteristic is modeled via quadratic modulation operator in the TA-plane. From (31), we obtain modulated steering vectors in (45), where directional cosine terms are replaced by their correspondence quadratic tangent operators By expanding Q ρ −1 1 ρ2,θθ −1 Θ (t) and changing variables τ 1 → τ 1 /2 and τ 2 → τ 2 /2 (to be consistent with the time-frequency representation in the literature), we obtain for n ∈ [0, . . . , N − 1] where N = N t for TX and N = N r for RX. From (46) and (45), the time-variant steering vector is given as where parenthesis indicate the order of operation since the quadratic modulation operators are not intra-commutative. The adjoint operator of (47) is defined as M q * θ = M q −θ . Finally, we update discrete MIMO channel model in (43) using time-variant steering vector in (47) and its conjugate for RX and TX, respectively. The continuous linear time-variant channel transfer function is obtained as P q p H (t, f ) In (49), the subscript H c stands for subspace related to cluster c. P q p is a reducible representation that can be decomposed as the direct sum of irreducible restricted representation of subspaces. Note that the subspaces are disjoint, i.e., Then, the restricted Wigner distribution for joint and marginal TX-RX steering vectors per cluster c and subpath l are Hc maps (t, θ t,c,l , θ r,c,l ) to a new coordinate (t + τ, θ t,c,l + υ t,c,l , θ r,c,l + υ r,c,l ). For i.i.d. multipaths, (50)-(52) can be formulated as a direct sum of subspaces as M w It is possible to compute the Wigner distribution directly by estimating channel at t 0 − ρ1 2 and t 0 + ρ2 2 . Alternatively, one can compute the Wigner distribution using density matrix. Furthermore, we need to apply M w H directly to the received signal y.

A. Sparse Problem Formulation for Channel Estimation
The unitary transform of (43) in the angle domain can be obtained using DFT matrices U t and U r as Let F ∈ N t × N t rf and Z ∈ N r × N r rf denote RF processors for TX and RX a hybrid analog-digital beamforming networks with N t rf and N r rf RF chains. Considering OFDM like scheme, at each frame m, TX transmits N t rf pilot signal S m , each length of N s . The filtered received signal at frame m is obtained as the superposition of the transmitted N t rf × N s signals as where e m ∼ N (0, σ n ) is the zero mean noise component. We can apply Kronecker vectorization as vec (ABC) = C T ⊗ A vec (B) to (54) to obtain where ⊗ is Kronecker product. Substitute (53) into (55), we have We represents measurement matrix by φ m = S T m F T m ⊗ Z * m , fixed DFT dictionary matrix by Ψ = (U * t ⊗ U r ), and h v = vec (H v ). By assumeing block fading during the channel estimation stage, the concatenated filtered received signal for M pilots is given as where Φ = [φ 0 ||φ 1 || · · · ||φ M−1 ] and e = [e 0 ||e 1 || · · · ||e M−1 ] are vertically concatenated measurement matrices and noise vector, and || is the vertical concatenation.

A. Direct Solution for Wigner Distribution
In order to apply quadratic modulation operator to y ∈ M N r rf × 1, the dimensions of M w Hc ∈ N r × N t need to be manipulated by a proper ratio. For this purpose, y needs to be reshaped such that its row dimension can be reshaped into N r rf N where N = N t for TX and N = N r for RX. The dimension manipulation needs to preserve the action of the M w matrices. We employ Subsampled Randomized Hadamard Transform (SRHT) [17] Hadamard matrix, and D is N r ×N r diagonal matrix with i.i.d complex Gaussian random entries. For TX, the dimensions are similarly manipulate with respect to N t . The dimensions of M w Hc , and corresponding TX-RX Wigner distributions can be properly adjusted as The pair of tracked angular parameters Ω AoD , Ω AoA can be derived by projecting y onto M w t and M w r , which selects the s-strongest entries as Due to the Kronecker product, (59) generates multiple redundant samples of original Wigner distributions. This can be addressed by selecting the s-strongest entries from the first row of the projection matrix M w y y. While this is the main disadvantage of the proposed method, the computation complexity of dimensional manipulation is much less than conventional estimation algorithms such as OMP since the Kronecker product mostly performs multiplication by zeros.

B. Density Matrix
Knowing density matrix ρ, one can sample received signal directly to measure the eigenfunctions of channel. Density matrix has the following properties 1) ρ = ρ H , that is, density matrix is Hermitian.
2) ρ is positive semi-definite, that is, of eigenvalues corresponding to v t,i ∈ V t and v r,i ∈ V r . Then, TX and RX density matrices are defined as Since SVD are computationally costly, two SVDs can be combined to apply one SVD to full channel covariance matrix R = h v h * v . Then, density matrices in (61) represent the full channel and {V r } and {V t } are left and right singular vectors of the channel. Let consider a system that consists of two subsystems estimated at t + ρ 2 and t − ρ 2 . Then, density matrix needs to consider two subsystems, which we formulate in the following.

Definition 7.3 (Density Matrix for Two Channel Samples):
Considering the union of subspaces in Definition 7.2, TX and RX density matrices for combination of two subsystems {V 1 } and {V 2 } is defined as Dimension of the density matrices need to be manipulated to match dimension of y as Knowing density matrices, support sets are updated as

C. Channel Gain Estimation
The cluster gains α c related to each support is obtained using the least square approximation h Ω = Ψ H Ωtr Ψ Ωtr −1 Ψ H Ωtr y, where the Ω tr = (Ω AoA − 1) N r + Ω AoD is the set that contains the tracked supports.

VIII. CHANNEL ESTIMATION-TRACKING TIMING
The Channel estimation-tracking has two stages. At the first stage, channel parameters are estimated passing (57) to Orthogonal Matching Pursuit (OMP) [18] algorithm. In the channel tracking stage, channel parameters are directly measured by applying a density matrix to the second moment of the received signal. The proposed channel tracking method relies on spectral overlap between channel samples. Since the spectral overlap between channel samples disappears as the  distance between channel samples increases, channel estimation and tracking should be handled considering proper timing to preserve correlation between samples.
We recommend channel estimation-tracking timing shown in Fig. 4, where Wigner distribution and density matrices are constructed using estimated channels with a proper interval between them. Fig. 5 shows an example of channel estimation and tracking timing for an angular function. Note the channel tracking performance is determined by how well the tangent operator approximates the open neighborhood of a point t 0 .

IX. NON-ASSOCIATIVE PROPERTY OF QUADRATIC LOCALIZATION OPERATOR
The important takeover of this section is the fact that quadratic localization is internally and externally commutative but not intra-commutative, and it is non-associative. This implies that quadratic localization operators are defined for sets with either very simple structure, e.g., qausigroups, or they are defined for abstract magma. We are interested studying two classes of operators in particular, e A+B , and e (A+B)+(A * +B * ) . For simplicity, we use string notations A + B, and (A + B) (A * + B * ). We aim to show four key properties: (1) displacement operators in section IV-A are commutative, (2) quadratic localization operators are inter-commutative and externally commutative, (3) quadratic modulation operator is neither intra-commutative, nor (4) associative. For a system with two parameters A and B, non-commutative is denoted as  θ can not be canceled, that is, (65) implies an important conclusion that the quadratic localization is non-associative.

A. Simulation Setup
This section reports numerical results for the directional millimeter-wave channel tracking using quadratic displacement operators. The central frequency is set to 28GHz, and channel sampling interval is set to 4.46ms. The estimation is performed by sending orthogonal pilot signals in channel estimation stage and is then tracked through windows of 4 samples (≈ 20ms), and 32 samples (≈ 144ms). The TX and RX antenna sizes are set to N t = 64 and N r = 64, each with N t rf = N r rf = 4 RF chains. The number of clusters is fixed to 3. We examine two scenarios as shown in Fig. 6: (i) spatial consistency and (ii) fast chirp signal with a rate of 60 and the initial frequency of 30Hz. The fast chirp function imitates a high mobility scenario.

B. Spectral Efficiency
We evaluate the bandwidth utilization of the proposed channel tracking by measuring SE bits s Hz in dB as SE dB = log 2 det I Nt rf + Pt σn ZHF F * H * Z * , where I Nt rf is the identity matrix, P t is the received power, and σ n is the noise variance. Fig. 7a and 7b show the mean SE as a function of SNR for OMP, Density Matrix (DM in (63)-(64)), and Wigner distribution (WD in (59)-(60)) through Gaussian and SRHT dimension manipulations. We notice that DM provides the best performance with both Gaussian and SRHT. At 0dB, OMP and DM spectral efficiency reaches %79 and %70 of the perfect SE. At 10dB, the ratio of recovered SE increases to %87 and %77 of the perfect SE for OMP and DM, respectively. We also observe that adaptive and WD have SE substantially lower than OMP and DM. At 0dB, both adaptive and WD algorithms (either Gaussian or SRHT) can reach about %56 of the perfect SE. Fig. 7b shows SE as a function of tracking window W. Apparently, W = 32 slightly decreases SE compared to W = 4. We noticed that DM for W = 32 shows better SE than WD for W = 4. This is due to the fact that DM method is more noise-robust than WD because DM computation can be categorized as subspace methods such as Multiple Signal Classification (MUSIC), which are known to have optimum performance under noisy conditions.
Since channel tracking depends on the correctness of the estimated support sets, it should be evaluated as a function of the number of pilot signals. As shown in Fig. 7c, SE increases as the number of pilot signals increases with a slight slope. For 90 pilot signals and W = 4, OMP and DE achieve about %81 and %72 of the perfect SE. Obviously, increasing the number of pilot signals does not improve the tracking SE significantly, and a good SE has been achieved for DM with 60 pilot signals. Fig. 7 compares SE for W = 4 and W = 32. DM tracking SE for W = 32 follows approximately the same increasing slope of W = 4. Fig. 8 evaluates SE for spatial consistency and fast chirp at W = 4. As shown in Fig. 6b, fast chirp AoA function denotes the worst-cast channel dynamics by sweeping the full angular range rapidly in 4s. At 0dB, DM recovers %72 of the perfect SE for the spatial consistency. Nevertheless, for the fast chirp, recovered SE ratio decreases using DM to %27. As expected, the performance of quadratic displacement operators degrades as the dynamics of the channel increases. In practice, we may expect more moderate channel dynamic where DM demonstrate an acceptable SE performance. Obviously, channel tracking using quadratic displacement operators saves block resources by the order of W × #Pilot Signals. For 5604 channel samples and W = 4, pilot overhead decreases by %75 while it recovers %77 of the perfect SE. Obviously, Gaussian and SRHT generates the same results. For W = 4 at 0dB, NMSE of −0.5dB has been achieved. For W = 32, the NMSE increases to 0.6dB. Consistent with SE demonstrated in Fig. 7b, the quadratic displacement method has a better performance than the adaptive estimation-tracking.

C. Normalized Mean Square Error
We have noticed that the Gaussian and SRHT WD methods show NMSE results that do not match their SE performance. The reason for this observation is the variable spectral power of the estimated supports, i.e., the marginal distributions get weaker or stronger due to the death and birth procedure of the clusters. Since the order of the entries is critical to generating the correct product space of Ω AoD and Ω AoA , WD error does not reflect the actual error. Unlike WD, density matrix preserves the correct order in Ω AoD and Ω AoA because the generated product space inherits the joint-distribution through the Kronecker product of eigenfunctions. This is possible because the coordinates of the eigenfunction v i determine the location of the dominant pivots of Ω AoD and/or Ω AoA . To compute the correct joint-distribution of the Ω AoD and Ω AoA , one either has to have prior knowledge about the correct order of the support sets in the product space or the correct support order has to be obtained through eigendecomposition. Obviously, the former is not a practical assumption, and the latter is equivalent to a density matrix. Since SE inherently is computed considering the left and right singular vectors, SE reported for WD sounds reliable.

XI. DISCUSSION
We have noticed that at both low and high SNR regions, the SE loss due to tracking is marginal. For a time-variant channel with moderate dynamics, we expect channel tracking performance close to the OMP channel estimation. In order to observe the advantage of channel tracking using quadratic displacement operators, one needs to consider the pilot overhead reduction. We conclude that the proposed channel tracking methods can update the channel parameter while decreasing the pilot overhead without significantly penalizing SE.
To optimize the performance of the double directional channel tracking methods, the vector space metrics,such as SE, are more reliable than the coordinate-wise ones such as NMSE. The reason is that SE can be computed reliably using marginal distributions.

XII. CONCLUSION
We proposed the quadratic displacement operators for joint distribution analysis. In particular, we applied them to directional millimeter-wave channel tracking. We showed that the continuous linear time-variant millimeter-wave channel could be expressed as the Hadamard product of the Wigner distribution and Saleh Valenzuela channel. Two solutions were discussed to compute Wigner distribution of channel. First, Wigner distribution was computed directly from estimated channels. Second, we computed winger distribution using density matrix.
There is a great opportunity to utilize the quadratic displacement operators for dictionary learning by considering solution space. In addition, the proposed methodology can be extended to real-time integral path learning on manifolds. The applications of quadratic displacement operators include but are not limited to millimeter-wave and Tera-Hz communication, automotive radar, surface movement radar, precision approach radar, and in general, user localization at K bands radar. In particular, the proposed quadratic displacement operators can extrapolate the object location in short and mid-range radar systems. Also, the quadratic displacement operators can be used for surface imaging at millimeter-wave and Terahertz frequencies.