Tensor Dictionary Manifold Learning for Channel Estimation and Interference Elimination of Multi-User Millimeter-Wave Massive MIMO Systems

Millimeter Wave (mmWave) massive multiple-input multiple-output (MIMO) systems with hybrid analog-digital architectures can greatly increase system capacity and communicate with multiple users at the same time. Accurate channel estimation is crucial for multi-user communications, but its accuracy is limited as the number of antennas and users increases. In the matrix high-dimensional operation for multi-user channel estimation, not only is it computationally intensive, but also the estimation accuracy is low. It makes our work turn to channel estimation of the user group within a certain region to improve the accuracy of estimation. In this paper, we propose a tensor dictionary manifold learning method for channel estimation and interference elimination of the multi-user mmWave massive MIMO system. A multi-user digital-analog mixed received signal model is presented. The tensor dictionary manifold learning scheme is proposed to model the received signal as a third-order low-rank tensor to handle the high-dimensional user, antenna, and channel. After segmentation, clustering and manifold learning, multiple tensor dictionary manifold models containing a group of user signals are fitted. Tensor dictionary manifold learning can take advantage of the inherent multi-domain properties of signals in the frequency, time, code and spatial domains to maintain inter-user correlation within a user group while reducing the high-dimensional channels of the user group. Using the convex relaxation property of the tensor alternating direction method, we propose a strategy to eliminate interference from other groups. And with the help of the multi-signal classification method, the channel parameters of user groups are obtained to improve the accuracy of multi-user channel estimation. This method can perform channel estimation for multiple users with only a few pilots, and improve the performance of the system. Numerical results confirm the good performance of this method.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) is the key technology of the future 5G wireless communication system [1]. It configures a large number of antenna elements in a small space to obtain large multiplexing gain [2] and improve the capacity of wireless communication [3]. Therefore, the massive MIMO system supports communication with multiple users. However, the communication between multiple users needs to ensure strict service quality and The associate editor coordinating the review of this manuscript and approving it for publication was Li Zhang. requires advanced signal processing techniques [4], [5], such as channel estimation, channel equalization and channel coding. Channel estimation is particularly important for massive MIMO systems. With the increase of antennas, accurate and effective channel estimation is a research hot and challenging problem [6], [7].
Channel estimation is one of the most critical issues in wireless communication, and its ultimate goal is to estimate the wireless channel as accurately as possible under a limited number of pilot signals. Under different channel models and wireless systems, such as multi-user massive MIMO [8], millimeter-wave (mmWave) massive MIMO [9] and the VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Internet of Things (IoT) [10], the challenges are significantly different. Therefore, people use different channel estimation methods to apply in different scenarios. By exploiting the hidden joint sparsity in the multi-user channel matrix, [11] proposed a joint orthogonal matching tracking recovery algorithm, in which the channel matrix shares some common scatterers and common sparsity among geographically adjacent users. Multiple signal classification (MUSIC) [12] and rotation invariance technology (ESPRIT) [13] are used to process array signals to improve the performance of channel estimation [14]- [16]. These search signal methods are based on the location information of the user source. [17] proposes an Interior Point (IP) aided orthogonal matching pursuit (OMP) algorithm. It significantly improves the channel estimation accuracy by reducing the estimation error of angle of departure (AoD) and angle of arrival (AoA). A performance measure called the squared position error bound is proposed to characterize localization accuracy [18]. The authors consider the multicarrier signals for source localization in [19]. The time angle of departure (TAoD) estimates is obtained from the cyclic prefixes inserted before each orthogonal frequency division multiplexing (OFDM) data block. These estimates are then used for localization. Indoor radio positioning based on AoD has also been studied in [20] which applies the MUSIC algorithm on the signals from noncoherent antenna arrays, and in [19] the authors consider AoD estimation from channel state information in MIMO-OFDM systems. However, previous studies on massive MIMO communication [21]- [24] mainly focused on single-user channel estimation. The scenario where the base station (BS) estimates multiple user channels at the same time is not considered. The signal processing technology based on tensor can make use of the inherent multi-domain characteristics of the signal in the frequency, time, code and space domains. It obtains the optimal solution through tensor fitting on the basis of decomposition uniqueness [25]- [27]. At the same time, tensor decomposition will not destroy the internal relationship of each element, and can make full use of the spatial structure information of the signal, thereby improving the estimation accuracy. In addition, tensor modeling [28]- [30] benefits from multiple diversity. This feature is helpful to achieve multi-user signal separation, equalization and channel estimation under more relaxed identifiability conditions than the traditional matrix method. It is interesting to note that, low-rank tensor decomposition-aided channel estimation for mmWave MIMO-OFDM systems is developed in [31]. However, for each dimension, one dimensional search is required and the complexity of the spectral search may still be unacceptably high for real-time problems. In [32], a multilinear singular value decomposition (MSVD) method is used for channel parameter estimation. The Tucker form of the MSVD enables paths to be extracted based on signal energy. But this method cannot be used in a multi-user scenario. [33] derive a spatial-frequency channel model that incorporates the multipath parameters. And spatial smoothing method and structured CP decomposition are used for channel estimation.
The method does not benefit much from the phase rarefaction effect due to its frequency-independent property. In [34], the multi-user channel estimation problem was investigated through the lens of Bayesian tensor methods. [35] use a generalization of beamspace-ESPRIT method from matrix to tensor framework. The channel parameters are automatically associated. But the effect of the association is not satisfactory.
In this paper, we consider the tensor dictionary manifold learning for channel estimation and interference elimination of the multi-user mmWave massive MIMO system. With the multi-user mmWave massive MIMO system with a mixed digital-analog structure greatly increasing the system capacity, problems such as inaccurate channel estimation and low resolution for multi-users have emerged. Accurate channel estimation is crucial in multi-user communication, which drives our research and makes us turn to channel estimation of the user group to improve resolution. For multi-user scenarios, we developed a tensor manifold dictionary learning model. The received signals of all users are divided as a third-order low-rank tensor and clustered to form a dictionary learning (DL) model of the received signals of multiple user groups. Through manifold learning, the relationship between adjacent users in the group is analyzed, and the highdimensional channels in the group are embedded in the lowdimensional space. Using the convex relaxation properties of tensor Alternating direction method of multiplier (ADMM), the interference of other user groups after clustering can be eliminated. And with the help of the MUSIC method, the channel parameters of the user group are obtained. A highprecision channel estimation of the user group is realized. This method can perform channel estimation for multiple users with only a few pilots, and improve the performance of the system. Numerical results confirm the good performance of this method.
The remainder of this paper is organized as follows. Section II introduces the notions utilized and preliminary work in the paper. In Section III, the multi-user massive MIMO system and channel model are presented. Section IV describes segmentation and clustering of received signals for dictionary learning. Section V presents the manifold learning method for analyzing the relationship. Section VI gives the tensor ADMM method and the MUSIC method to get the channel estimation of the user groups. Some numerical results are provided in Section VII. Finally, we conclude this paper in Section VIII.

II. NOTIONS AND PRELIMINARY WORK
In this section, we introduce the notions utilized and the preliminary work related to tensors in the paper. More detailed instructions are in [36]- [38].
represent Frobenius norm and expectation.

B. PRELIMINARY WORK
A tensor of order N , corresponding to an N -dimensional data matrix, is denoted as A ∈ R I 1 ×···I n ×···I N . Elements of A are denoted as a i 1 ...i n ...i N , where 1 ≤ i n ≤ I n . The mode-n vectors of an N order tensor A are the I n dimensional vectors obtained from A by varying index i n while keeping the other indices fixed. The matrix A (n) ∈ R I n ×(I 1 ...I n−1 I n+1 ...I N ) is composed by taking the mode-n vectors of A as its columns. This matrix can also be naturally seen as the mode-n flattening of the tensor A. The n-rank of A, denoted as r n , is the dimension of the vector space spanned by the mode-n vectors of A.
The product of two matrices can be generalized to the pro-duct of a tensor and a matrix. The mode-n product of a tensor A ∈ R I 1 ×···I n ×···I N by a matrix B ∈ R J n ×I n , denoted by A × n B, is also an N order tensor C ∈ R I 1 ×···J n ×···I N . The mode-n product C = A × n B can also be calculated by the matrix multiplication C (n) = BA (n) , followed by a re-tensorization of undoing the mode-n flattening. The Frobenius norm of a tensor A is defined as:

III. SYSTEM AND CHANNEL MODEL
In this section, we introduce the multi-user massive mmWave MIMO system and channel model.

A. SYSTEM MODEL
We consider a multi-user mmWave massive MIMO system consisting of a base station (BS) and K multiple mobile stations (MSs), i.e., users. It is depicted in Fig. 1 be the transmission symbol. x k ∈ C N s ×1 represents the transmitted pilot symbol of the kth (k ∈ [1, K ]) MS, which satisfies E x k x H k = I N s P/N s , P is the average transmit power, I N s is an identity matrix with dimension N s ×N s . In the BS, x k is first precoded by the digital precoder F BB = F BB,1 , F BB,2 , . . . , F BB,K ∈ C N t RF ×KN s and then precoded by an analog precoder F RF ∈ C N r RF ×N t , which is implemented by an analog phase shifter array. Since the phase shifters in the analog precoder in Fig. 1 can only change the phase of the transmitted signal, each entry of F RF is of constant modulus. We normalize its entries to satisfy |F RF | = 1 √ N t . In addition, in order to meet the transmit power constraint, F BB is normalized to satisfy F RF F BB 2 = KN s . After the hybrid precoding, the transmitted signal becomes Therefore, the received signal of the kth user is given bȳ where H k ∈ C N r ×N t represents multipath fading wideband mmWave channels which are explained in section III-B, n k ∈ C N r ×1 is the additive complex Gaussian noise at the kth MS and each entry of n k follows the independent and identically distributed (i.i.d.) complex Gaussian distribution with zero mean and variance σ 2 . At the kth MS, firstly the received signals are processed by a RF combiner Q RF,k ∈ C N r ×N r RF which is implemented by analog phase shifters similar to the BS. We also normalize its entries to satisfy Q RF,k = 1/ √ N r . Then followed by a baseband combiner Q BB,k ∈ C N r RF ×N s to detect its symbol. Then, the detection signal of the kth MS is given by where the three consecutive terms respectively represent the kth user's desired signal, interference signals from other VOLUME 10, 2022 users, and additive Gaussian noise. By stacking all {Y k } K k=1 into a long vector, the multi-user signal becomes where

B. CHANNEL MODEL
Measurement campaigns in dense-urban Non line of sight (NLOS) environments reveal that mmWave channels typically exhibit limited scattering characteristics [39]. Also, considering the wideband nature of mmWave channels, we adopt a geometric wideband mmWave channel model with L scatterers between the MS and the BS. The lth (l ∈ [1, L]) scatterer has a time delay τ l . θ l ∈ [0, 2π ] and φ l ∈ [0, 2π ] represent azimuth and elevation angles of departure (AoDs) and arrival (AoAs) at the transceiver. Under this model, the kth MS channel matrix in the delay domain can be written as where α k,l is the complex path gain associated with the lth path. a r θ r,k,l , φ r,k,l ∈ C N r and a t θ t,k,l , φ t,k,l ∈ C N t are the antenna array response vectors of the MS and BS, respectively. θ r,k,l and φ r,k,l are azimuth and elevation AOAs of the lth path, respectively. θ t,k,l and φ t,k,l are the azimuth and elevation AODs of the lth path, respectively. f s denotes the sampling rate. For convenience, (5) can be given by where E k = diag α k,1 e −j2πτ k,1 f s , . . . , α k,L e −j2πτ k,L f s , (7a) A r,k = a r θ r,k,1 , φ r,k,1 , . . . , a r θ r,k,L , φ r,k,L , (7b) The antenna array response vectors of the BS and MS are given by where λ is the carrier wavelength, d indicates the distance between the antennas. Our objective is to estimate the channel matrices and from the received signals {Y i } K i=1 of all users. In particular, we wish to provide a reliable channel estimate by using as few measurements as possible because the number of measurements is linearly proportional to the number of RF chains at the MS, which is expected to be reduced in practice. The mmWave channel measurement results show that the mmWave has a diffuse scattering phenomenon on the surface of the rough scatterer, and the scattering range will increase as the wavelength decreases [39]. For scenarios where users are dense, when there is not enough space between users, diffuse scattering may cause adjacent users to receive signals of the same path. In practice, this scheme is generally applicable to the scenario where the positions of users are randomly distributed. Users in co-channel deployment may experience significant interference from other users. The channel matri- , are characterized by a set of common parameters It can be known from the above analysis that the channel matrices contain the spatial characteristics of each transmitter, and the azimuth Note that the problem of single-user mmWave channel estimation has been studied in [40], [41]. Specifically, to estimate the downlink channel, the BS employs M different beamforming vectors at M successive time frames, and at each time frame, the MS uses Q combining vectors to detect the signal transmitted over each beamforming vector. By exploiting the sparse scattering nature of mmWave channels, the problem of estimating the mmWave channel can be formulated as a sparse signal recovery problem and the training overhead can be considerably reduced. The above method can also be used to solve our uplink channel estimation problem if channels from users to the BS are estimated separately. Nevertheless, we will show that a joint estimation (of multi-user channels) scheme may lead to an additional training overhead reduction.

IV. SEGMENTATION AND CLUSTERING
In recent years, tensor has been proved to be an effective analysis tool in multi-user massive MIMO. To solve the problem of high complexity of channel estimation calculation in massive MIMO systems, this paper exploits the tensor strategy to analyze the received signal. We use the two-dimensional spatial and user domain of the signal to analysis and cluster, and then combine similar channel characteristics into sparse and low-rank tensor decomposition models to effectively detect and separate multi-user information of mixed signals. Dictionary learning can express the multi-source localization (MSL) problem as joint parameter dictionary learning (PDL) and sparse signal recovery (SSR). It is also possible to build a dictionary and sparse coding on the sub-band spectrum to improve the signal recognition ability. The wrong channel model can also be embedded in dictionary learning to improve accuracy [42]- [44]. After users are clustered into groups, we build a dictionary of user channel information in the user group. Then we can estimate the user group channel more efficiently.
Substituting the channel representation from (6) into (4), the multi-user signal becomes where [36]. N i ∈ C N r ×N t ×K is additive Gaussian noise tensor.
After constructing the received signals of all users into a tensor form, we perform convolutional segmentation and grouping of the received signals to prepare for the following channel estimation of the user group (Fig. 2).

A. CONVOLUTIONAL SEGMENTATION OF RECEIVED SIGNALS
In the received signal of Y, the 3D cube in the form of third-order tensor contains rich multi-dimensional structure information. Because different sources have different spatial characteristics, multi-source multipath grouping can be achieved by estimating the parameters of the spatial characteristics.
The received signal of users can be regarded as three dimensional tensor Y ∈ C N r ×N t ×K of transmitting antenna spatial characteristics, receiving antenna spatial characteristics and users. First, the received signal Y is divided into overlapping three-dimensional sub-tensors, where the sliding window step size ω = 1. Therefore, in the spatial dimension of the receiving antenna, the receiving antenna array is divided into (N r − n r + 1) sub-array blocks, and each sub-array element is n r . In the spatial dimension of the transmitting antenna, all transmitting antenna arrays are divided into (N t − n t + 1) sub-array blocks, each subarray element is n t . In the user dimension, all users are divided into (K − g + 1) groups. By convolutionally dividing the two-dimensional space of the signal tensor, we obtain (N r − n r + 1) (N t − n t + 1) sub-tensors. Subtensor Y ∈ C n r × n t × g of received signal with multidimensional space information is expressed as: . . , p + n r , n t = p, . . . , p + n t , g = g , . . . , g + g. Y represents the subtensor after convolutional segmentation. N g represents white noise tensor.
DL has been effectively applied to multi-user interference canceling and channel denoising by considering the nonlocal similarity property of users [45]. The basic idea is to firstly cluster the similar users into groups. and then to encourage each group share similar atoms in the dictionary. The DL model can be extended to the channel estimation of the user group. First, we construct user group patches for channel estimation of the user group as follows.
Receiving signal of all users with N r × N t spatial dimension and K users can be expressed as a three-order tensor Y ∈ C N r ×N t ×K . By sweeping all across the receiving signal of all users with overlaps, we can build a user group of patches Y (p,q,g) denotes the patch number. Each patch constructed in this way contains two spatial dimension and a user dimension. This can easily help us consider an important attribute of all users: the relevance of users with similar channels.
Based on this patch set {Y i } I i=1 , the user DL model can then be constructed to calculate the spatial and user dictionaries A n r ,g ∈ C n r ×L , A n t ,g ∈ C n t ×L , x g ∈ C g×K , implying the redundancy of these dictionaries, as follows: where E g ∈ C L×L×K corresponds to the coefficient tensor for Y which governs the affiliated interaction between the dictionaries.

B. CLUSTERING OF RECEIVED SIGNALS
We use the idea of dictionary learning to estimate the channel of the user group, that is, use the correlation of users with similar channel characteristics. Therefore, users with similar channel characteristics need to be clustered into groups. The established user channel model is based on a geometric stochastic channel model, using the concept of scattering clusters, which contains many stochastically varying multipath components. Scattering clusters can be used to cluster user groups. The starting point is a large number of multidimensional parametric channel estimation data, obtained from MIMO measurements. It has been investigated that these parameters appear in clusters [46]- [48], i.e. in groups of multipath components (MPCs) with similar parameters, such as delay, AoAs and AoDs. We show that using the wellknown K-means clustering algorithm [49] with multipath component distance (MCD) as distance measure improves performance considerably. VOLUME 10, 2022 K-means algorithm identifies each cluster by its centroid position in the parameter space, then it is assumed that there are G centroids. Each MPC is assigned to the cluster centroid with the smallest distance. The algorithm repeatedly optimizes the positions of the centroids to minimize the total distance from each MPC to its centroid, as: where x g denotes the parameter vector of gth g ∈ [1, . . . , G] MPC, c x g denotes the parameters of cluster centroid closest to the gth MPC, and d (·) represents the distance function between any two points in the parameter space. Distance measures: It was customary [49] to calculate the distance for each dimension (delay, AoAs, AoDs) separately. Clustering was subsequently done either sequentially (first delay domain, then subsequently clustering AoAs and AoDs) or jointly. We demonstrate that joint clustering is more promising as the cluster structure in the data is more visible in high-dimensional spaces, but the data in different dimensions (coming even in different units) has to be scaled. To identify clusters correctly, we use the multipath component distance (MCD) [46]. This metric scales the data to enable joint clustering. To assess the performance, we compare the squared Euclidean distance with the MCD.
The squared Euclidean distance between any two estimated MPCs i and j is given by The metric is extended for AoAs and AoDs to cope with the angular periodicity, according to where pv 2 (·, π) maps (·) to its principal value in the interval [−π, π). The metric reads similar for the AoDs. This metric is only useful for one-dimensional clustering, as it does not scale the data. Assuming that subtensor (p, q) th and subtensor (u, v) th have L (p,q) paths and L (u,v) paths, respectively, the user dimension is kept fixed at this time.
The distance of the AoA for the lth path is defined as: The distance of the AoD for the lth path is defined as: The delay distance for the lth path is defined as: where η is the scale factor that adjusts the delay weight in the distance function. η is a suitable delay scaling factor to give the delay more 'importance' when necessary. τ l is the range of delay, and τ l = max (p,q),(u,v) τ . τ sd is the standard deviation of delay.
According to (15), (16), and (17), the multi-path component distance metric between the two subtensors is (18), as shown at the bottom of the page.
In (18), the data can be scaled appropriately to conform to the MCD metric. L is the maximum number of paths in the two subchannel tensors, that is L = max L (p,q) , L (u,v) , when L (p,q) < L or L (u,v) < L, the missing bits are filled with zero.
Our data set is which is the MPC parameter group obtained from the (N r − n r + 1) (N t − n t + 1) (K − g + 1) subtensors. The clustering process is as follows: 1. Randomly initialize the group centroid µ 1 , µ 2 , . . . , µ g ⊂ , that is independently select G centroids from the data set as the clustering center of the subtensor. : arg min where g = 1, 2, . . . , G, c (i) represents the set index of the ith subtensor and gth group centroid cluster. d MCD x MPC i , µ g is the multipath component distance between the MPC parameter sample x MPC i of ith subtensor and centroid µ g . For specific solutions, see (15) and (16), (17) and (18).
3. Update the centroid of the cluster. The centroid of the gth group is update to: 4. Repeat steps 2 and 3 until convergence, namely the cluster centroid position remains unchanged. 5. Output a set of G group subtensors. The K-means algorithm iteratively optimizes the position of group centroid. Its clustering algorithm performance can be measured by the total MPC distance from each subtensor to its centroid, namely the total MCD square sum of all objects in the group to the group centroid, which is defined as: where d MCD x MPC i , µ c (i) represents the updated MCD of group centroid µ c (i) and objects x MPC i , and x MPC i ∈ µ c (i) . According to the above analysis and processing, users with similar channel characteristics are clustered into user groups. The number of user groups is G, assuming that the number of users in the gth g ∈ [1, . . . , G] user group is M g .
The received signal in gth group can be given bỹ where three consecutive terms represent the signal tensor of the gth user group, the signal tensor interference of other user groups, and the additive white Gaussian noise tensor. After clustering users with similar channel characteristics into groups. Y i g M g i=1 (g = 1, 2, . . . G) denotes the signals of all users in the group, where G is the group number, M g is the user number in the gth group and i g denotes the i user in the gth cluster. And then we attempt to enforce each group share the similar atoms in each of the spatial dictionaries A n r ,g , A n t ,g and user pilots x g . For convenience we combine the subtensors in the gth group together to formulate a fourorder tensor:Ỹ g ∈ C n r ×n t ×G×M g . The supplemental fourth dimension corresponds to the number of users in each group.
Analogously, we align all coefficient tensors corresponding to the gth user group to formẼ g ∈ C n r ×n t ×G×M g . Then the aim of the user tensor DL can be attained by the following group-block-sparsity regularizer.
Definition 1(Group-Block-Sparsity): For a coefficient ten-sorẼ g ∈ C n r ×n t ×G×M g , its group-block-sparsity with respect to the spatial and user mode is Note that the group-sparsity [45] can be seen as the degenerated case of the group-block-sparsity in user groups. Furthermore, when we set ψ = 1 (meaning only one user in a cluster), the group-block-sparsity so defined exactly corresponds to the concept of block sparsity proposed in [6], which has been substantiated to be capable of enhancing better recovery of the original high order signals since it implicitly incorporates valuable prior information on real signals and facilitates making full use of the dictionary atoms of each mode in signal representation.
Then we can construct the following DL model: The group-block-sparsity ofẼ g guarantees that each group Y g shares r MS g , r BS g , r s g atoms of the dictionariesÃ n r ,g ,Ã n t ,g , x g respectively, and thus the nonlocal similarity among these cluster samples can then be implied.

V. THE MANIFOLD LEARNING
We know from the previous section that there are M g users in gth group. Since the subtensors with similar channel characteristics in each group are clustered together, the subtensors in the same group have the strongest correlation. But the real relationship between users is not known, and the group tensor dimension at this time is high, which is inconvenient for group tensor channel estimation Let H M g denote the channel matrix between the BS and the M g user in group g. m g 1 ≤ m g ≤ M g denotes the m g th user in group g. Let's denote the matrix corresponding to the gth clusterỸ g as H g = H 1 , . . . , H m g , . . . , H M g ∈ C n r ×n t ×G×M g .
Manifold learning can reduce the dimensionality of highdimensional data [50]. The basic steps of local linear embedding (LLE) algorithm show that the linear combination relationship can only play a role near the neighborhood [51]. The basic steps of LLE algorithm are illustrated in Fig. 3. The high-dimensional receive data can be embedded in subspaces while retaining the geometric properties of the underlying channel manifold. When the low-dimensional VOLUME 10, 2022 receive data manifold are reconstructed, the subtensors receive data maintain the same local neighbor relationship in the corresponding intrinsic low-dimensional space. So that the high-dimensional receive data can be mapped to the globally unique low-dimensional coordinate system. Finally, the global multi-user high-dimensional receive data nonlinearity is transformed into local linearity to achieve dimensionality reduction. The inter-group interferences existing in (22) can be eliminated by the ADMM scheme [52], which will be explained in Section VI. According to the formula, the signal of any user of the same group can be approximated by the approximate linear combination of other nearby M g −1 users. The channel coefficients between the M g users of the group, H m g in (24) can be generalized as: H m g ≈ w m g 1 H 1 + · · · + · · · w m g n H n + · · · + w m g M g H M g , where w m g n is the channel correlation coefficient between the user m g and its neighbor n. In order to calculate the reconstruction weight coefficient w m g n M g m g =n,n=1 between the mth user and its adjacent user, the linear combination effect is optimal. Each high-dimensional channels H m g and its neighbor high-dimensional channels H M g are located in a linear or nearly linear local neighborhood of the manifold by manifold learning. The high-dimensional channels H m g in the neighborhood can be represented by its neighbor high-dimensional channels H M g . When the low-dimensional channel manifolds are reconstructed, the channels maintain the same local neighbor relationship in the corresponding intrinsic low-dimensional space. The objective function must be constructed and the error value of (24) is minimized: where, from the constraint conditions, 0 < w m g n , w m g n < 1 |n , n ∈ M g are any two weight coefficients corresponding to H n and H n respectively. Let α ∈ R, 0 < α < 1, 0 < αw m g n + (1 − α) w m g n < 1 n, n ∈ M g . Therefore, the constraint set of the weight coefficient is a convex. If the objective function ε (w) satisfies the properties of convex function, the constraint condition is a convex set. In order to achieve dimensionality reduction, the data of the high-dimensional space can be mapped to a lower-dimensional space by using the LLE method. Proof: Assuming any two points on the set w m g 1 , w m g 2 , α ∈ (0, 1), then as shown in (26).
Therefore, the objective function (25) is a convex under the constraint condition. In this case, the solution of the reconstruction weight coefficient can be transformed into a convex optimization problem, that is, the solution of the least convex optimization problem, that is, the solution of the least square problem under two constraints. Assuming that the channel matrices H m g , H n , H n of the gth group are all known, a local covariance matrixR m n,n is constructed: Combined with the constraint n∈M g w m g n = 1, the minimum matrix problem of the objective function is solved. At this time, w m g n has a closed solution, which is expressed as: After the LLE dimensionality reduction method, the projection in the low-dimensional space of the input channel matrix H m g of a user m g in the group g isH m g . And H n , H n the corresponding projections areH n ,H n , respectively.H m g can be reconstructed by the linear combination approximation: H m g ≈ w m g nH n + w m g n H n + · · · + w m g M gH M g , Therefore, the low-dimensional space representationH m g corresponding to H m g can be solved by the following kernel function as (30). In (30),H is composed of allH m g column vectors, I is an identity matrix with dimension M g × M g . I m g is the m g th column of I. And w m g denotes the m g th column of the matrix W g , with dimension ε H = min M g × M g , is given by: Since the relationship among the users in the group is known, then we can reconstruct all the users in the group into a new group tensor, and the gth group can be written aŝ whereŶ g ∈ C N t ×N r ×M g is the received signal of the gth group after the correlation coefficient of the channel matrix is known.

VI. CHANNEL ESTIMATION OF THE USER GROUPS
The tensor analysis method is used to analyze all users at the same time to improve work efficiency. But from (32), it can be seen that the desired signal of user groups is still interfered by other group tensors, so we use the tensor ADMM method to eliminate it. Then the tensor-MUSIC method is used to get our estimated channel parameters of user groups.

A. THE ADMM FOR ELIMINATING INTERFERENCE
For the convenience of calculation, we temporarily simplify (32) as followsŶ wherê (34d) X , Z, N represent the desired signal of gth user group, interference signals from other user groups and additive Gaussian white noise.
Our object is to obtain the desired signal of gth user group from (33). Without considering Gaussian noise, Lu et.al. [53] used tensor robust principal component analysis to restore low-rank tensors through convex optimization, is given by where ρ 2 = 1/ max (N t , N r ) M g [54], (35) can be solved by polynomial-time algorithms, e.g., the standard ADMM. We follow this convex optimization scheme to eliminate interference from other user groups and noise. In this paper, the eliminating interference is formulated as where λ 1 , λ 2 are the optimized values to achieve a satisfactory recovery result of the user group signal. The constrained problem defined in (36) can be addressed by a quadratic penalty approach, i.e., by solving min X ,Z,N where β is a factor that guarantees the convergence of the algorithm. The solution of (37) can be approached by alternating this minimization with respect variables X , N and Z. However, the intermediate minimization becomes increasingly illposed when β becomes large. The augmented Lagrangian method (ALM) provides another term to mimic Lagrange multiplier and to overcome the ill-posed problem caused by large value of β. The augmented Lagrangian function for problem defined in (36) is where 1 is the Lagrangian multipliers. ALM is used to minimize L (·) with respect to variables X , N , and Z while keeping 1 fixed. It then updates 1 according to the following rule: The ADMM method uses partial updates by keeping other variables fixed each time. Using the ADMM framework for (38), we can update the variables X , N , and Z in the VOLUME 10, 2022 (σ + 1) th iteration, by alternately minimizing the following function while keeping the value of 1 fixed at σ In the (σ + 1) th iteration, X σ +1 , N σ +1 , Z σ +1 and σ +1 1 are updated as follows: For the updating of X σ +1 ,we have The updating of X σ +1 has a closed-form solution.
For the updating of Z σ +1 ,we have Equation (42) can be solved by a soft-shinkage operator where shrink (·, ·) is an elementwise soft-shrinkage operator.
For each element a of tensor Y − X σ +1 The soft-shrinkage operator shrink (·, ·) is the proximity operator of the l 1 -norm. For N σ +1 , we have Similarly, we can obtain the closed-form solution for the updating of N σ +1 The update of σ +1 1 can be formulated as With a fixed β, ADMM converges slowly. Consequently, an adaptive updating strategy for the penalty parameter is adopted where β max is the upper bound for β, and η is the adaptive parameter. (49) where, ς is an infinitesimal number, e.g., 10 −6 .

B. THE STOPPING CRITERION FOR ADMM IS
After the ADMM method, interference signals from other user group tensors have been eliminated. The desired signal of gth user group is given bȳ

C. THE CHANNEL MATRIX AND SIGNAL MATRIX
After the tensor ADMM method, interference signals from other user group tensors have been eliminated.
Due to the sparsity of the antenna sub-array block, the ULA antenna has higher resolvability. In this paper, the spatial-multiple signal classification (SMUSIC) of spatialmultiple signal classification (SMUSIC) combined with angle delay estimation is derived by deriving the time and space covariance matrix. And Time-Multiple Signal Classification (TMUSIC) algorithm [55], using the orthogonality between the signal and noise subspace, MUSIC needs to perform a full-spectrum search in space and time to accurately obtain the path Angle and delay information. The actual wireless communication system has a pilot sequence embedded in its frame, which is used as a reference symbol for channel estimation, the pilot symbol occupies some subcarriers, and the rest is used for data transmission. In order to obtain the best estimate, it is assumed that each user in the network sends an orthogonal training symbol set.
According to the channel model in section III.Ȳ g in (50) is given bȳ g,lQr,g,l θ r,g,l , φ r,g,l •F t,g,l θ t,g,l , φ t,g,l • e −j2πτ g,l f sx g W g + N g (51) whereQ r,g,l θ r,g,l , φ r,g,l = Q H gâ r θ r,g,l , φ r,g,l , (52a) When the MUSIC algorithm is used to jointly estimate the AoA and AoD of MPC, the steering vector involving multiple parameters can be expressed aŝ C g θ r,g,l , φ r,g,l , θ t,g,l , φ t,g,l =Q r,g,l θ r,g,l , φ r,g,l •F t,g,l θ t,g,l , φ t,g,l . (53) The delay component in (51) is given bỹ whereĝ τ g,l = e −j2πτ g,l f s .
In practical applications, according to (59), the spatial covariance matrix corresponding to the tensor can be expressed as follows: where H . In (60), the spatial covariance matrix R s and spatial feature matrix C g θ r,g , φ r,g , θ t,g , φ t,g share the same column space. Similarly, the covariance matrix R t of the delay domain of tensor is: In (61), after excluding the noise subspace, the delay domain covariance matrix R t and the delay domain feature matrixG (τ ) share the same column space.
Eigenvalue decomposition of the spatial covariance matrix R s and the time-domain covariance matrix R t respectively: where the column vectors R s of U s,x are the eigenvectors of the signal subspace in the space domain, the column vectors R t of U t,x are the eigenvectors of the signal subspace in the space domain. U s,n and U t,n are the noise subspaces supplemented orthogonally by U s,x and U t,x column spaces, respectively.
Using the orthogonality of the signal subspace and noise subspace in the space and delay domain, the zero-spectral functions of S-MUSIC and T-MUSIC are respectively defined as (64) and (65), shown at the bottom of the page.
Estimate the angle and delay information by searching the peaks of the following spatial and delay spectra (66) and (67), as shown at the bottom of the page, where P s,n = U s,n U s,n H , P t,n = U t,n U t,n H .

A. COMPUTATIONAL COMPLEXITY
In our algorithm, the complexity is dominated by the construction of C g θ r,g , φ r,g , θ t,g , φ t,g andG g (τ ). Our algorithm is a gridless search with the complexity of P s θ r,g , φ r,g , θ t,g , φ t,g = 1 C g θ r,g , φ r,g , θ t,g , φ t,g H U s,n U s,n H C g θ r,g , φ r,g , θ t,g , φ t,g , .
(65) θ r,g ,φ r,g ,θ t,g ,φ t,g = arg max (θr,g,φr,g,θt,g,φt,g) 1 C g θ r,g , φ r,g , θ t,g , φ t,g H P s,n C g θ r,g , φ r,g , θ t,g , φ t,g , VOLUME 10, 2022 O (N t N r KG). As a comparison, the computational complexity of the IP-OMP [17] is of order O N t N r K + N x N y N z where N x N y N z denotes the number of columns of the overcomplete dictionary. The basic algorithm is the Alternating Least Squares (ALS) in MSVD [32], which has a complexity of the order O 3L 2 G + L 3 (N t + N r + K ) per iteration. The computational complexity of LRTD [31] is O N t N r K N x + N y + N z . In each iteration, the computational 3 where q is the number of iterations at convergence.

B. SIMULATION RESULTS
In this section, we present simulation results to examine the estimation performance of the proposed scheme. We use the antenna distance of the conventional antenna array. The distance between neighboring antenna elements is assumed to be half the wavelength of the signal. Also, in our experiments, the precoding matrix F and the combining matrixn Q are randomly generated with their entries uniformly chosen from a unit circle. Typical values are set as follows: the antenna array sizes for symbol transmission are set as N t = 64, N r = 32. the number of RF chains are set as N t RF = 12, N r RF = 6. the number of paths is set equal to L = 4; the number of subcarriers for transmitting pilot symbols is set as K 0 = 128; the sampling rate is set to f s = 0.32GHz. All simulation results represent an averaged over 5000 independent Monte Carlo runs. The MATLAB Tensor Toolbox [56] is used.
We first examine the estimation accuracy of the channel parameters θ r,g , φ r,g , θ t,g , φ t,g , τ g by the mean square error (MSE). To provide a benchmark, we derive the CRB of each parameter as a lower bound of unbiased estimators [57]. Fig. 4, Fig. 5 and Fig. 6 plot the MSE and CRB curves versus the receiving signal-to-noise ratio (SNR), and SNR is given by   The proposed method is compared with IP-OMP [17], LRTD [31] and SCPD [33]. Our method is abbreviated as TDML. It indicates that the performance of TDML continues to improve as SNR increases. Moreover, TDML consistently outperforms other algorithms, especially in terms of AoA and AoD. In terms of the estimation accuracy of time delay, the accuracy is slightly lower than SCPD at low signal-to-noise ratio. But as the SNR increases, the accuracy is better than SCPD.
Then, we focus on the overall estimation accuracy of channel matrices measured by the normalized MSE (NMSE) and NMSE is defined as where H g denotes the channel matrix associated with the user group, andĤ g is its estimate. R is number of monte carlo trials. In Fig. 7, our proposed method is compared with IP-OMP [17], MSVD [32], BCD-R [34], LRTD [31], and TS [35].  we set N t = 64, N r = 32, and the user K = 128. The results show that as the SNR increases, the performance of TDML has been improved and is better than other algorithms. This is mainly because other algorithms are limited by the complexity of multi-user high-dimensional channels. BCD-R and IP-OMP use a non-standard tensor decomposition model and the grid search method, so their performance is slightly lower than our solution. Our algorithm clusters users with similar channel characteristics into user groups, and uses manifold learning to reduce dimensionality. Therefore, the performance is higher than other algorithms. Fig. 8 plots the NMSE curves versus the number of transmit antennas. Fig. 9 plots the NMSE curves versus the number of receive antennas. The SNR is set to 30dB, and the user K is set to 128. It illustrates all the other methods can achieve enhanced estimation accuracy against the increasing antennas. With the increase of antennas, the beamforming of other algorithms and the reception of MS become better. Their performance is gradually improving. However, the dictionary constructed by our proposed algorithm after convolutional segmentation and clustering is more complete.  This is mainly due to the fact that the two dimensions of the tensor model we build are the transmitting antenna and the receiving antenna. With the increase of antennas, our algorithm uses a large number of antennas for multiple low-dimensional tensor dictionary manifolds after the segmentation of the antennas and the subsequent clustering of similar channels and the learning of manifolds. This allows our algorithm to improve its performance as the number of antennas increases. The accuracy of channel estimation is better than other algorithms.
In Fig. 10, we plot the NMSEs of respective methods vs. and the user K , where the SNR is set to 10dB. With the increase of users, the performance of LRTD and MSVD hardly changes. The performance of TS and BCD-R continues to decline. With the increase of users, the performance of our solution gradually declines. But it is always better than other solutions. When the number of users reaches a certain level, our algorithm stabilizes. This can be explained by the fact that our algorithm is user group specific and uses a manifold learning approach for dimensionality reduction of high dimensional channels. The performance of other algorithms VOLUME 10, 2022 is limited mainly by the nonlinear low resolution of the highdimensional channel for multiple users as the number of users increases. In a dense area of users, there are more similar channels between users, and our performance is better.
In Fig. 11 the comparison of the computational complexity is performed by comparing the run time of different algorithms. The user K is set to 128. The run time is chosen because iterative algorithms are used to achieve a tensor decomposition in MSVD, TS, LRTD, and BCD-R. As seen from the figure, the proposed TDML channel estimation algorithm has the lowest complexity. TS and MSVD yield a much higher complexity especially in the low to medium SNR regime because the tensor decomposition converges slowly.

VIII. CONCLUSION
In this paper, we propose a tensor dictionary manifold learning method for channel estimation and interference elimination of the multi-user mmWave massive MIMO system. This method uses user groups with similar channels to build tensor manifold dictionary learning of channel information. Specifically, the multi-user signal at the user end is established as a third-order tensor. We prove that users with similar channel characteristics can build a tensor pop dictionary of user group channels. Research on the extraction of channel information shows that even a few pilots can obtain the time delay and angle of the user group. We compare our proposed method with the channel estimation method based on the tensor method. The simulation results show that compared with other methods, this method has obvious performance advantages in terms of estimation accuracy.
In future work we will optimize the clustering of user groups and the method of building dictionaries, as this will directly affect the accuracy of our channel estimation for user groups. In addition, we may also consider more diverse user group scenarios, such as dense groups of people in small areas or distant groups of people in large areas. A more optimal manifold approach will be considered as users are constantly on the move. YANG WANG received the Ph.D. degree in information and communication engineering from the Shanghai Institute of Microsystems and Information Technology, Chinese Academy of Sciences, in 2015. Her current research interests include electromagnetic inverse scattering for imaging and computational processing.